API - Cost Functions¶
To keep TensorLayer simple, we minimize the number of cost functions as much as possible. We therefore encourage you to use TensorFlow's official functions directly: for example, you can implement L1, L2 and sum regularization with tf.nn.l2_loss, tf.contrib.layers.l1_regularizer, tf.contrib.layers.l2_regularizer and tf.contrib.layers.sum_regularizer; see the TensorFlow API.
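For instance, a minimal sketch of these TensorFlow regularizers (TF 1.x-era tf.contrib API, as used throughout this page; the weight tensor W is a hypothetical stand-in for one of your network's parameters):

import tensorflow as tf

# Hypothetical weight matrix standing in for one of your network's parameters.
W = tf.Variable(tf.random_normal([784, 800]), name='W')

l2_raw = tf.nn.l2_loss(W)                            # sum(W**2) / 2, unscaled
l1_pen = tf.contrib.layers.l1_regularizer(1e-4)(W)   # 1e-4 * sum(|W|)
l2_pen = tf.contrib.layers.l2_regularizer(1e-4)(W)   # 1e-4 * sum(W**2) / 2

# sum_regularizer combines several regularizers into a single penalty.
elastic = tf.contrib.layers.sum_regularizer(
    [tf.contrib.layers.l1_regularizer(1e-4),
     tf.contrib.layers.l2_regularizer(1e-4)])(W)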
Custom cost function¶
TensorLayer provides a simple way to create your own cost function. Take a multi-layer perceptron (MLP) as an example:
network = tl.layers.InputLayer(x, name='input_layer')
network = tl.layers.DropoutLayer(network, keep=0.8, name='drop1')
network = tl.layers.DenseLayer(network, n_units=800, act=tf.nn.relu, name='relu1')
network = tl.layers.DropoutLayer(network, keep=0.5, name='drop2')
network = tl.layers.DenseLayer(network, n_units=800, act=tf.nn.relu, name='relu2')
network = tl.layers.DropoutLayer(network, keep=0.5, name='drop3')
network = tl.layers.DenseLayer(network, n_units=10, act=tl.activation.identity, name='output_layer')
The model parameters are then [W1, b1, W2, b2, W_out, b_out], and you can apply L2 regularization to the first two weight matrices as follows.
cost = tl.cost.cross_entropy(y, y_, name='cost')
cost = cost + tf.contrib.layers.l2_regularizer(0.001)(network.all_params[0]) \
            + tf.contrib.layers.l2_regularizer(0.001)(network.all_params[2])
In addition, TensorLayer provides an easy way to collect a list of parameters by name, so you can apply L2 regularization to selected parameters as follows.
l2 = 0
for w in tl.layers.get_variables_with_name('W_conv2d', train_only=True, printable=False):
    l2 += tf.contrib.layers.l2_regularizer(1e-4)(w)
cost = tl.cost.cross_entropy(y, y_, name='cost') + l2
Regularization of the weights¶
After initializing the variables, the information of the network parameters can be obtained with network.print_params().
sess.run(tf.initialize_all_variables())
network.print_params()
param 0: (784, 800) (mean: -0.000000, median: 0.000004 std: 0.035524)
param 1: (800,) (mean: 0.000000, median: 0.000000 std: 0.000000)
param 2: (800, 800) (mean: 0.000029, median: 0.000031 std: 0.035378)
param 3: (800,) (mean: 0.000000, median: 0.000000 std: 0.000000)
param 4: (800, 10) (mean: 0.000673, median: 0.000763 std: 0.049373)
param 5: (10,) (mean: 0.000000, median: 0.000000 std: 0.000000)
num of params: 1276810
The output of the network is network.outputs, so the cross-entropy can be defined as below. Besides, to regularize the weights, network.all_params contains all the parameters of the network. In this case, according to the values of param 0, 1, ..., 5 shown by network.print_params(), network.all_params = [W1, b1, W2, b2, Wout, bout]. Max-norm regularization on W1 and W2 can then be performed as follows:
y = network.outputs
# Alternatively, you can use tl.cost.cross_entropy(y, y_, name = 'cost') instead.
cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=y_))
cost = cross_entropy
cost = cost + tl.cost.maxnorm_regularizer(1.0)(network.all_params[0]) \
            + tl.cost.maxnorm_regularizer(1.0)(network.all_params[2])
In addition, all of TensorFlow's regularization functions, such as tf.contrib.layers.l2_regularizer, can be used with TensorLayer as well.
Regularization of activation outputs¶
The instance method network.print_layers() prints the outputs of every layer in order. To regularize the activation outputs, you can use network.all_layers, which contains the outputs of all layers. If you want to apply an L1 penalty on the activation outputs of the first hidden layer, just add tf.contrib.layers.l1_regularizer(lambda_l1)(network.all_layers[1]) to the cost function, as sketched below.
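A minimal sketch (lambda_l1 is a hypothetical hyper-parameter; cost and network come from the examples above):

lambda_l1 = 1e-4  # hypothetical L1 strength on the first hidden layer's activations
cost = cost + tf.contrib.layers.l1_regularizer(lambda_l1)(network.all_layers[1])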
network.print_layers()
layer 0: Tensor("dropout/mul_1:0", shape=(?, 784), dtype=float32)
layer 1: Tensor("Relu:0", shape=(?, 800), dtype=float32)
layer 2: Tensor("dropout_1/mul_1:0", shape=(?, 800), dtype=float32)
layer 3: Tensor("Relu_1:0", shape=(?, 800), dtype=float32)
layer 4: Tensor("dropout_2/mul_1:0", shape=(?, 800), dtype=float32)
layer 5: Tensor("add_2:0", shape=(?, 10), dtype=float32)
cross_entropy(output, target[, name]) | Softmax cross-entropy operation, returns the TensorFlow expression of cross-entropy for two distributions; it implements softmax internally.
sigmoid_cross_entropy(output, target[, name]) | Sigmoid cross-entropy operation, see tf.nn.sigmoid_cross_entropy_with_logits.
binary_cross_entropy(output, target[, epsilon, name]) | Binary cross entropy operation.
mean_squared_error(output, target[, is_mean, axis, name]) | Return the TensorFlow expression of mean-square-error (L2) of two batches of data.
normalized_mean_square_error(output, target[, axis, name]) | Return the TensorFlow expression of normalized mean-square-error of two distributions.
absolute_difference_error(output, target[, is_mean, axis, name]) | Return the TensorFlow expression of absolute difference error (L1) of two batches of data.
dice_coe(output, target[, loss_type, axis, smooth]) | Soft dice (Sørensen or Jaccard) coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. when the labels are binary.
dice_hard_coe(output, target[, threshold, axis, smooth]) | Non-differentiable Sørensen–Dice coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. when the labels are binary.
iou_coe(output, target[, threshold, axis, smooth]) | Non-differentiable Intersection over Union (IoU) for comparing the similarity of two batches of data, usually used for evaluating binary image segmentation.
cross_entropy_seq(logits, target_seqs[, batch_size]) | Returns the expression of cross-entropy of two sequences; implements softmax internally.
cross_entropy_seq_with_mask(logits, target_seqs, input_mask[, return_details, name]) | Returns the expression of cross-entropy of two sequences with a mask; implements softmax internally.
cosine_similarity(v1, v2) | Cosine similarity [-1, 1].
li_regularizer(scale[, scope]) | Li regularization removes the neurons of the previous layer.
lo_regularizer(scale) | Lo regularization removes the neurons of the current layer.
maxnorm_regularizer([scale]) | Max-norm regularization returns a function that can be used to apply max-norm regularization to weights.
maxnorm_o_regularizer(scale) | Max-norm output regularization removes the neurons of the current layer.
maxnorm_i_regularizer(scale) | Max-norm input regularization removes the neurons of the previous layer.
Softmax cross entropy¶
tensorlayer.cost.cross_entropy(output, target, name=None)[source]¶
Softmax cross-entropy operation, returns the TensorFlow expression of cross-entropy for two distributions; it implements softmax internally. See tf.nn.sparse_softmax_cross_entropy_with_logits.

Parameters
output (Tensor) -- A batch of distributions with shape: [batch_size, num of classes].
target (Tensor) -- A batch of indices with shape: [batch_size, ].
name (string) -- Name of this loss.
Examples
>>> import tensorlayer as tl
>>> ce = tl.cost.cross_entropy(y_logits, y_target_logits, 'my_loss')
References
About cross-entropy: https://en.wikipedia.org/wiki/Cross_entropy.
Sigmoid cross entropy¶
tensorlayer.cost.sigmoid_cross_entropy(output, target, name=None)[source]¶
Sigmoid cross-entropy operation, see tf.nn.sigmoid_cross_entropy_with_logits.

Parameters
output (Tensor) -- A batch of distributions with shape: [batch_size, num of classes].
target (Tensor) -- A batch of indices with shape: [batch_size, ].
name (string) -- Name of this loss.
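A minimal usage sketch (TF 1.x graph mode; the placeholders are hypothetical):

import tensorflow as tf
import tensorlayer as tl

logits = tf.placeholder(tf.float32, [None, 10])  # raw network outputs (pre-sigmoid)
labels = tf.placeholder(tf.float32, [None, 10])  # multi-label targets in {0, 1}
loss = tl.cost.sigmoid_cross_entropy(logits, labels, name='sigmoid_loss')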
Binary cross entropy¶
tensorlayer.cost.binary_cross_entropy(output, target, epsilon=1e-08, name='bce_loss')[source]¶
Binary cross entropy operation.

Parameters
output (Tensor) -- Tensor with type of float32 or float64.
target (Tensor) -- The target distribution, in the same format as output.
epsilon (float) -- A small value to avoid the output being zero.
name (str) -- An optional name to attach to this function.
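A minimal usage sketch, reusing the hypothetical logits and labels from the previous sketch. Unlike sigmoid_cross_entropy, the outputs here are assumed to already be probabilities in (0, 1), e.g. after a sigmoid activation:

outputs = tf.nn.sigmoid(logits)  # probabilities rather than raw logits
bce = tl.cost.binary_cross_entropy(outputs, labels, name='bce_loss')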
Mean squared error (L2)¶
tensorlayer.cost.mean_squared_error(output, target, is_mean=False, axis=-1, name='mean_squared_error')[source]¶
Return the TensorFlow expression of mean-square-error (L2) of two batches of data.

Parameters
output (Tensor) -- 2D, 3D or 4D tensor, i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].
target (Tensor) -- The target distribution, in the same format as output.
is_mean (boolean) -- Whether to compute the mean or the sum for each example: if True, use tf.reduce_mean to compute the loss between one target and predicted data; if False, use tf.reduce_sum (default).
axis (int or list of int) -- The dimensions to reduce.
name (str) -- An optional name to attach to this function.
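A minimal usage sketch (hypothetical placeholders) showing the effect of is_mean:

import tensorflow as tf
import tensorlayer as tl

predictions = tf.placeholder(tf.float32, [None, 784])
targets = tf.placeholder(tf.float32, [None, 784])
mse_sum = tl.cost.mean_squared_error(predictions, targets)                 # tf.reduce_sum per example
mse_mean = tl.cost.mean_squared_error(predictions, targets, is_mean=True)  # tf.reduce_mean per example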
Normalized mean square error¶
tensorlayer.cost.normalized_mean_square_error(output, target, axis=-1, name='normalized_mean_squared_error_loss')[source]¶
Return the TensorFlow expression of normalized mean-square-error of two distributions.

Parameters
output (Tensor) -- 2D, 3D or 4D tensor, i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].
target (Tensor) -- The target distribution, in the same format as output.
axis (int or list of int) -- The dimensions to reduce.
name (str) -- An optional name to attach to this function.
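A minimal usage sketch, reusing the hypothetical placeholders above (the squared error is normalized by the magnitude of the targets):

nmse = tl.cost.normalized_mean_square_error(predictions, targets)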
Absolute difference error (L1)¶
tensorlayer.cost.absolute_difference_error(output, target, is_mean=False, axis=-1, name='absolute_difference_error_loss')[source]¶
Return the TensorFlow expression of absolute difference error (L1) of two batches of data.

Parameters
output (Tensor) -- 2D, 3D or 4D tensor, i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].
target (Tensor) -- The target distribution, in the same format as output.
is_mean (boolean) -- Whether to compute the mean or the sum for each example: if True, use tf.reduce_mean to compute the loss between one target and predicted data; if False, use tf.reduce_sum (default).
axis (int or list of int) -- The dimensions to reduce.
name (str) -- An optional name to attach to this function.
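A minimal usage sketch, reusing the hypothetical placeholders above (is_mean has the same semantics as in mean_squared_error):

l1_err = tl.cost.absolute_difference_error(predictions, targets, is_mean=True)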
Dice coefficient¶
tensorlayer.cost.dice_coe(output, target, loss_type='jaccard', axis=(1, 2, 3), smooth=1e-05)[source]¶
Soft dice (Sørensen or Jaccard) coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. when the labels are binary. The coefficient ranges between 0 and 1; 1 means a total match.

Parameters
output (Tensor) -- A distribution with shape: [batch_size, ....] (any dimensions).
target (Tensor) -- The target distribution, in the same format as output.
loss_type (str) -- jaccard or sorensen, default is jaccard.
axis (tuple of int) -- All dimensions are reduced, default (1, 2, 3).
smooth (float) -- This small value will be added to the numerator and denominator. If both output and target are empty, it makes sure dice is 1. If either output or target is empty (all pixels are background), dice = smooth/(small_value + smooth); if smooth is very small, dice is close to 0 (even when the image values are below the threshold), so in this case a higher smooth gives a higher dice.
Examples
>>> import tensorlayer as tl
>>> outputs = tl.act.pixel_wise_softmax(outputs)
>>> dice_loss = 1 - tl.cost.dice_coe(outputs, y_)
Hard Dice coefficient¶
tensorlayer.cost.dice_hard_coe(output, target, threshold=0.5, axis=(1, 2, 3), smooth=1e-05)[source]¶
Non-differentiable Sørensen–Dice coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. when the labels are binary. The coefficient ranges between 0 and 1; 1 means a total match.

Parameters
output (tensor) -- A distribution with shape: [batch_size, ....] (any dimensions).
target (tensor) -- The target distribution, in the same format as output.
threshold (float) -- The threshold value above which a value counts as true.
axis (tuple of int) -- All dimensions are reduced, default (1, 2, 3).
smooth (float) -- This small value will be added to the numerator and denominator, see dice_coe.
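A minimal usage sketch, reusing the hypothetical outputs and y_ from the dice_coe example above (outputs are assumed to be probabilities in [0, 1]):

# Thresholds the predictions internally, hence non-differentiable:
# use it as an evaluation metric rather than a training loss.
hard_dice = tl.cost.dice_hard_coe(outputs, y_, threshold=0.5)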
IOU coefficient¶
tensorlayer.cost.iou_coe(output, target, threshold=0.5, axis=(1, 2, 3), smooth=1e-05)[source]¶
Non-differentiable Intersection over Union (IoU) for comparing the similarity of two batches of data, usually used for evaluating binary image segmentation. The coefficient ranges between 0 and 1; 1 means a total match.

Parameters
output (tensor) -- A batch of distributions with shape: [batch_size, ....] (any dimensions).
target (tensor) -- The target distribution, in the same format as output.
threshold (float) -- The threshold value above which a value counts as true.
axis (tuple of int) -- All dimensions are reduced, default (1, 2, 3).
smooth (float) -- This small value will be added to the numerator and denominator, see dice_coe.
Note
IoU cannot be used as a training loss; people usually use the dice coefficient for training, and IoU and hard-dice for evaluation.
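A minimal sketch along those lines, reusing the hypothetical outputs and y_ from the dice_coe example above:

dice_loss = 1 - tl.cost.dice_coe(outputs, y_)                  # differentiable: training loss
iou = tl.cost.iou_coe(outputs, y_, threshold=0.5)              # evaluation only
hard_dice = tl.cost.dice_hard_coe(outputs, y_, threshold=0.5)  # evaluation only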
Cross entropy for sequence¶
tensorlayer.cost.cross_entropy_seq(logits, target_seqs, batch_size=None)[source]¶
Returns the expression of cross-entropy of two sequences; implements softmax internally. Normally used for fixed-length RNN outputs, see the PTB example.

Parameters
logits (Tensor) -- 2D tensor with shape of [batch_size * n_steps, n_classes].
target_seqs (Tensor) -- The target sequence, 2D tensor [batch_size, n_steps]; if the number of steps is dynamic, please use tl.cost.cross_entropy_seq_with_mask instead.
batch_size (None or int) -- Whether to divide the cost by the batch size: if an integer, the returned cost will be divided by batch_size; if None (default), the returned cost will not be divided by anything.
Examples
>>> import tensorlayer as tl
>>> # see the `PTB example <https://github.com/tensorlayer/tensorlayer/blob/master/example/tutorial_ptb_lstm.py>`__ for more details
>>> # outputs shape : (batch_size * n_steps, n_classes)
>>> # targets shape : (batch_size, n_steps)
>>> cost = tl.cost.cross_entropy_seq(outputs, targets)
Cross entropy with mask for sequence¶
tensorlayer.cost.cross_entropy_seq_with_mask(logits, target_seqs, input_mask, return_details=False, name=None)[source]¶
Returns the expression of cross-entropy of two sequences; implements softmax internally. Normally used for dynamic RNNs with synced sequence input and output.

Parameters
logits (Tensor) -- 2D tensor with shape of [batch_size * ?, n_classes], where ? means a dynamic number of IDs for each example. Can be obtained from DynamicRNNLayer by setting return_seq_2d to True.
target_seqs (Tensor) -- Tensor of int, like word IDs; shape [batch_size, ?], where ? means a dynamic number of IDs for each example.
input_mask (Tensor) -- The mask to compute the loss; it has the same size as target_seqs, normally 0 or 1.
return_details (boolean) -- Whether to return detailed losses: if False (default), only returns the loss; if True, returns the loss, losses, weights and targets (see the source code).
Examples
>>> import tensorlayer as tl
>>> import tensorflow as tf
>>> import numpy as np
>>> batch_size = 64
>>> vocab_size = 10000
>>> embedding_size = 256
>>> ni = tl.layers.Input([batch_size, None], dtype=tf.int64)
>>> net = tl.layers.Embedding(
...     vocabulary_size = vocab_size,
...     embedding_size = embedding_size,
...     name = 'seq_embedding')(ni)
>>> net = tl.layers.RNN(
...     cell = tf.keras.layers.LSTMCell(units=embedding_size, dropout=0.1),
...     return_seq_2d = True,
...     name = 'dynamicrnn')(net)
>>> net = tl.layers.Dense(n_units=vocab_size, name="output")(net)
>>> model = tl.models.Model(inputs=ni, outputs=net)
>>> input_seqs = np.random.randint(0, 10, size=(batch_size, 10), dtype=np.int64)
>>> target_seqs = np.random.randint(0, 10, size=(batch_size, 10), dtype=np.int64)
>>> input_mask = np.random.randint(0, 2, size=(batch_size, 10), dtype=np.int64)
>>> outputs = model(input_seqs, is_train=True)
>>> loss = tl.cost.cross_entropy_seq_with_mask(outputs, target_seqs, input_mask)
Cosine similarity¶
tensorlayer.cost.cosine_similarity(v1, v2) -- Cosine similarity [-1, 1].
Regularization functions¶
For more, such as tf.nn.l2_loss, tf.contrib.layers.l1_regularizer, tf.contrib.layers.l2_regularizer and tf.contrib.layers.sum_regularizer, see the TensorFlow API.
Maxnorm¶
tensorlayer.cost.maxnorm_regularizer(scale=1.0)[source]¶
Max-norm regularization returns a function that can be used to apply max-norm regularization to weights.
For more about max-norm, see wiki-max norm. The implementation follows TensorFlow contrib.

Parameters
scale (float) -- A scalar multiplier Tensor. 0.0 disables the regularizer.

Returns
A function with signature mn(weights, name=None) that applies max-norm regularization.

Raises
ValueError -- If scale is outside of the range [0.0, 1.0] or if scale is not a float.
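A minimal usage sketch (reusing network and cost from the MLP example above):

mn = tl.cost.maxnorm_regularizer(scale=1.0)
cost = cost + mn(network.all_params[0])  # max-norm penalty on W1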
Special¶
tensorlayer.cost.li_regularizer(scale, scope=None)[source]¶
Li regularization removes the neurons of the previous layer; the "i" represents "inputs". Returns a function that can be used to apply group Li regularization to weights. The implementation follows TensorFlow contrib.

Parameters
scale (float) -- A scalar multiplier Tensor. 0.0 disables the regularizer.
scope (str) -- An optional scope name for this function.

Returns
A function with signature li(weights, name=None) that applies Li regularization.

Raises
ValueError -- If scale is outside of the range [0.0, 1.0] or if scale is not a float.
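A minimal usage sketch (hypothetical coefficient; network and cost as in the MLP example above):

li = tl.cost.li_regularizer(1e-4)
cost = cost + li(network.all_params[0])  # group penalty that prunes previous-layer neurons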
tensorlayer.cost.lo_regularizer(scale)[source]¶
Lo regularization removes the neurons of the current layer; the "o" represents "outputs". Returns a function that can be used to apply group Lo regularization to weights. The implementation follows TensorFlow contrib.

Parameters
scale (float) -- A scalar multiplier Tensor. 0.0 disables the regularizer.

Returns
A function with signature lo(weights, name=None) that applies Lo regularization.

Raises
ValueError -- If scale is outside of the range [0.0, 1.0] or if scale is not a float.
tensorlayer.cost.maxnorm_o_regularizer(scale)[source]¶
Max-norm output regularization removes the neurons of the current layer. Returns a function that can be used to apply max-norm regularization to each column of the weight matrix. The implementation follows TensorFlow contrib.

Parameters
scale (float) -- A scalar multiplier Tensor. 0.0 disables the regularizer.

Returns
A function with signature mn_o(weights, name=None) that applies max-norm output regularization.

Raises
ValueError -- If scale is outside of the range [0.0, 1.0] or if scale is not a float.
tensorlayer.cost.maxnorm_i_regularizer(scale)[source]¶
Max-norm input regularization removes the neurons of the previous layer. Returns a function that can be used to apply max-norm regularization to each row of the weight matrix. The implementation follows TensorFlow contrib.

Parameters
scale (float) -- A scalar multiplier Tensor. 0.0 disables the regularizer.

Returns
A function with signature mn_i(weights, name=None) that applies max-norm input regularization.

Raises
ValueError -- If scale is outside of the range [0.0, 1.0] or if scale is not a float.
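A minimal usage sketch of the remaining group regularizers (hypothetical coefficients; network and cost as in the MLP example above):

cost = cost + tl.cost.lo_regularizer(1e-4)(network.all_params[0])        # prune current-layer neurons
cost = cost + tl.cost.maxnorm_o_regularizer(1.0)(network.all_params[0])  # max-norm per column
cost = cost + tl.cost.maxnorm_i_regularizer(1.0)(network.all_params[0])  # max-norm per row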