API - Losses

To keep TensorLayer as simple as possible, we minimize the number of cost functions, and we therefore encourage you to use the official TensorFlow functions directly. For example, you can implement L1, L2 and sum regularization with tf.nn.l2_loss, tf.contrib.layers.l1_regularizer, tf.contrib.layers.l2_regularizer and tf.contrib.layers.sum_regularizer; see the TensorFlow API.
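For instance, the sketch below (the weight variable W and the 0.001 scales are hypothetical) shows how these TensorFlow regularizers can be turned into penalty terms that you add to a cost:

W = tf.Variable(tf.truncated_normal([784, 800], stddev=0.1), name='W')  # hypothetical weight matrix
l2_penalty = tf.contrib.layers.l2_regularizer(0.001)(W)    # L2 penalty on W
l1_penalty = tf.contrib.layers.l1_regularizer(0.001)(W)    # L1 penalty on W
sum_penalty = tf.contrib.layers.sum_regularizer(           # combined L1 + L2 penalty
    [tf.contrib.layers.l1_regularizer(0.001),
     tf.contrib.layers.l2_regularizer(0.001)])(W)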

Custom cost functions

TensorLayer provides a simple way to create your own cost function. Take a multi-layer perceptron (MLP) as an example:

import tensorflow as tf
import tensorlayer as tl

x = tf.placeholder(tf.float32, shape=[None, 784], name='x')  # input images, e.g. flattened 28x28 MNIST digits

network = tl.layers.InputLayer(x, name='input_layer')
network = tl.layers.DropoutLayer(network, keep=0.8, name='drop1')
network = tl.layers.DenseLayer(network, n_units=800, act=tf.nn.relu, name='relu1')
network = tl.layers.DropoutLayer(network, keep=0.5, name='drop2')
network = tl.layers.DenseLayer(network, n_units=800, act=tf.nn.relu, name='relu2')
network = tl.layers.DropoutLayer(network, keep=0.5, name='drop3')
network = tl.layers.DenseLayer(network, n_units=10, act=tl.activation.identity, name='output_layer')

The model parameters are then [W1, b1, W2, b2, W_out, b_out], and you can apply L2 regularization to the first two weight matrices as follows.

cost = tl.cost.cross_entropy(y, y_, name='cost')
cost = cost + tf.contrib.layers.l2_regularizer(0.001)(network.all_params[0]) \
            + tf.contrib.layers.l2_regularizer(0.001)(network.all_params[2])

In addition, TensorLayer provides a convenient way to get a list of variables by name, so you can also apply L2 regularization to selected parameters as follows.

# apply L2 regularization to all weights whose name contains 'W_conv2d'
l2 = 0
for w in tl.layers.get_variables_with_name('W_conv2d', train_only=True, printable=False):
    l2 += tf.contrib.layers.l2_regularizer(1e-4)(w)
cost = tl.cost.cross_entropy(y, y_, name='cost') + l2

Regularization of weights

After the variables are initialized, information about the network parameters can be obtained with network.print_params().

sess.run(tf.initialize_all_variables())
network.print_params()
param 0: (784, 800) (mean: -0.000000, median: 0.000004 std: 0.035524)
param 1: (800,) (mean: 0.000000, median: 0.000000 std: 0.000000)
param 2: (800, 800) (mean: 0.000029, median: 0.000031 std: 0.035378)
param 3: (800,) (mean: 0.000000, median: 0.000000 std: 0.000000)
param 4: (800, 10) (mean: 0.000673, median: 0.000763 std: 0.049373)
param 5: (10,) (mean: 0.000000, median: 0.000000 std: 0.000000)
num of params: 1276810

The output of the network is network.outputs, so the cross-entropy can be defined as follows. In addition, to regularize the weights, note that network.all_params contains all parameters of the network. In this case, network.all_params = [W1, b1, W2, b2, W_out, b_out], according to params 0, 1, ..., 5 shown by network.print_params(). Max-norm regularization on W1 and W2 can then be performed as follows:

y = network.outputs
# Alternatively, you can use tl.cost.cross_entropy(y, y_, name='cost') instead.
cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=y_))
cost = cross_entropy
cost = cost + tl.cost.maxnorm_regularizer(1.0)(network.all_params[0]) \
            + tl.cost.maxnorm_regularizer(1.0)(network.all_params[2])

In addition, all TensorFlow regularization functions, such as tf.contrib.layers.l2_regularizer, can also be used with TensorLayer.

Regularization of activation outputs

The instance method network.print_layers() prints the outputs of all layers in order. To regularize the activation outputs, you can use network.all_layers, which contains the outputs of all layers. If you want to apply an L1 penalty to the activation output of the first hidden layer, simply add tf.contrib.layers.l1_regularizer(lambda_l1)(network.all_layers[1]) to the cost function, as in the sketch after the output below.

network.print_layers()
layer 0: Tensor("dropout/mul_1:0", shape=(?, 784), dtype=float32)
layer 1: Tensor("Relu:0", shape=(?, 800), dtype=float32)
layer 2: Tensor("dropout_1/mul_1:0", shape=(?, 800), dtype=float32)
layer 3: Tensor("Relu_1:0", shape=(?, 800), dtype=float32)
layer 4: Tensor("dropout_2/mul_1:0", shape=(?, 800), dtype=float32)
layer 5: Tensor("add_2:0", shape=(?, 10), dtype=float32)
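For example, a minimal sketch of the L1 activation penalty described above (lambda_l1 is a hypothetical hyper-parameter; y, y_ and network are taken from the MLP example earlier):

lambda_l1 = 0.001  # hypothetical regularization strength
cost = tl.cost.cross_entropy(y, y_, name='cost')
# L1 penalty on the activation outputs of the first hidden layer
cost = cost + tf.contrib.layers.l1_regularizer(lambda_l1)(network.all_layers[1])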
The tensorlayer.cost module provides the following loss and regularization functions:

cross_entropy(output, target[, name]) -- Softmax cross-entropy operation; returns the TensorFlow expression of the cross-entropy between two distributions, implementing softmax internally.
sigmoid_cross_entropy(output, target[, name]) -- Sigmoid cross-entropy operation, see tf.nn.sigmoid_cross_entropy_with_logits.
binary_cross_entropy(output, target[, ...]) -- Binary cross-entropy operation.
mean_squared_error(output, target[, ...]) -- Returns the TensorFlow expression of the mean squared error (L2) between two batches of data.
normalized_mean_square_error(output, target) -- Returns the TensorFlow expression of the normalized mean squared error between two distributions.
absolute_difference_error(output, target[, ...]) -- Returns the TensorFlow expression of the absolute difference error (L1) between two batches of data.
dice_coe(output, target[, loss_type, axis, ...]) -- Soft Dice (Sørensen or Jaccard) coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation (i.e. binary labels).
dice_hard_coe(output, target[, threshold, ...]) -- Non-differentiable Sørensen-Dice coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation.
iou_coe(output, target[, threshold, axis, ...]) -- Non-differentiable Intersection over Union (IoU) for comparing the similarity of two batches of data, usually used for evaluating binary image segmentation.
cross_entropy_seq(logits, target_seqs[, ...]) -- Returns the expression of the cross-entropy of two sequences, implementing softmax internally.
cross_entropy_seq_with_mask(logits, ...[, ...]) -- Returns the expression of the cross-entropy of two sequences with a mask, implementing softmax internally.
cosine_similarity(v1, v2) -- Cosine similarity in [-1, 1].
li_regularizer(scale[, scope]) -- Li regularization removes the neurons of the previous layer.
lo_regularizer(scale) -- Lo regularization removes the neurons of the current layer.
maxnorm_regularizer([scale]) -- Max-norm regularization; returns a function that applies max-norm regularization to weights.
maxnorm_o_regularizer(scale) -- Max-norm output regularization removes the neurons of the current layer.
maxnorm_i_regularizer(scale) -- Max-norm input regularization removes the neurons of the previous layer.

Softmax cross entropy

tensorlayer.cost.cross_entropy(output, target, name=None)[source]

Softmax cross-entropy operation. Returns the TensorFlow expression of the cross-entropy between two distributions; it implements softmax internally. See tf.nn.sparse_softmax_cross_entropy_with_logits.

Parameters:
  • output (Tensor) -- A batch of distribution with shape: [batch_size, num of classes].
  • target (Tensor) -- A batch of index with shape: [batch_size, ].
  • name (string) -- Name of this loss.

Examples

>>> ce = tl.cost.cross_entropy(y_logits, y_labels, 'my_loss')


Sigmoid cross entropy

tensorlayer.cost.sigmoid_cross_entropy(output, target, name=None)[source]

Sigmoid cross-entropy operation, see tf.nn.sigmoid_cross_entropy_with_logits.

Parameters:
  • output (Tensor) -- A batch of distribution with shape: [batch_size, num of classes].
  • target (Tensor) -- The target distribution, with the same shape as output.
  • name (string) -- Name of this loss.
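Examples

A minimal sketch (the placeholders and the 10-unit shape are illustrative assumptions):

>>> output = tf.placeholder(tf.float32, [None, 10])   # logits
>>> target = tf.placeholder(tf.float32, [None, 10])   # labels, same shape as output
>>> loss = tl.cost.sigmoid_cross_entropy(output, target, name='sigmoid_loss')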

Binary cross entropy

tensorlayer.cost.binary_cross_entropy(output, target, epsilon=1e-08, name='bce_loss')[source]

Binary cross entropy operation.

Parameters:
  • output (Tensor) -- Tensor with type of float32 or float64.
  • target (Tensor) -- The target distribution, format the same with output.
  • epsilon (float) -- A small value added to avoid taking the log of zero.
  • name (str) -- An optional name to attach to this function.
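Examples

A minimal sketch (it assumes network outputs probabilities in (0, 1), e.g. via a sigmoid, and that y_ is a target tensor of the same shape):

>>> output = tf.nn.sigmoid(network.outputs)
>>> loss = tl.cost.binary_cross_entropy(output, y_, name='bce_loss')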


Mean squared error (L2)

tensorlayer.cost.mean_squared_error(output, target, is_mean=False, name='mean_squared_error')[source]

Returns the TensorFlow expression of the mean squared error (L2) between two batches of data.

Parameters:
  • output (Tensor) -- 2D, 3D or 4D tensor i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].
  • target (Tensor) -- The target distribution, format the same with output.
  • is_mean (boolean) --
    Whether to compute the mean or the sum over each example.
    • If True, use tf.reduce_mean to compute the loss between one target and the predicted data.
    • If False, use tf.reduce_sum (default).
  • name (str) -- An optional name to attach to this function.
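Examples

A minimal sketch (it assumes network.outputs and the target placeholder y_ have the same shape):

>>> mse = tl.cost.mean_squared_error(network.outputs, y_, is_mean=True, name='mse_loss')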


Normalized mean square error

tensorlayer.cost.normalized_mean_square_error(output, target, name='normalized_mean_squared_error_loss')[source]

Returns the TensorFlow expression of the normalized mean squared error between two distributions.

Parameters:
  • output (Tensor) -- 2D, 3D or 4D tensor i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].
  • target (Tensor) -- The target distribution, format the same with output.
  • name (str) -- An optional name to attach to this function.
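Examples

A minimal sketch with the same assumptions as in the mean_squared_error example above:

>>> nmse = tl.cost.normalized_mean_square_error(network.outputs, y_)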

Absolute difference error (L1)

tensorlayer.cost.absolute_difference_error(output, target, is_mean=False, name='absolute_difference_error_loss')[source]

Returns the TensorFlow expression of the absolute difference error (L1) between two batches of data.

Parameters:
  • output (Tensor) -- 2D, 3D or 4D tensor i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].
  • target (Tensor) -- The target distribution, format the same with output.
  • is_mean (boolean) --
    Whether to compute the mean or the sum over each example.
    • If True, use tf.reduce_mean to compute the loss between one target and the predicted data.
    • If False, use tf.reduce_sum (default).
  • name (str) -- An optional name to attach to this function.
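Examples

A minimal sketch (it assumes the prediction and target tensors have matching shapes):

>>> l1_loss = tl.cost.absolute_difference_error(network.outputs, y_, is_mean=True)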

Dice coefficient

tensorlayer.cost.dice_coe(output, target, loss_type='jaccard', axis=(1, 2, 3), smooth=1e-05)[source]

Soft Dice (Sørensen or Jaccard) coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. the labels are binary. The coefficient ranges from 0 to 1; 1 means a perfect match.

Parameters:
  • output (Tensor) -- A distribution with shape: [batch_size, ....], (any dimensions).
  • target (Tensor) -- The target distribution, format the same with output.
  • loss_type (str) -- jaccard or sorensen, default is jaccard.
  • axis (tuple of int) -- All dimensions are reduced, default (1, 2, 3).
  • smooth (float) --
    A small value added to the numerator and denominator.
    • If both output and target are empty, it makes sure the dice is 1.
    • If either output or target is empty (all pixels are background), dice = smooth / (small_value + smooth); if smooth is very small, the dice is close to 0 (even when the image values are below the threshold), so in this case a larger smooth gives a higher dice.

Examples

>>> outputs = tl.act.pixel_wise_softmax(network.outputs)
>>> dice_loss = 1 - tl.cost.dice_coe(outputs, y_)


Hard Dice coefficient

tensorlayer.cost.dice_hard_coe(output, target, threshold=0.5, axis=(1, 2, 3), smooth=1e-05)[source]

Non-differentiable Sørensen-Dice coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. the labels are binary. The coefficient ranges from 0 to 1; 1 means a perfect match.

Parameters:
  • output (tensor) -- A distribution with shape: [batch_size, ....], (any dimensions).
  • target (tensor) -- The target distribution, format the same with output.
  • threshold (float) -- The threshold value to be true.
  • axis (tuple of integer) -- All dimensions are reduced, default (1,2,3).
  • smooth (float) -- This small value will be added to the numerator and denominator, see dice_coe.
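Examples

A minimal sketch mirroring the dice_coe example above; since the result is non-differentiable it is used for evaluation rather than as a training loss:

>>> outputs = tl.act.pixel_wise_softmax(network.outputs)
>>> hard_dice = tl.cost.dice_hard_coe(outputs, y_, threshold=0.5)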


IOU coefficient

tensorlayer.cost.iou_coe(output, target, threshold=0.5, axis=(1, 2, 3), smooth=1e-05)[source]

Non-differentiable Intersection over Union (IoU) for comparing the similarity of two batches of data, usually used for evaluating binary image segmentation. The coefficient ranges from 0 to 1; 1 means a perfect match.

Parameters:
  • output (tensor) -- A batch of distribution with shape: [batch_size, ....], (any dimensions).
  • target (tensor) -- The target distribution, format the same with output.
  • threshold (float) -- The threshold value to be true.
  • axis (tuple of integer) -- All dimensions are reduced, default (1,2,3).
  • smooth (float) -- This small value will be added to the numerator and denominator, see dice_coe.

Notes

  • IoU cannot be used as a training loss; people usually use the dice coefficient for training, and IoU and hard dice for evaluation.
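Examples

A minimal sketch (as noted above, the result is used for evaluation only, not as a training loss):

>>> outputs = tl.act.pixel_wise_softmax(network.outputs)
>>> iou = tl.cost.iou_coe(outputs, y_, threshold=0.5)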

Cross entropy for sequence

tensorlayer.cost.cross_entropy_seq(logits, target_seqs, batch_size=None)[source]

Returns the expression of the cross-entropy of two sequences; it implements softmax internally. Normally used for fixed-length RNN outputs, see the PTB example.

Parameters:
  • logits (Tensor) -- 2D tensor with shape of [batch_size * n_steps, n_classes].
  • target_seqs (Tensor) -- The target sequence, 2D tensor [batch_size, n_steps], if the number of step is dynamic, please use tl.cost.cross_entropy_seq_with_mask instead.
  • batch_size (None or int.) --
    Whether to divide the cost by batch size.
    • If integer, the return cost will be divided by batch_size.
    • If None (default), the return cost will not be divided by anything.

Examples

See the PTB example (https://github.com/tensorlayer/tensorlayer/blob/master/example/tutorial_ptb_lstm_state_is_tuple.py) for more details.
>>> input_data = tf.placeholder(tf.int32, [batch_size, n_steps])
>>> targets = tf.placeholder(tf.int32, [batch_size, n_steps])
>>> # build the network
>>> print(net.outputs)
(batch_size * n_steps, n_classes)
>>> cost = tl.cost.cross_entropy_seq(net.outputs, targets)

Cross entropy with mask for sequence

tensorlayer.cost.cross_entropy_seq_with_mask(logits, target_seqs, input_mask, return_details=False, name=None)[source]

Returns the expression of the cross-entropy of two sequences; it implements softmax internally. Normally used for dynamic RNNs with synced sequence input and output.

Parameters:
  • logits (Tensor) -- 2D tensor with shape [batch_size * ?, n_classes], where ? is the dynamic number of steps in each example. It can be obtained from DynamicRNNLayer by setting return_seq_2d to True.
  • target_seqs (Tensor) -- Tensor of int (e.g. word IDs) with shape [batch_size, ?], where ? is the dynamic number of steps in each example.
  • input_mask (Tensor) -- The mask to compute loss, it has the same size with target_seqs, normally 0 or 1.
  • return_details (boolean) --
    Whether to return detailed losses.
    • If False (default), only returns the loss.
    • If True, returns the loss, losses, weights and targets (see source code).

Examples

>>> batch_size = 64
>>> vocab_size = 10000
>>> embedding_size = 256
>>> input_seqs = tf.placeholder(dtype=tf.int64, shape=[batch_size, None], name="input")
>>> target_seqs = tf.placeholder(dtype=tf.int64, shape=[batch_size, None], name="target")
>>> input_mask = tf.placeholder(dtype=tf.int64, shape=[batch_size, None], name="mask")
>>> net = tl.layers.EmbeddingInputlayer(
...         inputs = input_seqs,
...         vocabulary_size = vocab_size,
...         embedding_size = embedding_size,
...         name = 'seq_embedding')
>>> net = tl.layers.DynamicRNNLayer(net,
...         cell_fn = tf.contrib.rnn.BasicLSTMCell,
...         n_hidden = embedding_size,
...         dropout = (0.7 if is_train else None),
...         sequence_length = tl.layers.retrieve_seq_length_op2(input_seqs),
...         return_seq_2d = True,
...         name = 'dynamicrnn')
>>> print(net.outputs)
(?, 256)
>>> net = tl.layers.DenseLayer(net, n_units=vocab_size, name="output")
>>> print(net.outputs)
(?, 10000)
>>> loss = tl.cost.cross_entropy_seq_with_mask(net.outputs, target_seqs, input_mask)

Cosine similarity

tensorlayer.cost.cosine_similarity(v1, v2)[source]

Cosine similarity [-1, 1].

Parameters: v1, v2 (Tensor) -- Two tensors with the same shape [batch_size, n_feature].
Returns: A tensor of shape [batch_size].
Return type: Tensor
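Examples

A minimal sketch (the 256-dimensional placeholders are illustrative assumptions):

>>> v1 = tf.placeholder(tf.float32, [None, 256])
>>> v2 = tf.placeholder(tf.float32, [None, 256])
>>> sim = tl.cost.cosine_similarity(v1, v2)   # shape [batch_size]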


Regularization functions

For more, such as tf.nn.l2_loss, tf.contrib.layers.l1_regularizer, tf.contrib.layers.l2_regularizer and tf.contrib.layers.sum_regularizer, see the TensorFlow API.

Maxnorm

tensorlayer.cost.maxnorm_regularizer(scale=1.0)[source]

Max-norm regularization returns a function that can be used to apply max-norm regularization to weights.

For more about max-norm, see the Wikipedia article on max norm. The implementation follows TensorFlow contrib.

Parameters: scale (float) -- A scalar multiplier Tensor. 0.0 disables the regularizer.
Returns:
Return type: A function with signature mn(weights, name=None) that applies max-norm regularization.
Raises: ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
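Examples

A minimal sketch (it reuses y, y_ and network from the MLP example at the top of this page):

>>> max_norm = tl.cost.maxnorm_regularizer(1.0)
>>> cost = tl.cost.cross_entropy(y, y_, name='cost')
>>> cost = cost + max_norm(network.all_params[0]) + max_norm(network.all_params[2])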

Special

tensorlayer.cost.li_regularizer(scale, scope=None)[source]

Li regularization removes the neurons of the previous layer. The i represents inputs. Returns a function that can be used to apply group Li regularization to weights. The implementation follows TensorFlow contrib.

Parameters:
  • scale (float) -- A scalar multiplier Tensor. 0.0 disables the regularizer.
  • scope (str) -- An optional scope name for this function.
Returns:
Return type: A function with signature li(weights, name=None) that applies Li regularization.
Raises: ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
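Examples

A minimal sketch (the 0.001 scale and the reuse of the MLP's first weight matrix are illustrative assumptions):

>>> li = tl.cost.li_regularizer(0.001)
>>> cost = tl.cost.cross_entropy(y, y_, name='cost') + li(network.all_params[0])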

tensorlayer.cost.lo_regularizer(scale)[source]

Lo regularization removes the neurons of the current layer. The o represents outputs. Returns a function that can be used to apply group Lo regularization to weights. The implementation follows TensorFlow contrib.

Parameters: scale (float) -- A scalar multiplier Tensor. 0.0 disables the regularizer.
Returns:
Return type: A function with signature lo(weights, name=None) that applies Lo regularization.
Raises: ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.

tensorlayer.cost.maxnorm_o_regularizer(scale)[source]

Max-norm output regularization removes the neurons of the current layer. Returns a function that can be used to apply max-norm regularization to each column of the weight matrix. The implementation follows TensorFlow contrib.

Parameters: scale (float) -- A scalar multiplier Tensor. 0.0 disables the regularizer.
Returns:
Return type: A function with signature mn_o(weights, name=None) that applies max-norm output regularization.
Raises: ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.

tensorlayer.cost.maxnorm_i_regularizer(scale)[source]

Max-norm input regularization removes the neurons of the previous layer. Returns a function that can be used to apply max-norm regularization to each row of the weight matrix. The implementation follows TensorFlow contrib.

Parameters: scale (float) -- A scalar multiplier Tensor. 0.0 disables the regularizer.
Returns:
Return type: A function with signature mn_i(weights, name=None) that applies max-norm input regularization.
Raises: ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
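Examples

A minimal sketch of the max-norm output/input variants (the 0.001 scale and the choice of weight matrices from the MLP example are illustrative assumptions):

>>> cost = tl.cost.cross_entropy(y, y_, name='cost')
>>> cost = cost + tl.cost.maxnorm_o_regularizer(0.001)(network.all_params[0])
>>> cost = cost + tl.cost.maxnorm_i_regularizer(0.001)(network.all_params[2])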