How BN and Dropout differ between training and inference

Batch Normalization and Dropout are widely used building blocks in deep learning models, but both behave differently at training time and at test time.

Batch Normalization

During training, BN normalizes each batch with the mean and variance computed on that batch. Since each batch is fairly small, these statistics vary from batch to batch. At inference time a single sample is usually fed in, so there is no batch to compute statistics over; and even when a whole batch is predicted at once, its mean and variance can deviate from the population statistics. The usual solution is to keep a moving average of the batch means and variances seen during training and use these population estimates at inference. Taking a TensorFlow implementation as an example:

def bn_layer(self, inputs, training, name="bn", epsilon=1e-5, decay=0.9):
    """
    BN layer.
    inputs: [batch, height, width, channel]
    training: bool tensor, True during training, False at inference
    """
    with tf.variable_scope(name):
        channel = inputs.get_shape().as_list()[-1]

        # learnable scale (gamma) and offset (beta)
        scale = tf.get_variable('scale', [channel], initializer=tf.constant_initializer(1.0))
        offset = tf.get_variable('offset', [channel], initializer=tf.constant_initializer(0.0))

        # population mean and variance, used at test time
        pop_mean = tf.get_variable('pop_mean', [channel], initializer=tf.zeros_initializer, trainable=False)
        pop_var = tf.get_variable('pop_var', [channel], initializer=tf.ones_initializer, trainable=False)

        # per-batch statistics over the batch, height and width axes
        batch_mean, batch_var = tf.nn.moments(inputs, [0, 1, 2])

        # moving-average updates of the population statistics
        train_mean_op = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_var_op = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))

        def batch_statistics():
            # training branch: normalize with batch statistics and update the moving averages
            with tf.control_dependencies([train_mean_op, train_var_op]):
                return tf.nn.batch_normalization(inputs, batch_mean, batch_var, offset, scale, epsilon)

        def population_statistics():
            # inference branch: normalize with the accumulated population statistics
            return tf.nn.batch_normalization(inputs, pop_mean, pop_var, offset, scale, epsilon)

        return tf.cond(training, batch_statistics, population_statistics)

The training argument can be passed in through a tf.placeholder, which makes it easy to switch its value between training and inference:

self.training = tf.placeholder(tf.bool, name="training")
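A minimal end-to-end sketch of how the placeholder is fed (assuming TF 1.x and the bn_layer above called as a plain function with the self argument dropped; the x and batch names are only for illustration):

import numpy as np
import tensorflow as tf  # TF 1.x API (tf.compat.v1 in TF 2.x)

x = tf.placeholder(tf.float32, [None, 4, 4, 3], name="x")
training = tf.placeholder(tf.bool, name="training")
out = bn_layer(x, training)  # the layer defined above, called as a plain function

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.randn(8, 4, 4, 3).astype(np.float32)
    # training step: batch statistics are used and pop_mean / pop_var get updated
    sess.run(out, feed_dict={x: batch, training: True})
    # inference: the accumulated moving averages are used instead
    sess.run(out, feed_dict={x: batch, training: False})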

Dropout

During training, Dropout randomly drops some neurons, which reduces the expected magnitude of the outputs. At inference time dropout is usually turned off so that predictions are deterministic (with dropout left on, the same input may produce different outputs, although those outputs still follow some distribution; in some cases it is deliberately left on, e.g. in text generation, where keeping dropout active increases the diversity of the output).

To keep the training and inference outputs consistent, there are two common approaches; suppose dropout rate = 0.2. One is to do nothing special during training and multiply the outputs by (1 - dropout rate) at inference. The other (inverted dropout) is to divide the surviving neurons by (1 - dropout rate) during training and do nothing at inference. Taking TensorFlow as an example:

self.keep_prob = tf.placeholder(tf.float32, name="keep_prob")
x = tf.nn.dropout(x, self.keep_prob)
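With the placeholder in place, switching dropout on and off is just a matter of what value is fed. A small self-contained sketch (the tensor names here are illustrative, not from the original model):

import numpy as np
import tensorflow as tf  # TF 1.x API

x = tf.placeholder(tf.float32, [None, 10], name="x")
keep_prob = tf.placeholder(tf.float32, name="keep_prob")
h = tf.nn.dropout(x, keep_prob)

with tf.Session() as sess:
    batch = np.ones((4, 10), dtype=np.float32)
    # training: dropout rate = 0.2, so feed keep_prob = 0.8;
    # kept units are scaled up by 1 / 0.8
    sess.run(h, feed_dict={x: batch, keep_prob: 0.8})
    # inference: keep_prob = 1.0 disables dropout, output equals the input
    sess.run(h, feed_dict={x: batch, keep_prob: 1.0})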

tf.nn.dropout takes the second approach, dividing by (1 - dropout rate) during training. An excerpt from its TF 1.x source:

binary_tensor = math_ops.floor(random_tensor)
ret = math_ops.div(x, keep_prob) * binary_tensor
if not context.executing_eagerly():
    ret.set_shape(x.get_shape())
return ret

binary_tensor is a mask tensor made up of 0s and 1s, and keep_prob = 1 - dropout rate.
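As a quick sanity check on this scaling, here is a NumPy-only sketch that mimics the mask construction from the TF source and confirms that dividing by keep_prob keeps the expected activation unchanged:

import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8            # dropout rate = 0.2
x = np.ones(1_000_000)     # pretend activations, all equal to 1

# same trick as the TF source: floor(keep_prob + U[0, 1)) is 1 with probability keep_prob
random_tensor = keep_prob + rng.uniform(size=x.shape)
binary_tensor = np.floor(random_tensor)      # mask of 0s and 1s
ret = (x / keep_prob) * binary_tensor        # inverted dropout

print(binary_tensor.mean())  # ~0.8, fraction of units kept
print(ret.mean())            # ~1.0, expectation matches the un-dropped input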

 
