Deep Learning Notes (3): Neural Networks, Learning Rate, Activation Functions, Loss Functions

Complexity of a Neural Network (NN)

Space complexity:

When counting the layers of a neural network, only layers that perform computation are counted. The input layer merely passes data into the network and does no computation, so it is excluded from the layer count.

All layers between the input layer and the output layer are called hidden layers.

Number of layers = number of hidden layers + 1 output layer

Total number of parameters = total number of weights w + total number of biases b

Time complexity:

Number of multiply-accumulate (MAC) operations
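
As a quick illustration of both counts, the sketch below builds a small fully connected network (the 4-3-2 layer sizes are made up for this example, not from the notes) and checks its parameter count; by the rule above it has 2 layers (1 hidden layer + 1 output layer).

# Sketch: counting parameters (space complexity) and MACs (time complexity)
# of a 4-input, 3-hidden-unit, 2-output dense network (sizes are illustrative)
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="relu", input_shape=(4,)),  # w: 4*3, b: 3
    tf.keras.layers.Dense(2),                                       # w: 3*2, b: 2
])

# Total parameters = total w + total b = (4*3 + 3*2) + (3 + 2) = 23
print(model.count_params())  # 23

# Multiply-accumulate operations per forward pass = 4*3 + 3*2 = 18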

Learning rate and parameter updates:

\[{w_{t + 1}} = {w_t} - lr*\frac{{\partial loss}}{{\partial {w_t}}} \]

Choosing and setting an exponentially decaying learning rate

Start with a relatively large learning rate to quickly reach a reasonably good solution, then gradually reduce the learning rate so that the model stays stable in the later stages of training.

\[\text{decayed learning rate} = \text{initial learning rate} \times \text{decay rate}^{\frac{\text{current epoch}}{\text{decay steps}}} \]

# Learning rate decay: w is updated with an exponentially decaying learning rate
import tensorflow as tf

w = tf.Variable(tf.constant(5, dtype=tf.float32))
EPOCHS = 40
LR_BASE = 0.2    # initial learning rate
LR_DECAY = 0.99  # decay rate
LR_STEP = 1      # decay once every LR_STEP epochs


for epoch in range(EPOCHS):
    lr = LR_BASE * LR_DECAY ** (epoch / LR_STEP)  # exponentially decayed learning rate
    with tf.GradientTape() as tape:
        loss = tf.square(w + 1)                   # loss = (w + 1)^2, minimized at w = -1
    grads = tape.gradient(loss, w)                # d(loss)/dw = 2 * (w + 1)

    w.assign_sub(lr * grads)                      # w = w - lr * grads

    print("After %2s epoch,\tw is %f,\tloss is %f\tlr is %f" % (epoch, w.numpy(), loss, lr))
    
After  0 epoch,	w is 2.600000,	loss is 36.000000	lr is 0.200000
After  1 epoch,	w is 1.174400,	loss is 12.959999	lr is 0.198000
After  2 epoch,	w is 0.321948,	loss is 4.728015	lr is 0.196020
After  3 epoch,	w is -0.191126,	loss is 1.747547	lr is 0.194060
After  4 epoch,	w is -0.501926,	loss is 0.654277	lr is 0.192119
After  5 epoch,	w is -0.691392,	loss is 0.248077	lr is 0.190198
After  6 epoch,	w is -0.807611,	loss is 0.095239	lr is 0.188296
After  7 epoch,	w is -0.879339,	loss is 0.037014	lr is 0.186413
After  8 epoch,	w is -0.923874,	loss is 0.014559	lr is 0.184549
After  9 epoch,	w is -0.951691,	loss is 0.005795	lr is 0.182703
After 10 epoch,	w is -0.969167,	loss is 0.002334	lr is 0.180876
After 11 epoch,	w is -0.980209,	loss is 0.000951	lr is 0.179068
After 12 epoch,	w is -0.987226,	loss is 0.000392	lr is 0.177277
After 13 epoch,	w is -0.991710,	loss is 0.000163	lr is 0.175504
After 14 epoch,	w is -0.994591,	loss is 0.000069	lr is 0.173749
After 15 epoch,	w is -0.996452,	loss is 0.000029	lr is 0.172012
After 16 epoch,	w is -0.997660,	loss is 0.000013	lr is 0.170292
After 17 epoch,	w is -0.998449,	loss is 0.000005	lr is 0.168589
After 18 epoch,	w is -0.998967,	loss is 0.000002	lr is 0.166903
After 19 epoch,	w is -0.999308,	loss is 0.000001	lr is 0.165234
After 20 epoch,	w is -0.999535,	loss is 0.000000	lr is 0.163581
After 21 epoch,	w is -0.999685,	loss is 0.000000	lr is 0.161946
After 22 epoch,	w is -0.999786,	loss is 0.000000	lr is 0.160326
After 23 epoch,	w is -0.999854,	loss is 0.000000	lr is 0.158723
After 24 epoch,	w is -0.999900,	loss is 0.000000	lr is 0.157136
After 25 epoch,	w is -0.999931,	loss is 0.000000	lr is 0.155564
After 26 epoch,	w is -0.999952,	loss is 0.000000	lr is 0.154009
After 27 epoch,	w is -0.999967,	loss is 0.000000	lr is 0.152469
After 28 epoch,	w is -0.999977,	loss is 0.000000	lr is 0.150944
After 29 epoch,	w is -0.999984,	loss is 0.000000	lr is 0.149434
After 30 epoch,	w is -0.999989,	loss is 0.000000	lr is 0.147940
After 31 epoch,	w is -0.999992,	loss is 0.000000	lr is 0.146461
After 32 epoch,	w is -0.999994,	loss is 0.000000	lr is 0.144996
After 33 epoch,	w is -0.999996,	loss is 0.000000	lr is 0.143546
After 34 epoch,	w is -0.999997,	loss is 0.000000	lr is 0.142111
After 35 epoch,	w is -0.999998,	loss is 0.000000	lr is 0.140690
After 36 epoch,	w is -0.999999,	loss is 0.000000	lr is 0.139283
After 37 epoch,	w is -0.999999,	loss is 0.000000	lr is 0.137890
After 38 epoch,	w is -0.999999,	loss is 0.000000	lr is 0.136511
After 39 epoch,	w is -0.999999,	loss is 0.000000	lr is 0.135146
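
TensorFlow also provides a built-in schedule that implements the same exponential-decay formula; below is a minimal sketch (the SGD optimizer and the staircase flag are my own choices for illustration, not part of the notes).

# Sketch: the same decay rule expressed with tf.keras' ExponentialDecay schedule
import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.2,  # LR_BASE
    decay_steps=1,              # LR_STEP
    decay_rate=0.99,            # LR_DECAY
    staircase=False)            # smooth decay, matching LR_DECAY**(epoch / LR_STEP)

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)
print(lr_schedule(10).numpy())  # learning rate at step 10, about 0.1809 (matches the log above)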

Activation Functions

Sigmoid function

tf.nn.sigmoid(x)

\[f(x) = \frac{1}{{1 + {e^{ - x}}}} \]

Figure: sigmoid function curve

It effectively normalizes the input, squashing it into the range (0, 1).

  • When a multi-layer network updates its parameters, the chain rule is applied layer by layer from the output layer back toward the input layer. The derivative of sigmoid lies between 0 and 0.25, so chained differentiation multiplies many values in (0, 0.25) together; the product tends toward 0, the gradient vanishes, and the parameters can no longer be updated (see the sketch after this list).
  • The sigmoid function also involves an exponential operation, which makes it relatively expensive to compute.
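
A minimal numeric sketch of the vanishing-gradient effect (the depth of 10 layers and the input x = 0 are chosen only for illustration): even at the point where the sigmoid derivative is largest, chaining 10 such derivatives already gives a value on the order of 1e-6.

# Sketch: the sigmoid derivative is at most 0.25, so chained derivatives shrink quickly
import tensorflow as tf

x = tf.constant(0.0)             # the point where sigmoid'(x) is largest
s = tf.nn.sigmoid(x)
dsig = s * (1 - s)               # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
print(dsig.numpy())              # 0.25

depth = 10                       # illustrative number of layers
print((dsig ** depth).numpy())   # 0.25**10 ≈ 9.5e-07, the gradient all but vanishes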

Tanh function

tf.math.tanh(x)

\[f(x) = \frac{{1 - {e^{ - 2x}}}}{{1 + {e^{ - 2x}}}} \]

Figure: tanh function curve

Like the sigmoid function above, tanh suffers from the same drawbacks: vanishing gradients and costly exponential operations.

ReLU function

tf.nn.relu(x)

\[f(x) = \max(x, 0) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases} \]

Figure: ReLU function curve
  • In the positive region, ReLU avoids the vanishing-gradient problem; evaluating it only requires checking whether the input is greater than 0, so it is fast to compute.
  • Training converges faster than with sigmoid or tanh.
  • Its output is not zero-centered, which slows convergence.
  • Dead ReLU problem: when the input feature fed into the activation is negative, the activation outputs 0 and the gradient from backpropagation is also 0, so the parameters cannot be updated and some neurons may never be activated again. This can be mitigated by improving the random initialization so that fewer negative features are fed into ReLU, and by using a smaller learning rate to avoid large swings in the parameter distribution that create too many negative features during training (see the sketch after this list).
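
A minimal sketch of the Dead ReLU behaviour (the input value -2.0 is made up for illustration): for a negative input the gradient of ReLU is exactly 0, while Leaky ReLU, introduced next, keeps a small non-zero slope.

# Sketch: gradients of ReLU vs. Leaky ReLU for a negative input feature
import tensorflow as tf

x = tf.Variable(-2.0)                     # a negative input feature (illustrative)
with tf.GradientTape(persistent=True) as tape:
    y_relu = tf.nn.relu(x)
    y_leaky = tf.nn.leaky_relu(x, alpha=0.2)

print(tape.gradient(y_relu, x).numpy())   # 0.0 -> the neuron stops updating ("dead")
print(tape.gradient(y_leaky, x).numpy())  # 0.2 -> a small gradient still flows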

Leaky ReLU function

tf.nn.leaky_relu(x)

\[f(x) = \max(\alpha x, x) \]
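
A minimal usage sketch (the slope alpha=0.2 is TensorFlow's default for tf.nn.leaky_relu and is used here only for illustration): for negative inputs Leaky ReLU returns alpha * x instead of 0, which is what keeps the gradient alive in the sketch above.

# Sketch: Leaky ReLU keeps a small slope alpha on the negative side
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 1.0])
print(tf.nn.leaky_relu(x, alpha=0.2).numpy())  # [-0.4 -0.1  0.   1. ]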