Neural Networks and Deep Learning Notes (2): Forward and Backward Propagation
Forward Propagation
Forward propagation computes the output of the neural network.
The figure above illustrates one such forward pass, which evaluates the graph from left to right to produce the final output value.
Take the function $J(a,b,c) = 3(a + b \cdot c)$ and break it into intermediate variables $u$ and $v$:
$$u = bc,\qquad v = a + u,\qquad J = 3v$$
Evaluating these from left to right yields the final value of $J$.
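The forward pass above can be sketched in a few lines of plain Python; the input values below are illustrative, not from the source:

```python
# Forward pass of J(a, b, c) = 3(a + b*c) through the intermediates u and v.
# The input values are illustrative.
a, b, c = 5.0, 3.0, 2.0
u = b * c      # u = 6.0
v = a + u      # v = 11.0
J = 3 * v      # J = 33.0
print(J)       # 33.0
```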
Backward Propagation
Backward propagation computes the gradients (derivatives) of the neural network.
The figure above illustrates one such backward pass.
For the output $J$ on the right of the figure we have
$$\frac{dJ}{dv} = 3$$
Continuing from right to left, for $v = a + u$ we have
$$\frac{dJ}{da} = \frac{dJ}{dv}\cdot\frac{dv}{da} = 3$$
as well as
$$\frac{dJ}{du} = \frac{dJ}{dv}\cdot\frac{dv}{du} = 3$$
Finally, reaching $u = bc$,
$$\frac{dJ}{db} = \frac{dJ}{du}\cdot\frac{du}{db} = 3c,\qquad \frac{dJ}{dc} = \frac{dJ}{du}\cdot\frac{du}{dc} = 3b$$
which gives us the values of $\frac{dJ}{da}$, $\frac{dJ}{db}$ and $\frac{dJ}{dc}$.
This is exactly the chain rule from calculus: computing these derivatives gives us the gradient, which gradient descent then uses to update parameters such as $w$ and $b$.
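The right-to-left chain-rule computation above can be written out step by step; again the input values are illustrative:

```python
# Backward pass for J = 3v, v = a + u, u = bc, applying the chain rule
# from right to left. Input values are illustrative.
a, b, c = 5.0, 3.0, 2.0
u = b * c
v = a + u

dJ_dv = 3.0          # J = 3v   =>  dJ/dv = 3
dJ_da = dJ_dv * 1.0  # v = a+u  =>  dv/da = 1
dJ_du = dJ_dv * 1.0  # v = a+u  =>  dv/du = 1
dJ_db = dJ_du * c    # u = bc   =>  du/db = c
dJ_dc = dJ_du * b    # u = bc   =>  du/dc = b
print(dJ_da, dJ_db, dJ_dc)  # 3.0 6.0 9.0
```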
Vectorized Computation
When implementing a neural network, we should compute with vectors wherever possible, because numpy's vectorized operations are generally much faster than an explicit for loop.
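A rough comparison makes the point; absolute timings vary by machine, so only the ratio between the two is meaningful:

```python
import time
import numpy as np

# Compare a vectorized dot product against an explicit Python loop.
n = 1_000_000
w = np.random.randn(n)
x = np.random.randn(n)

t0 = time.perf_counter()
z_vec = np.dot(w, x)               # vectorized
t_vec = time.perf_counter() - t0

t0 = time.perf_counter()
z_loop = 0.0
for i in range(n):                 # element-by-element loop
    z_loop += w[i] * x[i]
t_loop = time.perf_counter() - t0

print(f"vectorized: {t_vec:.4f}s  loop: {t_loop:.4f}s")
```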
As the figure above shows, we have input variables $x_1, x_2, \cdots, x_{n_x}$. Computing with them one at a time is tedious, so we should vectorize aggressively to speed up the computation.
For
$$z = w^T x + b$$
we know that $w$ and $x$ are both $n_x$-dimensional column vectors, so in numpy we can write
z = np.dot(w.T, x) + b
This line computes the product of the transpose of $w$ with $x$, then adds $b$.
Similarly, for
$$A = [a^{(1)}, a^{(2)}, \cdots, a^{(m)}]\\
Y = [y^{(1)}, y^{(2)}, \cdots, y^{(m)}]\\
dz = [dz^{(1)}, dz^{(2)}, \cdots, dz^{(m)}] = A - Y = [a^{(1)} - y^{(1)}, \cdots]$$
the gradients $db$ and $dw$ can be computed in the same vectorized way:
$$db = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)} = \frac{1}{m}\,\mathrm{np.sum}(dz)$$
$$dw = \frac{1}{m} X\, dz^T = \frac{1}{m}
\begin{bmatrix}
\vdots & & \vdots \\
x^{(1)} & \cdots & x^{(m)} \\
\vdots & & \vdots
\end{bmatrix}
\begin{bmatrix}
dz^{(1)} \\ \vdots \\ dz^{(m)}
\end{bmatrix}
= \frac{1}{m}\left( x^{(1)}dz^{(1)} + \cdots + x^{(m)}dz^{(m)} \right)$$
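One full vectorized gradient step can be sketched as follows; the data sizes ($m = 8$ examples, $n_x = 3$ features) and the random $X$, $Y$ are stand-ins for illustration:

```python
import numpy as np

# One vectorized gradient computation for logistic regression.
# X, Y, w, b are random stand-ins with illustrative sizes.
np.random.seed(0)
m, n_x = 8, 3
X = np.random.randn(n_x, m)                     # one example per column
Y = (np.random.rand(1, m) > 0.5).astype(float)  # labels in {0, 1}
w = np.zeros((n_x, 1))
b = 0.0

Z = np.dot(w.T, X) + b        # shape (1, m)
A = 1 / (1 + np.exp(-Z))      # sigmoid activations, shape (1, m)
dZ = A - Y                    # shape (1, m)
dw = np.dot(X, dZ.T) / m      # shape (n_x, 1)
db = np.sum(dZ) / m           # scalar
print(dw.shape, db)
```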
Tips:
- Column-wise sums: A.sum(axis=0)  # sums down each column
- Avoid rank-1 arrays; prefer an explicit shape like (5, 1) over (5,):
a = np.random.randn(5)     # avoid: a.shape is (5,)
a = np.random.randn(5, 1)  # better: an explicit (5, 1) column vector
- Make good use of assert to check shapes:
assert a.shape == (5, 1)
- Use reshape to fix a shape explicitly:
a = a.reshape((5, 1))
Where the cost function comes from
We know the cost function takes the following form:
$$J(w,b) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log \hat y^{(i)} + (1-y^{(i)})\log(1-\hat y^{(i)})\right]$$
So why is it defined this way?
Because we want $\hat y$ to behave as a probability:
when $y = 1$: $P(y \mid x) = \hat y$
when $y = 0$: $P(y \mid x) = 1 - \hat y$
These two cases can be combined into a single expression:
$$P(y \mid x) = \hat y^{\,y}\,(1 - \hat y)^{(1-y)}$$
Taking the logarithm of both sides gives:
$$\log P(y \mid x) = y \log \hat y + (1-y)\log(1-\hat y)$$
Maximizing this log-likelihood over all $m$ training examples, and negating it to turn maximization into minimization, yields exactly the cost function above.
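The cost formula translates directly into numpy; the labels and predictions below are made up for illustration:

```python
import numpy as np

# Cross-entropy cost J(w, b) for a handful of made-up predictions and labels.
Y = np.array([[1., 0., 1., 1.]])      # true labels (illustrative)
A = np.array([[0.9, 0.2, 0.8, 0.6]])  # predicted y-hat (illustrative)
m = Y.shape[1]

cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
print(round(cost, 4))  # 0.2656
```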
Vector shapes in each layer of a neural network
The figure above shows a 2-layer network as an example. As the figure indicates, the layer count conventionally excludes the input layer: the input layer is counted as layer 0, and the layers are numbered from there. Between the input and output layers are the hidden layers.
Next, let's track how the vector shapes change in each layer's computation.
We again need the equations:
$$z = wx + b\\
a = \sigma(z)$$
Conventions: a superscript in square brackets, $[l]$, denotes the layer; a superscript in parentheses, $(i)$, denotes the individual example (vector); a subscript in parentheses denotes the vector's shape.
We start with the computation from the input layer to the hidden layer:
There are 3 input variables $x_{(i)}$, so $x$ has shape $(3,1)$.
The hidden layer has 4 units, so $w^{[1]}$ has shape $(4,3)$.
Multiplying $w^{[1]}$ by $x$ produces shape $(4,1)$, so $b^{[1]}$ has shape $(4,1)$ and $z^{[1]}$ has shape $(4,1)$.
Putting this together:
$$z^{[1]}_{(4,1)} = w^{[1]}_{(4,3)}\,x_{(3,1)} + b^{[1]}_{(4,1)}\\
a^{[1]}_{(4,1)} = \sigma(z^{[1]}_{(4,1)})$$
Continuing with the computation from the hidden layer to the output layer:
From the previous step, the input to the output layer has shape $(4,1)$, and the output layer has a single neuron, so:
$$z^{[2]}_{(1,1)} = w^{[2]}_{(1,4)}\,a^{[1]}_{(4,1)} + b^{[2]}_{(1,1)}\\
a^{[2]}_{(1,1)} = \sigma(z^{[2]}_{(1,1)})$$
Generalizing, for any layer $l$ we get:
$$z^{[l]} = w^{[l]} a^{[l-1]} + b^{[l]}\\
a^{[l]} = \sigma(z^{[l]})$$
These are the per-layer relations of forward propagation.
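A forward pass through the 2-layer network can verify the shape bookkeeping; the parameter values are random stand-ins:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Forward pass through the 2-layer network above, checking the shapes
# derived in the text. Parameter values are random stand-ins.
x = np.random.randn(3, 1)      # 3 input features
W1 = np.random.randn(4, 3)     # hidden layer: 4 units
b1 = np.zeros((4, 1))
W2 = np.random.randn(1, 4)     # output layer: 1 unit
b2 = np.zeros((1, 1))

z1 = np.dot(W1, x) + b1
a1 = sigmoid(z1)
z2 = np.dot(W2, a1) + b2
a2 = sigmoid(z2)
print(z1.shape, a1.shape, z2.shape, a2.shape)  # (4, 1) (4, 1) (1, 1) (1, 1)
```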
Let $n^{[l]}$ denote the number of units in layer $l$. Then:
$$w^{[l]} \text{ and } dw \text{ both have shape } (n^{[l]}, n^{[l-1]})\\
b^{[l]} \text{ and } db \text{ both have shape } (n^{[l]}, 1)$$