Loss Function
The cross-entropy formula is as follows, where P(x) is the true distribution, Q(x) is the predicted distribution, and x is a random variable:
![][cross_entropy_equation]
[cross_entropy_equation]: http://latex.codecogs.com/svg.latex?CE\left(P,Q\right)=-\sum_{x}P\left(x\right)\log{Q\left(x\right)}
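As a quick numerical illustration (the two three-outcome distributions below are made up), the formula is a single weighted sum of log-probabilities:

```python
import numpy as np

# Cross entropy between two made-up discrete distributions over 3 outcomes.
P = np.array([0.5, 0.3, 0.2])   # true distribution P(x)
Q = np.array([0.4, 0.4, 0.2])   # predicted distribution Q(x)

ce = -np.sum(P * np.log(Q))     # CE(P, Q) = -sum_x P(x) log Q(x)
print(ce)                       # ~1.0549
```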
Carrying this over to the loss function: for a single example in the training set, y is the one-hot vector of the true label, y_esti is the softmax vector of the prediction, and k is the label (the index at which the element of y equals 1).
Suppose:
y = [0,...,1,...,0]
y_esti = [0.02,...,0.87,...,0.001]
Then:
![][cross_entropy_loss]
[cross_entropy_loss]: http://latex.codecogs.com/svg.latex?CE\left(y,y\_esti\right)=-y\cdot\log\left(y\_esti\right)=-\sum_{i=1}^{C}y_i\log\left({y\_esti}_i\right)=-\left(0\times\log{0.02}+...+1\times\log{0.87}+...+0\times\log{0.001}\right)=-\log{0.87}
In other words:
![][cross_entropy_loss_short]
[cross_entropy_loss_short]: http://latex.codecogs.com/svg.latex?CE\left(y,y\_esti\right)=-y_k\log\left({y\_esti}_k\right)
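As a sanity check in code, here is a minimal NumPy sketch of the single-example case (the three-class vectors are made up for illustration, with probabilities chosen to sum to 1). Since y is one-hot, only the k-th term of the sum survives:

```python
import numpy as np

y = np.array([0.0, 1.0, 0.0])           # one-hot true label, k = 1
y_esti = np.array([0.02, 0.87, 0.11])   # predicted softmax vector

k = np.argmax(y)                        # index where y equals 1

ce_full = -np.sum(y * np.log(y_esti))   # full sum over the C classes
ce_short = -np.log(y_esti[k])           # shortcut: only the k-th term survives

print(ce_full, ce_short)                # both ~0.1393 = -log(0.87)
```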
For the whole training set of M examples (the superscript (m) indexes the examples), the total loss is:
![][cross_entropy_loss_total]
[cross_entropy_loss_total]: http://latex.codecogs.com/svg.latex?CE_{total}=\sum_{m=1}^{M}CE\left(y^{(m)},y\_esti^{(m)}\right)=-\sum_{m=1}^{M}\log\left({y\_esti}_{k}^{(m)}\right)
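A batched version of the same computation, assuming the labels are stacked into an (M, C) one-hot matrix and the predictions into an (M, C) matrix of softmax rows (values again made up):

```python
import numpy as np

y = np.array([[0.0, 1.0, 0.0],          # (M, C) one-hot labels, M = 2
              [1.0, 0.0, 0.0]])
y_esti = np.array([[0.02, 0.87, 0.11],  # (M, C) softmax outputs
                   [0.75, 0.20, 0.05]])

k = np.argmax(y, axis=1)                # label index of each example
ce_total = -np.sum(np.log(y_esti[np.arange(len(k)), k]))

print(ce_total)                         # same as -np.sum(y * np.log(y_esti))
```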
Gradient Derivation
First, compute a simple derivative.
Suppose:
![][assumption1]
[assumption1]: http://latex.codecogs.com/svg.latex?y\_esti=softmax\left(\theta\right)
with the loss function:
![][loss1]
[loss1]: http://latex.codecogs.com/svg.latex?CE\left(y,y\_esti\right)=-y\cdot\log\left(y\_esti\right)=-\sum_{i=1}^{C}y_i\log\left({y\_esti}_i\right)
Differentiating with respect to θ gives the standard softmax-plus-cross-entropy result (this is the same quantity that appears as ∂CE/∂z2 in the network below):
![][simple_grad]
[simple_grad]: http://latex.codecogs.com/svg.latex?\frac{\partial}{\partial{\theta}}CE\left(y,y\_esti\right)=y\_esti-y
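The sketch below (a hypothetical four-class setup) checks this analytic gradient against a central-difference numerical gradient:

```python
import numpy as np

def softmax(theta):
    e = np.exp(theta - theta.max())   # shift for numerical stability
    return e / e.sum()

def ce(theta, y):
    return -np.sum(y * np.log(softmax(theta)))

rng = np.random.default_rng(0)
theta = rng.normal(size=4)
y = np.eye(4)[2]                      # one-hot label, k = 2

analytic = softmax(theta) - y         # claimed gradient: y_esti - y

eps = 1e-5
numeric = np.array([
    (ce(theta + eps * np.eye(4)[i], y) - ce(theta - eps * np.eye(4)[i], y)) / (2 * eps)
    for i in range(4)
])

print(np.allclose(analytic, numeric, atol=1e-7))   # True
```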
Next, derive the gradients of a simple three-layer network. The dimensions of the parameters are as follows:
![][dim_1]
[dim_1]: http://latex.codecogs.com/svg.latex?Dim(x)=M\times{D_x}
![][dim_2]
[dim_2]: http://latex.codecogs.com/svg.latex?Dim(W1)=D_x\times{H}
![][dim_3]
[dim_3]: http://latex.codecogs.com/svg.latex?Dim(b1)=1\times{H}
![][dim_4]
[dim_4]: http://latex.codecogs.com/svg.latex?Dim(W2)=H\times{D_y}
![][dim_5]
[dim_5]: http://latex.codecogs.com/svg.latex?Dim(b2)=1\times{D_y}
Forward pass:
![][forward_1]
[forward_1]: http://latex.codecogs.com/svg.latex?z1=xW1+b1
![][forward_2]
[forward_2]: http://latex.codecogs.com/svg.latex?h1=sigmoid(z1)
![][forward_3]
[forward_3]: http://latex.codecogs.com/svg.latex?z2=h1W2+b2
![][forward_4]
[forward_4]: http://latex.codecogs.com/svg.latex?y\_esti=softmax(z2)
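A NumPy sketch of this forward pass with the shapes listed above (the concrete values of M, D_x, H, D_y and the small random initialization are arbitrary choices for illustration):

```python
import numpy as np

M, D_x, H, D_y = 8, 5, 10, 3
rng = np.random.default_rng(0)

x  = rng.normal(size=(M, D_x))          # input,   (M, D_x)
W1 = rng.normal(size=(D_x, H)) * 0.1    # weights, (D_x, H)
b1 = np.zeros((1, H))                   # bias,    (1, H), broadcast over rows
W2 = rng.normal(size=(H, D_y)) * 0.1    # weights, (H, D_y)
b2 = np.zeros((1, D_y))                 # bias,    (1, D_y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # row-wise, numerically stable
    return e / e.sum(axis=1, keepdims=True)

z1 = x @ W1 + b1          # (M, H)
h1 = sigmoid(z1)          # (M, H)
z2 = h1 @ W2 + b2         # (M, D_y)
y_esti = softmax(z2)      # (M, D_y), each row sums to 1
```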
Backward pass:
![][back_1]
[back_1]: http://latex.codecogs.com/svg.latex?\frac{\partial}{\partial{W2}}CE(y,y\_esti)=\frac{\partial{CE}}{\partial{z2}}\times\frac{\partial{z2}}{\partial{W2}}=h1^T(y\_esti-y)
where
![][back_1_1]
[back_1_1]: http://latex.codecogs.com/svg.latex?\frac{\partial{CE}}{\partial{z2}}=y\_esti-y
This is exactly the simple derivative computed above. In the final expression, h1 is transposed and placed on the left so that the matrix dimensions match: h1^T is H×M and (y_esti - y) is M×D_y, so the product is H×D_y, the same as Dim(W2).
![][back_2]
[back_2]: http://latex.codecogs.com/svg.latex?\frac{\partial}{\partial{b2}}CE(y,y\_esti)=\frac{\partial{CE}}{\partial{z2}}\times\frac{\partial{z2}}{\partial{b2}}=y\_esti-y
Next, differentiate with respect to the previous layer's parameters:
![][back_3]
[back_3]: http://latex.codecogs.com/svg.latex?\frac{\partial}{\partial{W1}}CE(y,y\_esti)=\frac{\partial{CE}}{\partial{z2}}\times\frac{\partial{z2}}{\partial{h1}}\times\frac{\partial{h1}}{\partial{z1}}\times\frac{\partial{z1}}{\partial{W1}}=x^T[(y\_esti-y)W2^T\odot{sigmoid'(z1)}]
where
![][back_3_1]
[back_3_1]: http://latex.codecogs.com/svg.latex?\frac{\partial{z2}}{\partial{h1}}=W2
![][back_3_2]
[back_3_2]: http://latex.codecogs.com/svg.latex?\frac{\partial{h1}}{\partial{z1}}=sigmoid'(z1)=sigmoid(z1)\odot(1-sigmoid(z1))
![][back_3_3]
[back_3_3]: http://latex.codecogs.com/svg.latex?\frac{\partial{z1}}{\partial{W1}}=x
The last partial derivative:
![][back_4]
[back_4]: http://latex.codecogs.com/svg.latex?\frac{\partial}{\partial{b1}}CE(y,y\_esti)=\frac{\partial{CE}}{\partial{z2}}\times\frac{\partial{z2}}{\partial{h1}}\times\frac{\partial{h1}}{\partial{z1}}\times\frac{\partial{z1}}{\partial{b1}}=(y\_esti-y)W2^T\odot{sigmoid'(z1)}
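A self-contained NumPy sketch of the backward pass, recomputing the hypothetical forward pass above and then the four gradients just derived. For a batch of M examples, the bias gradients sum the per-example formula over the rows:

```python
import numpy as np

M, D_x, H, D_y = 8, 5, 10, 3
rng = np.random.default_rng(0)
x  = rng.normal(size=(M, D_x))
W1 = rng.normal(size=(D_x, H)) * 0.1
b1 = np.zeros((1, H))
W2 = rng.normal(size=(H, D_y)) * 0.1
b2 = np.zeros((1, D_y))
y  = np.eye(D_y)[rng.integers(0, D_y, size=M)]   # (M, D_y) one-hot labels

# Forward pass (as above).
z1 = x @ W1 + b1
h1 = 1.0 / (1.0 + np.exp(-z1))
z2 = h1 @ W2 + b2
e = np.exp(z2 - z2.max(axis=1, keepdims=True))
y_esti = e / e.sum(axis=1, keepdims=True)

# Backward pass.
delta2 = y_esti - y                          # dCE/dz2, shape (M, D_y)
dW2 = h1.T @ delta2                          # (H, D_y), matches Dim(W2)
db2 = delta2.sum(axis=0, keepdims=True)      # (1, D_y)
delta1 = (delta2 @ W2.T) * h1 * (1.0 - h1)   # dCE/dz1; sigmoid'(z1) = h1*(1-h1)
dW1 = x.T @ delta1                           # (D_x, H), matches Dim(W1)
db1 = delta1.sum(axis=0, keepdims=True)      # (1, H)
```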
With that, all the gradients have been derived.