[Deep Learning] A Simple Python Implementation of the BP (Backpropagation) Algorithm


  Personally, I think backpropagation (BP) is one of the foundations of deep learning, so it is well worth working through carefully.
  This post is based on the article A Step by Step Backpropagation Example, which explains backpropagation with a concrete worked example.
  That article is in English; if you find it hard to read, it has already been translated into Chinese:
  A Step by Step Backpropagation Example (Chinese translation)
[Figure: the example neural network]
  I reuse the figures from that post; this article is mainly meant as a supplement to it. First, though, I think it helps to implement the example once in a plain procedural style, and to write out the formulas that the original post leaves out. You may want to read that post first.

\[
\begin{aligned}
\frac{\partial E_{total}}{\partial w_5} &= \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5} && (1)\\
\frac{\partial E_{total}}{\partial w_6} &= \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_6} && (2)\\
\frac{\partial E_{total}}{\partial w_7} &= \frac{\partial E_{total}}{\partial out_{o2}} \times \frac{\partial out_{o2}}{\partial net_{o2}} \times \frac{\partial net_{o2}}{\partial w_7} && (3)\\
\frac{\partial E_{total}}{\partial w_8} &= \frac{\partial E_{total}}{\partial out_{o2}} \times \frac{\partial out_{o2}}{\partial net_{o2}} \times \frac{\partial net_{o2}}{\partial w_8} && (4)\\
\frac{\partial E_{total}}{\partial w_1} &= \frac{\partial E_{total}}{\partial out_{h1}} \times \frac{\partial out_{h1}}{\partial net_{h1}} \times \frac{\partial net_{h1}}{\partial w_1} && (5)\\
\frac{\partial E_{total}}{\partial w_2} &= \frac{\partial E_{total}}{\partial out_{h1}} \times \frac{\partial out_{h1}}{\partial net_{h1}} \times \frac{\partial net_{h1}}{\partial w_2} && (6)\\
\frac{\partial E_{total}}{\partial w_3} &= \frac{\partial E_{total}}{\partial out_{h2}} \times \frac{\partial out_{h2}}{\partial net_{h2}} \times \frac{\partial net_{h2}}{\partial w_3} && (7)\\
\frac{\partial E_{total}}{\partial w_4} &= \frac{\partial E_{total}}{\partial out_{h2}} \times \frac{\partial out_{h2}}{\partial net_{h2}} \times \frac{\partial net_{h2}}{\partial w_4} && (8)
\end{aligned}
\]
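Before the full training loop below, here is a minimal sanity check I added myself (not part of the original post): one forward pass and the gradient for w5 from equation (1), using the example's initial values.

import numpy as np

# My own sanity check: one forward pass and dE/dw5 with the example's initial values.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

i1, i2 = 0.05, 0.10
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]  # w1..w8
b1, b2 = 0.35, 0.60
target1 = 0.01

neth1 = w[0]*i1 + w[1]*i2 + b1
outh1 = sigmoid(neth1)
neth2 = w[2]*i1 + w[3]*i2 + b1
outh2 = sigmoid(neth2)
neto1 = w[4]*outh1 + w[5]*outh2 + b2
outo1 = sigmoid(neto1)

# equation (1): dE/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dw5 = -(target1 - outo1) * outo1 * (1 - outo1) * outh1
print(dE_dw5)  # roughly 0.082 for these initial values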

import numpy as np

# "pd" in the variable names below stands for "partial derivative"
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoidDerivationx(y):
    # derivative of the sigmoid expressed in terms of its output y = sigmoid(x)
    return y * (1 - y)

if __name__ == "__main__":
    # initialization
    bias = [0.35, 0.60]
    weight = [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55]
    output_layer_weights = [0.4, 0.45, 0.5, 0.55]  # not used below; same values as weight[4:]
    i1 = 0.05
    i2 = 0.10
    target1 = 0.01
    target2 = 0.99
    alpha = 0.5      # learning rate
    numIter = 90000  # number of iterations
    for i in range(numIter):
        # forward pass
        neth1 = i1*weight[1-1] + i2*weight[2-1] + bias[0]
        neth2 = i1*weight[3-1] + i2*weight[4-1] + bias[0]
        outh1 = sigmoid(neth1)
        outh2 = sigmoid(neth2)
        neto1 = outh1*weight[5-1] + outh2*weight[6-1] + bias[1]
        neto2 = outh1*weight[7-1] + outh2*weight[8-1] + bias[1]
        outo1 = sigmoid(neto1)
        outo2 = sigmoid(neto2)
        print(str(i) + ", target1 : " + str(target1-outo1) + ", target2 : " + str(target2-outo2))
        if i == numIter-1:
            print("last result : " + str(outo1) + " " + str(outo2))
        # backward pass
        # gradients of w5-w8 (output-layer weights)
        pdEOuto1 = -(target1 - outo1)
        pdOuto1Neto1 = sigmoidDerivationx(outo1)
        pdNeto1W5 = outh1
        pdEW5 = pdEOuto1 * pdOuto1Neto1 * pdNeto1W5
        pdNeto1W6 = outh2
        pdEW6 = pdEOuto1 * pdOuto1Neto1 * pdNeto1W6
        pdEOuto2 = -(target2 - outo2)
        pdOuto2Neto2 = sigmoidDerivationx(outo2)
        pdNeto2W7 = outh1
        pdEW7 = pdEOuto2 * pdOuto2Neto2 * pdNeto2W7
        pdNeto2W8 = outh2
        pdEW8 = pdEOuto2 * pdOuto2Neto2 * pdNeto2W8
        # gradients of w1-w4 (hidden-layer weights)
        # pdEOuto1, pdEOuto2, pdOuto1Neto1, pdOuto2Neto2 were computed above and are reused
        pdNeto1Outh1 = weight[5-1]
        pdNeto2Outh1 = weight[7-1]
        pdEOuth1 = pdEOuto1 * pdOuto1Neto1 * pdNeto1Outh1 + pdEOuto2 * pdOuto2Neto2 * pdNeto2Outh1
        pdOuth1Neth1 = sigmoidDerivationx(outh1)
        pdNeth1W1 = i1
        pdNeth1W2 = i2
        pdEW1 = pdEOuth1 * pdOuth1Neth1 * pdNeth1W1
        pdEW2 = pdEOuth1 * pdOuth1Neth1 * pdNeth1W2
        pdNeto1Outh2 = weight[6-1]
        pdNeto2Outh2 = weight[8-1]
        pdOuth2Neth2 = sigmoidDerivationx(outh2)
        pdNeth2W3 = i1
        pdNeth2W4 = i2
        pdEOuth2 = pdEOuto1 * pdOuto1Neto1 * pdNeto1Outh2 + pdEOuto2 * pdOuto2Neto2 * pdNeto2Outh2
        pdEW3 = pdEOuth2 * pdOuth2Neth2 * pdNeth2W3
        pdEW4 = pdEOuth2 * pdOuth2Neth2 * pdNeth2W4
        # weight update
        weight[1-1] = weight[1-1] - alpha * pdEW1
        weight[2-1] = weight[2-1] - alpha * pdEW2
        weight[3-1] = weight[3-1] - alpha * pdEW3
        weight[4-1] = weight[4-1] - alpha * pdEW4
        weight[5-1] = weight[5-1] - alpha * pdEW5
        weight[6-1] = weight[6-1] - alpha * pdEW6
        weight[7-1] = weight[7-1] - alpha * pdEW7
        weight[8-1] = weight[8-1] - alpha * pdEW8
        # print(weight)  # uncomment to inspect the updated weights

  Does it feel a bit more familiar now? Implementing the formulas once by hand certainly helped me; afterwards I rewrote the code using vectors.
  Next, we store the weights, outputs, and so on in vectors. As the example above shows, if we don't, we have to spell out w1, w2, ... one by one, which quickly becomes unmanageable as the number of parameters grows.
  The notation follows Andrew Ng's conventions; for the relevant background, see his deep learning course videos (吴恩达深度学习视频).
[Figure: the same network with the renamed variables]
I renamed the variables from the original post's figure as shown above.
The forward-pass formulas are then:

\[
\begin{aligned}
z^{[1]}_1 &= w^{[1]T}_1 \cdot x + b_1, & a^{[1]}_1 &= \sigma(z^{[1]}_1) && (9)\\
z^{[1]}_2 &= w^{[1]T}_2 \cdot x + b_1, & a^{[1]}_2 &= \sigma(z^{[1]}_2) && (10)\\
z^{[2]}_1 &= w^{[2]T}_1 \cdot a^{[1]} + b_2, & a^{[2]}_1 &= \sigma(z^{[2]}_1) && (11)\\
z^{[2]}_2 &= w^{[2]T}_2 \cdot a^{[1]} + b_2, & a^{[2]}_2 &= \sigma(z^{[2]}_2) && (12)
\end{aligned}
\]

where

\[
\begin{aligned}
w^{[1]T}_1 &= (w_1, w_2) && (14)\\
w^{[1]T}_2 &= (w_3, w_4) && (15)\\
w^{[2]T}_1 &= (w_5, w_6) && (16)\\
w^{[2]T}_2 &= (w_7, w_8) && (17)
\end{aligned}
\]
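To make the notation concrete, here is a small sketch of my own (not from the original post) that packs w1–w8 into layer matrices as numpy arrays and checks that one vectorized forward pass reproduces neth1/neth2 and neto1/neto2 from the procedural code above:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# pack w1..w8 into the layer matrices; row i of W1 is w_i^[1]T, row i of W2 is w_i^[2]T
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]
b1, b2 = 0.35, 0.60
x = np.array([0.05, 0.10])

W1 = np.array([[w[0], w[1]],   # (w1, w2)
               [w[2], w[3]]])  # (w3, w4)
W2 = np.array([[w[4], w[5]],   # (w5, w6)
               [w[6], w[7]]])  # (w7, w8)

z1 = W1 @ x + b1    # equals (neth1, neth2) from the procedural code
a1 = sigmoid(z1)    # equals (outh1, outh2)
z2 = W2 @ a1 + b2   # equals (neto1, neto2)
a2 = sigmoid(z2)    # equals (outo1, outo2)
print(z1, a2)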

The backpropagation formulas are:

\[
\begin{aligned}
\frac{\partial E}{\partial w^{[2]}_1} &= \frac{\partial E}{\partial a^{[2]}_1} \cdot \frac{\partial a^{[2]}_1}{\partial z^{[2]}_1} \cdot \frac{\partial z^{[2]}_1}{\partial w^{[2]}_1} && (18)\\
\frac{\partial E}{\partial w^{[2]}_2} &= \frac{\partial E}{\partial a^{[2]}_2} \cdot \frac{\partial a^{[2]}_2}{\partial z^{[2]}_2} \cdot \frac{\partial z^{[2]}_2}{\partial w^{[2]}_2} && (19)\\
\frac{\partial E}{\partial w^{[1]}_1} &= \frac{\partial E}{\partial a^{[1]}_1} \cdot \frac{\partial a^{[1]}_1}{\partial z^{[1]}_1} \cdot \frac{\partial z^{[1]}_1}{\partial w^{[1]}_1} && (20)\\
\frac{\partial E}{\partial w^{[1]}_2} &= \frac{\partial E}{\partial a^{[1]}_2} \cdot \frac{\partial a^{[1]}_2}{\partial z^{[1]}_2} \cdot \frac{\partial z^{[1]}_2}{\partial w^{[1]}_2} && (21)
\end{aligned}
\]

Specifically,

\[
\begin{aligned}
\frac{\partial E}{\partial a^{[2]}_1} &= -(y_1 - a^{[2]}_1) && (22)\\
\frac{\partial a^{[2]}_1}{\partial z^{[2]}_1} &= a^{[2]}_1(1 - a^{[2]}_1) && (23)\\
\frac{\partial z^{[2]}_1}{\partial w^{[2]}_1} &= a^{[1]} && (24)\\
\frac{\partial E}{\partial a^{[2]}_2} &= -(y_2 - a^{[2]}_2) && (25)\\
\frac{\partial a^{[2]}_2}{\partial z^{[2]}_2} &= a^{[2]}_2(1 - a^{[2]}_2) && (26)\\
\frac{\partial z^{[2]}_2}{\partial w^{[2]}_2} &= a^{[1]} && (27)\\
\frac{\partial E}{\partial a^{[1]}_1} &= w_5\,\delta^{[2]}_1 + w_7\,\delta^{[2]}_2 && (28)\\
\frac{\partial a^{[1]}_1}{\partial z^{[1]}_1} &= a^{[1]}_1(1 - a^{[1]}_1) && (29)\\
\frac{\partial z^{[1]}_1}{\partial w^{[1]}_1} &= x && (30)\\
\frac{\partial E}{\partial a^{[1]}_2} &= w_6\,\delta^{[2]}_1 + w_8\,\delta^{[2]}_2 && (31)\\
\frac{\partial a^{[1]}_2}{\partial z^{[1]}_2} &= a^{[1]}_2(1 - a^{[1]}_2) && (32)\\
\frac{\partial z^{[1]}_2}{\partial w^{[1]}_2} &= x && (33)
\end{aligned}
\]

where

\[
\delta^{[2]} =
\begin{pmatrix}
\dfrac{\partial E}{\partial a^{[2]}_1} \cdot \dfrac{\partial a^{[2]}_1}{\partial z^{[2]}_1} \\[2ex]
\dfrac{\partial E}{\partial a^{[2]}_2} \cdot \dfrac{\partial a^{[2]}_2}{\partial z^{[2]}_2}
\end{pmatrix}
= \frac{\partial E}{\partial a^{[2]}} \odot \frac{\partial a^{[2]}}{\partial z^{[2]}}
\]

Why group things this way? At first I did not see it either, but the product $\frac{\partial E}{\partial a^{[2]}_1} \cdot \frac{\partial a^{[2]}_1}{\partial z^{[2]}_1}$ appears several times, and collecting it into $\delta^{[2]}$ also makes the gradient formulas easier to write.
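As a quick numerical check I added myself (the targets come from the example; the output activations here are made-up values just for illustration), the entry-by-entry products match the element-wise product:

import numpy as np

y = np.array([0.01, 0.99])    # the targets from the example
a2 = np.array([0.75, 0.77])   # made-up output activations, only for this check
# entry by entry: delta_k = dE/da_k * da_k/dz_k
delta_elem = np.array([-(y[k] - a2[k]) * a2[k] * (1 - a2[k]) for k in range(2)])
# vectorized: dE/da^[2] (element-wise) da^[2]/dz^[2]
delta_vec = -(y - a2) * a2 * (1 - a2)
print(np.allclose(delta_elem, delta_vec))  # True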

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoidDerivationx(y):
    return y * (1 - y)

if __name__ == '__main__':
    # initialize the parameters
    alpha = 0.5
    w1 = [[0.15, 0.20], [0.25, 0.30]]  # weights of the hidden layer (input -> hidden)
    w2 = [[0.40, 0.45], [0.50, 0.55]]  # weights of the output layer (hidden -> output)
    b1 = 0.35
    b2 = 0.60
    x = [0.05, 0.10]
    y = [0.01, 0.99]
    # forward pass
    z1 = np.dot(w1, x) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(w2, a1) + b2
    a2 = sigmoid(z2)
    for n in range(10000):
        # backward pass, with cost C = 1/(2n) * sum[(y - a2)^2]
        # two steps:
        # error (delta) of the output layer
        delta2 = np.multiply(-(y - a2), np.multiply(a2, 1 - a2))
        # error (delta) of the hidden layer, propagated back through w2
        delta1 = np.multiply(np.dot(np.array(w2).T, delta2), np.multiply(a1, 1 - a1))
        # update the weights
        for i in range(len(w2)):
            w2[i] = w2[i] - alpha * delta2[i] * a1
        for i in range(len(w1)):
            w1[i] = w1[i] - alpha * delta1[i] * np.array(x)
        # run the forward pass again to get the new outputs
        z1 = np.dot(w1, x) + b1
        a1 = sigmoid(z1)
        z2 = np.dot(w2, a1) + b2
        a2 = sigmoid(z2)
        print(str(n) + " result:" + str(a2[0]) + ", result:" + str(a2[1]))
        # print(str(n) + " error1:" + str(y[0] - a2[0]) + ", error2:" + str(y[1] - a2[1]))

As you can see, the vectorized code is much shorter. If the vectorized notation is unfamiliar, watch the first course of Andrew Ng's deep learning series and then come back to this.
Next, let's look at an example: implementing XOR with a neural network (01=1, 10=1, 00=0, 11=0). As we know, a single perceptron cannot represent XOR, because XOR is not linearly separable.
In the following example I use 2 input nodes, 3 hidden nodes, and 1 output node.
[Figure: the XOR network]
Let's take a single input as an example.
Forward pass:

\[
\begin{pmatrix}
w^{[1]}_{11} & w^{[1]}_{12} \\
w^{[1]}_{21} & w^{[1]}_{22} \\
w^{[1]}_{31} & w^{[1]}_{32}
\end{pmatrix}
\cdot
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} z^{[1]}_1 \\ z^{[1]}_2 \\ z^{[1]}_3 \end{pmatrix}
\]

\[
\begin{pmatrix} a^{[1]}_1 \\ a^{[1]}_2 \\ a^{[1]}_3 \end{pmatrix}
=
\begin{pmatrix} \sigma(z^{[1]}_1) \\ \sigma(z^{[1]}_2) \\ \sigma(z^{[1]}_3) \end{pmatrix}
\]

\[
\begin{pmatrix} w^{[2]}_{11} & w^{[2]}_{12} & w^{[2]}_{13} \end{pmatrix}
\begin{pmatrix} a^{[1]}_1 \\ a^{[1]}_2 \\ a^{[1]}_3 \end{pmatrix}
=
\begin{pmatrix} z^{[2]}_1 \end{pmatrix}
\]

\[
\begin{pmatrix} a^{[2]}_1 \end{pmatrix} = \begin{pmatrix} \sigma(z^{[2]}_1) \end{pmatrix}
\]
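Here is a small shape check of my own for this forward pass, using random placeholder weights (the specific values are not from the original post):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 2))  # hidden layer: 3 neurons, 2 inputs
W2 = rng.standard_normal((1, 3))  # output layer: 1 neuron, 3 hidden units
x = np.array([0.0, 1.0])          # one XOR input

z1 = W1 @ x       # shape (3,)
a1 = sigmoid(z1)
z2 = W2 @ a1      # shape (1,)
a2 = sigmoid(z2)
print(z1.shape, z2.shape, a2)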

Backward pass:
There are two formulas that matter most here:

\[
\delta^L = \nabla_a C \odot \sigma'(z^L) = -(y - a^L) \odot \left(a^L (1 - a^L)\right)
\]

The reasoning is the same as before:

\[
\delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l)
\]

\[
w^l = w^l - \eta\, \delta^l\, (a^{l-1})^T
\]
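Putting the three formulas together, here is a sketch of my own of one backpropagation step for this 2-3-1 network on a single input, with the bias omitted as in the text (the random weights and the eta value are just placeholders):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(a):
    return a * (1 - a)   # sigma'(z) written in terms of the activation a = sigma(z)

rng = np.random.default_rng(1)
eta = 1.0
x = np.array([[0.0], [1.0]])   # one input as a (2, 1) column
y = np.array([[1.0]])          # its XOR label, shape (1, 1)
W1 = rng.standard_normal((3, 2))
W2 = rng.standard_normal((1, 3))

# forward pass (bias omitted, as in the text)
a1 = sigmoid(W1 @ x)   # (3, 1)
a2 = sigmoid(W2 @ a1)  # (1, 1)

# delta^L = -(y - a^L) * sigma'(z^L)
delta2 = -(y - a2) * sigmoid_deriv(a2)         # (1, 1)
# delta^l = ((W^{l+1})^T delta^{l+1}) * sigma'(z^l)
delta1 = (W2.T @ delta2) * sigmoid_deriv(a1)   # (3, 1)

# w^l <- w^l - eta * delta^l (a^{l-1})^T
W2 = W2 - eta * delta2 @ a1.T   # (1, 3)
W1 = W1 - eta * delta1 @ x.T    # (3, 2)
print(W1.shape, W2.shape)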

This time the bias terms are omitted. The code is as follows:

import numpy as np

# sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoidDerivationx(y):
    return y * (1 - y)

if __name__ == '__main__':
    alpha = 1
    input_dim = 2
    hidden_dim = 3
    output_dim = 1
    synapse_0 = 2 * np.random.random((hidden_dim, input_dim)) - 1   # (3, 2)
    # synapse_0 = np.ones((hidden_dim, input_dim)) * 0.5
    synapse_1 = 2 * np.random.random((output_dim, hidden_dim)) - 1  # (1, 3)
    # synapse_1 = np.ones((output_dim, hidden_dim)) * 0.5
    x = np.array([[0, 1], [1, 0], [0, 0], [1, 1]]).T  # (2, 4)
    # x = np.array([[0, 1]]).T  # (2, 1), a single sample
    y = np.array([[1], [1], [0], [0]]).T  # (1, 4)
    # y = np.array([[1]]).T  # (1, 1), a single label
    for i in range(2000000):
        z1 = np.dot(synapse_0, x)    # (3, 4)
        a1 = sigmoid(z1)             # (3, 4)
        z2 = np.dot(synapse_1, a1)   # (1, 4)
        a2 = sigmoid(z2)             # (1, 4)
        error = -(y - a2)            # (1, 4)
        delta2 = np.multiply(-(y - a2) / x.shape[1], sigmoidDerivationx(a2))       # (1, 4)
        delta1 = np.multiply(np.dot(synapse_1.T, delta2), sigmoidDerivationx(a1))  # (3, 4)
        synapse_1 = synapse_1 - alpha * np.dot(delta2, a1.T)  # (1, 3)
        synapse_0 = synapse_0 - alpha * np.dot(delta1, x.T)   # (3, 2)
        print(str(i) + ":", end=' ')
        print(a2)
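One thing I added for myself (not part of the original code): after the loop finishes, threshold the outputs at 0.5 to read off the XOR predictions and compare them with y.

# Appended after the training loop above; a2 and y are the variables from that script.
predictions = (a2 > 0.5).astype(int)   # shape (1, 4), same column order as x and y
print("predictions:", predictions)
print("targets    :", y)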
