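Preface:
This article follows Andrew Ng's deeplearning.ai video course and records how I implemented its Programming Assignments. Compared with Stanford's CS231n, Andrew's video course is simpler and easier to follow, which makes it a good fit for beginners who want to study deep learning systematically.
This assignment focuses on how to build a multi-layer neural network step by step and how to structure the code as modules so that each piece can be reused. It is a very useful reference.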
1.1 Outline of the Assignment
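First, take a look at the overall structure of the network, covering both forward and backward propagation, to get an intuitive picture of how training proceeds: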
1.2 Initialize L-layer Neural Network
The code below initializes the parameters of a multi-layer neural network:
import numpy as np

def initialize_parameters_deep(layer_dims):
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of entries in layer_dims, including the input layer
    for l in range(1, L):
        # Small random weights break symmetry; biases can start at zero.
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))
    return parameters
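As a quick sanity check on the shapes, the sketch below restates the initializer in compact form and prints the parameter shapes for layer sizes `[5, 4, 3]`. The function name `init_params`, the `seed` argument and the layer sizes are assumptions chosen for illustration, not part of the assignment.

```python
import numpy as np

def init_params(layer_dims, seed=3):
    """Compact restatement of the initializer above (illustrative name)."""
    rng = np.random.RandomState(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        # W_l maps layer l-1 (columns) to layer l (rows); b_l is a column vector.
        params["W" + str(l)] = rng.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

layer_dims = [5, 4, 3]  # hypothetical: 5 inputs, 4 hidden units, 3 outputs
params = init_params(layer_dims)
shapes = {name: value.shape for name, value in params.items()}
print(shapes)
```

Each `W` has one row per unit in its own layer and one column per unit in the previous layer, which is why `len(parameters) // 2` later gives the number of weight layers.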
1.3 Forward propagation module
The forward pass consists of:
1.3.1 LINEAR
1.3.2 LINEAR -> ACTIVATION where ACTIVATION will be either ReLU or Sigmoid.
1.3.3 [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID (whole model)
The code for the three parts is shown below:
1.3.1
def linear_forward(A, W, b):
    Z = np.dot(W, A) + b
    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)
    return Z, cache
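To make the shapes in the linear step concrete, here is a small hand-checkable example. The numbers are made up for illustration; the point is that `b` has shape (units, 1) and NumPy broadcasts it across all example columns.

```python
import numpy as np

W = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 1.0]])               # shape (3, 2): 3 units, 2 inputs
A = np.array([[1.0, 0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0, 2.0]])     # shape (2, 4): 4 examples stored as columns
b = np.array([[0.5], [0.0], [-1.0]])     # shape (3, 1)

Z = np.dot(W, A) + b                     # b is broadcast across the 4 columns
print(Z.shape)                           # (3, 4)
print(Z[:, 0])                           # first example: [ 3.5 -1.   3. ]
```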
1.3.2
def linear_activation_forward(A_prev, W, b, activation):
    if activation == "sigmoid":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
    else:
        raise ValueError("activation must be 'sigmoid' or 'relu'")
    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)
    return A, cache
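The code above calls `sigmoid` and `relu` without defining them; in the course they come from a helper module supplied with the assignment. The sketch below is one plausible version that matches how they are called here, with each function returning the activation plus a cache holding `Z`. It is an assumption about their behavior, not the course's actual code.

```python
import numpy as np

def sigmoid(Z):
    """Element-wise sigmoid. Returns the activation and Z as the cache."""
    A = 1.0 / (1.0 + np.exp(-Z))
    return A, Z

def relu(Z):
    """Element-wise ReLU. Returns the activation and Z as the cache."""
    A = np.maximum(0, Z)
    return A, Z

Z = np.array([[-1.0, 0.0, 2.0]])
A_sig, cache_sig = sigmoid(Z)
A_relu, cache_relu = relu(Z)
print(A_sig)
print(A_relu)
```

Caching `Z` is enough because both derivatives can be recomputed from it during the backward pass.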
1.3.3
def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network
    # Hidden layers 1 .. L-1 use ReLU.
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters["W"+str(l)], parameters["b"+str(l)], "relu")
        caches.append(cache)
    # The output layer L uses sigmoid.
    AL, cache = linear_activation_forward(A, parameters["W"+str(L)], parameters["b"+str(L)], "sigmoid")
    caches.append(cache)
    assert(AL.shape == (1, X.shape[1]))
    return AL, caches
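To see the whole forward pass run end to end, here is a compact, self-contained sketch of the same [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID pattern on random data. The function name `forward`, the flat cache layout and the layer sizes `[4, 3, 2, 1]` are simplifications assumed for illustration; the post's own code nests the caches differently.

```python
import numpy as np

def forward(X, params):
    """Compact forward pass: ReLU on layers 1..L-1, sigmoid on layer L."""
    L = len(params) // 2
    A = X
    caches = []
    for l in range(1, L + 1):
        W, b = params["W" + str(l)], params["b" + str(l)]
        Z = np.dot(W, A) + b
        caches.append((A, W, b, Z))  # input activation, weights, bias, pre-activation
        A = np.maximum(0, Z) if l < L else 1.0 / (1.0 + np.exp(-Z))
    return A, caches

rng = np.random.RandomState(1)
layer_dims = [4, 3, 2, 1]            # hypothetical sizes: 4 inputs, one output unit
params = {}
for l in range(1, len(layer_dims)):
    params["W" + str(l)] = rng.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))

X = rng.randn(4, 6)                  # 6 examples stored as columns
AL, caches = forward(X, params)
print(AL.shape, len(caches))         # (1, 6) 3
```

There is one cache per layer, which is what the backward pass relies on when it walks the layers in reverse.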
1.4 Cost Function
def compute_cost(AL, Y):
    m = Y.shape[1]
    cost = -1/m*np.sum(Y*np.log(AL)+(1-Y)*np.log(1-AL))
    cost = np.squeeze(cost)  # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())
    return cost
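One practical caveat with the cross-entropy cost is that `np.log` returns `-inf` when an entry of `AL` is exactly 0 or 1. A common workaround, sketched below, is to clip `AL` slightly inside (0, 1) before taking logs. The function name `compute_cost_clipped` and the `eps` value are assumptions for illustration and are not part of the assignment.

```python
import numpy as np

def compute_cost_clipped(AL, Y, eps=1e-12):
    """Cross-entropy cost with AL clipped away from 0 and 1 (illustrative variant)."""
    m = Y.shape[1]
    AL = np.clip(AL, eps, 1.0 - eps)  # keep log() arguments strictly inside (0, 1)
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(cost)

Y = np.array([[1, 1, 0]])
AL = np.array([[0.8, 0.9, 0.4]])
cost = compute_cost_clipped(AL, Y)

# Saturated predictions would give log(0) without clipping.
saturated = compute_cost_clipped(np.array([[0.0, 0.0, 1.0]]), Y)
print(cost, saturated)
```

For ordinary predictions the clipped and unclipped costs agree; the clipping only matters once the sigmoid output saturates.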
1.5 Backward propagation module
The backward pass consists of three parts: Linear Backward, Linear-Activation Backward and L-Model Backward.
1.5.1 Linear Backward
def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = 1/m*np.dot(dZ, A_prev.T)
    db = 1/m*np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)
    return dA_prev, dW, db
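A numerical gradient check is a simple way to confirm the `dW` formula. The sketch below uses a toy cost that is linear in `Z`, so its derivative with respect to `Z` is a fixed matrix `G` divided by `m`, and compares the analytic `dW` with central differences. The toy cost, the array sizes and the names `G` and `J` are assumptions made for this check only.

```python
import numpy as np

def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db

rng = np.random.RandomState(0)
A_prev, W, b = rng.randn(3, 5), rng.randn(2, 3), rng.randn(2, 1)
G = rng.randn(2, 5)          # plays the role of dZ (per-example gradient, no 1/m)
m = A_prev.shape[1]

def J(W_, b_):
    # Toy cost averaged over m examples; dJ/dZ = G / m.
    return np.sum(G * (np.dot(W_, A_prev) + b_)) / m

_, dW, db = linear_backward(G, (A_prev, W, b))

eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        num_dW[i, j] = (J(W_plus, b) - J(W_minus, b)) / (2 * eps)

max_err = np.max(np.abs(num_dW - dW))
print(max_err)
```

The check also shows the convention used in this code: `dZ` and `dA_prev` carry no 1/m factor, and the averaging over examples is applied only when forming `dW` and `db`.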
1.5.2 Linear-Activation backward
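The formula is dZ[l] = dA[l] * g'(Z[l]), where g is the activation function of layer l and * is the element-wise product: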
def linear_activation_backward(dA, cache, activation):
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    else:
        raise ValueError("activation must be 'relu' or 'sigmoid'")
    return dA_prev, dW, db
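`relu_backward` and `sigmoid_backward` are also helper functions that the post uses without showing. A minimal sketch consistent with the calls above, assuming the activation cache is simply `Z`, could look like this. It is a guess at their behavior rather than the course's own implementation.

```python
import numpy as np

def relu_backward(dA, Z):
    """dZ = dA where Z > 0, and 0 elsewhere (ReLU derivative is 0 or 1)."""
    return np.where(Z > 0, dA, 0.0)

def sigmoid_backward(dA, Z):
    """dZ = dA * s * (1 - s), with s = sigmoid(Z)."""
    s = 1.0 / (1.0 + np.exp(-Z))
    return dA * s * (1 - s)

Z = np.array([[-1.0, 0.0, 2.0]])
dA = np.array([[1.0, 2.0, 3.0]])
dZ_relu = relu_backward(dA, Z)
dZ_sig = sigmoid_backward(dA, Z)
print(dZ_relu)   # [[0. 0. 3.]]
print(dZ_sig)
```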
1.5.3 L-Model Backward
def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)  # number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)
    # Derivative of the cross-entropy cost with respect to AL.
    dAL = np.divide(1-Y, 1-AL) - np.divide(Y, AL)
    # Output layer (sigmoid). Note the naming used here: grads["dA" + str(l)]
    # stores the gradient passed back to the input of layer l, i.e. w.r.t. A[l-1].
    current_cache = caches[L-1]
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, "sigmoid")
    # Hidden layers (ReLU), from layer L-1 down to layer 1.
    for l in reversed(range(L-1)):
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA"+str(l+2)], current_cache, "relu")
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
    return grads
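The expression used for `dAL` comes from differentiating the per-example cross-entropy loss in section 1.4 with respect to the output activation. Written out below, with a^{[L]} standing for one entry of `AL` and y for the matching label (this notation is an assumption added here for readability):

```latex
\mathcal{L}\bigl(a^{[L]}, y\bigr) = -\Bigl( y \log a^{[L]} + (1-y)\log\bigl(1-a^{[L]}\bigr) \Bigr)

\frac{\partial \mathcal{L}}{\partial a^{[L]}} = -\frac{y}{a^{[L]}} + \frac{1-y}{1-a^{[L]}}

\frac{\partial \mathcal{L}}{\partial z^{[L]}}
  = \frac{\partial \mathcal{L}}{\partial a^{[L]}} \, a^{[L]}\bigl(1-a^{[L]}\bigr)
  = a^{[L]} - y
```

The second line is exactly `np.divide(1-Y, 1-AL) - np.divide(Y, AL)`. The third line shows that, once the sigmoid derivative is applied, the output-layer `dZ` reduces to `AL - Y`.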
1.6 Update Parameters
The last step is to update the weights and biases:
def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2  # number of layers in the neural network
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W"+str(l+1)] - learning_rate*grads["dW"+str(l+1)]
        parameters["b" + str(l+1)] = parameters["b"+str(l+1)] - learning_rate*grads["db"+str(l+1)]
    return parameters
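To show how forward propagation, the cost, backward propagation and the update step fit together, here is a compact training loop on a toy problem. Everything in it is an illustrative assumption: the synthetic data, the layer sizes `[2, 4, 1]`, the learning rate, the iteration count, and the condensed helper functions, which merge several of the steps above (for example, the output-layer gradient is taken directly as `AL - Y`).

```python
import numpy as np

def forward(X, params):
    L = len(params) // 2
    A, caches = X, []
    for l in range(1, L + 1):
        W, b = params["W%d" % l], params["b%d" % l]
        Z = np.dot(W, A) + b
        caches.append((A, W, Z))
        A = np.maximum(0, Z) if l < L else 1.0 / (1.0 + np.exp(-Z))
    return A, caches

def backward(AL, Y, caches):
    grads, L, m = {}, len(caches), Y.shape[1]
    dZ = AL - Y                                  # sigmoid + cross-entropy combined
    for l in range(L, 0, -1):
        A_prev, W, Z = caches[l - 1]
        grads["dW%d" % l] = np.dot(dZ, A_prev.T) / m
        grads["db%d" % l] = np.sum(dZ, axis=1, keepdims=True) / m
        if l > 1:
            Z_prev = caches[l - 2][2]
            dZ = np.dot(W.T, dZ) * (Z_prev > 0)  # ReLU derivative of the layer below
    return grads

def update(params, grads, lr):
    for l in range(1, len(params) // 2 + 1):
        params["W%d" % l] = params["W%d" % l] - lr * grads["dW%d" % l]
        params["b%d" % l] = params["b%d" % l] - lr * grads["db%d" % l]
    return params

rng = np.random.RandomState(0)
X = rng.randn(2, 200)                            # 200 toy examples with 2 features
Y = (X[0:1] + X[1:2] > 0).astype(float)          # label 1 when x1 + x2 > 0

layer_dims = [2, 4, 1]
params = {}
for l in range(1, len(layer_dims)):
    params["W%d" % l] = rng.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params["b%d" % l] = np.zeros((layer_dims[l], 1))

costs = []
for i in range(1000):
    AL, caches = forward(X, params)
    AL_safe = np.clip(AL, 1e-12, 1 - 1e-12)      # avoid log(0)
    costs.append(float(-np.mean(Y * np.log(AL_safe) + (1 - Y) * np.log(1 - AL_safe))))
    grads = backward(AL, Y, caches)
    params = update(params, grads, lr=0.1)

print(costs[0], costs[-1])
```

With weights initialized near zero the first cost is close to log(2), since every prediction starts near 0.5; the cost should then fall as gradient descent proceeds.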
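Finally, I am attaching my score for this assignment as evidence that the code works. If you found this article useful, feel free to leave a tip. I will keep posting updates on the deeplearning.ai assignments!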