Simple examples introducing PyTorch
Translated from the GitHub repository jcjohnson/pytorch-examples.
I have added my own understanding and annotations; for example, I drew a diagram to make the backpropagation process clearer.
I also translated the English to make it easier to read.
I particularly like this repository: it goes from numpy to Tensors, from implementing backpropagation by hand to using PyTorch's automatic differentiation, and from writing the model, loss function, and weight updates yourself to defining a custom model and calling the built-in loss functions and optimizers.
After working through the whole repository you should have a fairly thorough grasp of how PyTorch works.
Simple examples to introduce PyTorch
This repository introduces the fundamental concepts of PyTorch through self-contained examples.
At its core, PyTorch provides two main features:
- An n-dimensional Tensor, similar to numpy but able to run on GPUs.
- Automatic differentiation for building and training neural networks.
We will use a fully-connected ReLU network as our running example. The network has a single hidden layer and is trained with gradient descent to fit random data by minimizing the Euclidean distance between the network output and the true output.
Note: these examples have been updated for PyTorch 0.4, which made several major changes to the core PyTorch API. Most notably, before 0.4 Tensors had to be wrapped in Variable objects in order to use autograd; this functionality has now been added directly to Tensors, and Variables are deprecated.
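For readers coming from older code, here is a minimal sketch of that API change (the Variable import is shown only for comparison and is not needed in 0.4+):

import torch

# Pre-0.4 style (deprecated): wrap a Tensor in a Variable to use autograd.
#   from torch.autograd import Variable
#   x = Variable(torch.randn(3), requires_grad=True)

# 0.4 and later: Tensors track gradients directly.
x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad)  # gradient of y with respect to x, i.e. 2 * x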
Table of Contents
- Warm-up: numpy
- PyTorch: Tensors
- PyTorch: Autograd
- PyTorch: Defining new autograd functions
- TensorFlow: Static Graphs
- PyTorch: nn
- PyTorch: optim
- PyTorch: Custom nn Modules
- PyTorch: Control Flow and Weight Sharing
1. Warm-up: numpy
Before introducing PyTorch, we will first implement the network using numpy.
Numpy provides an n-dimensional array object and many functions for manipulating these arrays. Numpy is a generic framework for scientific computing; it does not know anything about computation graphs, deep learning, or gradients. However, we can easily fit a two-layer network to random data by manually implementing the forward and backward passes through the network using numpy operations:
import numpy as np

"""
A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x using Euclidean error.

This implementation uses numpy to manually compute the forward pass, loss, and
backward pass.

A numpy array is a generic n-dimensional array; it does not know anything about
deep learning or gradients or computational graphs, and is just a way to perform
generic numeric computations.
"""

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)   # input  (64, 1000)
y = np.random.randn(N, D_out)  # output (64, 10)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)    # input-to-hidden weights  (1000, 100)
w2 = np.random.randn(H, D_out)   # hidden-to-output weights (100, 10)

learning_rate = 1e-6  # learning rate

for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)  # matrix multiply to get the hidden layer (64, 100)
    h_relu = np.maximum(h, 0)  # apply the ReLU activation
    # np.maximum(X, Y, out=None) takes the element-wise maximum of X and Y.
    # np.max(a, axis=None, out=None, keepdims=False) reduces over the whole array
    # by default (axis=None); axis=0 gives per-column maxima, axis=1 per-row maxima.
    y_pred = h_relu.dot(w2)  # matrix multiply to get the output layer (64, 10)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()  # .sum() adds up all elements
    print(t, loss)  # the goal is to drive the loss down

    # Backprop to compute gradients of w1 and w2 with respect to loss
    # (the tricky part; see the derivation below)
    grad_y_pred = 2.0 * (y_pred - y)        # (64, 10)
    grad_w2 = h_relu.T.dot(grad_y_pred)     # (100, 64) dot (64, 10) = (100, 10)
    grad_h_relu = grad_y_pred.dot(w2.T)     # (64, 100)
    grad_h = grad_h_relu.copy()             # deep copy (64, 100)
    grad_h[h < 0] = 0                       # backprop through the ReLU
    grad_w1 = x.T.dot(grad_h)               # (1000, 100)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
Here is a derivation diagram for the backpropagation step mentioned above.
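In formulas, using the same variable names and shapes as the code, the backward pass computes:

$$
\begin{aligned}
L &= \sum (y_{\mathrm{pred}} - y)^2, \quad y_{\mathrm{pred}} = h_{\mathrm{relu}}\, w_2, \quad h_{\mathrm{relu}} = \max(h, 0), \quad h = x\, w_1,\\
\mathrm{grad\_y\_pred} &= \partial L / \partial y_{\mathrm{pred}} = 2\,(y_{\mathrm{pred}} - y),\\
\mathrm{grad\_w2} &= \partial L / \partial w_2 = h_{\mathrm{relu}}^{\top}\, \mathrm{grad\_y\_pred},\\
\mathrm{grad\_h\_relu} &= \partial L / \partial h_{\mathrm{relu}} = \mathrm{grad\_y\_pred}\; w_2^{\top},\\
\mathrm{grad\_h} &= \partial L / \partial h = \mathrm{grad\_h\_relu} \odot \mathbf{1}[h > 0],\\
\mathrm{grad\_w1} &= \partial L / \partial w_1 = x^{\top}\, \mathrm{grad\_h}.
\end{aligned}
$$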
2. PyTorch: Tensors
Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or more, so unfortunately numpy alone is not enough for modern deep learning.
Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Any computation you might want to perform with numpy can also be done with PyTorch Tensors; you should think of them as a generic tool for scientific computing.
However, unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a PyTorch Tensor on a GPU, use the device argument when constructing the Tensor to place it on the GPU.
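A common pattern (my addition, not from the original repo) is to choose the device at runtime, so the same script runs with or without a GPU:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.randn(64, 1000, device=device)  # created directly on the chosen device
w = torch.randn(1000, 100, device=device)
h = x.mm(w)                               # runs on the GPU if one is available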
Here we use PyTorch Tensors to fit a two-layer network to random data. Like the numpy example above, we manually implement the forward and backward passes through the network, using operations on PyTorch Tensors:
import torch

"""
A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.

This implementation uses PyTorch tensors to manually compute the forward pass,
loss, and backward pass.

A PyTorch Tensor is basically the same as a numpy array: it does not know
anything about deep learning or computational graphs or gradients, and is just
a generic n-dimensional array to be used for arbitrary numeric computation.

The biggest difference between a numpy array and a PyTorch Tensor is that
a PyTorch Tensor can run on either CPU or GPU. To run operations on the GPU,
just pass a different value to the `device` argument when constructing the
Tensor.
"""

device = torch.device('cpu')     # run on CPU
# device = torch.device('cuda')  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device)   # input  (64, 1000)
y = torch.randn(N, D_out, device=device)  # output (64, 10)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device)    # input-to-hidden weights  (1000, 100)
w2 = torch.randn(H, D_out, device=device)   # hidden-to-output weights (100, 10)

learning_rate = 1e-6  # learning rate

for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)  # matrix multiply to get the hidden layer (64, 100)
    # torch.mm()  is matrix multiplication
    # torch.mul() is element-wise multiplication
    h_relu = h.clamp(min=0)  # apply the ReLU activation
    # torch.clamp(input, min, max, out=None) -> Tensor clamps every element of
    # the input tensor to the interval [min, max] and returns a new tensor.
    y_pred = h_relu.mm(w2)  # matrix multiply to get the output layer (64, 10)

    # Compute and print loss; loss is a scalar, and is stored in a PyTorch Tensor
    # of shape (); we can get its value as a Python number with loss.item().
    loss = (y_pred - y).pow(2).sum()  # .sum() adds up all elements; torch.Size([])
    print(t, loss.item())  # .item() converts a zero-dimensional tensor to a Python number

    # Backprop to compute gradients of w1 and w2 with respect to loss
    # (the tricky part; the steps are the same as in the numpy version)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    # .clone() makes a copy of the tensor, analogous to .copy() on a numpy array
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
3. PyTorch: Autograd
In the examples above we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but it can quickly become very cumbersome for large, complex networks.
Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows gradients to be computed easily.
This sounds complicated, but it is quite simple in practice. If we want to compute gradients with respect to some Tensor, we set requires_grad=True when constructing that Tensor. Any PyTorch operations on that Tensor will then cause a computational graph to be built, allowing us to later perform backpropagation through the graph. If x is a Tensor with requires_grad=True, then after backpropagation x.grad will be another Tensor holding the gradient of x with respect to some scalar value.
Sometimes you may wish to prevent PyTorch from building computational graphs when performing certain operations on Tensors with requires_grad=True; for example, when training a neural network we usually do not want to backpropagate through the weight update steps. In such cases we can use the torch.no_grad() context manager to prevent a computational graph from being constructed.
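Before the full example, here is a minimal sketch of these three ideas (requires_grad, .grad, and torch.no_grad()); the numbers are purely illustrative:

import torch

a = torch.ones(3, requires_grad=True)  # track operations on a
b = (2 * a).sum()                      # builds a small graph
b.backward()                           # backpropagate through the graph
print(a.grad)                          # tensor([2., 2., 2.])

with torch.no_grad():
    a -= 0.1 * a.grad                  # in-place update, not recorded in any graph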
Here we use PyTorch Tensors and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass through the network:
import torch

"""
A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.

This implementation computes the forward pass using operations on PyTorch
Tensors, and uses PyTorch autograd to compute gradients.

When we create a PyTorch Tensor with requires_grad=True, then operations
involving that Tensor will not just compute values; they will also build up
a computational graph in the background, allowing us to easily backpropagate
through the graph to compute gradients of some downstream (scalar) loss with
respect to a Tensor. Concretely if x is a Tensor with x.requires_grad == True
then after backpropagation x.grad will be another Tensor holding the gradient
of x with respect to some scalar value.
"""

device = torch.device('cpu')     # run on CPU
# device = torch.device('cuda')  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs
x = torch.randn(N, D_in, device=device)   # input  (64, 1000)
y = torch.randn(N, D_out, device=device)  # output (64, 10)

# Create random Tensors for weights; setting requires_grad=True means that we
# want to compute gradients for these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, requires_grad=True)    # input-to-hidden weights  (1000, 100)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)   # hidden-to-output weights (100, 10)

learning_rate = 1e-6  # learning rate

for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)  # (64, 10)

    # Compute and print loss. Loss is a Tensor of shape (), and loss.item()
    # is a Python number giving its value.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()
    # In other words, calling loss.backward() fills in the .grad attribute
    # (here w1.grad and w2.grad) of every Tensor that requires gradients.

    # Update weights using gradient descent. For this step we just want to mutate
    # the values of w1 and w2 in-place; we don't want to build up a computational
    # graph for the update steps, so we use the torch.no_grad() context manager
    # to prevent PyTorch from building a computational graph for the updates.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after running the backward pass
        w1.grad.zero_()
        w2.grad.zero_()
4. PyTorch: Defining new autograd functions
Under the hood, each primitive autograd operator is really two functions that operate on Tensors.
The forward function computes output Tensors from input Tensors.
The backward function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.
In PyTorch we can easily define our own autograd operator by defining a subclass of torch.autograd.Function and implementing the forward and backward functions. We can then use the new autograd operator by calling the class's apply method like a function, passing in Tensors containing the input data.
In this example we define our own custom autograd function for performing the ReLU nonlinearity, and use it to implement our two-layer network:
import torch

"""
A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.

This implementation computes the forward pass using operations on PyTorch
Tensors, and uses PyTorch autograd to compute gradients.

In this implementation we implement our own custom autograd function to perform
the ReLU function.
"""

# Define a custom class that subclasses torch.autograd.Function
class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """
    @staticmethod
    def forward(ctx, x):  # x is a Tensor, ctx is a context object
        """
        In the forward pass we receive a context object and a Tensor containing the
        input; we must return a Tensor containing the output, and we can use the
        context object to cache objects for use in the backward pass.
        """
        ctx.save_for_backward(x)  # stash the input for use in backward
        return x.clamp(min=0)     # return the ReLU of the input, as a Tensor

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive the context object and a Tensor containing
        the gradient of the loss with respect to the output produced during the
        forward pass. We can retrieve cached data from the context object, and must
        compute and return the gradient of the loss with respect to the input to the
        forward function.
        """
        x, = ctx.saved_tensors        # the Tensor saved in forward
        grad_x = grad_output.clone()  # copy of the incoming gradient
        grad_x[x < 0] = 0             # derivative of the ReLU
        # Compare with the manual version: grad_output plays the role of
        # grad_h_relu, x plays the role of h, and grad_x plays the role of grad_h:
        #   grad_h_relu = grad_y_pred.mm(w2.t())
        #   grad_h = grad_h_relu.clone()
        #   grad_h[h < 0] = 0
        return grad_x

device = torch.device('cpu')     # run on CPU
# device = torch.device('cuda')  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and output
x = torch.randn(N, D_in, device=device)   # input  (64, 1000)
y = torch.randn(N, D_out, device=device)  # output (64, 10)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, requires_grad=True)    # input-to-hidden weights  (1000, 100)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)   # hidden-to-output weights (100, 10)

learning_rate = 1e-6  # learning rate

for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors; we call our
    # custom ReLU implementation using the MyReLU.apply function
    y_pred = MyReLU.apply(x.mm(w1)).mm(w2)  # (64, 10)
    # Previously: y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    with torch.no_grad():
        # Update weights using gradient descent
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after running the backward pass
        w1.grad.zero_()
        w2.grad.zero_()
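As a sanity check that is not part of the original example, torch.autograd.gradcheck can compare a custom Function's backward pass against numerically computed gradients; it expects double-precision inputs with requires_grad=True:

from torch.autograd import gradcheck

inp = torch.randn(8, 5, dtype=torch.double, requires_grad=True)
# Should print True if MyReLU's backward matches the numerical gradient.
print(gradcheck(MyReLU.apply, (inp,), eps=1e-6, atol=1e-4))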
5. TensorFlow: Static Graphs
PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computational graph and use automatic differentiation to compute gradients. The biggest difference between the two is that TensorFlow's computational graphs are static, while PyTorch uses dynamic computational graphs.
In TensorFlow, we define the computational graph once and then execute the same graph over and over again, possibly feeding different input data to the graph. In PyTorch, each forward pass defines a new computational graph.
Static graphs are nice because you can optimize the graph up front; for example, a framework might decide to fuse some graph operations for efficiency, or come up with a strategy for distributing the graph across many GPUs or many machines. If you are reusing the same graph over and over, this potentially costly up-front optimization can be amortized as the same graph is rerun again and again.
One aspect where static and dynamic graphs differ is control flow. For some models we may wish to perform different computations for each data point; for example, a recurrent network might be unrolled for a different number of time steps for each data point, and this unrolling can be implemented as a loop. With a static graph the loop construct needs to be part of the graph; for this reason TensorFlow provides operators such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: since we build the graph on the fly for each example, we can use ordinary imperative flow control to perform computation that differs for each input.
To contrast with the PyTorch autograd example above, here we use TensorFlow to fit a simple two-layer network:
import tensorflow as tf
import numpy as np

"""
A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.

This implementation uses basic TensorFlow operations to set up a computational
graph, then executes the graph many times to actually train the network.

One of the main differences between TensorFlow and PyTorch is that TensorFlow
uses static computational graphs while PyTorch uses dynamic computational
graphs.

In TensorFlow we first set up the computational graph, then execute the same
graph many times.
"""

# First we set up the computational graph:

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create placeholders for the input and target data; these will be filled
# with real data when we execute the graph.
x = tf.placeholder(tf.float32, shape=(None, D_in))   # input placeholder  (None, 1000)
y = tf.placeholder(tf.float32, shape=(None, D_out))  # target placeholder (None, 10)
# tf.placeholder() creates a placeholder for the real inputs and target labels.
# It only reserves the necessary memory; the data is fed in later, inside a
# session, via the feed_dict argument when the graph is run.
# Compare with the PyTorch version:
#   x = torch.randn(N, D_in, device=device)
#   y = torch.randn(N, D_out, device=device)

# Create Variables for the weights and initialize them with random data.
# A TensorFlow Variable persists its value across executions of the graph.
w1 = tf.Variable(tf.random_normal((D_in, H)))    # input-to-hidden weights  (1000, 100)
w2 = tf.Variable(tf.random_normal((H, D_out)))   # hidden-to-output weights (100, 10)
# tf.Variable() is used for trainable, mutable values such as weights and biases
# and must be given an initial value; tf.constant() creates a constant.
# Compare with the PyTorch version:
#   w1 = torch.randn(D_in, H, device=device, requires_grad=True)
#   w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Forward pass: Compute the predicted y using operations on TensorFlow Tensors.
# Note that this code does not actually perform any numeric operations; it
# merely sets up the computational graph that we will later execute.
h = tf.matmul(x, w1)                 # matrix multiply to get the hidden layer (64, 100)
h_relu = tf.maximum(h, tf.zeros(1))  # apply the ReLU activation
y_pred = tf.matmul(h_relu, w2)       # matrix multiply to get the output layer (64, 10)
# Compare with the PyTorch version:
#   h = x.mm(w1)
#   h_relu = h.clamp(min=0)
#   y_pred = h_relu.mm(w2)

# Compute loss using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)
# Compare with the PyTorch version:
#   loss = (y_pred - y).pow(2).sum()

# Compute gradient of the loss with respect to w1 and w2.
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])
# Compare with the PyTorch version:
#   loss.backward()

# Update the weights using gradient descent. To actually update the weights
# we need to evaluate new_w1 and new_w2 when executing the graph. Note that
# in TensorFlow the act of updating the value of the weights is part of
# the computational graph; in PyTorch this happens outside the computational
# graph.
learning_rate = 1e-6  # learning rate
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)
# Compare with the PyTorch version:
#   with torch.no_grad():
#       w1 -= learning_rate * w1.grad
#       w2 -= learning_rate * w2.grad
#       w1.grad.zero_()
#       w2.grad.zero_()

# Now we have built our computational graph, so we enter a TensorFlow session to
# actually execute the graph.
with tf.Session() as sess:
    # Run the graph once to initialize the Variables w1 and w2.
    sess.run(tf.global_variables_initializer())

    # Create numpy arrays holding the actual data for the inputs x and targets y
    x_value = np.random.randn(N, D_in)
    y_value = np.random.randn(N, D_out)
    for _ in range(500):
        # Execute the graph many times. Each time it executes we want to bind
        # x_value to x and y_value to y, specified with the feed_dict argument.
        # Each time we execute the graph we want to compute the values for loss,
        # new_w1, and new_w2; the values of these Tensors are returned as numpy
        # arrays.
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})
        print(loss_value)
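Note that this listing uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session). If you only have TensorFlow 2.x installed, my understanding is that it should still run through the compatibility layer, roughly like this:

import tensorflow.compat.v1 as tf  # use the TF1-style API from TF2
tf.disable_eager_execution()       # restore graph-then-session execution
# ...the rest of the listing above then works unchanged.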
6. PyTorch: nn
Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.
When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters that will be optimized during learning.
In TensorFlow, packages like Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.
In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.
In this example we use the nn package to implement our two-layer network:
import torch

"""
A fully-connected ReLU network with one hidden layer, trained to predict y from x
by minimizing squared Euclidean distance.

This implementation uses the nn package from PyTorch to build the network.
PyTorch autograd makes it easy to define computational graphs and take gradients,
but raw autograd can be a bit too low-level for defining complex neural networks;
this is where the nn package can help. The nn package defines a set of Modules,
which you can think of as a neural network layer that produces output from
input and may have some trainable weights or other state.
"""

device = torch.device('cpu')     # run on CPU
# device = torch.device('cuda')  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in, device=device)   # input  (64, 1000)
y = torch.randn(N, D_out, device=device)  # output (64, 10)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# After constructing the model we use the .to() method to move it to the
# desired device.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),    # hidden layer output      (64, 100)
    torch.nn.ReLU(),             # ReLU activation          (64, 100)
    torch.nn.Linear(H, D_out),   # output layer output      (64, 10)
).to(device)
# This replaces the three lines of the manual forward pass:
#   h = x.mm(w1)
#   h_relu = h.clamp(min=0)
#   y_pred = h_relu.mm(w2)
# The model is defined outside the for loop; inside the loop we just call it.

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function. Setting
# reduction='sum' means that we are computing the *sum* of squared errors rather
# than the mean; this is for consistency with the examples above where we
# manually compute the loss, but in practice it is more common to use mean
# squared error as a loss by setting reduction='elementwise_mean' (called
# reduction='mean' in newer PyTorch versions).
loss_fn = torch.nn.MSELoss(reduction='sum')
# This replaces: loss = (y_pred - y).pow(2).sum()
# It is defined outside the for loop; inside the loop we just call it.

learning_rate = 1e-4  # learning rate

for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)  # just call the model

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()
    # This replaces the two lines:
    #   w1.grad.zero_()
    #   w2.grad.zero_()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()
    # Note that the gradients are zeroed before calling loss.backward().

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its data and gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():  # iterate over the model's parameters
            param.data -= learning_rate * param.grad
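To see what model.parameters() actually iterates over (my addition, not in the original), you can print the named parameters; for this Sequential model the output should look roughly like the comments below:

for name, p in model.named_parameters():
    print(name, tuple(p.shape))
# 0.weight (100, 1000)
# 0.bias   (100,)
# 2.weight (10, 100)
# 2.bias   (10,)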
7. PyTorch: optim
Up to this point we have updated the weights of our models by manually mutating the Tensors holding learnable parameters. This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks using more sophisticated optimizers like AdaGrad, RMSProp, Adam, and others.
The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms.
In this example we will use the nn package to define our model as before, but we will optimize the model using the Adam algorithm provided by the optim package:
import torch

"""
A fully-connected ReLU network with one hidden layer, trained to predict y from x
by minimizing squared Euclidean distance.

This implementation uses the nn package from PyTorch to build the network.

Rather than manually updating the weights of the model as we have been doing,
we use the optim package to define an Optimizer that will update the weights
for us. The optim package defines many optimization algorithms that are commonly
used for deep learning, including SGD+momentum, RMSProp, Adam, etc.
"""

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs.
x = torch.randn(N, D_in)   # input  (64, 1000)
y = torch.randn(N, D_out)  # output (64, 10)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)  # the network model
loss_fn = torch.nn.MSELoss(reduction='sum')  # the loss function

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4  # learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# torch.optim: the first argument is the set of parameters to update, lr is the learning rate

for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the Tensors it will update (which are the learnable weights
    # of the model)
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()
Summary of the standard training loop:
- Forward pass by calling the model: y_pred = model(x)
- Compute the loss: loss = loss_fn(y_pred, y)
- Zero the gradients: optimizer.zero_grad()
- Backward pass: loss.backward()
- Update the parameters with the optimizer: optimizer.step()
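Because the optimizer only needs model.parameters() and a few hyperparameters, switching algorithms is a one-line change. A sketch reusing the model and loss defined above (the learning rates are illustrative, not tuned values):

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
# optimizer = torch.optim.Adagrad(model.parameters(), lr=1e-2)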
8. PyTorch: Custom nn Modules
Sometimes you will want to specify models that are more complex than a sequence of existing Modules. For these cases you can define your own Modules by subclassing nn.Module and defining a forward method that receives input Tensors and produces output Tensors using other Modules or other autograd operations on Tensors.
In this example we implement our two-layer network as a custom Module subclass:
import torch

"""
A fully-connected ReLU network with one hidden layer, trained to predict y from x
by minimizing squared Euclidean distance.

This implementation defines the model as a custom Module subclass. Whenever you
want a model more complex than a simple sequence of existing Modules you will
need to define your model this way.
"""

# Define a custom class that subclasses torch.nn.Module
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):  # constructor: self is fixed, then the layer sizes
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()  # standard boilerplate
        self.linear1 = torch.nn.Linear(D_in, H)    # linear layer 1
        self.linear2 = torch.nn.Linear(H, D_out)   # linear layer 2

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary (differentiable) operations on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)   # input  (64, 1000)
y = torch.randn(N, D_out)  # output (64, 10)

# Construct our model by instantiating the class defined above.
model = TwoLayerNet(D_in, H, D_out)  # instantiation: arguments match __init__

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
loss_fn = torch.nn.MSELoss(reduction='sum')               # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # optimizer

for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)  # calling the model: arguments match forward

    # Compute and print loss
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()  # zero the gradients
    loss.backward()        # backward pass
    optimizer.step()       # update the parameters
Subclassing nn.Module like this is the most common way to define a model.
Be clear about where each set of arguments goes: instantiating the model corresponds to __init__, while calling the model corresponds to forward.
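A two-line reminder of that distinction, using the TwoLayerNet defined above:

model = TwoLayerNet(D_in, H, D_out)  # constructor arguments go to __init__
y_pred = model(x)                    # model(x) goes through nn.Module.__call__, which runs
                                     # hooks and then calls forward(x); prefer model(x)
                                     # over calling model.forward(x) directly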
9. PyTorch: Control Flow and Weight Sharing
As an example of dynamic graphs and weight sharing, we implement a very strange model: a fully-connected ReLU network that on each forward pass chooses a random number between 1 and 4 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers.
For this model we can use normal Python flow control to implement the loop, and we can implement weight sharing among the innermost layers by simply reusing the same Module multiple times when defining the forward pass.
We can easily implement this model as a Module subclass:
import random
import torch

"""
To showcase the power of PyTorch dynamic graphs, we will implement a very strange
model: a fully-connected ReLU network that on each forward pass randomly chooses
a number between 1 and 4 and has that many hidden layers, reusing the same
weights multiple times to compute the innermost hidden layers.
"""

# Define a custom neural network
class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):  # constructor
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()  # standard boilerplate
        self.input_linear = torch.nn.Linear(D_in, H)     # input layer
        self.middle_linear = torch.nn.Linear(H, H)       # middle (shared) layer
        self.output_linear = torch.nn.Linear(H, D_out)   # output layer

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)  # input layer
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)  # reuse the middle layer a random number of times
        y_pred = self.output_linear(h_relu)  # output layer
        return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs.
x = torch.randn(N, D_in)   # input  (64, 1000)
y = torch.randn(N, D_out)  # output (64, 10)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)  # instantiate the model

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')  # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)  # optimizer

for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)  # call the model

    # Compute and print loss
    loss = criterion(y_pred, y)  # compute the loss
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()  # zero the gradients
    loss.backward()        # backward pass
    optimizer.step()       # update the weights
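One way to convince yourself that the middle layer really is shared (my addition, not in the original): no matter how many times middle_linear is applied in a forward pass, the model holds only one H x H weight matrix, which you can verify by counting parameters:

n_params = sum(p.numel() for p in model.parameters())
# input_linear: 1000*100 + 100, middle_linear: 100*100 + 100, output_linear: 100*10 + 10
print(n_params)  # 111210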