This section builds a model by hand with PyTorch.
The code targets Python 3.7, PyTorch 1.0, and CUDA 10.0.
import torch
batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10
x = torch.randn(batch_n, input_data) #100*1000
y = torch.randn(batch_n,output_data)
w1 = torch.randn(input_data, hidden_layer) # 1000*100
w2 = torch.randn(hidden_layer,output_data) # 100*10
epoch_n = 20
learning_rate = 1e-6
for epoch in range(epoch_n):
    h1 = x.mm(w1)  # (100*1000)*(1000*100) = (100*100)
    h1 = h1.clamp(min = 0)  # set negative entries to 0, equivalent to ReLU
    y_pred = h1.mm(w2)  # (100*100)*(100*10) = (100*10)
    loss = (y_pred - y).pow(2).sum()
    print('Epoch:{},Loss:{:.4f}'.format(epoch, loss.item()))
    grad_y_pred = 2 * (y_pred - y)
    grad_w2 = h1.t().mm(grad_y_pred)  # 100*10
    grad_h = grad_y_pred.clone()
    grad_h = grad_h.mm(w2.t())  # (100*10)*(10*100) = (100*100)
    grad_h[h1 <= 0] = 0  # ReLU backward: zero the gradient where the activation was clipped
    grad_w1 = x.t().mm(grad_h)  # (1000*100)*(100*100) = (1000*100)
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
Epoch:0,Loss:38099608.0000
Epoch:1,Loss:46942804.0000
Epoch:2,Loss:137625872.0000
Epoch:3,Loss:436085024.0000
Epoch:4,Loss:518145728.0000
Epoch:5,Loss:19701044.0000
Epoch:6,Loss:7346181.0000
Epoch:7,Loss:3750945.5000
Epoch:8,Loss:2430628.7500
Epoch:9,Loss:1870779.2500
Epoch:10,Loss:1590965.2500
Epoch:11,Loss:1421984.7500
Epoch:12,Loss:1301277.8750
Epoch:13,Loss:1204701.1250
Epoch:14,Loss:1122543.2500
Epoch:15,Loss:1050199.0000
Epoch:16,Loss:985411.5625
Epoch:17,Loss:926907.2500
Epoch:18,Loss:873855.3750
Epoch:19,Loss:825397.0625
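As a sanity check (my addition, not part of the original text), the manual gradient formulas above can be compared against autograd on a small random example; the shapes here are made up for speed:

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 3)
y = torch.randn(4, 2)
w1 = torch.randn(3, 5, requires_grad=True)
w2 = torch.randn(5, 2, requires_grad=True)

# Forward pass, then let autograd compute the reference gradients
h1 = x.mm(w1).clamp(min=0)
y_pred = h1.mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()

# Manual gradients, mirroring the training loop above
with torch.no_grad():
    grad_y_pred = 2 * (y_pred - y)
    grad_w2 = h1.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h1 <= 0] = 0  # ReLU backward: zero where the activation was clipped
    grad_w1 = x.t().mm(grad_h)

print(torch.allclose(grad_w2, w2.grad))  # True
print(torch.allclose(grad_w1, w1.grad))  # True
```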
The automatic-gradient mechanism works roughly as follows:
during the forward pass, the input Tensor variables are used to build a computation graph; from this graph and the output,
the gradient each parameter needs is computed exactly, and the backward pass then applies these gradients to update the parameters.
In practice, enabling automatic gradients requires wrapping our Tensor variables in the Variable class from the torch.autograd package; after wrapping,
every node in the computation graph is a Variable object, and only then can automatic differentiation be applied.
Once this is done, any node selected from the computation graph is guaranteed to be a Variable object. If X denotes
the selected node, then X.data is the underlying Tensor, and X.grad is another Variable holding the gradient of X;
to access the gradient values, use X.grad.data.
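A tiny illustration of these node attributes (my example, not from the text; note that since PyTorch 0.4, Variable has been merged into Tensor and is kept only for backward compatibility):

```python
import torch
from torch.autograd import Variable

# Wrap a Tensor so it becomes a node in the computation graph
x = Variable(torch.ones(2, 2), requires_grad=True)
y = (x * 3).sum()
y.backward()

print(x.data)       # the underlying Tensor: a 2x2 of ones
print(x.grad.data)  # gradient of y w.r.t. x: a 2x2 of threes
```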
import torch
from torch.autograd import Variable
batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10
x = Variable(torch.randn(batch_n, input_data), requires_grad = False)
y = Variable(torch.randn(batch_n, output_data), requires_grad = False)
w1 = Variable(torch.randn(input_data, hidden_layer), requires_grad = True)
w2 = Variable(torch.randn(hidden_layer, output_data), requires_grad = True)
epoch_n = 20
learning_rate = 1e-6
for epoch in range(epoch_n):
    y_pred = x.mm(w1).clamp(min = 0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    print('Epoch:{},Loss:{:.4f}'.format(epoch, loss.item()))
    loss.backward()
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data
    w1.grad.data.zero_()  # zero each parameter's gradient via grad.data.zero_(); otherwise gradients accumulate across epochs
    w2.grad.data.zero_()
Epoch:0,Loss:48882072.0000
Epoch:1,Loss:105768064.0000
Epoch:2,Loss:457181568.0000
Epoch:3,Loss:932364288.0000
Epoch:4,Loss:33815264.0000
Epoch:5,Loss:16858170.0000
Epoch:6,Loss:10913149.0000
Epoch:7,Loss:7842556.5000
Epoch:8,Loss:5988599.5000
Epoch:9,Loss:4755398.5000
Epoch:10,Loss:3882628.5000
Epoch:11,Loss:3236339.0000
Epoch:12,Loss:2742155.2500
Epoch:13,Loss:2354966.5000
Epoch:14,Loss:2044745.1250
Epoch:15,Loss:1791823.8750
Epoch:16,Loss:1582686.3750
Epoch:17,Loss:1407751.0000
Epoch:18,Loss:1259956.5000
Epoch:19,Loss:1133967.8750
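The hand-written parameter updates and gradient zeroing above can also be delegated to an optimizer from torch.optim. A minimal sketch of that rewrite (my addition, not part of the original text):

```python
import torch

torch.manual_seed(0)
batch_n, input_data, hidden_layer, output_data = 100, 1000, 100, 10
x = torch.randn(batch_n, input_data)
y = torch.randn(batch_n, output_data)
w1 = torch.randn(input_data, hidden_layer, requires_grad=True)
w2 = torch.randn(hidden_layer, output_data, requires_grad=True)

optimizer = torch.optim.SGD([w1, w2], lr=1e-6)
for epoch in range(20):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    optimizer.zero_grad()  # replaces w1.grad.data.zero_() / w2.grad.data.zero_()
    loss.backward()
    optimizer.step()       # replaces the manual w -= learning_rate * grad updates
```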
Next, we build a new class that inherits from torch.nn.Module
and use it to override the forward- and backward-propagation functions. In this new class, forward is the keyword
for the forward-propagation function and backward for the backward-propagation function.
import torch
from torch.autograd import Variable
batch_n = 64
hidden_layer = 100
input_data = 1000
output_data = 10
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()

    def forward(self, input_d, w1, w2):
        x = torch.mm(input_d, w1)
        x = torch.clamp(x, min = 0)
        x = torch.mm(x, w2)
        return x

    def backward(self):
        pass
model = Model()
x = Variable(torch.randn(batch_n, input_data), requires_grad = False)
y = Variable(torch.randn(batch_n, output_data), requires_grad = False)
w1 = Variable(torch.randn(input_data, hidden_layer), requires_grad = True)
w2 = Variable(torch.randn(hidden_layer, output_data), requires_grad = True)
epoch_n = 30
learning_rate = 1e-6
for epoch in range(epoch_n):
    y_pred = model(x, w1, w2)
    loss = (y_pred - y).pow(2).sum()
    print('Epoch:{},Loss:{:.4f}'.format(epoch, loss.item()))
    loss.backward()
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data
    w1.grad.data.zero_()
    w2.grad.data.zero_()
Epoch:0,Loss:35576216.0000
Epoch:1,Loss:29630258.0000
Epoch:2,Loss:27081266.0000
Epoch:3,Loss:23531582.0000
Epoch:4,Loss:18141770.0000
Epoch:5,Loss:12285347.0000
Epoch:6,Loss:7588976.0000
Epoch:7,Loss:4544390.0000
Epoch:8,Loss:2790854.5000
Epoch:9,Loss:1825117.0000
Epoch:10,Loss:1284005.7500
Epoch:11,Loss:963436.5000
Epoch:12,Loss:759191.1250
Epoch:13,Loss:618769.6250
Epoch:14,Loss:515776.3125
Epoch:15,Loss:436604.2188
Epoch:16,Loss:373610.4375
Epoch:17,Loss:322276.8750
Epoch:18,Loss:279672.4375
Epoch:19,Loss:243915.5000
Epoch:20,Loss:213714.8438
Epoch:21,Loss:187962.9375
Epoch:22,Loss:165859.5938
Epoch:23,Loss:146788.5469
Epoch:24,Loss:130259.7188
Epoch:25,Loss:115903.3594
Epoch:26,Loss:103373.0859
Epoch:27,Loss:92399.1719
Epoch:28,Loss:82779.6094
Epoch:29,Loss:74314.0547
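For comparison (my sketch, not from the text), the same two-layer network can be written with torch.nn building blocks, letting Linear own the weights instead of passing w1 and w2 by hand:

```python
import torch

torch.manual_seed(0)
batch_n, input_data, hidden_layer, output_data = 64, 1000, 100, 10
x = torch.randn(batch_n, input_data)
y = torch.randn(batch_n, output_data)

model = torch.nn.Sequential(
    torch.nn.Linear(input_data, hidden_layer),
    torch.nn.ReLU(),                          # replaces torch.clamp(x, min=0)
    torch.nn.Linear(hidden_layer, output_data),
)
loss_fn = torch.nn.MSELoss(reduction='sum')   # matches (y_pred - y).pow(2).sum()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

for epoch in range(30):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```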