Author: chen_h
WeChat & QQ: 862251340
WeChat public account: coderpai
(C) PyTorch study notes
Building a network quickly
Torch provides many convenient shortcuts, so the same neural network can be built much faster. Let's look at how to use a simpler approach to build the same network.
Quickly build
Let's first look at how we wrote a neural network before. We will use net1 to denote the network built this way.
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)
        self.predict = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x):
        x = F.relu(self.hidden(x))
        x = self.predict(x)
        return x

net1 = Net(1, 10, 1)  # this is the net1 we built the usual way
Here we built the network by inheriting from a torch class and customizing it. But there is a faster move: one statement can sum up everything above!
net2 = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1)
)
Then let's compare the structures of the two:
print(net1)
"""
Net (
(hidden): Linear (1 -> 10)
(predict): Linear (10 -> 1)
)
"""
print(net2)
"""
Sequential (
(0): Linear (1 -> 10)
(1): ReLU ()
(2): Linear (10 -> 1)
)
"""
You will find that net2 displays a bit more content. Why is that? Because net2 includes the activation function in its structure, whereas in net1 the activation function is only called inside the forward() function. This shows that, compared with net2, the advantage of net1 is that you can personalize your own forward pass as needed (for example, for an RNN). But if you don't need any of that fiddling, net2 is probably the form that suits you better.
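As a concrete illustration of that flexibility, here is a minimal sketch (the class name, layer sizes and the extra skip connection are made up for this example, not part of the tutorial) of a forward pass that a plain torch.nn.Sequential cannot express directly:

import torch
import torch.nn.functional as F

class SkipNet(torch.nn.Module):   # hypothetical example for illustration
    def __init__(self, n_feature, n_hidden, n_output):
        super(SkipNet, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)
        self.hidden2 = torch.nn.Linear(n_hidden, n_hidden)
        self.predict = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x):
        h = F.relu(self.hidden(x))
        h = h + F.relu(self.hidden2(h))   # skip connection: a plain Sequential cannot express this
        return self.predict(h)

net = SkipNet(1, 10, 1)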
Saving and restoring
Well, once a model is trained we certainly want to save it, so that next time we can load it and use it directly. That is the topic of this section. We'll use the regression neural network as the example for saving and restoring.
Saving
We quickly create some fake data and build the network:
import torch

torch.manual_seed(1)    # reproducible

# fake data
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)  # x data (tensor), shape=(100, 1)
y = x.pow(2) + 0.2*torch.rand(x.size())                 # noisy y data (tensor), shape=(100, 1)

def save():
    # build the network
    net1 = torch.nn.Sequential(
        torch.nn.Linear(1, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 1)
    )
    optimizer = torch.optim.SGD(net1.parameters(), lr=0.5)
    loss_func = torch.nn.MSELoss()

    # train
    for t in range(100):
        prediction = net1(x)
        loss = loss_func(prediction, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
Next, there are two ways to save the trained network:
torch.save(net1, 'net.pkl')                      # save the entire network
torch.save(net1.state_dict(), 'net_params.pkl')  # save only the parameters (faster, uses less memory)
Restoring the entire network
This approach restores the entire neural network; it may be a bit slow when the network is large.
def restore_net():
    # restore entire net1 to net2
    net2 = torch.load('net.pkl')
    prediction = net2(x)
Restoring only the network parameters
This approach restores only the saved parameters and then copies them into your new network, so the new network must be built with the same architecture as the saved one.
def restore_params():
    # build a new net3 with the same architecture
    net3 = torch.nn.Sequential(
        torch.nn.Linear(1, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 1)
    )
    # copy the saved parameters into net3
    net3.load_state_dict(torch.load('net_params.pkl'))
    prediction = net3(x)
Showing the results
Call the functions created above one by one, then plot the results to compare them.
# save net1 (1. the entire network, 2. only the parameters)
save()

# restore the entire network
restore_net()

# restore the network parameters and copy them into a new network
restore_params()
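To convince yourself that the two restored networks behave exactly like the saved one, a stand-alone sketch like the following (assuming the x, y data and the two .pkl files created above, plus matplotlib) plots their predictions on the same data:

import matplotlib.pyplot as plt

# reload both saved versions and compare their predictions on the training data
net2 = torch.load('net.pkl')                          # the whole network
net3 = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1)
)
net3.load_state_dict(torch.load('net_params.pkl'))    # parameters only

plt.scatter(x.numpy(), y.numpy(), s=10)
plt.plot(x.numpy(), net2(x).data.numpy(), 'r-', lw=3, label='restored net')
plt.plot(x.numpy(), net3(x).data.numpy(), 'b--', lw=2, label='restored params')
plt.legend(loc='best')
plt.show()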
Batch training
Torch provides a handy tool to help you organize your data, called DataLoader. We can use it to wrap our own data and train in batches, and there are several ways to train in batches.
DataLoader
DataLoader is a tool Torch gives you to wrap your data. You first convert your own data (NumPy arrays or any other format) into a Tensor, then put it into this wrapper. What is DataLoader good for? It helps you iterate over your data efficiently. For example:
import torch
import torch.utils.data as Data

torch.manual_seed(1)    # reproducible

BATCH_SIZE = 5          # number of samples per batch

x = torch.linspace(1, 10, 10)   # x data (torch tensor)
y = torch.linspace(10, 1, 10)   # y data (torch tensor)

# first convert into a Dataset that torch recognizes
torch_dataset = Data.TensorDataset(x, y)

# put the dataset into a DataLoader
loader = Data.DataLoader(
    dataset=torch_dataset,  # torch TensorDataset format
    batch_size=BATCH_SIZE,  # mini batch size
    shuffle=True,           # whether to shuffle the data (shuffling is usually better)
    num_workers=2,          # load the data with multiple subprocesses
)

for epoch in range(3):  # train over the whole dataset 3 times
    for step, (batch_x, batch_y) in enumerate(loader):  # each step the loader yields a small batch to learn from
        # suppose this is where your training happens...

        # print some data
        print('Epoch: ', epoch, '| Step: ', step, '| batch x: ',
              batch_x.numpy(), '| batch y: ', batch_y.numpy())
"""
Epoch: 0 | Step: 0 | batch x: [ 6. 7. 2. 3. 1.] | batch y: [ 5. 4. 9. 8. 10.]
Epoch: 0 | Step: 1 | batch x: [ 9. 10. 4. 8. 5.] | batch y: [ 2. 1. 7. 3. 6.]
Epoch: 1 | Step: 0 | batch x: [ 3. 4. 2. 9. 10.] | batch y: [ 8. 7. 9. 2. 1.]
Epoch: 1 | Step: 1 | batch x: [ 1. 7. 8. 5. 6.] | batch y: [ 10. 4. 3. 6. 5.]
Epoch: 2 | Step: 0 | batch x: [ 3. 9. 2. 6. 7.] | batch y: [ 8. 2. 9. 5. 4.]
Epoch: 2 | Step: 1 | batch x: [ 10. 4. 8. 1. 5.] | batch y: [ 1. 7. 3. 10. 6.]
"""
As you can see, each step yields 5 data points to learn from, and within each epoch the data are shuffled before being yielded.
There is another convenient point. If we change to BATCH_SIZE = 8, then we know step=0 will yield 8 data points, but there are not 8 left in the dataset for step=1. What happens then?
BATCH_SIZE = 8      # number of samples per batch
...
for ...:
    for ...:
        ...
        print('Epoch: ', epoch, '| Step: ', step, '| batch x: ',
              batch_x.numpy(), '| batch y: ', batch_y.numpy())
"""
Epoch: 0 | Step: 0 | batch x: [ 6. 7. 2. 3. 1. 9. 10. 4.] | batch y: [ 5. 4. 9. 8. 10. 2. 1. 7.]
Epoch: 0 | Step: 1 | batch x: [ 8. 5.] | batch y: [ 3. 6.]
Epoch: 1 | Step: 0 | batch x: [ 3. 4. 2. 9. 10. 1. 7. 8.] | batch y: [ 8. 7. 9. 2. 1. 10. 4. 3.]
Epoch: 1 | Step: 1 | batch x: [ 5. 6.] | batch y: [ 6. 5.]
Epoch: 2 | Step: 0 | batch x: [ 3. 9. 2. 6. 7. 10. 4. 8.] | batch y: [ 8. 2. 9. 5. 4. 1. 7. 3.]
Epoch: 2 | Step: 1 | batch x: [ 1. 5.] | batch y: [ 10. 6.]
"""
In this case, step=1 simply returns whatever data is left over in that epoch.
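If you would rather not receive that smaller leftover batch at all, DataLoader also has a drop_last flag (this variant is not in the original tutorial); a minimal sketch using the same torch_dataset as above:

loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=8,
    shuffle=True,
    num_workers=2,
    drop_last=True,     # throw away the final incomplete batch (here: the 2 leftover samples)
)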
Optimizer
Figure: a comparison of the various optimizers covered in this section.
Dummy data
To compare the effects of the various optimizers, we need some data. Today we again make up some fake data, which looks like this:
import torch
import torch.utils.data as Data
import torch.nn.functional as F
import matplotlib.pyplot as plt

torch.manual_seed(1)    # reproducible

LR = 0.01
BATCH_SIZE = 32
EPOCH = 12

# fake dataset
x = torch.unsqueeze(torch.linspace(-1, 1, 1000), dim=1)
y = x.pow(2) + 0.1*torch.normal(torch.zeros(*x.size()))

# plot dataset
plt.scatter(x.numpy(), y.numpy())
plt.show()

# use the DataLoader described in the previous part
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(dataset=torch_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2,)
One neural network for each optimizer
To compare the optimizers, we create a separate neural network for each of them, but all of the networks come from the same Net class.
# the default network form
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(1, 20)   # hidden layer
        self.predict = torch.nn.Linear(20, 1)  # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))  # activation function for hidden layer
        x = self.predict(x)         # linear output
        return x

# create a net for each optimizer
net_SGD = Net()
net_Momentum = Net()
net_RMSprop = Net()
net_Adam = Net()
nets = [net_SGD, net_Momentum, net_RMSprop, net_Adam]
Optimizers
Next we create a different optimizer to train each network, and a loss_func to compute the error. We use several common optimizers: SGD, Momentum, RMSprop and Adam.
# different optimizers
opt_SGD = torch.optim.SGD(net_SGD.parameters(), lr=LR)
opt_Momentum = torch.optim.SGD(net_Momentum.parameters(), lr=LR, momentum=0.8)
opt_RMSprop = torch.optim.RMSprop(net_RMSprop.parameters(), lr=LR, alpha=0.9)
opt_Adam = torch.optim.Adam(net_Adam.parameters(), lr=LR, betas=(0.9, 0.99))
optimizers = [opt_SGD, opt_Momentum, opt_RMSprop, opt_Adam]
loss_func = torch.nn.MSELoss()
losses_his = [[], [], [], []]   # record the loss of each network during training
Training and plotting
Next we train and record the losses for plotting.
for epoch in range(EPOCH):
    print('Epoch: ', epoch)
    for step, (b_x, b_y) in enumerate(loader):
        # for each optimizer, optimize its own network
        for net, opt, l_his in zip(nets, optimizers, losses_his):
            output = net(b_x)               # get output for every net
            loss = loss_func(output, b_y)   # compute loss for every net
            opt.zero_grad()                 # clear gradients for the next step
            loss.backward()                 # backpropagation, compute gradients
            opt.step()                      # apply gradients
            l_his.append(loss.data.numpy()) # record loss
SGD is the most common optimizer; you could say it has no acceleration tricks at all. Momentum is a modified version of SGD that adds the momentum principle. RMSprop is an upgraded version of Momentum, and Adam is an upgraded version of RMSprop. But from this result we can see that Adam actually seems to do a bit worse than RMSprop, so a more advanced optimizer is not always better. Try different optimizers in your own experiments to find the one that best suits your data and network.
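To draw the kind of comparison figure mentioned at the start of this part, a minimal plotting sketch over the losses_his lists recorded above might look like this (the y-axis limit is an arbitrary choice for readability):

import matplotlib.pyplot as plt

labels = ['SGD', 'Momentum', 'RMSprop', 'Adam']
for i, l_his in enumerate(losses_his):
    plt.plot(l_his, label=labels[i])    # one loss curve per optimizer
plt.legend(loc='best')
plt.xlabel('Steps')
plt.ylabel('Loss')
plt.ylim((0, 0.2))
plt.show()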
Links:
https://morvanzhou.github.io/tutorials/machine-learning/torch/
https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/303_build_nn_quickly.py
https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/304_save_reload.py
https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/305_batch_train.py
https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/306_optimizer.py