Task 6: Understanding More Neural Network Optimization Methods in PyTorch

1. Learn about the different optimizers
2. Implement the optimizers in code
3. Momentum
4. Two-dimensional optimization: implementing stochastic gradient descent
5. Adagrad: adaptive gradient adjustment
6. RMSProp
7. Adam
8. Choosing an optimizer in PyTorch

Gradient descent:

1. Standard gradient descent (GD)
The weights are updated once for every sample, following the steepest direction at the current position. It easily falls into a local optimum, and training is slow.

2. Batch gradient descent (BGD)
Instead of adjusting the weights after every single sample input, the adjustment is made only after a whole batch of data: the parameter update depends on the cost function summed over all input samples in the batch. Intuitively, it first surveys the surrounding terrain and then picks the best direction to descend.

3. Stochastic gradient descent (SGD)
One sample is picked at random from the batch. Like walking downhill blindly one step at a time, recomputing the gradient at every step, it still eventually reaches the foot of the mountain. However, the noise it introduces can push individual weight updates in the wrong direction, and SGD on its own cannot escape local optima. A sketch comparing the three update schemes follows this list.
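A minimal sketch (not from the original post) of the only real difference between the three variants: how many samples feed each gradient step. The toy one-parameter least-squares problem, the names x, y, grad, w_gd, w_bgd, w_sgd, and the batch size of 32 are all illustrative choices.

import torch

x = torch.linspace(-1, 1, 100)
y = 3 * x + 0.1 * torch.randn(100)           # toy data: y is roughly 3x plus noise

def grad(w, xb, yb):
    # derivative of mean((w*x - y)^2) with respect to w
    return 2 * ((w * xb - yb) * xb).mean()

lr = 0.1
w_gd, w_bgd, w_sgd = torch.tensor(0.0), torch.tensor(0.0), torch.tensor(0.0)

for step in range(100):
    for i in range(len(x)):                   # GD as described above: one update per sample
        w_gd = w_gd - lr * grad(w_gd, x[i], y[i])
    idx = torch.randint(0, len(x), (32,))     # BGD: one update per batch of 32 samples
    w_bgd = w_bgd - lr * grad(w_bgd, x[idx], y[idx])
    j = torch.randint(0, len(x), (1,))        # SGD: one update from a single random sample
    w_sgd = w_sgd - lr * grad(w_sgd, x[j], y[j])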

Momentum optimization

Standard momentum optimization (momentum): the current weight change is also influenced by the previous weight change, like a ball rolling downhill that carries inertia and keeps accelerating.
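A minimal sketch of the momentum update rule on a toy quadratic loss; the velocity v carries the previous update into the current one. The decay factor mu, the learning rate lr, and the toy loss are illustrative (mu is set to the same 0.8 passed to opt_Momentum in the script further below).

import torch

w = torch.tensor(0.0)
v = torch.tensor(0.0)
lr, mu = 0.1, 0.8

for step in range(50):
    g = 2 * (w - 3)                   # gradient of the toy loss (w - 3)^2
    v = mu * v - lr * g               # the previous change keeps influencing the current one
    w = w + v                         # the "ball" keeps part of its old velocity

print(w)                              # converges toward the minimum at w = 3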

NAG: Nesterov accelerated gradient

NAG adds a correction factor to standard momentum by evaluating the gradient after the current velocity has been applied. A momentum ball rolls blindly along the gradient, whereas an NAG ball slows down as it nears the bottom of the slope: it roughly knows where its next position will be, and it uses that look-ahead position to update the parameters at the current position.
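A minimal sketch of the Nesterov look-ahead update on the same toy quadratic (names are illustrative); in PyTorch the equivalent behaviour can be requested with torch.optim.SGD(..., momentum=..., nesterov=True).

import torch

w = torch.tensor(0.0)
v = torch.tensor(0.0)
lr, mu = 0.1, 0.8

for step in range(50):
    w_ahead = w + mu * v              # look ahead to where the current velocity is taking us
    g = 2 * (w_ahead - 3)             # gradient of (w - 3)^2 evaluated at the look-ahead point
    v = mu * v - lr * g
    w = w + v

print(w)                              # approaches the minimum at w = 3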

Ada adaptive gradient adjustment methods: Adagrad: the algorithm's defining feature is that it adjusts the learning rate automatically, which makes it well suited to sparse data. Plain gradient descent uses the same learning rate for every parameter at every step; such a one-size-fits-all approach cannot effectively exploit the characteristics of each dataset. Adadelta (an improved version of Adagrad): one problem with Adagrad is that, as training progresses, the learning rate decays monotonically and quickly. Adadelta replaces the full history with a moving average of the squared gradients.
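A minimal sketch of the Adagrad idea on the same toy quadratic (a sketch of the update rule, not PyTorch's implementation; lr, eps, and the loss are illustrative): the running sum of squared gradients only grows, so the effective step size only shrinks. PyTorch provides torch.optim.Adagrad and torch.optim.Adadelta for the real optimizers.

import torch

w = torch.tensor(0.0)
cache = torch.tensor(0.0)             # running sum of squared gradients
lr, eps = 0.5, 1e-8

for step in range(200):
    g = 2 * (w - 3)                   # gradient of (w - 3)^2
    cache = cache + g * g             # grows monotonically, so the step size keeps shrinking
    w = w - lr * g / (cache.sqrt() + eps)

print(w)                              # Adadelta would replace `cache` with a moving average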

RMSProp: RMSprop is also a learning-rate adjustment algorithm. Adagrad accumulates all previous squared gradients, whereas RMSprop only keeps a running mean of them, which alleviates Adagrad's problem of the learning rate dropping too quickly.
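The same toy update with the accumulator replaced by an exponential moving average, matching the alpha=0.9 used for opt_RMSprop in the script below (a sketch of the update idea, not PyTorch's exact implementation).

import torch

w = torch.tensor(0.0)
avg = torch.tensor(0.0)               # exponential moving average of squared gradients
lr, alpha, eps = 0.01, 0.9, 1e-8

for step in range(500):
    g = 2 * (w - 3)                   # gradient of the toy loss (w - 3)^2
    avg = alpha * avg + (1 - alpha) * g * g   # recent gradients dominate, old ones decay
    w = w - lr * g / (avg.sqrt() + eps)

print(w)                              # ends up close to the minimum at w = 3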

Adam: if the accumulated squared gradients inside Adadelta are viewed as the second moment of the gradient, then the accumulated gradient itself is its first moment. Adam builds on Adadelta's second moment and additionally introduces the first moment. That first moment is essentially the same idea as the momentum term in the momentum method.
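A minimal sketch of the Adam update combining both moments with the usual bias correction, using the same betas=(0.9, 0.99) passed to opt_Adam in the script below; the toy loss, lr, and variable names are illustrative.

import torch

w = torch.tensor(0.0)
m = torch.tensor(0.0)                 # first moment: momentum-like running mean of gradients
v = torch.tensor(0.0)                 # second moment: running mean of squared gradients
lr, b1, b2, eps = 0.1, 0.9, 0.99, 1e-8

for step in range(1, 201):
    g = 2 * (w - 3)                   # gradient of the toy loss (w - 3)^2
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** step)      # bias correction for the zero initialisation
    v_hat = v / (1 - b2 ** step)
    w = w - lr * m_hat / (v_hat.sqrt() + eps)

print(w)                              # approaches the minimum at w = 3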

import torch
import torch.utils.data as Data
import torch.nn.functional as F
import matplotlib.pyplot as plt

LR = 0.01
BATCH_SIZE = 32
EPOCH = 12

# toy regression data: y = x^2 plus Gaussian noise
x = torch.unsqueeze(torch.linspace(-1, 1, 1000), dim=1)
y = x.pow(2) + 0.1 * torch.normal(torch.zeros(*x.size()))

plt.scatter(x.numpy(), y.numpy())
plt.show()

# wrap the data in a DataLoader to serve shuffled mini-batches
# (num_workers=2 may need an `if __name__ == '__main__':` guard on Windows/macOS)
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2,
)

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(1, 20)
        self.predict = torch.nn.Linear(20, 1)

    def forward(self, x):
        x = F.relu(self.hidden(x))
        x = self.predict(x)
        return x

# one identical network per optimizer so the comparison is fair
net_SGD         = Net()
net_Momentum    = Net()
net_RMSprop     = Net()
net_Adam        = Net()
nets = [net_SGD, net_Momentum, net_RMSprop, net_Adam]

# different optimizers
opt_SGD         = torch.optim.SGD(net_SGD.parameters(), lr=LR)
opt_Momentum    = torch.optim.SGD(net_Momentum.parameters(), lr=LR, momentum=0.8)
opt_RMSprop     = torch.optim.RMSprop(net_RMSprop.parameters(), lr=LR, alpha=0.9)
opt_Adam        = torch.optim.Adam(net_Adam.parameters(), lr=LR, betas=(0.9, 0.99))
optimizers = [opt_SGD, opt_Momentum, opt_RMSprop, opt_Adam]

loss_func = torch.nn.MSELoss()
losses_his = [[], [], [], []]   # record the loss history of each net

# training
for epoch in range(EPOCH):
    print('Epoch: ', epoch)
    for step, (b_x, b_y) in enumerate(loader):          # for each training step
        for net, opt, l_his in zip(nets, optimizers, losses_his):
            output = net(b_x)              # get output for every net
            loss = loss_func(output, b_y)  # compute loss for every net
            opt.zero_grad()                # clear gradients for next train
            loss.backward()                # backpropagation, compute gradients
            opt.step()                     # apply gradients
            l_his.append(loss.item())      # record the loss value

labels = ['SGD', 'Momentum', 'RMSprop', 'Adam']
for i, l_his in enumerate(losses_his):
    plt.plot(l_his, label=labels[i])
plt.legend(loc='best')
plt.xlabel('Steps')
plt.ylabel('Loss')
plt.ylim((0, 0.2))
plt.show()


Reference: https://blog.csdn.net/qingxuanmingye/article/details/90514018
