Pytorch Tutorial【Chapter 3. Simple Neural Network】

Chapter 3. Simple Neural Network

3.1 Train Neural Network Procedure

A typical neural network training process includes the following steps (a minimal code sketch follows this list):

  • Define a neural network with trainable parameters

  • Iterate over the entire input

  • Process input through neural network

  • Calculate the loss

  • Backpropagate gradients to the parameters of the network

  • Update the parameters of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
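
A minimal sketch of how these steps look in PyTorch is shown below. It uses a toy linear model and random data as stand-ins; the real network, input, and loss function for this chapter are built in the sections that follow.

import torch
import torch.nn as nn

# Toy stand-ins so that the sketch runs on its own.
net = nn.Linear(4, 2)                          # a trainable "network"
criterion = nn.MSELoss()                       # a loss function
loader = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(5)]

learning_rate = 0.01
for inputs, targets in loader:                 # iterate over the entire input
    outputs = net(inputs)                      # process input through the network
    loss = criterion(outputs, targets)         # calculate the loss
    net.zero_grad()                            # clear old gradients
    loss.backward()                            # backpropagate gradients to the parameters
    with torch.no_grad():                      # simple update: weight -= learning_rate * gradient
        for p in net.parameters():
            p -= learning_rate * p.grad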

3.2 Build Neural Network Procedure

  • Define a class that inherits from torch.nn.Module
  • Use torch.nn components and torch.nn.functional components in the class to build the network structure
  • Override the forward method of this class; in this method, complete the network structure (activation functions, pooling layers, etc.) and return the output

First, let's briefly introduce the torch.nn components that will be used (a short shape demo follows this list).

  • torch.nn.Conv2d(in_channels, out_channels, kernel_size) performs a convolution; the input has shape $(N, C_{\text{in}}, H, W)$ and the output has shape $(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})$ (see Conv2d). With the default stride 1 and padding 0, the output size is $H_{\text{out}} = H - \text{kernel\_size} + 1$ (and likewise for the width), so a $5 \times 5$ kernel turns a $32 \times 32$ image into $28 \times 28$
  • torch.nn.Linear(in_features, out_features) performs an affine map, that is, $y = xA^T + b$ (see Linear)
  • torch.nn.Flatten(start_dim=1, end_dim=-1) flattens a contiguous range of dimensions into a single dimension (see Flatten for details)
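
To make the shapes concrete, here is a small demo (the layer sizes are chosen for illustration and are not from the original text) that pushes a random image tensor through each of these three modules and prints the resulting shapes.

import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)               # (N, C_in, H, W)

conv = nn.Conv2d(1, 6, 5)                   # 5x5 kernel: 32x32 -> 28x28
print(conv(x).shape)                        # torch.Size([1, 6, 28, 28])

flat = nn.Flatten(start_dim=1, end_dim=-1)  # keep the batch dimension
print(flat(conv(x)).shape)                  # torch.Size([1, 4704]), 4704 = 6*28*28

fc = nn.Linear(6 * 28 * 28, 10)             # affine map y = xA^T + b
print(fc(flat(conv(x))).shape)              # torch.Size([1, 10])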

Next, let's introduce the torch.nn.functional components that will be used (a short demo follows this list).

  • torch.nn.functional.relu() applies the ReLU activation to the input, computing $\text{ReLU}(x) = \max(0, x)$ element-wise (see ReLU)

  • torch.nn.functional.softmax() applies the Softmax activation to the input, computing $\text{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$ (see Softmax for details)

  • torch.nn.functional.max_pool2d(input, kernel_size, stride) performs max pooling; the input has shape $(\text{batch}, \text{in\_channels}, iH, iW)$ (see MaxPool2d for details)
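
As a quick illustration (the tensors and arguments are made up for this demo), these functions can be applied directly to tensors without defining a module:

import torch
import torch.nn.functional as F

x = torch.randn(1, 6, 28, 28)                   # (batch, in_channels, iH, iW)

relu_out = F.relu(x)                            # element-wise max(0, x)
pooled = F.max_pool2d(relu_out, kernel_size=2)  # 2x2 max pooling: 28x28 -> 14x14
print(pooled.shape)                             # torch.Size([1, 6, 14, 14])

scores = torch.randn(1, 10)                     # raw scores for 10 classes
probs = F.softmax(scores, dim=1)                # each row sums to 1
print(probs.sum(dim=1))                         # tensor([1.0000])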

For example, the following code builds a neural network with the structure shown below.

(Figure: a convolutional network with two convolution + max-pooling stages followed by three fully connected layers, as built in the code below.)

The input is an image tensor of shape $[\text{batch}, \text{channels}, \text{height}, \text{width}]$, representing the batch size, number of channels, image height, and image width respectively. Taking the batch size to be 1, this example becomes $[1, 1, 32, 32]$, and the tensor shape changes as follows:

$[1,1,32,32] \xrightarrow{\text{conv1}} [1,6,28,28] \xrightarrow{\text{max\_pool}} [1,6,14,14] \xrightarrow{\text{conv2}} [1,16,10,10] \xrightarrow{\text{max\_pool}} [1,16,5,5] \xrightarrow{\text{flatten}} [1,400] \xrightarrow{\text{fc1}} [1,120] \xrightarrow{\text{fc2}} [1,84] \xrightarrow{\text{fc3}} [1,10]$

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net,self).__init__()
        # 1 input channel, 6 output channels, 5x5 convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # affine function y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    
    def forward(self,x):
        # Max pooling over a (2,2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), (2,2))
        # flatten the feature maps into a vector
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=1)
        return x


    def num_flat_features(self,x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_feature = 1
        for s in size:
            num_feature *= s
        return num_feature

net = Net()
print(net)

Alternatively, nn.Flatten can be used instead of view():

class Net(nn.Module):
    def __init__(self):
        super(Net,self).__init__()
        # 1 input channel, 6 output channels, 5x5 convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # affine function y = Wx + b
        self.flat = nn.Flatten(1, -1)    # flatten the feature maps
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    
    def forward(self,x):
        # Max pooling over a (2,2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), (2,2))
        # flatten the feature maps into a vector
        x = self.flat(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=1)
        return x


net = Net()
print(net)

The result is as follows; you can see the network structure:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
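
To double-check the shape chain given earlier, you can register forward hooks on each layer of the net defined above. This is a small sketch added for illustration, not part of the original code.

def print_shape(module, inputs, output):
    print(module.__class__.__name__, tuple(output.shape))

handles = [layer.register_forward_hook(print_shape)
           for layer in (net.conv1, net.conv2, net.fc1, net.fc2, net.fc3)]

_ = net(torch.randn(1, 1, 32, 32))
# Conv2d (1, 6, 28, 28)
# Conv2d (1, 16, 10, 10)
# Linear (1, 120)
# Linear (1, 84)
# Linear (1, 10)

for h in handles:   # remove the hooks so later calls are not affected
    h.remove()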

The trainable parameters of a model are returned by calling net.parameters():

params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

The results are as follows

10
torch.Size([6, 1, 5, 5])
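
To see which layer each of these 10 tensors belongs to (a weight and a bias for each of the five layers), you can also iterate over net.named_parameters(); this small addition is not part of the original code.

for name, p in net.named_parameters():
    print(name, tuple(p.size()))
# conv1.weight (6, 1, 5, 5)
# conv1.bias (6,)
# conv2.weight (16, 6, 5, 5)
# ... and so on for conv2.bias, fc1, fc2, and fc3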

Then we can feed a custom input to the network and obtain its output. Since the last layer applies a softmax, each row of the output sums to 1, which the loop below verifies.

input = torch.randn(3, 1, 32, 32)
out = net(input)
print(out.data)
sum = 0
for i in out.data[0]:
    sum += i.item()
print(sum)

The output is as follows

tensor([[0.0996, 0.1000, 0.0907, 0.0946, 0.0930, 0.1067, 0.1044, 0.1119, 0.1073,
         0.0918],
        [0.0975, 0.1005, 0.0913, 0.0935, 0.0939, 0.1070, 0.1045, 0.1121, 0.1065,
         0.0933],
        [0.0968, 0.1009, 0.0896, 0.0978, 0.0903, 0.1107, 0.1055, 0.1130, 0.1043,
         0.0913]])
0.9999999925494194

torch.nn.Module.zero_grad() sets the gradient buffers of all parameters to zero:

net.zero_grad()  # zero the gradient buffers of all parameters, then backpropagate with random gradients
out.backward(torch.randn_like(out))
net.conv1.bias.grad    # inspect the gradients of conv1's bias and weight
net.conv1.weight.grad

The results are as follows

tensor([-0.0064,  0.0136, -0.0046,  0.0008, -0.0044,  0.0021])
...

3.3 Use Loss Function for Backpropagation

So far we have done the following:

  • Define a neural network

  • Process input and invoke backpropagation

Remaining:

  • Calculate loss value

  • Update weights in the network

Loss function

A loss function takes a pair of inputs: a model output and a target, and then computes a value that evaluates how far the output is from the target.

There are several different loss functions in the nn package (see loss-functions for details). A simple one is nn.MSELoss, which computes the mean squared error between the output and the target.

input = torch.randn(3, 1, 32, 32)   # a batch of 3 random single-channel 32x32 images
target = torch.randn(3, 10)         # a dummy target with the same shape as the output
predict = net(input)
criterion = nn.MSELoss()
loss = criterion(predict, target)

print(loss)
print(loss.grad_fn.next_functions[0][0])  # the node that feeds into MSELoss

We computed the MSE loss and can trace its computational graph:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear -> softmax
      -> MSELoss
      -> loss
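
If you want to inspect this graph programmatically, you can walk backwards through the grad_fn chain. The short sketch below is added for illustration and only follows the first parent at each step.

node = loss.grad_fn
for _ in range(3):                     # walk a few steps back through the autograd graph
    print(node)                        # e.g. MseLossBackward0, then SoftmaxBackward0, ...
    node = node.next_functions[0][0]   # follow the first parent node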

When we call loss.backward(), the entire computational graph is differentiated with respect to the loss, so backpropagation takes nothing more than that single call. You do need to clear the existing gradients first, otherwise the newly computed gradients will be accumulated onto the gradients saved from previous calls.

net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

The results are as follows

conv1.bias.grad before backward
None
conv1.bias.grad after backward
tensor([-3.4418e-04, -1.0766e-04,  1.0913e-04, -5.5018e-05,  1.7342e-04,
        -5.3316e-04])

3.4 Update Parameters of NN

3.4.1 Update Manually

We can manually implement stochastic gradient descent (SGD) to update the parameters (see Chapter 6. Stochastic Approximation for details):

$w_{k+1} = w_k - a_k \nabla_{w_k} f(w_k, x_k)$

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)    # sub_() is an in-place subtraction

3.4.2 Update Automatically

However, when training neural networks you often want other update rules, such as SGD with momentum, Nesterov-SGD, Adam, RMSProp, etc. To make this possible, PyTorch provides a small package, torch.optim, that implements all of these methods. We can use optimizer.step() to replace the manual update above, and it is very simple to use.

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()     # compute the gradients
optimizer.step()    # does the update
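
The other update rules mentioned above are constructed in the same way; the hyperparameter values below are only illustrative.

# Alternative optimizers from torch.optim (illustrative hyperparameters):
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9, nesterov=True)
optimizer = optim.Adam(net.parameters(), lr=0.001)
optimizer = optim.RMSprop(net.parameters(), lr=0.01)

Note that the gradient buffers still have to be cleared in every iteration, here with optimizer.zero_grad(), for the same accumulation reason discussed in Section 3.3.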
