[Deep Learning] PyTorch - Neural Network Toolbox nn

These are study notes I compiled myself; if there are any mistakes, please point them out~

Deep learning column link:
http://t.csdnimg.cn/dscW7

Introduction

PyTorch Neural Network Toolbox nn is a core module for building deep learning models. It provides a simple and flexible set of APIs to easily define, train and evaluate various types of neural networks.

The nn module contains many predefined modules and methods, such as linear layers, convolutional layers, recurrent neural networks, loss functions, etc. These modules can be directly called to build deep learning models. The following are the main functions of the nn module:

  1. Define neural network models: Various types of neural networks can be defined using the nn module. By inheriting the nn.Module class, you can create your own neural network class and define the structure and parameters of each layer in it.

  2. Easy-to-use layers: The nn module provides various types of layers such as linear layers, convolutional layers, pooling layers, normalization layers, etc. These layers have implemented the forward propagation and back propagation processes by default and can be used directly.

  3. Nonlinear activation function: The nn module provides common activation functions, such as ReLU, Sigmoid, Tanh, etc., which can add nonlinear transformation between layers.

  4. Automatic differentiation: The nn module is built on PyTorch's automatic differentiation (autograd) mechanism, which computes gradients and performs backpropagation automatically. This lets users focus on the design and implementation of the model without calculating gradients by hand.

  5. Training and optimization: You can train the model with optimizers from torch.optim (such as SGD, Adam, etc.) and evaluate it with predefined loss functions (such as cross-entropy, mean squared error, etc.).

  6. Serialization and saving: You can serialize a model's state to a file with state_dict() and torch.save, and restore it later with load_state_dict().

PyTorch Neural Network Toolbox nn provides a set of simple and flexible APIs to easily define, train and evaluate various types of neural networks. It also supports user-defined models and layers, allowing users to create more complex deep learning models according to their own needs and application scenarios.

nn.Module

nn.Module is the base class for all neural network models in PyTorch. All user-defined neural network models should inherit from the nn.Module class and implement its methods.

nn.Module provides a number of important attributes and methods that let users easily define the structure and parameters of a neural network and perform forward and backward propagation.

The following are some important properties and methods of the nn.Module class:

  1. __init__(self): Constructor, used to initialize the structure and parameters of the model. In this method, the user can define various layers in the model and specify their parameters.

  2. forward(self, input): Forward propagation method. In this method, the user defines the forward propagation process of the model. By calling the forward propagation method of each layer and combining and transforming their outputs, the output of the model is finally generated.

  3. parameters(self): Returns all learnable parameters in the model. This method returns an iterator that can be used to iterate over all parameters in the model and optimize and update them.

  4. to(self, device): Move the model to the specified device, such as CPU or GPU. Through this method, the model can be easily loaded onto a suitable device for calculation.

  5. state_dict(self) and load_state_dict(self, state_dict): These two methods are used to serialize and load the state of the model. The state_dict() method returns a dictionary containing all states of the model, which can be saved to a file. The load_state_dict() method loads the model's state from the given dictionary.

  6. train() and eval(): These two methods switch the model between training mode and evaluation mode. In training mode, layers such as Dropout are active; in evaluation mode these operations are turned off.

By inheriting the nn.Module class and overriding its methods, users can easily define their own neural network models. nn.Module also provides practical facilities such as parameter management, device movement and state saving, making model training and deployment easier and more efficient.
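To make these methods concrete, here is a minimal sketch (using a built-in nn.Linear as a stand-in for any custom module) that exercises parameters(), to(), state_dict()/load_state_dict(), and train()/eval(); the file name net.pt is just an example:

import torch
import torch.nn as nn

net = nn.Linear(4, 3)                               # any nn.Module subclass works the same way

for name, p in net.named_parameters():              # iterate over learnable parameters
    print(name, p.shape)

device = "cuda" if torch.cuda.is_available() else "cpu"
net = net.to(device)                                # move the model to the chosen device

torch.save(net.state_dict(), "net.pt")              # serialize the model state to a file
net.load_state_dict(torch.load("net.pt"))           # restore it later

net.train()                                         # training mode (Dropout etc. active)
net.eval()                                          # evaluation mode (Dropout etc. disabled)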

nn.Module implements the fully connected layer

Use nn.Module to implement a fully connected layer. The fully connected layer, also known as the affine layer, produces an output $\mathbf{y}$ from an input $\mathbf{x}$ that satisfies $\mathbf{y} = \mathbf{W}\mathbf{x} + \mathbf{b}$, where $\mathbf{W}$ and $\mathbf{b}$ are learnable parameters.

import torch as t
from torch import nn

class Linear(nn.Module): # inherit from nn.Module
    def __init__(self, in_features, out_features):
        super(Linear, self).__init__() # equivalent to nn.Module.__init__(self)
        self.w = nn.Parameter(t.randn(in_features, out_features))
        self.b = nn.Parameter(t.randn(out_features))
    
    def forward(self, x):
        x = x.mm(self.w) # matrix multiply: x @ self.w
        return x + self.b.expand_as(x)
    
    
layer = Linear(4,3)
input = t.randn(2,4)
output = layer(input)
print(output)

for name, parameter in layer.named_parameters():
    print(name, parameter) # w and b 

Code explanation:
This code defines a custom linear layer, Linear, which inherits from the nn.Module class and overrides its __init__ and forward methods. The input dimension of this layer is in_features, and the output dimension is out_features.

class Linear(nn.Module): 
    def __init__(self, in_features, out_features):
        super(Linear, self).__init__() 
        self.w = nn.Parameter(t.randn(in_features, out_features))
        self.b = nn.Parameter(t.randn(out_features))
    
    def forward(self, x):
        x = x.mm(self.w) 
        return x + self.b.expand_as(x)

In the __init__ method, the parent class's constructor is first called to initialize the module, and then two parameters are defined: self.w and self.b, which correspond to the weight matrix and bias vector of this layer and are marked as learnable parameters.

In the forward method, the input tensor x is first multiplied by the weight matrix self.w, the bias vector self.b is then added, and the result is the output of this layer.

Next, the code creates a Linear object layer, passing in the input dimension 4 and the output dimension 3. A random input tensor input of shape (2, 4) is then defined.

layer = Linear(4,3)
input = t.randn(2,4)

Finally, the input tensor is passed to the layer object; this invokes its forward method and yields the output tensor output.

output = layer(input)

You can inspect the model's result for the given input by printing the output tensor with the print(output) statement.

print(output)

In addition, the code uses the named_parameters method to iterate over all the parameters in the layer object and prints their names and values. Two parameters, w and b, can be obtained.

for name, parameter in layer.named_parameters():
    print(name, parameter)
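For comparison, PyTorch's built-in nn.Linear implements the same affine transform; note that it stores its weight with shape (out_features, in_features) and computes x @ weight.t() + bias, so the result matches the custom layer above up to the parameter layout:

import torch
import torch.nn as nn

builtin = nn.Linear(4, 3)        # weight shape: (3, 4); bias shape: (3,)
x = torch.randn(2, 4)
y = builtin(x)                   # equivalent to x @ builtin.weight.t() + builtin.bias
print(y.shape)                   # torch.Size([2, 3])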

nn.Module implements multi-layer perceptron


import torch as t
from torch import nn

class Linear(nn.Module): # inherit from nn.Module
    def __init__(self, in_features, out_features):
        super(Linear, self).__init__() # equivalent to nn.Module.__init__(self)
        self.w = nn.Parameter(t.randn(in_features, out_features))
        self.b = nn.Parameter(t.randn(out_features))
    
    def forward(self, x):
        x = x.mm(self.w) # matrix multiply: x @ self.w
        return x + self.b.expand_as(x)

class Perceptron(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        nn.Module.__init__(self)
        self.layer1 = Linear(in_features, hidden_features) # this Linear is the custom fully connected layer defined above
        self.layer2 = Linear(hidden_features, out_features)
    def forward(self,x):
        x = self.layer1(x)
        x = t.sigmoid(x)
        return self.layer2(x)
    
perceptron = Perceptron(3,4,1)
for name, param in perceptron.named_parameters():
    print(name, param.size())

Code explanation:

First, a class named Linear is defined, which inherits from nn.Module. This class represents a linear layer containing a weight parameter w and a bias parameter b.

class Linear(nn.Module):
    def __init__(self, in_features, out_features):
        super(Linear, self).__init__()
        self.w = nn.Parameter(t.randn(in_features, out_features))
        self.b = nn.Parameter(t.randn(out_features))
    
    def forward(self, x):
        x = x.mm(self.w)
        return x + self.b.expand_as(x)

Then, a class named Perceptron is defined, which also inherits from nn.Module. This class represents a multilayer perceptron model, consisting of two linear layers.

class Perceptron(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        nn.Module.__init__(self)
        self.layer1 = Linear(in_features, hidden_features)
        self.layer2 = Linear(hidden_features, out_features)
    
    def forward(self, x):
        x = self.layer1(x)
        x = t.sigmoid(x)
        return self.layer2(x)

In the initialization method, two Linear objects, self.layer1 and self.layer2, are created as the two layers of the perceptron. In the forward propagation method, the input data first passes through self.layer1, then through the sigmoid activation function, and finally through self.layer2 to produce the output.

Finally, a Perceptron object perceptron is created, and the names and sizes of its parameters are printed.

perceptron = Perceptron(3, 4, 1)
for name, param in perceptron.named_parameters():
    print(name, param.size())

This code shows how to build a multi-layer perceptron model using custom linear layers and print out the parameter information. Through this example, we can understand how to use PyTorch to build a neural network model, debug and optimize it.

Naming convention for parameters in a Module (see the sketch after this list):

  • For a definition like self.param_name = nn.Parameter(t.randn(3, 4)), the parameter is named param_name.
  • For a parameter inside a sub-Module, the name of the attribute holding the sub-Module is prepended to its name. For example, with self.sub_module = SubModel(), a parameter named param_name inside SubModel gets the full name sub_module.param_name.
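The following sketch illustrates this naming rule with a hypothetical SubModel, mirroring the example above:

import torch as t
from torch import nn

class SubModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.param_name = nn.Parameter(t.randn(3, 4))    # registered as "param_name" inside SubModel

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.param_name = nn.Parameter(t.randn(3, 4))    # name: "param_name"
        self.sub_module = SubModel()                     # its parameter becomes "sub_module.param_name"

for name, _ in Model().named_parameters():
    print(name)   # prints: param_name, sub_module.param_name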

Commonly used neural network layers

In neural networks, there are many commonly used layers used to build various different types of models. Here are a few common neural network layers and their uses:

  1. Fully Connected Layer: The fully connected layer is one of the most basic layers, also called a linear layer or a dense layer. It multiplies each element of the input with a weight and gets the output by adding a bias term. Fully connected layers are often used to extract features and perform classification.

  2. Convolutional Layer: The convolutional layer is a layer specially used to process image and spatial data. It uses convolution operations to filter input data to extract local spatial features. Convolutional layers are commonly used for tasks such as image recognition, object detection, and speech processing.

  3. Pooling Layer: The pooling layer is mainly used to reduce the spatial size of the feature map and retain the most important features. Common pooling operations include max pooling and average pooling, which select the maximum or average value in each region as output respectively.

  4. Recurrent Layer: The recurrent layer is used to process sequence data, such as natural language and time series. It has a memory mechanism that passes information from one time step to the next. Common recurrent layers include the vanilla recurrent neural network (RNN) and the long short-term memory network (LSTM).

  5. Embedding Layer: The embedding layer is used to encode discrete symbolic or categorical data into a continuous low-dimensional vector representation. It is widely used in natural language processing to map words or characters into a continuous vector space.

  6. Normalization Layer: The normalization layer is used to standardize the input data in the neural network to improve the stability and convergence speed of the model. Common normalization operations include Batch Normalization and Layer Normalization.

  7. Activation function layer: The activation function layer applies non-linear transformation to the output of the neural network to introduce non-linear capabilities. Common activation functions include ReLU (Rectified Linear Unit), Sigmoid and Tanh, etc.

This is just a small number of commonly used neural network layers. There are many other types of layers in practical applications, such as attention layers, residual connections, etc. Depending on the task and problem, choosing the right combination of layers can effectively build a powerful neural network model.
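As a quick illustration of how these layers are instantiated in PyTorch and how tensor shapes flow through them, here is a minimal sketch (the sizes are arbitrary, chosen only for the example):

import torch
import torch.nn as nn

x_img = torch.randn(8, 3, 32, 32)                  # (batch, channels, height, width)
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # convolutional layer
pool = nn.MaxPool2d(2)                             # max-pooling layer
bn = nn.BatchNorm2d(16)                            # batch normalization layer
relu = nn.ReLU()                                   # activation function layer
feat = relu(bn(pool(conv(x_img))))                 # shape: (8, 16, 16, 16)

fc = nn.Linear(16 * 16 * 16, 10)                   # fully connected layer
out = fc(feat.flatten(1))                          # shape: (8, 10)

emb = nn.Embedding(1000, 64)                       # embedding layer: vocabulary of 1000, 64-dim vectors
lstm = nn.LSTM(64, 128, batch_first=True)          # recurrent layer (LSTM)
tokens = torch.randint(0, 1000, (8, 20))           # (batch, seq_len)
seq_out, _ = lstm(emb(tokens))                     # shape: (8, 20, 128)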

Image-related layers

Image-related layers mainly include convolution layers (Conv) and pooling layers (Pool). In practice, these layers come in one-dimensional (1D), two-dimensional (2D) and three-dimensional (3D) variants. Pooling also comes in several flavors, such as average pooling (AvgPool), max pooling (MaxPool) and adaptive pooling (AdaptiveAvgPool). Besides the usual forward convolution, there is also transposed convolution (ConvTranspose).
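A brief sketch of the pooling variants and of transposed convolution mentioned above (illustrative sizes only):

import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)

avg = nn.AvgPool2d(kernel_size=2)                          # average pooling   -> (1, 16, 16, 16)
mx = nn.MaxPool2d(kernel_size=2)                           # max pooling       -> (1, 16, 16, 16)
ada = nn.AdaptiveAvgPool2d((7, 7))                         # adaptive pooling  -> (1, 16, 7, 7) for any input size
up = nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2)    # transposed conv   -> (1, 8, 64, 64)

print(avg(x).shape, mx(x).shape, ada(x).shape, up(x).shape)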

Convolution layer (Conv) implements sharpening filter

# import the required libraries
from PIL import Image
from torchvision.transforms import ToTensor, ToPILImage
import torch as t
import torch.nn as nn

# converter that turns a PIL image into a tensor
to_tensor = ToTensor()

# converter that turns a tensor back into a PIL image
to_pil = ToPILImage()

# open the image file
lena = Image.open('C:/image_path/1.png')

# convert the color image to grayscale (3 channels -> 1), turn it into a tensor,
# and add one dimension as the batch_size
input = to_tensor(lena.convert('L')).unsqueeze(0)

# define the sharpening kernel
kernel = t.ones(3, 3)/-9.
kernel[1][1] = 1
conv = nn.Conv2d(1, 1, (3, 3), 1, bias=False)
conv.weight.data = kernel.view(1, 1, 3, 3)

# apply the convolution to the input
out = conv(input)

# convert the output tensor back to an image and display it
to_pil(out.data.squeeze(0))


The function of this code is to implement a sharpening filter to sharpen the input image. Specific steps are as follows:

  1. Create a converter that turns an image into a tensor, to_tensor = ToTensor(), and a converter that turns a tensor back into an image, to_pil = ToPILImage().

  2. Open the image file: lena = Image.open('C:/image_path/1.png').

  3. input = to_tensor(lena.convert('L')).unsqueeze(0), converts the color image to grayscale and changes its number of channels from 3 to 1. And convert the image to tensor and add one dimension as batch_size, i.e. convert it to model input with input size (1, channels, height, width).

  4. kernel = t.ones(3, 3)/-9.; kernel[1][1] = 1
    Define the sharpening convolution kernel: this kernel is a 3x3 matrix whose center element is 1 and whose other elements are -1/9.

  5. conv = nn.Conv2d(1, 1, (3, 3), 1, bias=False), create a convolution layer conv. The number of input channels of the convolution layer is 1, the number of output channels is 1, the convolution kernel size is (3, 3), the step size is 1, and the bias is False.

  6. conv.weight.data = kernel.view(1, 1, 3, 3), convert the previously created convolution kernel into a tensor of the same shape, and assign it to the weight of the convolution layer.

  7. out = conv(input): Perform a convolution operation on the input, use the created convolution kernel to perform a convolution operation on the input, and obtain a tensor with an output size of (1, 1, height, width).

  8. to_pil(out.data.squeeze(0)) converts the output tensor back to an image and displays it. Since a batch_size dimension was added to the input, that dimension must first be removed with squeeze(). (A functional-style equivalent of the convolution is sketched below.)
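For reference, the same filtering can also be written with the functional API, torch.nn.functional.conv2d, which takes the kernel as an explicit argument instead of storing it inside a layer. This is a minimal sketch using a random tensor in place of the real image:

import torch as t
import torch.nn.functional as F

img = t.randn(1, 1, 64, 64)               # stand-in for the (1, 1, H, W) grayscale image tensor
kernel = t.ones(3, 3) / -9.
kernel[1][1] = 1
out = F.conv2d(img, kernel.view(1, 1, 3, 3), stride=1)   # same computation as the nn.Conv2d above
print(out.shape)                           # (1, 1, 62, 62): no padding, so each side shrinks by 2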

Common activation functions

The most common activation functions are Sigmoid, Tanh, ReLU and LeakyReLU; their function curves were shown as figures in the original post.

Here is an example of PyTorch implementing a common activation function:

import torch
import torch.nn as nn

# define the input tensor x
x = torch.randn(1, 10)

# Sigmoid activation function
sigmoid = nn.Sigmoid()
activated_x = sigmoid(x)
print("Output after Sigmoid activation:", activated_x)

# Tanh (hyperbolic tangent) activation function
tanh = nn.Tanh()
activated_x = tanh(x)
print("Output after Tanh activation:", activated_x)

# ReLU activation function
relu = nn.ReLU()
activated_x = relu(x)
print("Output after ReLU activation:", activated_x)

# LeakyReLU activation function
leaky_relu = nn.LeakyReLU(negative_slope=0.01)
activated_x = leaky_relu(x)
print("Output after LeakyReLU activation:", activated_x)

# Softmax activation function
softmax = nn.Softmax(dim=1)
activated_x = softmax(x)
print("Output after Softmax activation:", activated_x)

In this example, we first define an input tensor x of size (1, 10) and then process it with the Sigmoid, Tanh, ReLU, LeakyReLU and Softmax activation functions, printing each result.

Note that each activation function is used by creating the corresponding PyTorch module. For example, the ReLU activation function is used by creating an instance of nn.ReLU. To apply it, we simply pass the input tensor to that instance.

The output is as follows:

Output after Sigmoid activation: tensor([[0.6914, 0.3946, 0.2316, 0.3845, 0.6496, 0.7061, 0.3284, 0.4206, 0.8200, 0.6755]])
Output after Tanh activation: tensor([[ 0.6678, -0.4035, -0.8334, -0.4386,  0.5492,  0.7046, -0.6141, -0.3099, 0.9080,  0.6250]])
Output after ReLU activation: tensor([[0.8068, 0.0000, 0.0000, 0.0000, 0.6172, 0.8765, 0.0000, 0.0000, 1.5163,  0.7332]])
Output after LeakyReLU activation: tensor([[ 0.8068, -0.0043, -0.0120, -0.0047,  0.6172,  0.8765, -0.0072, -0.0032, 1.5163,  0.7332]])
Output after Softmax activation: tensor([[0.1407, 0.0409, 0.0189, 0.0392, 0.1164, 0.1508, 0.0307, 0.0456, 0.2860,  0.1307]])

The ReLU function has an inplace parameter. If it is set to True, the output overwrites the input directly, which saves memory (and GPU memory). Overwriting is possible because the backward pass of ReLU can be computed from the output alone. However, only a few autograd operations support in-place computation (such as tensor.sigmoid_()); unless you know exactly what you are doing, it is best not to use in-place operations.

# ReLU activation function (in-place)
relu = nn.ReLU(inplace=True)
activated_x = relu(x)
print("Output after ReLU activation:", activated_x)

ModuleList and Sequential

ModuleList and Sequential are two important classes used to build neural network models in PyTorch.

ModuleList is a container class that can contain multiple submodules (such as layers, activation functions, etc.) and process them as a whole. By using ModuleList, multiple submodules can be easily managed and organized.

Here is an example of using ModuleList to build a model:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layers = nn.ModuleList([
            nn.Linear(10, 20),
            nn.ReLU(),
            nn.Linear(20, 30),
            nn.Tanh()
        ])
    
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

In this example, the MyModel class defines a model that contains four submodules. In the model's constructor, a ModuleList object is created and the four submodules are added to it. In the model's forward method, the final output is obtained by iterating over the submodules in the ModuleList and passing the input through them layer by layer.

Compared with storing submodules in a plain Python list or tuple, one advantage of ModuleList is that it automatically registers each submodule, so that the submodule's parameters are visible to the rest of the model and can be updated during training.
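The difference is easy to see by counting registered parameters: a plain Python list hides its layers from the module system, while ModuleList registers them. A small sketch:

import torch.nn as nn

class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(10, 10), nn.Linear(10, 10)]        # plain list: NOT registered

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(10, 10), nn.Linear(10, 10)])  # registered submodules

print(len(list(WithPlainList().parameters())))    # 0  -- an optimizer would never see these weights
print(len(list(WithModuleList().parameters())))   # 4  -- two weight matrices and two bias vectors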

Sequential is another class for building models; it provides a more concise way to define a sequence of layers. Using Sequential, multiple layers (such as linear layers, activation functions, etc.) can be chained together to form a complete neural network model.

Here is an example of using Sequential to build a model:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 30),
    nn.Tanh()
)

In this example, the Sequential class is used directly to construct the model, passing each layer as a parameter to the Sequential constructor. In this way, each layer will be added to the model in order.

One advantage of using Sequential is that it provides a more concise and intuitive way to define the model structure. However, since Sequential only supports sequentially connected models, it cannot flexibly express more complex network structures such as skip connections or multi-branch connections.

Both ModuleList and Sequential are important tools for building neural network models. ModuleList is suited to managing and organizing multiple submodules, while Sequential is suited to concisely defining sequentially connected models. Which one to choose depends on your specific needs and model structure.
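If you want the conciseness of Sequential but still want named layers, Sequential also accepts an OrderedDict; this is a small sketch of that pattern:

from collections import OrderedDict
import torch.nn as nn

model = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(10, 20)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(20, 30)),
    ('tanh', nn.Tanh()),
]))
print(model.fc1)   # layers can now be accessed by name instead of by index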

Recurrent Neural Network Layer (RNN)

Recurrent neural network layer (RNN) is a neural network layer used to process sequence data. Unlike traditional feedforward neural networks, RNN takes each element in a sequence as input and transfers information through memory states, thereby achieving continuous transfer of information in the sequence.

In RNN, the output at the current moment depends not only on the input at the current moment, but also on the input and memory status of all previous moments. Specifically, the calculation of RNN can be expressed as:

$$h_t = f(W x_t + U h_{t-1})$$
where $h_t$ is the memory state at the current time step, $x_t$ is the input at the current time step, $W$ and $U$ are weight matrices, and $f$ is a nonlinear activation function (typically tanh).
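The update rule can be written out directly in a few lines; the following sketch computes a single RNN step with tanh as the activation $f$ and ignores the bias terms for brevity (all sizes are arbitrary):

import torch

input_size, hidden_size = 4, 3
W = torch.randn(hidden_size, input_size)    # input-to-hidden weight matrix
U = torch.randn(hidden_size, hidden_size)   # hidden-to-hidden weight matrix

x_t = torch.randn(input_size)               # input at the current time step
h_prev = torch.zeros(hidden_size)           # memory state from the previous time step

h_t = torch.tanh(W @ x_t + U @ h_prev)      # h_t = f(W*x_t + U*h_{t-1})
print(h_t)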

The basic structure of an RNN (shown as a figure in the original post) unrolls the same cell over time, feeding each step's memory state into the next step.
The advantage of RNNs is that they can process sequences of arbitrary length, have a strong memory capability, and can learn long-term dependencies within a sequence. However, because an RNN reuses the same weight matrices many times during computation, it suffers from vanishing or exploding gradients, which hurts training. To address this, improved RNN structures such as LSTM and GRU were later introduced.

Here is an example of implementing a simple RNN model using PyTorch:

import torch
import torch.nn as nn

class MyRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MyRNN, self).__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        h0 = torch.zeros(1, x.size(0), self.hidden_size) # initialize the memory state
        out, hn = self.rnn(x, h0)
        out = self.fc(out[:, -1, :]) # take the output at the last time step as the model output
        return out

In this example, we create an RNN model class called MyRNN and use the nn.RNN class to define the RNN layer. In the forward method of the model, we first initialize the memory state through the torch.zeros function and pass the input tensor and memory state to the RNN layer for calculation. We then get the final model output by taking the output at the last moment and using a fully connected layer to map it into the output space.
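A quick usage sketch of the class above (dimensions are arbitrary); because batch_first=True, the input is shaped (batch, seq_len, input_size):

import torch

# assuming the MyRNN class defined above is in scope
model = MyRNN(input_size=8, hidden_size=16, output_size=2)
x = torch.randn(5, 12, 8)      # batch of 5 sequences, each 12 steps of 8 features
y = model(x)
print(y.shape)                 # torch.Size([5, 2])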

Optimizer

In PyTorch, the torch.optim module covers the basic use of optimizers, lets you set different learning rates for different parts of a model, and lets you adjust the learning rate during training. These are introduced below in turn.

  1. Basic usage of optimizer

First, you need to create an optimizer object to optimize the model's parameters. Assuming a model object model has already been defined, an optimizer can be created as follows:

import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.01)

In this example, we use the Adam optimization algorithm to update the model parameters. We pass all trainable parameters (i.e., weights and biases) of the model object model to the optimizer's constructor and set the initial learning rate to 0.01.

Next, during the training loop, parameter updates can be made as follows:

optimizer.zero_grad() # clear the gradients
loss.backward() # backpropagate and compute gradients
optimizer.step() # update the parameters

In this example, we first call zero_grad() to clear the gradients of all parameters, then call backward() to compute the gradient of the loss function with respect to the model parameters, and finally call step() to update the parameters.
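Putting the three calls into context, here is a minimal end-to-end training loop on random data (the model, loss, and sizes are placeholders chosen only for illustration):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)                       # placeholder model
criterion = nn.CrossEntropyLoss()              # predefined loss function
optimizer = optim.Adam(model.parameters(), lr=0.01)

inputs = torch.randn(32, 10)                   # random training batch
targets = torch.randint(0, 2, (32,))           # random class labels

for epoch in range(5):
    optimizer.zero_grad()                      # clear old gradients
    loss = criterion(model(inputs), targets)   # forward pass and loss
    loss.backward()                            # backpropagation
    optimizer.step()                           # parameter update
    print(epoch, loss.item())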

  2. Set different learning rates for different parts of the model

Set different learning rates for different parts of the model to better tune model parameters. In PyTorch, this can be achieved by:

optimizer = optim.Adam([
    {'params': model.conv1.parameters()},
    {'params': model.conv2.parameters(), 'lr': 0.01},
    {'params': model.fc.parameters(), 'lr': 0.001}
], lr=0.0001)

In this example, the parameters of three parts of the model (conv1, conv2 and fc) are placed in separate dictionaries, and a learning rate can be set per dictionary. All dictionaries are passed to the optimizer's constructor together with the default learning rate 0.0001: conv2 and fc use their own rates (0.01 and 0.001), while conv1 falls back to the default.

  3. Adjust the learning rate

Sometimes, the learning rate needs to be adjusted during training to better optimize model parameters. In PyTorch, the learning rate can be adjusted as follows:

# reduce the learning rate to 1/10 of its current value
for param_group in optimizer.param_groups:
    param_group['lr'] /= 10.0

In this example, we first iterate through all parameter groups, and then divide the learning rate of each parameter group by 10.0, thereby reducing the learning rate to 1/10 of the original. Note that optimizer.param_groups is a list, where each element is a dictionary containing all information related to the parameter group, such as learning rate, weight decay coefficient, etc.
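PyTorch also ships learning-rate schedulers in torch.optim.lr_scheduler that manage this for you; a small sketch with StepLR (placeholder model and schedule values):

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 2)                                   # placeholder model
optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)     # multiply lr by 0.1 every 10 epochs

for epoch in range(30):
    # ... optimizer.zero_grad(); loss.backward(); optimizer.step() ...
    scheduler.step()                                       # advance the schedule once per epoch
    print(epoch, optimizer.param_groups[0]['lr'])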

nn.functional VS nn.Module

nn.functional and nn.Module are both very important modules in the PyTorch deep learning library. They both help define neural network models, but they are used in slightly different ways and for different purposes.

  1. nn.functional

nn.functional is a module in PyTorch that contains various nonlinear functions, pooling functions, convolution functions, etc. These functions are implemented as pure functions, that is, their output depends only on the input and does not depend on any external state. Therefore, they are often used to build simple layers with no parameters or complex custom loss functions.

Below is a simple fully connected network whose activation is applied with nn.functional:

import torch.nn as nn
import torch.nn.functional as F

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In this example, nn.Linear is used to create two fully connected layers, and then the F.relu function is used as the activation function in the forward method.

  2. nn.Module

nn.Module is a base class in PyTorch that all neural network models should inherit from. It provides useful methods such as parameters(), named_parameters() and modules(), which give convenient access to the model's parameters and submodules. After inheriting from nn.Module, we need to implement the __init__ and forward methods to define the model's structure and its forward propagation.

Below is a simple fully connected network defined by subclassing nn.Module:

import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In this example, nn.Linear is used to create two fully connected layers, and they are defined as attributes of the model in the __init__ method. Then the nn.functional.relu function is used as the activation function in the forward method.

In general, nn.Module is used to build layers and models that carry parameters, while nn.functional is used for simple parameter-free operations or custom loss functions. When building neural networks, the nn.Module style is often preferred because it offers more flexibility and extensibility.
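The practical difference is who owns the parameters: a module stores them itself, while a functional call receives them as arguments. A short sketch of both styles of the same linear transform:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 784)

# module style: the layer owns its weight and bias
fc = nn.Linear(784, 256)
y1 = fc(x)

# functional style: the parameters are passed in explicitly
weight = nn.Parameter(torch.randn(256, 784))
bias = nn.Parameter(torch.zeros(256))
y2 = F.linear(x, weight, bias)

print(y1.shape, y2.shape)   # both torch.Size([2, 256])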

nn.init implements parameter initialization

nn.init is a module in the PyTorch deep learning library used to initialize neural network weights. When training a neural network, it is usually necessary to initialize the weights to improve the convergence speed and generalization ability of the model.

nn.init provides a variety of initialization methods, including common normal distribution initialization, uniform distribution initialization, and Xavier initialization. The following uses Xavier initialization as an example to explain.

  1. Xavier initialization

Xavier initialization is a commonly used weight initialization method that aims to equalize the variances of the input and output signals. This can help speed up model convergence and improve model performance. The specific formula for Xavier initialization is as follows:

$$W \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_{in}+n_{out}}},\ \frac{\sqrt{6}}{\sqrt{n_{in}+n_{out}}}\right]$$

where $U[a, b]$ denotes sampling from the uniform distribution over the interval $[a, b]$, and $n_{in}$ and $n_{out}$ are the input and output dimensions (fan-in and fan-out) of the weight, respectively.

  2. Xavier initialization using nn.init

Performing Xavier initialization with nn.init is very simple: you only need to pass the parameter that needs initializing to the corresponding initialization function. For example, the following code demonstrates how to use nn.init for Xavier initialization:

import torch.nn as nn
import torch.nn.init as init

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

        # Xavier-initialize the weights
        init.xavier_uniform_(self.fc1.weight)
        init.xavier_uniform_(self.fc2.weight)

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In this example, two fully connected layers are created with nn.Linear and defined as attributes of the model in the __init__ method. The init.xavier_uniform_ function is then used to apply Xavier initialization to their weights.

Overall, using nn.init for weight initialization can help the model converge faster and perform better. Note, however, that weight initialization is not a cure-all; sometimes hyperparameters such as the learning rate and regularization still need to be tuned to optimize the model.
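For larger models it is common to write one initialization function and apply it to every submodule with Module.apply; this is a sketch of that pattern using the same Xavier initializer:

import torch.nn as nn
import torch.nn.init as init

def init_weights(m):
    # Xavier-initialize every Linear layer and zero its bias
    if isinstance(m, nn.Linear):
        init.xavier_uniform_(m.weight)
        init.zeros_(m.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_weights)   # apply() visits every submodule recursively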

nn.Module in-depth analysis

Constructor of nn.Module base class

def __init__(self):
    self._parameters = OrderedDict()
    self._modules = OrderedDict()
    self._buffers = OrderedDict()
    self._backward_hooks = OrderedDict()
    self._forward_hooks = OrderedDict()
    self.training = True

The constructor __init__ of the nn.Module base class is one of the basic building blocks of every PyTorch neural network model. Its main job is to initialize the model's attributes, creating empty ordered dictionaries to hold the model's parameters, submodules, buffers, and forward/backward propagation hooks.

The following explains each attribute defined in the __init__ function:

  1. self._parameters: An ordered dictionary used to store the learnable parameters of the model (such as weights and biases). Variables of type nn.Parameter defined in the model are automatically added to the dictionary.

  2. self._modules: An ordered dictionary used to store submodules of the model. Variables of type nn.Module defined in the model are automatically added to the dictionary.

  3. self._buffers: An ordered dictionary used to store the cache of the model. For example, some intermediate results, sliding averages, etc. can be saved here.

  4. self._backward_hooks: An ordered dictionary used to store the backpropagation hooks of the model, that is, the functions executed during the backpropagation process.

  5. self._forward_hooks: An ordered dictionary used to store the forward propagation hooks of the model, that is, the functions executed during the forward propagation process.

  6. self.training: Boolean variable used to indicate the training status of the model. During training, this variable is True and during testing, this variable is False.

In general, the constructor of the nn.Module base class initializes the attributes of a neural network model, creating ordered dictionaries to store its parameters, submodules, buffers, and forward/backward propagation hooks. These attributes and ordered dictionaries provide a convenient mechanism for accessing and managing the components of the model, and they make serialization and deserialization straightforward.

Sample code:

import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# create a model instance
model = MyNet()

# print the Module attributes
print("Parameters:")
for name, param in model.named_parameters():
    print(name, param.size())

print("\nModules:")
for name, module in model.named_modules():
    print(name, module)

print("\nBuffers:")
for name, buffer in model.named_buffers():
    print(name, buffer.size())

print("\nForward hooks:")
for hook in model._forward_hooks.values():
    print(hook)

print("\nBackward hooks:")
for hook in model._backward_hooks.values():
    print(hook)

In this example, we create a neural network model that inherits from the nn.Module base class and defines two fully connected layers. Then, by accessing the instance's attributes and ordered dictionaries, we print the model's parameters, modules, buffers, and forward/backward propagation hooks. The output is as follows:

Parameters:
fc1.weight torch.Size([256, 784])
fc1.bias torch.Size([256])
fc2.weight torch.Size([10, 256])
fc2.bias torch.Size([10])

Modules:
 MyNet(
  (fc1): Linear(in_features=784, out_features=256, bias=True)
  (fc2): Linear(in_features=256, out_features=10, bias=True)
)
fc1 Linear(in_features=784, out_features=256, bias=True)
fc2 Linear(in_features=256, out_features=10, bias=True)

Buffers:

Forward hooks:

Backward hooks:

As can be seen from the output:

  • Parameters prints the learnable parameters of the model, namely the weights and biases of the two fully connected layers.
  • Modules prints the submodules of the model, including the model itself and its two fully connected layers.
  • Buffers is empty because no buffer is registered in the model.
  • Forward hooks is empty because no forward propagation hook is registered in the model.
  • Backward hooks is empty because no backpropagation hook is registered in the model (the sketch below shows how to register a buffer and a forward hook).
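For completeness, the sketch below shows how those last dictionaries become non-empty: register_buffer adds an entry to _buffers, and register_forward_hook adds an entry to _forward_hooks (the module and hook here are made up for illustration):

import torch
import torch.nn as nn

class NetWithBufferAndHook(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self.register_buffer('calls', torch.zeros(1))   # saved in state_dict, but not a learnable parameter

    def forward(self, x):
        self.calls += 1                                  # update the buffer at every forward pass
        return self.fc(x)

net = NetWithBufferAndHook()

def print_shape_hook(module, inp, out):
    print(module.__class__.__name__, 'output shape:', out.shape)

handle = net.fc.register_forward_hook(print_shape_hook) # stored in the submodule's _forward_hooks
net(torch.randn(3, 4))
print(dict(net.named_buffers()))                         # {'calls': tensor([1.])}
handle.remove()                                          # hooks can be removed via their handle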

Access submodules in the model

import torch
import torch.nn as nn

class SubNet(nn.Module):
    def __init__(self):
        super(SubNet, self).__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        
    def forward(self, x):
        x = self.conv(x)
        x = self.pool(x)
        return x

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.subnet1 = SubNet()
        self.subnet2 = SubNet()
        self.fc = nn.Linear(32 * 8 * 8, 10)  # the two 16-channel branches are concatenated, giving 32 channels
        
    def forward(self, x):
        x1 = self.subnet1(x)
        x2 = self.subnet2(x)
        x = torch.cat((x1, x2), dim=1)
        x = x.view(-1, 32 * 8 * 8)
        x = self.fc(x)
        return x

# create a model instance
model = MyNet()

# inspect direct child modules
print("Children:")
for name, module in model.named_children():
    print(name, module)

# inspect all submodules (including the current module)
print("\nModules:")
for name, module in model.named_modules():
    print(name, module)

# inspect named direct child modules
print("\nNamed Children:")
for name, module in model.named_children():
    print(name, module)

# inspect all named submodules (including the current module)
print("\nNamed Modules:")
for name, module in model.named_modules():
    print(name, module)

In this example, we define a SubNet submodule and a MyNet model; MyNet contains two SubNet submodules and a fully connected layer. We inspect the direct submodules, all submodules, and their named versions using the named_children and named_modules functions.

The output is as follows:

Children:
subnet1 SubNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
subnet2 SubNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

Modules:
 MyNet(
  (subnet1): SubNet(
    (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (subnet1.conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (subnet1.pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (subnet2): SubNet(
    (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (subnet2.conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (subnet2.pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc): Linear(in_features=2048, out_features=10, bias=True)
)
subnet1 SubNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
subnet1.conv Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
subnet1.pool MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
subnet2 SubNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
subnet2.conv Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
subnet2.pool MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
fc Linear(in_features=2048, out_features=10, bias=True)

Named Children:
subnet1 SubNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
subnet2 SubNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

Named Modules:
 MyNet MyNet(
  (subnet1): SubNet(
    (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (subnet1.conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (subnet1.pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (subnet2): SubNet(
    (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (subnet2.conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (subnet2.pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc): Linear(in_features=2048, out_features=10, bias=True)
)
subnet1 SubNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
subnet1.conv Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
subnet1.pool MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
subnet2 SubNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
subnet2.conv Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
subnet2.pool MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
fc Linear(in_features=2048, out_features=10, bias=True)

It can be seen from the output results:

  • named_children returns the direct submodules; the first element of each returned tuple is the submodule's name, and the second element is the submodule itself.
  • named_modules returns all submodules (including the current module); again, the first element of each tuple is the name and the second is the module itself.
  • children and modules are the unnamed counterparts of these two functions: they yield only the submodules themselves, without their names.

These functions provide a convenient way to manage, access, and control submodules in complex neural network models.

Reprinted from: blog.csdn.net/weixin_44319595/article/details/134230360