The Process of Building a Convolutional Neural Network in PyTorch

1. Data reading

PyTorch provides a very convenient data-loading mechanism: the Dataset class and the DataLoader class are combined to produce a data iterator. During training or prediction, this iterator outputs the data needed for each batch and applies the corresponding preprocessing and data augmentation to it.

Let's look at the Dataset class and the DataLoader class separately.

1. Dataset class

The Dataset class in PyTorch is an abstract class used to represent a dataset. By inheriting from Dataset we can customize the format, size and other attributes of our dataset, which the DataLoader class can then use directly.

In fact, whether you use a custom dataset or one of the officially packaged ones, it is in essence a subclass of Dataset. When inheriting from the Dataset class, at least the following methods need to be overridden:

  • __init__(): the constructor, which can customize how the data is read and preprocessed;
  • __len__(): returns the size of the dataset;
  • __getitem__(): indexes a single sample in the dataset.

The principle alone is not easy to grasp, so let's write a simple example to see how to use the Dataset class to define a Tensor-type dataset.

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    # Constructor
    def __init__(self, data_tensor, target_tensor):
        self.data_tensor = data_tensor
        self.target_tensor = target_tensor
    # Return the size of the dataset
    def __len__(self):
        return self.data_tensor.size(0)
    # Return the indexed sample and its label
    def __getitem__(self, index):
        return self.data_tensor[index], self.target_tensor[index]

Looking at the code: we defined a dataset named MyDataset. In the constructor, Tensor-type data and labels are passed in; in the __len__ function, the size of the Tensor is returned directly; in the __getitem__ function, the indexed data and label are returned.

Next, let's see how to use the dataset we just defined. First randomly generate a 10×3 data Tensor, then generate a 10-element label Tensor corresponding to it. Use these two Tensors to construct a MyDataset object. You can check the size of the dataset directly with the len() function, and fetch individual samples directly by index.

# Generate data
data_tensor = torch.randn(10, 3)
target_tensor = torch.randint(2, (10,)) # labels are 0 or 1

# Wrap the data into a Dataset
my_dataset = MyDataset(data_tensor, target_tensor)

# Check the dataset size
print('Dataset size:', len(my_dataset))
'''
Output:
Dataset size: 10
'''

# Access the data by index
print('tensor_data[0]: ', my_dataset[0])
'''
Output:
tensor_data[0]:  (tensor([ 0.4931, -0.0697,  0.4171]), tensor(0))
'''

2. DataLoader class

In a real project, when the amount of data is large, limited memory and I/O speed make it impossible to load all the data into memory at once during training, or to load it with a single process. Multi-process, iterative loading is therefore required, and DataLoader is designed around exactly these needs.

DataLoader is an iterator. The most basic usage is to pass in a Dataset object; DataLoader then generates batches of data according to the batch_size parameter. Besides saving memory, it can also perform multi-process loading and data shuffling.

The DataLoader class is called as follows:

from torch.utils.data import DataLoader
tensor_dataloader = DataLoader(dataset=my_dataset, # the dataset to load, required
                               batch_size=2,       # size of each output batch
                               shuffle=True,       # whether to shuffle the data
                               num_workers=0)      # number of worker processes; 0 means only the main process

# Iterate over the loader in a loop
for data, target in tensor_dataloader: 
    print(data, target)
'''
Output:
tensor([[-0.1781, -1.1019, -0.1507],
        [-0.6170,  0.2366,  0.1006]]) tensor([0, 0])
tensor([[ 0.9451, -0.4923, -1.8178],
        [-0.4046, -0.5436, -1.7911]]) tensor([0, 0])
tensor([[-0.4561, -1.2480, -0.3051],
        [-0.9738,  0.9465,  0.4812]]) tensor([1, 0])
tensor([[ 0.0260,  1.5276,  0.1687],
        [ 1.3692, -0.0170, -1.6831]]) tensor([1, 0])
tensor([[ 0.0515, -0.8892, -0.1699],
        [ 0.4931, -0.0697,  0.4171]]) tensor([1, 0])
'''
 
# Output a single batch
print('One batch tensor data: ', next(iter(tensor_dataloader)))
'''
Output:
One batch tensor data:  [tensor([[ 0.9451, -0.4923, -1.8178],
        [-0.4046, -0.5436, -1.7911]]), tensor([0, 0])]
'''

Looking back at the code, let's go over the DataLoader parameters used here (a short sketch illustrating shuffle follows the list):

  • dataset: Dataset type, the input dataset; required;
  • batch_size: int, the number of samples in each batch;
  • shuffle: bool, whether to reshuffle the data at the start of every epoch;
  • num_workers: int, the number of worker processes used to load data; 0 means the data is loaded in the main process; the default is 0.
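
To make the effect of shuffle concrete, here is a minimal sketch reusing my_dataset from above (the batch size of 5 and the two epochs are arbitrary choices; your printed labels will differ because the data are random):

loader = DataLoader(my_dataset, batch_size=5, shuffle=True)

for epoch in range(2):
    # shuffle=True reshuffles at the start of every epoch,
    # so the order of the labels normally differs between the two epochs
    batches = [target.tolist() for _, target in loader]
    print('epoch', epoch, ':', batches)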

2. The process of building a neural network in PyTorch

Original link: Master the process of building a neural network in PyTorch in ten minutes.


To build a network, you first need to define a class that inherits from nn.Module (this is required, so first import torch.nn as nn; nn is a toolbox and a very useful one). We will call the class Net.

class Net(nn.Module):

This class mainly contains two functions: the __init__ constructor and the forward function. Take the following as an example:

    def __init__(self):
        super().__init__()
        self.conv1=nn.Conv2d(1,6,5)
        self.conv2=nn.Conv2d(6,16,5)
 
    def forward(self, x):
        x=F.max_pool2d(F.relu(self.conv1(x)),2)
        x=F.max_pool2d(F.relu(self.conv2(x)),2)
        return x
  1. __init__ is where the convolutional layers are defined. Of course, you have to call super().__init__() first to initialize the parent class nn.Module (basic Python). The main thing here is to define the layers whose parameters are learned. For example, we call the first layer conv1 and define it as a convolution with 1 input channel, 6 output channels and a 5×5 kernel; conv2 is analogous. The "learning" in deep learning is mainly learning the parameters inside the convolution kernels; things that have nothing to learn or change, such as the activation function relu(), do not have to be put here. (If you insist on putting relu in __init__, that also works; name it, say, myrelu.)
  2. forward is where the flow of data is actually executed. In the code above, the input x first passes through the defined conv1 (a name we chose ourselves), then through the activation function F.relu() (not a name we invented: you should first import torch.nn.functional as F; F.relu() is an official function). Of course, if you defined relu in __init__ as the myrelu mentioned above, the first line here would become x = F.max_pool2d(myrelu(self.conv1(x)), 2). The F.max_pool2d pooling in the next step works the same way. After this sequence of operations, x is finally returned to the caller.
  3. Two points deserve attention in this Net class. First: keep the output channels of one layer consistent with the input channels of the next; the first convolutional layer cannot output 4 channels while the second expects 6 as input, or an error will be raised. Second: it is a little different from a regular Python class, have you noticed? How should we use this Net? First create an instance of Net (after all, Net is only a class and cannot take the data directly, so output = Net(input) will of course not work):
net=Net()

In this way we can pass x in. Suppose you already have an input, input, for the network (this input should be a tensor; how to create tensors is left to the reader. In addition, PyTorch versions before 0.4 required wrapping it in a Variable; this is no longer necessary since 0.4). Passing it in looks like this:

output=net(input)

Look at the previous definition:

def __init__(self):
   ……
 
def forward(self, x):
   ……

It may look a little odd. In a regular Python class, the data x passed to the class would normally appear as a parameter of the __init__ function; in the definition above, x is instead a parameter of forward. There is no contradiction: when you create the net it is net = Net(), and no arguments are passed in; if the constructor did need something at initialization, you would pass it in at that point. x is the input of the neural network, but it is not required for initialization; must a network have input data in order to be initialized? Not necessarily. When x is later passed to the network instance, nn.Module automatically feeds it to forward.
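
Putting the pieces above together, here is a minimal runnable sketch of the network described so far (the 1×32×32 dummy input is my own assumption; any size large enough for the two 5×5 convolutions and 2×2 poolings will do):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)   # 1 input channel -> 6 output channels, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)  # 6 input channels -> 16 output channels, 5x5 kernel

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return x

net = Net()
input = torch.randn(1, 1, 32, 32)   # assumed dummy input: batch of 1, 1 channel, 32x32
output = net(input)
print(output.shape)                 # torch.Size([1, 16, 5, 5]) for the 32x32 input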


After the network is defined, the next steps are passing in data, computing the error, backpropagating, and updating the weights... it is really easy to forget the format and order of these things. The way data is passed in was introduced above; that is equivalent to one forward propagation, computing the input x of each layer along the way.

If you want the output of the neural network to be close to the ground truth you expect, you keep reducing the difference between the two. That difference is defined by yourself: it is the objective function, or loss function. If the loss approached 0 the goal would naturally be reached. In practice the loss basically never reaches 0, but we hope it reaches a minimum, so we want it to descend along the gradient. Everyone should be familiar with the gradient descent formula; if not, it is recommended to read the relevant theory.

The input is determined by you, so what can the neural network learn and decide? Naturally it can only decide the weights of each convolutional layer. So the network can only keep modifying the weights, as in y = wx + b: x is given by you, so it can only change w and b to make the final output y as close as possible to the y you want, which makes the loss smaller. If the partial derivative of the loss with respect to a parameter W in a convolutional layer is close to 0, doesn't that mean a minimum has been reached? And since the way the loss is computed has already been fixed, reducing the loss can only be realized by updating the convolutional layers' parameters W (nothing else can be decided; everything else is determined by your input and by what is already given). The update of W is therefore realized as follows (note these numbers, they will be mentioned again below): [1] compute the partial derivative of the loss with respect to the parameters W of each layer (backpropagation obtains it layer by layer via the chain rule, through each layer's input, not only the original input); [2] multiply the result of [1] by a step size (this gives the amount by which W will be modified); [3] subtract this modification from W, completing one update of the parameter W.

This is not very precise, but the general idea is this. You could implement the process manually, but how could a large neural network be implemented by hand? That is impossible. So we use the PyTorch framework and the toolbox torch.nn. Let's define the loss function then, taking MSELoss as an example:

compute_loss=nn.MSELoss()

Obviously this is also a class and cannot take the input data directly, so writing loss = nn.MSELoss(target, output) directly is wrong. The class first needs to be instantiated, here as compute_loss. Then you can pass in the output of your neural network and the reference answer target:

loss=compute_loss(target,output)

Having computed the loss, the next step is backpropagation:

loss.backward()

This step actually computes [1] and obtains the amount by which the parameters W should be updated in one step; it counts as one backpropagation. Pay attention here: what can loss.backward() be called on? If the loss is something you defined yourself on the spot (for example, def loss(x, y): return y - x), calling backward on it directly will certainly go wrong. You should use the functions provided in nn. Of course, deep learning cannot rely only on the official loss functions, so if you want to use your own loss function you should also define it like the Net above (otherwise your loss cannot be backpropagated; note that this restriction applied to versions from a long time ago and is basically no longer necessary): it also inherits nn.Module, the incoming arguments are handled in forward, the loss itself is computed in forward, and finally the loss is returned; __init__ is empty except for super().__init__(). After backpropagation, how are [2] and [3] achieved? Through the optimizer: let the optimizer update the network weights W automatically. So after Net is defined, you also need to write an optimizer definition (taking SGD as an example):

from torch import optim
optimizer=optim.SGD(net.parameters(),lr=0.001,momentum=0.9)

Similarly, the optimizer is also a class: first define an instance, optimizer, and use it later. Note that when the optimizer is defined, the parameters of net must be passed to SGD so that the optimizer has control over the network parameters and can modify them; the learning rate lr is also passed in at this point. Before each iteration, first clear the gradients stored in the optimizer (the "update amount" already applied to W will not be needed next time):

optimizer.zero_grad()

After loss.backward() backpropagation, update the parameters:

optimizer.step()

So our sequence is:

  1. Define the network first: write the Net class and declare an instance of the network, net = Net();
  2. Define the optimizer: optimizer = optim.xxx(net.parameters(), lr=xxx);
  3. Define the loss function (write your own class, or use an official one directly, e.g. compute_loss = nn.MSELoss());
  4. Once everything is defined, loop again and again: ① clear the gradient information in the optimizer, optimizer.zero_grad(); ② pass in the input, output = net(input), the forward propagation; ③ compute the loss, loss = compute_loss(target, output), where target is the ground-truth reference value, which you prepare yourself and which corresponds to the input passed in earlier; ④ backpropagate the error, loss.backward(); ⑤ update the parameters, optimizer.step(). With this, a basic neural network is implemented. The training of most neural networks can be reduced to this process; the differences are only in how complex the inputs, the network definition and the loss function are. Corrections are welcome if anything here is wrong.
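
As a concrete illustration of steps 1–4, here is a minimal sketch that simply follows the order of the calls, reusing the Net class from the sketch above (the random input/target tensors and the 10 iterations are assumptions made for the example, not anything meaningful to train on):

import torch
from torch import nn, optim

net = Net()                                                       # step 1: network instance
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)   # step 2: optimizer
compute_loss = nn.MSELoss()                                       # step 3: loss function

input = torch.randn(4, 1, 32, 32)    # assumed dummy batch
target = torch.randn(4, 16, 5, 5)    # assumed dummy ground truth, matching Net's output shape

for i in range(10):                  # step 4: the training loop
    optimizer.zero_grad()            # ① clear stored gradients
    output = net(input)              # ② forward propagation
    loss = compute_loss(target, output)  # ③ compute the loss
    loss.backward()                  # ④ backpropagation
    optimizer.step()                 # ⑤ update the parameters
    print(i, loss.item())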

3. Convolution layer

The convolutional layer scans the input multi-channel feature map with a given number of convolution kernels (also called filters) to obtain output feature maps carrying higher-level semantic information; the number of output channels is equal to the number of convolution kernels.
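
A quick way to see the relationship between kernels and channels is to inspect the weight tensor of a convolutional layer (the 3-in/8-out channel sizes and the 28×28 input below are arbitrary choices for illustration):

import torch
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
print(conv.weight.shape)   # torch.Size([8, 3, 3, 3]) -- one 3x3 kernel per (output, input) channel pair
print(conv.bias.shape)     # torch.Size([8])          -- one bias per output channel

x = torch.randn(1, 3, 28, 28)
print(conv(x).shape)       # torch.Size([1, 8, 26, 26]) -- 8 output channels, one per kernel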

3.1. One-dimensional convolution layer: torch.nn.Conv1d()

class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

One-dimensional convolutional layer; the input has shape (N, C_in, L_in) and the output has shape (N, C_out, L_out).
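
For reference, the output length follows the usual convolution arithmetic:

L_{out} = \lfloor \frac{L_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1}{stride} + 1 \rfloor

With the example below (kernel_size=3, stride=2, no padding or dilation, input length 50), this gives L_out = 24.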

3.1.1. Parameter description:

  • in_channels (int) – number of channels in the input signal
  • out_channels (int) – number of channels produced by the convolution
  • kernel_size (int or tuple) – size of the convolution kernel
  • stride (int or tuple, optional) – stride of the convolution
  • padding (int or tuple, optional) – number of zero-padding layers added to each side of the input
  • dilation (int or tuple, optional) – spacing between convolution kernel elements
  • groups (int, optional) – number of blocked connections from input channels to output channels. With groups=1, each output is a convolution over all inputs; with groups=2, it is equivalent to two convolutional layers side by side, each seeing half the input channels and producing half the output channels, whose outputs are then concatenated.
  • bias (bool, optional) – if bias=True, a learnable bias is added

3.1.2. example:

import torch
from torch import nn

m = nn.Conv1d(16, 33, 3, stride=2)
input = torch.randn(20, 16, 50)  # (N, C_in, L_in)
output = m(input)                # output shape: (20, 33, 24)

3.2. Two-dimensional convolution layer: torch.nn.Conv2d()

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

Two-dimensional convolutional layer; the input has shape (N, C_in, H_in, W_in) and the output has shape (N, C_out, H_out, W_out).

3.2.1. Parameter description:

  • in_channels (int) – number of channels in the input signal
  • out_channels (int) – number of channels produced by the convolution
  • kernel_size (int or tuple) – size of the convolution kernel
  • stride (int or tuple, optional) – stride of the convolution
  • padding (int or tuple, optional) – number of zero-padding layers added to each side of the input
  • dilation (int or tuple, optional) – spacing between convolution kernel elements
  • groups (int, optional) – number of blocked connections from input channels to output channels
  • bias (bool, optional) – if bias=True, a learnable bias is added

3.2.2. example:

# With square kernels and equal stride
m = nn.Conv2d(16, 33, 3, stride=2)
# non-square kernels and unequal stride and with padding
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
# non-square kernels and unequal stride and with padding and dilation
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
input = torch.randn(20, 16, 50, 100)  # (N, C_in, H_in, W_in)
output = m(input)

3.3. Three-dimensional convolution layer: torch.nn.Conv3d()

class torch.nn.Conv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

Three-dimensional convolutional layer; the input has shape (N, C_in, D_in, H_in, W_in) and the output has shape (N, C_out, D_out, H_out, W_out).

3.3.1. Parameter description:

  • in_channels (int) – number of channels in the input signal
  • out_channels (int) – number of channels produced by the convolution
  • kernel_size (int or tuple) – size of the convolution kernel
  • stride (int or tuple, optional) – stride of the convolution
  • padding (int or tuple, optional) – number of zero-padding layers added to each side of the input
  • dilation (int or tuple, optional) – spacing between convolution kernel elements
  • groups (int, optional) – number of blocked connections from input channels to output channels
  • bias (bool, optional) – if bias=True, a learnable bias is added

3.3.2. example:

# With square kernels and equal stride
m = nn.Conv3d(16, 33, 3, stride=2)
# non-square kernels and unequal stride and with padding
m = nn.Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0))
input = torch.randn(20, 16, 10, 50, 100)  # (N, C_in, D_in, H_in, W_in)
output = m(input)

4. Pooling layer

In a convolutional neural network, a pooling layer is usually inserted between adjacent convolutional layers. The pooling layer effectively reduces the size of the feature maps, which in turn reduces the number of parameters in the final fully connected layers. Adding pooling layers therefore speeds up computation and helps prevent overfitting.
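
As a quick illustration of the size reduction, a 2×2 max pooling halves each spatial dimension (the 4×4 input below is an arbitrary example):

import torch
from torch import nn

pool = nn.MaxPool2d(kernel_size=2)          # 2x2 window; stride defaults to kernel_size
x = torch.arange(16.0).reshape(1, 1, 4, 4)  # (N, C, H, W)
print(pool(x))
# tensor([[[[ 5.,  7.],
#           [13., 15.]]]])  -- each output value is the max of a 2x2 block
print(x.shape, '->', pool(x).shape)         # (1, 1, 4, 4) -> (1, 1, 2, 2)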

4.1. One-dimensional max pooling: torch.nn.MaxPool1d()

class torch.nn.MaxPool1d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

If the input has shape (N, C, L_in), the output has shape (N, C, L_out).

4.1.1. Parameter description:

  • kernel_size (int or tuple) – size of the max pooling window
  • stride (int or tuple, optional) – stride of the pooling window. Default: kernel_size
  • padding (int or tuple, optional) – number of zero-padding layers added to each side of the input
  • dilation (int or tuple, optional) – a parameter that controls the stride of elements within the window
  • return_indices – if True, also return the indices of the maximum values, which is helpful for a later upsampling (unpooling) operation
  • ceil_mode – if True, use ceil instead of the default floor when computing the output size

4.1.2. example:

# pool of size=3, stride=2
m = nn.MaxPool1d(3, stride=2)
input = torch.randn(20, 16, 50)  # (N, C, L_in)
output = m(input)

4.2. Two-dimensional max pooling: torch.nn.MaxPool2d()

class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

If the input has shape (N, C, H_in, W_in), the output has shape (N, C, H_out, W_out).

4.2.1. Parameter description:

  • kernel_size (int or tuple) – size of the max pooling window
  • stride (int or tuple, optional) – stride of the pooling window. Default: kernel_size
  • padding (int or tuple, optional) – number of zero-padding layers added to each side of the input
  • dilation (int or tuple, optional) – a parameter that controls the stride of elements within the window
  • return_indices – if True, also return the indices of the maximum values, which is helpful for a later upsampling (unpooling) operation
  • ceil_mode – if True, use ceil instead of the default floor when computing the output size

4.2.2. example:

# pool of square window of size=3, stride=2
m = nn.MaxPool2d(3, stride=2)
# pool of non-square window
m = nn.MaxPool2d((3, 2), stride=(2, 1))
input = torch.randn(20, 16, 50, 32)  # (N, C, H_in, W_in)
output = m(input)

4.3. Three-dimensional max pooling: torch.nn.MaxPool3d()

class torch.nn.MaxPool3d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

If the input has shape (N, C, D_in, H_in, W_in), the output has shape (N, C, D_out, H_out, W_out).

4.3.1. Parameter description:

  • kernel_size (int or tuple) – size of the max pooling window
  • stride (int or tuple, optional) – stride of the pooling window. Default: kernel_size
  • padding (int or tuple, optional) – number of zero-padding layers added to each side of the input
  • dilation (int or tuple, optional) – a parameter that controls the stride of elements within the window
  • return_indices – if True, also return the indices of the maximum values, which is helpful for a later upsampling (unpooling) operation
  • ceil_mode – if True, use ceil instead of the default floor when computing the output size

4.3.2. example:

# pool of square window of size=3, stride=2
m = nn.MaxPool3d(3, stride=2)
# pool of non-square window
m = nn.MaxPool3d((3, 2, 2), stride=(2, 1, 2))
input = torch.randn(20, 16, 50, 44, 31)  # MaxPool3d expects a 5-D input (N, C, D, H, W)
output = m(input)

5. Nonlinear activation layers

5.1. torch.nn.ReLU()

class torch.nn.ReLU(inplace=False)

5.1.1 Parameters:

  • inplace – whether to perform the operation in place (overwriting the input)

5.1.2 example:

m = nn.ReLU()
input = torch.randn(2)
print(input)
print(m(input))

5.2. torch.nn.ELU()

class torch.nn.ELU(alpha=1.0, inplace=False)

5.2.1 Parameters:

  • alpha – the α value in the ELU formulation. Default: 1.0
  • inplace – whether to perform the operation in place (overwriting the input)

5.2.2 example:

m = nn.ELU()
input = torch.randn(2)
print(input)
print(m(input))

5.3. torch.nn.LeakyReLU()

class torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)

5.3.1 Parameters:

  • inplace – whether to perform the operation in place (overwriting the input)
  • negative_slope – controls the slope used for negative inputs. Default: 0.01

5.3.2 example:

m = nn.LeakyReLU(0.1)
input = torch.randn(2)
print(input)
print(m(input))

5.4. torch.nn.Softmax()

Applies the Softmax function to an n-dimensional input tensor, rescaling the elements so that they lie in the range (0, 1) and sum to 1.

class torch.nn.Softmax()

5.4.1 example:

m = nn.Softmax(dim=0)  # specify the dimension along which Softmax is computed
input = torch.randn(2)
print(input)
print(m(input))        # the two outputs lie in (0, 1) and sum to 1

5.5. torch.nn.Sigmoid()

Applies the Sigmoid function element-wise to an n-dimensional input tensor, squashing each element into the range (0, 1): Sigmoid(x) = 1 / (1 + exp(-x)).

class torch.nn.Sigmoid()

5.5.1 example:

m = nn.Sigmoid()
input = torch.randn(2)
print(input)
print(m(input))

5.6. torch.nn.Tanh()

Applies the Tanh function element-wise to an n-dimensional input tensor, squashing each element into the range (-1, 1).

class torch.nn.Tanh()

5.6.1 example:

m = nn.Tanh()
input = torch.randn(2)
print(input)
print(m(input))

6. Normalization layers

6.1. torch.nn.BatchNorm1d()

Applies Batch Normalization over a 2D or 3D mini-batch input.

class torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True)

y = \frac{x - mean[x]}{\sqrt{Var[x]} + \epsilon} * gamma + beta

The mean and standard deviation of each dimension of the input are computed over the mini-batch. gamma and beta are learnable parameter vectors of size C (where C is the input size).

During training, this layer computes the mean and variance of each input batch and maintains a moving average of them. The default momentum of the moving average is 0.1.

At evaluation time, the mean/variance obtained during training are used to normalize the validation data.
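
A minimal sketch of that train/eval distinction (the layer and input sizes below are arbitrary): in train() mode the batch statistics are used and running_mean/running_var are updated; in eval() mode the stored running statistics are used instead.

import torch
from torch import nn

bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3) * 5 + 2           # a batch with non-zero mean and large variance

bn.train()                              # training mode: normalize with the batch statistics
y_train = bn(x)
print(bn.running_mean, bn.running_var)  # the running statistics have been updated

bn.eval()                               # evaluation mode: normalize with the running statistics
y_eval = bn(x)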

6.1.1 Parameters:

  • num_features: number of features of the expected input, which has size 'batch_size x num_features [x width]'
  • eps: a value added to the denominator for numerical stability (so that the denominator cannot approach or equal 0). Default: 1e-5.
  • momentum: the momentum used for the running mean and running variance. Default: 0.1.
  • affine: a boolean; when set to True, this layer has learnable affine parameters.

Shape: - input: (N, C) or (N, C, L) - output: (N, C) or (N, C, L) (same as input)

6.1.2 example:

# With Learnable Parameters
m = nn.BatchNorm1d(100)
# Without Learnable Parameters
m = nn.BatchNorm1d(100, affine=False)
input = torch.randn(20, 100)
output = m(input)

6.2. torch.nn.BatchNorm2d()

Applies Batch Normalization over a 4D input composed of a mini-batch of 3D data.

class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True)

y = \frac{x - mean[x]}{\sqrt{Var[x]} + \epsilon} * gamma + beta

The mean and standard deviation of each dimension of the input are computed over the mini-batch. gamma and beta are learnable parameter vectors of size C (where C is the input size).

During training, this layer computes the mean and variance of each input batch and maintains a moving average of them. The default momentum of the moving average is 0.1.

At evaluation time, the mean/variance obtained during training are used to normalize the validation data.

6.2.1 Parameters:

  • num_features: number of features of the expected input, which has size 'batch_size x num_features x height x width'
  • eps: a value added to the denominator for numerical stability (so that the denominator cannot approach or equal 0). Default: 1e-5.
  • momentum: the momentum used for the running mean and running variance. Default: 0.1.
  • affine: a boolean; when set to True, this layer has learnable affine parameters.

Shape: - input: (N, C, H, W) - output: (N, C, H, W) (same as input)

6.2.2 example:

# With Learnable Parameters
m = nn.BatchNorm2d(100)
# Without Learnable Parameters
m = nn.BatchNorm2d(100, affine=False)
input = torch.randn(20, 100, 35, 45)
output = m(input)

6.3. torch.nn.BatchNorm3d()

Applies Batch Normalization over a 5D input composed of a mini-batch of 4D data.

class torch.nn.BatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True)

y = \frac{x - mean[x]}{\sqrt{Var[x]} + \epsilon} * gamma + beta

The mean and standard deviation of each dimension of the input are computed over the mini-batch. gamma and beta are learnable parameter vectors of size C (where C is the input size).

During training, this layer computes the mean and variance of each input batch and maintains a moving average of them. The default momentum of the moving average is 0.1.

At evaluation time, the mean/variance obtained during training are used to normalize the validation data.

6.3.1 Parameters:

  • num_features: number of features of the expected input, which has size 'batch_size x num_features x depth x height x width'
  • eps: a value added to the denominator for numerical stability (so that the denominator cannot approach or equal 0). Default: 1e-5.
  • momentum: the momentum used for the running mean and running variance. Default: 0.1.
  • affine: a boolean; when set to True, this layer has learnable affine parameters.

Shape: - input: (N, C, D, H, W) - output: (N, C, D, H, W) (same as input)

6.3.2 example:

# With Learnable Parameters
m = nn.BatchNorm3d(100)
# Without Learnable Parameters
m = nn.BatchNorm3d(100, affine=False)
input = torch.randn(20, 100, 35, 45, 10)
output = m(input)

7. Linear layer

class torch.nn.Linear(in_features, out_features, bias=True)

Applies a linear transformation to the input data: (y = Ax + b)

7.1. Parameters:

  • in_features – size of each input sample
  • out_features – size of each output sample
  • bias – if set to False, the layer will not learn an additive bias. Default: True

7.2. Shape

  • input: (N, in_features)
  • output: (N, out_features)

7.3. example

m = nn.Linear(20, 30)
input = torch.randn(128, 20)
output = m(input)
print(output.size())

8. Dropout layers

class torch.nn.Dropout(p=0.5, inplace=False)

Randomly zeroes some of the elements of the input tensor. The elements that are zeroed are chosen at random for each forward call.

8.1. Parameters:

  • p – probability of an element being zeroed. Default: 0.5
  • inplace – if set to True, the operation is performed in place. Default: False

8.2. Shape

  • input: any shape
  • output: same shape as the input

8.3. example

m = nn.Dropout(p=0.2)
input = torch.randn(20, 16)
output = m(input)
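
A small sketch of the behavior (sizes arbitrary): in training mode roughly a fraction p of the elements are zeroed and the surviving elements are scaled by 1/(1-p); in eval() mode dropout acts as the identity.

import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(10)

drop.train()
print(drop(x))   # roughly half the entries are 0, the rest are scaled to 2.0 (= 1/(1-p))

drop.eval()
print(drop(x))   # in eval mode the input passes through unchanged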

9. Loss functions

Basic usage:

criterion = LossCriterion() # the constructor has its own parameters
loss = criterion(x, y)      # calling the criterion also takes arguments

9.1. torch.nn.L1Loss()

Prototype:

class torch.nn.L1Loss(size_average=True)

Creates a criterion that measures the mean absolute error between the input x (the model's prediction) and the target y. Formula:

loss(x, y) = \frac{1}{n} \sum_i |x_i - y_i|

Parameters:

  • x and y can have arbitrary shapes, each containing n elements in total.
  • The absolute differences of the n element pairs are summed, and the result is divided by n.
  • If size_average=False is passed to the constructor when creating the L1Loss instance, the sum of absolute differences is not divided by n.
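
A tiny worked example (the numbers are arbitrary): for x = (1, 2, 3) and y = (2, 2, 5), the absolute differences are 1, 0 and 2, so the mean is 1. In current PyTorch, reduction='sum' is the spelling that corresponds to not averaging.

import torch
from torch import nn

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 2.0, 5.0])
print(nn.L1Loss()(x, y))                  # tensor(1.) -> (|1-2| + |2-2| + |3-5|) / 3
print(nn.L1Loss(reduction='sum')(x, y))   # tensor(3.) -> summed, not averaged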

9.2. torch.nn.MSELoss()

Prototype:

class torch.nn.MSELoss(size_average=True)

Creates a criterion that measures the mean squared error between the input x (the model's prediction) and the target y. Formula:

loss(x, y) = \frac{1}{n} \sum_i (x_i - y_i)^2

Parameters:

  • x and y can have arbitrary shapes, each containing n elements in total.
  • The squared differences of the n element pairs are summed, and the result is divided by n.
  • If size_average=False is passed to the constructor when creating the MSELoss instance, the sum of squares is not divided by n.

9.3. torch.nn.CrossEntropyLoss()

Prototype:

class torch.nn.CrossEntropyLoss(weight=None, size_average=True)

This criterion combines LogSoftmax and NLLLoss in a single class.

It is very useful when training a multi-class classifier.

  • weight (Tensor): a 1-D tensor with n elements, one weight per class; very useful if your training samples are unbalanced. Default: None.

Arguments at call time:

  • input: a 2-D tensor of shape (batch, n) containing the score of each class
  • target: a 1-D tensor of size batch containing the class indices (0 to n-1)

The loss can be expressed as:

loss(x, class) = -x[class] + \log\Big(\sum_j \exp(x[j])\Big)
When the weight argument is specified, the loss becomes:
loss(x, class) = weights[class] * \Big(-x[class] + \log\Big(\sum_j \exp(x[j])\Big)\Big)

The computed loss is averaged over the size of the mini-batch.

Shape:

  • Input: (N, C), where C is the number of classes
  • Target: (N), where N is the mini-batch size and 0 <= targets[i] <= C-1
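
A short sketch of the expected shapes (the 4-sample, 3-class sizes are arbitrary): the input holds raw, unnormalized scores (no Softmax is needed, since LogSoftmax is built in), and the target holds class indices.

import torch
from torch import nn

criterion = nn.CrossEntropyLoss()
scores = torch.randn(4, 3)            # (N, C): raw scores for 4 samples and 3 classes
labels = torch.tensor([0, 2, 1, 2])   # (N,): one class index per sample
loss = criterion(scores, labels)
print(loss)                           # a scalar, averaged over the mini-batch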

9.4. For more loss functions, see the PyTorch documentation:

PyTorch documentation: Loss functions.


Source: blog.csdn.net/luanfenlian0992/article/details/111084991