1 Neural Network Framework
1.1 Use of the Module class
torch.nn (neural network) organizes its building blocks into the following categories:
Containers
Convolution Layers
Pooling Layers
Padding Layers
Non-linear Activations (weighted sum, nonlinearity)
Non-linear Activations (other)
Normalization Layers
Recurrent Layers
Transformer Layers
Linear Layers
Dropout Layers
Loss Functions
...
Containers include:
(1) Module: the base class for all neural network modules
https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module
CLASS torch.nn.Module(*args, **kwargs)
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 input channel -> 20 output channels, 5x5 kernel
        self.conv1 = nn.Conv2d(1, 20, 5)
        # 20 input channels -> 20 output channels, 5x5 kernel
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, inputX):
        x = F.relu(self.conv1(inputX))
        # the second convolution takes the activated output of the first one
        return F.relu(self.conv2(x))
In the forward function, relu() is the activation function and conv1/conv2 are the convolution layers. The data flow is: inputX -> convolution -> non-linearity (relu) -> convolution -> non-linearity (relu).
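As a quick sanity check of this data flow, the sketch below runs the model on a random tensor. The 28x28 input size and the batch size of 1 are assumptions chosen only for illustration; each 5x5 convolution without padding shrinks the height and width by 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        # convolution -> relu -> convolution -> relu
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

model = Model()
sample = torch.randn(1, 1, 28, 28)  # assumed input: 1 image, 1 channel, 28x28
result = model(sample)
print(result.shape)  # torch.Size([1, 20, 20, 20])
```

Because the last step is a relu, every value in the result is non-negative.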
Python code:
from torch import nn
import torch

class MyNN(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, inputX):
        outputX = inputX + 1
        return outputX

mynn = MyNN()
x = torch.tensor(1.0)
output = mynn(x)
print(output)
Output result:
tensor(2.)
2 Convolution Layers
2.1 Two-dimensional convolution calculation
Both the input and the kernel passed to two-dimensional convolution conv2d() are expected to be 4D tensors. The input has shape (N, C_{in}, H_{in}, W_{in}) (batch size, channels, height, width), which is why the 2D matrices in the code are reshaped before the call.
Suppose the input image is 1024x800 and the convolution kernel is 3x3. At each position, 9 pairs of elements are multiplied and the products are summed. The kernel slides to the right one step at a time; after reaching the right edge it moves down and starts again from the left. Once it reaches the bottom, the convolution is complete.
import torch
import torch.nn.functional as F
input = torch.tensor([[1, 2, 0, 3, 1],
[0, 1, 2, 3, 1],
[1, 2, 1, 0, 0],
[5, 2, 3, 1, 1],
[2, 1, 0, 1, 1]])
kernel = torch.tensor([[1, 2, 1],
[0, 1, 0],
[2, 1, 0]])
input = torch.reshape(input, (1, 1, 5, 5))
kernel = torch.reshape(kernel, (1, 1, 3, 3))
print("input:")
print(input)
print("kernel:")
print(kernel)
output = F.conv2d(input, kernel, stride=1)
print("output:")
print(output)
Output result:
input:
tensor([[[[1, 2, 0, 3, 1],
[0, 1, 2, 3, 1],
[1, 2, 1, 0, 0],
[5, 2, 3, 1, 1],
[2, 1, 0, 1, 1]]]])
kernel:
tensor([[[[1, 2, 1],
[0, 1, 0],
[2, 1, 0]]]])
output:
tensor([[[[10, 12, 12],
[18, 16, 16],
[13, 9, 3]]]])
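The multiply-and-add sliding window described above can also be written out by hand. The following is a minimal sketch, not how PyTorch implements convolution internally; the helper name conv2d_manual is hypothetical, and float tensors are used here (an assumption) so the comparison with F.conv2d works on any backend.

```python
import torch
import torch.nn.functional as F

image = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]], dtype=torch.float32)
kernel = torch.tensor([[1, 2, 1],
                       [0, 1, 0],
                       [2, 1, 0]], dtype=torch.float32)

def conv2d_manual(image, kernel, stride=1):
    """Slide the kernel over a 2D image and sum the element-wise products."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    out = torch.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = image[top:top + k_h, left:left + k_w]
            out[i, j] = (window * kernel).sum()
    return out

manual = conv2d_manual(image, kernel)
manual_stride2 = conv2d_manual(image, kernel, stride=2)
reference = F.conv2d(image.reshape(1, 1, 5, 5), kernel.reshape(1, 1, 3, 3))[0, 0]
print(manual)
print(torch.allclose(manual, reference))  # True
```

The top-left value, for example, is 1*1 + 2*2 + 0*1 + 0*0 + 1*1 + 2*0 + 1*2 + 2*1 + 1*0 = 10.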
If the stride is changed to 2:
output2 = F.conv2d(input, kernel, stride=2)
print("output2:")
print(output2)
The output is:
output2:
tensor([[[[10, 12],
[13, 3]]]])
Padding surrounds the original image with a border of zeros, so the output of the convolution becomes larger.
output3 = F.conv2d(input, kernel, stride=1, padding=1)
print("output3:")
print(output3)
Output result:
output3:
tensor([[[[ 1, 3, 4, 10, 8],
[ 5, 10, 12, 12, 6],
[ 7, 18, 16, 16, 8],
[11, 13, 9, 3, 4],
[14, 13, 9, 7, 4]]]])
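The three output sizes seen so far (3x3, 2x2 and 5x5) follow from the output-size formula given in the Conv2d documentation. The helper below is a small sketch of that formula for one spatial dimension; the function name conv2d_output_size is hypothetical.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one dimension:
    floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv2d_output_size(5, 3, stride=1))             # 3
print(conv2d_output_size(5, 3, stride=2))             # 2
print(conv2d_output_size(5, 3, stride=1, padding=1))  # 5
```

With padding=1 and a 3x3 kernel at stride 1, the output keeps the size of the input.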
2.2 Image convolution operation
Learning link:
https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d
CLASS torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
- in_channels (int) – number of channels in the input image
- out_channels (int) – number of channels produced by the convolution
- kernel_size (int or tuple) – size of the convolution kernel
- stride (int or tuple, optional) – stride of the convolution (default 1)
- padding (int, tuple or str, optional) – padding added to all four sides of the input (default 0)
- padding_mode (str, optional) – padding type: 'zeros', 'reflect', 'replicate' or 'circular' (default 'zeros')
- dilation (int or tuple, optional) – spacing between kernel elements (default 1); values greater than 1 give a dilated convolution
- groups (int, optional) – number of blocked connections from input channels to output channels (default 1)
- bias (bool, optional) – if True, adds a learnable bias to the output (default True)
If in_channels=1 and out_channels=2, two convolution kernels are applied to the input image and the output has two channels.
Convolution formula:
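For reference, the Conv2d documentation linked above defines the output for an input of size (N, C_in, H, W) as follows, where ⋆ is the 2D cross-correlation operator, N is the batch size and C denotes the number of channels:

```latex
\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j})
  + \sum_{k=0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)
```

In words: each output channel is the sum, over all input channels, of the input channel cross-correlated with its own kernel slice, plus one bias value per output channel.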
Two-dimensional convolution animation:
https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
When dilation=2, the kernel elements are sampled with a gap of one pixel between them, so a 3x3 kernel covers a 5x5 area of the input.
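A small sketch of the dilation=2 case, reusing the 5x5 matrix and 3x3 kernel from section 2.1 (as float tensors, which is an assumption made here for backend compatibility). The dilated kernel spans dilation * (kernel_size - 1) + 1 = 5 pixels, so it fits the input exactly once.

```python
import torch
import torch.nn.functional as F

image = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]], dtype=torch.float32).reshape(1, 1, 5, 5)
kernel = torch.tensor([[1, 2, 1],
                       [0, 1, 0],
                       [2, 1, 0]], dtype=torch.float32).reshape(1, 1, 3, 3)

# With dilation=2 the kernel reads rows/columns 0, 2 and 4 of the input.
dilated = F.conv2d(image, kernel, dilation=2)
print(dilated.shape)   # torch.Size([1, 1, 1, 1])
print(dilated.item())  # 7.0

# The same value computed by picking every second row and column by hand.
by_hand = (image[0, 0, ::2, ::2] * kernel[0, 0]).sum()
```

The single output value is (1*1 + 0*2 + 1*1) + (1*0 + 1*1 + 0*0) + (2*2 + 0*1 + 1*0) = 7.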
Image two-dimensional convolution Python code:
import torch
import torchvision
from torch import nn
from torch.nn import Conv2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root="G:\\Anaconda\\pycharm_pytorch\\learning_project\\dataset_CIFAR10",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=False)
dataloader = DataLoader(dataset, batch_size=64)

class MyNN(nn.Module):
    def __init__(self):
        super(MyNN, self).__init__()
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

myNN = MyNN()
print(myNN)

writer = SummaryWriter("G:/Anaconda/pycharm_pytorch/learning_project/logs")
step = 0
for data in dataloader:
    imgs, targets = data
    output = myNN(imgs)
    print(imgs.shape)    # torch.Size([64, 3, 32, 32])
    print(output.shape)  # torch.Size([64, 6, 30, 30])
    writer.add_images("input", imgs, step)
    # add_images cannot display 6 channels, so regroup them into 3-channel images:
    # torch.Size([64, 6, 30, 30]) -> torch.Size([xxx, 3, 30, 30])
    output = torch.reshape(output, (-1, 3, 30, 30))
    writer.add_images("output", output, step)
    step = step + 1
writer.close()
After the code runs, enter the following in the terminal to open TensorBoard:
tensorboard --logdir=logs
Each displayed output image contains only three of the six channels produced by the convolution.
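To see how the six output channels end up split across images, the sketch below repeats the reshape on a random batch. The random tensor is an assumed stand-in for one CIFAR10 batch, so no dataset or TensorBoard is needed.

```python
import torch
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)
imgs = torch.rand(64, 3, 32, 32)  # assumed stand-in for one batch of 64 images
output = conv(imgs)
print(output.shape)  # torch.Size([64, 6, 30, 30])

# Regroup the 6 channels of every sample into two 3-channel images.
as_rgb = torch.reshape(output, (-1, 3, 30, 30))
print(as_rgb.shape)  # torch.Size([128, 3, 30, 30])

# Image 0 holds channels 0-2 of sample 0, image 1 holds channels 3-5 of sample 0.
first_half_matches = torch.equal(as_rgb[0], output[0, 0:3])
second_half_matches = torch.equal(as_rgb[1], output[0, 3:6])
```

The batch dimension doubles from 64 to 128 because every sample contributes two 3-channel images.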
3 Pooling Layers
Learning link:
https://pytorch.org/docs/stable/nn.html#pooling-layers
The role of the pooling layer: (1) downsampling, which reduces the data dimensions and the memory consumed by the forward pass of the network; (2) preserving the salient features of the input while enlarging the receptive field of the model; (3) reducing the risk of overfitting.
3.1 Maximum pooling MaxPool2d
Applies 2D max pooling on an input signal consisting of several input planes.
CLASS torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
- kernel_size (Union[int, Tuple[int, int]]) – size of the pooling window to take the maximum over.
- stride (Union[int, Tuple[int, int]]) – stride of the pooling window (default is kernel_size).
- padding (Union[int, Tuple[int, int]]) – implicit negative-infinity padding added on both sides.
- dilation (Union[int, Tuple[int, int]]) – controls the spacing between elements within the window.
- return_indices (bool) – if True, returns the indices of the maxima along with the output. Useful for torch.nn.MaxUnpool2d later.
- ceil_mode (bool) – if True, uses ceil instead of floor to compute the output shape.
ceil means rounding up to the next integer and floor means rounding down; the choice applies when the output size is computed.
In two-dimensional pooling, the window can run past the edge of the input so that it covers only part of it, for example only 6 of its 9 positions. With ceil_mode=True the maximum of those remaining 6 numbers is kept; with ceil_mode=False the partial window is discarded.
The pooling operation differs from the convolution operation in that the default stride of pooling is the size of the pooling kernel. The output of the pooling operation is shown on the right side of the figure; ceil_mode=True and ceil_mode=False produce results of different sizes.
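The effect of ceil_mode on the output size can be sketched with a few lines of arithmetic. The helper name pool_output_size is hypothetical, dilation is assumed to be 1, and the adjustment for padded inputs reflects the rule stated in the MaxPool2d documentation that a window must start inside the input or its left padding.

```python
import math

def pool_output_size(size, kernel_size, stride=None, padding=0, ceil_mode=False):
    """Output length of max pooling along one dimension (dilation assumed to be 1)."""
    stride = kernel_size if stride is None else stride  # default stride = kernel size
    span = size + 2 * padding - kernel_size
    if ceil_mode:
        out = math.ceil(span / stride) + 1
        # Drop the last window if it would start beyond the input and left padding.
        if (out - 1) * stride >= size + padding:
            out -= 1
        return out
    return span // stride + 1

print(pool_output_size(5, 3, ceil_mode=True))   # 2
print(pool_output_size(5, 3, ceil_mode=False))  # 1
```

For a 5x5 input and a 3x3 window, ceil_mode=True keeps the partial windows and yields a 2x2 output, while ceil_mode=False yields a single value.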
Max pooling Python code:
import torch
from torch import nn
from torch.nn import MaxPool2d

input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]], dtype=torch.float32)
input = torch.reshape(input, (-1, 1, 5, 5))
print(input)

class MYNN(nn.Module):
    def __init__(self):
        super(MYNN, self).__init__()
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=True)
        self.maxpool2 = MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, input):
        output1 = self.maxpool1(input)
        output2 = self.maxpool2(input)
        return output1, output2

mynn = MYNN()
output1, output2 = mynn(input)
print(output1)
print(output2)
Run the script to get the output:
tensor([[[[1., 2., 0., 3., 1.],
[0., 1., 2., 3., 1.],
[1., 2., 1., 0., 0.],
[5., 2., 3., 1., 1.],
[2., 1., 0., 1., 1.]]]])
tensor([[[[2., 3.],
[5., 1.]]]])
tensor([[[[2.]]]])
3.2 Image pooling operation
Python code:
import torch
import torchvision
from torch import nn
from torch.nn import MaxPool2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root="G:\\Anaconda\\pycharm_pytorch\\learning_project\\dataset_CIFAR10",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=False)
dataloader = DataLoader(dataset, batch_size=64)

class MYNN(nn.Module):
    def __init__(self):
        super(MYNN, self).__init__()
        self.maxpool = MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, input):
        output = self.maxpool(input)
        return output

mynn = MYNN()
writer = SummaryWriter("G:/Anaconda/pycharm_pytorch/learning_project/logs_maxpool")
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images("input", imgs, step)
    # pooling keeps the 3 channels: torch.Size([64, 3, 32, 32]) -> torch.Size([64, 3, 10, 10])
    output = mynn(imgs)
    writer.add_images("output", output, step)
    step = step + 1
writer.close()
After the code runs, enter the following in the terminal to open TensorBoard:
tensorboard --logdir=logs_maxpool
The output images have a lower resolution after the pooling operation.
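The drop in resolution can be checked without the dataset. The sketch below applies the same pooling layer to a random batch, which is an assumed stand-in for CIFAR10 images.

```python
import torch
from torch.nn import MaxPool2d

imgs = torch.rand(64, 3, 32, 32)  # assumed stand-in for one batch of images
pool = MaxPool2d(kernel_size=3, ceil_mode=False)
pooled = pool(imgs)
print(pooled.shape)  # torch.Size([64, 3, 10, 10])

# Each output pixel is the maximum of one non-overlapping 3x3 block of the input.
top_left_block_max = imgs[0, 0, 0:3, 0:3].max()
```

With ceil_mode=False the last two rows and columns of the 32x32 image do not fill a complete window and are ignored, which gives 10x10 outputs.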
4 Padding Layers
Learning link:
https://pytorch.org/docs/stable/nn.html#padding-layers
Commonly used classes:
Name | Description |
---|---|
nn.ZeroPad2d | Pads the input tensor boundaries with zeros. |
nn.ConstantPad2d | Pads the input tensor boundaries with a constant value. |
Padding can also be set through the padding argument of other layers (for example Conv2d), so a separate padding layer is often unnecessary.
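A minimal sketch of both padding layers, and of the equivalence between an explicit ZeroPad2d layer and the padding argument of a convolution. The 3x3 input and the all-ones kernel are assumptions chosen only to keep the example small.

```python
import torch
from torch import nn
import torch.nn.functional as F

x = torch.arange(1.0, 10.0).reshape(1, 1, 3, 3)

zero_padded = nn.ZeroPad2d(1)(x)             # border of zeros, 1 pixel wide
const_padded = nn.ConstantPad2d(1, 9.0)(x)   # border filled with the constant 9
print(zero_padded.shape)  # torch.Size([1, 1, 5, 5])

# Padding first and convolving afterwards matches convolving with padding=1.
kernel = torch.ones(1, 1, 3, 3)
explicit = F.conv2d(zero_padded, kernel)
builtin = F.conv2d(x, kernel, padding=1)
print(torch.allclose(explicit, builtin))  # True
```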
5 Non-linear Activations (weighted sum, nonlinearity)
Non-linear activations introduce nonlinear features into the neural network.
Name | Description |
---|---|
nn.ReLU | Applies the rectified linear unit function element-wise. |
nn.Sigmoid | Applies the sigmoid function element-wise. |
5.1 Activation function
5.1.1 ReLU
Applies the rectified linear unit function element-wise: ReLU(x) = max(0, x)
CLASS torch.nn.ReLU(inplace=False)
Parameters:
- inplace (bool) – whether to perform the operation in-place (default False).
Shape:
- Input: (∗), where ∗ refers to any number of dimensions.
- Output: (∗), same shape as input.
5.1.2 Sigmoid
Applies the sigmoid function element-wise: Sigmoid(x) = 1 / (1 + exp(-x))
CLASS torch.nn.Sigmoid(*args, **kwargs)
Shape:
- Input: (∗), where ∗ refers to any number of dimensions.
- Output: (∗), the same shape as the input.
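Both activations are simple element-wise formulas, ReLU(x) = max(0, x) and Sigmoid(x) = 1 / (1 + exp(-x)). The sketch below writes them out by hand and compares them with the nn modules; the sample values are an assumption chosen for illustration.

```python
import torch
from torch import nn

x = torch.tensor([[1.0, -0.5],
                  [-1.0, 3.0]])

# ReLU: negative values become 0, non-negative values pass through unchanged.
relu_manual = torch.clamp(x, min=0)
# Sigmoid: squashes every value into the open interval (0, 1).
sigmoid_manual = 1 / (1 + torch.exp(-x))

relu_module = nn.ReLU()(x)
sigmoid_module = nn.Sigmoid()(x)
print(torch.allclose(relu_manual, relu_module))        # True
print(torch.allclose(sigmoid_manual, sigmoid_module))  # True
```

The output shape equals the input shape in both cases, as stated in the Shape entries above.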
5.2 Testing the activation functions on numeric input
Python code:
import torch
from torch import nn
from torch.nn import ReLU
from torch.nn import Sigmoid

input = torch.tensor([[1, -0.5],
                      [-1, 3]])
# the reshaped copy is only printed; the network below receives the original 2x2 tensor
output = torch.reshape(input, (-1, 1, 2, 2))
print(output)

class MYNN(nn.Module):
    def __init__(self):
        super(MYNN, self).__init__()
        self.relu1 = ReLU()
        self.sigmod1 = Sigmoid()

    def forward(self, input):
        output = self.relu1(input)
        output2 = self.sigmod1(input)
        return output, output2

mynn = MYNN()
output, output2 = mynn(input)
print(output)
print(output2)
Output result:
tensor([[[[ 1.0000, -0.5000],
[-1.0000, 3.0000]]]])
tensor([[1., 0.],
[0., 3.]])
tensor([[0.7311, 0.3775],
[0.2689, 0.9526]])
5.3 Image nonlinear activation operation
The Python code for the image nonlinear activation operation:
# Show the effect of the relu and sigmoid non-linear activation functions on images
import torch
import torchvision
from torch import nn
from torch.nn import ReLU
from torch.nn import Sigmoid
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root="G:\\Anaconda\\pycharm_pytorch\\learning_project\\dataset_CIFAR10",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=False)
dataloader = DataLoader(dataset, batch_size=64)

class MYNN(nn.Module):
    def __init__(self):
        super(MYNN, self).__init__()
        self.relu1 = ReLU()
        self.sigmod1 = Sigmoid()

    def forward(self, input):
        output_relu = self.relu1(input)
        output_sigmod = self.sigmod1(input)
        return output_relu, output_sigmod

mynn = MYNN()
writer = SummaryWriter("G:/Anaconda/pycharm_pytorch/learning_project/logs_relu")
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images("input", imgs, step)
    output_relu, output_sigmod = mynn(imgs)
    writer.add_images("output_relu", output_relu, step)
    writer.add_images("output_sigmod", output_sigmod, step)
    step += 1
    print(step)
writer.close()
print("Done")
After the code runs, enter the following in the terminal to open TensorBoard:
tensorboard --logdir=logs_relu
ReLU only sets negative values to 0, and the image values are all non-negative (ToTensor scales them from 0-255 to the range [0, 1]), so there is no difference between input and output_relu. Sigmoid remaps every pixel value non-linearly, compressing [0, 1] into roughly [0.5, 0.73], so output_sigmod looks washed out and gray.
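Both observations can be verified numerically. In the sketch below a random tensor with values in [0, 1) stands in for a batch produced by ToTensor; this stand-in is an assumption made so the example runs without the dataset.

```python
import torch
from torch import nn

imgs = torch.rand(4, 3, 32, 32)  # values in [0, 1), like ToTensor output

relu_out = nn.ReLU()(imgs)
sigmoid_out = nn.Sigmoid()(imgs)

# No negative values to clip, so ReLU leaves the images untouched.
print(torch.equal(relu_out, imgs))  # True
# Sigmoid maps 0 to 0.5 and 1 to about 0.731, so contrast is strongly reduced.
print(sigmoid_out.min().item(), sigmoid_out.max().item())
```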
6 Normalization Layers
Learning link:
https://pytorch.org/docs/stable/nn.html#normalization-layers
Normalization (not to be confused with regularization) rescales the inputs of a layer to a common range and is a step that speeds up neural network training.
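As a minimal sketch of what a normalization layer does, the example below uses nn.BatchNorm2d, one of the layers listed on the linked page; choosing this particular layer and the input statistics (mean 10, standard deviation 5) are assumptions made for illustration. In training mode it normalizes each channel over the batch and spatial dimensions.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(8, 3, 4, 4) * 5 + 10  # assumed input with mean ~10 and std ~5

bn = nn.BatchNorm2d(num_features=3)   # a new module starts in training mode
y = bn(x)

# Per-channel statistics after normalization: mean close to 0, variance close to 1.
channel_mean = y.mean(dim=(0, 2, 3))
channel_var = y.var(dim=(0, 2, 3), unbiased=False)
print(channel_mean)
print(channel_var)
```

The shape of the output is the same as that of the input; only the value distribution of each channel changes.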