PyTorch (6) - Using nn.Module, the basic skeleton of a neural network

1 Neural Network Framework

1.1 Use of the Module class

NN (Neural network): Neural Network
Containers: Container
Convolution Layers: Convolutional Layer
Pooling Layers: Pooling Layer
Padding Layers: Padding layer
Non-linear Activations (weighted sum, nonlinearity): Non-linear activation
Non-linear Activations (other): Non-linear activation (other)
Normalization Layers: Normalization layer
Recurrent Layers: Recurrent layer
Transformer Layers: Transformer layer
Linear Layers: Linear layer
Dropout Layers: Dropout layer
Loss Functions: Loss function
...


Containers include:
(1) Module: The base class for all neural networks

https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module

CLASS torch.nn.Module(*args, **kwargs)

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, inputX):
        x = F.relu(self.conv1(inputX))
        return F.relu(self.conv2(x))  # feed the result of conv1, not the raw input

In the forward function, relu() is the activation function and conv1 and conv2 are the convolution layers. The data flows as: inputX -> convolution (conv1) -> non-linearity (relu) -> convolution (conv2) -> non-linearity (relu).
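
To make this data flow concrete, the sketch below runs the model on a dummy input and checks the resulting shape. The 28x28 input size and the names `dummy` and `out` are assumptions chosen for illustration. Each 5x5 convolution without padding shrinks the height and width by 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)   # 1 input channel -> 20 channels
        self.conv2 = nn.Conv2d(20, 20, 5)  # 20 channels -> 20 channels

    def forward(self, x):
        x = F.relu(self.conv1(x))          # convolution -> relu
        return F.relu(self.conv2(x))       # convolution -> relu


model = Model()
dummy = torch.randn(1, 1, 28, 28)  # assumed input: batch of 1, 1 channel, 28x28
out = model(dummy)
print(out.shape)                   # torch.Size([1, 20, 20, 20])
min_value = out.min().item()       # relu guarantees this is never negative
```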

Python code for a minimal custom module:

from torch import nn
import torch


class MyNN(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, inputX):
        outputX = inputX + 1
        return outputX

mynn = MyNN()
x = torch.tensor(1.0)
output = mynn(x)
print(output)

Output result:

tensor(2.)

2 Convolution Layers

2.1 Two-dimensional convolution calculation



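Both the input and the kernel passed to the two-dimensional convolution conv2d() are expected as four-dimensional tensors. The input has shape (N, C_{in}, H_{in}, W_{in}) and the kernel has shape (C_{out}, C_{in}/groups, kH, kW), which is why the code below reshapes both tensors to four dimensions.

For example, with a 1024x800 input image and a 3x3 convolution kernel, each step multiplies 9 pairs of elements and adds the products. The kernel moves to the right one step at a time until it reaches the right edge, then moves down and starts again from the left. Once it reaches the bottom, the convolution is complete.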

import torch
import torch.nn.functional as F

input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]])
kernel = torch.tensor([[1, 2, 1],
                       [0, 1, 0],
                       [2, 1, 0]])
input = torch.reshape(input, (1, 1, 5, 5))
kernel = torch.reshape(kernel, (1, 1, 3, 3))

print("input:")
print(input)
print("kernel:")
print(kernel)

output = F.conv2d(input, kernel, stride=1)
print("output:")
print(output)

Output result:

input:
tensor([[[[1, 2, 0, 3, 1],
          [0, 1, 2, 3, 1],
          [1, 2, 1, 0, 0],
          [5, 2, 3, 1, 1],
          [2, 1, 0, 1, 1]]]])
kernel:
tensor([[[[1, 2, 1],
          [0, 1, 0],
          [2, 1, 0]]]])
output:
tensor([[[[10, 12, 12],
          [18, 16, 16],
          [13,  9,  3]]]])
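
The sliding-window computation can also be written out by hand to check these numbers. The following is a minimal sketch in plain Python, assuming a square kernel and no padding. The function name `conv2d_manual` and the variables `image` and `weights` are hypothetical and not part of PyTorch. Like F.conv2d, it multiplies the kernel with each window without flipping the kernel (cross-correlation).

```python
def conv2d_manual(image, kernel, stride=1):
    """Slide a square kernel over a 2D list and sum the element-wise products."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    result = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # multiply the k*k window with the kernel and add everything up
            total = sum(image[top + a][left + b] * kernel[a][b]
                        for a in range(k) for b in range(k))
            row.append(total)
        result.append(row)
    return result


image = [[1, 2, 0, 3, 1],
         [0, 1, 2, 3, 1],
         [1, 2, 1, 0, 0],
         [5, 2, 3, 1, 1],
         [2, 1, 0, 1, 1]]
weights = [[1, 2, 1],
           [0, 1, 0],
           [2, 1, 0]]

print(conv2d_manual(image, weights))  # [[10, 12, 12], [18, 16, 16], [13, 9, 3]]
```

The top-left value, for instance, is 1*1 + 2*2 + 0*1 + 0*0 + 1*1 + 2*0 + 1*2 + 2*1 + 1*0 = 10.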

If the stride is changed to 2:

output2 = F.conv2d(input, kernel, stride=2)
print("output2:")
print(output2)

The output is:

output2:
tensor([[[[10, 12],
          [13,  3]]]])

Padding surrounds the original image with a border of zeros, so the output of the convolution becomes larger. With padding=1, the 5x5 input is treated as a 7x7 image and the output is 5x5.

output3 = F.conv2d(input, kernel, stride=1, padding=1)
print("output3:")
print(output3)

Output result:

output3:
tensor([[[[ 1,  3,  4, 10,  8],
          [ 5, 10, 12, 12,  6],
          [ 7, 18, 16, 16,  8],
          [11, 13,  9,  3,  4],
          [14, 13,  9,  7,  4]]]])
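
The output sizes seen above (3x3, 2x2 and 5x5) follow from the output-size formula in the torch.nn.Conv2d documentation. The helper below is a small sketch of that formula for one dimension. The name `conv_output_size` is hypothetical and not a PyTorch function.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one dimension of a convolution."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


print(conv_output_size(5, 3))             # 3 (stride=1, no padding)
print(conv_output_size(5, 3, stride=2))   # 2
print(conv_output_size(5, 3, padding=1))  # 5
```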

2.2 Image convolution operation

Learning link:

https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d

CLASS torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

  • in_channels (int) – number of channels in the input image
  • out_channels (int) – number of channels produced by the convolution
  • kernel_size (int or tuple) – size of the convolution kernel
  • stride (int or tuple, optional) – stride of the convolution (default 1)
  • padding (int, tuple or str, optional) – padding added to all four sides of the input (default 0)
  • padding_mode (str, optional) – padding type: 'zeros', 'reflect', 'replicate' or 'circular' (default 'zeros')
  • dilation (int or tuple, optional) – spacing between kernel elements (default 1); values above 1 give a dilated convolution
  • groups (int, optional) – number of blocked connections from input channels to output channels (default 1)
  • bias (bool, optional) – if True, adds a learnable bias to the output (default True)

If in_channels=1 and out_channels=2, two convolution kernels are applied to the input image, producing an output with two channels.
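
A quick way to see the two kernels is to inspect the layer's weight tensor. The sketch below uses an assumed 3x3 kernel and a random 5x5 input chosen only for illustration.

```python
import torch
from torch import nn

conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)
print(conv.weight.shape)  # torch.Size([2, 1, 3, 3]): two 3x3 kernels, one per output channel
print(conv.bias.shape)    # torch.Size([2]): one bias per output channel

image = torch.randn(1, 1, 5, 5)  # assumed input: one single-channel 5x5 image
out = conv(image)
print(out.shape)          # torch.Size([1, 2, 3, 3]): two output channels
```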

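Convolution formula, as given in the torch.nn.Conv2d documentation, where ⋆ is the 2D cross-correlation operator:
out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k=0}^{C_{in}-1} weight(C_{out_j}, k) ⋆ input(N_i, k)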
Two-dimensional convolution animation:

https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md


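When dilation=2, the kernel elements are applied with a gap of one pixel between them, so a 3x3 kernel covers a 5x5 region of the input (see the dilated convolution animations at the link above).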
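
The effect of dilation on the covered region can be checked with a short sketch. The 7x7 input size is an assumption for illustration. The effective kernel extent is dilation * (kernel_size - 1) + 1.

```python
import torch
from torch import nn

conv = nn.Conv2d(1, 1, kernel_size=3, dilation=2)
effective = 2 * (3 - 1) + 1           # a 3x3 kernel with dilation=2 spans 5 pixels
out = conv(torch.randn(1, 1, 7, 7))   # assumed 7x7 single-channel input
print(effective)                      # 5
print(out.shape)                      # torch.Size([1, 1, 3, 3]), since 7 - 5 + 1 = 3
```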

Python code for two-dimensional convolution of images:

import torch
import torchvision
from torch import nn
from torch.nn import Conv2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root="G:\\Anaconda\\pycharm_pytorch\\learning_project\\dataset_CIFAR10",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=False)
dataloader = DataLoader(dataset, batch_size=64)

class MyNN(nn.Module):
    def __init__(self):
        super(MyNN, self).__init__()
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

myNN = MyNN()
print(myNN)

writer = SummaryWriter("G:/Anaconda/pycharm_pytorch/learning_project/logs")
step = 0
for data in dataloader:
    imgs, targets = data
    output = myNN(imgs)
    print(imgs.shape)       # torch.Size([64, 3, 32, 32])
    print(output.shape)     # torch.Size([64, 6, 30, 30])
    writer.add_images("input", imgs, step)
    # A 6-channel image cannot be displayed, so regroup the channels in threes:
    # torch.Size([64, 6, 30, 30]) -> torch.Size([xxx, 3, 30, 30])
    output = torch.reshape(output, (-1, 3, 30, 30))
    writer.add_images("output", output, step)

    step = step + 1

writer.close()

After the code runs, enter tensorboard --logdir=logs in the terminal to open TensorBoard.
Because the 6 output channels were reshaped into groups of 3, each displayed output image shows only part of the channels produced by the convolution.
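
What the reshape does to the batch can be verified on a random tensor of the same shape. This is a sketch, and the variable names are illustrative. With -1 in the first position, PyTorch infers a batch size of 128, and each original 6-channel sample is split into two consecutive 3-channel images.

```python
import torch

output = torch.randn(64, 6, 30, 30)                 # same shape as the convolution output
reshaped = torch.reshape(output, (-1, 3, 30, 30))
print(reshaped.shape)                               # torch.Size([128, 3, 30, 30])

# channels 0-2 of sample 0 become image 0, channels 3-5 become image 1
same_first = torch.equal(reshaped[0], output[0, :3])
same_second = torch.equal(reshaped[1], output[0, 3:])
print(same_first, same_second)                      # True True
```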

3 Pooling Layers

Learning link:

https://pytorch.org/docs/stable/nn.html#pooling-layers

The role of the pooling layer: (1) downsampling, which reduces the data dimensions and the memory consumed by the network's forward pass; (2) preserving the main input features while enlarging the receptive field of the network; (3) helping to reduce overfitting.

3.1 Maximum pooling MaxPool2d

Applies 2D max pooling on an input signal consisting of several input planes.

CLASS torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

  • kernel_size (Union[int, Tuple[int, int]]) – size of the pooling window over which the maximum is taken.
  • stride (Union[int, Tuple[int, int]]) – stride of the pooling window (default is kernel_size).
  • padding (Union[int, Tuple[int, int]]) – implicit negative-infinity padding added on both sides.
  • dilation (Union[int, Tuple[int, int]]) – controls the spacing between elements within the window.
  • return_indices (bool) – if True, returns the indices of the maxima along with the output. Useful for torch.nn.MaxUnpool2d later.
  • ceil_mode (bool) – if True, uses ceil instead of floor to compute the output shape.

Ceil means rounding up (ceiling) and floor means rounding down when the computed output size is not an integer.

In two-dimensional pooling, the window can extend past the edge of the input so that it covers only part of a full window, for example only 6 remaining numbers. With ceil_mode=True, the maximum of those remaining numbers is kept as an output value. With ceil_mode=False, that partial window is discarded.

Pooling differs from convolution in that the default stride of pooling is the size of the pooling kernel. As a result, ceil_mode=True and ceil_mode=False produce outputs of different sizes.
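
The rounding rule can be sketched as a small helper for one dimension, assuming no padding and no dilation. The name `pool_output_size` is hypothetical. The extra check mirrors the rule in the PyTorch documentation that a window must start inside the input.

```python
import math


def pool_output_size(size, kernel_size, stride=None, ceil_mode=False):
    """Output length along one dimension for pooling without padding or dilation."""
    stride = kernel_size if stride is None else stride  # default stride is kernel_size
    steps = (size - kernel_size) / stride
    n = (math.ceil(steps) if ceil_mode else math.floor(steps)) + 1
    # with ceil_mode, drop a last window that would start beyond the input
    if ceil_mode and (n - 1) * stride >= size:
        n -= 1
    return n


print(pool_output_size(5, 3, ceil_mode=True))   # 2
print(pool_output_size(5, 3, ceil_mode=False))  # 1
```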

Python code for max pooling:

import torch
from torch import nn
from torch.nn import MaxPool2d

input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]], dtype=torch.float32)

input = torch.reshape(input, (-1, 1, 5, 5))
print(input)

class MYNN(nn.Module):
    def __init__(self):
        super(MYNN,self).__init__()
        self.maxpool1 = MaxPool2d(kernel_size=3,ceil_mode=True)
        self.maxpool2 = MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, input):
        output1 = self.maxpool1(input)
        output2 = self.maxpool2(input)
        return output1, output2

mynn = MYNN()
output1, output2 = mynn(input)
print(output1)
print(output2)

Run the script to get the output:

tensor([[[[1., 2., 0., 3., 1.],
          [0., 1., 2., 3., 1.],
          [1., 2., 1., 0., 0.],
          [5., 2., 3., 1., 1.],
          [2., 1., 0., 1., 1.]]]])
tensor([[[[2., 3.],
          [5., 1.]]]])
tensor([[[[2.]]]])
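
Both results can be reproduced by hand. The sketch below is a plain-Python version of max pooling with stride equal to the kernel size. The function `max_pool2d_manual` and the variable `image` are hypothetical names used only for illustration.

```python
def max_pool2d_manual(image, kernel_size, ceil_mode=False):
    """Max pooling over a 2D list, with stride equal to kernel_size."""
    height, width = len(image), len(image[0])
    result = []
    for top in range(0, height, kernel_size):
        if top + kernel_size > height and not ceil_mode:
            break  # floor mode: discard the partial window at the bottom
        row = []
        for left in range(0, width, kernel_size):
            if left + kernel_size > width and not ceil_mode:
                break  # floor mode: discard the partial window on the right
            window = [image[r][c]
                      for r in range(top, min(top + kernel_size, height))
                      for c in range(left, min(left + kernel_size, width))]
            row.append(max(window))
        result.append(row)
    return result


image = [[1, 2, 0, 3, 1],
         [0, 1, 2, 3, 1],
         [1, 2, 1, 0, 0],
         [5, 2, 3, 1, 1],
         [2, 1, 0, 1, 1]]

print(max_pool2d_manual(image, 3, ceil_mode=True))   # [[2, 3], [5, 1]]
print(max_pool2d_manual(image, 3, ceil_mode=False))  # [[2]]
```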

3.2 Image pooling operation

Python code:

import torch
import torchvision
from torch import nn
from torch.nn import MaxPool2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root="G:\\Anaconda\\pycharm_pytorch\\learning_project\\dataset_CIFAR10",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=False)
dataloader = DataLoader(dataset, batch_size=64)


class MYNN(nn.Module):
    def __init__(self):
        super(MYNN,self).__init__()
        self.maxpool = MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, input):
        output = self.maxpool(input)
        return output


mynn = MYNN()
writer = SummaryWriter("G:/Anaconda/pycharm_pytorch/learning_project/logs_maxpool")
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images("input", imgs, step)
    # torch.Size([64, 3, 32, 32]) -> torch.Size([64, 3, 10, 10])
    output = mynn(imgs)
    writer.add_images("output", output, step)
    step = step + 1

writer.close()

After the code runs, enter tensorboard --logdir=logs_maxpool in the terminal to open TensorBoard. The output images have a lower resolution after the pooling operation.
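
The drop in resolution can be confirmed without the dataset by pooling a random batch of the same shape as a CIFAR10 batch. This is a sketch, and `batch` and `pooled` are illustrative names.

```python
import torch
from torch.nn import MaxPool2d

pool = MaxPool2d(kernel_size=3, ceil_mode=False)
batch = torch.rand(64, 3, 32, 32)  # stand-in for one batch of CIFAR10 images
pooled = pool(batch)
print(pooled.shape)                # torch.Size([64, 3, 10, 10])
```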

4 Padding Layers

Learning link:

https://pytorch.org/docs/stable/nn.html#padding-layers

Commonly used classes:

Class name          Description
nn.ZeroPad2d        Pads the input tensor boundaries with zeros.
nn.ConstantPad2d    Pads the input tensor boundaries with a constant value.

Padding can also be applied through the padding parameter of other layers such as Conv2d, so a separate padding layer is often unnecessary.
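
For reference, a minimal sketch of the two padding layers on a 2x2 tensor of ones. The padding width of 1 and the constant 9.0 are arbitrary values chosen for illustration.

```python
import torch
from torch import nn

x = torch.ones(1, 1, 2, 2)
zero_padded = nn.ZeroPad2d(1)(x)            # one ring of zeros around the 2x2 block
const_padded = nn.ConstantPad2d(1, 9.0)(x)  # one ring of the constant 9.0

print(zero_padded.shape)  # torch.Size([1, 1, 4, 4])
print(zero_padded[0, 0])
print(const_padded[0, 0])
```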

5 Non-linear Activations (weighted sum, nonlinearity)

These layers introduce non-linearity into the neural network.

Class name    Description
nn.ReLU       Applies the rectified linear unit function element-wise.
nn.Sigmoid    Applies the sigmoid function element-wise.

5.1 Activation function

5.1.1 ReLU

Applies the rectified linear unit function element-wise:
ReLU(x) = max(0, x)

CLASS torch.nn.ReLU(inplace=False)

Parameters:

  • inplace (bool) – whether to perform the operation in place, overwriting the input (default False).

Shape:

  • Input: (∗), where ∗ refers to any number of dimensions.
  • Output: (∗), same shape as input.
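
The difference made by inplace can be seen in a short sketch. The tensor values are arbitrary. With the default inplace=False the input is left untouched, while inplace=True overwrites it.

```python
import torch
from torch import nn

x = torch.tensor([-1.0, 2.0])

y = nn.ReLU()(x)              # default: returns a new tensor
print(x.tolist())             # [-1.0, 2.0], x is unchanged
print(y.tolist())             # [0.0, 2.0]

z = nn.ReLU(inplace=True)(x)  # overwrites x itself
print(x.tolist())             # [0.0, 2.0]
```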

5.1.2 Sigmoid

Applies the sigmoid function element-wise:
Sigmoid(x) = 1 / (1 + exp(-x))

CLASS torch.nn.Sigmoid(*args, **kwargs)

Shape:

  • Input: (∗), where ∗ refers to any number of dimensions.
  • Output: (∗), the same shape as the input.


5.2 Testing the activation functions on numeric input

Python code:

import torch
from torch import nn
from torch.nn import ReLU
from torch.nn import Sigmoid

input = torch.tensor([[1, -0.5],
                      [-1, 3]])

output = torch.reshape(input, (-1, 1, 2, 2))
print(output)

class MYNN(nn.Module):
    def __init__(self):
        super(MYNN, self).__init__()
        self.relu1 = ReLU()
        self.sigmod1 = Sigmoid()

    def forward(self, input):
        output = self.relu1(input)
        output2 = self.sigmod1(input)
        return output, output2

mynn = MYNN()
output, output2 = mynn(input)
print(output)
print(output2)

Output result:

tensor([[[[ 1.0000, -0.5000],
          [-1.0000,  3.0000]]]])
tensor([[1., 0.],
        [0., 3.]])
tensor([[0.7311, 0.3775],
        [0.2689, 0.9526]])
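
These numbers can be checked against the definitions ReLU(x) = max(0, x) and Sigmoid(x) = 1 / (1 + exp(-x)). The sketch below uses plain Python. The helper names `relu` and `sigmoid` are illustrative and not the PyTorch classes.

```python
import math


def relu(x):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


values = [1.0, -0.5, -1.0, 3.0]
print([relu(v) for v in values])               # [1.0, 0.0, 0.0, 3.0]
print([round(sigmoid(v), 4) for v in values])  # [0.7311, 0.3775, 0.2689, 0.9526]
```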

5.3 Image nonlinear activation operation

Python code for applying non-linear activations to images:

# Show the effect of the ReLU and Sigmoid non-linear activation functions on images
import torch
import torchvision
from torch import nn
from torch.nn import ReLU
from torch.nn import Sigmoid
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root="G:\\Anaconda\\pycharm_pytorch\\learning_project\\dataset_CIFAR10",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=False)
dataloader = DataLoader(dataset, batch_size=64)

class MYNN(nn.Module):
    def __init__(self):
        super(MYNN, self).__init__()
        self.relu1 = ReLU()
        self.sigmod1 = Sigmoid()

    def forward(self, input):
        output_relu = self.relu1(input)
        output_sigmod = self.sigmod1(input)
        return output_relu, output_sigmod

mynn = MYNN()
writer = SummaryWriter("G:/Anaconda/pycharm_pytorch/learning_project/logs_relu")
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images("input", imgs, step)
    output_relu, output_sigmod = mynn(imgs)
    writer.add_images("output_relu", output_relu, step)
    writer.add_images("output_sigmod", output_sigmod, step)
    step += 1
    print(step)

writer.close()
print("Done")

After the code runs, enter tensorboard --logdir=logs_relu in the terminal to open TensorBoard.

ReLU sets negative values to 0, but the pixel values of the image are all non-negative (ToTensor scales them from 0-255 to the range 0-1), so there is no difference between input and output_relu. Sigmoid maps every pixel value through 1 / (1 + exp(-x)), compressing the range 0-1 to roughly 0.5-0.73, so the image looks grayer and has less contrast.
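
The same behaviour can be observed on random data standing in for the images. This is a sketch that assumes pixel values in the range 0-1, as produced by ToTensor. The batch size of 4 is arbitrary.

```python
import torch

imgs = torch.rand(4, 3, 32, 32)       # stand-in images with values in [0, 1)
relu_out = torch.relu(imgs)
sigmoid_out = torch.sigmoid(imgs)

print(torch.equal(relu_out, imgs))    # True: nothing is negative, so ReLU changes nothing
print(sigmoid_out.min().item(), sigmoid_out.max().item())  # both between 0.5 and about 0.73
```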

6 Normalization Layers

Learning link:

https://pytorch.org/docs/stable/nn.html#normalization-layers

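Normalization layers, such as nn.BatchNorm2d, rescale the data flowing through the network, which can speed up neural network training. Normalization should not be confused with regularization, which refers to techniques for reducing overfitting.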
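
As a brief illustration, the sketch below applies nn.BatchNorm2d, one of the layers listed at the link above, to random data. The input shape, the scaling of the data and the seed are assumptions chosen for the example. In training mode the layer normalizes each channel to roughly zero mean and unit variance.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(8, 3, 4, 4) * 5 + 10  # assumed input with mean about 10, std about 5
bn = nn.BatchNorm2d(num_features=3)   # one mean/variance pair per channel
y = bn(x)                             # a new module is in training mode by default

channel_mean = y.mean(dim=(0, 2, 3))
channel_var = y.var(dim=(0, 2, 3), unbiased=False)
print(channel_mean)                   # close to 0 for every channel
print(channel_var)                    # close to 1 for every channel
```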

Origin blog.csdn.net/qq_45362336/article/details/131999099