Lecture 10: Basic CNN

1 Fully connected network

A fully connected neural network, also known as a multi-layer perceptron (MLP), is a basic artificial neural network model. In a fully connected network, every neuron is connected to all neurons in the previous layer. Each connection has a weight: a neuron computes the weighted sum of its input signals, applies a nonlinear activation function, and outputs the result.

However, because the connections between neurons are fully connected, the model has a very large number of parameters and is prone to problems such as overfitting. In practice, fully connected networks have therefore gradually been replaced by other types of neural network models, such as convolutional neural networks and recurrent neural networks.

The following is a schematic diagram of a fully connected neural network:

[Figure: schematic diagram of a fully connected neural network]
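For comparison with the CNN introduced next, here is a minimal PyTorch sketch of such a network (the 784/256/10 layer sizes are illustrative choices for 28×28 input images, not from the original lecture):

import torch

# A minimal fully connected network (MLP) sketch for 28x28 inputs.
# The hidden size of 256 is an arbitrary illustrative choice.
mlp = torch.nn.Sequential(
    torch.nn.Flatten(),            # (batch, 1, 28, 28) -> (batch, 784)
    torch.nn.Linear(784, 256),     # every input connects to every hidden unit
    torch.nn.ReLU(),               # nonlinear activation
    torch.nn.Linear(256, 10),      # hidden layer to 10 output scores
)

x = torch.randn(1, 1, 28, 28)      # a dummy image
print(mlp(x).shape)                # torch.Size([1, 10])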

2 Convolutional Neural Networks

2.1 What is a Convolutional Neural Network

A Convolutional Neural Network (CNN) is a feedforward neural network. It extracts image features through convolution operations and then reduces the size and dimensionality of the resulting feature maps through pooling operations.

2.2 Convolutional neural network or fully connected network?

Convolutional neural networks have the following advantages over fully connected networks:

Parameter sharing: each convolution kernel slides over the entire input, so the same parameters are reused at every position. The model therefore has far fewer parameters, trains faster, and generalizes better.

Translation robustness: convolution is a local operation, and combined with pooling it makes the network robust to small translations of objects in the image (note that plain convolutions are not inherently invariant to rotation or scaling). This robustness is one of the important reasons why CNNs can be applied so successfully to tasks such as image recognition.

Deep composability: a CNN consists of multiple convolutional and pooling layers, each of which extracts features at a different level of abstraction. Stacking layers allows the network to perform more complex feature extraction and classification.

In summary, convolutional neural networks offer better parameter efficiency, translation robustness, and depth composability than fully connected networks.
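To make the parameter-efficiency point concrete, the following sketch compares the parameter counts of a single fully connected layer on a 28×28 image with a 5×5 convolutional layer (the layer sizes are illustrative assumptions):

import torch

fc = torch.nn.Linear(28 * 28, 256)            # fully connected: 784*256 weights + 256 biases
conv = torch.nn.Conv2d(1, 10, kernel_size=5)  # conv: 10*1*5*5 weights + 10 biases

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(fc))    # 200960
print(count(conv))  # 260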

2.3 Convolutional Neural Network Execution Process

[Figure: CNN execution pipeline on a 1×28×28 input]
Note: in the input image shape 1×28×28, the numbers mean channels × width × height.

The role of each layer in the above process:
The C1 layer is a convolutional layer whose role is to preserve the spatial structure of the image. Image data is often stored as a flat one-dimensional array by default, which discards the image's spatial structure; the convolutional layer operates on the two-dimensional layout directly and preserves this structure.

The S1 layer performs subsampling (downsampling), whose role is to reduce the amount of data and the computational load. Downsampling shrinks the feature maps, reducing the number of values that subsequent layers must process and thereby improving the computational efficiency of the network.

Feature extraction: a CNN learns features automatically; through successive convolution and pooling layers it extracts progressively more abstract and meaningful representations of the input data.

Classification: these features are then mapped to different categories to classify the input data.

A CNN extracts image features with its convolutional layers, flattens the final feature map into a one-dimensional vector, and maps it to the dimension of the target output space through a fully connected layer, for example to a ten-dimensional output for ten classes. The softmax function then converts the model's scores into a probability distribution, and the cross-entropy loss compares this distribution with the true label, solving the classification problem.
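In PyTorch, this last step is typically implemented with torch.nn.CrossEntropyLoss, which applies log-softmax internally, so the network only needs to output raw scores. A minimal sketch with made-up logits and labels:

import torch

criterion = torch.nn.CrossEntropyLoss()  # combines log-softmax and the NLL loss

logits = torch.randn(8, 10)              # fake network output: batch of 8, 10 classes
labels = torch.randint(0, 10, (8,))      # fake ground-truth class indices
loss = criterion(logits, labels)
print(loss.item())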

2.4 Calculation process of convolution

insert image description here
Put simply, an image is split into three channels: red, green, and blue. A small region of the image, called a patch, is then taken, and the convolution is computed on that patch.

In convolutional neural networks, features are extracted by taking a small region of the image (a patch) and convolving it with a kernel. The kernel is usually square or rectangular, with a size specified by the user. During the convolution, the kernel slides over the image and, at each position, computes a dot product with the patch underneath it, producing one output value. Changing the size and shape of the kernel changes which feature information is extracted.
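To see that each output value is just a dot product between a patch and the kernel, here is a hand-computed check (the numbers reuse the top-left 3×3 patch of the 5×5 input and the 1-to-9 kernel from the code in Section 2.5):

import torch

patch = torch.tensor([[3., 4., 6.],
                      [2., 4., 6.],
                      [1., 6., 7.]])   # a 3x3 patch taken from the image
kernel = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.],
                       [7., 8., 9.]])  # a 3x3 convolution kernel

# Element-wise multiply and sum: one value of the output feature map.
print((patch * kernel).sum())          # tensor(211.)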

The convolution operation step by step:
[Figure: a 3×3 kernel sliding over the input]
Note: the red box in the figure above corresponds to one patch.

Final result:
[Figure: the resulting output feature map]

Convolution with a multi-channel input:
[Figure: each input channel convolved with its own kernel channel]
Note: the number of channels of the convolution kernel must match the number of input channels.

[Figure: the per-channel results summed into a single output channel]

If you want m output channels, you need m such convolution kernels:
[Figure: m kernels producing m output channels]

The kernels are assembled into a single weight tensor of shape m × n × kernel width × kernel height, where n is the number of input channels and m is the number of kernels (output channels).
For example, here m = 4, so the kernels form a four-dimensional tensor:
[Figure: the four-dimensional kernel tensor of shape m × n × kernel width × kernel height]
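This is easy to verify in PyTorch: the weight of Conv2d(in_channels, out_channels, ...) has shape out_channels × in_channels × kernel height × kernel width. A quick check with illustrative sizes (n = 3 input channels, m = 4 kernels):

import torch

conv = torch.nn.Conv2d(in_channels=3, out_channels=4, kernel_size=5)
print(conv.weight.shape)  # torch.Size([4, 3, 5, 5]), i.e. m x n x kH x kW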

2.5 Detailed Code Explanation

import torch

# Define the input data (a 5x5 single-channel "image")
input = [3, 4, 6, 5, 7,
         2, 4, 6, 8, 2,
         1, 6, 7, 8, 4,
         9, 7, 4, 6, 2,
         3, 7, 5, 4, 1]

# Convert the input to a tensor of shape (batch_size=1, channels=1, height=5, width=5)
input = torch.Tensor(input).view(1, 1, 5, 5)

# Define the convolutional layer: 3x3 kernel, padding of 1, no bias term
conv_layer = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

# Define the kernel. view() reshapes the one-dimensional tensor
# [1, 2, ..., 9] into a four-dimensional tensor of shape (1, 1, 3, 3):
# the first 1 is the number of output channels, the second 1 the number
# of input channels, and 3, 3 are the kernel height and width.
kernel = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9]).view(1, 1, 3, 3)

# Assign the kernel values to the layer's weights
conv_layer.weight.data = kernel.data

# Apply the convolution to the input
output = conv_layer(input)

# Print the result
print(output)

Output result:
[Figure: the printed output tensor]

2.6 Detailed explanation of each step of the convolutional neural network

2.6.1 Padding

The padding operation:
[Figure: a 5×5 input zero-padded to 7×7 so that a 3×3 kernel produces a 5×5 output]

During convolution, the padding operation adds a fixed border of virtual pixels (usually zeros) around the input data, so that the kernel can fully cover the boundary pixels of the input.

Padding prevents the edge pixels of the input from being underrepresented during convolution, so the output feature map can keep the same size as the input or shrink more slowly. Padding can also improve the network's ability to extract features near the borders, improving its performance.

The code in Section 2.5 above already demonstrates this: the Conv2d layer there is created with padding=1, so the 5×5 input is padded with a ring of zeros and the 3×3 convolution produces a 5×5 output.
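In general, output size = (input size + 2 × padding - kernel size) / stride + 1, rounded down. A small sketch comparing padding=0 and padding=1 on a random 5×5 input:

import torch

x = torch.randn(1, 1, 5, 5)

no_pad = torch.nn.Conv2d(1, 1, kernel_size=3, padding=0, bias=False)
pad1 = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

print(no_pad(x).shape)  # torch.Size([1, 1, 3, 3]): 5 - 3 + 1 = 3
print(pad1(x).shape)    # torch.Size([1, 1, 5, 5]): 5 + 2 - 3 + 1 = 5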

2.6.2 Stride

Assume the stride is 2:
[Figure: convolution with stride 2]
In a convolutional neural network, the stride is how far the kernel moves at each step as it slides over the input. The larger the stride, the smaller the output feature map and the less computation is required. Conversely, the smaller the stride, the larger the output feature map, which costs more computation but preserves more detail in the feature map.
Choosing an appropriate stride therefore balances computation speed and efficiency against accuracy.
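A minimal sketch of the effect of stride=2 (the input values are random; only the output shape matters here):

import torch

x = torch.randn(1, 1, 5, 5)
conv_s2 = torch.nn.Conv2d(1, 1, kernel_size=3, stride=2, bias=False)

# With stride=2 the kernel jumps two pixels at a time:
# output size = (5 - 3) // 2 + 1 = 2
print(conv_s2(x).shape)  # torch.Size([1, 1, 2, 2])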

2.6.3 Pooling

Max pooling operation:
[Figure: 2×2 max pooling on a 4×4 input]
One of the most commonly used downsampling methods is max pooling. With a 2×2 pooling window (the stride defaults to the window size, i.e. stride=2), the image is divided into 2×2 blocks, the maximum value of each block is taken, and the maxima are stitched together:
[Figure: the four block maxima assembled into a 2×2 output]
Note: max pooling leaves the number of channels unchanged.

Code:

import torch

# Define the input matrix (a 4x4 single-channel image)
input = [3, 4, 6, 5,
         2, 4, 6, 8,
         1, 6, 7, 8,
         9, 7, 4, 6]
# Convert to a tensor of shape (batch_size=1, channels=1, height=4, width=4)
input = torch.Tensor(input).view(1, 1, 4, 4)

# Define the max pooling layer with a 2x2 window (stride defaults to the window size)
maxpooling_layer = torch.nn.MaxPool2d(kernel_size=2)

# Run the input through the max pooling layer
output = maxpooling_layer(input)

# Print the result
print(output)

2.7 Implementing the calculation process

2.7.1 Detailed calculation process

Next, let's implement a simple neural network to recognize the MNIST dataset:
[Figure: the network architecture for MNIST]
Let's look at the calculation process in detail:
[Figure: tensor shapes at each stage of the network]

(batch, 1, 28, 28) represents a dataset with batch data points, where each data point has shape (1, 28, 28): a three-dimensional array of image data with 1 channel, 28 rows, and 28 columns.

[Figure: the input tensor of shape (batch, 1, 28, 28)]

The first convolution uses a 5×5 kernel, with 1 input channel and 10 output channels.
The size after the convolution is (batch, 10, 24, 24):
10 is the number of output channels, and 24×24 is the spatial size after convolution (taking rows as an example, output rows = input rows - kernel rows + 1 = 28 - 5 + 1 = 24).

[Figure: max pooling the 24×24 feature maps]
Next, a 2×2 max pooling halves each 24×24 feature map to 12×12.

[Figure: the second convolution]
Then a second 5×5 convolution with 20 output channels brings the size to (20, 8, 8), since 12 - 5 + 1 = 8.

[Figure: the second max pooling]
Another max pooling then gives (20, 4, 4).

[Figure: flatten and fully connected layer]
Finally, the multi-dimensional feature map is flattened into a one-dimensional vector of 20 × 4 × 4 = 320 elements, and a fully connected layer maps these 320 values to 10 outputs.

The whole process:
[Figure: the complete forward pass with shapes at every stage]
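The same shape bookkeeping can be verified layer by layer in PyTorch (a standalone sketch using the same layer definitions as the code in the next section):

import torch

x = torch.randn(1, 1, 28, 28)                  # one MNIST-sized image
conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
pool = torch.nn.MaxPool2d(2)

x = conv1(x); print(x.shape)       # torch.Size([1, 10, 24, 24])
x = pool(x);  print(x.shape)       # torch.Size([1, 10, 12, 12])
x = conv2(x); print(x.shape)       # torch.Size([1, 20, 8, 8])
x = pool(x);  print(x.shape)       # torch.Size([1, 20, 4, 4])
x = x.view(1, -1); print(x.shape)  # torch.Size([1, 320])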

2.7.2 Code implementation

The full code:

import torch
import torch.nn.functional as F

# Define a class named Net that inherits from torch.nn.Module
class Net(torch.nn.Module):
    def __init__(self):
        # Call the parent class constructor
        super(Net, self).__init__()
        # First convolutional layer: 1 input channel, 10 output channels, 5x5 kernel
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        # Second convolutional layer: 10 input channels, 20 output channels, 5x5 kernel
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        # Max pooling layer with a 2x2 window
        self.pooling = torch.nn.MaxPool2d(2)
        # Fully connected layer: 320 inputs, 10 outputs
        self.fc = torch.nn.Linear(320, 10)

    # Define the model's forward pass
    def forward(self, x):
        # Get the batch size
        batch_size = x.size(0)
        # First convolution, then max pooling, then ReLU
        # (ReLU and max pooling commute, so the order does not change the result)
        x = F.relu(self.pooling(self.conv1(x)))
        # Second convolution, then max pooling, then ReLU
        x = F.relu(self.pooling(self.conv2(x)))
        # Flatten the multi-dimensional feature map into a one-dimensional vector
        x = x.view(batch_size, -1)
        # Fully connected layer produces the 10 class scores
        x = self.fc(x)
        return x

# Instantiate Net to create a model
model = Net()
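A minimal usage sketch with dummy data (the optimizer settings are illustrative, and this is not the full MNIST training loop):

import torch

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

images = torch.randn(8, 1, 28, 28)        # a fake mini-batch of images
labels = torch.randint(0, 10, (8,))       # fake class labels

optimizer.zero_grad()
loss = criterion(model(images), labels)   # forward pass + loss
loss.backward()                           # backpropagation
optimizer.step()                          # parameter update
print(loss.item())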

2.7.3 Some possible difficulties

Why use ReLU?
The activation function sets the negative values in the feature map to zero, introducing a nonlinear mapping and enhancing the feature map's ability to express the semantics of the image. The ReLU activation function is therefore usually applied after a convolutional layer to strengthen the expressive power of the feature map.

What does -1 in x.view(batch_size, -1) mean?
-1 tells PyTorch to infer that dimension automatically so that the total number of elements is preserved: given batch_size and the total element count of the tensor, the remaining dimension (num_features) is derived. This operation flattens the multi-dimensional feature map into a one-dimensional vector per sample, ready for the subsequent fully connected layer.
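For example, with a toy tensor:

import torch

t = torch.arange(24).view(2, 3, 4)  # batch_size=2, each sample a 3x4 feature map
flat = t.view(2, -1)                # -1 is inferred as 3*4 = 12
print(flat.shape)                   # torch.Size([2, 12])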

3 Bonus: a wallpaper (^ ^)/~

[Image: wallpaper]

Origin: blog.csdn.net/m0_56494923/article/details/129452728