4. Overall basic structure of a convolutional neural network

1. The development and application of computer vision

Neural networks are mainly used for feature extraction.
Convolutional neural networks are mainly used in the image field; they address the over-fitting risk and the excessive number of weights of traditional fully connected neural networks.

1. The development of the CV field

The development of computer vision took off after AlexNet appeared in 2012. Before that there were traditional machine learning algorithms, but the error rate never came down much. From 2012 onward, deep learning took center stage, and by 2016 the error rate of deep learning models had fallen below the level of human recognition. In 2017 the ImageNet challenge had achieved its expected results and was no longer held.

2. Convolutional Neural Network (CNN) Application Scenarios

Ⅰ Detection task

Object detection
Semantic segmentation (person, car, building...)
Instance segmentation (person a, person b, car a, car b, car c, building a...)

Ⅱ Classification and Search

Classification: what is in this image?
Retrieval: input an image and return images with the same or similar content, like the photo-based shopping search on Taobao and JD.com; if you do not know what something is, you can take a photo and retrieve the items that are most similar to it.

Ⅲ Super-resolution reconstruction

Given a blurry, low-resolution image, reconstruct it to increase its resolution.

Ⅳ Autonomous driving


Ⅴ Face recognition


Ⅵ Applications in other fields

① Cell detection


② OCR font recognition


③ Logo recognition


3. The difference between a convolutional neural network and a traditional neural network

Traditional neural network: the input is a flat list of feature values, so image data must be reshaped into a one-dimensional vector before processing.
Convolutional neural network: the original image structure (height × width × channels) is preserved and processed directly.
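As a quick illustration (a minimal PyTorch sketch; the 32×32 image and the layer sizes are placeholders, not values from the article), the fully connected network needs the image flattened into a vector, while the convolutional layer consumes the image as-is:

```python
import torch
import torch.nn as nn

# Hypothetical 32x32 RGB image (batch of 1), values are random placeholders.
img = torch.randn(1, 3, 32, 32)

# Traditional fully connected network: the image must be flattened into a vector,
# which discards the 2D spatial structure.
flat = img.reshape(1, -1)                 # shape [1, 3072]
fc_net = nn.Linear(3 * 32 * 32, 10)
print(fc_net(flat).shape)                 # torch.Size([1, 10])

# Convolutional network: the [channels, height, width] layout is kept,
# so spatial neighborhoods are preserved.
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
print(conv(img).shape)                    # torch.Size([1, 6, 28, 28])
```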

2. Overall Architecture of Convolutional Neural Network

The overall architecture of a convolutional neural network mainly includes: the input layer, convolution layers, pooling layers, and fully connected layers.
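Below is a minimal sketch of this four-part structure in PyTorch (the specific channel counts and sizes are illustrative, not taken from the article's figures):

```python
import torch
import torch.nn as nn

# A minimal sketch of the four-part structure: input -> convolution -> pooling -> fully connected.
model = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),   # convolution layer: 3-channel input, 6 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                  # pooling layer: halves height and width
    nn.Flatten(),                     # stretch the 3D feature map into a vector
    nn.Linear(6 * 14 * 14, 5),        # fully connected layer: 5-class output
)

x = torch.randn(1, 3, 32, 32)         # input layer: one 32x32 RGB image
print(model(x).shape)                 # torch.Size([1, 5])
```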

1. Input layer

This part is simple: it takes in the image to be trained on or to be detected.

2. Convolution layer

The convolution operation is also fairly simple, so I won't go into too much detail here and will only cover the core points.
The convolution operation is essentially an inner product: multiply corresponding elements and then add them up.
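For example (a small sketch with made-up numbers; note that deep learning libraries actually implement cross-correlation, which is the same multiply-and-add at each position):

```python
import torch
import torch.nn.functional as F

# One convolution position: multiply corresponding elements, then sum.
region = torch.tensor([[1., 0., 2.],
                       [3., 1., 0.],
                       [0., 2., 1.]])       # a 3x3 patch of the image
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])      # a 3x3 convolution kernel

manual = (region * kernel).sum()            # elementwise product, then sum
auto = F.conv2d(region.view(1, 1, 3, 3), kernel.view(1, 1, 3, 3))
print(manual.item(), auto.item())           # both give the same inner product: 1.0
```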

Ⅰ Sliding window step size

The original image is 7×7 pixels and the convolution kernel is 3×3; in the figure, the red and green windows are two successive convolutions with a sliding window step size of 1, i.e., the window moves one pixel at a time. If the step size is 2, the window moves 2 pixels from left to right, and also 2 pixels from top to bottom.
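A minimal check of these two cases with PyTorch (random weights, so only the output shapes matter):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 7, 7)                       # a 7x7 single-channel image

conv_s1 = nn.Conv2d(1, 1, kernel_size=3, stride=1)
conv_s2 = nn.Conv2d(1, 1, kernel_size=3, stride=2)

print(conv_s1(x).shape)   # torch.Size([1, 1, 5, 5])  -> step size 1 gives a 5x5 feature map
print(conv_s2(x).shape)   # torch.Size([1, 1, 3, 3])  -> step size 2 gives a 3x3 feature map
```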

Ⅱ convolution kernel (filter) size

The most commonly used convolution kernel size is 3×3; larger kernels such as 7×7 also appear, but kernel sizes are generally odd numbers. The kernel size determines the size of the feature map obtained after convolution.
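A quick check of this effect (a minimal sketch; the 32×32 input is just an illustration): a larger kernel shrinks the feature map more.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)                      # a 32x32 single-channel image

print(nn.Conv2d(1, 1, kernel_size=3)(x).shape)     # torch.Size([1, 1, 30, 30])
print(nn.Conv2d(1, 1, kernel_size=7)(x).shape)     # torch.Size([1, 1, 26, 26])
```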

Ⅲ Number of convolution kernels

The number of convolution kernels determines the depth of the feature map obtained after convolution.
The parameters of each convolution kernel are different.

  • The size of the convolution kernel determines the spatial change [32,32] → [28,28]
  • The number of convolution kernels (6) determines the depth of the final activation maps
  • Because the image has 3 color channels, the depth of each convolution kernel must also be 3 (see the sketch below)

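A small PyTorch check of this setup (assuming a 5×5 kernel, which matches the [32,32] → [28,28] change):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                 # a [32, 32] image with 3 color channels

# 6 convolution kernels, each of size 5x5x3 (the kernel depth must match the 3 input channels)
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)

out = conv(x)
print(out.shape)   # torch.Size([1, 6, 28, 28]): spatial size [32,32] -> [28,28], depth 6
```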

Ⅳ Edge Filling

During convolution, it is easy to see that edge regions take part in fewer windows than the middle region, yet the edge information is not unimportant. Edge filling (padding) adds a border around the image so that what was originally the edge is moved inward; this makes up for the loss of boundary information and treats boundary features more fairly.
Usually the added border is a ring of zeros, because the padded values must not introduce any extra effect of their own, so 0 is used; for convenience of calculation, the number of rings to add can also be customized.
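For instance (a minimal sketch; the 5×5 input is arbitrary), padding with one ring of zeros keeps the spatial size unchanged for a 3×3 kernel:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)                             # a small 5x5 image

no_pad = nn.Conv2d(1, 1, kernel_size=3, padding=0)
one_ring = nn.Conv2d(1, 1, kernel_size=3, padding=1)    # add one ring of zeros around the border

print(no_pad(x).shape)    # torch.Size([1, 1, 3, 3])  -> without padding the output shrinks
print(one_ring(x).shape)  # torch.Size([1, 1, 5, 5])  -> zero padding keeps the original size
```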

Ⅴ Convolution result calculation

The output-size formula for Conv2d given on the PyTorch official website is:

H_out = ⌊(H_in + 2 × padding − dilation × (kernel_size − 1) − 1) / stride + 1⌋

With dilation = 1 it can be simplified as follows:

output size = (W − F + 2P) / S + 1

where W is the input size, F the kernel size, P the padding, and S the stride.

Example: the input is [32, 32, 3]; it is convolved with 10 kernels of size [5, 5, 3], with stride 1 and padding 2. Find the size of the resulting feature map.
Answer: (32 − 5 + 2 × 2) / 1 + 1 = 32, so the final feature map is [32, 32, 10] (the depth equals the number of kernels).
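The same calculation can be sketched in a couple of lines of Python (the helper name conv_output_size is only for illustration) and verified with an actual Conv2d layer:

```python
import torch
import torch.nn as nn

def conv_output_size(w, f, p, s):
    """Simplified formula from above: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

# Example from the text: input [32, 32, 3], 10 kernels of [5, 5, 3], stride 1, padding 2.
print(conv_output_size(w=32, f=5, p=2, s=1))    # 32

x = torch.randn(1, 3, 32, 32)
conv = nn.Conv2d(3, 10, kernel_size=5, stride=1, padding=2)
print(conv(x).shape)                            # torch.Size([1, 10, 32, 32]) -> a [32, 32, 10] feature map
```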

Ⅵ Convolution parameter sharing

Convolution parameter sharing means that the same convolution kernel parameters are used for every region of the image: one kernel slides over the whole input (covering all three color channels at once) instead of a separate set of weights being learned for each position.
If 10 filters of size 5×5×3 are used, each kernel needs 5×5×3 = 75 parameters; 10 filters give 75×10 = 750 parameters; and since every filter also has one bias term, the total is 750 + 10 = 760 weight parameters. Compared with a fully connected layer, this is far fewer weights.
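The 760 figure can be confirmed directly in PyTorch by counting the parameters of such a layer:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5)   # 10 filters of 5x5x3

n_params = sum(p.numel() for p in conv.parameters())
print(n_params)   # 760 = 5*5*3 weights per filter * 10 filters + 10 bias terms
```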

3. Pooling layer

The pooling layer is also called downsampling or compression. Its main purpose is to shrink the feature map and reduce the number of feature parameters; it involves no learnable weights, only a selection over each region.
The main variants are max pooling, average pooling and a few others, of which max pooling is the most commonly used and generally gives the best results.
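A minimal max-pooling sketch (the 6×28×28 feature map is an illustrative assumption): it halves the spatial size and has no learnable parameters.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 6, 28, 28)                     # a feature map produced by a convolution layer

pool = nn.MaxPool2d(kernel_size=2, stride=2)      # keep only the maximum of each 2x2 region
print(pool(x).shape)                              # torch.Size([1, 6, 14, 14])
print(sum(p.numel() for p in pool.parameters()))  # 0 -> pooling has no learnable weights
```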

4. Fully connected layer

After convolution (which increases the depth of the feature map) and pooling (which reduces the number of feature parameters), we are left with a three-dimensional feature map. It is stretched into a one-row vector and then fed into the FC (fully connected) layer, which produces the output required by the task.
Example: for a final 5-class classification task, the network ends with 5 outputs, the probability values of the 5 classes.
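A small sketch of this last step (the 6×14×14 feature-map size and the softmax at the end are illustrative assumptions):

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 6, 14, 14)          # 3D feature map after convolution and pooling

flat = feat.flatten(start_dim=1)          # stretch it into one row vector: [1, 1176]
fc = nn.Linear(6 * 14 * 14, 5)            # fully connected layer for a 5-class task
logits = fc(flat)

probs = torch.softmax(logits, dim=1)      # probability value for each of the 5 classes
print(probs.shape, probs.sum().item())    # torch.Size([1, 5])  ~1.0
```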

5. Receptive field

Stacking two 3×3 convolutions with a sliding window step size of 1 gives a receptive field of 5×5; stacking three 3×3 convolutions gives a receptive field of 7×7, which covers the same region as a single 7×7 convolution kernel. So why do many papers still prefer the 3×3 kernel?
Clearly, stacking small convolution kernels requires fewer parameters; moreover, the more convolution steps there are, the finer the feature extraction and the more non-linear transformations are inserted, all without increasing the number of weight parameters. This is the basic starting point of the VGG network: use small convolution kernels to carry out the feature extraction.
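The parameter comparison behind this argument can be checked quickly (64 channels is an arbitrary illustrative choice; biases are omitted so the counts are easy to read):

```python
import torch.nn as nn

C = 64  # illustrative channel count; the comparison holds for any number of channels

# Three stacked 3x3 convolutions (7x7 receptive field) vs. a single 7x7 convolution.
stacked_3x3 = nn.Sequential(*[nn.Conv2d(C, C, 3, padding=1, bias=False) for _ in range(3)])
single_7x7 = nn.Conv2d(C, C, 7, padding=3, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(stacked_3x3), count(single_7x7))   # 110592 vs 200704 -> small kernels need fewer weights
```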

Origin: blog.csdn.net/qq_41264055/article/details/131320344