1. Computer Vision: Development and Applications
Neural networks are mainly used for feature extraction. Convolutional neural networks are used primarily in the image domain, where they mitigate the over-fitting risk and the excessive number of weights found in traditional neural networks.
1. The development of the CV field
The development of computer vision was revitalized after AlexNet appeared in 2012. Before that, traditional machine learning algorithms were used, but the error rate never came down. After 2012, deep learning took center stage, and by 2016 the error rate of deep learning models had dropped below the level of human visual recognition. In 2017, the challenge had achieved its expected results and was no longer held.
2. Convolutional Neural Network (CNN) Application Scenarios
Ⅰ Detection task
Target detection
Semantic segmentation (person, car, building...)
Instance segmentation (person a, person b, car a, car b, car c, building a...)
Ⅱ Classification and Search
Classification: What is this image?
Retrieval: input an image and return images with the same content, similar to photo-based product search on Taobao and JD. If you don't know what something is, you can photograph it and retrieve visually similar items.
Ⅲ Super-resolution reconstruction
Given a blurry image, reconstruct it at a higher resolution.
Ⅳ Unmanned driving
Ⅴ Face recognition
Ⅵ Applications in other fields
① Cell detection
② OCR text recognition
③ Logo recognition
3. The difference between convolutional neural network and traditional neural network
Traditional neural network: the input is a list of features; image data must be reshaped (flattened) into a vector before processing.
Convolutional neural network: the spatial structure of the original image is preserved and processed directly.
2. Overall Architecture of Convolutional Neural Network
The overall architecture of a convolutional neural network mainly includes: the input layer, convolution layers, pooling layers, and fully connected layers.
1. Input layer
This one is relatively simple: it takes in the image to be trained on or detected.
2. Convolution layer
The convolution operation itself is relatively simple, so only the core content is covered here.
A convolution is really an inner product: multiply corresponding elements, then add them up.
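As a minimal illustration of this inner product (using made-up numbers, not values from the figure):

```python
# A 3x3 image patch and a 3x3 kernel (toy values for illustration).
patch  = [[1, 0, 2],
          [3, 1, 0],
          [0, 2, 1]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

# The convolution at one window position is the inner product:
# multiply corresponding elements, then sum.
result = sum(patch[i][j] * kernel[i][j] for i in range(3) for j in range(3))
print(result)  # 1
```

Sliding this window across the whole image and repeating the inner product at each position produces the feature map.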
Ⅰ Sliding window step size
The original image is 7×7 pixels and the convolution kernel is 3×3; the red and green boxes mark two successive convolutions. With a sliding-window stride of 1, the kernel moves one pixel at a time, row by row. If the left-to-right stride is 2, then the top-to-bottom stride is also 2.
Ⅱ convolution kernel (filter) size
The most commonly used kernel size is 3×3; 7×7 is also seen, but kernel sizes are generally odd. The kernel size determines the size of the feature map obtained after convolution.
Ⅲ Number of convolution kernels
The number of convolution kernels determines the depth of the feature map obtained after convolution.
The parameters of each convolution kernel are different.
- The size of the convolution kernel determines the spatial shape: [32,32] —> [28,28]
- The number of convolution kernels determines the depth of the final activation maps: 6
- Because the image has 3 color channels, the depth of each convolution kernel must also be 3
Ⅳ Edge Filling
During convolution, the edge region is covered less often than the middle region, yet edge information is not unimportant. Edge filling (padding) adds a border around the image, moving what was originally the edge into the interior; this compensates for the loss of boundary information and treats boundary features more fairly.
Generally, the added ring is all zeros: the padded values must not introduce any other effect, so 0 is used. The number of rings to add can also be customized, sometimes simply for computational convenience.
Ⅴ Convolution result calculation
The calculation formula of the convolution result given on the PyTorch official website can be simplified (for square inputs, no dilation) as: output size = (W − F + 2P) / S + 1, where W is the input size, F the kernel size, P the padding, and S the stride.
Example: the input data is [32, 32, 3]; convolve it with 10 kernels of size [5, 5, 3], stride 1, and padding 2. Find the size of the resulting feature map.
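The worked example above can be checked with a small helper (a sketch of the simplified formula, not the full PyTorch expression with dilation):

```python
def conv_output_size(w, f, p, s):
    """Output spatial size: (W - F + 2P) / S + 1 (the simplified
    Conv2d formula for square inputs with no dilation)."""
    return (w - f + 2 * p) // s + 1

# Example from the text: input [32, 32, 3], ten [5, 5, 3] kernels,
# stride 1, padding 2.
out = conv_output_size(32, 5, 2, 1)   # (32 - 5 + 4) / 1 + 1 = 32
print([out, out, 10])                 # depth = number of kernels
```

With padding 2 the spatial size is preserved, so the resulting feature map is [32, 32, 10].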
Ⅵ Convolution parameter sharing
Convolution parameter sharing means the same kernel parameters are reused at every position as the window slides over the image.
If 10 filters of size 5×5×3 are used for convolution: 5×5×3 = 75, so each kernel needs 75 parameters; 75×10 = 750 for the 10 filters; each filter also has one bias term, so 750 + 10 = 760 weight parameters are required in total. Compared with a fully connected layer, this is far fewer weights.
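The parameter arithmetic above, and the comparison with a fully connected layer, can be sketched as follows (the FC comparison assumes a layer mapping the full 32×32×3 input to a 32×32×10 output, an illustrative choice not stated in the text):

```python
def conv_params(kh, kw, in_ch, n_filters, bias=True):
    # Each filter has kh*kw*in_ch weights, shared across all positions.
    per_filter = kh * kw * in_ch
    total = per_filter * n_filters
    if bias:
        total += n_filters            # one bias per filter
    return total

conv_total = conv_params(5, 5, 3, 10)
print(conv_total)                     # 75*10 + 10 = 760

# A fully connected layer from a 32*32*3 input to a 32*32*10 output
# would need vastly more weights (bias terms omitted):
fc_total = (32 * 32 * 3) * (32 * 32 * 10)
print(fc_total)                       # 31,457,280
```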
3. Pooling layer
The pooling layer is also called downsampling or compression. Its main purpose is to reduce the number of feature parameters; it involves no learned weights.
The main variants are max pooling, average pooling, and other pooling operations, among which max pooling is the most commonly used and generally works best.
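A minimal sketch of 2×2 max pooling with stride 2 on a single 2-D feature map (real frameworks apply this per channel; the input values are made up):

```python
def max_pool_2x2(fmap):
    """2x2 max pooling, stride 2: keep the largest value in each window."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i][j],     fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 0],
        [2, 1, 9, 8],
        [0, 3, 7, 6]]
pooled = max_pool_2x2(fmap)
print(pooled)   # [[6, 4], [3, 9]]
```

Each 4×4 map shrinks to 2×2, keeping only the strongest response in each window; this is why pooling reduces the feature parameters.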
4. Fully connected layer
After convolution (which increases the depth of the feature map) and pooling (which reduces the feature parameters), a three-dimensional feature map is obtained. This feature map is then stretched into a single row vector and connected to the FC (fully connected) layer, whose output depends on the actual task.
Example: the final task is 5-class classification, i.e., the network ultimately outputs probability values for 5 results.
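The flatten-then-classify step can be sketched as follows (a toy example with a made-up 4×4×2 feature map and random weights, just to show the shapes involved):

```python
import math
import random

random.seed(0)

# Suppose pooling left a 4x4x2 feature map; flatten it into one vector.
fmap = [[[random.random() for _ in range(2)] for _ in range(4)] for _ in range(4)]
vec = [v for row in fmap for cell in row for v in cell]   # length 4*4*2 = 32

# A hypothetical fully connected layer mapping 32 features to 5 class scores.
weights = [[random.gauss(0, 0.1) for _ in range(len(vec))] for _ in range(5)]
logits = [sum(w * x for w, x in zip(row, vec)) for row in weights]

# Softmax turns the 5 scores into probabilities that sum to 1.
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]
print(len(probs), round(sum(probs), 6))   # 5 1.0
```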
5. Receptive field
If two 3×3 convolutions with a sliding-window stride of 1 are stacked, the receptive field is 5×5; with three 3×3 convolutions it is 7×7 — each layer "rolls" over the previous one. This 7×7 receptive field is the same as the one obtained from a single 7×7 convolution kernel, so why do many papers still use 3×3 kernels?
Clearly, stacking small kernels requires fewer parameters; moreover, the more convolution steps there are, the finer the feature extraction becomes and the more non-linear transformations are introduced, all without increasing the number of weight parameters. This is the basic starting point of the VGG network: use small kernels to complete the feature extraction.
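The two claims above can be checked with simple arithmetic (the channel count C = 64 is an assumption for the comparison, with the same number of input and output channels and bias terms ignored):

```python
def receptive_field(n_layers, k=3):
    # Each extra stride-1 kxk layer grows the receptive field by (k - 1).
    return 1 + n_layers * (k - 1)

C = 64  # assumed channel count, identical in and out

rf = receptive_field(3)            # three 3x3 layers -> 7x7 receptive field
stacked = 3 * (3 * 3 * C * C)      # weights in three stacked 3x3 layers
single = 7 * 7 * C * C             # weights in a single 7x7 layer

print(rf)        # 7
print(stacked)   # 110592
print(single)    # 200704 -- almost twice as many parameters
```

Same receptive field, roughly half the parameters, plus two extra non-linearities in between: this is the trade-off that motivates VGG's all-3×3 design.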