Keras Deep Learning Application 1: Face Recognition Based on a Convolutional Neural Network (CNN) (Part 1)


For the code implementation, see
Keras Deep Learning Application 1: Face Recognition Based on a Convolutional Neural Network (CNN) (Part 2)

Code download

Github source code download address:
https://github.com/Kyrie-leon/CNN-FaceRec-keras

1. CNN overview

1.1 CNN development history

Convolutional neural networks are inspired by the biological visual system. In 1962, the Nobel laureates D. H. Hubel and T. N. Wiesel discovered a distinctive network structure while studying neurons in the cat's visual cortex. Inspired by this finding, researchers later proposed the neocognitron, which is regarded as the first true convolutional neural network in deep learning. Since then, scientists have studied and improved CNNs extensively.
A convolutional neural network is a multi-layer, feedforward deep learning architecture. The figure below shows roughly where it sits within the family of neural networks.
Figure 1-1 Neural network distribution map
Compared with traditional image-processing algorithms, the advantage of a CNN is that it avoids the large amount of tedious manual feature extraction otherwise required in the preparation stage. A CNN can start directly from the raw pixels of an image and, after only light preprocessing, learn to recognize its characteristic features. At the time, however, the research environment was far from ideal: training datasets were small and graphics-processing power was limited, so the two foundations that deep learning relies on, "fuel" (data) and "engine" (compute), were both lacking. As a result, the well-known LeNet-5 architecture did not perform well on complex problems, and as neural-network research entered a low ebb, CNN research made no breakthroughs.

Since 2006, driven by big data and high-performance computing platforms, researchers have revisited CNNs and worked to overcome the difficulty of training deep networks. The most famous result is AlexNet, a now-classic architecture that made a major breakthrough in image recognition and won the ImageNet competition with record-setting results. Its overall framework is similar to LeNet-5, but with considerably more layers.

After AlexNet's remarkable results, scientists successively proposed further improved models with richer functionality and better performance. The development of convolutional neural networks to date is shown in the figure below.

The evolution of convolutional neural networks (solid lines indicate model improvements, dashed lines indicate model innovations)

From a structural point of view, a notable trend in CNN development is that networks have become deeper and their structures more complex. By increasing depth, a network can extract increasingly abstract features.

1.2 The basic structure of CNN

The convolutional neural network, as the name suggests, is named after the convolution operation. The purpose of convolution is to extract certain features from an image, much as the visual system recognizes oriented edges: it first detects horizontal lines, vertical lines, oblique lines, and other basic oriented edges, then combines several edges into parts of an object, and finally determines what the object is from the detected parts.

A typical convolutional neural network is a multi-layer structure in which each layer contains a number of two-dimensional planes, and each plane holds many independent neurons. A CNN usually contains one to three feature-extraction stages, each composed of a convolutional layer and a pooling layer, followed by a classifier, which is usually made up of one or two fully connected layers.
A simple convolutional neural network model is shown in Figure 1-3. As the figure shows, this model contains two feature-extraction stages and a classifier (fully connected layers). In general, a CNN consists of an input layer, hidden layers, and an output layer. The hidden layers comprise three parts: convolutional layers (for feature extraction), pooling layers (for feature selection), and fully connected layers (equivalent to the hidden layers of a traditional multilayer perceptron).
Figure 1-3 A simple convolutional neural network model

2. CNN algorithm principles

2.1 CNN basic network structure

A typical convolutional neural network is composed of five kinds of structures: the input layer, convolutional layer, pooling layer, fully connected layer, and activation layer. An image enters at the input layer, passes through several layers of feature extraction such as convolutional and pooling layers, and is finally abstracted into the most informative features, which the fully connected layer feeds into a Softmax layer for classification.
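As a rough illustration of this layout, the following is a minimal Keras sketch of the pipeline described above. The layer counts, filter sizes, 64×64 grayscale input, and 10-class output are illustrative assumptions, not the model from the linked repository:

```python
# Minimal sketch: input -> two convolution/pooling stages ->
# fully connected layer -> Softmax classifier.
# All sizes here are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),               # input layer: 64x64 grayscale image
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution: feature extraction
    layers.MaxPooling2D((2, 2)),                   # pooling: downsampling
    layers.Conv2D(64, (3, 3), activation="relu"),  # second feature-extraction stage
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),          # fully connected layer
    layers.Dense(10, activation="softmax"),        # Softmax output over 10 classes
])
model.summary()
```

The `model.summary()` call prints the layer-by-layer output shapes, which is a quick way to check how each stage shrinks the feature maps.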

2.1.1 Input layer

The input layer, as the name implies, is the data entry point of the whole CNN. A preprocessed image pixel matrix is usually fed into this layer. The image type must be declared for different kinds of input: a black-and-white image is a single-channel image, while an RGB image has three channels.
Once an image enters the convolutional neural network, it passes through several convolutional and pooling layers until its dimensionality is reduced at the fully connected layer, so the number and resolution of input images have a certain impact on model performance.
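The channel convention above can be made concrete with numpy arrays (the 64×64 resolution is an arbitrary choice for illustration):

```python
import numpy as np

# Image inputs are stored as (height, width, channels) pixel matrices.
gray = np.zeros((64, 64, 1))   # black-and-white image: a single channel
rgb = np.zeros((64, 64, 3))    # RGB image: three channels

print(gray.shape[-1], rgb.shape[-1])  # → 1 3
```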

2.1.2 Convolutional layer

The convolutional layer is the core of the entire convolutional neural network; its role is to extract image features and reduce data dimensionality. A convolutional layer contains multiple convolution kernels, whose size is usually set manually to 3×3 or 5×5.
Assuming the convolution kernel is 3×3, it can be written as a matrix of nine weights (the original example values are not recoverable here, so generic weights are shown):
conv = [[w11, w12, w13], [w21, w22, w23], [w31, w32, w33]]    (2-1)

The figure below walks through the convolution operation in detail. First select the 3×3 kernel conv shown in the figure. Then take a sub-matrix of the same size from the upper-left corner of matrix A, multiply it element-wise with conv, and sum the products; the result is the value in the first row and first column of the new matrix. Repeat these steps (step 2 through step 9 in the figure), sliding the window from left to right and top to bottom, to obtain the new matrix (the right-hand matrix in step 9). This matrix stores all the features extracted by the convolution operation.
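The sliding-window procedure just described can be written out directly in numpy. The input values and the vertical-edge kernel below are illustrative, not taken from the article's figure:

```python
import numpy as np

def conv2d_valid(A, K):
    """Slide kernel K over matrix A (no padding), multiplying element-wise
    and summing at each window position, top-left to bottom-right."""
    h, w = K.shape
    out_h, out_w = A.shape[0] - h + 1, A.shape[1] - w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(A[i:i + h, j:j + w] * K)
    return out

A = np.arange(1, 26).reshape(5, 5)   # 5x5 input matrix (values 1..25)
K = np.array([[1, 0, -1],
              [1, 0, -1],
              [1, 0, -1]])           # 3x3 kernel (a vertical-edge detector)
print(conv2d_valid(A, K).shape)      # → (3, 3)
```

With a 5×5 input and a 3×3 kernel, the window fits in 3×3 = 9 positions, which matches the step-by-step sweep described above.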

For the boundary points of the image, a CNN convolution kernel has two processing options. One is to leave the input matrix untouched and convolve in the order shown above; with this approach the output size changes, and the input matrix is larger than the output matrix. The other is to zero-pad the border of the original matrix before convolving, which keeps the output the same size as the input, as shown in the figure.
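Both boundary-handling options can be compared on a small example. The 3×3 input and all-ones kernel below are illustrative assumptions:

```python
import numpy as np

A = np.arange(1, 10, dtype=float).reshape(3, 3)  # 3x3 input (values 1..9)
K = np.ones((3, 3))                              # 3x3 all-ones kernel

# Option 1: no padding ("valid") -- the 3x3 input yields a single 1x1 output.
valid = np.sum(A * K)

# Option 2: zero-pad the border first ("same") -- output keeps the 3x3 size.
P = np.pad(A, 1)                 # 5x5 matrix with a ring of zeros around A
same = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        same[i, j] = np.sum(P[i:i + 3, j:j + 3] * K)

print(valid, same.shape)         # → 45.0 (3, 3)
```

In Keras these two options correspond to `padding="valid"` and `padding="same"` on a `Conv2D` layer.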

2.1.3 Pooling layer

Pooling is also called subsampling or downsampling. The pooling layer effectively reduces the size of the image and the number of parameters in the fully connected layer while preserving the salient features of the original image, so this step is also called dimensionality reduction. Adding a pooling layer not only speeds up computation but also helps prevent the model from overfitting.

Pooling is likewise carried out by moving a window, similar to a convolution kernel, in a fixed order. The difference from the convolutional layer is that pooling performs no weighting on the matrix entries. The common pooling operations are taking the maximum and taking the average; the corresponding layers are called the max pooling layer and the average pooling layer. Other pooling operations are rarely used in practice and are not covered here.

The pooling window size and whether to use all-zero padding must also be specified manually. Here max pooling is chosen, with a 2×2 window and no zero padding; the process is shown in Figure 2-3. First divide the feature matrix into four 2×2 sub-matrices, then take the maximum pixel value of each 2×2 sub-matrix, and finally assemble the four maxima into a new matrix. Figure 2-3 shows the result of the max pooling operation on the matrix on the right. Similarly, taking the average of each 2×2 sub-matrix gives the result of the average pooling operation.
Figure 2-3 The max pooling operation
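The 2×2 max pooling step can be reproduced in a few lines of numpy. The 4×4 input values are illustrative, not those in Figure 2-3:

```python
import numpy as np

def max_pool_2x2(X):
    """2x2 max pooling with stride 2 and no zero padding: keep the
    maximum of each non-overlapping 2x2 block."""
    h, w = X.shape[0] // 2, X.shape[1] // 2
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = X[2 * i:2 * i + 2, 2 * j:2 * j + 2].max()
    return out

X = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]])
print(max_pool_2x2(X))
# [[7. 8.]
#  [9. 6.]]
```

Replacing `.max()` with `.mean()` would give the average pooling variant mentioned above.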

2.1.4 Fully connected layer

The fully connected layer mainly integrates the features produced by the convolutional and pooling layers. Since both kinds of layers extract facial features, after several layers of processing the information in the image has been distilled into highly informative features. After passing through the fully connected layer, these features become the final representation of the image and are fed into the classifier to complete the classification task. The fully connected layer is the hardest part of the network to train: with too few training samples it tends to overfit, so the dropout technique is used to suppress overfitting. Dropout reduces the interdependence between nodes by randomly zeroing the outputs of some hidden-layer nodes (usually in fully connected layers) during training.
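The dropout mechanism just described can be sketched in numpy. This is the "inverted dropout" formulation (surviving activations are rescaled so the expected magnitude is unchanged at test time), which is what frameworks such as Keras implement; the 0.5 rate and the fixed seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate=0.5, training=True):
    """Randomly zero a fraction `rate` of activations during training,
    scaling the survivors by 1/(1-rate) (inverted dropout).
    At inference time (training=False) the input passes through unchanged."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate   # keep each unit with prob. 1 - rate
    return x * mask / (1.0 - rate)

a = np.ones(10)
print(dropout(a))  # roughly half the entries zeroed, survivors scaled to 2.0
```

In Keras this corresponds to inserting a `Dropout(rate)` layer after a fully connected layer.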

2.1.5 Incentive layer

One reason a multilayer neural network is stronger than a single-layer perceptron is that the activation function gives it nonlinear learning and processing capacity. Because of its simple structure, a single-layer perceptron has very limited learning ability: within the limits of a single model, it can only solve linearly separable problems and cannot go further.

In neural network research, the common activation functions include the step function, the sigmoid function, and the ReLU function. The step function maps the input directly to 1 or 0 (1 meaning active, 0 meaning inhibited), but because it is discontinuous, non-smooth, and non-differentiable, it is rarely used as an activation function. The sigmoid function has also fallen out of use in recent years, because it produces the vanishing-gradient problem during backpropagation, which makes training deep networks ineffective. This section therefore focuses on the ReLU function.
In a multilayer neural network the objects under study are more complex, and a stable, fast activation function with better predictive behavior is needed. To meet this need, the rectified linear unit (ReLU) was popularized in a 2012 paper by Hinton and colleagues. Research has found that ReLU converges much faster than other activation functions and more effectively avoids the gradient explosion and vanishing-gradient problems that arise when the number of layers is large. ReLU is now a common activation function in multilayer neural networks. It is usually defined by the formula:
f(x) = max(0, x)
which can also be written piecewise:
f(x) = x for x > 0, and f(x) = 0 for x ≤ 0

The ReLU function defines the nonlinear output of a neuron after its linear transformation. From the mathematical definition: when the input is greater than 0, the output equals the input; when the input is less than or equal to 0, the output stays at 0, as shown in Figure 2-4.
Figure 2-4 The ReLU function
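The definition above is one line of numpy:

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x): pass positive inputs through, clamp the rest to 0."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # negatives and zero map to 0; positives pass through unchanged
```

In Keras the same function is available as `activation="relu"` on a layer.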

Origin blog.csdn.net/qq_40076022/article/details/109327847