On the theory and design of convolutional neural networks

  A convolutional neural network (CNN) is a class of artificial neural network commonly used in deep learning for the analysis of visual images. Figuratively speaking, these networks are an abstraction of the structure of biological neurons: just as biological neurons communicate with one another, a CNN produces its output from its input through similar patterns of connectivity.

  The origins of CNNs trace back to roughly the beginning of the 1980s, and with rapid and continuing advances in computing power, CNNs have become popular. In short, CNN techniques make it possible to train on large amounts of complex data within a reasonable amount of time, using convolution operations whose algorithms scale well. Currently, CNNs are mainly used in AI-based virtual assistants, automatic photo tagging, video labeling, aspects of autonomous vehicles, and so on.

First, the difference between convolutional neural networks and conventional neural networks

  • CNNs can handle high-resolution images, solving the prohibitive computational cost that conventional neural networks cannot overcome. For example, consider an image of size 224 × 224 with 3 channels, which corresponds to 224 × 224 × 3 = 150,528 input features. A typical neural network with 1,000 nodes in its first hidden layer would need 150,528 × 1,000 parameters in that layer alone. For a conventional neural network, this is simply impractical;
  • CNNs have translation invariance: the detector responds to a feature regardless of where it appears in the image, so the target object is recognized no matter its position or the size of the local region in which it occurs.
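The parameter-count comparison in the first bullet can be worked through directly. The convolutional layer below (64 filters of size 3×3) is an illustrative assumption, not a configuration from the article:

```python
# Comparing parameter counts: a fully connected first layer versus a
# small convolutional layer, for a 224x224 RGB image (as in the text).

height, width, channels = 224, 224, 3
input_features = height * width * channels          # 150,528 inputs
hidden_nodes = 1000

# Dense (fully connected) first layer: one weight per input per node.
dense_params = input_features * hidden_nodes

# A convolutional layer with 64 filters of size 3x3x3 shares weights
# across every spatial position, so its count is tiny by comparison.
# (64 filters is an illustrative choice, not from the article.)
num_filters, kernel = 64, 3
conv_params = num_filters * (kernel * kernel * channels + 1)  # +1 bias each

print(dense_params)   # 150528000
print(conv_params)    # 1792
```

Weight sharing is exactly why the convolutional alternative stays tractable at high resolutions.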

 

 

Second, CNN's working mechanism and principles

  The convolution layer is based on the mathematical convolution operation. A convolution layer consists of a set of filters, each of which is like a small two-dimensional matrix of numbers. The filter is combined with the input to produce an output image: in each convolution layer, we slide a filter over the image and perform the convolution operation at each position. The filter's main task is to multiply its values element-wise with the corresponding patch of the image's pixel matrix, then sum the results to obtain one output value.
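The slide-multiply-sum operation just described can be sketched in a few lines. This is a minimal pure-Python version with no padding and stride 1; the 4×4 image and identity kernel are illustrative values, not from the article:

```python
# Minimal sliding-window 2D convolution: at each position, multiply
# the filter values with the image patch and sum (no padding, stride 1).

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the patch with the filter, then sum.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
            row.append(total)
        output.append(row)
    return output

image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
identity_kernel = [[0, 0, 0],
                   [0, 1, 0],
                   [0, 0, 0]]
print(conv2d(image, identity_kernel))  # [[5, 6], [8, 9]]
```

With the identity kernel, each output value is just the center pixel of its patch, which makes the sliding-window behavior easy to verify by hand.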

  

   CNNs can help us find specific localized image features. For example, using the edges in an image, the initial layers of the network can learn simple patterns from these features; let us call this local modeling. The horizontal and vertical edges found by this local model can then be reused and, through deeper layers, combined to build more complex patterns.

  A typical case is vertical edge detection:
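The original post's figure is not reproduced here. As a hedged illustration, convolving a simple image (bright left half, dark right half) with a vertical-edge kernel produces large responses exactly at the edge; the specific kernel is a standard choice on our part, not necessarily the one in the original figure:

```python
# Vertical edge detection sketch: the kernel responds where brightness
# changes from left to right. Kernel and image values are illustrative.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(image[i + di][j + dj] * kernel[di][dj]
             for di in range(kh) for dj in range(kw))
         for j in range(len(image[0]) - kw + 1)]
        for i in range(len(image) - kh + 1)
    ]

# 4x6 image: bright (10) on the left half, dark (0) on the right half.
image = [[10, 10, 10, 0, 0, 0] for _ in range(4)]

vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

# Output is 0 in flat regions and peaks (30) at the brightness drop.
print(conv2d(image, vertical_edge))  # [[0, 30, 30, 0], [0, 30, 30, 0]]
```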

    

 

Third, convolutional neural network architecture

    First, a convolutional neural network is built from many convolutional layers; the convolutional layer is the basic component of a CNN and carries its main computational load. Pooling layers help reduce the spatial dimensions of the representation; because of this role, we will tentatively call these units pooling units. They greatly reduce the number of weights and the amount of computation the CNN requires. Currently, the most mainstream pooling operation is max pooling, which outputs the maximum value within each local neighborhood. The pooling unit provides the invariance we mentioned earlier: an object will be identified no matter where it appears in the frame.
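The max pooling operation described above can be sketched directly. This is a 2×2 window with stride 2, the common configuration; the input values are illustrative:

```python
# 2x2 max pooling with stride 2: keep only the maximum activation in
# each non-overlapping 2x2 window, halving each spatial dimension.

def max_pool_2x2(feature_map):
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[i][j], feature_map[i][j + 1],
                           feature_map[i + 1][j], feature_map[i + 1][j + 1]))
        pooled.append(row)
    return pooled

fm = [[1, 3, 2, 4],
      [5, 6, 1, 2],
      [7, 2, 9, 1],
      [0, 8, 3, 4]]
print(max_pool_2x2(fm))  # [[6, 4], [8, 9]]
```

Because only the maximum survives, a feature that shifts slightly within its window still produces the same pooled output, which is the small-translation invariance mentioned in the text.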

   Next, a new structure can be formed: the fully connected (FC) layer, which links the pooling units according to certain rules. Each neuron in this layer has full connections to all neurons in the previous layer, just as in conventional neural networks. That is why, like a conventional neural network, it can be computed as a matrix multiplication plus a bias term. The FC layer thus represents a mapping from inputs to outputs. As for the nonlinear layers: since convolution is a linear operation while images are far from linear, a nonlinearity is often placed directly after the convolutional layer, introducing a nonlinear activation mapping.
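The "matrix multiplication plus bias" computation of the FC layer can be written out explicitly. The weights, bias, and input below are illustrative values, not from the article:

```python
# A fully connected layer as matrix multiplication plus bias:
# output[k] = sum_i x[i] * W[i][k] + b[k].

def fully_connected(x, weights, bias):
    return [sum(x[i] * weights[i][k] for i in range(len(x))) + bias[k]
            for k in range(len(bias))]

x = [1.0, 2.0, 3.0]            # flattened activations from the previous layer
W = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]               # maps 3 inputs to 2 outputs
b = [0.01, -0.01]

print(fully_connected(x, W, b))  # approximately [2.21, 2.79]
```

In a real CNN the input `x` is the flattened output of the last pooling layer, and each output corresponds to one class score.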

  There are several nonlinear operations; the popular ones are:

  • Sigmoid: this nonlinearity has the mathematical form f(x) = 1 / (1 + exp(−x)). It takes a real number and compresses it into the range 0 to 1. However, it has a fatal problem: the vanishing gradient, in which the local gradient becomes very small, so that during back-propagation the gradient effectively disappears.
  • Tanh: compresses real numbers into the range [−1, 1]. Like the sigmoid, its activation saturates; the difference is that its output is zero-centered.
  • ReLU: the rectified linear unit computes the function f(x) = max(0, x). In other words, the activation is simply thresholded at zero. Compared with sigmoid and tanh, ReLU converges more reliably, with reported speedups of more than 6×.
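The three activations listed above, written out directly:

```python
# The three activation functions from the list above.
import math

def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x)): squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Squashes into (-1, 1); zero-centered, unlike the sigmoid.
    return math.tanh(x)

def relu(x):
    # f(x) = max(0, x): thresholds the activation at zero.
    return max(0.0, x)

print(sigmoid(0))   # 0.5
print(tanh(0))      # 0.0
print(relu(-2.5))   # 0.0
print(relu(3.0))    # 3.0
```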

 

Fourth, designing a convolutional neural network

  With a full understanding of the components and working mechanism of a CNN, we can build a convolutional neural network. For example, we can use CIFAR-10, a dataset composed of a training set of 50,000 examples and a test set of 10,000 examples. Each example is a 32 × 32 color image associated with one of 10 class labels.

  When fitting the model to the training data, we use data augmentation. In the constructed network, we use batch normalization, which forces the unit activations toward a Gaussian distribution and thereby avoids problems caused by improper weight-matrix initialization. An implementation of the CNN model architecture:
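The original post's implementation figure is not reproduced here. As a hedged sketch, the following pure-Python shape trace walks through one plausible CIFAR-10 architecture (conv → pool → conv → pool → flatten → FC over 10 classes); the layer sizes are our assumption, not necessarily the original model:

```python
# Tracing tensor shapes through a plausible CIFAR-10 CNN. The layer
# choices (two 3x3 conv layers with 32 and 64 filters, 2x2 pooling)
# are illustrative assumptions, not the article's exact model.

def conv_shape(h, w, c, filters, kernel=3, padding=1, stride=1):
    # Standard output-size formula: (n + 2p - k) // s + 1.
    out = lambda n: (n + 2 * padding - kernel) // stride + 1
    return out(h), out(w), filters

def pool_shape(h, w, c, size=2):
    # 2x2 pooling halves each spatial dimension, channels unchanged.
    return h // size, w // size, c

shape = (32, 32, 3)                      # CIFAR-10 input image
shape = conv_shape(*shape, filters=32)   # 3x3 conv, 'same' padding
shape = pool_shape(*shape)               # 2x2 max pool -> 16x16x32
shape = conv_shape(*shape, filters=64)
shape = pool_shape(*shape)               # -> 8x8x64
flat = shape[0] * shape[1] * shape[2]    # flatten before the FC layer
classes = 10                             # one output per CIFAR-10 label

print(shape)   # (8, 8, 64)
print(flat)    # 4096
```

Tracing shapes this way is a quick sanity check before writing the model in an actual framework, since the flattened size determines the FC layer's input dimension.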

 

Origin: www.cnblogs.com/Raodi/p/11610743.html