Convolutional Neural Network: AlexNet

1. Introduction

The LeNet convolutional neural network was one of the earliest networks to drive the development of deep learning. After many successful iterations of this pioneering work by Yann LeCun since 1988, it was named LeNet5. AlexNet was presented by Alex Krizhevsky in the 2012 paper "ImageNet Classification with Deep Convolutional Neural Networks"; it won the ImageNet LSVRC 2012 competition and caused a great sensation. AlexNet can be regarded as a network architecture of historic significance: before it, deep learning had been quiet for a long time, and ever since AlexNet appeared in 2012, the ImageNet championships have been won by convolutional neural networks (CNNs) with ever deeper architectures. This made CNNs the core algorithmic model for image classification and triggered the explosion of deep learning. This article explains in detail how to implement the AlexNet model with Keras. Before starting, we first introduce convolutional neural networks.

2. Convolutional Neural Networks

2.1 Convolution layer

Convolution is a mathematical operation that "applies" one function to another in a certain way; the result can be understood as a "blend" of the two functions. But how does this help detect objects in an image? It turns out that convolution is very good at detecting simple structures in an image and then combining these simple features to construct more complex ones. In a convolutional network, this process happens over a series of layers, each of which performs a convolution on the output of the previous layer. Convolution extracts different features of its input: the first convolutional layers can only extract low-level features such as edges, lines, and corners, while deeper layers iteratively build more complex features from these low-level ones.
So what does convolution look like in computer vision? To understand this, first consider what an image actually is. An image is a 2nd- or 3rd-order byte array: a 2nd-order array has two dimensions (width and height) and represents a grayscale image, while a 3rd-order array has three dimensions (width, height, and channels) and represents an RGB image. Simply put, each byte value is an integer describing the intensity of a particular channel at the corresponding pixel. So when dealing with computer vision, an image can basically be thought of as a 2D array of numbers (for RGB or RGBA images, as three or four overlapping 2D arrays).
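As a quick illustration of this view, here is a minimal NumPy sketch (the pixel values are made up for the example):

import numpy as np

# A grayscale image is a 2D array: (height, width)
gray = np.array([[0, 255],
                 [128, 64]], dtype=np.uint8)
print(gray.shape)        # (2, 2)

# An RGB image adds a channel dimension: (height, width, 3)
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]  # a red pixel: R=255, G=0, B=0
print(rgb.shape)         # (2, 2, 3)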
Figure 1: Schematic of a convolution operation (left: input; middle: filter; right: output)
Note that the stride and the filter size are hyperparameters, which means the model does not learn them; you must apply scientific reasoning to determine which values best suit your model. The last concept you need to understand for convolution is padding. If the filter (taking the stride into account) does not fit the image an integer number of times, the image has to be padded. This can be done in two ways: VALID padding and SAME padding. VALID padding essentially discards the values left over at the edge of the image; that is, if the filter is 2x2 with stride 2 and the image has width 3, VALID padding ignores the third column of values. SAME padding adds fill values (usually 0) to the edges of the image, increasing its dimensions until the filter fits an integer number of times. This padding is usually applied symmetrically (i.e., it tries to add the same number of columns/rows on each side of the image).
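The difference between the two padding modes can be checked directly in Keras; a minimal sketch (the 3x3 input and 2x2 filter at stride 2 are chosen only to mirror the example above):

from keras.models import Sequential
from keras.layers import Conv2D

# VALID padding: the third row/column that the 2x2 filter cannot reach at stride 2 is dropped
valid = Sequential([Conv2D(1, (2, 2), strides=(2, 2), padding='valid', input_shape=(3, 3, 1))])
print(valid.output_shape)  # (None, 1, 1, 1)

# SAME padding: the input is zero-padded so that every input position is covered
same = Sequential([Conv2D(1, (2, 2), strides=(2, 2), padding='same', input_shape=(3, 3, 1))])
print(same.output_shape)   # (None, 2, 2, 1)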

2.2 Activation layer

The main function of the activation layer is, of course, activation. So what is an activation function? In a neural network, when the input excitation reaches a certain strength, a neuron is activated and produces an output signal. The function that models this cellular activation is called the activation function. The standard way to model a neuron's output f as a function of its input x is with the tanh or sigmoid function. In terms of training time with gradient descent, the AlexNet paper reports that the ReLU function is about 6 times faster than these alternatives. ReLU, short for Rectified Linear Unit, is an element-wise operation (applied to each pixel) that replaces all negative values in the feature map with zero. Its purpose is to introduce nonlinearity into the convolutional neural network, because most of the real-world data we want the network to learn is nonlinear (convolution is a linear operation, carried out by element-wise matrix multiplications and additions, so we introduce a nonlinear function such as ReLU to handle nonlinear problems).
Figure 2: The ReLU function (inputs less than 0 output 0; inputs greater than 0 output the original value)
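ReLU itself is a one-line operation; a minimal NumPy sketch of the function in Figure 2:

import numpy as np

def relu(x):
    # Outputs 0 for negative inputs and the input itself otherwise
    return np.maximum(0, x)

print(relu(np.array([-3.0, -0.5, 0.0, 2.0, 7.0])))  # [0. 0. 0. 2. 7.]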

2.3 Pooling layer

Another important layer you will see in convolutional networks is the pooling layer. Pooling comes in several forms: max, average, sum, and so on. The most common is max pooling, in which the input matrix is split into segments of equal size and the output is filled with the maximum of each corresponding segment. A pooling layer can be thought of as a grid of pooling units spaced s pixels apart, each summarizing a neighborhood of size z x z centered at its position. If we set s = z (the stride equals the pooling window size), we get the traditional local pooling commonly used in CNNs. If we set s < z (each step moves less than the window size), we get overlapping pooling. Overlapping pooling was used for the first time in AlexNet to help avoid overfitting.
Figure 3: Max pooling (the left matrix is divided into four equal blocks; the largest number in each block is circled)
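A minimal NumPy sketch of non-overlapping 2x2, stride-2 max pooling (the input values are made up):

import numpy as np

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 9, 8],
              [1, 0, 3, 4]])

# Non-overlapping max pooling: window size z = 2, stride s = 2 (s = z)
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 5]
#  [7 9]]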

2.4 Fully connected layer

The fully connected layer is a traditional multilayer perceptron that uses a softmax activation function in the output layer (other classifiers, such as an SVM, can also be used). The term "fully connected" means that every neuron in the previous layer is connected to every neuron in the next layer. This is a common layer in convolutional networks, where all outputs of the previous layer are connected to all nodes of the next layer. When a convolutional layer is converted to a fully connected layer, the total number of neurons stays the same.
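In Keras, the transition from the convolutional part to the fully connected part is just a Flatten followed by Dense layers, as in the AlexNet code later in this article; a minimal sketch (the layer sizes here are arbitrary):

from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

m = Sequential()
m.add(Conv2D(8, (3, 3), input_shape=(8, 8, 1)))  # a small convolutional feature extractor: output 6x6x8
m.add(Flatten())                                 # 6*6*8 = 288 values, each connected to every neuron below
m.add(Dense(10, activation='softmax'))           # fully connected output layer
m.summary()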

3. AlexNet model

3.1 Model Introduction

Figure 4: The AlexNet model (an 8-layer network: 5 convolutional layers + 3 fully connected layers; it is drawn in two parts because it was trained on 2 GPUs)
The AlexNet model contains 60 million parameters and 650,000 neurons. It consists of five convolutional layers, some of which are followed by max-pooling layers, three fully connected layers, and a final 1000-way softmax. To speed up training, AlexNet uses the ReLU nonlinear activation function and an efficient GPU-based implementation of convolution. To reduce overfitting in the fully connected layers, AlexNet uses the (then new) "Dropout" method, which proved to be very effective.

3.2 Local Response Normalization (LRN)

In neurobiology there is a concept called "lateral inhibition", which refers to activated neurons inhibiting their neighboring neurons. The purpose of normalization is exactly this kind of "inhibition": local response normalization borrows the idea of lateral inhibition to achieve local suppression. This is especially useful when using ReLU, because the ReLU response is unbounded (it can be very large), so it needs to be normalized. Using local response normalization helps improve generalization.
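For reference, the LRN formula in the original paper divides each activation by a term that sums the squared activations of n neighbouring channels at the same spatial position: b^i_{x,y} = a^i_{x,y} / (k + alpha * sum_j (a^j_{x,y})^2)^beta, with k = 2, n = 5, alpha = 1e-4, beta = 0.75. A minimal NumPy sketch, assuming a channels-last feature map (note that the Keras code later in this article uses BatchNormalization rather than LRN):

import numpy as np

def local_response_normalization(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """a: feature map of shape (height, width, channels)."""
    channels = a.shape[-1]
    b = np.empty_like(a, dtype=np.float64)
    for i in range(channels):
        lo, hi = max(0, i - n // 2), min(channels - 1, i + n // 2)
        # Sum of squared activations over the neighbouring channels at the same (x, y)
        denom = k + alpha * np.sum(a[..., lo:hi + 1] ** 2, axis=-1)
        b[..., i] = a[..., i] / denom ** beta
    return b

out = local_response_normalization(np.random.rand(4, 4, 8))
print(out.shape)  # (4, 4, 8)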

4. How AlexNet Handles Overfitting

4.1 Data augmentation

The simplest and most common way to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations. The AlexNet model uses two distinct forms of augmentation, both of which allow transformed images to be generated from the original images with very little computation, so the transformed images do not need to be stored on disk. The first form consists of generating image translations and horizontal reflections. The second form consists of altering the intensities of the RGB channels in the training images.
Figure 5: Three ways of augmenting the data
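The first form of augmentation (translations and horizontal reflections) can be approximated in Keras with ImageDataGenerator; a minimal sketch (the shift and flip settings here are illustrative, not the exact crops used in the paper):

from keras.preprocessing.image import ImageDataGenerator

# Label-preserving transformations: random shifts and horizontal flips
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

# x, y are the image array and one-hot labels loaded in section 5.1;
# transformed batches are generated on the fly, nothing is written to disk:
# model.fit_generator(datagen.flow(x, y, batch_size=64), steps_per_epoch=len(x) // 64, epochs=10)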

4.2 Dropout

For a layer of neurons, what Dropout does is set the output of each hidden neuron to zero with probability 0.5. Neurons that are "dropped out" in this way contribute neither to the forward pass nor to backpropagation. So every time an input is presented, the neural network tries a different architecture, but all these architectures share weights. Because a neuron cannot rely on the presence of particular other neurons, this technique reduces complex co-adaptations between neurons. The network is therefore forced to learn more robust features that are useful in combination with many different random subsets of the other neurons. Without Dropout, the AlexNet network exhibits substantial overfitting.
Figure 6: Dropout schematic
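Mechanically, dropout just multiplies each hidden activation by an independent 0/1 mask; a minimal NumPy sketch of setting outputs to zero with probability 0.5 (training-time view only, ignoring the rescaling that frameworks apply):

import numpy as np

activations = np.array([0.8, 0.1, 0.5, 0.9, 0.3, 0.7])
mask = np.random.rand(activations.size) > 0.5   # keep each neuron with probability 0.5
dropped = activations * mask                    # "dropped out" neurons output exactly 0
print(dropped)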

5. Code Explanation

5.1 Importing dependencies and loading the data

# (1) Importing dependency
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
import numpy as np
np.random.seed(1000)
# (2) Get Data
import tflearn.datasets.oxflower17 as oxflower17
x, y = oxflower17.load_data(one_hot=True)
# (3) Create a sequential model
model = Sequential()

The original AlexNet model tackles a 1000-class classification problem and is demanding in terms of computing power. For simplicity, our reproduction here uses the TFLearn dataset oxflower17, which classifies 17 kinds of flowers with 80 photos per class. Keras provides implementations of many common neural-network building blocks, such as layers, objectives, activation functions, and a series of optimizers, and makes it easier to work with image and text data. Keras has two main model types: the Sequential model and the functional-API Model. The Sequential model is used here.
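If you want to verify what was loaded, the array shapes can be printed directly (a small check, assuming the load above succeeded and the loader's default image size):

# 17 flower categories, 80 images each, one-hot labels
print(x.shape, y.shape)  # e.g. (1360, 224, 224, 3) (1360, 17)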

5.3 The first convolution + pooling layer

# 1st Convolutional Layer
model.add(Conv2D(filters=96, input_shape=(224,224,3), kernel_size=(11,11), strides=(4,4), padding='valid'))
model.add(Activation('relu'))
# Pooling 
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation before passing it to the next layer
model.add(BatchNormalization())

The first convolutional layer takes a 224x224x3 input; the convolution kernels are 11x11x3, there are 96 of them (48 per GPU), and the stride is 4.
The computation of a convolutional layer is determined by:

  • the size of the input data;
  • 4 hyperparameters (which the model does not learn through optimization):
    1. the number of filters,
    2. the spatial extent of the filters,
    3. the stride of the convolution,
    4. the amount of zero padding (SAME padding);
  • the size of the output data.

Here W1 = 224, H1 = 224, D1 = 3, K = 48 (per GPU), F = 11, S = 4, P = 1.5.

The output width of this convolutional layer is W2 = (224 - 11 + 2 x 1.5) / 4 + 1 = 55; similarly H2 = 55, and the depth is D2 = K x 2 = 96.

After convolution, the output feature map is therefore 55x55x96.
The max pooling used here has a 3x3 window and stride S = 2, so W = (55 - 3) / 2 + 1 = 27.
After pooling, the output feature map is 27x27x96. (A small helper that reproduces these size calculations is sketched below.)
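These sizes follow the standard output-size formula W_out = (W_in - F + 2P) / S + 1; a small helper sketch:

def conv_output_size(w_in, f, s, p=0):
    """w_in: input width/height, f: filter size, s: stride, p: zero padding per side."""
    return int((w_in - f + 2 * p) / s) + 1

print(conv_output_size(224, 11, 4, 1.5))  # 55  (first convolution)
print(conv_output_size(55, 3, 2))         # 27  (first max pooling: 3x3 window, stride 2)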

5.4 The second convolution + pooling

# 2nd Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(11,11), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())

The second convolutional layer takes the pooled 27x27x96 output as input; the kernels are 5x5, there are 256 of them (128 per GPU), and the stride is 1.

Similarly, it can be computed that the feature map after this convolution is 27x27x256.
After pooling, the output feature map is 13x13x256.

5.5 The third convolution

# 3rd Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

After convolution, the feature map is 13x13x384.

5.6 The fourth convolution

# 4th Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

After convolution, the feature map is 13x13x384.

5.7 The fifth convolution + pooling

# 5th Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())

After convolution, the feature map is 13x13x256.

After pooling, the feature map is 6x6x256.

5.8 Fully connected layer 6

# Passing it to a dense layer
model.add(Flatten())
# 1st Dense Layer
model.add(Dense(4096))  # the input size is inferred from the preceding Flatten layer
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())

Fully connected layer 6: the input is the 6x6x256 feature map (flattened), there are 4,096 neurons, and the output is a 4096x1 vector.

5.9 Fully connected layer 7

# 2nd Dense Layer
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())

Fully connected layer 7: the input is a 4096x1 vector, there are 4,096 neurons, and the output is a 4096x1 vector.

5.10 Fully connected layer 8

# 3rd Dense Layer
model.add(Dense(1000))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())

Fully connected layer 8: the input is a 4096x1 vector, there are 1,000 neurons, and the output is a 1000x1 vector.

5.11 Output layer and training

# Output Layer
model.add(Dense(17))
model.add(Activation('softmax'))
model.summary()
# (4) Compile 
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy'])
# (5) Train
model.fit(x, y, batch_size=64, epochs=1, verbose=1, validation_split=0.2, shuffle=True)

Finally, the output of the last fully connected layer is passed through a softmax activation (here over the 17 flower classes) to obtain the classification result.
Loss curves: the loss on the training set reaches its minimum at around iteration 23. Due to the limited amount of data, the loss on the test set cannot be reduced to the desired level. (The horizontal axis is the number of iterations; the vertical axis is the loss value.)
Accuracy curves: the training set reaches nearly 90% accuracy. Since the dataset is too small, the accuracy on the test set cannot reach the desired value. (The horizontal axis is the number of iterations; the vertical axis is the accuracy.)
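Curves like these can be reproduced by keeping the History object that model.fit returns and plotting it; a sketch, assuming matplotlib is installed (the epoch count here is illustrative, whereas the listing above uses epochs=1; only the loss is plotted because the accuracy key name differs between Keras versions):

import matplotlib.pyplot as plt

history = model.fit(x, y, batch_size=64, epochs=30, verbose=1,
                    validation_split=0.2, shuffle=True)

plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()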

6. Summary and Outlook

You can currently find the AlexNet-based project Flower on the Mo platform. This project adapts the original 1000-class classification to a final 17-class flower classification. If you run into difficulties while studying it, or find mistakes of ours, you can contact us at any time.
Project Source Address: http://www.momodel.cn:8899/explore/5cff0ee61afd941c7e304adb?type=app

The main contributions of AlexNet can be summarized as follows:

  1. A two-GPU implementation that speeds up training
  2. The ReLU nonlinear activation function, which reduces training time
  3. Overlapping pooling, which improves accuracy and makes overfitting less likely
  4. Data augmentation and "Dropout" to reduce overfitting
  5. Local response normalization to improve accuracy
  6. An architecture of 5 convolutional layers + 3 fully connected layers with excellent properties


About us

Mo (website: momodel.cn) is a Python-based online modeling platform for artificial intelligence that can help you quickly develop, train, and deploy models.


The Mo AI Club is initiated by the website's R&D and product design team and is committed to lowering the threshold for developing and using artificial intelligence. The team has experience in big-data processing and analysis, visualization, and data modeling, has undertaken multidisciplinary intelligence projects, and has full-stack design and development capability from the back end to the front end. Its main research directions are big-data management and analysis and artificial intelligence technology, with the goal of promoting data-driven scientific research.

The club currently holds offline machine-learning-themed salons in Hangzhou every week, and shares articles and academic exchanges from time to time. We hope to bring together friends from all walks of life who are interested in artificial intelligence, keep growing through exchange, and promote the democratization and wider application of artificial intelligence.
