1. Background
A convolutional neural network (CNN, also known as a ConvNet) retains the spatial information of its input and is therefore well suited to image classification.
The convolution operation is based on carefully chosen local receptive fields and shares weights across multiple feature planes (feature maps). The fully connected layers that follow are a traditional multi-layer perceptron, with softmax as the output layer.
Innovations in convolutional networks: retaining spatial information, and adding convolution, pooling, and feature planes.
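The softmax output layer mentioned above turns the raw scores of the last fully connected layer into class probabilities. The following is a minimal NumPy sketch; the function name and the example scores are assumptions made for illustration and do not come from any library.

```python
import numpy as np

def softmax(logits):
    """Convert a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores produced by the last dense layer for three classes.
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.sum())
```

The largest score always receives the largest probability, so the predicted class is the index of the maximum.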
2. Network composition
2.1 Local receptive field
Convolution: a sub-matrix of adjacent input neurons is connected to a single hidden neuron in the next layer. The input region seen by this hidden neuron is its local receptive field.
Convolutional layers represent local spatial structure efficiently by reusing the same convolution kernel across the whole image. Kernels can be designed by hand, for example to detect edges in images, but in deep learning the kernel arrays are learned from data.
A CNN stacks multiple filters, each of which independently identifies a specific visual feature at different locations in the image. These features are very simple in the first layers and become increasingly complex as the network gets deeper.
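To make the idea of a hand-designed edge-detecting kernel concrete, here is a small NumPy sketch. The toy image, the kernel values, and the helper name conv2d_valid are all assumptions chosen for illustration. Like deep-learning libraries, the helper does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value only sees a kh x kw local receptive field.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: left half dark (0), right half bright (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-designed kernel that responds to left-to-right brightness changes,
# i.e. to vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

edges = conv2d_valid(image, kernel)
print(edges)  # every row is [0. 3. 3. 0.]
```

The response is non-zero only where the window straddles the dark-to-bright boundary, and the same nine weights are reused at every position.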
Keras convolution layers support two padding modes:
(1) padding='same': the input boundary is padded with zeros so that the convolution result at the boundary is preserved; with a stride of 1, the output is the same size as the input;
(2) padding='valid': the convolution is computed only where the filter lies completely inside the input, so the output is smaller than the input.
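The effect of the two padding modes on the output size can be worked out with a short calculation. The helper below is a hypothetical function written for this note, not a Keras API; it follows the output-size convention used by Keras and TensorFlow.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution."""
    if padding == "same":
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # The kernel must fit completely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")

print(conv_output_size(28, 5, padding="same"))   # 28
print(conv_output_size(28, 5, padding="valid"))  # 24
```

For a 28x28 MNIST image and a 5x5 kernel, 'same' keeps the 28x28 size while 'valid' shrinks it to 24x24.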
2.2 Shared weights and biases
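All hidden neurons in one feature plane use the same set of weights and the same bias. Each feature plane therefore detects the same feature at every position of the input, and the layer needs far fewer parameters than a fully connected one.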
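A rough way to see the effect of sharing weights and biases is to count parameters. The sketch below compares a convolutional layer with a fully connected layer that produces the same number of outputs. The layer sizes (a 28x28 grayscale input, 20 feature planes, a 5x5 kernel) are assumptions chosen to match the MNIST example later in this note, and both helper functions are hypothetical.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Shared weights plus one bias per filter; independent of the image size."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(n_inputs, n_outputs):
    """One weight per input-output pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

# 28x28 grayscale input mapped to 20 feature planes of size 28x28.
shared = conv_params(5, 5, 1, 20)
unshared = dense_params(28 * 28, 28 * 28 * 20)
print(shared)    # 520
print(unshared)  # 12308800
```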
2.3 Pooling
Every pooling operation summarizes a given region.
It aggregates each sub-matrix of a feature plane's output into a single value that describes the corresponding physical region of the input.
2.3.1 Max Pooling
Max pooling outputs the maximum value in each pooling region.
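A minimal NumPy sketch of max pooling with non-overlapping 2x2 windows follows. The helper name and the example feature map are assumptions made for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Max pooling with non-overlapping windows (stride equals the window size)."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))     # keep the largest value of each window

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 0, 8]])
pooled_max = max_pool2d(feature_map)
print(pooled_max)
# [[6 4]
#  [7 9]]
```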
2.3.2 Average pooling
Average pooling outputs the mean of the values in each pooling region.
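A minimal NumPy sketch of average pooling with non-overlapping 2x2 windows follows. The helper name and the example feature map are assumptions made for illustration.

```python
import numpy as np

def avg_pool2d(feature_map, size=2):
    """Average pooling with non-overlapping windows (stride equals the window size)."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.mean(axis=(1, 3))    # average the values of each window

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 0, 8]])
pooled_avg = avg_pool2d(feature_map)
print(pooled_avg)
# [[3.75 2.25]
#  [4.   4.5 ]]
```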
3. Application
One-dimensional convolution is mainly used for processing sound and text data along the time dimension.
Two-dimensional convolution (height*width) is mainly used for image data processing. Its output, a two-dimensional array, can be regarded as a representation of the input at a certain level in the spatial dimensions and is also called a feature map.
Three-dimensional convolution (height*width*time) is mainly used for video data processing.
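As a small illustration of the one-dimensional case, the sketch below slides a three-element kernel along a short sequence. The signal and kernel values are hypothetical. np.correlate is used because, like convolution layers in deep-learning libraries, it does not flip the kernel.

```python
import numpy as np

# Hypothetical samples of a signal over time: it rises, peaks, then falls.
signal = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])

# Kernel that is positive where the signal rises and negative where it falls.
kernel = np.array([-1.0, 0.0, 1.0])

# mode="valid" keeps only the positions where the kernel fits completely.
response = np.correlate(signal, kernel, mode="valid")
print(response)  # [ 2.  2.  0. -2. -2.]
```

The two- and three-dimensional cases work the same way, with the kernel sliding along two or three axes instead of one.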
3.1 LeNet network on the MNIST data set
Features: the lower layers of the network alternate convolution and max pooling operations, which makes it very robust to simple geometric transformations and distortions.
Code:
1. Model definition
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.optimizers import Adam
import numpy as np
import matplotlib.pyplot as plt  # only needed if a plotting helper such as plot_picture is added


class LeNet:
    @staticmethod
    def build(input_shape, classes):
        model = Sequential()
        # CONV => RELU => POOL
        model.add(Conv2D(20, kernel_size=5, padding='same', input_shape=input_shape))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        # FLATTEN => DENSE => RELU
        model.add(Flatten())
        model.add(Dense(500))
        model.add(Activation('relu'))
        # Softmax classifier
        model.add(Dense(classes))
        model.add(Activation('softmax'))
        return model
2. Model training and evaluation
def model_train(X_train, y_train):
    OPTIMIZER = Adam()
    model = LeNet.build(input_shape=INPUT_SHAPE, classes=NB_CLASSES)
    model.compile(loss='categorical_crossentropy', optimizer=OPTIMIZER, metrics=['accuracy'])
    history = model.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=NB_EPOCH, verbose=1,
                        validation_split=VALIDATION_SPLIT)
    # plot_picture(history)
    return model


def model_evaluate(model, X_test, y_test):
    score = model.evaluate(X_test, y_test, verbose=1)
    print('Test score: ', score[0])
    print('Test acc: ', score[1])
3. Data loading and preprocessing
def load_and_proc_data():
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    # Scale pixel values from [0, 255] to [0, 1]
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')
    X_train /= 255
    X_test /= 255
    # Conv2D expects (samples, rows, cols, channels), so add the single grayscale channel
    X_train = np.expand_dims(X_train, axis=-1)
    X_test = np.expand_dims(X_test, axis=-1)
    print('X_train shape', X_train.shape)
    print(X_train.shape[0], 'train samples')
    print(X_test.shape[0], 'test samples')
    # Convert class vectors to binary (one-hot) class matrices
    y_train = to_categorical(y_train, NB_CLASSES)
    y_test = to_categorical(y_test, NB_CLASSES)
    return X_train, X_test, y_train, y_test
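The conversion of class vectors into binary class matrices in the data loading step is one-hot encoding. The following NumPy sketch shows the idea; one_hot is a hypothetical helper written for this note, not the Keras function, and the labels are example values.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row i gets a 1 in the column given by labels[i].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([5, 0, 4], 10)
print(encoded)
```

Each row lines up with one unit of the ten-way softmax output, which is the format that the categorical_crossentropy loss expects.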
4. Main function
NB_EPOCH = 20
BATCH_SIZE = 128
VALIDATION_SPLIT = 0.2
IMG_ROWS, IMG_COLS = 28, 28
INPUT_SHAPE = (IMG_ROWS, IMG_COLS, 1)  # single channel (grayscale)
NB_CLASSES = 10

if __name__ == '__main__':
    X_train, X_test, y_train, y_test = load_and_proc_data()
    model = model_train(X_train, y_train)
    model_evaluate(model, X_test, y_test)
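As a sanity check on the model above, the output shape and parameter count of each layer can be worked out by hand. The sketch below is plain Python and does not call Keras. It assumes the single-convolution architecture defined in LeNet.build and the 28x28x1 MNIST input; lenet_summary is a hypothetical helper.

```python
def lenet_summary(input_shape=(28, 28, 1), classes=10):
    """List (layer name, output shape, parameter count) for the model above."""
    h, w, c = input_shape
    layers = []
    # Conv2D(20, kernel_size=5, padding='same'): spatial size is unchanged.
    layers.append(("conv", (h, w, 20), 5 * 5 * c * 20 + 20))
    # MaxPooling2D(2x2, stride 2): halves height and width, has no parameters.
    h, w = h // 2, w // 2
    layers.append(("pool", (h, w, 20), 0))
    # Flatten: reshapes the feature planes into one vector, has no parameters.
    flat = h * w * 20
    layers.append(("flatten", (flat,), 0))
    # Dense layers: one weight per input-output pair plus one bias per output.
    layers.append(("dense_500", (500,), flat * 500 + 500))
    layers.append(("dense_out", (classes,), 500 * classes + classes))
    return layers

summary = lenet_summary()
for name, shape, params in summary:
    print(name, shape, params)
total_params = sum(params for _, _, params in summary)
print("total", total_params)  # total 1966030
```

Almost all parameters sit in the first dense layer; the convolution itself contributes only 520.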
3.2 LeNet network on the CIFAR-10 image data set
For details, refer to:
3.3 VGG16 network and transfer learning
For details, refer to: