Introduction to Deep Convolutional Neural Networks (DCNN)

1. Background

A Convolutional Neural Network (CNN, also known as a ConvNet) retains spatial information and is therefore well suited to image classification.

The convolution operation relies on carefully chosen local receptive fields and shares weights within each of several feature maps (also called feature planes); the fully connected layers that follow are a traditional multi-layer perceptron, with softmax as the output layer.

The key ideas in convolutional networks are retaining spatial information and adding convolution, pooling, and feature maps.

2. Network composition

2.1 Local receptive field

Convolution: connect a sub-matrix of adjacent input neurons to a single hidden neuron in the next layer. The region of the input that this hidden neuron sees is its local receptive field.

Convolutional layers represent local spatial structure efficiently by reusing the same convolution kernel across the whole input. In deep learning the kernel values are learned rather than designed by hand, although a hand-designed kernel can, for example, detect edges in an image.

A CNN stacks multiple filters, each of which independently detects a specific visual feature at different locations in the image. These features are very simple in the first layers and become more and more complex as the network gets deeper.
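
As a rough illustration of how a kernel responds to an edge, the sketch below slides a 3x3 kernel over a tiny image in plain Python. The function name conv2d_valid, the toy image, and the Sobel-like kernel values are assumptions made for this example; like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output value depends on one kh x kw patch only:
            # that patch is the local receptive field of this output neuron.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        output.append(row)
    return output


# Toy 4x6 image (assumed): dark (0) on the left half, bright (1) on the right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-written vertical-edge kernel (Sobel-like); a CNN would learn such values.
kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)  # prints [0, 4, 4, 0] for both rows
```

The response is non-zero only where the window straddles the dark-to-bright boundary, and the same nine weights are reused at every position.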

(1) padding='same': the input boundary is padded with zeros so that the convolution result at the boundary is kept; with a stride of 1 the output has the same size as the input;

(2) padding='valid': the filter is applied only where it overlaps the input completely, so the output is smaller than the input.
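
The effect of the two padding modes on the output size can be checked with a little arithmetic. The helper below is a sketch with a hypothetical name (conv_output_size); the formulas are the usual ones for a single spatial dimension, and the 28-pixel input and 5-pixel kernel are taken from the LeNet example in section 3.1.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution."""
    if padding == "same":
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # The kernel must fit entirely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


print(conv_output_size(28, 5, padding="same"))   # 28
print(conv_output_size(28, 5, padding="valid"))  # 24
```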

2.2 Shared weights and biases

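All hidden neurons in one feature map use the same kernel weights and the same bias, so each feature map detects one feature wherever it appears in the input, and the layer needs far fewer parameters than a fully connected layer.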
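
To make the saving from weight sharing concrete, the sketch below compares parameter counts. The two helper functions are hypothetical names introduced for this example; the layer sizes (5x5 kernel, 1 input channel, 20 filters, 28x28 input) are those of the LeNet model in section 3.1, and the dense layer is an imagined fully connected layer producing an output of the same size.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    # One kernel (plus one bias) per filter, shared across all positions.
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_param_count(n_inputs, n_outputs):
    # One weight per input-output pair, plus one bias per output.
    return n_inputs * n_outputs + n_outputs


conv_params = conv2d_param_count(kernel_size=5, in_channels=1, filters=20)
# A fully connected layer mapping 28*28 inputs to a 28*28*20 output volume.
dense_params = dense_param_count(28 * 28, 28 * 28 * 20)

print(conv_params)   # 520
print(dense_params)  # 12308800
```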

2.3 Pooling

Every pooling operation summarizes the values in a given region.

It aggregates each sub-matrix of a feature map into a single output value that describes the corresponding region of the input.

2.3.1 Max Pooling

Max pooling outputs the largest value in each region.
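
The following is a minimal sketch of max pooling on a small hand-made feature map, in plain Python. The function name max_pool2d and the 4x4 values are assumptions for illustration; the 2x2 window with stride 2 matches the MaxPooling2D settings used in section 3.1.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep the largest value of every pool_size x pool_size window."""
    pooled = []
    for i in range(0, len(feature_map) - pool_size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - pool_size + 1, stride):
            window = [
                feature_map[i + di][j + dj]
                for di in range(pool_size)
                for dj in range(pool_size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 8],
    [1, 1, 7, 5],
]

print(max_pool2d(feature_map))  # [[6, 2], [2, 9]]
```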

2.3.2 Average pooling

Average pooling outputs the mean of the values in each region.
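
A corresponding sketch for average pooling, again in plain Python. The function name avg_pool2d and the 4x4 input values are assumed for the example; only the aggregation step differs from max pooling (a mean instead of a maximum).

```python
def avg_pool2d(feature_map, pool_size=2, stride=2):
    """Replace every pool_size x pool_size window by the mean of its values."""
    pooled = []
    for i in range(0, len(feature_map) - pool_size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - pool_size + 1, stride):
            window = [
                feature_map[i + di][j + dj]
                for di in range(pool_size)
                for dj in range(pool_size)
            ]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 8],
    [1, 1, 7, 5],
]

print(avg_pool2d(feature_map))  # [[3.5, 1.0], [1.0, 7.25]]
```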

3. Application

One-dimensional convolution is mainly used for sound and text data, which vary along the time dimension.

Two-dimensional convolution (height*width) is mainly used for image data. Its output, a two-dimensional array, can be regarded as a representation of the input at a certain level in the spatial dimensions, and is also called a feature map.

Three-dimensional convolution (height*width*time) is mainly used for video data.
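
The one-dimensional case is the easiest to write out by hand. The sketch below slides a length-3 kernel along a short sequence; the function name conv1d_valid, the signal, and the difference kernel are all assumptions for illustration. The two- and three-dimensional cases apply the same sliding-window idea along two or three axes.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` along `signal` (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + d] * kernel[d] for d in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [1, 2, 3, 4, 5]  # e.g. five time steps of a 1D signal
kernel = [1, 0, -1]       # responds to the change between neighbouring steps

print(conv1d_valid(signal, kernel))  # [-2, -2, -2]
```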

3.1 LeNet on the MNIST data set

Features: the lower layers of the network alternate convolution and max pooling operations, which makes it robust to simple geometric transformations and distortions.

Code:

1. Model definition

import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense
from keras.datasets import mnist
# keras.utils.np_utils is not available in recent Keras releases; aliasing
# keras.utils keeps the np_utils.to_categorical calls working on old and new versions.
from keras import utils as np_utils
from keras.optimizers import Adam


class LeNet:
    @staticmethod
    def build(input_shape, classes):
        model = Sequential()
        # Convolution: 20 filters of size 5x5; 'same' padding keeps the spatial size
        model.add(Conv2D(20, kernel_size=5, padding='same', input_shape=input_shape))
        model.add(Activation('relu'))
        # 2x2 max pooling with stride 2 halves the height and width
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        # Fully connected classifier on the flattened feature maps
        model.add(Flatten())
        model.add(Dense(500))
        model.add(Activation('relu'))
        # Softmax output: one probability per class
        model.add(Dense(classes))
        model.add(Activation('softmax'))
        return model
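
To see what this model looks like layer by layer, the shapes and parameter counts can be worked out by hand. The sketch below does the arithmetic in plain Python for a 28x28x1 input and 10 classes (the values used in the main function); the variable names are assumptions introduced for this walk-through, and the totals should agree with what model.summary() reports.

```python
height, width, channels = 28, 28, 1
filters, kernel_size = 20, 5
hidden_units, classes = 500, 10

# Conv2D with padding='same' and stride 1: spatial size unchanged, 20 channels out.
conv_shape = (height, width, filters)
conv_params = (kernel_size * kernel_size * channels + 1) * filters

# MaxPooling2D 2x2 with stride 2: height and width are halved, no parameters.
pool_shape = (height // 2, width // 2, filters)

# Flatten: all pooled values in one vector.
flat_size = pool_shape[0] * pool_shape[1] * pool_shape[2]

# Dense layers: one weight per input-output pair plus one bias per output.
dense1_params = flat_size * hidden_units + hidden_units
dense2_params = hidden_units * classes + classes

total_params = conv_params + dense1_params + dense2_params

print(conv_shape, conv_params)   # (28, 28, 20) 520
print(pool_shape, flat_size)     # (14, 14, 20) 3920
print(dense1_params)             # 1960500
print(dense2_params)             # 5010
print(total_params)              # 1966030
```

Almost all of the parameters sit in the first dense layer, not in the convolution.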

2. Model training and evaluation

def model_train(X_train, y_train):
    OPTIMIZER = Adam()
    model = LeNet.build(input_shape=INPUT_SHAPE, classes=NB_CLASSES)
    model.compile(loss='categorical_crossentropy', optimizer=OPTIMIZER, metrics=['accuracy'])
    history = model.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=NB_EPOCH, verbose=1, validation_split=VALIDATION_SPLIT)
    # plot_picture(history)
    return model

def model_evaluate(model, X_test, y_test):
    score = model.evaluate(X_test, y_test, verbose=1)
    print('Test score: ', score[0])
    print('Test acc: ', score[1])
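
The two numbers printed above are the loss and the accuracy. As a hedged illustration of what the softmax output and the 'categorical_crossentropy' loss compute for a single sample, here is a plain-Python sketch; the function names and the three-class logits are assumptions for the example, and Keras additionally averages the loss over the batch.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t > 0)


logits = [2.0, 1.0, 0.1]  # assumed raw outputs of the last Dense layer
target = [1, 0, 0]        # one-hot label: the true class is 0

probs = softmax(logits)
loss = categorical_crossentropy(target, probs)
predicted_class = probs.index(max(probs))

print([round(p, 3) for p in probs])
print(round(loss, 3))
print(predicted_class)  # 0, so this sample counts as correct for accuracy
```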

3. Data loading and preprocessing

def load_and_proc_data():
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

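    # Scale pixel values from [0, 255] to [0, 1]
    X_train = X_train.astype('float32') / 255
    X_test = X_test.astype('float32') / 255
    # Add the channel axis: (N, 28, 28) -> (N, 28, 28, 1), as INPUT_SHAPE expects
    X_train = X_train[..., np.newaxis]
    X_test = X_test[..., np.newaxis]
    print('X_train shape', X_train.shape)
    print(X_train.shape[0], 'train samples')
    print(X_test.shape[0], 'test samples')

    # Convert the class vectors to binary (one-hot) class matrices
    y_train = np_utils.to_categorical(y_train, NB_CLASSES)
    y_test = np_utils.to_categorical(y_test, NB_CLASSES)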

    return X_train, X_test, y_train, y_test
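
The conversion of class labels to a binary class matrix is one-hot encoding. The sketch below reproduces the idea in plain Python; to_one_hot is a hypothetical helper written for this illustration, not part of Keras.

```python
def to_one_hot(labels, num_classes):
    """Return one row per label with a 1.0 at the label's index and 0.0 elsewhere."""
    return [
        [1.0 if k == label else 0.0 for k in range(num_classes)]
        for label in labels
    ]


encoded = to_one_hot([0, 2, 1], num_classes=3)
for row in encoded:
    print(row)
# [1.0, 0.0, 0.0]
# [0.0, 0.0, 1.0]
# [0.0, 1.0, 0.0]
```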

4. Main function

NB_EPOCH = 20
BATCH_SIZE = 128
VALIDATION_SPLIT = 0.2
IMG_ROWS, IMG_COLS = 28, 28
INPUT_SHAPE = (IMG_ROWS, IMG_COLS, 1)  # single channel (grayscale)
NB_CLASSES = 10

if __name__ == '__main__':
    X_train, X_test, y_train, y_test = load_and_proc_data()
    model = model_train(X_train, y_train)
    model_evaluate(model, X_test, y_test)

3.2 LeNet on the CIFAR-10 image data set

For details, refer to:

Introduction to LeNet Network

3.3 VGG16 network and transfer learning

For details, refer to:

VGG network and middle layer feature extraction

Origin blog.csdn.net/MusicDancing/article/details/130173874