Machine Learning - Deep Neural Network Practice (FCN, CNN, BP)


Series Article Directory

This series of blogs focuses on the concepts, principles and code practices of machine learning, and does not include tedious mathematical derivations (if you have any questions, please discuss and point them out in the comment area, or contact me directly by private message).

    The point is for everyone to understand the principles and the process; the code for reproducing the results can simply be copied (although I don't really understand deep learning myself)!!!

Chapter 1 Machine Learning - PCA (Principal Component Analysis) and Face Recognition

Chapter 2 Machine Learning - LDA (Linear Discriminant Analysis) and Face Recognition

Chapter 3 Machine Learning - LR (Linear Regression), LRC (Linear Regression Classification) and Face Recognition

Chapter 4 Machine Learning - SVM (Support Vector Machine) and Face Recognition

Chapter 5 Machine Learning - K-means (Clustering) and Face Recognition

Chapter 6 Machine Learning - Deep Neural Networks in Practice


Synopsis

This blog introduces the concepts and principles of deep neural network algorithms (FCN, CNN, BP, etc.), along with related analysis and derivations. It compares the similarities and differences between FCN and CNN and derives the iterative update formulas of the forward and backward propagation algorithms. It then applies deep neural networks to practical problems, taking face recognition, handwriting recognition, and image classification as examples. Finally, in view of the shortcomings of classic neural networks, it designs a new deep neural network approach optimized from several angles (datasets and Python code are attached).


1. Similarities and differences between CNN (Convolutional Neural Network) and FCN (Fully Connected Network)

1. Similarities

① Similar structure: Both networks are organized as layers of nodes. Each node is a neuron, and nodes in adjacent layers are connected by edges (only some of them in a CNN, all of them in a fully connected network).

② Similar process: The inputs, outputs, and training process of a convolutional neural network are basically the same as those of a fully connected neural network. Take image classification as an example. The input layer of a convolutional neural network receives the raw pixels of the image, and each node in the output layer represents the confidence for a different category. This matches the input and output of a fully connected neural network. Likewise, the loss function and the parameter optimization process of a fully connected neural network also apply to a convolutional neural network.

2. Difference

The main difference between the two is how adjacent layers are connected. In a fully connected neural network, every node in one layer is connected to every node in the next layer. In a convolutional neural network, only some of the nodes in adjacent layers are connected. Because of this, a fully connected neural network does not handle image data well: as the image size and the number of layers grow, the number of parameters explodes, computation slows down, and overfitting becomes likely. Convolutional neural networks overcome this shortcoming. The differences between the two and the flow of each algorithm are roughly as follows:
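The parameter explosion can be made concrete with a little arithmetic. The sketch below counts weights and biases for one fully connected layer and one convolutional layer, both applied to a 57×47 grayscale image (the ORL face size used later in this post). The layer sizes (100 hidden units; 6 filters of 5×5) are assumptions chosen only for illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights + biases of a fully connected layer: every input connects to every unit."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    """Weights + biases of a conv layer: each filter only sees a local window,
    and the same weights are shared across all spatial positions."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


height, width = 57, 47                      # one grayscale ORL face image
fc = dense_params(height * width, 100)      # flatten 2679 pixels -> 100 hidden units
conv = conv2d_params(5, 5, 1, 6)            # 6 filters of size 5x5 on 1 channel
print("fully connected:", fc)    # 2679 * 100 + 100 = 268000
print("convolutional:  ", conv)  # 5 * 5 * 1 * 6 + 6 = 156
```

The convolutional layer's parameter count does not depend on the image size at all, while the fully connected layer grows linearly with the number of pixels.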

2. Optimization (iterative update) formulas for forward and backward propagation in neural networks

BP graph model:

A single activation unit in the network:

Tips: The figure above defines an activation unit in a hidden layer, which includes a bias term b; the related operations are shown in the figure. The superscript on each symbol indicates which layer of the network the unit belongs to. In code implementations, the weights between activation units are generally stored with the unit in the previous layer.
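The figure itself is not reproduced here, so a standard way to write such a unit is sketched below. It assumes a sigmoid activation and uses $w_{ji}^{(l)}$ for the weight from unit $i$ in layer $l-1$ to unit $j$ in layer $l$. This notation is an assumption and may differ from the figure's.

```latex
z_j^{(l)} = \sum_i w_{ji}^{(l)} \, a_i^{(l-1)} + b_j^{(l)},
\qquad
a_j^{(l)} = \sigma\!\left(z_j^{(l)}\right) = \frac{1}{1 + e^{-z_j^{(l)}}}
```

Here $a^{(0)}$ is the input vector. Evaluating this unit for every $j$, layer by layer, gives the forward pass.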

Loss function definition:

The detailed derivation of the forward and backward propagation update formulas can be found in:

Derivation and Implementation of Forward and Backward Propagation of Neural Network - liangxinGao's Blog - CSDN Blog

Explain the forward propagation and backpropagation of neural network in detail (derivation from scratch) - Maples,'s Blog - CSDN Blog
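As a compact reference, consider one hidden layer with squared-error loss $E = \tfrac{1}{2}\lVert y - a^{(2)}\rVert^2$ and sigmoid activations, the same choices as the BP code in the handwriting recognition example below. In row-vector form, the backward pass is:

- $\delta^{(2)} = (a^{(2)} - y) \odot a^{(2)} \odot (1 - a^{(2)})$
- $\delta^{(1)} = (\delta^{(2)} W^\top) \odot a^{(1)} \odot (1 - a^{(1)})$

The gradients are $\partial E/\partial W = (a^{(1)})^\top \delta^{(2)}$ and $\partial E/\partial V = x^\top \delta^{(1)}$. The update is $W \leftarrow W - \eta\,\partial E/\partial W$.

The NumPy sketch below implements these formulas without biases, like that BP code, and checks them against finite differences. The tiny network sizes are arbitrary. The BP code writes the error as (y[i] - L2) and updates with +=, which is the same step with the sign folded in.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, V, W):
    """Row-vector convention: x is (1, n_in), V is (n_in, n_hidden), W is (n_hidden, n_out)."""
    a1 = sigmoid(x @ V)   # hidden-layer activations
    a2 = sigmoid(a1 @ W)  # output-layer activations
    return a1, a2


def loss(x, y, V, W):
    """Squared-error loss E = 1/2 * sum((y - a2)^2)."""
    _, a2 = forward(x, V, W)
    return 0.5 * np.sum((y - a2) ** 2)


def backprop(x, y, V, W):
    """Analytic gradients (dE/dV, dE/dW) from the backward-pass formulas."""
    a1, a2 = forward(x, V, W)
    delta2 = (a2 - y) * a2 * (1 - a2)        # output-layer error term
    delta1 = (delta2 @ W.T) * a1 * (1 - a1)  # error propagated back to the hidden layer
    return x.T @ delta1, a1.T @ delta2


def numerical_grad(f, P, eps=1e-6):
    """Central finite differences of scalar f() w.r.t. array P (perturbed in place, then restored)."""
    grad = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        plus = f()
        P[idx] = old - eps
        minus = f()
        P[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))
y = np.array([[0.0, 1.0, 0.0]])
V = rng.normal(size=(4, 5))
W = rng.normal(size=(5, 3))

dV, dW = backprop(x, y, V, W)
dV_num = numerical_grad(lambda: loss(x, y, V, W), V)
dW_num = numerical_grad(lambda: loss(x, y, V, W), W)
print("max gradient error:", np.max(np.abs(dV - dV_num)), np.max(np.abs(dW - dW_num)))

# One gradient-descent step: W <- W - lr * dE/dW, and likewise for V
lr = 0.05
before = loss(x, y, V, W)
V -= lr * dV
W -= lr * dW
print("loss before / after one step:", before, loss(x, y, V, W))
```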

3. Application of Deep Neural Network Algorithms

1. Face recognition

In PyCharm, a CNN-based deep neural network model is built on the ORL dataset and used for face recognition. The code is as follows:

1.1 Data import

# Save this script as FaceData.py: the model script in Section 1.2 imports it under that name.
import numpy
from PIL import Image
from keras.utils import to_categorical


# 400 samples: 40 people with 10 images each. Each image is 57 pixels high and 47 wide (2679 pixels).
# Every pixel value is normalized to [0, 1).
def load_data(dataset_path):
    """
    Load the image data; dataset_path is the path to the olivettifaces image.
    After loading olivettifaces, split it into three datasets: train_data, valid_data and test_data.
    Return train_data, valid_data, test_data and the corresponding labels.
    """
    img = Image.open(dataset_path).convert('L')  # make sure pixel values are 8-bit grayscale intensities
    img_ndarray = numpy.asarray(img, dtype='float64') / 256
    print(img_ndarray.shape)
    # The image is a 20 x 20 grid of faces; cut out each 57 x 47 tile
    faces = numpy.empty((400, 57, 47))
    for row in range(20):
        for column in range(20):
            faces[row * 20 + column] = img_ndarray[row * 57:(row + 1) * 57, column * 47:(column + 1) * 47]
    # Labels for the 400 images: images 10*i to 10*i+9 belong to person i
    label = numpy.empty(400)
    for i in range(40):
        label[i * 10:i * 10 + 10] = i
    label = label.astype(int)  # numpy.int no longer exists in recent NumPy; use the builtin int
    label = to_categorical(label, 40)  # convert the 40 class indices to one-hot encoding

    # Split into training, validation and test sets of the following sizes
    train_data = numpy.empty((320, 57, 47))  # 320 training samples
    train_label = numpy.empty((320, 40))  # 320 training samples, 40 output probabilities each
    valid_data = numpy.empty((40, 57, 47))  # 40 validation samples
    valid_label = numpy.empty((40, 40))  # 40 validation samples, 40 output probabilities each
    test_data = numpy.empty((40, 57, 47))  # 40 test samples
    test_label = numpy.empty((40, 40))  # 40 test samples, 40 output probabilities each

    # For each person: images 0-7 for training, image 8 for validation, image 9 for testing
    for i in range(40):
        train_data[i * 8:i * 8 + 8] = faces[i * 10:i * 10 + 8]
        train_label[i * 8:i * 8 + 8] = label[i * 10:i * 10 + 8]
        valid_data[i] = faces[i * 10 + 8]
        valid_label[i] = label[i * 10 + 8]
        test_data[i] = faces[i * 10 + 9]
        test_label[i] = label[i * 10 + 9]

    return [(train_data, train_label), (valid_data, valid_label), (test_data, test_label)]


if __name__ == '__main__':
    [(train_data, train_label), (valid_data, valid_label), (test_data, test_label)] = load_data('olivettifaces.gif')
    oneimg = train_data[0] * 256
    print(oneimg)
    im = Image.fromarray(oneimg.astype(numpy.uint8))  # PIL cannot build an image from a float64 array
    im.show()

1.2 Model construction and use

import numpy as np

np.random.seed(1337)  # for reproducibility
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.layers import Conv2D, AveragePooling2D
from PIL import Image
import FaceData  # the data-loading script from Section 1.1, saved as FaceData.py

# Global settings
batch_size = 128  # number of samples per batch
nb_classes = 40  # number of classes (people)
epochs = 23000  # number of training epochs
img_rows, img_cols = 57, 47  # height and width of each input image
pool_size = (2, 2)  # pooling window size
kernel_size = (5, 5)  # convolution kernel size
input_shape = (img_rows, img_cols, 1)  # input image shape: (height, width, channels)

[(X_train, Y_train), (X_valid, Y_valid), (X_test, Y_test)] = FaceData.load_data('olivettifaces.gif')

# Add a channel axis so each dataset is 4-D: (samples, height, width, channels)
X_train = X_train[:, :, :, np.newaxis]
X_valid = X_valid[:, :, :, np.newaxis]
X_test = X_test[:, :, :, np.newaxis]

print('Training set shape:', X_train.shape, Y_train.shape)
print('Test set shape:', X_test.shape, Y_test.shape)

# Build the model
model = Sequential()
model.add(Conv2D(6, kernel_size, input_shape=input_shape, strides=1))  # convolutional layer 1
model.add(AveragePooling2D(pool_size=pool_size, strides=2))  # pooling layer 1
model.add(Conv2D(12, kernel_size, strides=1))  # convolutional layer 2
model.add(AveragePooling2D(pool_size=pool_size, strides=2))  # pooling layer 2
model.add(Flatten())  # flatten to a 1-D vector
model.add(Dense(nb_classes))  # fully connected output layer
model.add(Activation('sigmoid'))  # sigmoid score per class (softmax is the more common pairing with categorical_crossentropy)

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
# Train the model
model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(X_valid, Y_valid))
# Evaluate on the test set
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

y_pred = model.predict(X_test)
y_pred = y_pred.argmax(axis=1)  # predicted class = column with the highest score in each row
for i in range(len(y_pred)):
    oneimg = X_test[i, :, :, 0] * 256
    im = Image.fromarray(oneimg.astype(np.uint8))  # PIL needs uint8 here, not float64
    # im.show()
    print('Person %d recognized as person %d' % (i, y_pred[i]))
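The model uses no padding, so it helps to check by hand what shape reaches the Flatten layer. The sketch below assumes Keras's default 'valid' padding for Conv2D and AveragePooling2D. It traces the feature-map sizes through the model above and totals the trainable parameters, which should match what model.summary() reports for it.

```python
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Output length along one axis of a 'valid' convolution or pooling window."""
    return (size - kernel) // stride + 1


h, w, channels = 57, 47, 1   # input image: height, width, channels
params = 0
for filters in (6, 12):      # the two Conv2D layers of the model
    params += 5 * 5 * channels * filters + filters      # 5x5 conv weights + biases
    h, w, channels = conv_out(h, 5), conv_out(w, 5), filters
    h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)         # 2x2 average pooling, stride 2 (no parameters)
flat = h * w * channels                                  # length of the Flatten output
params += flat * 40 + 40                                 # Dense(40) weights + biases
print(h, w, channels, flat, params)  # 11 8 12 1056 44248
```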

1.3 Results and Analysis

In PyCharm, the CNN model was trained on the ORL dataset for face recognition. The loss and the face recognition rate were recorded as functions of the number of training epochs. The results are shown in the following chart:

Analysis: On the ORL dataset, the CNN model improves steadily as the number of epochs increases. The loss keeps decreasing and the face recognition rate keeps rising, reaching 99.37% after 20,000 epochs, which is better than most algorithms.

2. Handwriting recognition

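In PyCharm, a BP neural network model is built for handwritten digit recognition. Note that the code below loads scikit-learn's load_digits dataset of 8×8 digit images, an MNIST-like dataset, rather than the full 28×28 MNIST. The code is as follows: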

2.1 Handwritten digit recognition code based on BP neural network

import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt

# Load the data
digits = load_digits()
# Show the first two images
for i in range(min(digits.images.shape[0], 2)):
    plt.imshow(digits.images[i], cmap='gray')
    plt.show()

# Data: pixel values range from 0 to 16; scale them to [0, 1] so the sigmoid units do not saturate
X = digits.data.astype(float)
X -= X.min()
X /= X.max()
# Labels
y = digits.target

# Define a neural network with structure 64-100-10
# Weight matrix between the input layer and the hidden layer, initialized in [-1, 1)
V = np.random.random((64, 100)) * 2 - 1
# Weight matrix between the hidden layer and the output layer
W = np.random.random((100, 10)) * 2 - 1

# Split the data: 1/4 test set, 3/4 training set
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Binarize the labels (one-hot encoding)
# 0 -> 1000000000
# 3 -> 0001000000
# 9 -> 0000000001
labels_train = LabelBinarizer().fit_transform(y_train)


# Activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))


# Derivative of the sigmoid written in terms of its output: if s = sigmoid(z), ds/dz = s * (1 - s)
def dsigmoid(x):
    return x * (1 - x)


# Train the model with stochastic (one sample per step) gradient descent
def train(X, y, steps=10000, lr=0.011):
    global V, W
    for n in range(steps + 1):
        # Pick one random training sample
        i = np.random.randint(X.shape[0])
        x = np.atleast_2d(X[i])
        # BP algorithm
        # Output of the hidden layer
        L1 = sigmoid(np.dot(x, V))
        # Output of the output layer
        L2 = sigmoid(np.dot(L1, W))
        # Error terms of the output layer (L2_delta) and the hidden layer (L1_delta)
        L2_delta = (y[i] - L2) * dsigmoid(L2)
        L1_delta = L2_delta.dot(W.T) * dsigmoid(L1)
        # Update the weights
        W += lr * L1.T.dot(L2_delta)
        V += lr * x.T.dot(L1_delta)

        # Every 1000 steps, report test accuracy and the size of the latest gradient
        if n % 1000 == 0:
            output = predict(X_test)
            predictions = np.argmax(output, axis=1)
            acc = np.mean(np.equal(predictions, y_test))
            dW = L1.T.dot(L2_delta)
            dV = x.T.dot(L1_delta)
            gradient = np.sum([np.sqrt(np.sum(np.square(j))) for j in [dW, dV]])
            print('steps', n, 'accuracy', acc, 'gradient', gradient)
            # print(classification_report(y_test, predictions))


def predict(x):
    # Output of the hidden layer
    L1 = sigmoid(np.dot(x, V))
    # Output of the output layer
    L2 = sigmoid(np.dot(L1, W))
    return L2


# Start training: first with a larger learning rate, then with a smaller one
train(X_train, labels_train, 20000, lr=0.11)
train(X_train, labels_train, 20000, lr=0.011)
# Evaluate the trained model
output = predict(X_test)
predictions = np.argmax(output, axis=1)
acc = np.mean(np.equal(predictions, y_test))
print('accuracy', acc)
print(classification_report(y_test, predictions, digits=4))  # argument order is (y_true, y_pred)

2.2 Results and analysis

In PyCharm, the BP neural network was trained for handwritten digit recognition, and the recognition rate was recorded against the number of training iterations. The results are shown in the following chart:

Analysis: On the handwritten digit dataset, the BP network improves as the number of iterations increases. The recognition rate keeps rising and stabilizes at about 96% after 10,000 iterations.

3. Image Classification

This part uses the CPU version of MindSpore to train a cat/dog classification model by fine-tuning (a Huawei Cloud experiment). The overall experimental process is as follows:

3.1 Model construction process and results

The process and results of model building are shown in the figure:

3.2 Initial pre-training classification

The initial pre-training classification results are as follows:

Analysis: The context_device_init function initializes the training environment. The script then defines the network structure, reads the CKPT parameters, and configures the network hyperparameters. At this stage, however, predictions are made directly after loading the pretrained backbone parameters, without any training on the cat/dog data, and the results are displayed. The results show that the prediction accuracy is insufficient.

3.3 Fine tune training

After fine-tuning, the results are as follows:

Analysis: After the pre-training step, the optimizer, learning rate, and loss function are defined. Once the actual fine-tuning starts, several .npy files are generated in the original dataset folder. These files mainly store the intermediate variables and labels produced during training. The network then completes local training and successfully exports a MindIR model. As the figure above shows, accuracy increases after fine-tuning, the images are classified correctly, and the model is built successfully.
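The .npy files described above are consistent with a common fine-tuning pattern:

- the pretrained backbone is kept frozen;
- its features are computed once and cached;
- only a new classification head is trained on those features.

The NumPy sketch below illustrates that pattern under strong simplifying assumptions. It is not the MindSpore code used in the experiment. A fixed random projection followed by ReLU stands in for the pretrained backbone, the two-class "cat/dog" data are synthetic, and the head is plain logistic regression.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for a pretrained backbone: a fixed (frozen) projection + ReLU.
backbone_W = rng.normal(size=(20, 64))
frozen_copy = backbone_W.copy()


def backbone(x):
    return np.maximum(x @ backbone_W, 0.0)


# Synthetic two-class data standing in for cat (0) / dog (1) images.
n = 400
labels = rng.integers(0, 2, size=n)
inputs = rng.normal(size=(n, 20)) + labels[:, None] * 1.5

# Step 1: run the frozen backbone once and cache the features
# (analogous to the cached .npy files; kept in memory here).
features = backbone(inputs)
features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

# Step 2: train only a new logistic-regression head on the cached features.
head_w = np.zeros(features.shape[1])
head_b = 0.0
lr = 0.1
for epoch in range(200):
    p = 1.0 / (1.0 + np.exp(-(features @ head_w + head_b)))  # predicted P(dog)
    grad = p - labels                                       # d(cross-entropy)/d(logit)
    head_w -= lr * features.T @ grad / n
    head_b -= lr * grad.mean()

pred = (features @ head_w + head_b) > 0
accuracy = np.mean(pred == labels)
print("head accuracy:", accuracy)
print("backbone unchanged:", np.array_equal(backbone_W, frozen_copy))
```

Caching the features means the expensive backbone runs only once per image, which is why this style of fine-tuning is practical on a CPU.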

4. Design of innovative neural network algorithm

① Shortcomings of the classic CNN: prone to overfitting, poor performance on supervised problems, poor feature understanding, and poor interpretability

② Possible directions for optimization: data processing, convolution design, architecture design, activation function selection, optimizer selection, etc.

③ Optimization metrics: model accuracy, training speed, and memory consumption

④ Brief description of the proposed neural network design process:

(1) Data cleaning and augmentation: If the data are not selected and cleaned properly, the results will often overfit. Use data augmentation methods suited to the application scenario to improve data quality. For skewed or imbalanced data, you can use oversampling, undersampling, SMOTE, ensemble learning, class weights, or a different learning method. Alternatively, balance the classes directly when loading batches for training.
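Two of the remedies listed above, oversampling and class weights, can be sketched in a few lines of NumPy. This is plain random oversampling, not SMOTE, which synthesizes new samples by interpolating between neighbors. The weight formula is the "balanced" heuristic that scikit-learn uses for class_weight='balanced'. The toy data are made up for illustration.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (random oversampling, not SMOTE)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            idx = rng.choice(np.flatnonzero(y == cls), size=target - count, replace=True)
            X_parts.append(X[idx])
            y_parts.append(y[idx])
    return np.concatenate(X_parts), np.concatenate(y_parts)


def balanced_class_weights(y):
    """Weight n_samples / (n_classes * n_c) per class: rarer classes count more in the loss."""
    classes, counts = np.unique(y, return_counts=True)
    return {int(c): len(y) / (len(classes) * n) for c, n in zip(classes, counts)}


X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)         # 8 vs 2: imbalanced
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))               # [8 8]
print(balanced_class_weights(y))        # {0: 0.625, 1: 2.5}
```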

(2) Efficient convolution design: Use convolution designs that speed up the CNN and reduce memory consumption without losing too much accuracy. Examples include MobileNets (depthwise separable convolution), XNOR-Net (binary convolution), ShuffleNet (pointwise group convolution and channel shuffle), and network pruning (removing some of the CNN's weights).
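For MobileNets-style depthwise separable convolution, the savings can be checked by counting weights. The sketch below assumes a 3×3 layer mapping 64 input channels to 128 output channels (illustrative numbers) and ignores biases.

```python
def standard_conv_params(k, c_in, c_out):
    """k x k standard convolution: every filter spans all input channels (biases omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 2))  # 73728 8768 8.41
```

In general, the ratio of separable to standard cost is 1/c_out + 1/k², so for 3×3 kernels the saving approaches 9x as the number of output channels grows.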

(3) Convolution kernel choice (optional): Generally, larger convolution kernels give higher accuracy, but training is slower, memory consumption grows, and overly large kernels can make the network generalize very poorly. Dilated convolution enlarges the receptive field without enlarging the kernel; an example is shown below:

Tips: Dilated convolution inserts gaps between the weights of the convolution kernel. This lets the network expand its receptive field without increasing the number of parameters, so memory consumption does not increase. This approach has been shown to improve network accuracy at a small cost in speed.
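The effect of these gaps is easiest to see in one dimension. In the sketch below, the same 3-weight kernel is applied with dilation 1, 2 and 3. A kernel of size k with dilation d covers k + (k - 1)(d - 1) inputs. In 2-D the same rule applies per axis; Keras exposes it as the dilation_rate argument of Conv2D. The input and kernel values are arbitrary.

```python
import numpy as np


def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation, as in CNN layers).
    Taps are spaced `dilation` apart, so k weights cover k + (k - 1) * (dilation - 1) inputs."""
    k = len(kernel)
    span = k + (k - 1) * (dilation - 1)
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = np.dot(x[i:i + span:dilation], kernel)
    return out


x = np.arange(10, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])   # 3 weights in every case
for d in (1, 2, 3):
    y = dilated_conv1d(x, kernel, d)
    print(d, len(y), y[0])            # 1 8 -2.0 / 2 6 -4.0 / 3 4 -6.0
```

The kernel always has 3 parameters, yet with d = 3 each output compares inputs 6 positions apart.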

(4) Reasonable expansion of network size (width/depth): Because GPUs process data in parallel, increasing width is more GPU-friendly than increasing depth, and many studies have shown that wider networks are easier to train than deeper ones. However, the benefits of both width and depth show diminishing returns. The wider each layer already is, the smaller the improvement from widening it further, so the network size should be chosen carefully.

(5) Activation function selection: From engineering experience, ReLU usually gives good results right away. If ReLU does not work well, try replacing it with the Sigmoid function. If that still does not help, adjust other parts of the model to try to improve accuracy. If the results are still poor, try activation functions such as ELU, PReLU, Sigmoid, or LeakyReLU.
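For reference, the activation functions named above differ mainly in how they treat negative inputs. The NumPy definitions below use commonly chosen constants (alpha = 0.01 for Leaky ReLU, alpha = 1.0 for ELU). These constants are assumptions; frameworks let you change them, and in PReLU alpha is learned during training.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def leaky_relu(x, alpha=0.01):
    """Small fixed slope alpha for negative inputs instead of zero."""
    return np.where(x > 0, x, alpha * x)


def prelu(x, alpha):
    """Same form as Leaky ReLU, but alpha is a trainable parameter in a real network."""
    return np.where(x > 0, x, alpha * x)


def elu(x, alpha=1.0):
    """Smooth negative part alpha * (exp(x) - 1), which saturates at -alpha."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


x = np.array([-2.0, -0.5, 0.0, 1.5])
for name, f in [("ReLU", relu), ("LeakyReLU", leaky_relu), ("ELU", elu), ("Sigmoid", sigmoid)]:
    print(name, np.round(f(x), 3))
print("PReLU(alpha=0.25)", prelu(x, 0.25))
```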

(6) Optimizer selection: For a simple CNN applied to image classification, the training speed-up achieved with different optimizers is as follows:

Analysis: Choose the optimizer carefully, and once it gives good results, tune the model's other hyperparameters. Do not set the learning rate too high. Optimizers can even be combined: use a fast optimizer with a low learning rate early in training, then switch to a slower optimizer for the second half.
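The "fast optimizer first, slower optimizer later" idea can be sketched on a toy problem. The code below is a hypothetical illustration, not a recipe from the source. It runs Adam (written out by hand) for the first half of training and plain SGD for the second half on a least-squares problem. Both learning rates (0.05) and the switch point are arbitrary choices.

```python
import numpy as np

# Hypothetical toy problem: least-squares regression, loss = mean((A w - b)^2).
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
b = A @ w_true


def loss_and_grad(w):
    r = A @ w - b
    return np.mean(r ** 2), 2.0 * A.T @ r / len(b)


w = np.zeros(5)
m = np.zeros(5)                       # Adam first-moment estimate
v = np.zeros(5)                       # Adam second-moment estimate
beta1, beta2, eps = 0.9, 0.999, 1e-8
steps, switch_at = 400, 200
history = []
for t in range(1, steps + 1):
    loss, g = loss_and_grad(w)
    history.append(loss)
    if t <= switch_at:
        # Phase 1: Adam, adaptive per-parameter steps with a modest learning rate
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        w -= 0.05 * m_hat / (np.sqrt(v_hat) + eps)
    else:
        # Phase 2: plain SGD, non-adaptive steps to settle into the minimum
        w -= 0.05 * g

print("loss at start / switch / end:", history[0], history[switch_at - 1], history[-1])
```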

5. Others

1. Datasets and resources

Datasets used in this experiment: ORL5646, MNIST, and a cat/dog dataset.

Commonly used face datasets are available here (don't just freeload, hahaha):

Link: https://pan.baidu.com/s/12Le0mKEquGMgh5fhNagZGw 
Extraction code: yrnb

Deep Neural Network Application Complete Code and Required Dataset: Li Yiru/Yiru's Machine Learning - Gitee.com

2. References

1. The main difference between fully convolutional neural network (FCN) and convolutional neural network (CNN) - Occasionally lying flat salted fish's blog - CSDN

2. Derivation and implementation of forward and backward propagation of neural network - liangxinGao's blog - CSDN

3. keras/Face_Recognition at master · data-infra/keras · GitHub

4. Neural network realizes handwritten font recognition - myourdream2's blog - CSDN

5. "Introduction to Artificial Intelligence" Deep Learning Experiment Manual

6. CNN (Convolutional Neural Network) - Optimization Guide - Zhihu (zhihu.com)


Summary

Deep neural networks are a popular and important research area within deep learning. Most of these algorithms (CNN, BP, etc.) outperform many classical algorithms on practical machine learning and pattern recognition tasks. Deep neural networks also handle big-data tasks well and have great development potential. At the same time, they still have many shortcomings. The first is model selection and construction; the second is problems common to deep learning methods, namely poor interpretability and limited robustness. These greatly affect experimental results, so making good use of deep neural networks, choosing models, and optimizing algorithms are very important questions in deep learning.


Origin blog.csdn.net/weixin_51426083/article/details/125255519