Artificial Intelligence - Experiment 4

1. Experimental purpose

Understand the basic principles of deep learning, learn to use open-source deep learning tools, and learn to apply deep learning algorithms to solve real-world problems.

2. Experimental principles

1. Overview of Deep Learning

Deep learning originates from artificial neural networks. Its essence is constructing an artificial neural network with multiple hidden layers that uses convolution, pooling, error backpropagation, and other mechanisms to learn features and improve the accuracy of classification or prediction. By increasing the depth of the network, reducing the number of features each layer has to fit, and extracting information layer by layer from bottom to top, deep learning achieves better prediction and classification performance. The emphasis is on the depth of the model structure, usually more than five hidden layers.

The neuron is the basic unit of a deep learning model. It computes a weighted sum of the inputs from the neurons of the preceding layer, adds a bias, applies a nonlinear transformation to the result, and outputs the final value. Commonly used nonlinear activation functions include Sigmoid, Tanh, ReLU, and Softmax.
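As a minimal illustration (not from the original experiment), the computation of a single neuron, the weighted sum of its inputs plus a bias followed by a nonlinear activation, can be sketched in NumPy as follows; all values are made up:

    import numpy as np

    def neuron(inputs, weights, bias):
        # Weighted accumulation of the inputs plus a bias, then a ReLU non-linear transformation
        z = np.dot(weights, inputs) + bias
        return np.maximum(0.0, z)

    # hypothetical input and parameter values, purely for illustration
    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.1, 0.4, -0.2])
    print(neuron(x, w, bias=0.3))   # ReLU(0.05 - 0.4 - 0.4 + 0.3) = 0.0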


A deep neural network contains multiple layers, each with some neurons, and the connection relationships between neurons in different layers constitute different neural networks. Each neuron in each layer has its own weights and bias, which are the parameters of the network. The parameters are adjusted through training, producing a network with high classification and prediction accuracy. Parameter optimization algorithms involve gradient descent, error backpropagation, etc.

Gradient descent

The gradient descent algorithm is a method for minimizing the loss function. In a multivariate function, the gradient is the vector of partial derivatives with respect to each variable, and the direction opposite to the gradient is the direction in which the function value decreases fastest. Gradient descent therefore tunes the parameters by finding and following the direction in which the loss function decreases fastest.
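As a sketch of the idea (the one-parameter loss below is invented for illustration and is not part of the experiment), one gradient descent step simply moves the parameter against the gradient:

    def gradient_descent_step(w, grad, lr=0.1):
        # Step in the direction opposite to the gradient to decrease the loss
        return w - lr * grad

    # hypothetical loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
    w = 0.0
    for _ in range(100):
        grad = 2 * (w - 3)
        w = gradient_descent_step(w, grad)
    print(w)   # converges towards the minimum at w = 3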

Error backpropagation

Error backpropagation is the process in which the gradient of the loss function with respect to the parameters flows backward through the network.


To reduce the loss function, take the partial derivative of the loss with respect to the parameter w1, choose an increment in the direction opposite to the gradient of the loss, and adjust the weight w1 so that the value of the loss function decreases. The parameter w1 does not appear directly in the loss function, so its partial derivative must be obtained by the chain rule:
$$\frac{dL}{dw_1} = \frac{dL}{dO}\,\frac{dO}{dX}\,\frac{dX}{dw_1}$$
In order, the factors are the partial derivative of the loss with respect to the output (determined by the definition of the loss function), the derivative of the activation function, and the derivative of the weighted accumulation function (for X = w1*out1 + w2*out2 + w3*out3, the derivative with respect to w1 is out1). The product of the three is the partial derivative of the loss with respect to the parameter. Each neuron updates its parameters accordingly, and in this process the error propagates layer by layer from the output back to the input. When the error propagates along multiple paths, a neuron receives errors from several directions and combines them when adjusting its parameters.
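The three factors of the chain rule can be checked numerically with a small sketch; the sigmoid activation, squared-error loss, and all numbers below are assumptions chosen only to make the computation concrete:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # hypothetical forward pass: X = w1*out1 + w2*out2 + w3*out3, O = sigmoid(X), L = (O - t)^2
    outs = np.array([0.2, 0.5, 0.1])    # outputs of the previous layer
    w = np.array([0.4, -0.3, 0.8])      # weights of this neuron
    t = 1.0                             # target value
    X = np.dot(w, outs)
    O = sigmoid(X)

    dL_dO = 2 * (O - t)                 # partial derivative of the loss w.r.t. the output
    dO_dX = O * (1 - O)                 # derivative of the sigmoid activation
    dX_dw1 = outs[0]                    # derivative of the weighted sum w.r.t. w1 is out1
    dL_dw1 = dL_dO * dO_dX * dX_dw1     # chain rule: product of the three factors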


2. Convolutional Neural Network CNN

The convolutional neural network (CNN) is a type of artificial neural network often used for image recognition. It is a multi-layer perceptron designed to recognize two-dimensional shapes and is largely invariant to translation, scaling, rotation, and other forms of transformation.

The core idea of the convolutional neural network is to combine three structures, local perception, weight sharing, and downsampling, to learn features while reducing the dimensionality of the image. Convolutional neural networks usually contain multiple convolution and pooling hidden layers for feature extraction and dimensionality reduction.

Convolution

Convolution scans the image from left to right and top to bottom with a weighted sliding window called the convolution kernel, taking the element-wise product-sum of the window and the part of the image it covers at each position. The result of this computation is called a feature map, which reflects how similar each kernel-sized patch of the image is to the kernel. Convolution exploits the facts that a feature occupies only a small part of an image and that the same feature may appear at different locations in different images. The values of the convolution kernel are learned when the model is trained.
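The sliding-window product-sum can be sketched in plain NumPy as below (image and kernel values are made up; Keras of course implements this far more efficiently):

    import numpy as np

    def conv2d_valid(image, kernel):
        # Slide the kernel left-to-right, top-to-bottom and take the element-wise product-sum
        kh, kw = kernel.shape
        out_h = image.shape[0] - kh + 1
        out_w = image.shape[1] - kw + 1
        feature_map = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                feature_map[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
        return feature_map

    image = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical 4x4 grayscale image
    kernel = np.array([[1., 0.], [0., -1.]])           # hypothetical 2x2 convolution kernel
    print(conv2d_valid(image, kernel))                 # 3x3 feature map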

Pooling

The purpose of pooling is to reduce the feature dimension by aggregating statistics on features at different locations. There are many methods of aggregation, including average pooling and max pooling.
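A small sketch of max and average pooling over non-overlapping 2x2 windows (the feature-map values are invented for illustration):

    import numpy as np

    def pool2d(feature_map, size=2, mode="max"):
        # Aggregate each non-overlapping size x size window by its maximum or its average
        h, w = feature_map.shape
        out = np.zeros((h // size, w // size))
        for i in range(0, h - h % size, size):
            for j in range(0, w - w % size, size):
                window = feature_map[i:i+size, j:j+size]
                out[i // size, j // size] = window.max() if mode == "max" else window.mean()
        return out

    fm = np.array([[1., 3., 2., 0.],
                   [4., 6., 1., 2.],
                   [0., 2., 5., 7.],
                   [1., 1., 3., 4.]])
    print(pool2d(fm, mode="max"))   # [[6. 2.] [2. 7.]]
    print(pool2d(fm, mode="avg"))   # [[3.5 1.25] [1. 4.75]]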


Fully connected layer

The convolutional neural network uses local connections between adjacent layers, simulating the way nerve cells respond only to local regions. After convolution and pooling, however, the final result is usually produced by a fully connected layer. Fully connected means that every node in layer n-1 is connected to every node in layer n.


Convolutional neural network

By combining convolutional layers, pooling layers, and a fully connected network, a complete convolutional neural network is obtained. There are usually many convolution and pooling layers that extract features from low level to high level, and classification is finally performed by a fully connected neural network.


3. Recurrent Neural Network RNN

Recurrent neural networks are commonly used in natural language processing. Language has contextual dependencies, so the neural network must model temporal relationships. In an RNN, each neuron receives both the current input and the previous output and combines them to compute its output. Recurrent layers are usually used to extract features, and the target output is finally produced through fully connected layers.


3. Experimental content

Keras is a deep learning framework that encapsulates many high-level neural network modules, including fully connected layers, convolutional layers, etc. The contents in this experiment are all implemented based on Keras.

1. Keras builds CNN/RNN

Build CNN

Keras provides some simple interfaces to quickly build convolutional neural networks. Proceed as follows:

  • Instantiate a Sequential object (model)
  • Add layers to the model one by one. For CNN, add convolutional layers, pooling layers, fully connected layers, etc.
  • Compile the model, specify the loss function, parameter update method, etc.
  • Use the fit function to pass in the training set for training, setting the number of training epochs and the batch size.
  • Use the trained model to make predictions

An example of creating a model is as follows:

    model = keras.Sequential()
    model.add(Conv2D(32, kernel_size=3, activation='relu', input_shape=[IMAGE_HEIGHT, IMAGE_WIDTH, 3]))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Conv2D(32, kernel_size=3, activation='relu'))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(96, activation='relu'))
    model.add(Dense(2, activation='softmax'))

This model contains two convolutional layers, each followed by a pooling layer, and then a flattening layer. The flattening layer converts the multi-dimensional input into one dimension and serves as the transition from the convolutional layers to the fully connected layers. Finally, two fully connected layers are attached and the result is output.

The output of this model is a two-element vector. The model is used for a classification problem with two image categories, so there are two output values: the probabilities that the image belongs to each of the categories. The category with the higher probability is chosen as the classification result. The target output during training must also be converted to this form, called one-hot encoding: each categorical value is mapped to an integer, and the integer is represented as a binary vector before training.
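For illustration, this conversion can be done with keras.utils.to_categorical, the same helper used later in this experiment; the integer labels below are made up:

    import numpy as np
    import keras   # or: from tensorflow import keras

    labels = np.array([0, 1, 1, 0])   # hypothetical integer class labels for four samples
    onehot = keras.utils.to_categorical(labels, num_classes=2)
    print(onehot)
    # [[1. 0.]
    #  [0. 1.]
    #  [0. 1.]
    #  [1. 0.]]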


Build RNN

Similar to building a CNN, the process of building an RNN in Keras is also very simple:

model = Sequential()
# Apply word embedding to the input movie reviews; word embedding is generally needed for natural language processing problems
model.add(Embedding(1000, 64))
# Build an RNN layer with 40 neurons
model.add(SimpleRNN(40))
# Connect the RNN output to a fully connected layer with a single neuron
model.add(Dense(1, activation='sigmoid'))

There is an Embedding layer here. It implements a mapping from the semantic space to a vector space, converting each word into a vector of fixed dimension such that two words with similar meanings are converted into vectors with high similarity.
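As a hedged sketch (not shown in the original), the RNN above could be compiled and trained roughly like this, assuming X_train is a matrix of integer-encoded, equal-length reviews with word indices below the vocabulary size of 1000 and y_train holds 0/1 sentiment labels:

    # X_train / y_train are assumed inputs, not defined in the original snippet
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, batch_size=32, epochs=5, validation_split=0.1)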

2. Face recognition based on Keras

The data set used in the experiment consists of grayscale images with a resolution of 284*286, named BioID_xxxx.pgm, where xxxx is the index; the platform converts them to JPG format. There are 16 person categories, denoted by the letters A-V, and each person has 20-50 samples.

Read in data

To build an image classification model, the image information must be divided into two parts: the pixel information of the image itself and the category information of the image. Images read with OpenCV are in BGR order and need to be converted to RGB, and each image must be cropped to a uniform size.

Loading the images:

def load_pictures():
    pics = []
    labels = []
    for key, v in map_characters.items():
        # Collect all image files in this person's folder
        pictures = [k for k in glob.glob(imgsPath + "/" + v + "/*")]
        for pic in pictures:
            img = cv2.imread(pic)                            # OpenCV reads in BGR order
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)       # convert BGR to RGB
            img = cv2.resize(img, (img_width, img_height))   # resize to a uniform size
            pics.append(img)
            labels.append(key)                               # numeric category label
    return np.array(pics), np.array(labels)

Here map_characters stores the mapping from person categories to numeric values: A-V are mapped to 0-15 and used as the category labels of the images. The labels also need to be converted into one-hot encoding, which is done when dividing the training set and the validation set.

Split the data into a training set and a test set, with the test set proportion set to 0.15.

def get_dataset():
    X, Y = load_pictures()
    Y = keras.utils.to_categorical(Y, num_classes)   # convert labels to one-hot encoding
    # Split images and labels together so that each sample stays paired with its label
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15)
    return X_train, X_test, y_train, y_test

Build model

The CNN model constructed in the experiment has six convolutional layers with ReLU activations and one fully connected hidden layer. After every two convolutional layers there is a pooling layer to reduce the number of parameters, followed by a Dropout layer that discards part of the pooling layer's output to prevent overfitting. The final output layer outputs the probability that the image belongs to each category.

def create_model_six_conv(input_shape):
    # ********** Begin *********#
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
    # Feature extraction
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    # Feature extraction
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    # Feature extraction
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    # Flatten, then fully connected layers; the output layer produces the final result
    model.add(Flatten())                         # flatten layer reduces the output to one dimension
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    return model

Model training

Some parameters need to be set before model training starts. They mainly include the following:

  • Optimizer: the model is trained with the stochastic gradient descent (SGD) algorithm. Its relevant parameters are:
    • Learning rate (lr): the step size used when adjusting parameters in gradient descent
    • decay: a floating point number greater than or equal to 0, the learning rate decay applied after each update
    • nesterov: a Boolean value that determines whether to use Nesterov momentum. Momentum methods can speed up learning (speed up gradient descent), especially with high curvature, small but consistent gradients, or noisy gradients: they accumulate an exponentially decaying moving average of past gradients and keep moving in that direction. During testing it was found that without Nesterov momentum, a sufficiently accurate model could not be obtained within the limited training time.
    • momentum: a floating point number greater than or equal to 0, the momentum parameter
  • loss: the loss function. For classification problems the cross-entropy loss (categorical_crossentropy) can be used
  • metrics: performance indicators; accuracy is used
  • batch_size: the number of samples used in one training step
  • epochs: the number of training rounds

During training, the learning rate used for weight adjustment can decay as the number of training epochs increases: the learning rate is kept constant for a period of time and then reduced. This can be implemented with LearningRateScheduler from the callbacks module [1]:

import math

def lr_schedule(epoch):
    # Halve the initial learning rate of 0.01 every 10 epochs
    initial_lrate = 0.01
    drop = 0.5
    epochs_drop = 10.0
    lrate = initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
    return lrate

In addition, to ensure that a long training run can be saved partway through during testing, ModelCheckpoint from the callbacks module is used to store the model weights and keep the best model.

lr = 0.01
sgd = SGD(lr=lr, decay=0.0, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

def lr_schedule(epoch):
    lr = 0.01
    drop = 0.5
    epochs_drop = 10.0
    lrate = lr * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
    return lrate

batch_size = 32
epochs = 20

filepath = "model.h5"

history = model.fit(X_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(X_test, y_test),
    shuffle=True,                           # shuffle the data set
    verbose=0,
    callbacks=[LearningRateScheduler(lr_schedule),
        ModelCheckpoint(filepath, save_best_only=True)])

Validation and evaluation

Once the trained model is available, it can be tested. Reading the test data is similar to reading the training data: it contains the pixel information and label of each image, and the labels are converted into one-hot encoding.

imgsPath = "/opt/test/"
def load_test_set(path):
    pics, labels = [], []
    map_characters = {
    
    0: 'A', 1: 'C', 2: 'D',
        3: 'F', 4: 'G', 5: 'H', 6: 'I',
        7: 'J', 8: 'K', 9: 'L', 10:'M',
        11:'P', 12:'R', 13:'S', 14:'T', 15:'V'}
    num_classes = len(map_characters)
    img_width = 42
    img_height = 42
    map_characters = {
    
    v:k for k,v in map_characters.items()}
    for pic in glob.glob(path+'*.*'): 
        name = "".join(os.path.basename(pic).split('_')[0]) 
        if name in map_characters:  
            img = cv2.imread(pic)  
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  
            img = cv2.resize(img, (img_height,img_width)).astype('float32') / 255.  
            pics.append(img)  
            labels.append(map_characters[name])
    X_test = np.array(pics)  
    y_test = np.array(labels)  
    y_test = keras.utils.to_categorical(y_test, num_classes) # one-hot编码  
    return X_test, y_test

After loading the test data, load the trained model, use the model to classify the test data, and compare the classification results with the correct categories to calculate the accuracy of the model test.

def acc():
    model = load_model("model.h5")
    # Predict on the test set and compare with the true labels
    y_pred = model.predict_classes(X_valtest)
    acc = np.sum(y_pred == np.argmax(y_valtest, axis=1)) / np.size(y_pred)
    return acc
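A possible way to call the two helpers above (the variable names follow the globals assumed by the snippets):

    X_valtest, y_valtest = load_test_set(imgsPath)   # load the labelled test images
    print("test accuracy:", acc())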


3. Thinking questions - The impact of deep learning algorithm parameter settings on algorithm performance

Hyperparameters in deep learning are the key to controlling model structure, training efficiency, and training effect. Common hyperparameters and their impact on model training are as follows:

  • Learning rate: determines how much the parameter weights are updated in the optimization algorithm. The learning rate can be constant, gradually decreasing, momentum-based, etc. [2]. Its value should be set in an appropriate range: if it is too small, convergence slows down and training time increases; if it is too large, the parameters may oscillate around the optimal solution. The learning rate can also be adjusted dynamically, generally starting larger and decreasing as the number of iterations increases to improve stability.

  • Number of iterations (epochs): the number of training passes. Too few and the training effect may not be good enough; too many may lead to overfitting.

  • Batch size (the number of samples used in one training step): affects training time. If it is too small, the gradient may oscillate; if it is too large, the gradient is more accurate and convergence is fast, but it is easier to fall into a local optimum.

  • Optimizer: common choices include SGD (stochastic gradient descent) and Adagrad (adaptive gradient descent, in which different parameters have different learning rates [3]).

  • Activation function: To increase the nonlinearity of the neural network model, an appropriate activation function must be selected according to the specific problem.

  • Loss function: affects the convergence speed and overall performance of the model. Commonly used loss functions in regression models include the mean squared error (MSE), the smooth L1 (Huber) loss, and the mean absolute error (MAE); classification problems commonly use the cross-entropy loss.

References

[1] TEAM K. Keras documentation: LearningRateScheduler[EB/OL]//keras.io. https://keras.io/api/callbacks/learning_rate_scheduler/.

[2] Hyperparameters in deep learning and their impact on model training_The impact of rnn model hyperparameters_weixin_41783077's blog-CSDN blog [EB/OL]//blog.csdn.net. [2023-05-28] . https://blog.csdn.net/weixin_41783077/article/details/104022476.

[3] Hyperparameter adjustment in deep learning (learning rate, epochs, batch-size...) [EB/OL]//Zhihu column. [2023-05-28]. https://zhuanlan.zhihu.com/p/433836153.

