[LSTM] MATLAB simulation of face recognition algorithm based on LSTM network

1. Software version

MATLAB 2021a

2. Theoretical knowledge of this algorithm

    The long short-term memory (LSTM) model was first proposed by Hochreiter and Schmidhuber in 1997. Its main principle is to store information over long periods through a special neuron structure. The basic structure of the LSTM network model is shown in the following figure:

Figure 1 Basic structure of LSTM network

    As can be seen from the structure diagram in Figure 1, the LSTM network structure includes three parts: the input layer, the memory module and the output layer. The memory module consists of the input gate (Input Gate), the forget gate (Forget Gate) and the output gate (Output Gate). The LSTM model controls the read and write operations of all neurons in the neural network through these three control gates.

    The basic principle of the LSTM model is to mitigate the vanishing-gradient problem of the RNN neural network through multiple control gates. The LSTM model can preserve gradient information over long periods and thus extend the time span over which a signal can be processed. The LSTM model is therefore suitable for signals of various frequencies, including signals that mix high and low frequencies. In the memory unit, the Input Gate, Forget Gate and Output Gate together with the control unit form a nonlinear summation unit. The activation function of all three gates is the sigmoid function, whose output switches each control gate between its "open" and "closed" states.
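As a small illustration of how a sigmoid-activated gate moves between the "closed" and "open" states, the following sketch evaluates the sigmoid on a few pre-activation values and uses the result to scale a signal. The variable names are chosen for illustration only and do not come from the original code.

```matlab
% Sigmoid squashes any real pre-activation into the interval (0, 1).
sigmoid = @(x) 1 ./ (1 + exp(-x));

pre_activation = [-10 0 10];       % strongly negative, neutral, strongly positive
gate           = sigmoid(pre_activation);

signal = [5 5 5];                  % the same candidate value at each position
gated  = gate .* signal;           % element-wise gating

disp(gate);    % near 0 ("closed"), 0.5, near 1 ("open")
disp(gated);   % the signal is almost blocked, halved, almost fully passed
```

Because the gate value is a smooth number between 0 and 1 rather than a hard switch, "open" and "closed" are best read as the two ends of a continuous range.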

    The following figure shows the internal structure of the memory module in the LSTM model:

 

Figure 2 The internal structure of the memory unit of the LSTM network

    As can be seen from the structure diagram in Figure 2, the memory unit of the LSTM works as follows: when the input gate is in the "open" state, external information is read into the memory unit; when the input gate is in the "closed" state, external information cannot enter the memory unit. The forget gate and the output gate have similar control functions. Through these three control gates, the LSTM model stores gradient information in the memory unit for a long time. While the memory unit is holding information over a long period, its forget gate is in the "open" state and its input gate is in the "closed" state.

    When the input gate enters the "open" state, the memory unit begins to receive and store external information. When the input gate enters the "closed" state, the memory unit suspends receiving external information, and at the same time, the output gate enters the "open" state, and the information stored in the memory unit is transmitted to the next layer. The function of the forget gate is to reset the state of neurons when necessary.
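The gate behaviour described above can be traced by hand on a single scalar memory unit. The sketch below uses the update rules quoted in the comments of the core code (C(t) = C(t-1) .* forget_gate + g_gate .* in_gate and S(t) = tanh(C(t)) .* out_gate). The gate values are set manually to idealized 0 or 1 instead of being computed from weights, which is an assumption made purely for illustration.

```matlab
C = 0;                                  % the memory unit starts empty

% Step 1: input gate open, forget gate open, output gate closed -> write 0.8
g = 0.8; gate_in = 1; gate_f = 1; gate_out = 0;
C  = C .* gate_f + g .* gate_in;
H1 = tanh(C) .* gate_out;               % nothing is emitted yet

% Step 2: input gate closed, forget gate open -> the new candidate is ignored
g = -0.5; gate_in = 0; gate_f = 1;
C      = C .* gate_f + g .* gate_in;
C_held = C;                             % the stored value is unchanged

% Step 3: output gate open -> the stored value is passed to the next layer
g = 0; gate_in = 0; gate_f = 1; gate_out = 1;
C  = C .* gate_f + g .* gate_in;
H3 = tanh(C) .* gate_out;

% Step 4: forget gate closed -> the memory unit is reset
gate_f  = 0;
C       = C .* gate_f + g .* gate_in;
C_reset = C;

disp([H1 C_held H3 C_reset]);
```

In a trained network the gates take intermediate values, so storing, emitting and resetting happen gradually rather than in the clean steps shown here.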

    For the forward propagation process of the LSTM network model, the mathematical principles involved are as follows:

 1. The input gate calculation process is as follows:

 2. The calculation process of the forget gate is as follows:

       

 3. The calculation process of the memory unit is as follows:

 4. The output gate calculation process is as follows:

 5. The output calculation process of the memory unit is as follows:
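The formulas for these steps are not reproduced in this text. As a sketch, the forward pass can be written out from the comments in the core code of Section 3, which uses no bias terms. The symbols are assumptions chosen to mirror that code: $x_t$ is the input row vector, $h_{t-1}$ the previous hidden output, $c_t$ the memory unit state, $U_\ast$ and $W_\ast$ the input and recurrent weight matrices, $W_{out}$ the output weights (`out_para`), $\sigma$ the sigmoid function and $\odot$ the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma\left(x_t U_i + h_{t-1} W_i\right) && \text{(input gate)} \\
f_t &= \sigma\left(x_t U_f + h_{t-1} W_f\right) && \text{(forget gate)} \\
g_t &= \tanh\left(x_t U_g + h_{t-1} W_g\right) && \text{(candidate value)} \\
c_t &= c_{t-1} \odot f_t + g_t \odot i_t && \text{(memory unit)} \\
o_t &= \sigma\left(x_t U_o + h_{t-1} W_o\right) && \text{(output gate)} \\
h_t &= \tanh\left(c_t\right) \odot o_t && \text{(output of the memory unit)} \\
\hat{y}_t &= \sigma\left(h_t W_{out}\right) && \text{(network output)}
\end{aligned}
```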

For the back-propagation process of the LSTM network model, the mathematical principles involved are as follows:

 6. The input gate calculation process is as follows:

    The overall algorithm flow chart of the visual recognition algorithm based on LSTM network is shown in the following figure:

                                

 

Figure 3 Flowchart of visual recognition algorithm based on LSTM network

According to the algorithm flow chart in Figure 3, the steps of the visual recognition algorithm based on the LSTM network studied in this paper are:

    Step 1: Image collection. This paper takes face images as the research object.

    Step 2: Image preprocessing. Following Section 2 of this chapter, the visual image to be recognized is preprocessed to obtain a clearer image.

    Step 3: Image segmentation. The segment size is determined from the relationship between the size of the recognition target and the size of the overall scene, and the original image is segmented into sub-images of different sizes.

    Step 4: Extraction of the geometric elements of the sub-images. The geometric elements contained in each sub-image are obtained by edge extraction, and the geometric elements are assembled into sentence information.

    Step 5: Input of the sentence information into the LSTM network. This is the core step. The recognition process of the LSTM network is as follows: first, the sentence information is fed into the LSTM network through its input layer. The basic structure is shown in the following figure:

Figure 4 Recognition structure diagram based on LSTM network
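Steps 3 and 4 can be sketched roughly as follows. The text does not specify the edge detector, the sub-image size, or how the geometric elements are encoded as "sentence information", so everything here is an assumption: a synthetic image stands in for a preprocessed face image, fixed-size blocks replace the variable-size sub-images, a Sobel gradient with a hand-picked threshold serves as the edge extractor, and the fraction of edge pixels per block is used as a simple stand-in feature. Only base functions are used.

```matlab
% Synthetic 64x64 grayscale image with a bright square (stand-in for real data).
img = zeros(64, 64);
img(20:44, 20:44) = 1;

block = 16;                              % hypothetical sub-image size
[rows, cols] = size(img);
n_r = rows / block;                      % number of block rows
n_c = cols / block;                      % number of block columns

% Sobel kernels for a basic gradient-based edge map
kx = [-1 0 1; -2 0 2; -1 0 1];
ky = kx';
gx = conv2(img, kx, 'same');
gy = conv2(img, ky, 'same');
edges = sqrt(gx.^2 + gy.^2) > 1;         % hypothetical threshold

% One feature per sub-image (fraction of edge pixels), scanned row by row,
% so that the image becomes an ordered sequence for a recurrent network.
seq = zeros(n_r * n_c, 1);
k = 1;
for r = 1:n_r
    for c = 1:n_c
        sub    = edges((r-1)*block+1 : r*block, (c-1)*block+1 : c*block);
        seq(k) = mean(sub(:));
        k      = k + 1;
    end
end

disp(reshape(seq, n_c, n_r)');           % feature grid in image layout
```

A richer encoding, for example edge orientation histograms per sub-image, would fit the same loop; the point of the sketch is only the conversion from a 2-D image to an ordered sequence.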

    Here, the notation covers the input feature information and the output result of the LSTM at a given time step, the input and output of its memory module, and the outputs of the LSTM neuron's activation function and of the hidden layer. The entire LSTM training process is:

3. Core code


function nn = func_LSTM(train_x,train_y,test_x,test_y);

binary_dim     = 8;
largest_number = 2^binary_dim - 1;
binary         = cell(largest_number + 1, 1);   % one entry per value 0..largest_number
int2binary     = cell(largest_number + 1, 1);

for i = 1:largest_number + 1
    binary{i}      = dec2bin(i-1, binary_dim);
    int2binary{i}  = binary{i};
end

%input variables
alpha      = 0.000001;
input_dim  = 2;
hidden_dim = 32;
output_dim = 1;

%initialize neural network weights
%in_gate = sigmoid(X(t) * U_i + H(t-1) * W_i)
U_i        = 2 * rand(input_dim, hidden_dim) - 1;
W_i        = 2 * rand(hidden_dim, hidden_dim) - 1;
U_i_update = zeros(size(U_i));
W_i_update = zeros(size(W_i));

%forget_gate = sigmoid(X(t) * U_f + H(t-1) * W_f)
U_f        = 2 * rand(input_dim, hidden_dim) - 1;
W_f        = 2 * rand(hidden_dim, hidden_dim) - 1;
U_f_update = zeros(size(U_f));
W_f_update = zeros(size(W_f));

%out_gate    = sigmoid(X(t) * U_o + H(t-1) * W_o)
U_o = 2 * rand(input_dim, hidden_dim) - 1;
W_o = 2 * rand(hidden_dim, hidden_dim) - 1;
U_o_update = zeros(size(U_o));
W_o_update = zeros(size(W_o));

%g_gate      = tanh(X(t) * U_g + H(t-1) * W_g)
U_g = 2 * rand(input_dim, hidden_dim) - 1;
W_g = 2 * rand(hidden_dim, hidden_dim) - 1;
U_g_update = zeros(size(U_g));
W_g_update = zeros(size(W_g));

out_para = zeros(hidden_dim, output_dim);   % zero-initialized (2 * zeros(...) is still all zeros)
out_para_update = zeros(size(out_para));
% C(t) = C(t-1) .* forget_gate + g_gate .* in_gate 
% S(t) = tanh(C(t)) .* out_gate                     
% Out  = sigmoid(S(t) * out_para)      


%train 
iter = 9999; % training iterations
for j = 1:iter
 
    % generate a simple addition problem (a + b = c)
    a_int = randi(round(largest_number/2));   % int version
    a     = int2binary{a_int+1};              % binary encoding
    
    b_int = randi(floor(largest_number/2));   % int version
    b     = int2binary{b_int+1};              % binary encoding
    
    % true answer
    c_int = a_int + b_int;                    % int version
    c     = int2binary{c_int+1};              % binary encoding
    
    % where we'll store our best guess (binary encoded)
    d     = zeros(size(c));
 
    
    % total error
    overallError = 0;
    
    % difference in output layer, i.e., (target - out)
    output_deltas = [];
    
    % values of hidden layer, i.e., S(t)
    hidden_layer_values = [];
    cell_gate_values    = [];
    % initialize S(0) as a zero-vector
    hidden_layer_values = [hidden_layer_values; zeros(1, hidden_dim)];
    cell_gate_values    = [cell_gate_values; zeros(1, hidden_dim)];
    
    % initialize memory gate
    % hidden layer
    H = [];
    H = [H; zeros(1, hidden_dim)];
    % cell gate
    C = [];
    C = [C; zeros(1, hidden_dim)];
    % in gate
    I = [];
    % forget gate
    F = [];
    % out gate
    O = [];
    % g gate
    G = [];
    
    % start to process a sequence, i.e., a forward pass
    % Note: the output of a LSTM cell is the hidden_layer, and you need to 
    for position = 0:binary_dim-1
        % X ------> input, size: 1 x input_dim
        X = [a(binary_dim - position)-'0' b(binary_dim - position)-'0'];
        % y ------> label, size: 1 x output_dim
        y = [c(binary_dim - position)-'0']';
        % use equations (1)-(7) in a forward pass. here we do not use bias
        in_gate     = sigmoid(X * U_i + H(end, :) * W_i);  % equation (1)
        forget_gate = sigmoid(X * U_f + H(end, :) * W_f);  % equation (2)
        out_gate    = sigmoid(X * U_o + H(end, :) * W_o);  % equation (3)
        g_gate      = tanh(X * U_g + H(end, :) * W_g);    % equation (4)
        C_t         = C(end, :) .* forget_gate + g_gate .* in_gate;    % equation (5)
        H_t         = tanh(C_t) .* out_gate;                          % equation (6)
        
        % store these memory gates
        I = [I; in_gate];
        F = [F; forget_gate];
        O = [O; out_gate];
        G = [G; g_gate];
        C = [C; C_t];
        H = [H; H_t];
        
        % compute predict output
        pred_out = sigmoid(H_t * out_para);
        
        % compute error in output layer
        output_error = y - pred_out;
        
        % compute difference in output layer using derivative
        % output_diff = output_error * sigmoid_output_to_derivative(pred_out);
        output_deltas = [output_deltas; output_error];
        
        % compute total error
        overallError = overallError + abs(output_error(1));
        
        % decode estimate so we can print it out
        d(binary_dim - position) = round(pred_out);
    end
    
    % from the last LSTM cell, you need a initial hidden layer difference
    future_H_diff = zeros(1, hidden_dim);
    
    % start back-propagation, i.e., a backward pass
    % the goal is to compute differences and use them to update weights
    % start from the last LSTM cell
    for position = 0:binary_dim-1
        X = [a(position+1)-'0' b(position+1)-'0'];
        % hidden layer
        H_t = H(end-position, :);         % H(t)
        % previous hidden layer
        H_t_1 = H(end-position-1, :);     % H(t-1)
        C_t = C(end-position, :);         % C(t)
        C_t_1 = C(end-position-1, :);     % C(t-1)
        O_t = O(end-position, :);
        F_t = F(end-position, :);
        G_t = G(end-position, :);
        I_t = I(end-position, :);
        
        % output layer difference
        output_diff = output_deltas(end-position, :);
%         H_t_diff = (future_H_diff * (W_i' + W_o' + W_f' + W_g') + output_diff * out_para') ...
%                    .* sigmoid_output_to_derivative(H_t);

%         H_t_diff = output_diff * (out_para') .* sigmoid_output_to_derivative(H_t);
        H_t_diff = output_diff * (out_para') .* sigmoid_output_to_derivative(H_t);
        
%         out_para_diff = output_diff * (H_t) * sigmoid_output_to_derivative(out_para);
        out_para_diff =  (H_t') * output_diff;

        % out_gate difference
        O_t_diff = H_t_diff .* tanh(C_t) .* sigmoid_output_to_derivative(O_t);
        
        % C_t difference
        C_t_diff = H_t_diff .* O_t .* tan_h_output_to_derivative(tanh(C_t));   % derivative is evaluated on the activation output tanh(C_t)
 
        % forget_gate difference
        F_t_diff = C_t_diff .* C_t_1 .* sigmoid_output_to_derivative(F_t);
        
        % in_gate difference
        I_t_diff = C_t_diff .* G_t .* sigmoid_output_to_derivative(I_t);
        
        % g_gate difference
        G_t_diff = C_t_diff .* I_t .* tan_h_output_to_derivative(G_t);
        
        % differences of U_i and W_i
        U_i_diff =  X' * I_t_diff .* sigmoid_output_to_derivative(U_i);
        W_i_diff =  (H_t_1)' * I_t_diff .* sigmoid_output_to_derivative(W_i);
        
        % differences of U_o and W_o
        U_o_diff = X' * O_t_diff .* sigmoid_output_to_derivative(U_o);
        W_o_diff = (H_t_1)' * O_t_diff .* sigmoid_output_to_derivative(W_o);
        
        % differences of U_f and W_f
        U_f_diff = X' * F_t_diff .* sigmoid_output_to_derivative(U_f);
        W_f_diff = (H_t_1)' * F_t_diff .* sigmoid_output_to_derivative(W_f);
        
        % differences of U_g and W_g
        U_g_diff = X' * G_t_diff .* tan_h_output_to_derivative(U_g);
        W_g_diff = (H_t_1)' * G_t_diff .* tan_h_output_to_derivative(W_g);
        
        % update
        U_i_update = U_i_update + U_i_diff;
        W_i_update = W_i_update + W_i_diff;
        U_o_update = U_o_update + U_o_diff;
        W_o_update = W_o_update + W_o_diff;
        U_f_update = U_f_update + U_f_diff;
        W_f_update = W_f_update + W_f_diff;
        U_g_update = U_g_update + U_g_diff;
        W_g_update = W_g_update + W_g_diff;
        out_para_update = out_para_update + out_para_diff;
    end
 
    U_i = U_i + U_i_update * alpha; 
    W_i = W_i + W_i_update * alpha;
    U_o = U_o + U_o_update * alpha; 
    W_o = W_o + W_o_update * alpha;
    U_f = U_f + U_f_update * alpha; 
    W_f = W_f + W_f_update * alpha;
    U_g = U_g + U_g_update * alpha; 
    W_g = W_g + W_g_update * alpha;
    out_para = out_para + out_para_update * alpha;
    
    U_i_update = U_i_update * 0; 
    W_i_update = W_i_update * 0;
    U_o_update = U_o_update * 0; 
    W_o_update = W_o_update * 0;
    U_f_update = U_f_update * 0; 
    W_f_update = W_f_update * 0;
    U_g_update = U_g_update * 0; 
    W_g_update = W_g_update * 0;
    out_para_update = out_para_update * 0;
    
     
end
 

nn = newgrnn(train_x',train_y(:,1)',mean(mean(abs(out_para)))/2);
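The listing calls `sigmoid`, `sigmoid_output_to_derivative` and `tan_h_output_to_derivative` without showing them. A minimal sketch of what they presumably look like is given below, assuming the conventional definitions suggested by their names; the originals may differ. They are written as anonymous functions so that the snippet runs on its own, and a finite-difference check confirms that the two derivative helpers expect the activation output rather than the pre-activation input.

```matlab
% Assumed helper definitions (not part of the original listing).
sigmoid                      = @(x) 1 ./ (1 + exp(-x));
sigmoid_output_to_derivative = @(y) y .* (1 - y);     % y = sigmoid(x)
tan_h_output_to_derivative   = @(y) 1 - y .^ 2;       % y = tanh(x)

% Compare against a central finite difference at x = 0.3
x  = 0.3;
dx = 1e-6;
num_sig  = (sigmoid(x + dx) - sigmoid(x - dx)) / (2 * dx);
num_tanh = (tanh(x + dx)    - tanh(x - dx))    / (2 * dx);

err_sig  = abs(num_sig  - sigmoid_output_to_derivative(sigmoid(x)));
err_tanh = abs(num_tanh - tan_h_output_to_derivative(tanh(x)));

fprintf('sigmoid derivative error: %g\n', err_sig);
fprintf('tanh derivative error:    %g\n', err_tanh);
```

To use them with `func_LSTM`, the three definitions could be placed at the top of the function body or saved as separate function files with the same names.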

4. Operation steps and simulation conclusion

    With the LSTM network recognition algorithm in this paper, face images collected under different interference levels are recognized. The recognition accuracy curve is shown in the following figure:

 

    From the simulation results in the figure above, it can be seen that the recognition accuracy generally improves as the interference on the collected images decreases, and that the LSTM recognition algorithm studied in this paper has the best recognition accuracy at every interference level. The RNN neural network and the convolution-based deep neural network achieve similar recognition rates, while the recognition rate of the plain neural network is clearly worse. The specific recognition rates are shown in the following table:

Table 1 Recognition rates (%) of the four compared algorithms

Algorithm   -15 dB    -10 dB    -5 dB     0 dB      5 dB      10 dB     15 dB
NN          17.5250   30.9500   45.0000   52.6000   55.4750   57.5750   57.6000
RBM         19.4000   40.4500   58.4750   67.9500   70.4000   72.2750   71.8750
RNN         20.6750   41.1500   60.0750   68.6000   72.5500   73.3500   73.3500
LSTM        23.1000   46.3500   65.0250   72.9500   75.6000   76.1000   76.3250
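For readers who want to inspect the numbers, the sketch below enters the values of Table 1 into a matrix and computes, for each interference level, which algorithm scores highest and how far the LSTM is ahead of the RNN. The variable names are illustrative and not part of the original code.

```matlab
level_db = [-15 -10 -5 0 5 10 15];
names    = {'NN', 'RBM', 'RNN', 'LSTM'};
rates = [17.5250 30.9500 45.0000 52.6000 55.4750 57.5750 57.6000;    % NN
         19.4000 40.4500 58.4750 67.9500 70.4000 72.2750 71.8750;    % RBM
         20.6750 41.1500 60.0750 68.6000 72.5500 73.3500 73.3500;    % RNN
         23.1000 46.3500 65.0250 72.9500 75.6000 76.1000 76.3250];   % LSTM

% Row index of the best algorithm in every column
[~, best] = max(rates, [], 1);

% Margin of the LSTM over the RNN in percentage points
lstm_gain = rates(4, :) - rates(3, :);
[max_gain, idx] = max(lstm_gain);

for k = 1:numel(level_db)
    fprintf('%4d dB: best = %-4s, LSTM - RNN = %.3f\n', ...
            level_db(k), names{best(k)}, lstm_gain(k));
end
fprintf('Largest LSTM margin over RNN: %.3f at %d dB\n', max_gain, level_db(idx));
```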



Origin blog.csdn.net/ccsss22/article/details/124025316