Using C++ to implement artificial intelligence neural network from 0 to 1 and practical cases

Introduction

Since we are implementing this in C++, it is natural to design a class to represent the neural network, which I call the Net class here. Since this class name is so common that it could easily conflict with code written by others, all my code is wrapped in namespace liu; from this it is not hard to guess that my surname is Liu. In my earlier blog post compiling resources on the backpropagation algorithm, I listed several good references; readers who are unfamiliar with the theory and eager to learn can consult the resources listed there first. This article assumes the reader has a basic understanding of neural network theory.

1. Design of Net class and neural network initialization

Elements of Neural Networks

Before actually starting to code, it is worth going over the basics of neural networks, since they shape the class design and the structure of the program. In short, a neural network contains several major elements:

  • Neurons (nodes)

  • Layers

  • Weights

  • Biases

The two major computation processes of a neural network are forward propagation and back propagation. Forward propagation through each layer consists of a linear operation, the weighted sum (a convolution, if you like), and a nonlinear operation, the activation function; back propagation mainly uses the BP algorithm to update the weights. There are many details inside, but the above is enough for this first part.

Net class design

Net class - based on Mat

Almost all computation in a neural network can be expressed as matrix computation. That is one reason I use OpenCV's Mat class: it provides a very complete and well-optimized set of matrix operations. The other reason is simply that OpenCV is the library I am most familiar with... Many better libraries and frameworks use multiple classes to represent the different parts of a neural network, for example a Blob class for data, a Layer class for the various layers, and an Optimizer class for the optimization algorithms. It is not that complicated here, mainly because my abilities are limited: a single Net class represents the whole neural network.

Or rather, let the code speak for itself. The Net class lives in Net.h, roughly as follows.

#ifndef NET_H
#define NET_H
#pragma once
#include <iostream>
#include <vector>
#include <string>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
//#include<iomanip>
#include "Function.h"
namespace liu
{
    class Net
    {
    public:
        std::vector<int> layer_neuron_num;
        std::vector<cv::Mat> layer;
        std::vector<cv::Mat> weights;
        std::vector<cv::Mat> bias;
    public:
        Net() {};
        ~Net() {};
        //Initialize net: generate weights matrices, layer matrices and bias matrices
        // bias default all zero
        void initNet(std::vector<int> layer_neuron_num_);
        //Initialise the weights matrices.
        void initWeights(int type = 0, double a = 0., double b = 0.1);
        //Initialise the bias matrices.
        void initBias(cv::Scalar& bias);
        //Forward
        void forward();
        //Backward
        void backward();
    protected:
        //Initialise a weight matrix. If type = 0, Gaussian; else uniform.
        void initWeight(cv::Mat &dst, int type, double a, double b);
        //Activation function
        cv::Mat activationFunction(cv::Mat &x, std::string func_type);
        //Compute delta error
        void deltaError();
        //Update weights
        void updateWeights();
    };
}
#endif // NET_H

Notes

The above is not the complete Net class, but a simplified version matching the content of this part; the simplification should make it clearer.

Member variables and member functions

For now, the Net class has only four member variables:

  • The number of neurons in each layer (layer_neuron_num)

  • Layer matrices (layer)

  • Weight matrices (weights)

  • Bias matrices (bias)

That the weights are represented by matrices goes without saying. Note that, for convenience of computation, each layer and each bias term is also represented by a Mat, specifically by a single-column matrix.

In addition to the default constructor and destructor, the member functions of the Net class also include:

  • initNet(): used to initialize the neural network

  • initWeights(): initializes the weight matrices; calls the initWeight() function

  • initBias(): initializes the bias terms

  • forward(): performs the forward pass, including the linear operations and nonlinear activations, and computes the loss at the same time

  • backward(): performs back propagation; calls the updateWeights() function to update the weights.

These functions are already the core of the neural network program. The rest will be implemented bit by bit, adding whatever is needed along the way.

Neural network initialization

initNet() function

Let's start with the initNet() function. It accepts only one parameter, the number of neurons in each layer, and then initializes the network. Here, initializing the network means generating each layer matrix, each weight matrix and each bias matrix. It sounds simple, and it really is simple.

The implementation code is in Net.cpp.

Generating the various matrices poses no difficulty; the only thing to be careful about is determining the number of rows and columns of each weight matrix. Also worth mentioning: the biases are all set to 0 by default.

    //Initialize net
    void Net::initNet(std::vector<int> layer_neuron_num_)
    {
        layer_neuron_num = layer_neuron_num_;
        //Generate every layer.
        layer.resize(layer_neuron_num.size());
        for (int i = 0; i < layer.size(); i++)
        {
            layer[i].create(layer_neuron_num[i], 1, CV_32FC1);
        }
        std::cout << "Generate layers, successfully!" << std::endl;
        //Generate every weights matrix and bias
        weights.resize(layer.size() - 1);
        bias.resize(layer.size() - 1);
        for (int i = 0; i < (layer.size() - 1); ++i)
        {
            weights[i].create(layer[i + 1].rows, layer[i].rows, CV_32FC1);
            //bias[i].create(layer[i + 1].rows, 1, CV_32FC1);
            bias[i] = cv::Mat::zeros(layer[i + 1].rows, 1, CV_32FC1);
        }
        std::cout << "Generate weights matrices and bias, successfully!" << std::endl;
        std::cout << "Initialise Net, done!" << std::endl;
    }
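For a concrete sense of the sizes, take the {784, 100, 10} network used later in this article. initNet() then produces:

    // layer[0]: 784 x 1    layer[1]: 100 x 1    layer[2]: 10 x 1
    // weights[0]: 100 x 784  (layer[1].rows x layer[0].rows)
    // weights[1]: 10 x 100   (layer[2].rows x layer[1].rows)
    // bias[0]: 100 x 1       bias[1]: 10 x 1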

Weight initialization

initWeight() function

The weight initialization function initWeights() calls the initWeight() function; the only difference between them is initializing one matrix versus all of them.


    //Initialise a weight matrix. If type = 0, Gaussian; else uniform.
    void Net::initWeight(cv::Mat &dst, int type, double a, double b)
    {
        if (type == 0)
        {
            randn(dst, a, b);   //Gaussian: a is the mean, b the standard deviation
        }
        else
        {
            randu(dst, a, b);   //uniform: values in [a, b)
        }
    }
    //Initialise the weights matrices.
    void Net::initWeights(int type, double a, double b)
    {
        //Initialise each weight matrix with the given distribution parameters
        for (int i = 0; i < weights.size(); ++i)
        {
            initWeight(weights[i], type, a, b);
        }
    }

Bias initialization assigns the same value to all bias terms; a Scalar object is used to assign the value to each matrix.

    //Initialise the bias matrices.
    void Net::initBias(cv::Scalar& bias_)
    {
        for (int i = 0; i < bias.size(); i++)
        {
            bias[i] = bias_;
        }
    }

At this point, all parts of the neural network that need to be initialized have been initialized.

Initialization test

We can use the following code to initialize a neural network. It doesn't do anything useful yet, but it at least verifies that the code so far is free of bugs:

#include"../include/Net.h"
//<opencv2\opencv.hpp>
using namespace std;
using namespace cv;
using namespace liu;
int main(int argc, char *argv[])
{
    //Set neuron number of every layer
    vector<int> layer_neuron_num = { 784,100,10 };
    // Initialise Net and weights
    Net net;
    net.initNet(layer_neuron_num);
    net.initWeights(0, 0., 0.01);
    net.initBias(Scalar(0.05));
    getchar();
    return 0;
}
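If everything compiles and runs, the console simply shows the messages printed by initNet():

    Generate layers, successfully!
    Generate weights matrices and bias, successfully!
    Initialise Net, done!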

2. Forward propagation and backward propagation

In the design of the Net class and the initialization of the network, most of the work was relatively simple, since it mainly amounted to generating and initializing various matrices. The focus and core of a neural network is the subject of this part: the two major computation processes, forward propagation and back propagation. Forward propagation through each layer consists of the linear weighted sum and the nonlinear activation function; back propagation mainly uses the BP algorithm to update the weights.

Forward process

As mentioned before, the forward process splits into two parts: a linear operation and a nonlinear operation. It is relatively simple.

The linear operation can be expressed as Y = WX + b, where X is the input: here, the single-column matrix of layer N. W is the weight matrix, and Y is the weighted sum, a single-column matrix of the same size as layer N+1; b is the bias, initialized to 0 by default. It is not hard to infer that the size of W must be (N+1).rows * N.rows (God knows how long I spent deriving that!). The code generating the weight matrices is the same as shown before:

weights[i].create(layer[i + 1].rows, layer[i].rows, CV_32FC1); 

The nonlinear operation can be expressed as O = f(Y), where Y is the weighted sum obtained above, O is the output of layer N+1, and f is the activation function we keep mentioning. Activation functions are generally nonlinear; their value lies in giving the neural network its nonlinear modeling capability. There are many kinds of activation functions, such as the sigmoid, tanh and ReLU functions; for the pros and cons of each, consult the more specialized papers and materials.

We can first look at the code of the forward function forward():

    //Forward
    void Net::forward()
    {
        for (int i = 0; i < layer_neuron_num.size() - 1; ++i)
        {
            cv::Mat product = weights[i] * layer[i] + bias[i];
            layer[i + 1] = activationFunction(product, activation_function);
        }
    }

The two statements in the for loop are exactly the linear operation and the nonlinear activation described above.
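One note: the training loop shown later tests the member loss immediately after calling forward(), so in the full source forward() must also compute the loss once a target has been set. A sketch of that final step (calcLoss() itself is shown in the back propagation part):

    // at the end of the full forward(): compute the loss if a target is set
    if (!target.empty())
    {
        calcLoss(layer[layer.size() - 1], target, output_error, loss);
    }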

The activation function activationFunction() implements several activation types, selected via the second parameter. The code is as follows:

    //Activation function
    cv::Mat Net::activationFunction(cv::Mat &x, std::string func_type)
    {
        activation_function = func_type;
        cv::Mat fx;
        if (func_type == "sigmoid")
        {
            fx = sigmoid(x);
        }
        if (func_type == "tanh")
        {
            fx = tanh(x);
        }
        if (func_type == "ReLU")
        {
            fx = ReLU(x);
        }
        return fx;
    }

More details for each function are in the Function.h and Function.cpp files.
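The actual files are in the source download; as a rough sketch of what the three activations can look like on a CV_32FC1 cv::Mat (my reconstruction, not the author's exact code):

    cv::Mat sigmoid(cv::Mat &x)
    {
        cv::Mat exp_neg_x, fx;
        cv::exp(-x, exp_neg_x);         //element-wise e^(-x)
        fx = 1.0 / (1.0 + exp_neg_x);   //1 / (1 + e^(-x))
        return fx;
    }
    cv::Mat tanh(cv::Mat &x)
    {
        cv::Mat exp_x, exp_neg_x;
        cv::exp(x, exp_x);
        cv::exp(-x, exp_neg_x);
        return (exp_x - exp_neg_x) / (exp_x + exp_neg_x);   //element-wise division
    }
    cv::Mat ReLU(cv::Mat &x)
    {
        cv::Mat fx = x.clone();
        for (int i = 0; i < fx.rows; i++)
            for (int j = 0; j < fx.cols; j++)
                if (fx.at<float>(i, j) < 0)
                    fx.at<float>(i, j) = 0;   //clamp negatives to zero
        return fx;
    }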

Back propagation process

The principle of back propagation is the chain rule for derivatives, which is just the rule for differentiating composite functions from high-school mathematics, and it is only needed when deriving the formulas. For the detailed derivation, I recommend the following tutorial, which uses diagrams to show forward and back propagation very clearly. Highly recommended!

Principles of training multi-layer neural network using backpropagation.

Let’s first take a look at what the code of the backpropagation function backward() looks like:

    //Backward
    void Net::backward()
    {
        calcLoss(layer[layer.size() - 1], target, output_error, loss);
        deltaError();
        updateWeights();
    }

You can see that there are mainly three lines of code, which call three functions:

  • The first function, calcLoss(), computes the output error and the objective function; the mean of the squared output errors serves as the objective function to be minimized.

  • The second function, deltaError(), computes the delta error of each layer, the δ·f′(e) term in the tutorial recommended above.

  • The third function, updateWeights(), updates the weights using the update formula below.

The diagram in that highly recommended article cannot be reproduced here, but written in the notation of the code its formulas are:
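    delta_err(output) = f'(y) .* (target - output)
    delta_err(hidden) = f'(y) .* (W_next^T * delta_err_next)
    delta_weights     = learning_rate * (delta_err * layer^T),  then  W = W + delta_weights

Here .* denotes element-wise multiplication (Mat::mul in the code) and ^T a transpose; these correspond line by line to updateWeights() and deltaError() below.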

Let’s look at the code of the updateWeights() function:

    //Update weights
    void Net::updateWeights()
    {
        for (int i = 0; i < weights.size(); ++i)
        {
            cv::Mat delta_weights = learning_rate * (delta_err[i] * layer[i].t());
            weights[i] = weights[i] + delta_weights;
        }
    }

The two core lines of code should clearly reflect the weight-update formula above. The η (eta) in the formula is what is usually called the learning rate, a parameter that must be chosen whenever a neural network is trained.

Computing the output error and the delta error is pure math and rather unexciting, but the code is posted below.

The calcLoss() function is in the Function.cpp file:

    //Objective function
    void calcLoss(cv::Mat &output, cv::Mat &target, cv::Mat &output_error, float &loss)
    {
        if (target.empty())
        {
            std::cout << "Can't find the target cv::Matrix" << std::endl;
            return;
        }
        output_error = target - output;
        cv::Mat err_square;
        pow(output_error, 2., err_square);
        cv::Scalar err_sqr_sum = sum(err_square);
        loss = err_sqr_sum[0] / (float)(output.rows);
    }
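A small note on signs and constants, easy to trip over when checking the math against the code:

    L = (1/N) * sum_i (t_i - o_i)^2       // the loss computed above
    dL/do_i = -(2/N) * (t_i - o_i)        // true gradient w.r.t. the output

The code uses output_error = target - output directly in the delta and adds delta_weights to the weights, so the minus sign cancels and the constant 2/N is effectively absorbed into the learning rate.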

The deltaError() function is in Net.cpp:

    //Compute delta error
    void Net::deltaError()
    {
        delta_err.resize(layer.size() - 1);
        for (int i = delta_err.size() - 1; i >= 0; i--)
        {
            delta_err[i].create(layer[i + 1].size(), layer[i + 1].type());
            cv::Mat dx = derivativeFunction(layer[i + 1], activation_function);
            if (i == delta_err.size() - 1)
            {
                //Output layer delta error: delta = f'(y) .* (target - output)
                delta_err[i] = dx.mul(output_error);
            }
            else
            {
                //Hidden layer delta error: delta = f'(y) .* (W_next^T * delta_next)
                delta_err[i] = dx.mul((weights[i + 1]).t() * delta_err[i + 1]);
            }
        }
    }

Note

Note that the output layer and the hidden layers use different formulas in this computation. The loop also runs from the last layer backwards, because each hidden layer's delta depends on delta_err[i + 1] of the layer after it.

Another thing to note is... don't you think the code in this series looks rather friendly?

At this point, the core of the neural network is implemented. What remains is to figure out how to train it. If you like, you can already write a small program that runs forward propagation and back propagation a few times. Again, who knows how long I spent debugging before it would propagate at all!

3. Training and testing of neural networks

In the previous parts we completed the design of the Net class and the forward and back propagation processes; the core of the neural network is essentially done. Next comes the application level.

To solve practical problems with a neural network, such as recognizing handwritten digits, you need to train it iteratively on samples. After training, you need to test whether the trained model is good. That is what we implement now.

The improved Net class

The Net class is by now fairly complete. To support the upcoming functionality, both the member variables and the member functions have grown richer. The current Net class looks like this:

class Net
    {
    public:
        //Integer vector specifying the number of neurons in each layer including the input and output layers.
        std::vector<int> layer_neuron_num;
        std::string activation_function = "sigmoid";
        double learning_rate;
        double accuracy = 0.;
        std::vector<double> loss_vec;
        float fine_tune_factor = 1.01;
        //How many epochs between log outputs during training (set by the caller)
        int output_interval = 10;
    protected:
        std::vector<cv::Mat> layer;
        std::vector<cv::Mat> weights;
        std::vector<cv::Mat> bias;
        std::vector<cv::Mat> delta_err;

        cv::Mat output_error;
        cv::Mat target;
        cv::Mat board;   //canvas for the real-time loss curve (used with draw_curve())
        float loss;

    public:
        Net() {};
        ~Net() {};

        //Initialize net: generate weights matrices, layer matrices and bias matrices
        // bias default all zero
        void initNet(std::vector<int> layer_neuron_num_);

        //Initialise the weights matrices.
        void initWeights(int type = 0, double a = 0., double b = 0.1);

        //Initialise the bias matrices.
        void initBias(cv::Scalar& bias);

        //Forward
        void forward();

        //Backward
        void backward();

        //Train, using loss_threshold as the stopping condition
        void train(cv::Mat input, cv::Mat target_, float loss_threshold, bool draw_loss_curve = false);

        //Test
        void test(cv::Mat &input, cv::Mat &target_);

        //Predict just one sample
        int predict_one(cv::Mat &input);

        //Predict more than one sample
        std::vector<int> predict(cv::Mat &input);

        //Save model
        void save(std::string filename);

        //Load model
        void load(std::string filename);

    protected:
        //Initialise a weight matrix. If type = 0, Gaussian; else uniform.
        void initWeight(cv::Mat &dst, int type, double a, double b);

        //Activation function
        cv::Mat activationFunction(cv::Mat &x, std::string func_type);

        //Compute delta error
        void deltaError();

        //Update weights
        void updateWeights();
    };

You can see there is now a training function train(), a testing function test(), a predict() function that actually applies the trained model, and save() and load() functions for saving and loading the model. Most member variables and member functions should be understandable from their names.

Training

Training function train()

This part focuses on the training function train() and the test function test(). Both accept the input and the labels (target values) as parameters, and the training function additionally accepts a threshold as the iteration termination condition. The last parameter can be ignored for now: it is a flag selecting whether to draw the loss curve in real time.

The training process is as follows:

  1. Accept a sample (a single-column matrix) as input, which becomes the first layer of the neural network;

  2. Perform forward propagation, which is what the forward() function does, then compute the loss;

  3. If the loss is greater than the set threshold loss_threshold, perform back propagation to update the weights;

  4. Repeat the above until the loss is less than or equal to the threshold.

The implementation of the train function is as follows:

    //Train, using loss_threshold as the stopping condition
    void Net::train(cv::Mat input, cv::Mat target_, float loss_threshold, bool draw_loss_curve)
    {
        if (input.empty())
        {
            std::cout << "Input is empty!" << std::endl;
            return;
        }

        std::cout << "Train, begin!" << std::endl;

        cv::Mat sample;
        if (input.rows == (layer[0].rows) && input.cols == 1)
        {
            target = target_;
            sample = input;
            layer[0] = sample;
            forward();
            int num_of_train = 0;
            while (loss > loss_threshold)
            {
                backward();
                forward();
                num_of_train++;
                if (num_of_train % 500 == 0)
                {
                    std::cout << "Train " << num_of_train << " times" << std::endl;
                    std::cout << "Loss: " << loss << std::endl;
                }
            }
            std::cout << std::endl << "Train " << num_of_train << " times" << std::endl;
            std::cout << "Loss: " << loss << std::endl;
            std::cout << "Train successfully!" << std::endl;
        }
        else if (input.rows == (layer[0].rows) && input.cols > 1)
        {
            double batch_loss = loss_threshold + 0.01;
            int epoch = 0;
            while (batch_loss > loss_threshold)
            {
                batch_loss = 0.;
                for (int i = 0; i < input.cols; ++i)
                {
                    target = target_.col(i);
                    sample = input.col(i);
                    layer[0] = sample;

                    forward();
                    backward();

                    batch_loss += loss;
                }

                loss_vec.push_back(batch_loss);

                if (loss_vec.size() >= 2 && draw_loss_curve)
                {
                    draw_curve(board, loss_vec);
                }
                epoch++;
                if (epoch % output_interval == 0)
                {
                    std::cout << "Number of epoch: " << epoch << std::endl;
                    std::cout << "Loss sum: " << batch_loss << std::endl;
                }
                if (epoch % 100 == 0)
                {
                    learning_rate *= fine_tune_factor;
                }
            }
            std::cout << std::endl << "Number of epoch: " << epoch << std::endl;
            std::cout << "Loss sum: " << batch_loss << std::endl;
            std::cout << "Train successfully!" << std::endl;
        }
        else
        {
            std::cout << "Rows of input don't match the rows of layer[0]!" << std::endl;
        }
    }

Two cases are handled here: iterative training on a single sample and on multiple samples. There is also another train() function that uses the accuracy rate instead of a loss threshold as the termination condition; its content is roughly the same and is not shown here.

After training with train(), we get a model. The so-called model can be thought of simply as the weight matrices. To put it plainly, a neural network can be viewed as a composite super-function. Suppose for a moment this super-function is y = f(x) = ax + b; then the weights are a and b. Back propagation treats a and b as the variables, adjusting them continuously to reach (or approach) their optimal values. Once back propagation is done, the trained values of a and b are fixed, and the variable becomes x again. We hope that, with the trained a and b as known parameters, an input sample x produces through the network a result y that matches the true answer with high probability.

Testing

Test function test()

The job of the test() function is to run a set of samples that were not used during training through the trained model and compare the outputs with the actual desired results, to see what proportion is correct. We want this accuracy to be as high as possible.

The steps of test() are roughly as follows:

  1. Feed a set of samples into the neural network one by one;

  2. Obtain an output value through forward propagation;

  3. Compare the actual output with the ideal output and compute the accuracy.

The implementation of the test() function is as follows:

     //Test
    void Net::test(cv::Mat &input, cv::Mat &target_)
    {
        if (input.empty())
        {
            std::cout << "Input is empty!" << std::endl;
            return;
        }
        std::cout << std::endl << "Predict, begin!" << std::endl;

        if (input.rows == (layer[0].rows) && input.cols == 1)
        {
            int predict_number = predict_one(input);

            cv::Point target_maxLoc;
            minMaxLoc(target_, NULL, NULL, NULL, &target_maxLoc, cv::noArray());        
            int target_number = target_maxLoc.y;

            std::cout << "Predict: " << predict_number << std::endl;
            std::cout << "Target:  " << target_number << std::endl;
            std::cout << "Loss: " << loss << std::endl;
        }
        else if (input.rows == (layer[0].rows) && input.cols > 1)
        {
            double loss_sum = 0;
            int right_num = 0;
            cv::Mat sample;
            for (int i = 0; i < input.cols; ++i)
            {
                sample = input.col(i);
                int predict_number = predict_one(sample);
                loss_sum += loss;

                target = target_.col(i);
                cv::Point target_maxLoc;
                minMaxLoc(target, NULL, NULL, NULL, &target_maxLoc, cv::noArray());
                int target_number = target_maxLoc.y;

                std::cout << "Test sample: " << i << "   " << "Predict: " << predict_number << std::endl;
                std::cout << "Test sample: " << i << "   " << "Target:  " << target_number << std::endl << std::endl;
                if (predict_number == target_number)
                {
                    right_num++;
                }
            }
            accuracy = (double)right_num / input.cols;
            std::cout << "Loss sum: " << loss_sum << std::endl;
            std::cout << "accuracy: " << accuracy << std::endl;
        }
        else
        {
            std::cout << "Rows of input don't cv::Match the number of input!" << std::endl;
            return;
        }
    }

Note that forward propagation here is not performed by calling forward() directly but through the predict_one() function. The job of the predict functions is: given an input, produce the predicted value. predict_one() contains the call to forward(), plus the parsing of the network's output into a more convenient numeric value.

4. Prediction and input/output parsing of the neural network

Neural network predictions

Prediction function predict()

At the end of the previous part, the prediction function predict() was mentioned: predict calls the forward function and parses the output into a value that is more convenient for us.

The difference between predict() and predict_one() is easy to see from the names: the one inputs a single sample and returns a single output, the other inputs a set of samples and returns a set of outputs. Obviously predict() should be implemented by calling predict_one() in a loop, so let's look at predict_one() first:

    int Net::predict_one(cv::Mat &input)
    {
        if (input.empty())
        {
            std::cout << "Input is empty!" << std::endl;
            return -1;
        }

        if (input.rows == (layer[0].rows) && input.cols == 1)
        {
            layer[0] = input;
            forward();

            cv::Mat layer_out = layer[layer.size() - 1];
            cv::Point predict_maxLoc;

            minMaxLoc(layer_out, NULL, NULL, NULL, &predict_maxLoc, cv::noArray());
            return predict_maxLoc.y;
        }
        else
        {
            std::cout << "Please give one sample alone and ensure input.rows = layer[0].rows" << std::endl;
            return -1;
        }
    }

You can see that the essential content of the second if block is two lines: the forward propagation and the output parsing mentioned above.

forward();
...
...
minMaxLoc(layer_out, NULL, NULL, NULL, &predict_maxLoc, cv::noArray());

Forward propagation produces the last layer's output layer_out; minMaxLoc() then finds the position of the maximum value in layer_out, and the y coordinate of that position is returned as the prediction.
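As for predict() itself, the full implementation is in the source download; following the description above (loop over predict_one()), a minimal sketch looks like this:

    //A sketch, not the author's exact code: predict by looping predict_one()
    std::vector<int> Net::predict(cv::Mat &input)
    {
        std::vector<int> predicted_labels;
        if (!input.empty() && input.rows == (layer[0].rows))
        {
            for (int i = 0; i < input.cols; ++i)
            {
                cv::Mat sample = input.col(i);   //one column = one sample
                predicted_labels.push_back(predict_one(sample));
            }
        }
        return predicted_labels;
    }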

Output organization and parsing

For that, we have to mention the form in which the labels (target values) are stored. Take the sigmoid activation function as an example: sigmoid maps the real numbers into the interval [0,1], so the final output y obviously satisfies 0 <= y <= 1. With tanh, the output interval is [-1,1]. Now, using sigmoid to recognize handwritten digits, there are ten classes to recognize, 0-9; the network clearly cannot output a value greater than 1, so we cannot directly use the numbers 0-9 as the network's target values or labels.

The solution adopted here is to make the output layer a single-column, ten-row matrix: set the element in the row corresponding to the label to 1 and the rest to 0. Since programming counts positions from 0, the positions correspond exactly to the digits 0-9. At prediction time we then only need to find the position of the maximum output value to know what the output digit is.

Of course, the above is the sigmoid case. What about tanh? Then set the position corresponding to the label to 1 and all other positions to -1.

And what about ReLU? ReLU's range is 0 to positive infinity, so we can set the element at the label's position to the label value itself and all others to 0; in the end we again just find the position of the maximum value.

So the label encoding must be chosen to match the activation function. In the code, OpenCV's minMaxLoc() function is called to find the position of the maximum value in a matrix.
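To make the three encodings concrete, here is a hypothetical helper (the name encodeLabel is made up for illustration; the original source builds the targets inline, as in csv2xml.cpp later) that produces the ten-row target column for a digit:

    //Build a 10x1 target column for a digit, per the encodings described above
    cv::Mat encodeLabel(int digit, const std::string &activation)
    {
        cv::Mat target;
        if (activation == "tanh")
            target = cv::Mat(10, 1, CV_32FC1, cv::Scalar(-1.f));  //background is -1 for tanh
        else
            target = cv::Mat::zeros(10, 1, CV_32FC1);             //background is 0 otherwise

        if (activation == "ReLU")
            target.at<float>(digit, 0) = (float)digit;   //the position holds the digit value
        else
            target.at<float>(digit, 0) = 1.f;            //sigmoid/tanh: the position holds 1
        return target;
    }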

How input is organized and read

Having covered the organization of the output, let's also mention the organization of the input. When the neural network is generated, each layer is represented by a single-column matrix; obviously the input layer is a single-column matrix too. So during data preprocessing, the input samples and labels are arranged column by column and stored as matrices: the first column of the label matrix is the label of the sample in the first column, and so on.

It is worth mentioning that all input values are normalized to between 0 and 1.

Since all values here are stored as float, a Mat of such values cannot be saved directly in an image format, so I chose to save the preprocessed sample matrix and label matrix into an XML document. The code converting the original csv file into an XML file is in csv2xml.cpp in the source code. Part of the MNIST data I converted is saved in the data folder and can be found on GitHub.

Reading and writing xml in opencv is very convenient. The following code writes data:

string filename = "input_label.xml";
FileStorage fs(filename, FileStorage::WRITE);
fs << "input" << input_normalized;
fs << "target" << target_; // Write cv::Mat
fs.release();

And the reading code is just as simple and clear:

cv::FileStorage fs;
fs.open(filename, cv::FileStorage::READ);
cv::Mat input_, target_;
fs["input"] >> input_;
fs["target"] >> target_;
fs.release();

Read samples and labels

We write a function get_input_label() that extracts a given number of samples and labels from the XML file, starting from a specified column (column 0 by default). It is just a simple wrapper around the code above:

    //Get sample_num samples from the XML file, starting from the given column (default 0).
    void get_input_label(std::string filename, cv::Mat& input, cv::Mat& label, int sample_num, int start = 0)
    {
        cv::FileStorage fs;
        fs.open(filename, cv::FileStorage::READ);
        cv::Mat input_, target_;
        fs["input"] >> input_;
        fs["target"] >> target_;
        fs.release();
        input = input_(cv::Rect(start, 0, sample_num, input_.rows));
        label = target_(cv::Rect(start, 0, sample_num, target_.rows));
    }

At this point you could actually start practicing and train the network to recognize handwritten digits. Only one part hasn't been covered yet: saving and loading models. Next we discuss save and load, and then we can really start training on examples.

5. Saving and loading models and drawing output curves in real time

Saving and loading models

After training a neural network, we generally want to save the model; otherwise it would have to be retrained before every use. For data-hungry neural networks, training takes anywhere from a few minutes to hundreds of hours depending on the data volume and accuracy requirements, and wasting that time is not worth it for anyone. Save the trained model once, and just load it whenever it is needed.

A question to consider now: what exactly should be saved when saving the model?

As mentioned before, the weight matrices can be regarded as the model itself, so they must be saved. What else? Don't forget that we save the model in order to use it after loading: after loading, we should be able to feed in one or a group of samples and run the forward pass (and, if we keep training, back propagation). That means everything required before forward() in the earlier implementation is required here too, except that the weights are not initialized randomly but replaced by the trained weight matrices. Based on these considerations, I finally decided to save the following four things:

  1. layer_neuron_num, the number of neurons in each layer, which is the only parameter needed to generate the network;

  2. weights: after the network is initialized, the trained weight matrices are used to initialize the weights;

  3. activation_function: using the network is essentially running the forward computation, and you obviously need to know which activation function to apply;

  4. learning_rate: needed when updating the weights, if you want to continue training from the existing model to get a better one.

Having decided what to save, the next step is implementing it. It is again saved in XML format; as mentioned above, reading and writing XML with OpenCV is very convenient:

    //Save model;
    void Net::save(std::string filename)
    {
        cv::FileStorage model(filename, cv::FileStorage::WRITE);
        model << "layer_neuron_num" << layer_neuron_num;
        model << "learning_rate" << learning_rate;
        model << "activation_function" << activation_function;

        for (int i = 0; i < weights.size(); i++)
        {
            std::string weight_name = "weight_" + std::to_string(i);
            model << weight_name << weights[i];
        }
        model.release();
    }

    //Load model;
    void Net::load(std::string filename)
    {
        cv::FileStorage fs;
        fs.open(filename, cv::FileStorage::READ);

        //Read the layer sizes first so the net can be generated before the weights are filled in
        fs["layer_neuron_num"] >> layer_neuron_num;
        initNet(layer_neuron_num);

        for (int i = 0; i < weights.size(); i++)
        {
            std::string weight_name = "weight_" + std::to_string(i);
            fs[weight_name] >> weights[i];
        }

        fs["learning_rate"] >> learning_rate;
        fs["activation_function"] >> activation_function;

        fs.release();
    }

Draw output curve in real time

Sometimes, for an intuitive view, we want a curve that shows the output error in real time. I could not find a ready-made program I was satisfied with, so I wrote a very simple function to plot the loss during training in real time. The ideal output looks something like this:

Why call it an ideal output? Because generally the error is very small, the curve may start right at the lower-left corner, leaving the large area above unused. Still, it lets us see the general trend of the loss.

The implementation simply draws two straight lines as coordinate axes and then connects adjacent points with line segments:

    //Draw loss curve
    void draw_curve(cv::Mat& board, std::vector<double> points)
    {
        if (points.size() < 2)
        {
            return;   //nothing to connect yet
        }
        cv::Mat board_(620, 1000, CV_8UC3, cv::Scalar::all(200));
        board = board_;
        cv::line(board, cv::Point(0, 550), cv::Point(1000, 550), cv::Scalar(0, 0, 0), 2);
        cv::line(board, cv::Point(50, 0), cv::Point(50, 1000), cv::Scalar(0, 0, 0), 2);

        for (size_t i = 0; i < points.size() - 1; i++)
        {
            if (i >= 1000)
            {
                break;   //stop after 1000 segments (off the board anyway)
            }
            cv::Point pt1((int)(50 + i * 2), (int)(548 - points[i]));
            cv::Point pt2((int)(50 + i * 2 + 1), (int)(548 - points[i + 1]));
            cv::line(board, pt1, pt2, cv::Scalar(0, 0, 255), 2);
        }
        cv::imshow("Loss", board);
        cv::waitKey(10);
    }

At this point, the neural network is fully implemented. The complete code can be downloaded here: Using C++ to implement artificial intelligence neural network from 0 to 1 and practical cases.

The next step is to use the written neural network to start training with actual samples.

6. Practical handwritten digit recognition

After all the previous effort, the practice can finally begin. Shall we try out the neural network we have written?

Data preparation

MNIST dataset

Some people say MNIST handwritten digit recognition is the Hello World of machine learning, so this time I also started with handwritten digit recognition. I took the handwritten digit dataset from Kaggle; the data is already saved in csv format, which is relatively easy to read.

The dataset contains grayscale images of the digits 0-9, but flattened: each 28x28 image is expanded into a 1x784 row. In the csv file, each line has 785 elements: the first element is the numeric label, and the following 784 elements are the 784 flattened pixels. It looks like this:

You may see the labels 0-9 in the first column and wonder why all the pixel values are 0. That is because what fits on screen here is not even one full row of the 28x28 image; a digit generally sits at the center of the image, so of course there is nothing at the edges. Scrolling right reveals non-zero pixel values, like this:

Note that the pixel values range over 0-255. Normalization is generally done in the preprocessing stage: everything is divided by 255, converting the values to between 0 and 1.

The csv file contains 42,000 samples. For the 4,000-yuan beat-up laptop I bought seven years ago, just reading that many samples once would take half a day, let alone training on them iteratively; it would simply be a nightmare (and think how many years a struggling student needs before affording a new computer!). So I extracted only the first 1000 samples and saved the normalized samples and their labels into an xml file. The organization of input and output was covered in an earlier part, so I quote it directly:

Now that we have talked about the organization of output, let us also mention the organization of input. When generating a neural network, each layer is represented by a single-column matrix; obviously the first, input layer is a single-column matrix too. So during preprocessing I arranged the input samples and labels column by column and stored them as matrices: the first column of the label matrix is the label of the sample in the first column, and so on.

Set the output layer to a single-column, ten-row matrix: set the element in the row corresponding to the label to 1 and the rest to 0. Since programming counts positions from 0, the positions correspond exactly to the digits 0-9; at prediction time we only need to find the position of the maximum output value to know what the output is.

To repeat it here, this part of the code is in csv2xml.cpp:

#include<opencv2\opencv.hpp>
#include<iostream>
using namespace std;
using namespace cv;


//int csv2xml()
int main()
{
    CvMLData mlData;
    mlData.read_csv("train.csv");   //read the csv file
    Mat data = cv::Mat(mlData.get_values(), true);
    cout << "Data have been read successfully!" << endl;
    //Mat double_data;
    //data.convertTo(double_data, CV_64F);

    Mat input_ = data(Rect(1, 1, 784, data.rows - 1)).t();   //pixels: skip the header row and the label column
    Mat label_ = data(Rect(0, 1, 1, data.rows - 1));         //labels: the first column, header row skipped
    Mat target_(10, input_.cols, CV_32F, Scalar::all(0.));   //one 10-row target column per sample

    Mat digit(28, 28, CV_32FC1);
    Mat col_0 = input_.col(3);
    float label0 = label_.at<float>(3, 0);
    cout << label0;
    for (int i = 0; i < 28; i++)
    {
        for (int j = 0; j < 28; j++)
        {
            digit.at<float>(i, j) = col_0.at<float>(i * 28 + j);
        }
    }

    for (int i = 0; i < label_.rows; ++i)
    {
        float label_num = label_.at<float>(i, 0);
        //target_.at<float>(label_num, i) = 1.;
        target_.at<float>(label_num, i) = label_num;
    }

    Mat input_normalized(input_.size(), input_.type());
    for (int i = 0; i < input_.rows; ++i)
    {
        for (int j = 0; j < input_.cols; ++j)
        {
            //Normalize each pixel from 0-255 to 0-1
            input_normalized.at<float>(i, j) = input_.at<float>(i, j) / 255.f;
        }
    }

    string filename = "input_label_0-9.xml";
    FileStorage fs(filename, FileStorage::WRITE);
    fs << "input" << input_normalized;
    fs << "target" << target_; // Write cv::Mat
    fs.release();


    Mat input_10000 = input_normalized(Rect(0, 0, 10000, input_normalized.rows));
    Mat target_10000 = target_(Rect(0, 0, 10000, target_.rows));

    string filename2 = "input_label_0-9_10000.xml";
    FileStorage fs2(filename2, FileStorage::WRITE);

    fs2 << "input" << input_10000;
    fs2 << "target" << target_10000; // Write cv::Mat
    fs2.release();

    return 0;
}
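As an aside, the element-wise normalization loop above can be written in a single call with convertTo, which is equivalent here since input_ is already CV_32F:

    //Scale all pixels from 0-255 to 0-1; the third argument is the scale factor
    input_.convertTo(input_normalized, CV_32F, 1.0 / 255.0);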

This is the version of the code I used recently with ReLU: the element at the label's position is set to the label value itself, and everything else to 0. In the end, you again just find the position of the maximum value.

The role of Mat digit in the code is to check whether the converted matrix and the labels correspond correctly. Here col(3), i.e. the fourth sample, is reshaped from one row back into a 28x28 image. Looking at the first column of the first screenshot above, the label of the fourth sample is 4. So what does the reconstructed image look like? It looks like this:

This also confirms why the first screenshot seems to show all-zero pixels: the edges of a digit image are completely black, so of course they are 0.

Then the get_input_label() function mentioned earlier is used to obtain a given number of samples and labels.

Practical digit recognition

Without further ado, here is the training procedure:

  1. Given the number of neurons in each layer, initialize the neural network and the weight matrices;

  2. Take the first 800 samples from the input_label_1000.xml file as training samples and the last 200 as test samples;

  3. Set the network's parameters: the termination condition for training, the learning rate and the activation function type;

  4. Train the network on the first 800 samples until the loss falls below the threshold loss_threshold;

  5. Test the network on the last 200 samples and output the accuracy;

  6. Save the trained model.

The training code using sigmoid as the activation function is as follows:

#include"../include/Net.h"
//<opencv2\opencv.hpp>

using namespace std;
using namespace cv;
using namespace liu;

int main(int argc, char *argv[])
{
    //Set neuron number of every layer
    vector<int> layer_neuron_num = { 784,100,10 };

    // Initialise Net and weights
    Net net;
    net.initNet(layer_neuron_num);
    net.initWeights(0, 0., 0.01);
    net.initBias(Scalar(0.5));

    //Get test samples and test samples 
    Mat input, label, test_input, test_label;
    int sample_number = 800;
    get_input_label("data/input_label_1000.xml", input, label, sample_number);
    get_input_label("data/input_label_1000.xml", test_input, test_label, 200, 800);

    //Set loss threshold,learning rate and activation function
    float loss_threshold = 0.5;
    net.learning_rate = 0.3;
    net.output_interval = 2;
    net.activation_function = "sigmoid";

    //Train, draw the loss curve (because the last parameter is true), and test the trained net
    net.train(input, label, loss_threshold, true);
    net.test(test_input, test_label);

    //Save the model
    net.save("models/model_sigmoid_800_200.xml");

    getchar();
    return 0;

}

Comparing with the six steps above, the code should be very clear. The parameter output_interval sets how many epochs pass between log outputs; here it is set to output once every two epochs.

Training with the above parameters gives an accuracy of 0.855:

With only 800 training samples, I think this accuracy is acceptable.

If you want to use the trained model directly, it is even simpler:

    //Get test samples; the labels are in the 0-1 encoding
    Mat test_input, test_label;
    int sample_number = 200;
    int start_position = 800;
    get_input_label("data/input_label_1000.xml", test_input, test_label, sample_number, start_position);

    //Load the trained net and test.
    Net net;
    net.load("models/model_sigmoid_800_200.xml");
    net.test(test_input, test_label);

    getchar();
    return 0;

If the activation function is tanh, then since tanh's range is [-1,1], the label matrix needs a small change for training, along with a few parameters. The required changes are as follows:

    float loss_threshold = 0.2;
    net.learning_rate = 0.02;
    net.output_interval = 2;
    net.activation_function = "tanh";

    //convert labels from the 0-1 encoding to -1-1, because tanh's range is [-1,1]
    label = 2 * label - 1;
    test_label = 2 * test_label - 1;

Not only the labels change; several parameters change too. The learning rate is an order of magnitude smaller than with sigmoid, which works better. The accuracy trained this way is about 0.88, which is acceptable.

7. Source code download

 Using C++ to implement artificial intelligence neural network from 0 to 1 and practical cases

Origin: blog.csdn.net/qq_39312146/article/details/134485096