Detailed explanation of the principles of deep learning

ChatGPT has become very popular recently, and it has made more people realize that using deep learning to get work done and to make decisions will be a major trend. Training models on data with deep learning will become one of the essential professional skills for programmers, so it is worth continuing to pay attention to deep learning and machine learning. In this article I will walk through the basic principles behind today's mainstream deep learning workflow, as well as some of the mainstream deep learning tools.

1 Deep learning and data training

Machine learning means that if a program's performance P on a task T improves as its experience E increases, then the program is said to learn from experience, and we call it machine learning.

Traditional machine learning prepares the input data, manually extracts a set of clearly defined features, and then applies weights to those features to predict a result. For example, barcode recognition uses the width ratios of the bars as features to decode the data a barcode represents.

Deep learning also prepares the input data, but it starts from only the most basic features, passes them through multiple layers of increasingly complex feature extraction, and finally applies weights to the resulting features to predict the result. The multi-layer feature extraction is implemented with a multi-layer neural network, and the parameters of that network are learned from a large amount of training data, which is what makes "automatic multi-layer feature extraction" possible. Once the parameters have been trained, you simply feed the input into the network and it predicts the result you want. The "depth" in deep learning refers to the number of layers, that is, the depth of the neural network.

To summarize, the main steps a developer has to work through in a deep learning project are as follows (see the sketch after this list):

  1. Prepare the training data (raw inputs plus the correct results) and the test data used for final testing.
  2. Design a multi-layer neural network structure suited to the problem itself.
  3. Run large-scale training and update the network parameters with algorithms such as backpropagation (this is the automatic multi-layer feature extraction).
  4. After training, evaluate on the test data, fine-tune the network structure, and retrain repeatedly until the results meet the requirements of the production environment.
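
As a rough illustration of these four steps, here is a minimal sketch using tf.keras and the classic MNIST digit dataset (the framework choice, layer sizes and epoch count are illustrative assumptions, not something prescribed by the article):

```python
import tensorflow as tf

# 1. Prepare training and test data (inputs plus the correct labels).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

# 2. Build a multi-layer neural network suited to the problem.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28x28 image -> 784-dim vector
    tf.keras.layers.Dense(128, activation="relu"),    # hidden feature layer
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit 0-9
])

# 3. Train: the framework updates the parameters via backpropagation.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)

# 4. Evaluate on held-out test data; adjust the network and retrain as needed.
model.evaluate(x_test, y_test)
```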

2 Key problems in deep learning and deep neural networks, and their solutions

(1) Activation functions introduce non-linearity

  Many models are linear, and the output of the model is a weighted sum of the inputs, as follows:

y = a1*X1 + a2*X2 + … + an*Xn + constant

It is difficult for linear models to solve nonlinear problems, so an activation function f is introduced:

y = f(a1*X1 + a2*X2 + … + an*Xn + constant), which turns the model into a nonlinear one.

Commonly used activation functions are:

ReLU function: f(x)=max(x, 0)

sigmoid function: f(x)=1/(1+e^(-x))

tanh function: f(x)=(1-e^(-2x))/(1+e^(-2x))

There are many other activation functions, and you can choose one that fits the problem at hand.
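
As a quick reference, the three activation functions listed above can be written in a few lines of NumPy (a sketch for illustration only; NumPy is an assumption here, not something the article requires):

```python
import numpy as np

def relu(x):
    # ReLU: pass positive values through, clamp negatives to 0
    return np.maximum(x, 0)

def sigmoid(x):
    # sigmoid: squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # tanh: squashes any real number into (-1, 1)
    return (1 - np.exp(-2 * x)) / (1 + np.exp(-2 * x))

print(relu(np.array([-2.0, 3.0])))   # [0. 3.]
print(sigmoid(0.0))                  # 0.5
print(tanh(0.0))                     # 0.0
```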

(2) Deep neural networks extract higher-level features

When solving practical problems, we can build multi-layer deep neural networks. A multi-layer network can derive more abstract features from the input features: each layer combines the features produced by the layer before it. This ability to combine features automatically is very helpful for problems where good feature vectors are hard to hand-craft (image recognition, speech recognition, and so on), which is why deep learning has recently made breakthrough progress in image and speech recognition.
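
One way to picture this "multi-layer feature combination" is simply stacking several nonlinear layers, for example with tf.keras (the layer widths below are arbitrary illustrative choices):

```python
import tensorflow as tf

# Each Dense + ReLU layer recombines the features produced by the previous layer,
# so deeper layers can represent higher-level combinations of the raw inputs.
feature_extractor = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),  # low-level features
    tf.keras.layers.Dense(128, activation="relu"),                      # mid-level combinations
    tf.keras.layers.Dense(64, activation="relu"),                       # higher-level combinations
])
```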

(3) A loss function evaluates the training result

The training and optimization of a neural network needs an objective, and how the result is evaluated directly affects how effective training is. We therefore define a loss function to judge the result. For example, in handwritten digit recognition we need to classify a 28x28-pixel image as one of the digits 0 through 9. We can build a neural network that maps the 784-dimensional input vector (28x28 = 784 pixel values) to a 10-dimensional output vector [a0, a1, a2, a3, a4, a5, a6, a7, a8, a9], and then use the loss to decide which digit's standard vector the output is closest to.

For example, the standard vector corresponding to the digit 1 is [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]. We need a function that describes which standard vector the output vector is closest to; that is how we predict which digit the input picture corresponds to. This function is the loss function, and the classic loss function in deep learning is cross-entropy.

For example, to decide which of vector A = (0.5, 0.4, 0.1) and vector B = (0.8, 0.1, 0.1) is closer to the standard vector (1, 0, 0), we can use cross-entropy (base-10 logarithms here):

HA = -(1*log0.5 + 0*log0.4 + 0*log0.1) ≈ 0.3

HB = -(1*log0.8 + 0*log0.1 + 0*log0.1) ≈ 0.1

Since HB < HA, vector B = (0.8, 0.1, 0.1) is closer to (1, 0, 0).
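
The worked example above uses base-10 logarithms; with natural logs the absolute values change, but the ordering (and therefore the conclusion) stays the same. A small NumPy sketch that reproduces the two numbers:

```python
import numpy as np

def cross_entropy(y_true, y_pred):
    # Cross-entropy with base-10 logs, matching the worked example above;
    # the log base only rescales the value, it does not change which vector wins.
    return -np.sum(y_true * np.log10(y_pred))

target = np.array([1.0, 0.0, 0.0])
A = np.array([0.5, 0.4, 0.1])
B = np.array([0.8, 0.1, 0.1])

print(round(cross_entropy(target, A), 1))  # 0.3
print(round(cross_entropy(target, B), 1))  # 0.1  -> B is closer to the target
```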

We can also define a custom loss function for the actual problem, so that training matches reality more closely.

(4) Dynamically adjust the learning rate

The learning rate controls how fast and how far the parameters are updated. If the update step is too large, the parameters may bounce back and forth around the optimal solution and never reach or approach it. If the learning rate is too small, it is easier to approach the optimum, but the amount of computation grows and training slows down. The learning rate therefore cannot be too large or too small and needs to be adjusted dynamically. In practice it is usually decayed: a large rate at the beginning lets training approach the region of the optimum quickly, and a smaller rate afterwards lets it converge stably near the optimum and produce good results.
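
A minimal sketch of exponential learning-rate decay, which is one common way to implement this idea (the initial rate, decay factor and step counts below are arbitrary illustrative values):

```python
# Start with a relatively large rate to approach the optimum quickly,
# then shrink it so training converges smoothly near the optimum.
initial_rate = 0.1
decay_factor = 0.96
decay_every = 100          # decay once per 100 training steps

def learning_rate(step):
    return initial_rate * decay_factor ** (step // decay_every)

for step in (0, 100, 500, 1000):
    print(step, round(learning_rate(step), 5))
# 0 0.1, 100 0.096, 500 0.08154, 1000 0.06648
```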

(5) Overfitting caused by sample noise

Sometimes, because the samples contain noise, points that should have been discarded as noise end up being fitted during training, which causes overfitting. To address this, we add a term to the loss function that penalizes model complexity (a regularization term), which reduces the influence of noise points on the fit and helps avoid overfitting.
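
As a sketch of this idea, an L2 penalty on the weights can be added to the data loss; the penalty coefficient below is an arbitrary illustrative value:

```python
import numpy as np

def total_loss(data_loss, weights, lam=0.01):
    # Add a penalty that grows with model complexity (sum of squared weights).
    # lam controls how strongly complexity is punished; 0.01 is an arbitrary choice.
    l2_penalty = lam * np.sum(weights ** 2)
    return data_loss + l2_penalty

weights = np.array([0.5, -1.2, 3.0])
print(total_loss(data_loss=0.30, weights=weights))   # 0.30 + 0.01 * 10.69 = 0.4069
```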

3 Mainstream Deep Learning Open Source Tools

  There are many open source tools for deep learning that can help us quickly build and train neural networks. Among the more famous ones are: 

TensorFlow (maintained by Google);

Microsoft Cognitive Toolkit (CNTK) (maintained by Microsoft);

PaddlePaddle (maintained by Baidu);

TensorFlow is currently one of the most widely used of these tools. These frameworks provide complete implementations of everything related to neural networks, including data loading, network construction, network training, and the common functions used in deep learning, which greatly lowers the difficulty of doing AI training ourselves. TensorFlow also ships with some classic datasets that help us learn quickly. The main research areas of deep learning include the following:

  1. Computer vision and image recognition;
  2. Speech recognition;
  3. Natural language processing;
  4. Human-machine game playing.

For programmers, deep learning and model training will be one of the essential skills of the future, and I hope everyone keeps paying attention to it. For game development, efficient generation of digital content is a clear trend: in the future, a large part of digital content creation will be handled by AI, or by a model of AI production plus human intervention.
