Prediction Algorithm | Gaussian Process Regression (GPR): Algorithm Principle and Its Optimized Implementation


At present, commonly used machine learning methods fall into two broad groups: non-probabilistic methods such as the support vector machine (SVM) and the backpropagation neural network (BPNN), and probabilistic methods such as Gaussian process regression (GPR).

As a non-parametric probabilistic kernel model [1], GPR can not only produce predictions but also attach a confidence interval to every predicted point, thereby quantifying the uncertainty of the prediction. GPR rests on a rigorous statistical learning foundation and adapts well to complex problems involving high dimensionality, small samples, and nonlinearity. It also generalizes strongly: compared with neural networks and support vector machines, it is easy to implement, acquires its hyperparameters adaptively, offers flexible non-parametric inference, and gives its outputs a probabilistic interpretation.

This article introduces the principle of GPR and its optimized implementation (a topic a reader requested quite a while ago).

00 Contents

1 GPR model principle

2 Overview of optimization algorithms and their improvements

3 GPR prediction model combined with optimization algorithm

4 Experimental results

5 Source code acquisition

01 GPR model principle

GPR is a popular non-parametric machine learning technique rooted in Bayesian statistics: it achieves excellent prediction accuracy on small data sets [2] without suffering from over-fitting or under-fitting [3], and it reports the uncertainty of its outputs [4].

GPR has three main steps:

(1) Select an appropriate kernel function and define initial hyperparameters based on subjective prior knowledge;

(2) Build the prior model as a probability distribution and train it, that is, find the optimal hyperparameters from the training samples;

(3) Predict the test sample and give the mean and variance of the estimated results.

Among these steps, how to obtain the optimal hyperparameters is a problem worth studying. Common choices include traditional gradient-based algorithms and modern intelligent optimization algorithms, and the negative log marginal likelihood (NLML) is usually taken as the objective function.
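
To make the three steps concrete, here is a minimal sketch using `fitrgp` from MATLAB's Statistics and Machine Learning Toolbox; the toy data and initial noise level are illustrative assumptions, not part of the source code packages described later.

```matlab
% Minimal GPR workflow with fitrgp (Statistics and Machine Learning
% Toolbox). The toy data below are purely illustrative.
rng(0);                                   % reproducibility
Xtrain = linspace(0, 10, 40)';            % toy training inputs
ytrain = sin(Xtrain) + 0.1*randn(40, 1);  % noisy observations
Xtest  = linspace(0, 10, 200)';           % test inputs

% Steps (1)-(2): pick a kernel and fit its hyperparameters by
% maximizing the (log) marginal likelihood.
mdl = fitrgp(Xtrain, ytrain, ...
    'KernelFunction', 'squaredexponential', ...
    'Sigma', 0.1);                        % initial guess for the noise std

% Step (3): predicted mean, std, and 95% interval at the test points.
[ypred, ysd, yint] = predict(mdl, Xtest);
```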

The following is a detailed explanation of the theoretical content of GPR [5]:

1.1 Prediction

A Gaussian process is defined as a collection of random variables, any finite number of which obey a joint Gaussian distribution. It is fully specified by its mean function μ(x) and its covariance function (that is, the kernel function) k(x, x'). The covariance function is the central moment of the random output variables corresponding to any two random input variables in the space; it measures the similarity or correlation between the training set and the test set, and it is a key factor affecting the prediction performance of the GPR model. The two functions are defined as:

$$\mu(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\big[(f(x) - \mu(x))\,(f(x') - \mu(x'))\big]$$

A Gaussian process (GP) can then be written as:

$$f(x) \sim \mathcal{GP}\big(\mu(x),\, k(x, x')\big)$$

Since the Gaussian process is flexible and its properties are determined mainly by the kernel function, when modeling with GPR one generally assumes that the mean function μ(x) = 0 and selects the form of the kernel function in advance.
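
As a quick illustration of how the kernel alone shapes a zero-mean GP, the following sketch draws three sample functions from the prior (unit signal variance and length scale are assumed for simplicity):

```matlab
% Draw three sample functions from a zero-mean GP prior with a
% squared exponential kernel (unit hyperparameters, assumed here).
x = linspace(-5, 5, 100)';
K = exp(-0.5 * (x - x').^2);              % SE kernel matrix (implicit expansion)
K = K + 1e-8 * eye(numel(x));             % jitter for numerical stability
f = mvnrnd(zeros(1, numel(x)), K, 3);     % 3 draws (Statistics Toolbox)
plot(x, f');                              % smooth curves, as the SE kernel implies
```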

Actual data usually contain a certain amount of noise, so for the regression model, the GPR model is obtained by adding noise ε to the observed target data y:

$$y = f(x) + \varepsilon$$

where x is the input vector, f is the function value, and y is the noise-contaminated observation. Assume further that the noise follows

$$\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$$

The prior distribution of the observations y is then:

$$y \sim \mathcal{N}\big(0,\, K(X, X) + \sigma_n^2 I_n\big)$$

and the joint prior distribution of the observations y and the predicted values f* is:

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\left(0,\; \begin{bmatrix} K(X, X) + \sigma_n^2 I_n & K(X, x_*) \\ K(x_*, X) & k(x_*, x_*) \end{bmatrix}\right)$$

where K(X, X) is the n × n symmetric positive definite covariance matrix of the training inputs, K(X, x*) = K(x*, X)ᵀ is the covariance matrix between the training inputs X and the test input x*, and k(x*, x*) is the covariance of the test input with itself.

From this, the posterior distribution of the predicted value f* can be calculated as:

$$f_* \mid X, y, x_* \sim \mathcal{N}\big(\bar{f}_*,\, \mathrm{cov}(f_*)\big)$$

where

$$\bar{f}_* = K(x_*, X)\,\big[K(X, X) + \sigma_n^2 I_n\big]^{-1} y$$

$$\mathrm{cov}(f_*) = k(x_*, x_*) - K(x_*, X)\,\big[K(X, X) + \sigma_n^2 I_n\big]^{-1} K(X, x_*)$$

Here f̄* is the estimate of f*, and cov(f*) is the covariance matrix of the test samples.
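
Translated directly into MATLAB, the two formulas above look like the following sketch (a 1-D squared exponential kernel is assumed, with hand-set hyperparameters rather than trained ones):

```matlab
% From-scratch GPR posterior for 1-D data. sigf, ell, sn are the signal
% std, length scale, and noise std; set by hand here, not trained.
sigf = 1.0; ell = 1.0; sn = 0.1;
se = @(A, B) sigf^2 * exp(-0.5 * (A - B').^2 / ell^2);  % SE kernel

X  = linspace(0, 10, 40)';              % training inputs
y  = sin(X) + sn * randn(size(X));      % noisy targets
Xs = linspace(0, 10, 200)';             % test inputs

Ky   = se(X, X) + sn^2 * eye(numel(X)); % K(X,X) + sigma_n^2 * I
Ks   = se(Xs, X);                       % K(x*, X)
fbar = Ks * (Ky \ y);                   % posterior mean  f*_bar
covf = se(Xs, Xs) - Ks * (Ky \ Ks');    % posterior covariance cov(f*)
```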

1.2 Training

Training a GPR model is in fact the process of selecting an appropriate covariance function and determining its optimal hyperparameters. Different covariance functions determine the possible properties of the objective function under the Gaussian process prior. For example, a periodic covariance function indicates that the objective function is periodic; the squared exponential covariance function indicates that the objective function has infinitely many derivatives, that is, it is smooth everywhere. In practical applications, the most widely used is the squared exponential covariance function, whose specific form is as follows:

$$k(x, x') = \sigma_f^2 \exp\!\left(-\tfrac{1}{2}\,(x - x')^{\mathrm{T}} S^{-1} (x - x')\right)$$

In the formula, σ²f is the signal variance, S = diag(l²), and l is the bandwidth of each input dimension, also called the length scale (variance scale). Every covariance function has hyperparameters such as σ²f and l; their values determine the specific shape of the covariance function and hence the character of functions sampled from the Gaussian process. The parameter set θ = {S, σ²f, σ²n} constitutes the hyperparameters, which are generally solved for by minimizing the negative log marginal likelihood (NLML) with a gradient-based optimization method such as conjugate gradient descent applied to its partial derivatives. The NLML is expressed as:
$$\mathrm{NLML} = \tfrac{1}{2}\, y^{\mathrm{T}} \big(K + \sigma_n^2 I_n\big)^{-1} y + \tfrac{1}{2} \log\big|K + \sigma_n^2 I_n\big| + \tfrac{n}{2} \log 2\pi$$
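
Written as a MATLAB objective function (save as nlml.m), the NLML might look like this sketch; a Cholesky factorization is used for numerical stability, and the packing of theta is an illustrative convention:

```matlab
function val = nlml(theta, X, y)
% Negative log marginal likelihood for 1-D GPR with an SE kernel.
% theta = [ell, sigf, sn]: length scale, signal std, noise std
% (an illustrative packing, not a fixed convention).
ell = theta(1); sigf = theta(2); sn = theta(3);
n = numel(y);
K = sigf^2 * exp(-0.5 * (X - X').^2 / ell^2) + sn^2 * eye(n);
L = chol(K, 'lower');              % K = L*L'
alpha = L' \ (L \ y);              % K^-1 * y via two triangular solves
val = 0.5 * (y' * alpha) ...       % data-fit term
    + sum(log(diag(L))) ...        % 0.5 * log|K|
    + 0.5 * n * log(2*pi);         % normalization constant
end
```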

After obtaining the optimal hyperparameters, substitute them into the posterior mean and covariance formulas from the prediction section to obtain the predicted value and variance at each test point.

The theoretical part above is a brief explanation based on the author's own understanding, so there may be errors or unclear points. For in-depth study, the following references are recommended:

Nonparametric modeling of ship maneuvering motion based on Gaussian process regression optimized by genetic algorithm
MathWorks' derivation of GPR: https://ww2.mathworks.cn/help/stats/gaussian-process-regression-models.html
Reference [5] below

One place where GPR can be improved is its optimization method. Its native conjugate gradient optimizer depends strongly on the initial values, yet there is currently no theoretical basis for choosing them; a poor initial value may trap the search in a local optimum or even prevent convergence. This part can therefore be improved by introducing an intelligent optimization algorithm.

02 Overview of optimization algorithms and their improvements

In previous articles, the author introduced many optimization algorithms and their improved variants. In this article, particle swarm optimization (PSO), the grey wolf optimizer (GWO), the whale optimization algorithm (WOA), and the sparrow search algorithm (SSA) are used as demonstrations.

03 GPR prediction model combined with optimization algorithm

The main job of the optimization algorithm is to find the optimal hyperparameters. Taking GWO as an example, the flow chart of GWO-GPR is as follows; a code sketch of the wrapping pattern appears after the figure:
[Figure: GWO-GPR flow chart]
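
GWO itself is not a MATLAB built-in, so as a sketch of the same wrapping pattern, here is PSO via `particleswarm` (Global Optimization Toolbox) minimizing the `nlml` objective sketched in Section 01; any GWO/WOA/SSA implementation with the same interface can be substituted, and the bounds are illustrative:

```matlab
% Wrap the NLML objective in a population-based optimizer. Xtrain and
% ytrain are the training data; bounds are illustrative and
% problem-dependent.
obj = @(theta) nlml(theta, Xtrain, ytrain);
lb  = [1e-2, 1e-2, 1e-3];        % lower bounds on [ell, sigf, sn]
ub  = [10,   10,   1];           % upper bounds
opts = optimoptions('particleswarm', 'SwarmSize', 30, 'MaxIterations', 100);
thetaOpt = particleswarm(obj, 3, lb, ub, opts);  % 3 hyperparameters
```

The returned thetaOpt replaces the conjugate-gradient solution and then feeds the posterior mean and covariance formulas of Section 01 to produce predictions.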

04 Experimental results

Root mean square error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE), and the coefficient of determination (R²) are used as evaluation criteria.
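
All four criteria are easy to compute from the test targets and predictions (a sketch; `ytrue` and `ypred` are placeholder vectors):

```matlab
% Evaluation metrics for predictions ypred against targets ytrue.
err  = ytrue - ypred;
RMSE = sqrt(mean(err.^2));
MAPE = mean(abs(err ./ ytrue)) * 100;  % percent; assumes ytrue has no zeros
MAE  = mean(abs(err));
R2   = 1 - sum(err.^2) / sum((ytrue - mean(ytrue)).^2);
```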

[Figures: experimental prediction results of the models]

05 Source code acquisition

The code comments are detailed; in general you only need to replace the data set. Note that the rows of the data are samples and the columns are variables. Two versions of the source code are provided.
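
For reference, a data set in that layout can be swapped in along these lines (the file name `data.xlsx` is a placeholder, not part of the released code):

```matlab
% Illustrative data layout: rows are samples, columns are variables,
% with the output in the last column. 'data.xlsx' is a placeholder name.
data = readmatrix('data.xlsx');   % n-by-(d+1) matrix
X = data(:, 1:end-1);             % d input variables
y = data(:, end);                 % single output variable
```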

1. Free version

It is mainly a basic GPR prediction model for data with multiple inputs and a single output, written in MATLAB. It is sufficient for readers who need to make simple predictions or want to learn the MATLAB implementation of the GPR algorithm.

How to obtain it: reply **GPR** in the backend of the official account (KAU's cloud experimental platform).

However, when calling the GPR model through MATLAB, Kaka found that on some data sets the predicted values contain NaN. I cannot solve this problem at present; friends who understand it are welcome to tell me, and if the fix succeeds, Kaka will treat you to a cup of milk tea, hehe.

2. Paid version

Contains the BP, GPR, PSO-GPR, GWO-GPR, WOA-GPR, and SSA-GPR prediction model programs (MATLAB). The programs have detailed annotations, and the optimization algorithm is easy to swap out in this program: the intelligent optimization algorithms and their improvements that Kaka introduced earlier can all be substituted.

Because of the NaN problem mentioned earlier, it is recommended to try the free version first to see whether your data set works, and to decide on purchasing this version after observing the results.

How to obtain it: reply **IGPR** in the backend of the official account (KAU's cloud experimental platform), or click "Read the original text" at the end of the article.


[1] Rasmussen, C.E., 2004. Gaussian processes in machine learning.

[2] Kamath, A., Vargas-Hernández, R.A., Krems, R.V., Carrington, T., Manzhos, S., 2018. Neural networks vs Gaussian process regression for representing potential energy surfaces: a comparative study of fit quality and vibrational spectrum accuracy. J. Chem. Phys. 148.

[3] Santos, C.F.G.D., Papa, J.P., 2022. Avoiding overfitting: a survey on regularization methods for convolutional neural networks. ACM Comput. Surv. (CSUR) 54, 1–25.

[4] Wenqi, F., Shitia, Z., Hui, H., Shaobo, D., Zhejun, H., Wenfei, L., Zheng, W., Tianfu, S., Huiyun, L., 2018. Learn to make decision with small data for autonomous driving: deep Gaussian process and feedback control. J. Adv. Transp., 8495264.

[5] He Zhikun, Liu Guangbin, Zhao Xijing, et al., 2013. A review of Gaussian process regression methods. Control and Decision 28(8), 1121–1129, 1137.

One more note: if anyone has an optimization problem to be solved (in any field), you can send it to me, and I will selectively publish articles that use optimization algorithms to solve these problems.

If this article is helpful or inspiring to you, you can click Like/Reading (ง •̀_•́)ง in the lower right corner (you don’t have to click).
