Surrogate Model Optimization Algorithm

Edited from: https://zhuanlan.zhihu.com/p/99609634

WeChat official account: Mat story scientific research data analysis

☆ Before reading this article, it helps to know a little about the following concepts (you can still follow along without them):

(1) Interpolation: given the values of a function f(x) at several points in an interval, construct a specific function that takes those known values at those points, and use the value of this specific function at other points in the interval as an approximation to f(x).

(2) Neural network: a universal function approximator, and the core of contemporary artificial intelligence and deep learning. If you have never encountered the concept, it is hard to explain properly in a sentence or two.

(3) Kriging model: a practical spatial estimation technique first proposed in geostatistics; it is also a kind of interpolation method.

(4) Response surface model: multiple regression equations are fitted to the functional relationship between factors and response values, and optimal process parameters are found by analyzing the regression equations.

(5) Radial basis function: a function that takes the Euclidean distance between a test point and a known point as its input. By using Euclidean distance as an intermediary, a multidimensional problem is converted into a one-dimensional one, reducing the complexity of the model. Any function satisfying this condition can serve as a radial basis function, so there are many kinds of radial basis functions.
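The defining property (the same value at equal distance from the centre) can be checked in a few lines of Python; the Gaussian and cubic kernels below are just two common examples, not the only choices:

```python
import numpy as np

def gaussian_rbf(x, c, eps=1.0):
    # value depends only on the Euclidean distance r = ||x - c||
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(c, float))
    return np.exp(-(eps * r) ** 2)

def cubic_rbf(x, c):
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(c, float))
    return r ** 3
```

Any point at distance 1 from the centre gets the same value, no matter the direction or the dimension of the space; this is what reduces a multidimensional input to the one-dimensional quantity r.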

When trying to build a mathematical model for an experiment, I sometimes encounter the following situations:

1. An actual mathematical model exists, but we do not know its explicit form and cannot parameterize it. We only have some data in hand, and obtaining more is very expensive. This ubiquitous phenomenon is called a "black box". For example, when designing a car and testing its crash safety, the exact mathematical relationship between the design parameters (vehicle speed, frame structure, brakes, etc.) and the severity of a collision is unknown. How do you get more data? Build a car with those parameters and drive it into a wall. Every data point costs you a scrapped car. This is what "high cost of data acquisition" means.

2. The form of the mathematical model is known, but it is so complex that when the variables are high-dimensional, evaluation is very slow (even on a computer). This is also a high data-acquisition cost (in time).

Surrogate optimization refers to using a surrogate model: an approximate learned model that replaces a complicated, time-consuming model during analysis and design optimization. The earliest prototype of the surrogate model was the polynomial response surface model (many people use it; often you are already thinking in surrogate-model terms without realizing it). As the technique developed, the surrogate model became more than a simple replacement: it now forms an optimization mechanism that, driven by historical data, adds sample points to approach the global optimum. Moreover, for complex multidimensional problems the surrogate does not have to approximate the true model well over the entire design space; it only needs high accuracy near the global optimum [1].

Surrogate modeling has developed many methods: polynomial response surfaces, polynomial interpolation, Kriging interpolation, radial basis function interpolation, neural networks, support vector regression, and so on.

Among them, polynomial response surfaces and polynomial interpolation are close to ordinary regression fitting and relatively simple. Support vector regression (as opposed to classification SVMs) sees fewer applications and is not discussed here.

As an independent family of models, neural networks arguably achieve the highest degree of approximation to the original model: with more nodes and hidden layers, an essentially perfect approximation is possible. However, they are not fast to evaluate, and their explainability is poor; many networks are themselves black boxes. You end up approximating one black box with another.

Kriging interpolation and radial basis function interpolation are the two most widely used methods today. The Kriging model is also called the Gaussian stochastic process model; I described it in more detail in a previous article (Powerful regression model: Gaussian process regression). Why the difference in name? My personal understanding is that "Kriging" emphasizes the idea of interpolation, while "Gaussian process regression" emphasizes regression. Under a powerful Gaussian process kernel (such as the Matérn kernel), the difference between interpolation and regression is not that significant.

This article mainly introduces the radial basis function surrogate optimization algorithm (hereafter, surrogate optimization).

The first thing to note is that the surrogate optimization algorithm requires bound constraints. If your problem has no natural bounds, you can choose, say, -1000 to 1000 as defaults. The objective function does not have to be smooth, but the algorithm works best when it is continuous. The goal of surrogate optimization is to find the global minimum of the objective function using only a few evaluations of it. To this end, the algorithm balances the two goals of "exploration" and "speed".

Surrogate optimization algorithms divide into serial and parallel variants. The serial algorithm is the foundation and core, and this article covers only serial surrogate optimization.

The algorithm alternates between two phases. Here is a brief version of the steps:

(1) Surrogate construction phase: first, create a small number of random points within the bounds and evaluate the objective function at them. A radial basis function interpolant through these points forms the "surrogate" of the objective function.

(2) Minimum search phase: sample many more points within the bounds. A merit function is computed from the surrogate values at these points and their distances to points where the objective is already known. The minimizer of the merit function becomes the candidate point, called the "adaptive point"; the objective function is evaluated there, the surrogate is updated with this value, and the search repeats. Under certain conditions a "surrogate reset" occurs, returning to step (1).
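The two phases above can be sketched in a few lines of Python. This is a deliberately crude illustration, not Matlab's implementation: the "surrogate" here is a nearest-neighbour lookup, the merit rule is a toy, and the surrogate reset is omitted, but the construct/search alternation has the same shape:

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):                          # stand-in for an expensive black box
    return float(np.sum((x - 0.3) ** 2))

def surrogate_optimize(lb, ub, n_init=6, budget=40):
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = len(lb)
    # --- Phase 1: evaluate the objective at a few random points ---
    X = rng.uniform(lb, ub, size=(n_init, dim))
    F = np.array([objective(x) for x in X])
    # --- Phase 2: sample, score with a merit rule, evaluate, update ---
    for _ in range(budget - n_init):
        cand = rng.uniform(lb, ub, size=(200, dim))
        d = np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=2)
        # toy merit: nearest-neighbour "surrogate" value minus a distance bonus
        merit = F[d.argmin(axis=1)] - 0.1 * d.min(axis=1)
        x_new = cand[merit.argmin()]
        X = np.vstack([X, x_new])
        F = np.append(F, objective(x_new))
    i = F.argmin()
    return X[i], F[i]

best_x, best_f = surrogate_optimize([0.0, 0.0], [1.0, 1.0])
```

Note that `objective` is called only `budget` times in total; everything else operates on cheap stand-in quantities, which is the whole point of the method.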

The following are the details of the algorithm:

1. Terminology

(1) Objective function: the true model. The ultimate goal of surrogate optimization is the global optimum of the objective function. The biggest difference from other methods is that the objective function is unknown, or, even if it can be computed, is expensive and time-consuming to evaluate.

(2) Surrogate function ("S"): the stand-in for the objective function, with much lower complexity. Here, a radial basis interpolant serves as the surrogate function.

(3) Current point: the point at which the objective function was most recently evaluated.

(4) Incumbent point: the point with the smallest objective function value since the most recent surrogate reset. The incumbent point is a key concept for understanding the model.

(5) Best point: the point with the smallest objective function value among all evaluations since the algorithm began. The best point at termination is the final global solution.

(6) Initial points: objective function values you pass to the algorithm before it starts. Initial points are optional.

(7) Random points: points at which the algorithm evaluates the objective function during the surrogate construction phase. Generally, the algorithm draws these points from a pseudo-random sequence, then scales and shifts them to lie within the bounds. In particular, when a very large number of points is required (more than 500), the algorithm draws points from a Latin hypercube sequence. (For the Latin hypercube, see Experimental Design Method (2): Introduction to the Latin Hypercube.)

(8) Adaptive point: the point chosen in the minimum search phase at which the algorithm evaluates the objective function.

(9) Merit function: described in detail later.

(10) Evaluated points: also called known-value points; all points at which the objective function is known, including initial points, surrogate-construction points, and points evaluated during the minimum search phase.

(11) Sample points: points at which the algorithm evaluates the merit function during the minimum search phase. At these points the merit function, not the objective function, has been computed.

(12) Scale: a quantitative parameter used in the minimum search phase.
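To make the "random points" entry above concrete (scale and shift a pseudo-random sequence into the bounds), here is a minimal Python sketch using a hand-rolled Halton sequence. The helper names are mine; Matlab's internal sequence and scaling may differ:

```python
import numpy as np

def halton_1d(n, base):
    # van der Corput / Halton low-discrepancy sequence in one dimension
    seq = np.zeros(n)
    for i in range(1, n + 1):
        f, r, k = 1.0, 0.0, i
        while k > 0:
            f /= base
            r += f * (k % base)
            k //= base
        seq[i - 1] = r
    return seq

def quasi_random_points(n, lb, ub):
    primes = [2, 3, 5, 7, 11, 13]          # one prime base per dimension
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    unit = np.column_stack([halton_1d(n, primes[d]) for d in range(len(lb))])
    return lb + unit * (ub - lb)           # scale and shift into the box

pts = quasi_random_points(8, [-5.0, 0.0], [5.0, 2.0])
```

Such sequences spread points more evenly over the box than plain pseudo-random draws, which matters when each point costs one expensive objective evaluation.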

2. Algorithm steps
The algorithm alternates between two phases.

1. Surrogate construction phase

The algorithm first selects some quasi-random points within the bounds. If you supplied initial points, the algorithm uses them too. It gathers N points in total (you set this number) and evaluates the objective function at them. Because each objective evaluation is expensive, N need not be large.

The radial basis function surrogate optimization algorithm employs an RBF interpolator to construct an interpolating surrogate of the objective function. Several commonly used radial basis functions can be chosen, for example:

[Figure: examples of commonly used radial basis functions]

Matlab chose the most suitable one: a cubic RBF with a linear tail, which keeps the interpolant from oscillating excessively. At this point you may already have a question: why choose RBFs at all? Because RBFs have several convenient properties that make them the most suitable functions for constructing surrogates.

(1) An RBF interpolator is defined by the same formula for any number of dimensions and any number of points.

(2) The interpolator takes exactly the specified values at the evaluated points; that is, the RBF interpolates the data rather than merely fitting it.

(3) Computing an RBF interpolator takes very little time.

(4) When you already have an RBF interpolator, it takes relatively little time to add a point to the existing interpolation expression.

(5) Constructing an RBF interpolator involves solving an N×N linear system, where N is the number of surrogate points. For the RBFs used here, this system has a unique solution.
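Property (5) can be made concrete. A cubic RBF with a linear polynomial tail leads to a symmetric (N + d + 1) by (N + d + 1) system. Below is a minimal numpy sketch (my own helper, not Matlab's internals) that builds and solves it, then returns the interpolant as a closure:

```python
import numpy as np

def fit_cubic_rbf(X, f):
    """Cubic RBF interpolant with a linear polynomial tail: solves one
    symmetric (N + d + 1) x (N + d + 1) linear system."""
    X = np.asarray(X, float)
    f = np.asarray(f, float)
    n, d = X.shape
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = r ** 3                                   # cubic kernel phi(r) = r^3
    P = np.hstack([np.ones((n, 1)), X])            # linear tail basis [1, x]
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([f, np.zeros(d + 1)])
    coef = np.linalg.solve(A, rhs)                 # unique for distinct,
    w, c = coef[:n], coef[n:]                      # non-collinear points

    def s(x):                                      # evaluate the surrogate
        x = np.asarray(x, float)
        rx = np.linalg.norm(x - X, axis=1)
        return rx ** 3 @ w + c[0] + x @ c[1:]
    return s

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.25]]
fvals = [1.0, 2.0, 0.5, 3.0, 1.5]
s = fit_cubic_rbf(X, fvals)
```

The interpolation property (2) can be checked by evaluating `s` back at the data points: it reproduces `fvals` to machine precision. Adding one new point, as in property (4), only appends one row and column to the system.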

[Figure: surrogate construction phase. (1) Take points within the bounds pseudo-randomly. (2) Evaluate the objective function at these points. (3) Create the surrogate by interpolation.]

2. Minimum search phase

The algorithm searches for the minimum of the objective function through a local search procedure. The "scale" is similar to the "radius" or "mesh" size in pattern search. In Matlab, the initial scale is generally set to 0.2. The algorithm starts from the incumbent point: the point with the smallest objective function value since the last surrogate reset.

The algorithm searches for the minimum value of the "merit function". The merit function is a function that correlates the distance between the agent and the searched value at the same time, so as to balance the two tasks of "minimizing the agent" and "search space (broadness)".

The algorithm adds hundreds or thousands of pseudo-random vectors, of length proportional to the scale (say 0.2), to the incumbent point to obtain sample points. These vectors are normally distributed, shifted, and scaled by the bounds in each dimension. If a sample point falls outside the bounds, the algorithm modifies it to keep it inside.
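A minimal sketch of this sampling step in Python (clipping is one simple way to keep samples in bounds; Matlab may handle out-of-bounds points differently):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_around_incumbent(incumbent, lb, ub, scale=0.2, n=500):
    incumbent = np.asarray(incumbent, float)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    # normally distributed steps, scaled per dimension by the box width
    steps = scale * (ub - lb) * rng.standard_normal((n, len(lb)))
    # push any out-of-bounds samples back onto the box
    return np.clip(incumbent + steps, lb, ub)

pts = sample_around_incumbent([0.5, 1.0], [0.0, 0.0], [1.0, 2.0])
```

These are the "sample points" of the terminology section: the merit function, not the objective, will be evaluated at them.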

Then, the algorithm calculates the merit function at these sample points.

The point from the previous step with the lowest merit function value is called the adaptive point. The algorithm evaluates the objective function at the adaptive point and updates the surrogate with this value. If the objective value at the adaptive point is lower than at the incumbent point, the search is deemed "successful" and the adaptive point becomes the new incumbent. Otherwise, the search is deemed a failure and the incumbent is unchanged.

The algorithm changes the value of the scale when either of the following conditions is met:

(1) Three successful searches have accumulated since the last scale change. The scale is then doubled, up to a maximum length of 0.8 times the size of the box defined by the bounds.

(2) There have been max(5, n) unsuccessful searches since the last scale change (n being the number of variables). The scale is then halved, down to a minimum length of 0.00001 times the size of the box defined by the bounds.
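The two scale rules can be sketched as a small helper (the name, signature, and counter handling are my own simplification of the rule described above):

```python
def update_scale(scale, successes, failures, ndims,
                 max_scale=0.8, min_scale=1e-5):
    """Double on 3 cumulative successes, halve on max(5, ndims) failures;
    reset both counters whenever the scale changes."""
    if successes >= 3:
        return min(2 * scale, max_scale), 0, 0
    if failures >= max(5, ndims):
        return max(scale / 2, min_scale), 0, 0
    return scale, successes, failures
```

For example, with the default initial scale 0.2, three successes give 0.4; five failures from 0.5 give 0.25; and growth saturates at the 0.8 cap.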

Repeating in this way, the random search eventually concentrates around the incumbent point, where the objective function value is smallest, and the scale eventually shrinks to its minimum.

Note that the algorithm never evaluates the merit function within a certain distance of a known-value point. When all sample points lie within that distance of known-value points, the algorithm switches from the minimum search phase back to the surrogate construction phase; this is called a "surrogate reset". Usually the reset occurs after the scale has shrunk, so that all sample points cluster tightly around the incumbent point.

[Figure: minimum search phase. (1) Sample around the incumbent point. (2) Evaluate the merit function. (3) Evaluate the objective function at the adaptive point. (4) Update the interpolating surrogate and the scale.]

3. The merit function
The merit function is an important concept in surrogate optimization. It combines the surrogate value with the distance to already-evaluated points, balancing the two goals of "minimizing the surrogate" and "exploring the space". The merit function fmerit(x) is a weighted combination of two terms:

(1) "Proportional proxy": define Smin as the "minimum proxy value" between sampling points. Smax is the "maximum proxy value" between sampling points. Sx is the proxy value at x. Then the proportional agent S(x) is:

S(x) = (s(x) − smin) / (smax − smin)

S(x) is non-negative, and it equals 0 at the sample point with the smallest surrogate value. (This term drives minimization of the surrogate.)

(2) "Scaled distance": define xj as k known value points, define dij as point i and known value point

dmin = min(dim)

dmax = max(dij)

That is, for all maximum and minimum distances of ij, the normalized distance D(x) is

D(x) = (dmax − d(x)) / (dmax − dmin)

where d(x) is the minimum distance from point x to the known-value points.

D(x) is non-negative, and D(x) equals 0 at the point farthest from all known-value points.

Therefore, minimizing D(x) leads the algorithm toward points far from the known values. (This term drives exploration of the space.)

The merit function is the weighted sum of these two terms. For a weight w between 0 and 1, the merit function is

fmerit(x) = w · S(x) + (1 − w) · D(x)

(1) A larger w puts more weight on the surrogate, steering the search toward the surrogate's minimum.

(2) A smaller w puts more weight on exploring distant new regions. In Matlab, the default w cycles through the four values 0.3, 0.5, 0.7, and 0.95, serving both purposes.
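Putting the two scaled terms together, a batch evaluation of fmerit might look like this in Python (a direct sketch of the formulas above, not Matlab's code; the degenerate all-equal cases are handled by returning zeros):

```python
import numpy as np

def merit(surrogate_vals, dists_to_known, w):
    """fmerit = w*S + (1-w)*D for a batch of sample points.
    surrogate_vals: surrogate value s(x) at each sample point
    dists_to_known: min distance d(x) from each sample point to any
                    known-value point"""
    s = np.asarray(surrogate_vals, float)
    d = np.asarray(dists_to_known, float)
    smin, smax = s.min(), s.max()
    S = (s - smin) / (smax - smin) if smax > smin else np.zeros_like(s)
    dmin, dmax = d.min(), d.max()
    D = (dmax - d) / (dmax - dmin) if dmax > dmin else np.zeros_like(d)
    return w * S + (1 - w) * D
```

With w = 0.5, a sample point that has both the lowest surrogate value and the greatest distance from the known points gets merit 0 and is chosen as the adaptive point.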

4. Summary
The surrogate optimization algorithm is an approximation algorithm designed specifically for expensive, time-consuming objective functions. A surrogate model is a function that approximates the objective (at least near its minimum) and takes very little time to evaluate, so it can be evaluated at many more points in the search for the minimizer; the best of these values serves as an approximation to the point that minimizes the objective. Surrogate optimization algorithms have been shown to converge to the global optimum for continuous objective functions on bounded domains. Convergence is not fast, but it is still much faster than optimizing the objective function directly.

At present, surrogate optimization has a relatively narrow range of applications and remains a rather obscure algorithm; correspondingly, there are few papers on it. Personally, however, I am optimistic about its future: it will see growing use in model tests across many fields, and may eventually develop into a general optimization model, a research strategy for problems whose underlying task is unknown. It is not just a model; it is a way of thinking.

This should be my last article of 2019. Thank you for your unwavering support. See you all in 2020.


In the Chinese paper database, only 3 papers use "surrogate optimization" as a keyword, while a search for the English keyword "surrogate optimization" finds 513 papers.


References:

1. Han Zhonghua. Research progress on Kriging model and surrogate optimization algorithms. Acta Aeronautica et Astronautica Sinica, 2016, 37(11).

2. https://ww2.mathworks.cn/help/gads/surrogate-optimization-algorithm.html
