Regression Models: A Frequently Applied Technology in Intelligent Manufacturing

Foreword

When we introduce data-driven methods and artificial intelligence into industrial and manufacturing settings, an important difference from general application scenarios becomes apparent: the emphasis on regression models. In general AI applications and commercial data analysis, classification and clustering models receive more attention. These models form qualitative judgments about the targets we want to recognize, such as judging whether the current market is booming, or visually identifying whether a person is authorized to pass through a door. Industrial scenarios usually have higher requirements.

Many industrial application scenarios require us to form quantitative judgments about the observed targets. These scenarios appear throughout the entire life cycle of industrial products: R&D, manufacturing, operation and maintenance. This requires us to use quantitative regression models. By building a regression model, we can summarize and grasp the laws that exist at each stage of the industrial product life cycle, and on this basis achieve cost reduction, efficiency and quality improvement, problem avoidance and even real-time control. Let us choose an example from each phase to illustrate these scenarios:

  1. When designing a product, different design parameter choices result in different product performance. In theory, the relationship between design parameters and performance can be obtained through simulation, and on this basis the optimal design parameters can be found using mathematical programming (optimization) methods. In practice, however, each simulation run consumes considerable time and hardware resources, which makes it difficult to advance the optimization. For this reason a so-called surrogate (proxy) model is introduced: based on a number of simulation results, it imitates the simulation process at a much lower computational cost, thereby greatly reducing the number of calls to the real simulation during design optimization. The surrogate model here is a regression model built on the data of those simulation results.

  2. A manufacturing production line has a large number of process parameters and influencing factors that affect the yield of finished products. Given measurements of these parameters and factors, we hope to establish an approximate functional relationship between them and the finished-product yield, which requires building a regression model between the sensed data and the yield.

  3. For complex and expensive industrial products such as nuclear facilities, aerospace, aviation and marine products, and advanced production lines, using digital twin technology to monitor operating status in real time and, on that basis, estimate possible operating risks is an important current direction of development. This requires rapidly computing the state of the overall system, and its evolution over the next period of time, once the sensed data arrive. Because a digital twin must respond in real time, solving a complex mechanistic model for this state is often infeasible. The digital twin system therefore needs an approximate representation of the system's mechanistic relationships which, combined with the system's actual state information, forms an approximate system model with low computational cost, good robustness and high accuracy. Constructing this approximate model is, for the most part, a regression modeling task.

These examples demonstrate how common regression modeling is across the stages of smart manufacturing. The extensive use of such regression models is an important feature of applying artificial intelligence in the industrial field. Although many articles have introduced regression models, they often focus on specific algorithms and lack a systematic overview.

For this reason, we have written this article to sort out the technical issues surrounding regression models, in the hope of helping practitioners understand the overall technical framework. The remainder of this article proceeds as follows: first, we introduce the concept of regression, including its differences from and connections with "fitting"; second, we introduce a series of classic regression methods, starting from the least squares method, along with common ways of computing and evaluating regression models; finally, we summarize all the methods and compare their characteristics so that industrial intelligence practitioners can select models according to their needs.

Definition of regression

In statistics, regression analysis refers to a statistical analysis method for determining the quantitative relationship between two or more variables. - Baidu Encyclopedia

The term "regression" was first proposed by Galton, a famous British biologist and statistician, cousin of Darwin (who proposed the theory of evolution). He found that parents whose height is significantly higher than the average, their offspring will generally be lower than their parents; on the other hand, parents whose height is significantly lower than the average have statistically higher backgrounds than their parents. He described this law as the tendency for the height of human descendants to "return" to the mean value, thus using the word "regression". What is noteworthy in this anecdote is that the concept of "regression" first came from the field of statistics. It can be said that, in a sense, regression is to seek the quantitative relationship between variables that best conforms to a certain statistical probability. Mainly solve the following problems:

  1. Determine whether a correlation exists between several specific variables and, if so, find an appropriate mathematical expression relating them.
  2. Predict or control the value of one variable from the values of one or several other variables, and know what accuracy such prediction or control can achieve.

A mathematical concept very close to regression is "curve fitting". Since the computational methods covered by the two concepts overlap considerably, in many cases we do not distinguish between them and treat them as the same thing. This is fine in most situations, except for the following:

  1. Regression analysis does not require an a priori model; it may choose different representations according to the characteristics of the data or different optimization goals. Curve fitting usually starts from an a priori model, so its main task is to determine reasonable parameters for that model, which is what the word "fitting" means.
  2. Regression analysis is grounded in statistics, so besides obtaining the relationship between variables it usually also estimates the statistical characteristics of that relationship, such as the statistics of the random residuals that remain after the relationship is determined. Curve fitting usually only computes the parameters that minimize the deviation.
  3. For problems where there is no curve to draw, the terminology tends toward regression rather than curve fitting, for example estimating the characteristic values of a statistical distribution from sampled data.

Many forms of regression models

Regression models can be expressed in many forms, and different forms often have different functions, performance and solution methods.

In general, we can divide regression models into three categories:

  1. Analytical (parametric) models
  2. Nonparametric models
  3. Neural networks

Each of these categories can of course be further divided into many subcategories. Below we introduce them one by one.

Analytical models (parametric models)

Obviously, analytical models are those that can be represented by mathematical expressions. This is the first form that comes to mind when we talk about "regression": the form students encounter most in class, the form scholars most want to arrive at in research, and the most concise and convincing way of expressing scientific laws.

However, not so many real-world problems can be solved with analytical models. Imagine how an analytical expression could represent a quilt, or a face. For the sake of simplicity a quilt might be simplified into a plane or, more elaborately, a two-dimensional manifold; but most people would find it difficult to simplify a face into a sphere.

Despite such limitations, using analytical models to solve regression problems in industry is still common and effective. This is because, compared with other regression models, the analytical model has three characteristics that are highly valued in the industrial field. With these characteristics, the analytical model is very well suited to relatively simple application scenarios:

  1. Analytical models require relatively little data.
  2. The analytical model embodies, to some extent, prior knowledge of the laws governing the data.
  3. Once the parameters of the analytical model are determined, evaluating the model on new inputs is very fast.

Analytical models can be roughly divided into two types:

  1. Sums of functions with linear parameters
  2. Others

The first type is composed of a sum of functions of the independent variables, where each term of the sum carries a single linear weighting coefficient. "Others" refers to models without this property, such as exponential regression or the Lorentzian (Lorentz) function model below. If we know for certain that the relationship between the observed variables follows this law, we need to solve for the undetermined parameters in formula (1), together with the Gaussian error, from the sampled data.

(Formula (1): the Lorentzian function model - original image not reproduced.)

Under a particular choice of parameters, the sampled data look as follows.

(Figure: sampled data generated from the model - original image not reproduced.)
(The following mathematical formulas need not be read carefully; we only want to illustrate the complexity of this work.)

Solving for the parameters of such a function is not easy; it calls to mind methods such as Newton's method, the damped (downhill) Newton method, or gradient descent. Here we take gradient descent as an example. Gradient descent is also known as steepest descent. To find a local minimum of a function with gradient descent, we iteratively step a specified distance from the current point in the direction opposite to the gradient (or an approximate gradient). In a regression problem, the independent variables of this function are the coefficients to be determined, and gradient descent computes them iteratively. The specific steps are as follows.

The difference between the values computed by the Lorentzian function and the measured values is expressed as a loss function in MSE form, as shown in equation (2):

L(θ) = (1/N) · Σ_i ( f(x_i; θ) - y_i )²    (2)

where f is the Lorentzian model with undetermined coefficients θ and (x_i, y_i) are the sampled data.
When the undetermined coefficients take their correct values, this loss function is minimized (zero in the noise-free case), and the partial derivative of the loss with respect to each coefficient is zero. We therefore need the partial derivatives of the loss function with respect to each coefficient, which are written out in formulas (3) to (5):
(Formulas (3)-(5): the partial derivatives of the loss with respect to each coefficient - original images not reproduced.)
On this basis, we first initialize the parameters to be solved, for example setting them all to 1, and define the gradient descent step size, for example 0.01. We then use formulas (3), (4) and (5) to compute the gradient iteratively, adjusting the current values by the specified step size so that the parameters gradually approach reasonable values. Under these settings, the trajectories of the three parameters over 50 iterations look as follows.
(Figure: trajectories of the three parameters over 50 gradient descent iterations - original image not reproduced.)
In this example the parameters iterate correctly to near their theoretical values (array([2.99990868, 2.00008539, 3.9999999])), but the method is actually quite fragile. First, such a composite form is difficult to specify in practical problems; and even with the form in hand, an inappropriate initial value, step size or computation scheme makes it hard to obtain a stable result. Interested readers can try changing the initial values and step size in the formulas above; divergent results are easy to produce. A sketch of this procedure follows.
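The following is a minimal sketch of the procedure in Python. The exact Lorentzian parameterization, data and settings used by the author are not given in the article, so the form f(x) = A·g² / ((x - x0)² + g²), the synthetic data and the hyperparameters below are illustrative assumptions only.

```python
import numpy as np

def lorentzian(x, A, x0, g):
    # assumed Lorentzian form: peak height A, centre x0, half-width g
    return A * g**2 / ((x - x0)**2 + g**2)

rng = np.random.default_rng(0)
x = np.linspace(-20.0, 20.0, 200)
y = lorentzian(x, 3.0, 2.0, 4.0) + rng.normal(0.0, 0.05, x.size)  # noisy samples

A, x0, g = 1.0, 1.0, 1.0   # initial guess: all parameters set to 1
lr = 0.01                  # gradient descent step size

for _ in range(5000):      # more iterations than the article's 50, for safety
    d = (x - x0)**2 + g**2
    r = lorentzian(x, A, x0, g) - y          # residuals
    # partial derivatives of the MSE loss with respect to each coefficient
    grad_A  = np.mean(2.0 * r * g**2 / d)
    grad_x0 = np.mean(2.0 * r * 2.0 * A * g**2 * (x - x0) / d**2)
    grad_g  = np.mean(2.0 * r * 2.0 * A * g * (x - x0)**2 / d**2)
    A, x0, g = A - lr * grad_A, x0 - lr * grad_x0, g - lr * grad_g

print(A, x0, g)  # should move toward 3, 2, 4; convergence is sensitive to
                 # the initial values and the step size, as noted in the text
```

Changing the initial guess or the step size in this sketch is an easy way to observe the divergence mentioned above.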

Therefore, in industry, the first type of model is used far more often in applications, because it has two important features:

  1. It can approximate a large class of (continuous) functions over a certain range of values (or on a certain neighborhood).
  2. It is easy to solve.

Let us explain the first feature. Recalling the Taylor expansion from calculus helps in understanding the properties of such polynomials. It also helps us choose the terms of the model used to fit the data: terms with lower powers should be preferred, and higher-order terms added only as needed. As in a Taylor expansion, higher-power terms are likely to have smaller coefficients; if, without further prior knowledge, we obtain the opposite, it is time to check whether something has gone wrong.

Now the second feature. For polynomials, the most commonly used solution method is least squares. Statistical theory shows that, for a given sample and mapping form, least squares realizes the maximum likelihood estimate (under Gaussian noise). Formally, it minimizes the sum of squared errors, so the data error is minimized and the solution is also very convenient to compute. Least squares was originally used to estimate overdetermined multivariate linear systems with no exact solution, but in fact, as long as each term of the polynomial carries only a single multiplicative parameter, the form is linear in the parameters to be sought, so least squares can also be used to estimate their optimal values.

It is worth mentioning that when least squares is actually used to solve for polynomial coefficients, explicitly inverting the large matrices built from the samples is too expensive, so the solution is usually not computed in closed matrix form but obtained through an SVD decomposition of the design matrix, as in the sketch below.
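A minimal sketch, assuming a simple one-dimensional quadratic example; both np.polyfit and np.linalg.lstsq rely on SVD-based least squares routines rather than explicit matrix inversion.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 100)
y = 0.5 * x**2 - 2.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size)  # noisy samples

# convenience wrapper for one-dimensional polynomial least squares
coeffs = np.polyfit(x, y, deg=2)

# the same fit with an explicit design matrix and the SVD-based solver
X = np.column_stack([x**2, x, np.ones_like(x)])
coeffs2, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)

print(coeffs)   # both results should be close to [0.5, -2.0, 1.0]
print(coeffs2)
```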

The form of a sum of functions with linear parameters can solve a large share of regression problems in the industrial field. However, specific application scenarios often carry specific requirements, and these requirements may show up at any stage of the analytical model, from modeling to solution. For example:

  1. Uncertain prior relationships: data encountered in real industrial scenarios often involve many variables, and modeling with all of them produces a considerable number of terms. If only terms up to second order are taken, there are C(n,2) + n quadratic terms for n variables. For n = 10, which is not uncommon in an industrial scenario, a quadratic polynomial over these variables already has 55 quadratic terms. We know that many of these terms are in fact unnecessary, but we often do not know which ones. This makes us hope that, starting from a general prior form, regression modeling can deliver a more specific and reasonably accurate form that contains no redundant terms.

Expressed mathematically, this expectation may appear in the definition of the model's loss function (by adding a regularization term), or it may only show up in the solution procedure. By adding different requirements and solving accordingly, we obtain analytical models with very different parameter values. In some regularized models, a large share of the summation terms receive zero coefficients, yielding a relatively compact expression. The common lasso regression, elastic net regression and sparse regression are all embodiments of this idea. Note that different regularization terms, on the one hand, express certain prior knowledge and, on the other hand, also change the way the parameters are solved; for example, lasso regression can be solved with the least angle regression (LARS) algorithm. These solution methods have different strengths and weaknesses, which affect the final modeling quality. A small code sketch of this sparse selection follows.
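A minimal sketch of sparse term selection with scikit-learn; the data, the choice of alpha and the true underlying terms are illustrative assumptions only.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(200, 10))            # 10 raw process variables
# only a handful of terms actually matter in this synthetic example
y = 3.0 * X[:, 0] * X[:, 1] - 2.0 * X[:, 4]**2 + 0.5 * X[:, 7] \
    + rng.normal(0.0, 0.1, 200)

quad = PolynomialFeatures(degree=2, include_bias=False)
Xq = quad.fit_transform(X)                             # 10 linear + 55 quadratic terms

model = Lasso(alpha=0.01).fit(Xq, y)
kept = np.flatnonzero(model.coef_)                     # indices of surviving terms
print(f"{kept.size} of {Xq.shape[1]} terms kept")
```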

  2. The target variable is difficult to separate from the independent variables: this is also common in industrial scenarios. For example, the two-dimensional failure behavior of a material under mechanical testing typically traces an ellipse when plotted with the two principal stresses as the axes. This example actually contains two difficulties. On the one hand, the relationship between the stresses in the two directions forms an ellipse, meaning that the target variable and the independent variable together constitute an implicit function, so a regression model in implicit form must be considered. On the other hand, under different stress states, for example when both directions are in tension versus both in compression, the relationship between the two variables differs and a single regression model cannot be used, so we face the problem of building a piecewise regression model. Unlike the piecewise (local) models discussed later, here the segmentation of the data is known a priori, and the main difficulty lies in the continuity of the regression model across segments. There is no unified strategy for such business-specific situations; solutions must be devised from the actual business and the mathematical characteristics of the model. A sketch of an implicit (elliptical) fit is given below.
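As one illustration of the implicit-function case, the sketch below fits a general conic a·s1² + b·s1·s2 + c·s2² + d·s1 + e·s2 = 1 to noisy points on an ellipse by ordinary least squares. The data and the conic parameterization are assumptions for illustration; a real failure envelope would of course come from test data.

```python
import numpy as np

rng = np.random.default_rng(3)
t = rng.uniform(0.0, 2.0 * np.pi, 150)
# noisy points on a synthetic ellipse, standing in for measured failure points
s1 = 2.0 * np.cos(t) + 0.3 * np.sin(t) + rng.normal(0.0, 0.02, t.size)
s2 = 1.0 * np.sin(t) + rng.normal(0.0, 0.02, t.size)

# implicit conic a*s1^2 + b*s1*s2 + c*s2^2 + d*s1 + e*s2 = 1
D = np.column_stack([s1**2, s1 * s2, s2**2, s1, s2])
coeffs, *_ = np.linalg.lstsq(D, np.ones_like(s1), rcond=None)
print(coeffs)  # the fitted implicit coefficients a, b, c, d, e
```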

Statistics-based nonparametric models

The analytical model is indeed a powerful regression paradigm, but it also has shortcomings, for example in the following three respects:

  1. This modeling approach mainly fits the data globally, and often has difficulty handling locally special data.
  2. When solving an actual regression problem, the given analytical form amounts to specifying a definite prior form for the data, which may not match reality. For example, a polynomial model implies that the relationship between variables is continuous and even differentiable, which the actual situation may not satisfy.
  3. The conclusion given by an analytical model is usually deterministic and often lacks a description of the statistical character of the relationship between variables. Even if a probability density function is used to model the statistics of a set of samples, the degree of agreement between that density and the actual data distribution usually cannot be expressed.

For these reasons, many statistics-based nonparametric regression models have been proposed to address these problems. Such models do not require the user to provide a very specific prior form for the relationship between variables; instead they build a model from general assumptions about the data distribution together with the sample data. Many intelligent regression models fall into this category, such as isotonic regression and decision tree regression. Two representative frameworks are Gaussian process regression (GPR) and local polynomial regression (LPR).


Gaussian process regression (GPR) is a nonparametric model that uses a Gaussian process (GP) prior to perform regression analysis on data.

A Gaussian process (GP) is a kind of stochastic process in probability theory and mathematical statistics: a collection of random variables indexed by an index set, any finite subset of which follows a joint normal distribution.

Gaussian process regression was given a systematic description and solution method by two scholars in 1996, but its variants, or its use as a practical technique in specific fields, have existed for more than fifty years. When engineers working on industrial product R&D use optimization design software such as Isight to search for an optimal design, they encounter the problem that simulation runs are too expensive and slow down the iterative optimization. To solve this, they often use a surrogate model called kriging to partially replace the simulation. Kriging is an implementation of Gaussian process regression that originated in geostatistics.

Since this article is aimed at practitioners of industrial intelligent manufacturing, we do not describe the mathematics of Gaussian process regression in detail here, but only introduce the characteristics of the algorithm (a short code sketch follows the list):

  1. The prior implied by Gaussian process regression is that the function to be regressed is a (multivariate) Gaussian process.
  2. In Gaussian process regression, the correlation between variables is defined by their covariance. This covariance is usually represented by a radial basis (RBF) kernel function, which means that the prediction at an unknown point is determined by its distance to each sample point, with closer sample points generally carrying more weight. In this sense it resembles interpolation over neighboring points.
  3. From the second characteristic it follows that accurate model performance relies on sufficient sampling of the data over the range of values.
  4. From the second characteristic it also follows that the model interpolates well, but its extrapolation performance is hard to guarantee.
  5. In actual prediction, the kernel function must be evaluated between the query point and the sample points, and a covariance matrix over the samples must be handled, so the computational cost is relatively high.
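A minimal sketch with scikit-learn's GaussianProcessRegressor and an RBF kernel; the data and kernel hyperparameters are illustrative assumptions only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 10.0, size=(40, 1))                    # sparse samples
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, X.shape[0])    # noisy observations

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)

X_new = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
mean, std = gpr.predict(X_new, return_std=True)  # prediction with uncertainty
```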

As the name suggests, local polynomial regression fits different polynomials in different local regions. The algorithm divides the data so that the polynomial covering each region achieves good fitting accuracy. Once the model is built, the prediction cost per region is clearly lower than that of Gaussian process regression, and the accuracy is better than that of a global polynomial regression. Local polynomial regression therefore looks like a good compromise, but solving the model is relatively cumbersome and its performance is affected by many parameters, so it places certain demands on the modeler's skill. The method can also be invoked directly through products such as Merrill Data's Tempo AI, using its built-in local polynomial function node. A sketch using a common open-source implementation follows.
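A minimal sketch of the local regression idea using the LOWESS implementation in statsmodels; this is one common realization of locally weighted polynomial fitting, not the specific product node mentioned above, and the data are illustrative.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(5)
x = np.linspace(0.0, 4.0 * np.pi, 300)
y = np.sin(x) + 0.1 * x + rng.normal(0.0, 0.2, x.size)

# frac controls the fraction of samples used in each local fit
smoothed = lowess(y, x, frac=0.2, return_sorted=True)  # columns: x, fitted y
```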

Neural Network Models for Regression

The development of artificial intelligence has demonstrated the great potential of neural networks, so besides traditional regression modeling methods we should also consider building regression models with neural networks. One great advantage of neural network models over traditional regression methods is their very flexible fitting capacity and hence their ability to express complex relationships among variables. On the other hand, neural network models also have limitations: they require a large amount of training data; prior knowledge has no good fusion mechanism with the data beyond appearing as an additional term in the cost function; and the computational cost of a neural network is usually relatively high.

For these reasons, neural networks are not yet widely used for regression model building. The relatively common choice is the RBF neural network, a fixed three-layer network that uses a radial basis function (generally a Gaussian) as the activation function and has one hidden neuron per sample. This model is actually quite different from the common BP (back-propagation) neural network and is closer in principle to Gaussian process regression. According to the research of J.-P. Costa et al., in practical use the performance of the RBF network is slightly inferior to Gaussian process regression. A structural sketch follows.
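A minimal sketch of such an RBF network, assuming Gaussian hidden units centred on the training samples and output weights solved by linear least squares; the width parameter and the data are illustrative assumptions.

```python
import numpy as np

def rbf_design(X, centers, width):
    # Gaussian activation of every input against every centre
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width**2))

rng = np.random.default_rng(6)
X = rng.uniform(-3.0, 3.0, size=(80, 1))
y = np.tanh(X).ravel() + rng.normal(0.0, 0.05, X.shape[0])

width = 0.5
H = rbf_design(X, X, width)                    # one hidden unit per training sample
w, *_ = np.linalg.lstsq(H, y, rcond=None)      # linear output weights

X_new = np.linspace(-3.0, 3.0, 100).reshape(-1, 1)
y_new = rbf_design(X_new, X, width) @ w        # prediction on new inputs
```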

Evaluation of Regression Models: Mathematical and Business Requirements

The evaluation of regression models generally considers the following aspects.

  1. Performance of the model on sample data

If the sample data contain no errors, the difference between the model output and the sample data should clearly be as small as possible. Regression models generally describe this difference with the mean squared error, which we also used in equation (2) above.

Mean squared error (MSE) is a measure of the degree of difference between an estimator and the quantity being estimated. Let t be an estimator of the population parameter θ determined from the sample; the mathematical expectation of (θ - t)² is called the mean squared error of the estimator t. It equals σ² + b², where σ² and b are the variance and bias of t, respectively.
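Written out as a formula, the definition quoted above is the standard bias-variance decomposition of the mean squared error:

$$\mathrm{MSE}(t) = E\bigl[(t-\theta)^2\bigr] = \underbrace{E\bigl[(t-E[t])^2\bigr]}_{\sigma^2} + \underbrace{\bigl(E[t]-\theta\bigr)^2}_{b^2}$$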

In fact, this criterion is not only used to measure model performance: MSE is usually the direct optimization objective when solving the regression model. Therefore, once the basic form of the model is fixed, the solution is the member of the function family determined by that form which attains the minimum MSE.

If the sample data contain errors, the MSE is usually not zero; but when we choose an analytical model with a high VC dimension, or a neural network with many layers, we may mistake the noise contained in the samples for part of the underlying law and incorporate it into the regression model, producing overfitting. In that case, when the MSE is zero or extremely small, we often cannot tell whether overfitting has occurred or the regularity of the sample data is simply strong. Other criteria therefore need to be introduced.

  2. Performance of the model on unknown data

To identify whether a regression model is overfitting, we need to introduce a training set and a test set. These concepts have become familiar to many people through machine learning. Simply put, the training set and the test set are randomly sampled from the sample data and share the same statistical distribution (which requires the number of samples to be large enough); the two sets do not overlap, and together they form the full sample set.

When building the regression model we use only the training set; after the model is built we evaluate it on the training set and the test set separately and compare their performance (for example MSE) on the same model. If the two data sets perform similarly, we say the model is not overfitting; conversely, if the training MSE is low while the test MSE is high, the model is overfitting. Note that an overfitted model cannot be used at all: it usually has no guiding significance for reality. A minimal sketch of this check follows.
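A minimal sketch of the train/test comparison, using a deliberately over-flexible polynomial on synthetic data; the data, the polynomial degree and the split ratio are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)
x = rng.uniform(-1.0, 1.0, 60)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.1, x.size)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

coeffs = np.polyfit(x_tr, y_tr, deg=12)       # deliberately over-flexible model
mse_train = mean_squared_error(y_tr, np.polyval(coeffs, x_tr))
mse_test = mean_squared_error(y_te, np.polyval(coeffs, x_te))
print(mse_train, mse_test)                    # a large gap suggests overfitting
```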

Sometimes the sample data we can obtain are limited to a restricted range of values, which is common in industrial production, and we hope to infer the behavior of the target variable outside that range. Therefore, besides splitting the sample set to verify that the model is not overfitted, we also need to study the difference between the model's performance outside the value range covered by the samples and its performance within that range, that is, the model's ability to extrapolate beyond the statistical distribution of the samples. Intuitively, the analytical model should extrapolate better than the nonparametric model, but this requires the model to genuinely express the variable relationship in analytical form, which is not easy to verify. Moreover, even if the known data are partitioned into intervals, we can only measure the extrapolation performance on value ranges we deliberately hold out, not the true extrapolation performance on genuinely unknown intervals. Such situations require data analysts and business experts to analyze the specific problem case by case.

  3. Model computational complexity

Based on engineering practice, we always hope the regression model can be computed quickly and accurately, but reality often does not let us have both. This involves the computational complexity of the regression model. In an algorithms course we divide computational complexity into time complexity and space complexity, but for model building there are more aspects to consider, including:

  • Computational complexity (time, space) when building the model
  • Computational complexity (time, space) when using the model
  • The model's requirements for the amount of sample data

Different types of models differ in these three respects. Since model construction is usually offline and performed a limited number of times, while model use is often high-frequency and time-constrained, we generally focus on the computational complexity of using the model. In intelligent manufacturing, however, where data are often hard to obtain, the model's requirement on the amount of sample data is usually also an important consideration.

Model selection

Finally, we summarize the forms, solution difficulty and performance characteristics of the regression models discussed above and give selection recommendations in the following table.

(Comparison table: model form, solution difficulty, performance characteristics and selection recommendations - original image not reproduced.)

Origin blog.csdn.net/qq_42963448/article/details/131520426