One Hundred Faces of Machine Learning - Notes 5

Model Assessment

  Hyperparameter tuning

  question: What methods are there for tuning hyperparameters?

  answer: The common methods are grid search, random search, and Bayesian optimization.

■ Grid search
  Grid search is probably the simplest and most widely used hyperparameter search algorithm: it finds the optimal value by exhaustively evaluating every point in the search range. If the search range is large enough and the step size small enough, grid search has a high probability of finding the global optimum. However, this scheme is expensive in computation and time, especially when many hyperparameters need to be tuned. In practice, therefore, grid search is usually run first with a wide search range and a large step size to locate the region where the global optimum is likely to lie, and the range and step size are then gradually narrowed to pin down a more precise optimum. This coarse-to-fine procedure reduces the required time and computation, but because the objective function is generally non-convex, it may still miss the global optimum.
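A minimal coarse-to-fine grid-search sketch, assuming scikit-learn's GridSearchCV, an SVC classifier, and a synthetic dataset (none of these appear in the notes; the parameter ranges are illustrative):

```python
# Coarse-to-fine grid search sketch (estimator, data, and ranges are assumptions).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Coarse pass: wide range, large steps.
coarse_grid = {"C": [0.01, 1, 100], "gamma": [0.001, 0.1, 10]}
coarse = GridSearchCV(SVC(), coarse_grid, cv=5).fit(X, y)
print("coarse best:", coarse.best_params_, coarse.best_score_)

# Fine pass: narrower range and smaller steps, chosen around the coarse optimum.
fine_grid = {"C": [0.3, 1, 3], "gamma": [0.03, 0.1, 0.3]}
fine = GridSearchCV(SVC(), fine_grid, cv=5).fit(X, y)
print("fine best:", fine.best_params_, fine.best_score_)
```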
■ Random search
  The idea of random search is similar to grid search, but instead of testing every value between the upper and lower bounds, it randomly samples points within the search range. The rationale is that if the set of sampled points is large enough, random sampling will with high probability find the global optimum or a good approximation of it. Random search is generally faster than grid search, but, like the coarse-to-fine variant of grid search, its result is not guaranteed.
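A minimal random-search sketch under similar assumptions (scikit-learn's RandomizedSearchCV, SciPy's loguniform distribution, and an SVC are illustrative choices, not taken from the notes):

```python
# Random search sketch: sample 20 points from continuous distributions
# instead of enumerating a grid (estimator and distributions are assumptions).
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_distributions = {
    "C": loguniform(1e-3, 1e3),
    "gamma": loguniform(1e-4, 1e1),
}
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20, cv=5, random_state=0)
search.fit(X, y)
print("best:", search.best_params_, search.best_score_)
```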

■ Bayesian optimization
  Bayesian optimization takes an approach to finding the optimal parameter values that is completely different from grid search and random search. Grid search and random search ignore the information from previous evaluations when testing a new point; Bayesian optimization makes full use of that information. It learns the shape of the objective function in order to find the parameters that push the objective toward its global optimum. Specifically, it first assumes a prior distribution over the objective function; then, each time a new sample point is evaluated on the objective, that information is used to update the prior distribution; finally, the algorithm tests the point where, according to the posterior distribution, the global optimum is most likely to lie. One caveat with Bayesian optimization: once a local optimum is found, it tends to keep sampling in that region, so it can easily get stuck in a local optimum. To compensate, Bayesian optimization strikes a balance between exploration and exploitation: "exploration" means sampling points in regions that have not yet been sampled, while "exploitation" means sampling in the region where the posterior distribution says the global optimum is most likely.
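A minimal Bayesian-optimization sketch, assuming the third-party scikit-optimize library (gp_minimize with a Gaussian-process surrogate); the objective, search space, and acquisition function below are illustrative assumptions, not from the notes:

```python
# Bayesian optimization sketch with skopt's gp_minimize (library, objective,
# and search space are assumptions made for illustration).
from skopt import gp_minimize
from skopt.space import Real
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def objective(params):
    C, gamma = params
    # Minimize the negative cross-validated accuracy.
    return -cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

space = [Real(1e-3, 1e3, prior="log-uniform", name="C"),
         Real(1e-4, 1e1, prior="log-uniform", name="gamma")]

# acq_func="EI" (expected improvement) is one concrete acquisition rule that
# trades off exploring unsampled regions against exploiting the posterior optimum.
result = gp_minimize(objective, space, n_calls=25, acq_func="EI", random_state=0)
print("best params:", result.x, "best CV accuracy:", -result.fun)
```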

  Overfitting and underfitting

  question: In the process of model evaluation, what phenomena do overfitting and underfitting specifically refer to?

  answer: Overfitting refers to the case where the model fits the training data too closely; in terms of evaluation metrics, the model performs well on the training set but poorly on the test set and on new data. Underfitting refers to the case where the model performs poorly both in training and in prediction. Figure 2.5 vividly illustrates the difference between overfitting and underfitting.

As can be seen in Figure 2.5, panel (a) is a case of underfitting: the yellow line does not capture the characteristics of the data and does not fit the data well. Panel (c) is a case of overfitting: the model is too complex and has learned the noise in the data, so its generalization ability is reduced and it is prone to producing erroneous predictions when applied later.
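Since Figure 2.5 is not reproduced here, the following sketch illustrates the same idea numerically with polynomial fits of increasing degree (the data, degrees, and noise level are illustrative assumptions):

```python
# Underfitting vs. overfitting illustrated with polynomial regression.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = np.sort(rng.uniform(0, 1, 20))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 20)

for degree in (1, 4, 15):   # underfit, reasonable fit, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # The high-degree fit typically shows a tiny training error but a much
    # larger test error -- the overfitting gap described above.
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```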

  question: Can you describe several methods for reducing the risk of overfitting and underfitting?

  answer:

■ Methods for reducing the risk of overfitting

(1) Start from the data: obtain more training data. Using more training data is the most effective way to address overfitting, because more samples allow the model to learn more, and more effective, features and reduce the influence of noise. Directly collecting more experimental data is usually difficult, but the training data can be expanded according to certain rules. For example, in image classification the data can be augmented by translating, rotating, and scaling the images; in addition, generative adversarial networks can be used to synthesize a large amount of new training data.
(2) Reduce the complexity of the model. When data is scarce, an overly complex model is a major cause of overfitting; appropriately reducing model complexity prevents the model from fitting the sampling noise. For example, reduce the number of layers and neurons in a neural network, or reduce the tree depth and apply pruning in a tree model.
(3) Regularization. Impose certain regularization constraints on the model parameters, for example by adding the magnitude of the weights to the loss function. Taking L2 regularization as an example, the regularized objective adds the sum of the squared weights to the original loss C0:

C = C0 + (λ / 2n) Σ_w w²

In this way, while the original objective function C0 is optimized, the risk of overfitting caused by excessively large weights is also avoided; a small sketch of this penalty follows after this list.
(4) Ensemble learning. Ensemble learning reduces the risk of overfitting of a single model by combining multiple models, for example with Bagging.
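A minimal sketch of the L2-regularized objective above, written in plain NumPy so the penalty term is explicit (the linear model, data, learning rate, and λ value are illustrative assumptions):

```python
# Gradient descent on C = C0 + (lam / 2n) * sum(w**2) for a linear model.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(0, 0.1, n)

w = np.zeros(d)
lam, lr = 0.1, 0.05                         # regularization coefficient and step size
for _ in range(500):
    residual = X @ w - y
    grad_c0 = X.T @ residual / n            # gradient of the original objective C0 (MSE)
    grad_penalty = (lam / n) * w            # gradient of the (lam / 2n) * sum(w**2) penalty
    w -= lr * (grad_c0 + grad_penalty)

c0 = 0.5 * np.mean((X @ w - y) ** 2)        # original objective C0
penalty = (lam / (2 * n)) * np.sum(w ** 2)  # L2 penalty that keeps the weights small
print("weights:", np.round(w, 3), "| C0:", round(c0, 4), "| C0 + penalty:", round(c0 + penalty, 4))
```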

■ Methods for reducing the risk of underfitting

(1) Add new features. When the existing features are insufficient or only weakly correlated with the sample labels, the model is prone to underfitting. Better results can often be achieved by mining new features such as "context features", "ID-type features", and "feature combinations". In the deep learning trend, there are many models that can help with feature engineering, such as factorization machines, gradient boosting decision trees, and Deep-crossing, all of which can be used to enrich the features.
(2) Increase the complexity of the model. A simple model has weak learning ability; increasing its complexity gives it stronger fitting ability. For example, add higher-order terms to a linear model, or increase the number of layers or neurons in a neural network (see the sketch after this list).

(3) Reduce the regularization coefficient. Regularization is used to prevent overfitting, but when the model shows underfitting, the regularization coefficient should be reduced accordingly.
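As an illustration of point (2), the sketch below adds higher-order terms to a linear model, assuming scikit-learn's PolynomialFeatures (the dataset and degree are illustrative assumptions):

```python
# "Adding higher-order terms to a linear model" via polynomial feature expansion.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# A plain linear model underfits this nonlinear target...
linear = LinearRegression().fit(X, y)
# ...while expanding the input with degree-5 polynomial terms gives the same
# linear model enough capacity to fit it.
poly = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(X, y)
print("linear R^2:", round(linear.score(X, y), 3),
      "| polynomial R^2:", round(poly.score(X, y), 3))
```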

 


Origin: www.cnblogs.com/tsy-0209/p/12635210.html