Hands-on Deep Learning _ Model Selection, Underfitting and Overfitting -- 2020.2.26

First, training error and generalization error

      For a model, we should not simply pursue the highest possible accuracy on the training set: that may be a sign of overfitting, where the model achieves good results on the training set but poor results on other data sets. This motivates two notions of error:

  • Training error:
         the error of the model on the training data set.
  • Generalization error:
         the expected error of the model on any new data set drawn from the same underlying distribution, usually approximated by the error on a test set. The generalization error is the more informative of the two; a small sketch comparing the two follows.
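
     As a small illustration (synthetic data and a plain least-squares line, assumptions of this sketch rather than anything from the original text), both errors can be estimated by evaluating the same loss on the training data and on held-out data:

```python
import numpy as np

# Sketch: fit a least-squares line, then compare the squared loss on
# the data it was trained on (training error) with the loss on fresh
# data from the same distribution (an estimate of generalization error).
rng = np.random.default_rng(0)
x_train = rng.normal(size=50)
y_train = 2.0 * x_train + rng.normal(scale=0.5, size=50)
x_test = rng.normal(size=50)
y_test = 2.0 * x_test + rng.normal(scale=0.5, size=50)

w, b = np.polyfit(x_train, y_train, deg=1)   # slope and intercept
train_error = np.mean((w * x_train + b - y_train) ** 2)
test_error = np.mean((w * x_test + b - y_test) ** 2)
print(f"training error: {train_error:.4f}, test error: {test_error:.4f}")
```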

Second, model selection

     To build an effective model, we typically have to choose, among other things, the number of hidden layers, the number of units in each hidden layer, and the activation functions. Model selection is commonly carried out with a validation data set, described below.

(A) Validation data set

     A validation data set is a portion of the data reserved in addition to the training and test data sets. It can be used to evaluate and select models.
     In practice, because data is hard to obtain, the test set is sometimes also used as the validation set.
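
     A minimal sketch of reserving a validation set, assuming synthetic data and scikit-learn's train_test_split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real data set (an assumption of this sketch).
X = np.random.default_rng(0).normal(size=(1000, 5))
y = X @ np.ones(5)

# First split off a test set, then carve a validation set out of the remainder:
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)
# Overall: 60% training, 20% validation, 20% test.
```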

(B) K-fold cross-validation

     Since the validation data set does not participate in training, reserving a large amount of validation data is unaffordable when training data is scarce. One improved remedy is \(K\)-fold cross-validation. In \(K\)-fold cross-validation, we split the original training data set into \(K\) non-overlapping subsets, then perform \(K\) rounds of model training and validation. In each round, we use one subset to validate the model and the other \(K-1\) subsets to train it; across the \(K\) rounds, a different subset is used for validation each time. Finally, we average the \(K\) training errors and the \(K\) validation errors.
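
     A minimal sketch of the procedure in NumPy (the train_and_eval callback is an assumption of this example, standing in for any training routine):

```python
import numpy as np

def k_fold_split(n_samples, k, seed=0):
    """Split shuffled sample indices into k non-overlapping subsets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def k_fold_cv(X, y, k, train_and_eval):
    """Average the training and validation errors over k folds.

    train_and_eval(X_tr, y_tr, X_val, y_val) is assumed to train a
    fresh model and return (train_error, valid_error) for one fold.
    """
    folds = k_fold_split(len(X), k)
    train_errs, valid_errs = [], []
    for i in range(k):
        val_idx = folds[i]                    # one subset validates the model
        tr_idx = np.concatenate(              # the other k-1 subsets train it
            [f for j, f in enumerate(folds) if j != i])
        tr_err, val_err = train_and_eval(X[tr_idx], y[tr_idx],
                                         X[val_idx], y[val_idx])
        train_errs.append(tr_err)
        valid_errs.append(val_err)
    return float(np.mean(train_errs)), float(np.mean(valid_errs))
```

     Since each fold trains a fresh model, \(K\)-fold cross-validation costs roughly \(K\) times a single training run.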

(C) Underfitting and overfitting

     There are two classic problems in model training:

  • The model cannot achieve a low training error; we call this phenomenon underfitting.
  • The model's training error is far lower than its error on the test data set; we call this phenomenon overfitting.

     In practice, we have to guard against both underfitting and overfitting as far as possible. Many factors can contribute to these two fitting problems; below we discuss two of them: model complexity and the size of the training data set.

1. Model complexity

     Take polynomial function fitting as an example. Given a training data set consisting of a scalar feature \(x\) and a corresponding scalar label \(y\), the goal of polynomial fitting is to find a \(K\)-th order polynomial function that approximates \(y\), as follows:
\[\hat{y} = b + \sum_{k=1}^{K} x^k w_k\]
     In the formula above, \(w_k\) are the model's weight parameters and \(b\) is the bias parameter. As with linear regression, polynomial fitting can use the squared loss function. In particular, first-order polynomial fitting is simply linear function fitting.
     Because a higher-order polynomial function has more model parameters and a larger space of candidate functions, it is more complex than a lower-order polynomial function. Consequently, a higher-order polynomial function can more easily attain a lower training error on the same training data set.
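     A minimal sketch of this effect (the cubic data-generating coefficients are hypothetical, and np.polyfit stands in for a full training loop):

```python
import numpy as np

# Generate data from a hypothetical 3rd-order polynomial plus noise,
# then fit polynomials of increasing order and compare training errors.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = 5.0 + 1.2 * x - 3.4 * x**2 + 5.6 * x**3 + rng.normal(scale=0.1, size=100)

for order in (1, 3, 10):
    coeffs = np.polyfit(x, y, deg=order)                   # least-squares fit
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)  # squared loss
    print(f"order {order:2d}: training error = {train_err:.4f}")
# The higher the order, the lower the training error tends to be;
# a low generalization error is not guaranteed.
```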
     The typical relationship between model complexity and error, shown in Figure 3.4, is as follows: the training error decreases as model complexity grows, while the generalization error first decreases and then increases; the low-complexity end corresponds to underfitting and the high-complexity end to overfitting.

[Figure 3.4: training error and generalization error versus model complexity (figure not reproduced here)]

2. Training data set size

     Another important factor affecting underfitting and overfitting is the size of the training data set. Generally speaking, if the number of samples in the training data set is too small, in particular smaller than the number of model parameters (counted by elements), overfitting is more likely to occur. Moreover, the generalization error does not increase as the number of samples in the training data set grows. Therefore, within the limits of what computing resources permit, we usually prefer a larger training data set, especially when the model is complex, for example a deep learning model with many layers.
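
     A minimal sketch of the same point, reusing the hypothetical cubic generator from the previous snippet and deliberately fitting a high-order polynomial:

```python
import numpy as np

# Fit a 10th-order polynomial (11 parameters) with few vs. many samples
# from the same cubic data generator as above, and compare held-out error.
rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-3, 3, size=n)
    return x, 5.0 + 1.2 * x - 3.4 * x**2 + 5.6 * x**3 + rng.normal(scale=0.1, size=n)

x_test, y_test = make_data(1000)                 # held-out data
for n_train in (15, 200):
    x_tr, y_tr = make_data(n_train)
    coeffs = np.polyfit(x_tr, y_tr, deg=10)      # parameter count close to sample count
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"n_train = {n_train:3d}: test error = {test_err:.4f}")
# With only 15 samples, the 11-parameter fit tends to overfit and gives
# a noticeably larger held-out error than with 200 samples.
```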



Origin www.cnblogs.com/somedayLi/p/12369644.html