Machine Learning (1) - Model Evaluation and Selection

     I started studying machine learning about a year ago. Recently I have been learning deep learning and found that many things are similar, in particular some of the optimization methods and the approach to model evaluation and selection. Today I will introduce model evaluation and selection in machine learning.

   1. Introduction

     We all know that when we humans identify something, we rely on a wealth of experience. Experience comes from constantly becoming more familiar with things, which lets us grasp the differences between them, so that when we see something we can quickly make a correct prediction based on the experience we have accumulated. Machine learning is likewise a discipline dedicated to improving a system's performance through learning from experience. Here the experience is data: a learning algorithm is an algorithm that generates a model from data.

   2. Model evaluation and overfitting

        2.1 Empirical error and overfitting

      First, we will introduce a few terms:

      Accuracy: accuracy = 1 - error rate, where the error rate is the proportion of samples that are classified incorrectly. If a of m samples are misclassified, the error rate is E = a/m and the accuracy is 1 - a/m.

      Error: the difference between the learner's actual output and the true output. The error on the training set is called the training error or empirical error; the error on new samples is called the generalization error.

      Generally, we want a learner with a small generalization error, but we cannot know in advance what new samples will look like, so it is tempting to minimize the empirical error instead, even to the point of classifying the training samples 100% correctly. This leads to overfitting, and in most cases the test performance will be poor: the learner is so powerful that it treats peculiarities of the training samples as general properties of all potential samples, and its generalization ability drops. Mathematically, overfitting shows up as an increase in variance; because the learner has fit non-global features, small changes in the training data cause large changes in the learned model. The opposite situation is underfitting, where the learner is not powerful enough. Early in training, the classifier we have learned differs greatly from the true one, i.e. the bias is large; as training deepens, the bias shrinks. Bias and variance are usually in conflict: we would like both to be small, but as the bias keeps shrinking we fit the training data ever more closely, which eventually leads to overfitting.
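      For reference, this trade-off can be made precise for squared loss. Below is the standard bias-variance decomposition (the notation is mine, not from the original post), assuming noise-free labels:

```latex
% Bias-variance decomposition for squared loss.
% f(x; D): model learned from training set D;
% \bar{f}(x) = E_D[f(x; D)]: its average over training sets; y: true label.
\mathbb{E}_D\!\left[\big(f(x;D)-y\big)^2\right]
  = \underbrace{\big(\bar{f}(x)-y\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(f(x;D)-\bar{f}(x)\big)^2\right]}_{\text{variance}}
```

      A very flexible learner can drive the bias term toward zero on the training data while the variance term grows, which is exactly the overfitting described above.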

      2.2 Assessment methods

    Usually we evaluate the learner's generalization error through experimental testing and then make a choice. For this, a test set is needed to measure the learner's ability to discriminate new samples, and the test error on the test set is taken as an approximation of the generalization error. Generally, the test set should be obtained by i.i.d. sampling from the true sample distribution. Here are some common methods:

    1. Hold-out method

      The hold-out method randomly splits the data into two mutually exclusive sets by a given proportion, one for training and one for testing. Its disadvantage is that a single random split may not preserve the data distribution, so stratified sampling should be used and the split repeated several times with the results averaged. In addition, setting part of the data aside for testing means the model is trained on fewer samples than the full data set, so the evaluated model differs somewhat from the one we ultimately want. Generally, about 2/3 to 4/5 of the samples are used for training, and the remaining samples are used for testing.
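      As a minimal sketch of a hold-out split (using scikit-learn's train_test_split on synthetic data; the sizes and random seed are arbitrary choices of mine):

```python
# Hold-out method: one stratified random split (sketch, synthetic data).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 samples, 5 features (made up)
y = rng.integers(0, 2, size=100)     # binary labels

# 2/3 training, 1/3 test; stratify=y keeps the class ratio in both sets.
# Repeat with different random_state values and average the test results.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, stratify=y, random_state=0)
```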

   2. Cross-validation method
      The principle of the cross-validation method (k-fold cross-validation) is to first divide the data set into k mutually exclusive subsets of similar size, keeping the data distribution in each subset as consistent as possible. Then the union of k-1 subsets is used as the training set and the remaining subset as the test set; this gives k rounds of training and testing, and the final result is the mean of the k test results. Common choices of k are 5, 10, and 20.
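      A sketch of 10-fold cross-validation with scikit-learn (StratifiedKFold keeps the per-fold class distribution consistent; the model and data are placeholders of mine):

```python
# k-fold cross-validation (k = 10): average the k test results.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
print(np.mean(scores))   # final estimate = mean over the k folds
```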

   3. Leave-one-out method

    The leave-one-out method is a special case of cross-validation in which k equals the number of samples m, so each training set is only one sample smaller than the initial data set. Advantage: the model evaluated in each round is trained on almost all of the data, so it is very close to the model trained on the full data set, and the evaluation results of the leave-one-out method are therefore often considered quite accurate. Disadvantage: when the data set is large, as many models must be trained as there are samples, and if parameter tuning is also considered, the computational cost is huge.
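    A sketch with scikit-learn's LeaveOneOut (m is kept small on purpose, since m models must be trained; the data is synthetic):

```python
# Leave-one-out: k equals the number of samples m.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))         # m = 30 -> 30 models trained
y = rng.integers(0, 2, size=30)

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    correct += int(model.predict(X[test_idx])[0] == y[test_idx][0])
print(correct / len(X))              # leave-one-out accuracy estimate
```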

   4. Bootstrap method (bootstrapping)

   Each time, one sample is randomly drawn from the data set of m samples, copied into the sampled set, and then put back so that it may be drawn again; this is repeated until we have drawn the same number of samples as the original data set. Some samples will be drawn multiple times, while others are never drawn. The probability that a given sample is not drawn in m draws is (1-1/m)^m, which tends to 1/e ≈ 0.368 as m grows; so roughly 1/3 of the data never appears in the bootstrap training set and can serve as the test set. Advantages: it is very practical when the data set is small and it is difficult to split off a test set; in addition, the bootstrap method can produce multiple different training sets from one data set, which is of great benefit to methods such as ensemble learning. However, the data set generated by the bootstrap method changes the distribution of the initial data set, which introduces estimation bias.
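   The ~0.368 figure is easy to check numerically; a small sketch in pure NumPy (sizes arbitrary):

```python
# Bootstrap sampling: draw m indices with replacement; the indices that
# are never drawn ("out-of-bag") can serve as the test set.
import numpy as np

rng = np.random.default_rng(0)
m = 10_000
boot_idx = rng.integers(0, m, size=m)        # sampling with replacement
oob = np.setdiff1d(np.arange(m), boot_idx)   # samples never drawn

print(len(oob) / m)    # close to (1 - 1/m)^m -> 1/e ~ 0.368
```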

Summary: When the data set is sufficient, the hold-out method and the k-fold cross-validation method are the most commonly used.

2.3 Parameter adjustment and final model

      Most learning algorithms have parameters that need to be set, and with different parameter configurations the learned models are often very different. Therefore, when evaluating and selecting models, we also need to choose the parameter settings, which is commonly referred to as parameter tuning. Parameters include hyperparameters and model parameters. Model parameters (for example, the weights in deep learning) are produced automatically by learning, with many candidate models generated during training; hyperparameters must be set manually.

   Given a data set D, during model evaluation and selection we need to set aside part of the data as a validation set, which is used for parameter tuning. It differs from the test set, which is used to estimate the model's actual generalization ability. Once model selection and parameter tuning are complete, the model is determined; we then retrain using both the training set and the validation set as training samples to obtain the final model.
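   A minimal sketch of this workflow (the model, the candidate values of C, and the split ratio are illustrative assumptions of mine, not from the original post):

```python
# Tune a hyperparameter on a validation set, then retrain on train + val.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25,
                                            random_state=0)

best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:     # candidate hyperparameter values
    acc = LogisticRegression(C=C).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc

# Model is now determined: retrain on training + validation data.
final_model = LogisticRegression(C=best_C).fit(X, y)
```

   A separate test set, untouched during tuning, would still be needed to estimate the final model's generalization error.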

 

   
