Training set, validation set and test set: concepts and division principles

In deep learning, the available data is usually divided into a training set, a validation set (also called a development set or dev set) and a test set. This post mainly answers two questions: first, why the data should be split into these three sets and what the difference between them is; second, what principles the split should follow.

1. Concepts of the training set, validation set and test set

  • Training set: as the name suggests, the set of samples used for training; it is mainly used to train the parameters of the neural network.

  • Validation set: literally, the set of samples used to verify model performance. After different models have been trained on the training set, their performance on the validation set is compared to decide which model is best. "Different models" here mainly means neural networks with different hyperparameter settings, but it can also mean networks with completely different architectures.

  • Test set: once training of the neural network is finished, the test set provides an objective assessment of the network's performance.

So what exactly distinguishes the training, validation and test sets? Generally speaking, the training set is easy to tell apart from the other two, while the validation set and the test set are easily confused. I understand the difference as follows:

  • Once the network architecture is fixed, two kinds of quantities ultimately determine the model's performance: the ordinary parameters (such as the weights w and biases b) and the hyperparameters (such as the learning rate and the number of layers). Ordinary parameters are trained on the training set, while hyperparameters are generally specified by hand (the performance of models with different hyperparameters is compared on the validation set). Why don't we learn the hyperparameters on the training set the way we learn the ordinary parameters? The Deep Learning book (the "flower book") gives two reasons. First, hyperparameters are generally difficult to optimize (they cannot be optimized by gradient descent the way ordinary parameters are). Second, hyperparameters are often unsuitable for learning on the training set: for example, if a hyperparameter that controls model capacity were learned on the training set, it would always be pushed toward the maximum capacity (since a larger-capacity model has a smaller training error), so a hyperparameter fitted on the training set is guaranteed to produce an overfitted model. A small sketch after this list illustrates this point.

  • Because hyperparameters cannot be learned on the training set, we set up a separate validation set to select (or "manually train") the best hyperparameters. Since the validation set is used to choose the hyperparameters, the training set and the validation set must be independent and non-overlapping.

  • The test set is used after the network has finished training to evaluate the model on data it has never seen (data that influenced neither the ordinary parameters nor the hyperparameters). The test set must therefore be independent of, and non-overlapping with, both the training set and the validation set, and it must never be used to adjust parameters or hyperparameters; it serves only as a measure of the network's performance.
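A minimal sketch of that capacity argument, using NumPy and scikit-learn on synthetic data (the dataset, sample counts and polynomial degrees are assumptions made for the example, not part of the original post): the training error keeps shrinking as the degree grows, so selecting the degree on the training set would always pick the largest one, while the validation error reveals the overfitting.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)

# Synthetic 1-D regression problem: y = sin(x) + noise
x = rng.uniform(-3, 3, size=40).reshape(-1, 1)
y = np.sin(x).ravel() + 0.3 * rng.randn(40)

# Independent, non-overlapping training and validation sets
x_train, y_train = x[:30], y[:30]
x_val, y_val = x[30:], y[30:]

for degree in [1, 3, 9, 15]:  # model-capacity hyperparameter
    feats = PolynomialFeatures(degree)
    model = LinearRegression().fit(feats.fit_transform(x_train), y_train)
    train_err = mean_squared_error(y_train, model.predict(feats.transform(x_train)))
    val_err = mean_squared_error(y_val, model.predict(feats.transform(x_val)))
    # Training error keeps shrinking as capacity grows, so choosing the degree on
    # the training set would always pick the largest one; the validation error is
    # what exposes the overfitting.
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  val MSE={val_err:.3f}")
```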

At this point, we can summarize the training process of a neural network in roughly two steps:

  1. Train the ordinary parameters on the training set with a learning algorithm (with the hyperparameters given), until the model's error on the training set drops to an acceptable level (generally close to human-level performance).

  2. "Train" the hyperparameters: measure the network's generalization error on the validation set and adjust the hyperparameters according to that performance.

Repeat steps 1 and 2 until the network reaches a sufficiently low generalization error on the validation set; at that point training is complete. After both the parameters and the hyperparameters have been fixed, evaluate the network's performance on the test set.
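A compact sketch of this two-step loop, using scikit-learn's MLPClassifier on synthetic data purely for illustration (the dataset, model size and hyperparameter grid are assumptions made for the example, not part of the original post):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# Three independent, non-overlapping sets (a 6:2:2 split, see the next section)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

best_acc, best_lr = -1.0, None
for lr in [1e-1, 1e-2, 1e-3]:                       # step 2: "train" a hyperparameter
    # Step 1: train the ordinary parameters on the training set for this hyperparameter
    model = MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=lr,
                          max_iter=300, random_state=0).fit(X_train, y_train)
    acc = model.score(X_val, y_val)                 # generalization proxy on the validation set
    if acc > best_acc:
        best_acc, best_lr = acc, lr

# Only after parameters and hyperparameters are fixed do we touch the test set, once
final = MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=best_lr,
                      max_iter=300, random_state=0).fit(X_train, y_train)
print("chosen learning rate:", best_lr, " test accuracy:", final.score(X_test, y_test))
```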

2. Principles for dividing the training set, validation set and test set

This section mainly summarizes the principles given in Andrew Ng's courses:

  • For a small dataset, a commonly used split is training set / dev set / test set = 6 : 2 : 2. For example, with 10,000 samples in total, 6,000 go to the training set, 2,000 to the validation set and 2,000 to the test set.

  • For a large dataset, the dev/test proportion shrinks considerably, because a fixed, moderate number of samples is already sufficient for comparing models (validation) and for testing performance. For example, with 1,000,000 samples in total, 980,000 go to the training set, 10,000 to the validation set and 10,000 to the test set. (A small split sketch follows this list.)
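A minimal sketch of how those two ratios translate into concrete sample counts, using a plain NumPy index split (the helper function and the numbers are just illustrative):

```python
import numpy as np

def split_indices(n_samples, train_frac, dev_frac, seed=0):
    """Shuffle sample indices and cut them into train / dev / test blocks."""
    idx = np.random.RandomState(seed).permutation(n_samples)
    n_train = int(round(n_samples * train_frac))
    n_dev = int(round(n_samples * dev_frac))
    return idx[:n_train], idx[n_train:n_train + n_dev], idx[n_train + n_dev:]

# Small dataset: 10,000 samples at 6:2:2  -> 6,000 / 2,000 / 2,000
small = split_indices(10_000, train_frac=0.6, dev_frac=0.2)
# Large dataset: 1,000,000 samples at 98:1:1 -> 980,000 / 10,000 / 10,000
large = split_indices(1_000_000, train_frac=0.98, dev_frac=0.01)

print([len(part) for part in small])   # [6000, 2000, 2000]
print([len(part) for part in large])   # [980000, 10000, 10000]
```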

When we cannot obtain enough training samples of the kind we actually care about and use other, similar data to help train the network, how should the training, dev and test sets be divided?
For example, suppose we are building a cat-recognition app whose goal is to recognize cats in photos that users take and upload. The app itself gives us only limited data (say 10,000 images), so we plan to help train the network with cat pictures downloaded by a web crawler (200,000 images). Since the crawled images differ considerably from the images users upload, how should the training / dev / test sets be divided in this case?

  • One option is to mix the web images with the app images and then split them according to the large-dataset principle above, i.e. 205,000 images for the training set, 2,500 for the dev set and 2,500 for the test set.
  • Another option is to put 2,500 app images in the dev set and 2,500 app images in the test set, and to mix the remaining 5,000 app images with the 200,000 web images to form the training set.

Andrew Ng points out that the second option is better, because its dev set consists entirely of app data and therefore has the same distribution as the data we actually care about. In the first option the dev set is a random 2,500-image sample of the 210,000-image mixture, so only about 119 of its images (2,500 × 10,000 / 210,000) come from the app while most of the rest come from the web, and the target the model is evaluated against inevitably drifts away from the one we care about.

Of course, the second option introduces a new problem: the dev/test sets and the training set now follow different distributions, which complicates error analysis. The solution Andrew Ng gives is to split off part of the training set as a train-dev set. This portion is not used for training; it is used to evaluate the model's generalization error, and the gap between the error on the train-dev set and the error on the dev set is treated as the data-mismatch error, i.e. the error caused by the difference in data distribution.
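A small sketch of how option 2 plus the train-dev idea carves up the data (index bookkeeping only; the counts follow the example above, and the error values at the end are placeholder numbers standing in for what a real model would produce):

```python
import numpy as np

rng = np.random.RandomState(0)

app_idx = rng.permutation(10_000)            # 10,000 user-uploaded app images
web_idx = 10_000 + np.arange(200_000)        # 200,000 crawled web images (ids offset past app ids)

# Option 2: dev and test sets come only from the distribution we care about (app images)
dev_idx = app_idx[:2_500]
test_idx = app_idx[2_500:5_000]
train_idx = np.concatenate([app_idx[5_000:], web_idx])   # 5,000 app + 200,000 web images

# Carve a train-dev set out of the training data: same distribution as the
# training set, but never used for training
rng.shuffle(train_idx)
train_dev_idx, train_idx = train_idx[:2_500], train_idx[2_500:]

print(len(train_idx), len(train_dev_idx), len(dev_idx), len(test_idx))
# -> 202500 2500 2500 2500

# Error analysis with placeholder numbers: the train-dev -> dev gap isolates data mismatch
train_err, train_dev_err, dev_err = 0.02, 0.04, 0.10
print("variance (train vs. train-dev):", round(train_dev_err - train_err, 2))
print("data mismatch (train-dev vs. dev):", round(dev_err - train_dev_err, 2))
```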

Source: www.cnblogs.com/hello-ai/p/11099824.html