Methods for splitting a data set

(1) K-fold cross-validation (KFold): K is typically set to 3, 5, or 10

   When you cannot decide which evaluation method to use, K-fold cross-validation is a reasonable default;

   if you do not know which value of K to choose, 10 is usually the best option (a usage sketch follows).
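A minimal sketch of 10-fold cross-validation with scikit-learn; the synthetic data set, the logistic-regression model, and the seed value are illustrative assumptions, not part of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Illustrative synthetic data and model
X, y = make_classification(n_samples=200, n_features=10, random_state=7)
model = LogisticRegression(max_iter=1000)

# K = 10 folds, shuffled with a fixed seed for reproducibility
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(model, X, y, cv=kfold)
print("Accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```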

(2) Splitting into a training set and an evaluation set (train_test_split)

      This approach is very fast, which makes it useful when an algorithm is slow to train or when the data volume is large.

      In addition to specifying the size of the split, a random seed can be specified so that every run produces the same split; this makes it possible to compare the models built by different algorithms on identical data (see the sketch below).
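A minimal sketch of a single train/test split; the 33% test share, the seed value, and the model are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=7)

seed = 7  # fixing the seed makes the split reproducible across runs
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=seed)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy on the held-out set: %.3f" % model.score(X_test, y_test))
```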

(3) Leave-one-out cross-validation (LeaveOneOut)

    With N samples, N models are trained, so the resulting evaluation is very reliable, but the computational cost is very high.

    It is typically chosen after weighing the reliability of the evaluation, the speed of model training, and the size of the data set (a sketch follows).
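A minimal sketch of leave-one-out cross-validation; note that N models are trained (one per sample), so the data set is deliberately kept small here. The data and model are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=100, n_features=10, random_state=7)
model = LogisticRegression(max_iter=1000)

loo = LeaveOneOut()
scores = cross_val_score(model, X, y, cv=loo)  # one score per left-out sample
print("Accuracy: %.3f" % scores.mean())
```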

(4) Repeated random splits into training and evaluation sets (ShuffleSplit)

    The splitting procedure is similar to cross-validation, except that the random train/test split is repeated multiple times;

    it is likewise chosen after weighing the reliability of the evaluation, the speed of model training, and the size of the data set (a sketch follows).
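A minimal sketch of repeated random splitting; the 10 repetitions, the 33% test share, and the model are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=7)
model = LogisticRegression(max_iter=1000)

# 10 independent random splits, each holding out 33% of the data
shuffle = ShuffleSplit(n_splits=10, test_size=0.33, random_state=7)
scores = cross_val_score(model, X, y, cv=shuffle)
print("Accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```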

Origin www.cnblogs.com/Cheryol/p/11485451.html