Machine Learning (6) - Random Forest

1. What is random sampling?

  Unlike boosting, there is no dependency between the weak learners of bagging. Bagging is instead characterized by "random sampling".

  Random sampling (bootstrap sampling) means drawing a fixed number of samples from our training set with replacement: after each sample is drawn, it is put back, so a sample drawn earlier may well be drawn again later. For the bagging algorithm, we generally draw the same number of samples m as the training set contains. The resulting sampling set has the same size as the training set, but different content. If we randomly sample the training set of m samples T times, then due to randomness the T sampling sets are all different.

  Note that this is not the same as the subsampling in GBDT: GBDT subsamples without replacement, while bagging samples with replacement.
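
  As a concrete illustration, here is a minimal sketch of one round of bootstrap sampling with NumPy (the toy array and sizes are made up for this example):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10                      # size of the toy training set
X = np.arange(m)            # stand-in "training set": just sample indices

# One round of bootstrap sampling: draw m times with replacement.
# Duplicates are expected, and some original samples will be missed.
bootstrap_idx = rng.integers(0, m, size=m)
print("sampling set:", X[bootstrap_idx])
print("distinct samples drawn:", np.unique(bootstrap_idx).size, "of", m)
```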

2. What is Out Of Bag (OOB)?

  For a given sample, in one round of random sampling from a training set of m samples, the probability of being drawn on any single draw is 1/m, so the probability of not being drawn is 1 − 1/m. The probability of never being drawn in all m draws is (1 − 1/m)^m, and (1 − 1/m)^m → 1/e ≈ 0.368 as m → ∞.

  That is to say, in each round of bagging's random sampling, about 36.8% of the training data is not collected into the sampling set, and about 63.2% of the data is actually used.

  This roughly 36.8% of the data that is never sampled is called Out Of Bag (OOB) data. Since it takes no part in fitting the model, it can be used to test the model's generalization ability.
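
  A quick numerical check of this limit, together with the empirical OOB fraction of one bootstrap round (a sanity-check sketch; the seed and m are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100_000

# Theoretical probability that a fixed sample is never drawn in m draws.
print("(1 - 1/m)^m =", (1 - 1/m) ** m)        # ~0.3679, i.e. about 36.8%

# Empirical OOB fraction for one bootstrap round of size m.
drawn = rng.integers(0, m, size=m)
print("empirical OOB fraction:", 1 - np.unique(drawn).size / m)
```

  In scikit-learn, passing oob_score=True to RandomForestClassifier or RandomForestRegressor uses exactly this left-out data to estimate generalization performance.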

3. How a random forest is generated:

  1) For t = 1, 2, ..., T:

      a) Perform the t-th round of random sampling on the training set, drawing m times in total to obtain a sampling set D_t containing m samples;

      b) Use the sampling set D_t to train the t-th decision tree model G_t(x). When splitting a node of the tree, randomly select a subset of the sample features available at that node, and then choose the optimal feature from this subset to divide the node into left and right subtrees;

    2) For classification, the final predicted category is the one (or one of the ones) receiving the most votes from the T weak learners. For regression, the results of the T weak learners are arithmetically averaged to obtain the final model output. A minimal sketch of this procedure follows the notes below.

  Notice:

    (1) RF uses the CART decision tree as a weak learner;

    (2) When splitting a node, RF randomly selects only a subset of the sample features (a number smaller than the total feature count n) and chooses the optimal feature among them to divide the node into left and right subtrees, which further enhances the generalization ability of the model.
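
  To make the procedure concrete, here is a minimal from-scratch sketch of the classification case, using scikit-learn's DecisionTreeClassifier as the CART weak learner; the synthetic data set and all parameter values are made up for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
m, n = X.shape
T = 25                                  # number of weak learners
trees = []

for t in range(T):
    # a) bootstrap: draw m samples with replacement to form D_t
    idx = rng.integers(0, m, size=m)
    # b) CART tree; max_features="sqrt" makes every node split consider
    #    only a random subset of the n features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=t)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# 2) classification: majority vote of the T trees
votes = np.stack([tree.predict(X) for tree in trees])          # (T, m)
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble training accuracy:", (y_pred == y).mean())
```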

4. The advantages and disadvantages of RF:

  The main advantages of RF are:

    1) Training can be highly parallelized, which is an advantage for training speed on large samples in the era of big data. Personally, I think this is the main advantage.

    2) Since each node split considers only a randomly selected subset of the features, the model can still be trained efficiently when the sample feature dimension is very high.

    3) After training, the importance of each feature to the output can be assessed (see the brief example after this list).

    4) Thanks to random sampling, the trained model has low variance and strong generalization ability.

    5) Compared with Adaboost and GBDT in the boosting family, RF is relatively simple to implement.

    6) It is insensitive to partially missing feature values.
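
  For point 3), scikit-learn's random forest exposes this directly through the feature_importances_ attribute; a brief sketch on synthetic data (dataset sizes are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6,
                           n_informative=3, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importance of each feature to the output; sums to 1.
for i, imp in enumerate(rf.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```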

  The main disadvantages of RF are:

    1) On some noisy sample sets, the RF model is prone to overfitting.

    2) Features that can take many distinct values (and therefore offer more candidate split points) tend to have a greater influence on RF's decisions, which can distort the fitted model.

 

Reprinted from the blog: http://www.cnblogs.com/pinard/p/6156009.html

 
