Random Forest (RF)

An ensemble method based on bagging;

Multiple CART decision trees are built independently to form a forest, and the trees jointly decide the output;

Two sources of randomness:

1) Random input data: a subset of samples is drawn from the full dataset with replacement (bootstrap sampling);

2) Random features: each decision tree is built from a random subset of all features (m features are selected from the M total, and the best among those m is chosen as the split at each node); see the sketch after this list.
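To make the two sources of randomness concrete, here is a minimal NumPy sketch (the dataset shape, the seed, and the choice m = sqrt(M) are illustrative assumptions, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20))  # 1000 samples, M = 20 features
n_samples, M = X.shape
m = int(np.sqrt(M))              # a common choice for the feature-subset size

# Randomness 1: bootstrap sample -- draw n_samples rows with replacement
bootstrap_idx = rng.choice(n_samples, size=n_samples, replace=True)
X_boot = X[bootstrap_idx]

# Randomness 2: at each node, consider only m of the M features,
# then pick the best split among them
candidate_features = rng.choice(M, size=m, replace=False)
```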

Advantages:

1) Not prone to overfitting; strong noise resistance;

2) Highly parallelizable; fast to train;

3) Yields an unbiased estimate of the generalization error (via out-of-bag samples; see the example after this list);

4) Insensitive to the absence of some features;
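The unbiased estimate in point 3 is usually obtained from the out-of-bag (OOB) samples, i.e. the rows each tree's bootstrap did not draw. A minimal sketch with scikit-learn (an assumption here, though the parameter names in the next section match its API):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# oob_score=True scores each tree on the samples its bootstrap left out,
# estimating generalization error without a separate test set
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
clf.fit(X, y)
print(clf.oob_score_)
```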

Tuning random forest parameters

1. Algorithm type: ID3, C4.5, CART

2. Number of trees (n_estimators)

  Typical range: (0, 100]

  More trees improve model performance but slow training; see the sketch below;
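A quick way to see this trade-off (the synthetic data and the value grid are assumptions for illustration):

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Accuracy usually rises then plateaus; training time grows with tree count
for n in (10, 50, 100):
    start = time.time()
    rf = RandomForestClassifier(n_estimators=n, random_state=0)
    score = cross_val_score(rf, X, y, cv=3).mean()
    print(f"n_estimators={n}: accuracy={score:.3f}, time={time.time() - start:.1f}s")
```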

3. Number of randomly selected features (max_features)

  Common choices: log N, N/3, sqrt(N), N

  Increasing the number of random features improves each tree's performance, but reduces the diversity among trees and slows training; see the comparison below;
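In scikit-learn the options above map onto max_features values such as "log2", a fraction, "sqrt", or None (all N features); a small comparison sketch (data and grid are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

# "log2" ~ log N, 1/3 ~ N/3, "sqrt" ~ sqrt(N), None = all N features
for mf in ("log2", 1.0 / 3.0, "sqrt", None):
    rf = RandomForestClassifier(n_estimators=100, max_features=mf, random_state=0)
    score = cross_val_score(rf, X, y, cv=3).mean()
    print(f"max_features={mf}: accuracy={score:.3f}")
```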

4. Maximum depth of the tree (max_depth)

  $[1,\infty )$

  -1 means the tree is grown to its full depth;

5. Minimum number of records in a leaf node (min_samples_leaf):

  The minimum number of samples a leaf node must contain; at least 2, usually around 50;

  Smaller leaves make it easier for the model to capture noise in the training data: the fit to the training data improves, but the model becomes more complex;

6. Minimum fraction of records in a leaf node

  The minimum proportion of the parent node's records that a leaf node must contain; see the tuning sketch below;
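A combined tuning sketch for items 4-6 using GridSearchCV (note: scikit-learn uses max_depth=None rather than -1 for a fully grown tree, and its min_weight_fraction_leaf is relative to the whole dataset rather than the parent node; the parameter grids are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_grid = {
    "max_depth": [5, 10, None],               # None = fully grown tree
    "min_samples_leaf": [2, 20, 50],          # minimum records per leaf
    "min_weight_fraction_leaf": [0.0, 0.01],  # leaf size as a fraction of all samples
}
search = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=0),
                      param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```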
