Random forest OOB score for best-feature selection

With sklearn's random forest (a supervised learning algorithm), the input features can be scored, so the best combination of features can be selected and redundant features removed;

Principle: the random forest grows each decision tree with bootstrap sampling, so no single tree is built on all of the samples; the samples a tree never sees are called its out-of-bag (OOB) samples. Each tree's accuracy can be evaluated on its own OOB samples, and averaging these per-tree evaluations gives an estimate of the whole random forest's performance;
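The OOB evaluation described above can be sketched as follows; the iris dataset and the specific parameter values are illustrative choices, not from the original post:

```python
# Minimal sketch: evaluate a random forest via its out-of-bag samples.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# oob_score=True scores each tree on the samples left out of its
# bootstrap draw; the averaged result approximates test accuracy
# without needing a separate held-out set.
clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X, y)

print(clf.oob_score_)  # OOB estimate of generalization accuracy
```

Note that `oob_score_` is only available after `fit` and only when `oob_score=True` was passed to the constructor.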

Feature selection principle: because out-of-bag samples exist, no cross-validation is needed (saving time). Randomly permute the values of each feature in turn and observe how the model's performance changes: a large change indicates an important feature. In sklearn every feature is given an importance score, and the larger the score, the more important the feature. The features can therefore be ranked by importance and the best combination of features selected;
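A sketch of ranking features by the per-feature score mentioned above. One caveat: sklearn's built-in `feature_importances_` is impurity-based; the shuffling method the paragraph describes corresponds to `sklearn.inspection.permutation_importance`, which can be used the same way. The dataset here is again an illustrative choice:

```python
# Rank features from most to least important using the scores
# sklearn attaches to a fitted random forest.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(data.data, data.target)

# feature_importances_ sums to 1; larger means more important.
order = np.argsort(clf.feature_importances_)[::-1]
for i in order:
    print(f"{data.feature_names[i]}: {clf.feature_importances_[i]:.3f}")
```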

RandomForestClassifier(n_estimators=200, oob_score=True)

oob_score : bool (default=False) Whether to use out-of-bag samples to estimate the generalization accuracy.
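Putting the two ideas together, one way to pick the best combination of features is to keep the top-k most important features and compare the OOB accuracy for each k. This is a sketch of that selection loop; the dataset and parameter values are illustrative assumptions:

```python
# Choose a feature subset by OOB score: rank features once, then
# refit on the top-k features for each k and compare OOB accuracy.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

base = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
base.fit(X, y)
order = np.argsort(base.feature_importances_)[::-1]  # most important first

scores = {}
for k in range(1, X.shape[1] + 1):
    cols = order[:k]
    clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
    clf.fit(X[:, cols], y)
    scores[k] = clf.oob_score_  # OOB accuracy using only the top-k features

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

Because the score comes from the out-of-bag samples, no cross-validation split is needed in the loop.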



Source: www.cnblogs.com/dinol/p/11614352.html