The sklearn RandomForest algorithm (a supervised learning method) can be used to select the best combination of features from the input data, which reduces redundant features.
Principle: a random forest grows each decision tree on a bootstrap sample, so no single tree is trained on all of the samples. The samples a tree did not use are called its out-of-bag (OOB) samples, and they can be used to evaluate that tree's accuracy. Doing the same for every other tree and combining the results gives an estimate of the random forest's performance. (In sklearn, each sample is predicted using only the trees for which it was out-of-bag, and those aggregated predictions are then scored.)
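A minimal sketch of reading the OOB estimate in sklearn, assuming synthetic data from make_classification (the dataset sizes and random_state values are illustrative choices, not from the original notes):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative synthetic data: 500 samples, 10 features, 3 of them informative
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=2, random_state=0)

# bootstrap=True is the default; oob_score=True makes sklearn score each sample
# using only the trees whose bootstrap sample did not contain it
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

# oob_score_ is the accuracy of those out-of-bag predictions
print(f"OOB accuracy estimate: {forest.oob_score_:.3f}")
# oob_decision_function_ holds the per-sample OOB class probabilities
print(forest.oob_decision_function_.shape)
```

No separate test split or cross-validation loop is needed to get this estimate, which is the time saving the notes refer to.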
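Feature selection principle: because the OOB samples already act as a validation set, no cross-validation is needed (which saves time). Each feature in turn is disturbed with random values (typically by randomly permuting that feature's values) and the change in the algorithm's performance is observed; a large drop indicates an important feature. Each feature thus receives a score, where a higher score means a more important feature, so the features can be ranked by importance and the best combination of features selected. Note that sklearn's feature_importances_ attribute is computed differently, from the mean decrease in impurity across the trees; permutation-based scores are available through sklearn.inspection.permutation_importance, which evaluates on a dataset you pass in rather than on the OOB samples.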
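The permutation idea can be sketched by hand as below. One swap to note: sklearn does not expose per-tree OOB permutation directly, so this sketch permutes features on a held-out validation split instead of the OOB samples. The data, split size, and seeds are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data: with shuffle=False the 3 informative features are columns 0-2
# and the remaining 5 columns are pure noise
X, y = make_classification(n_samples=600, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
baseline = forest.score(X_val, y_val)  # accuracy with all features intact

rng = np.random.default_rng(0)
drops = np.empty(X.shape[1])
for j in range(X.shape[1]):
    X_perm = X_val.copy()
    # Shuffling column j breaks its link with y while keeping its distribution
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    drops[j] = baseline - forest.score(X_perm, y_val)

# Larger accuracy drop means a more important feature
ranking = np.argsort(drops)[::-1]
print("importance (accuracy drop):", np.round(drops, 3))
print("features ranked by importance:", ranking.tolist())
```

A single permutation per feature is noisy; repeating the shuffle several times and averaging the drops (as permutation_importance does with n_repeats) gives steadier scores.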
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=200, oob_score=True)
oob_score : bool (default=False) Whether to use out-of-bag samples to estimate the generalization accuracy.
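Putting the constructor above to work, here is one possible way to pick a feature combination: rank features with sklearn's built-in feature_importances_, then keep the top-k subset whose forest has the best OOB accuracy. This greedy top-k search is an illustrative strategy assumed for the sketch, not the only way to choose a subset, and the synthetic data is likewise an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=2, random_state=0)

# Rank features with sklearn's impurity-based importances (they sum to 1)
ranker = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
order = np.argsort(ranker.feature_importances_)[::-1]

# Try the top-k features for each k and keep the subset with the best OOB accuracy
best_k, best_score = 0, -np.inf
for k in range(1, X.shape[1] + 1):
    cols = order[:k]
    model = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
    model.fit(X[:, cols], y)
    if model.oob_score_ > best_score:
        best_k, best_score = k, model.oob_score_

selected = order[:best_k]
print(f"selected features: {sorted(selected.tolist())}, OOB accuracy: {best_score:.3f}")
```

Because the ranking was computed on the same samples, the OOB score of the chosen subset is somewhat optimistic; a final check on data held out from the whole procedure gives a fairer estimate.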