Random Forests and How They Differ from AdaBoost

1. Random Forests:

   In machine learning, a random forest is a classifier made up of many decision trees; the class it outputs is the mode of the classes output by the individual trees. The algorithm builds on a refinement of the Bagging strategy.

2. The characteristics of random forests

   Draw a set of n samples from the training set using bootstrap sampling (random sampling with replacement);

   Randomly select K attributes from the full set of attributes, and split the tree node on the best of those K attributes;

   Repeat the above two steps m times, i.e. build m decision trees;

   The m trees form the random forest, which decides the class of a data point by a majority vote of the trees (a code sketch follows this list).
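   A minimal sketch of these four steps, assuming scikit-learn's DecisionTreeClassifier as the base tree and NumPy for the bootstrap sampling; the names n_trees and k_features are illustrative, not library API:

```python
# Minimal sketch of the four steps above; assumes scikit-learn and NumPy.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rng = np.random.default_rng(0)

n_trees, k_features = 25, 3                     # m trees, K features per split
trees = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))  # step 1: bootstrap sample
    tree = DecisionTreeClassifier(max_features=k_features)  # step 2: best of
    tree.fit(X[idx], y[idx])                    # K random features per split
    trees.append(tree)                          # step 3: repeat m times

# Step 4: each tree votes; the forest outputs the mode (majority class)
votes = np.array([t.predict(X) for t in trees])
majority = np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(),
                               0, votes)
print("training accuracy:", (majority == y).mean())
```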

3. The advantages and disadvantages of random forests

  Advantages:

   1. Training can be parallelized, which gives a speed advantage on large-scale training samples (see the sketch after this list);

   2. Because each split in a decision tree considers only a randomly selected subset of features, training performance stays relatively high even when the sample dimensionality is high;

   3. It can report an importance score for each feature (also shown in the sketch below);

   4. Because of the random sampling, the trained model has low variance and strong generalization ability;

   5. RF is simple to implement;

   6. It is insensitive to partially missing feature values.
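  Advantages 1 and 3 are easy to demonstrate with scikit-learn's RandomForestClassifier; in this brief sketch the synthetic dataset and parameter values are illustrative only:

```python
# Illustrates advantages 1 and 3; dataset and parameters are for demonstration.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# n_jobs=-1 trains the trees in parallel on all available cores (advantage 1)
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)

# feature_importances_ reports a per-feature importance score (advantage 3)
for i, imp in enumerate(clf.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```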

  Disadvantages:

    1. On features with relatively large noise, an RF model is prone to overfitting;

    2. Features with many distinct values have a greater influence on the RF's decisions, which may bias the results of the model.

4. The AdaBoost algorithm

    Adaptive Boosting (AdaBoost) is an iterative algorithm. Each iteration trains a new learner on the training set, then uses that learner to predict all samples in order to assess how informative each sample is. In other words, the algorithm assigns a weight to each sample: after each trained learner labels/predicts the individual samples, a sample that is predicted correctly has its weight reduced, while a misclassified sample has its weight increased. Samples with higher weights account for a larger proportion of the next training iteration, so the harder a sample is to distinguish, the more important it becomes during training. The process iterates until the error rate is small enough or a set number of iterations is reached.
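    The reweighting loop can be made concrete with a short sketch. It assumes labels in {-1, +1}, decision stumps from scikit-learn as the weak learners, and the classic AdaBoost update; it illustrates the idea above rather than any library's exact implementation:

```python
# Sketch of the reweighting described above (classic AdaBoost update,
# labels in {-1, +1}); the decision-stump weak learner is an assumption.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
y = 2 * y - 1                              # relabel {0, 1} as {-1, +1}

w = np.full(len(X), 1.0 / len(X))          # start with uniform sample weights
for _ in range(10):                        # a fixed, small number of rounds
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    eps = w[pred != y].sum()               # weighted error rate of this round
    if eps <= 0 or eps >= 0.5:             # stop: perfect, or no better than chance
        break
    alpha = 0.5 * np.log((1 - eps) / eps)  # this learner's weight in the final vote
    w *= np.exp(-alpha * y * pred)         # correct samples shrink, errors grow
    w /= w.sum()                           # renormalize to a distribution
```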

5. Advantages and disadvantages of the AdaBoost algorithm:

  Advantages:

   It can handle both continuous and discrete values; the model is relatively robust; and it is easy to interpret, with a simple structure.

  Disadvantages:

    It is sensitive to abnormal samples: outliers may receive very high weights during the iterations, which can ultimately distort the model's results.

6. The difference between the two

   AdaBoost:

     It increases the weights of the samples that were misclassified by the previous round's weak classifier, and decreases the weights of the samples that were classified correctly.

     In the weighted majority vote, weak classifiers with a small classification error rate get larger weights, so they play a bigger role in the vote; weak classifiers with a large error rate get smaller weights, so they play only a minor role.

   Random Forests:

     Random selection of training samples. Although every tree is trained on the same number of samples N as the full training set, each sample is drawn at random with replacement from the full set, so the training samples of each tree are almost never identical.

     Random feature selection. Suppose the training data has M features; each tree in the random forest selects only m (m < M) of them to construct its decision tree. The features selected by different trees need not be identical (see the sketch below).
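     A small NumPy sketch of the two sources of randomness just described; the sizes N, M, and m are illustrative:

```python
# Sketch of the two sources of randomness above; N, M, m are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, M, m = 8, 5, 2                                  # samples, features, m < M

for tree_id in range(3):
    rows = rng.integers(0, N, size=N)              # N rows, with replacement
    feats = rng.choice(M, size=m, replace=False)   # m distinct features
    print(f"tree {tree_id}: rows {rows}, features {feats}")
```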
