Data preprocessing: handling imbalanced samples (SMOTE oversampling)

I. General experience

1. If the imbalance reaches about 1:20, rebalancing is needed. In general, when recall on the minority class is low, you can rebalance to roughly 1:10.

2. If the imbalance is not severe, or the imbalance simply reflects the normal business distribution, no rebalancing is required.

3. For imbalanced multi-class problems, use oversampling only (experimental or competition data are usually oversampled; undersampling is generally avoided because it causes many problems).
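Before applying any of the rules above, you need to measure how imbalanced the data actually is. A minimal sketch (the function name and the 1:19 toy label set are hypothetical, chosen only to illustrate the thresholds mentioned above):

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the majority-class count to the minority-class count.

    Per the rules of thumb above: a ratio near 20:1 usually warrants
    rebalancing toward roughly 10:1, while a mild imbalance that
    reflects the real business distribution can be left alone.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical label set: 95 negatives, 5 positives -> ratio 19.0
y = [0] * 95 + [1] * 5
print(imbalance_ratio(y))  # 19.0, close to 1:20 -> consider rebalancing
```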

II. Treatment methods

1. Oversampling: increase the number of minority-class samples. Because the added samples duplicate the raw data, the model easily overfits.
2. Undersampling: reduce the number of majority-class samples. Important information in the majority class is easily lost, so the model easily underfits.
3. SMOTE (Synthetic Minority Over-sampling Technique): uses KNN to synthesize new minority samples between neighbors instead of copying originals, so the added samples are not real observations.
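The first two methods can be sketched in a few lines to make their trade-offs concrete (function names are hypothetical; only the standard library is used):

```python
import random

def random_oversample(majority, minority, seed=0):
    """Duplicate minority samples until the classes match.

    Risk (per point 1 above): overfitting, since the copies add
    no new information.
    """
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra

def random_undersample(majority, minority, seed=0):
    """Drop majority samples until the classes match.

    Risk (per point 2 above): underfitting, since the discarded
    rows may carry important information.
    """
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), minority
```

SMOTE (point 3) avoids the pure duplication of `random_oversample` by interpolating between neighboring minority points, as detailed in the next section.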

III. SMOTE (for binary classification models only)

SMOTE algorithm steps:
1. Randomly pick an observation point x from the minority class.
2. Find its k nearest minority-class neighbors with KNN.
3. Randomly select one of those neighbors, x_n.
4. Compute the difference x_n - x, multiply it by a random number in [0, 1], and add the result to x to obtain the synthetic sample. So randomness enters in two places here (the neighbor choice and the random multiplier), and linearity is reflected in the interpolation along the difference computed above.
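The four steps above can be sketched directly with NumPy (a minimal single-sample sketch, not the library implementation; the function name and the brute-force neighbor search are my own choices):

```python
import numpy as np

def smote_sample(X_min, k=5, rng=None):
    """Generate one synthetic minority sample following the four steps:
    1. pick a random minority observation x,
    2. find its k nearest minority-class neighbors,
    3. pick one neighbor x_n at random,
    4. interpolate: x_new = x + u * (x_n - x), with u ~ Uniform(0, 1).
    """
    rng = rng or np.random.default_rng(0)
    i = rng.integers(len(X_min))          # step 1: random minority point
    x = X_min[i]
    d = np.linalg.norm(X_min - x, axis=1)  # distances to all minority points
    d[i] = np.inf                          # exclude the point itself
    neighbors = np.argsort(d)[:k]          # step 2: k nearest neighbors
    x_n = X_min[rng.choice(neighbors)]     # step 3: random neighbor
    u = rng.random()                       # step 4: random interpolation
    return x + u * (x_n - x)
```

Because each synthetic point lies on the segment between two real minority points, it always falls inside the region already occupied by the minority class.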

# pip install imblearn

 


Origin www.cnblogs.com/jing-yan/p/12337912.html