What is imbalanced data
Imbalanced data takes a very simple form. Suppose there are apples and pears, and the data in your hands tells you that almost everyone in the world eats only pears.
If you then stop a random passerby and have to guess whether they ate an apple or a pear, any reasonable person would guess pear.
Predicting on imbalanced data is just as simple: always guessing the majority side is a safe bet, especially when that side, say red, accounts for 90% of the data.
Just by guessing red on every prediction, the accuracy already reaches a seemingly high 90%. The machine knows this little trick too, so by the end of training
it has gone astray and predicts the majority every time. There are several ways to address this, which we go through below.
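The 90% figure can be checked with a few lines of Python. This is only a minimal sketch: the 90/10 split follows the text, while the "blue" label for the minority side and the helper name majority_baseline_accuracy are assumptions made for illustration.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return the most common label and the accuracy of always predicting it."""
    majority_label, majority_count = Counter(labels).most_common(1)[0]
    return majority_label, majority_count / len(labels)


# Illustrative data: 90 "red" samples and 10 "blue" samples ("blue" is assumed).
labels = ["red"] * 90 + ["blue"] * 10

label, accuracy = majority_baseline_accuracy(labels)
print(label, accuracy)  # red 0.9
```

A model that scores close to this baseline has probably learned little beyond the class proportions.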
Get more data
First of all, ask whether we can get more data. Sometimes the data is imbalanced only because most of what was collected in an earlier period followed one trend,
while the trend in the later period is different. If the data from the later period is never collected, the overall prediction may not be very accurate.
Change the evaluation metric
Usually we judge the results of machine learning by accuracy or by the error (cost). With imbalanced data, however,
high accuracy and low error become much less meaningful, so we need a different way to evaluate. Use the confusion matrix to compute precision and recall,
then compute the F1 score from precision and recall. This approach handles imbalanced data properly and gives a more informative score.
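As a rough illustration, the sketch below computes precision, recall and the F1 score from confusion matrix counts for a model that always predicts pear. It rests on several assumptions not stated in the text: the 90/10 data split, treating the minority class (apple) as the positive class, and scoring a metric as 0.0 when its denominator is zero.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1; a metric with a zero denominator is scored 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative data: 90 pears, 10 apples, and a model that always says "pear".
y_true = ["pear"] * 90 + ["apple"] * 10
y_pred = ["pear"] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="apple")
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(accuracy, precision, recall, f1)  # 0.9 0.0 0.0 0.0
```

The accuracy looks good at 0.9, but the F1 score of 0.0 shows that the model never finds a single apple.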
Resample the data
The third method is the simplest and crudest: resample the imbalanced data so that it becomes balanced.
Option one: copy or synthesize samples of the minority class until there are about as many as in the majority class.
Option two: cut away part of the majority class until the two classes are about the same size.
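Both options can be sketched with the standard library alone. The function names, the fixed random seed and the 90/10 pear and apple data are assumptions chosen for illustration. Only copying is shown for option one; synthesizing new minority samples is not covered here.

```python
import random


def oversample_minority(majority, minority, rng):
    """Option one: add random copies of minority samples until sizes match."""
    missing = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(missing)]
    return majority, minority + extra


def undersample_majority(majority, minority, rng):
    """Option two: keep a random subset of the majority, as large as the minority."""
    return rng.sample(majority, len(minority)), minority


rng = random.Random(0)  # fixed seed so the sketch is repeatable
pears = [("pear", i) for i in range(90)]
apples = [("apple", i) for i in range(10)]

over_pears, over_apples = oversample_minority(pears, apples, rng)
under_pears, under_apples = undersample_majority(pears, apples, rng)

print(len(over_pears), len(over_apples))    # 90 90
print(len(under_pears), len(under_apples))  # 10 10
```

Copying keeps every original sample but repeats the minority ones, while cutting throws away majority samples that might have carried useful information.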
Other machine learning methods
Some machine learning methods, such as neural networks, usually struggle when faced with imbalanced data.
Other methods, such as decision trees, tend to be much less affected by imbalanced data.
Modify the algorithm
The last method is to get creative and try to modify the algorithm. If you are using a sigmoid activation function,
the prediction has a threshold: typically, if the output falls on one side of the threshold the prediction is pear, and if it falls on the other side
the prediction is apple. Because pears are now the majority, we have to adjust the position of the threshold, biasing it toward the apple side
so that the model predicts apple only when it is very confident. This helps the machine learn a better result.
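The effect of moving the threshold can be sketched as follows. It is assumed here that the sigmoid output is read as the model's confidence that the sample is an apple. The raw outputs and the stricter threshold of 0.9 are made-up values for illustration, not recommendations.

```python
import math


def sigmoid(z):
    """Squash a raw model output into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(z, threshold=0.5):
    """Predict "apple" when the sigmoid output reaches the threshold, else "pear"."""
    return "apple" if sigmoid(z) >= threshold else "pear"


# Hypothetical raw outputs of a model for four samples.
raw_outputs = [-2.0, 0.3, 1.0, 3.0]

default = [predict(z) for z in raw_outputs]
strict = [predict(z, threshold=0.9) for z in raw_outputs]

print(default)  # ['pear', 'apple', 'apple', 'apple']
print(strict)   # ['pear', 'pear', 'pear', 'apple']
```

With the stricter threshold only the most confident output is still labeled apple. In practice a threshold would usually be picked by comparing a score such as F1 on held-out data across several candidate values.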