1 data binning
Definition data binning technique Pandas given official: Bin values into discrete intervals, it refers to a discrete value divided intervals. Like apples of different sizes grouped into several boxes arranged in advance; people of different ages divided into several age groups.
This technique is useful when the data processing.
2 Examples
Let's look at an example
import numpy as np
import pandas as pd
ages = np.array([5,10,36,12,77,89,100,30,1]) #年龄数据
Now the data is divided into three sections, and marked with the old, young label. Pandas provides easy to use API, it is easy to implement.
pd.cut(ages, 3, labels=['青','中','老'])
The results are as follows, line of code will be realized.
[青, 青, 中, 青, 老, 老, 老, 青, 青]
In operation cut, the statistical minimum one-dimensional array, a maximum value, to obtain a length interval, as three sections need to be divided, it will give a uniform three sections, as follows.
pd.cut(ages, 3 )
>>>区间如下:
Categories (3, interval[float64]):
[(0.901, 34.0] < (34.0, 67.0] < (67.0, 100.0]]
A minimum value for a given data, right-left opening section is closed by default, so in order to include 1, the need to extend the leftmost section left 0.1% (total interval length), default precision three decimal places.
3 function prototype
After an initial understanding cut through the example above, then cut prototype analysis easier.
Parameters are as follows:
X : array-sliced data, attention must be one-dimensional;
bins : binning rule is simple to understand, that the tub. Support int scalar sequence;
right : Indicates whether the range contains the right border, included by default;
Labels : bins divided tagging;
retbins : indicate whether the bins after the split will return to the default does not return. As is True, then:
-----------------------------------------------------
注:我这有个学习基地,里面有很多学习资料,感兴趣的+Q群:895817687
-----------------------------------------------------
array([ 0.901, 34. , 67. , 100. ]))
include_lowest : left section is open or closed, is on by default;
Duplicates ; whether to allow repeat interval. raise: do not allow, drop: allowed.