Data binning in Python

1 Data binning

The official Pandas documentation defines binning as "Bin values into discrete intervals", that is, dividing values into discrete intervals. It is like sorting apples of different sizes into several boxes prepared in advance, or dividing people of different ages into several age groups.

This technique is very useful in data processing.
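As a minimal sketch of the age-group idea above, the snippet below bins a few ages using explicit bin edges. The sample ages, the edges (0, 18, 60, 120), and the label names are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

ages = np.array([3, 17, 18, 45, 60, 61, 99])  # hypothetical sample ages

# Explicit edges give the intervals (0, 18], (18, 60], (60, 120].
# Intervals are closed on the right by default, so 18 falls in the first one.
groups = pd.cut(ages, bins=[0, 18, 60, 120], labels=["child", "adult", "senior"])

print(list(groups))
```

Passing a list of edges instead of a bin count gives full control over where each group starts and ends.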

2 Examples

Let's look at an example:

import numpy as np
import pandas as pd
ages = np.array([5, 10, 36, 12, 77, 89, 100, 30, 1])  # age data

Now divide the data into three bins and label them young, middle-aged, and old. Pandas provides an easy-to-use API that makes this simple to implement.

pd.cut(ages, 3, labels=['young', 'middle', 'old'])

The result is as follows; a single line of code does the job.

[young, young, middle, young, old, old, old, young, young]
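To check how many values land in each labelled bin, the result can be wrapped in a Series and counted. This is a small sketch; the English label names are illustrative stand-ins for the three age groups.

```python
import numpy as np
import pandas as pd

ages = np.array([5, 10, 36, 12, 77, 89, 100, 30, 1])
labelled = pd.cut(ages, 3, labels=["young", "middle", "old"])

# Wrap in a Series to count values per label; sort_index keeps category order.
counts = pd.Series(labelled).value_counts().sort_index()
print(counts)
```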

When cut runs, it finds the minimum and maximum of the one-dimensional array to obtain the overall range. Because three bins were requested, it splits that range into three equal-width intervals, as follows.

pd.cut(ages, 3)
>>> The intervals are as follows:
Categories (3, interval[float64]):
[(0.901, 34.0] < (34.0, 67.0] < (67.0, 100.0]]

By default each interval is open on the left and closed on the right. The minimum value of the data is 1, so to include it the leftmost edge is extended to the left by 0.1% of the total range: 1 - 0.001 × (100 - 1) = 0.901. The default precision is three decimal places.
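The edge calculation described above can be reproduced by hand. This is a sketch of the arithmetic only, assuming the 0.1% adjustment is applied to the range max - min; it is not a copy of pandas' internal code, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ages = np.array([5, 10, 36, 12, 77, 89, 100, 30, 1])

lo, hi = ages.min(), ages.max()
# Three equal-width bins need four evenly spaced edges: 1, 34, 67, 100.
edges = np.linspace(lo, hi, 3 + 1)
# Shift the leftmost edge left by 0.1% of the range so the minimum is included.
edges[0] = lo - 0.001 * (hi - lo)

# Compare with the edges pandas itself reports via retbins=True.
_, pandas_edges = pd.cut(ages, 3, retbins=True)
print(edges)
print(np.allclose(edges, pandas_edges))
```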

3 Function prototype

With this initial understanding of cut from the example above, its prototype is easier to analyze.

pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise')

The parameters are as follows:
x : the array to be binned; note that it must be one-dimensional;
bins : the binning rule, in simple terms the buckets. Accepts an int or a sequence of scalars;
right : whether each interval includes its right edge; included by default;
labels : the labels to assign to the resulting bins;
retbins : whether to also return the bin edges after the split; not returned by default. If True, the edges are returned as well, for example:


    array([  0.901,  34.   ,  67.   , 100.   ])

include_lowest : whether the first interval is closed on the left; open by default (False);
duplicates : how to handle repeated bin edges. 'raise' (the default): not allowed, an error is raised; 'drop': allowed, the repeated edges are dropped.
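The parameters listed above can be tried side by side. The following is a hedged sketch using the same ages array; the explicit edges [1, 34, 67, 100] and the repeated edge in the last call are assumptions picked to make each option's effect visible.

```python
import numpy as np
import pandas as pd

ages = np.array([5, 10, 36, 12, 77, 89, 100, 30, 1])
edges = [1, 34, 67, 100]  # explicit bin edges, chosen for illustration

# Default: intervals are (1, 34], (34, 67], (67, 100]; the value 1 is left out.
default = pd.cut(ages, edges)
print(pd.isna(default).sum())  # number of values that fell outside every bin

# include_lowest=True closes the first interval on the left, so 1 is binned.
closed = pd.cut(ages, edges, include_lowest=True)

# right=False makes intervals [1, 34), [34, 67), [67, 100), so 100 is left out.
left_closed = pd.cut(ages, edges, right=False)

# retbins=True returns a (result, edges) tuple.
result, returned_edges = pd.cut(ages, 3, retbins=True)

# duplicates='drop' removes the repeated edge instead of raising ValueError.
deduped = pd.cut(ages, [1, 34, 34, 100], duplicates="drop", include_lowest=True)
print(deduped.categories)
```

Values that fall outside every interval come back as NaN, which is why the default and right=False calls each leave one value unbinned here.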

Origin blog.csdn.net/weixin_42625143/article/details/95063137