pd.cut
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False)
- x : The input array to bin, must be one-dimensional.
- bins : int or sequence of scalars
- If bins is an int, it defines the number of equal-width bins in the range of x. In this case the range of x is extended by 0.1% on each side so that the minimum and maximum values of x are included
- If bins is a sequence, it defines the bin edges, which allows non-uniform bin widths. In this case no extension of the range of x is performed
- right : bool, optional: determines which side of each interval is closed. If right == True (default), the bins [1, 2, 3, 4] indicate the intervals (1, 2], (2, 3], (3, 4]
- labels : array or boolean, default None: labels to use for the resulting bins. Must be the same length as the number of resulting bins. If False, only the integer indicators of the bins are returned
- retbins : bool, optional: whether to return the bin edges as well. Useful when bins is given as a scalar
- precision : int: the precision at which to store and display the bin labels; defaults to 3 decimal places
- include_lowest : bool: whether the first interval should be left-inclusive
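The effect of include_lowest and retbins described above can be seen in a small sketch. The sample values and edges here are made up for illustration and are not from the original example.

```python
import pandas as pd

# Illustrative data (an assumption, chosen so that one value sits on the lowest edge)
values = [18, 20, 25, 30, 35]
edges = [18, 25, 35]

# Default: intervals are (18, 25] and (25, 35], so 18 falls outside and becomes NaN
default = pd.cut(values, edges)

# include_lowest=True closes the first interval on the left, so 18 is kept
closed = pd.cut(values, edges, include_lowest=True)

# retbins=True also returns the bin edges; with an integer bin count this is
# how to inspect the edges that pandas computed
binned, computed_edges = pd.cut(values, 2, retbins=True)

print(default.codes)    # a code of -1 marks a value that fell outside every bin
print(closed.codes)
print(computed_edges)   # the lowest edge sits slightly below min(values)
```

With right=True, the lowest computed edge is pushed slightly below the minimum so that the minimum itself lands in the first bin.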
import pandas as pd

# Use pandas' cut function to divide ages into groups
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 32]
bins = [18, 25, 35, 60, 100]
cats = pd.cut(ages, bins)
print(cats)  # Values that fall outside every interval become NaN

# Count the number of values that fall in each interval
print(pd.Series(cats).value_counts())

# Use codes to get the integer bin index of each age
print(cats.codes)

# Set custom bin labels
group_names = ['Youth', 'YoungAdult', 'MiddleAged', 'Senior']
print(pd.cut(ages, bins, labels=group_names))

# Make the intervals closed on the left and open on the right
print(pd.cut(ages, bins, right=False))

# Pass a number of bins to cut; equal-width bins are computed from the minimum and maximum of the data
print(pd.cut(ages, 4, precision=2))  # precision=2 limits the labels to two decimal places
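The sample ages above all fall inside the bins, so the NaN behaviour never actually shows up. A minimal sketch with made-up ages (an assumption, not part of the original data) that also demonstrates labels=False:

```python
import pandas as pd

# Illustrative ages: 15 and 120 lie outside the range covered by the bins
ages = [15, 20, 40, 120]
bins = [18, 25, 35, 60, 100]

# labels=False returns the integer position of each bin instead of a Categorical
positions = pd.cut(ages, bins, labels=False)
print(positions)

# Values outside every interval are reported as NaN
out_of_range = pd.isna(positions)
print(out_of_range)
```

Because NaN cannot be stored in an integer array, the positions come back as floats whenever any value is out of range.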
pd.qcut
Similar to cut, but it bins the data according to sample quantiles, so each bin holds roughly the same number of values
pandas.qcut(x, q, labels=None, retbins=False, precision=3)
- x : ndarray or Series
- q : int or array of quantiles. The number of quantiles: 10 for deciles, 4 for quartiles; or an array of quantiles, such as [0, .25, .5, .75, 1.] for quartiles
- labels : array or boolean, default None: labels to use for the resulting bins. Must be the same length as the number of resulting bins. If False, only the integer indicators of the bins are returned
- retbins : bool, optional: whether to return the bin edges as well. Useful when q is given as an integer
- precision : int: the precision at which to store and display the bin labels
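As a small sketch of the labels and retbins parameters listed above, the following assigns a name to each quartile and retrieves the computed edges. The label names "Q1" to "Q4" are illustrative choices, not something qcut provides.

```python
import pandas as pd

data = list(range(1, 21))

# labels needs one entry per bin: 4 quantile bins, so 4 labels (names are illustrative)
quartile, edges = pd.qcut(data, 4, labels=["Q1", "Q2", "Q3", "Q4"], retbins=True)

# Each quartile receives the same number of values
print(quartile.value_counts())

# The edges are the quantile values computed from the data
print(edges)
```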
import numpy as np
import pandas as pd

# qcut divides the data into bins according to the sample quantiles
# data = np.random.randn(20)  # normally distributed alternative
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
cats = pd.qcut(data, 4)  # Cut into quartiles
print(cats)
print(pd.Series(cats).value_counts())
print("------------------------------------------------")

# Bin by explicit quantiles (values between 0 and 1, endpoints included)
cats_2 = pd.qcut(data, [0, 0.5, 0.8, 0.9, 1])
print(cats_2)
print(pd.Series(cats_2).value_counts())
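To make the difference between the two functions concrete, here is a minimal sketch on skewed data. The data set is made up for illustration, with a single large outlier.

```python
import pandas as pd

# Illustrative skewed data: nine small values and one outlier
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

# cut: equal-width bins. The outlier stretches the range, so most values share one bin
by_width = pd.Series(pd.cut(skewed, 2)).value_counts(sort=False)

# qcut: equal-frequency bins. Each bin holds about the same number of values
by_quantile = pd.Series(pd.qcut(skewed, 2)).value_counts(sort=False)

print(by_width)
print(by_quantile)
```

Equal-width bins are easier to interpret when the values are spread fairly evenly, while quantile bins are usually the better fit when the distribution is skewed and balanced group sizes matter.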