Discretization and binning in pandas

pd.cut

pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False)

 

 

  • x : the input array to bin; must be one-dimensional.
  • bins : int or sequence of scalars
  1. If bins is an int, it defines the number of equal-width bins over the range of x. In this case the range of x is extended by 0.1% on each side so that the minimum and maximum values of x are included.
  2. If bins is a sequence, it defines the bin edges, which allows non-uniform bin widths. In this case the range of x is not extended.
  • right : bool, optional: whether the bins include their rightmost edge. If right == True (the default), the bins [1, 2, 3, 4] produce the intervals (1, 2], (2, 3], (3, 4].
  • labels : array or boolean, default None: labels for the returned bins. Must be the same length as the number of resulting bins. If False, only the integer indicators of the bins are returned.
  • retbins : bool, optional: whether to also return the bin edges. Useful when bins is given as a scalar.
  • precision : int: the precision at which to store and display the bin labels; the default is 3.
  • include_lowest : bool: whether the first interval should be left-inclusive.
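To make the effect of right and include_lowest concrete, here is a minimal sketch that reuses the edges [1, 2, 3, 4] from the parameter description. The variable names are illustrative only and are not part of the pandas API.

```python
import pandas as pd

values = [1, 2, 3, 4]
edges = [1, 2, 3, 4]

# Default (right=True): intervals are (1, 2], (2, 3], (3, 4], so 1 is in no bin.
right_closed = pd.cut(values, edges)

# right=False: intervals are [1, 2), [2, 3), [3, 4), so 4 is in no bin.
left_closed = pd.cut(values, edges, right=False)

# include_lowest=True closes the first interval on the left, so 1 is binned.
lowest_included = pd.cut(values, edges, include_lowest=True)

# A code of -1 marks a value that fell outside every interval (NaN).
print(list(right_closed.codes))     # [-1, 0, 1, 2]
print(list(left_closed.codes))      # [0, 1, 2, -1]
print(list(lowest_included.codes))  # [0, 0, 1, 2]
```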
import pandas as pd

# Use pandas' cut function to divide the ages into groups
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 32]
bins = [18, 25, 35, 60, 100]
cats = pd.cut(ages, bins)
print(cats)  # Values that fall outside every interval become NaN

# Count the number of values that fall in each interval
# (pd.value_counts is deprecated in recent pandas releases)
print(pd.Series(cats).value_counts())

# Use codes to get the integer bin label of each age
print(cats.codes)

# Set custom bin names
group_names = ['Youth', 'YoungAdult', 'MiddleAged', 'Senior']
print(pd.cut(ages, bins, labels=group_names))

# Make the intervals closed on the left and open on the right
print(pd.cut(ages, bins, right=False))

# Pass a number of bins to cut; equal-width bins are computed from the minimum and maximum of the data
print(pd.cut(ages, 4, precision=2))  # precision=2 limits the bin edges to two decimal places
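The labels=False and retbins options are not exercised in the example above. The following sketch, which reuses the same ages and bin edges, shows one way they might be used; the names codes, cats and edges are illustrative.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 32]

# labels=False returns the integer bin index of each value instead of interval labels.
codes = pd.cut(ages, [18, 25, 35, 60, 100], labels=False)
print(list(codes))  # [0, 0, 0, 1, 0, 0, 2, 1, 3, 2, 1]

# retbins=True additionally returns the bin edges, which is handy when
# pandas computed them from an integer number of bins.
cats, edges = pd.cut(ages, 4, retbins=True)
print(edges)

# With right=True the lowest edge sits slightly below min(ages),
# so the minimum value still lands inside the first (left-open) interval.
print(edges[0] < min(ages))
```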

pd.qcut

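Similar to cut, but the bin edges are chosen from sample quantiles rather than from fixed values or equal widths. With an integer q, each bin receives roughly the same number of values.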

 

pandas.qcut(x, q, labels=None, retbins=False, precision=3)  

 

  • x : ndarray or Series
  • q : int or array of quantiles: the number of quantiles (10 for deciles, 4 for quartiles), or an array of quantiles such as [0, .25, .5, .75, 1.] for quartiles.
  • labels : array or boolean, default None: labels for the returned bins. Must be the same length as the number of resulting bins. If False, only the integer indicators of the bins are returned.
  • retbins : bool, optional: whether to also return the bin edges. Useful when q is given as an integer.
  • precision : int: the precision at which to store and display the bin labels.

 

 

import numpy as np
import pandas as pd

# qcut divides the data into bins according to sample quantiles
# data = np.random.randn(20)  # normally distributed sample
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
cats = pd.qcut(data, 4)  # Cut into quartiles
print(cats)
print(pd.Series(cats).value_counts())
print("-" * 48)

# Bin by explicitly specified quantiles (values between 0 and 1, endpoints included)
cats_2 = pd.qcut(data, [0, 0.5, 0.8, 0.9, 1])
print(cats_2)
print(pd.Series(cats_2).value_counts())
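The difference between cut and qcut is easiest to see on skewed data. The sketch below uses a small made-up sample with one outlier (the data and variable names are assumptions for illustration) and compares equal-width bins with quantile bins.

```python
import pandas as pd

# Hypothetical sample with a single large outlier
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

# cut: two equal-width bins over the range 1..100, so nearly everything lands in the first bin.
equal_width = pd.cut(data, 2, labels=False)
print(list(equal_width))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# qcut: two bins split at the median, so each bin holds half of the values.
equal_count = pd.qcut(data, 2, labels=False)
print(list(equal_count))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# labels and retbins work for qcut as they do for cut.
labeled, q_edges = pd.qcut(data, 2, labels=["low", "high"], retbins=True)
print(list(labeled))
print(q_edges)
```

Equal-width bins follow the range of the data, while quantile bins follow its distribution, which is why the outlier dominates the first result but not the second.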

 

