pandas cut() function

Foreword:

In pandas, the `cut()` function divides continuous numerical data into discrete intervals (bins). Each value is placed in the interval it falls into and receives that interval's label. The signature of `cut()` is as follows:

```python
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)
```

Parameter description:
- `x`: the numerical data to be divided. It must be one-dimensional, for example a DataFrame column, a Series, or an array.
- `bins`: specifies how the intervals are formed. It can be an integer, giving the number of equal-width intervals to create over the range of `x`; it can also be a sequence of numbers, giving custom interval boundaries.
- `labels`: optional parameter, used to specify the label of each interval. The number of labels must equal the number of intervals.
- `right`: optional parameter, specifies whether each interval includes its right boundary. The default is True, meaning right-closed intervals such as (0, 30].
- `include_lowest`: optional parameter, specifies whether the first interval also includes its left boundary. The default is False, so a value equal to the lowest boundary is not assigned to any interval.
- `precision`: optional parameter, specifies the number of decimal places used when storing and displaying the interval labels. The default is 3.
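
The effect of `bins`, `right`, and `include_lowest` is easiest to see side by side. The following is a minimal sketch; the sample ages and the bin edges are illustrative assumptions, and the results are compared through the integer category codes rather than the printed interval labels.

```python
import pandas as pd

# Illustrative sample data (an assumption, not part of the original example)
ages = pd.Series([25, 30, 35, 40, 45])

# Integer bins: split the observed range of the data into 2 equal-width intervals.
equal_width = pd.cut(ages, bins=2)

# Explicit edges with the default right=True: (20, 30], (30, 40], (40, 50].
right_closed = pd.cut(ages, bins=[20, 30, 40, 50])

# right=False: the intervals become [20, 30), [30, 40), [40, 50).
left_closed = pd.cut(ages, bins=[20, 30, 40, 50], right=False)

# With the lowest edge equal to the smallest value (25), the default leaves
# that value unassigned (NaN); include_lowest=True keeps it in the first interval.
without_lowest = pd.cut(ages, bins=[25, 35, 45])
with_lowest = pd.cut(ages, bins=[25, 35, 45], include_lowest=True)

print(right_closed.cat.codes.tolist())   # [0, 0, 1, 1, 2]
print(left_closed.cat.codes.tolist())    # [0, 1, 1, 2, 2]
print(without_lowest.isna().tolist())    # [True, False, False, False, False]
print(with_lowest.isna().tolist())       # [False, False, False, False, False]
```

Note how the boundary values 30 and 40 move to the next interval when `right=False` is used.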

Example:

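```python
import pandas as pd

# Create a sample DataFrame
data = {'Age': [25, 30, 35, 40, 45]}
df = pd.DataFrame(data)

# Use cut() to divide the ages into intervals and assign labels.
# right=False makes the intervals left-closed, [0, 30), [30, 40), [40, 50),
# so the boundary ages 30 and 40 fall into '30-40' and '40+'.
df['AgeGroup'] = pd.cut(df['Age'], bins=[0, 30, 40, 50], right=False, labels=['<30', '30-40', '40+'])

print(df)
```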

Output:
```
   Age AgeGroup
0   25      <30
1   30    30-40
2   35    30-40
3   40      40+
4   45      40+
```

In the example above, `cut()` divides the age data into three intervals labeled "<30", "30-40", and "40+", and each row receives the label of the interval its age falls into.
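
As a follow-up, the grouped column can be used for counting, and `labels=False` can be passed when integer bin positions are preferred over text labels. This is a sketch under the assumption of left-closed intervals (`right=False`); the column name `AgeBin` and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35, 40, 45]})
edges = [0, 30, 40, 50]

# Left-closed intervals [0, 30), [30, 40), [40, 50) with readable labels.
df['AgeGroup'] = pd.cut(df['Age'], bins=edges, right=False,
                        labels=['<30', '30-40', '40+'])

# labels=False returns the integer position of each interval instead of a label.
df['AgeBin'] = pd.cut(df['Age'], bins=edges, right=False, labels=False)

# Count how many rows fall into each age group.
group_counts = df['AgeGroup'].value_counts(sort=False)

print(df)
print(group_counts)
```

Here `AgeBin` holds 0, 1, 1, 2, 2 for the five rows, and the counts are 1, 2, and 2 for the three groups.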

Origin blog.csdn.net/m0_69097184/article/details/131905363