Foreword:
In pandas, the `cut()` function bins continuous numerical data into discrete intervals. Each value is assigned to the interval it falls into and can be given a corresponding interval label. The function signature is as follows:
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)
Parameter description:
- `x`: the one-dimensional numerical data to bin, such as a DataFrame column (a Series) or an array.
- `bins`: the bin boundaries. An integer gives the number of equal-width bins spanning the range of `x`; a sequence gives custom bin edges.
- `labels`: optional parameter, the labels for the resulting bins. There must be one label per bin. If omitted, the intervals themselves are used as labels.
- `right`: optional parameter, specifies whether each interval includes its right edge. The default is True, which gives right-closed intervals such as (0, 30].
- `include_lowest`: optional parameter, specifies whether the first interval also includes its left edge. The default is False, so with explicit bin edges and `right=True`, a value equal to the lowest edge falls outside every bin and becomes NaN.
- `precision`: optional parameter, the number of decimal places used when storing and displaying the bin labels. The default is 3.
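To make the parameter descriptions above more concrete, here is a minimal sketch. The sample values and variable names are made up for illustration only; the sketch shows an integer `bins`, explicit edges with and without `include_lowest`, and `labels=False`.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
scores = pd.Series([0, 10, 50, 100])

# Integer bins: the range of the data is split into 4 equal-width intervals.
equal_width = pd.cut(scores, bins=4)
n_equal = len(equal_width.cat.categories)

# Explicit edges with the default right=True: intervals are (0, 50] and (50, 100],
# so the value 0 falls outside every interval and becomes NaN.
default_edges = pd.cut(scores, bins=[0, 50, 100])

# include_lowest=True closes the first interval on the left,
# so 0 is now assigned to the first bin.
with_lowest = pd.cut(scores, bins=[0, 50, 100], include_lowest=True)

# labels=False returns integer bin positions instead of interval labels.
codes = pd.cut(scores, bins=[0, 50, 100], include_lowest=True, labels=False)

print(default_edges.isna().tolist())  # [True, False, False, False]
print(codes.tolist())                 # [0, 0, 0, 1]
```

If a value unexpectedly comes back as NaN, checking whether it sits exactly on the lowest edge is a reasonable first step.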
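Example:
```python
import pandas as pd

# Create a sample DataFrame
data = {'Age': [25, 30, 35, 40, 45]}
df = pd.DataFrame(data)

# Use cut() to bin the ages and assign labels.
# right=False makes the bins left-closed: [0, 30), [30, 40), [40, 50),
# so 30 falls in '30-40' and 40 falls in '40+'.
df['AgeGroup'] = pd.cut(df['Age'], bins=[0, 30, 40, 50], labels=['<30', '30-40', '40+'], right=False)
print(df)
```
Output:
```
   Age AgeGroup
0   25      <30
1   30    30-40
2   35    30-40
3   40      40+
4   45      40+
```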
In the above example, we use the `cut()` function to divide the age data into three intervals labeled "<30", "30-40", and "40+".
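Values that sit exactly on a bin edge, such as 30 and 40 here, are worth checking by hand. The following sketch reuses the ages and edges from the example and compares `right=True` with `right=False`. It leaves `labels` unset so that the interval notation stays visible; the names `ages`, `edges`, and `comparison` are introduced only for this illustration.

```python
import pandas as pd

ages = pd.Series([25, 30, 35, 40, 45])
edges = [0, 30, 40, 50]

# right=True (the default): intervals are (0, 30], (30, 40], (40, 50].
right_closed = pd.cut(ages, bins=edges)

# right=False: intervals are [0, 30), [30, 40), [40, 50).
left_closed = pd.cut(ages, bins=edges, right=False)

# Show both assignments side by side, with intervals rendered as text.
comparison = pd.DataFrame({
    "Age": ages,
    "right=True": right_closed.astype(str),
    "right=False": left_closed.astype(str),
})
print(comparison)
```

With the default, 30 lands in (0, 30] and 40 in (30, 40]. With `right=False`, they move to [30, 40) and [40, 50). Labels such as '<30' describe the data accurately only when they match the closure that is actually in effect.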