pandas: data discretization

#!/usr/bin/env python
# coding: utf-8
# # Lesson 3: Analyzing categorical data with pandas
# ## Section 1: Data discretization and binning

# In[1]:
import pandas as pd
# * pandas.cut()
# In[2]:
# Create the data
df = pd.DataFrame({
    'Name': ['George', 'Andrea', 'micheal', 'maggie', 'Ravi', 'Xien', 'Jalpa', 'Tyieren'],
    'Score': [63, 48, 56, 75, 32, 77, 85, 22],
})
# In[3]:
# Bin the scores; bins is an integer, giving that many equal-width bins
pd.cut(df['Score'], bins=3)
# In[4]:
# Bin the scores; bins is a list of bin edges
pd.cut(df['Score'], bins=[0, 25, 50, 75, 100])
# In[5]:
# Bin the scores and name the bins with labels (lowest bin first)
pd.cut(df['Score'], bins=3, labels=['C', 'B', 'A'])
# In[6]:
# Attach the result to the DataFrame as a new column
df['Level'] = pd.cut(df['Score'], bins=3, labels=['C', 'B', 'A'])
df
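# In[7]:
# Mean score per level; select 'Score' because the string column 'Name' cannot be averaged
df.groupby('Level', observed=False)['Score'].mean()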
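
To see what an integer bins argument does, the sketch below reruns the bins=3 case on the same scores and asks pandas for the edges it computed, using retbins=True. The names scores, binned, edges and counts are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same scores as in the notebook above
scores = pd.Series([63, 48, 56, 75, 32, 77, 85, 22], name="Score")

# retbins=True returns the binned values together with the computed bin edges
binned, edges = pd.cut(scores, bins=3, retbins=True)

# Equal-width bins: each one spans (max - min) / 3 = (85 - 22) / 3 = 21 points
print(edges)

# Number of scores per bin, ordered from the lowest interval to the highest
counts = binned.value_counts().sort_index()
print(counts)
```

With these scores the range from 22 to 85 is split into three bins of width 21, and pandas moves the lowest edge slightly below 22 so that the minimum is included. The bins are equal in width, not in the number of rows they hold, so the labels C, B, A mark the low, middle and high thirds of the score range.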

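Exercise: with bins = [1, 2, 3, 4, 5], the result of pd.cut([0, 1, 1.5, 2.5, 3.5], bins) is:

[NaN, NaN, (1, 2], (2, 3], (3, 4]]

The intervals are closed on the right by default, so 0 lies below the first edge and 1 sits on the open left end of (1, 2]; neither falls into a bin, and both become NaN.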
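
The boundary behaviour in this exercise can be checked with a short sketch that compares the default call with two documented pd.cut options, include_lowest and right. The variable names are illustrative; the category codes are printed because -1 is how a Categorical marks a value that received no bin (NaN).

```python
import pandas as pd

values = [0, 1, 1.5, 2.5, 3.5]
bins = [1, 2, 3, 4, 5]

# Default: right-closed intervals (a, b], so 0 and 1 both fall outside (1, 2]
default = pd.cut(values, bins)

# include_lowest=True closes the first interval on the left, so 1 gets a bin
with_lowest = pd.cut(values, bins, include_lowest=True)

# right=False switches to left-closed intervals [a, b), so 1 lands in [1, 2)
left_closed = pd.cut(values, bins, right=False)

# Passing a list returns a Categorical; its codes are the bin indices
print(default.codes)
print(with_lowest.codes)
print(left_closed.codes)
```

In both variants 0 still has no bin, because it lies below the first edge; only the value sitting exactly on the edge 1 changes.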

Origin blog.csdn.net/lildn/article/details/115015214