Data analysis | 200 Pandas practice questions, 10 per day: work through them all and you will become a master (3)

1. Read this dataset

import pandas as pd

# Read the local dataset
# Send me a private message if you want the dataset and I will send it to you
df = pd.read_excel('data1.xlsx')
df

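If you do not have data1.xlsx, a small stand-in DataFrame is enough to follow along. The sketch below is an assumption, not the real dataset: the column names createTime, education and salary are the ones used in the questions of this article, and every value is made up.

```python
import pandas as pd

# Hypothetical stand-in for data1.xlsx; all values are invented for practice.
df = pd.DataFrame({
    'createTime': pd.to_datetime([
        '2020-03-16 11:30:18',
        '2020-03-16 10:58:48',
        '2020-03-17 09:12:05',
        '2020-03-18 15:40:00',
    ]),
    'education': ['Bachelor', 'Master', 'Bachelor', 'Master'],
    'salary': ['20k-35k', '20k-40k', '4k-6k', '8k-10k'],
})
print(df)
```

The salary values are kept as strings such as '20k-35k' because question 3 parses that format.
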

2. View the first 5 rows of the data

Use the head() function to view the first few rows of the data. You can pass in the number of rows you want; the default is 5.

# View the first 5 rows of the data
df.head()

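As a quick illustration of the row-count argument, here is a minimal sketch on a made-up ten-row DataFrame (the data is an assumption, not the article's dataset). tail() is the matching function for the last rows.

```python
import pandas as pd

# Made-up data: ten rows numbered 0 to 9.
df = pd.DataFrame({'salary': range(10)})

first_five = df.head()    # no argument, so 5 rows
first_three = df.head(3)  # first 3 rows
last_two = df.tail(2)     # last 2 rows

print(len(first_five), len(first_three), len(last_two))  # 5 3 2
```
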

3. Convert the data in the salary column to the average of the maximum and minimum values

Both map and apply take a function and call it on each element of the column. Neither changes the original data in place; each returns a new Series, so assign the result back to the column if you want to keep it.

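# Convert the salary column to the average of its maximum and minimum values
# Method 1: use the map function
def fun(x):
    # x is a salary range string; split it into the lower and upper bounds
    a,b = x.split('-')
    a = int(a.strip('k'))*1000
    b = int(b.strip('k'))*1000
    return int((a+b)/2)
df['salary'].map(fun)

# Method 2: use the apply function and write the result back to the column
df['salary'] = df['salary'].apply(fun)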
df

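To see that map returns a new object and leaves the original alone, here is a small sketch on a made-up Series. The values '20k-35k' and '4k-6k' are assumptions about the salary format, inferred from how fun parses its input.

```python
import pandas as pd

def fun(x):
    # Split the range into its two bounds and average them.
    a, b = x.split('-')
    a = int(a.strip('k')) * 1000
    b = int(b.strip('k')) * 1000
    return int((a + b) / 2)

# Made-up values in the assumed 'Nk-Mk' format.
s = pd.Series(['20k-35k', '4k-6k'], name='salary')
converted = s.map(fun)

print(converted.tolist())  # [27500, 5000]
print(s.tolist())          # ['20k-35k', '4k-6k'], the original is unchanged
```

Once df['salary'] has been overwritten with integers, running fun on the column again raises an AttributeError, because integers have no split method. Reload the data before repeating this question.
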
4. Group the data by education and calculate the mean

Use the groupby() function to group the data

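# Group the data by education and calculate the mean
# numeric_only=True skips non-numeric columns; string columns would otherwise raise a TypeError in pandas 2.0 and later
df.groupby('education').mean(numeric_only=True)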

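If only one column is of interest, you can select it before aggregating. The sketch below uses made-up data (the education labels and salaries are assumptions) and shows both a single mean and several statistics at once.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'education': ['Bachelor', 'Master', 'Bachelor', 'Master'],
    'salary': [10000, 30000, 20000, 40000],
})

# Mean salary per education level.
mean_salary = df.groupby('education')['salary'].mean()
print(mean_salary)

# Several statistics per group in one call.
summary = df.groupby('education')['salary'].agg(['mean', 'count'])
print(summary)
```
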

5. Convert the createTime column to month-day format

# Convert the createTime column to month-day strings
# Vectorized with the .dt accessor; assumes createTime was read as a datetime column
df['createTime'] = df['createTime'].dt.strftime('%m-%d')

df.head()


6. View the index, data type and memory information

Use the info() function

# View the index, data types and memory information
df.info()

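The pieces that info() prints can also be fetched separately. A minimal sketch on made-up data (the columns and values are assumptions): dtypes gives the type of each column, and memory_usage(deep=True) measures the real size of string columns instead of estimating it.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'education': ['Bachelor', 'Master', 'Bachelor'],
    'salary': [10000, 30000, 20000],
})

print(df.dtypes)                    # data type of each column
usage = df.memory_usage(deep=True)  # bytes used by the index and each column
print(usage)

df.info(memory_usage='deep')        # same report as info(), with exact memory
```
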

7. View summary statistics for numeric columns

describe() returns the count, mean, standard deviation, minimum, the 25%, 50% and 75% quantiles, and the maximum of each numeric column

# View summary statistics for numeric columns
df.describe()

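The statistics come back as an ordinary DataFrame, so single values can be read out of it. A short sketch on made-up salaries (the numbers are assumptions):

```python
import pandas as pd

# Made-up salaries for illustration only.
df = pd.DataFrame({'salary': [5000, 15000, 25000, 35000]})

stats = df.describe()
print(stats.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

print(stats.loc['mean', 'salary'])  # 20000.0
print(stats.loc['max', 'salary'])   # 35000.0
```
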

8. Add a new column to divide the data into three groups according to salary

# Add a new column that splits the data into three groups by salary and labels each group
bins = [0,5000,20000,50000]
group_names = ['low','medium','high']

df['categories'] = pd.cut(df['salary'],bins,labels=group_names)
df

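It helps to know where the bin edges fall. By default pd.cut builds intervals that are open on the left and closed on the right, so these bins mean (0, 5000], (5000, 20000] and (20000, 50000], and anything outside them becomes NaN. The sketch below checks this on made-up salaries chosen to sit on the edges; the English labels are placeholders.

```python
import pandas as pd

bins = [0, 5000, 20000, 50000]
labels = ['low', 'medium', 'high']  # placeholder labels

# Made-up values picked to land on or outside the bin edges.
salary = pd.Series([0, 5000, 5001, 20000, 60000])
groups = pd.cut(salary, bins, labels=labels)

print(groups.tolist())  # [nan, 'low', 'medium', 'medium', nan]
```

A salary of exactly 0 is not in any bin because the first interval is open on the left; pass include_lowest=True to pd.cut if it should count as the lowest group.
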

9. Sort the data in descending order by the salary column

sort_values() sorts in ascending order by default

# Sort the data by the salary column in descending order
# ascending=False: descending
# ascending=True: ascending

df.sort_values('salary',ascending=False)

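Like map and apply, sort_values returns a new DataFrame and leaves df as it was. The sketch below shows this on made-up salaries (the values are assumptions); note that the sorted copy keeps the original index labels.

```python
import pandas as pd

# Made-up salaries for illustration only.
df = pd.DataFrame({'salary': [5000, 30000, 12000]})

sorted_df = df.sort_values('salary', ascending=False)

print(df['salary'].tolist())         # [5000, 30000, 12000], df is unchanged
print(sorted_df['salary'].tolist())  # [30000, 12000, 5000]
print(sorted_df.index.tolist())      # [1, 2, 0], the old labels travel with the rows
```

To keep the sorted order, assign the result to a variable; add reset_index(drop=True) if you also want the index renumbered from 0.
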

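10. Select the 33rd row

The default index starts from 0, so the 33rd row has index label 32. loc selects by label, which matches the row position here only because df still has its default integer index.

# Select the 33rd row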
df.loc[32]

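The difference between label and position shows up as soon as the rows are reordered. A minimal sketch on made-up salaries (the values are assumptions): loc looks up the index label, iloc looks up the position.

```python
import pandas as pd

# Made-up salaries for illustration only.
df = pd.DataFrame({'salary': [5000, 30000, 12000]})
sorted_df = df.sort_values('salary', ascending=False)

by_label = sorted_df.loc[0, 'salary']  # row whose index label is 0
by_position = sorted_df.iloc[0, 0]     # first row in the current order

print(by_label)     # 5000
print(by_position)  # 30000
```
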

Today's 10 questions cover quite a lot: groupby, describe, cut, sort_values, info, etc. If you want to understand them all, these 10 questions alone are far from enough. I hope you can find some extra questions to practice on, or follow this blog's other articles for more exercises ✨✨✨

I recommend Niuke.com for practice; going straight there is the quickest way to start.




Origin blog.csdn.net/qq_52007481/article/details/127559191