Table of contents
- 1. Read the dataset
- 2. View the first 5 rows of the data
- 3. Convert the data in the salary column to the average of the maximum and minimum values
- 4. Group the data by education level and calculate the mean
- 5. Convert the createTime column to month-day format
- 6. View the index, data types and memory information
- 7. View summary statistics for numeric columns
- 8. Add a new column that divides the data into three groups by salary
- 9. Sort the data in descending order by the salary column
- 10. Select row 33 of the data
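1. Read the dataset
import pandas as pd
# Read the local dataset (reading .xlsx files requires the openpyxl package)
# Message me privately for the dataset and I will send it to you; you can also …
df = pd.read_excel('data1.xlsx')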
df
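If you don't have data1.xlsx, a small stand-in frame lets you follow along. This is a minimal sketch: the rows are made up, and the column names (createTime, education, salary) are only assumed from the steps below, so the real file may contain more columns.

```python
import pandas as pd

# Hypothetical toy data standing in for data1.xlsx.
# Column names are assumed from the later steps; the values are invented.
df = pd.DataFrame({
    "createTime": pd.to_datetime(["2020-03-16", "2020-03-16", "2020-03-17", "2020-03-18"]),
    "education": ["bachelor", "master", "bachelor", "associate"],
    "salary": ["10k-20k", "20k-30k", "4k-6k", "8k-12k"],
})
print(df.dtypes)
```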
2. View the first 5 rows of the data
Use head() to view the first few rows of the data. You can pass the number of rows you want; the default is 5.
# View the first 5 rows of the data
df.head()
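A short sketch of head() with and without an explicit row count, plus its counterpart tail(). The 10-row frame is hypothetical and exists only to make the row counts visible.

```python
import pandas as pd

# Hypothetical 10-row frame used only to show how head() behaves.
df = pd.DataFrame({"salary": range(1000, 11000, 1000)})

first_five = df.head()    # default n=5
first_three = df.head(3)  # pass n explicitly
last_two = df.tail(2)     # tail() returns the last rows instead
print(first_three)
```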
3. Convert the data in the salary column to the average of the maximum and minimum values
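Series.map() and Series.apply() both accept a function and call it on every value in the column. Neither changes the original data in place: each returns a new Series, so the result has to be assigned back to the column to keep it.
# Convert each salary range to the average of its minimum and maximum
# Method 1: use map
def fun(x):
    # Split a range such as '10k-20k' into its lower and upper bounds
    a, b = x.split('-')
    a = int(a.strip('k')) * 1000
    b = int(b.strip('k')) * 1000
    # Return the midpoint as an integer
    return int((a + b) / 2)
df['salary'].map(fun)  # returns a new Series; df is not modified
# Method 2: use apply, and assign the result back to the column
df['salary'] = df['salary'].apply(fun)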
df
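To check the conversion in isolation, the sketch below runs the same fun on a few sample strings. It assumes salaries look like "10k-20k" (lowercase k, one hyphen), which is the format fun expects; the sample values are hypothetical. It also confirms that map and apply leave the original Series untouched.

```python
import pandas as pd

def fun(x):
    """Turn a range like '10k-20k' into its midpoint in plain units."""
    a, b = x.split('-')
    a = int(a.strip('k')) * 1000
    b = int(b.strip('k')) * 1000
    return int((a + b) / 2)

# Hypothetical salary strings in the assumed "<min>k-<max>k" format
s = pd.Series(["10k-20k", "4k-6k", "15k-25k"])
mapped = s.map(fun)     # new Series
applied = s.apply(fun)  # same result for a plain function
print(mapped.tolist())  # [15000, 5000, 20000]
print(s.tolist())       # the original strings are unchanged
```

An uppercase value such as "10K-20K" would make int() fail, so normalizing with x.lower() first is one option if the data is inconsistent.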
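4. Group the data by education level and calculate the mean
Group the rows with the groupby() function, then average each group.
# Group the data by education and calculate the mean
# numeric_only=True limits the average to numeric columns; pandas 2.0+ raises a TypeError on text columns otherwise
df.groupby('education').mean(numeric_only=True)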
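A minimal sketch of the grouping on a hypothetical frame that mixes a text column (city, invented here) with numeric salaries. It shows the numeric_only=True form and the equivalent approach of selecting the salary column before averaging.

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column.
df = pd.DataFrame({
    "education": ["bachelor", "master", "bachelor", "master"],
    "city": ["A", "B", "C", "D"],
    "salary": [10000, 20000, 14000, 30000],
})

# numeric_only=True drops the text column "city" before averaging.
by_edu = df.groupby("education").mean(numeric_only=True)

# Selecting the column first is a more explicit alternative.
salary_mean = df.groupby("education")["salary"].mean()
print(salary_mean)
```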
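5. Convert the createTime column to month-day format
# Convert the createTime column to month-day (MM-DD) strings
# Assumes createTime holds datetimes; the vectorized .dt accessor replaces a row-by-row loop
df['createTime'] = df['createTime'].dt.strftime('%m-%d')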
df.head()
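The .dt accessor only works on datetime columns. If createTime is read as plain text, which depends on how the Excel cells are stored, it can be converted first with pd.to_datetime. The sketch below assumes timestamps in "YYYY-MM-DD HH:MM:SS" form; the two values are hypothetical.

```python
import pandas as pd

# Hypothetical case: createTime was loaded as strings, not datetimes.
df = pd.DataFrame({"createTime": ["2020-03-16 11:30:18", "2020-12-01 09:05:00"]})

# Parse to datetime first, then format as month-day strings.
df["createTime"] = pd.to_datetime(df["createTime"])
df["createTime"] = df["createTime"].dt.strftime("%m-%d")
print(df["createTime"].tolist())  # ['03-16', '12-01']
```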
6. View the index, data types and memory information
Use the info() function.
# View the index, data types and memory usage
df.info()
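By default info() estimates memory without looking inside text values. The sketch below, using a small hypothetical frame, shows the deeper measurement and memory_usage(), which returns the per-column byte counts as a Series rather than printing them.

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column.
df = pd.DataFrame({
    "education": ["bachelor", "master", "bachelor"],
    "salary": [10000, 20000, 14000],
})

# info() prints its report and returns None; "deep" also counts string contents.
result = df.info(memory_usage="deep")

# memory_usage() gives the same figures as a Series (the index is listed too).
usage = df.memory_usage(deep=True)
print(usage)
```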
7. View summary statistics for numeric columns
describe() returns, for each numeric column, the count, mean, standard deviation, minimum, 25%, 50% and 75% quantiles, and maximum.
# View summary statistics for numeric columns
df.describe()
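A quick sketch, on hypothetical data, showing which rows describe() produces and how include="all" brings text columns into the summary as well.

```python
import pandas as pd

# Hypothetical data: one text column, one numeric column.
df = pd.DataFrame({
    "education": ["bachelor", "master", "bachelor", "master"],
    "salary": [10000, 20000, 14000, 30000],
})

stats = df.describe()        # numeric columns only by default
print(stats.index.tolist())  # count, mean, std, min, 25%, 50%, 75%, max

# include="all" adds count/unique/top/freq for text columns.
print(df.describe(include="all"))
```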
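8. Add a new column that divides the data into three groups by salary
# Add a column that splits the data into three salary groups and labels each level
# Bins are right-closed: (0, 5000], (5000, 20000], (20000, 50000]; values outside them become NaN
bins = [0, 5000, 20000, 50000]
group_names = ['low', 'medium', 'high']
df['categories'] = pd.cut(df['salary'], bins, labels=group_names)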
df
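Boundary values are where pd.cut most often surprises people. The sketch below feeds a few hypothetical edge-case salaries through the same bins to show that each interval includes its right edge and that out-of-range values are left empty.

```python
import pandas as pd

bins = [0, 5000, 20000, 50000]
labels = ["low", "medium", "high"]

# Hypothetical salaries chosen to sit on or beyond the bin edges.
salary = pd.Series([5000, 5001, 20000, 50000, 60000])

# Right-closed intervals: 5000 is "low", 20000 is "medium".
# 60000 is outside every bin, so it becomes NaN.
cats = pd.cut(salary, bins, labels=labels)
print(cats)
```

Passing right=False switches to left-closed intervals such as [5000, 20000) if that matches the intended grading better.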
9. Sort the data in descending order by the salary column
sort_values() sorts in ascending order by default.
# Sort the data by the salary column in descending order
# ascending=False: descending order
# ascending=True: ascending order (the default)
df.sort_values('salary',ascending=False)
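When several rows share the same salary, their relative order is not something to rely on. A minimal sketch, with hypothetical data, of breaking ties with a second column and giving each column its own direction:

```python
import pandas as pd

# Hypothetical data with a tie at 20000.
df = pd.DataFrame({
    "education": ["bachelor", "master", "bachelor", "bachelor"],
    "salary": [10000, 20000, 14000, 20000],
})

# Returns a new DataFrame; df itself is not reordered.
desc = df.sort_values("salary", ascending=False)

# Salary descending, then education ascending to break the tie.
multi = df.sort_values(["salary", "education"], ascending=[False, True])
print(multi)
```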
10. Select row 33 of the data
Select the 33rd row by its index label. With the default index, labels start at 0, so the 33rd row has label 32.
# Select the data in row 33
df.loc[32]
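df.loc selects by index label, while df.iloc selects by position. The two agree only while the default 0, 1, 2, … index is still in order. The sketch below uses a hypothetical three-row frame to show how they diverge after sorting.

```python
import pandas as pd

df = pd.DataFrame({"salary": [10000, 30000, 20000]})  # index labels 0, 1, 2

desc = df.sort_values("salary", ascending=False)      # index order becomes 1, 2, 0

by_label = desc.loc[0]      # row whose label is 0 (salary 10000)
by_position = desc.iloc[0]  # first row in the current order (salary 30000)
print(by_label["salary"], by_position["salary"])
```

To get "the 33rd row" regardless of how the index looks, df.iloc[32] is the position-based choice.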
Today's 10 questions cover quite a lot: groupby, describe, cut, sort_values, info and more. These 10 questions alone are far from enough to understand them all, so I hope you can find extra exercises to practice with, or follow my articles and work through the questions ✨✨✨
I recommend practicing directly on Niuke.com, which gets you started one step faster.
Follows are welcome!