Pandas: Exercise Set 1

Assignment 1 (using Jupyter Notebook)

Step 1. Import the required modules
import pandas as pd
import numpy as np
from pandas import Series,DataFrame
Step 2. The raw dataset provided
# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
            'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
            'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
            'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
            'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
            'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
            'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
            'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa', 'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia']}
Step 3. Create a DataFrame from the raw dataset and assign it to the variable army
army = DataFrame(raw_data)
army
      armored battles company deaths deserters origin readiness regiment size veterans
0 1 5 1st 523 4 Arizona 1 Nighthawks 1045 1
1 0 42 1st 52 24 California 2 Nighthawks 957 5
2 1 2 2nd 25 31 Texas 3 Nighthawks 1099 62
3 1 2 2nd 616 2 Florida 3 Nighthawks 1400 26
4 0 4 1st 43 3 Maine 2 Dragoons 1592 73
5 1 7 1st 234 4 Iowa 1 Dragoons 1006 37
6 0 8 2nd 523 24 Alaska 2 Dragoons 987 949
7 1 3 2nd 62 31 Washington 3 Dragoons 849 48
8 0 4 1st 62 2 Oregon 2 Scouts 973 48
9 0 7 1st 73 3 Wyoming 1 Scouts 1005 435
10 1 8 2nd 37 2 Louisana 2 Scouts 1099 63
11 1 9 2nd 35 3 Georgia 3 Scouts 1523 345
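A quick way to confirm that a dict of lists became the DataFrame you expected is to check its shape and dtypes. The sketch below uses a hypothetical three-row subset of the data (the names `sample` and `small_army` are illustrative, not from the exercise). Note that recent pandas versions keep the dict's key order for the columns, whereas the output shown here lists them alphabetically, as older versions did.

```python
import pandas as pd

# Hypothetical three-row subset of raw_data, for illustration only
sample = {
    "regiment": ["Nighthawks", "Dragoons", "Scouts"],
    "deaths": [523, 43, 62],
    "origin": ["Arizona", "Maine", "Oregon"],
}
small_army = pd.DataFrame(sample)

# shape is a (rows, columns) tuple
n_rows, n_cols = small_army.shape

# Names of the columns that hold numeric data
numeric_cols = small_army.select_dtypes(include="number").columns.tolist()

print(n_rows, n_cols, numeric_cols)
```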
Step 4. Set a column as the index: use the origin field as the index
army1 = army.set_index(["origin"])
army1
        armored battles company deaths deserters readiness regiment size veterans
origin
Arizona 1 5 1st 523 4 1 Nighthawks 1045 1
California 0 42 1st 52 24 2 Nighthawks 957 5
Texas 1 2 2nd 25 31 3 Nighthawks 1099 62
Florida 1 2 2nd 616 2 3 Nighthawks 1400 26
Maine 0 4 1st 43 3 2 Dragoons 1592 73
Iowa 1 7 1st 234 4 1 Dragoons 1006 37
Alaska 0 8 2nd 523 24 2 Dragoons 987 949
Washington 1 3 2nd 62 31 3 Dragoons 849 48
Oregon 0 4 1st 62 2 2 Scouts 973 48
Wyoming 0 7 1st 73 3 1 Scouts 1005 435
Louisana 1 8 2nd 37 2 2 Scouts 1099 63
Georgia 1 9 2nd 35 3 3 Scouts 1523 345
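`set_index` returns a new DataFrame and leaves the original untouched, which is why the result is assigned to a new variable. A minimal sketch on a hypothetical two-column frame (the name `small` is illustrative) shows this, along with `reset_index` to move the index back into a column:

```python
import pandas as pd

# Hypothetical miniature of the army data
small = pd.DataFrame({
    "origin": ["Arizona", "Maine", "Oregon"],
    "deaths": [523, 43, 62],
})

indexed = small.set_index("origin")   # new DataFrame, 'origin' is now the index
restored = indexed.reset_index()      # moves the index back into a column

print(small.columns.tolist())         # the original still has 'origin' as a column
print(indexed.index.name)
```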
Step 5. Print all values in the veterans column
army1["veterans"]
origin
Arizona         1
California      5
Texas          62
Florida        26
Maine          73
Iowa           37
Alaska        949
Washington     48
Oregon         48
Wyoming       435
Louisana       63
Georgia       345
Name: veterans, dtype: int64
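Selecting a single column with one pair of brackets yields a Series that keeps the DataFrame's index, so individual values can then be looked up by label. A small sketch, assuming a hypothetical three-row frame indexed by origin:

```python
import pandas as pd

# Hypothetical three-row frame indexed by origin
small = pd.DataFrame(
    {"veterans": [1, 73, 949]},
    index=pd.Index(["Arizona", "Maine", "Alaska"], name="origin"),
)

veterans = small["veterans"]       # a Series carrying the 'origin' index
alaska = veterans.loc["Alaska"]    # label-based lookup of a single value

print(type(veterans).__name__, alaska)
```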
Step 6. Print all data in the 'veterans' and 'deaths' columns
army1[["veterans", "deaths"]]
  veterans deaths
origin
Arizona 1 523
California 5 52
Texas 62 25
Florida 26 616
Maine 73 43
Iowa 37 234
Alaska 949 523
Washington 48 62
Oregon 48 62
Wyoming 435 73
Louisana 63 37
Georgia 345 35
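Passing a list of column names inside the brackets returns a DataFrame, even when the list holds a single name, while a bare string returns a Series. The following sketch illustrates the difference on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame indexed by origin
small = pd.DataFrame(
    {"veterans": [1, 73], "deaths": [523, 43], "size": [1045, 1592]},
    index=pd.Index(["Arizona", "Maine"], name="origin"),
)

one_column = small["veterans"]               # Series
two_columns = small[["veterans", "deaths"]]  # DataFrame with two columns
single_frame = small[["veterans"]]           # DataFrame with one column

print(type(one_column).__name__, type(two_columns).__name__, single_frame.shape)
```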
Step 7. Print all the column labels
army1.columns
Index(['armored', 'battles', 'company', 'deaths', 'deserters', 'readiness',
       'regiment', 'size', 'veterans'],
      dtype='object')
Step 8. Select all rows whose regiment value is not "Dragoons"
army1.loc[army1["regiment"] != "Dragoons"]
  armored battles company deaths deserters readiness regiment size veterans
origin
Arizona 1 5 1st 523 4 1 Nighthawks 1045 1
California 0 42 1st 52 24 2 Nighthawks 957 5
Texas 1 2 2nd 25 31 3 Nighthawks 1099 62
Florida 1 2 2nd 616 2 3 Nighthawks 1400 26
Oregon 0 4 1st 62 2 2 Scouts 973 48
Wyoming 0 7 1st 73 3 1 Scouts 1005 435
Louisana 1 8 2nd 37 2 2 Scouts 1099 63
Georgia 1 9 2nd 35 3 3 Scouts 1523 345
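The filter works because the comparison produces a Boolean Series (a mask) aligned with the rows, and `.loc` keeps only the rows where the mask is True. A sketch on a hypothetical four-row frame, with `~isin` shown as an equivalent form that scales to several excluded values:

```python
import pandas as pd

# Hypothetical four-row frame
small = pd.DataFrame({
    "regiment": ["Nighthawks", "Dragoons", "Scouts", "Dragoons"],
    "deaths": [523, 43, 62, 234],
})

mask = small["regiment"] != "Dragoons"   # Boolean Series, one entry per row
by_mask = small.loc[mask]

# Equivalent: negate an isin() test; handy when excluding several values
by_isin = small.loc[~small["regiment"].isin(["Dragoons"])]

print(mask.tolist())
print(by_mask)
```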

Step 9. Select rows 3 through 7 and columns 3 through 6
army1.iloc[2:7, 2:6]    # iloc is 0-based and excludes the end position
  company deaths deserters readiness
origin
Texas 2nd 25 31 3
Florida 2nd 616 2 3
Maine 1st 43 3 2
Iowa 1st 234 4 1
Alaska 2nd 523 24 2
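Positional slices in `iloc` are 0-based and exclude the end position, so the 1-based range "3 through 7" becomes `2:7`. A list such as `[2, 6]`, by contrast, picks only those individual positions rather than a range. The sketch below uses a hypothetical 8 x 8 numeric grid so the selected positions are easy to read off:

```python
import pandas as pd

# Hypothetical 8 x 8 grid; the cell in row r, column c holds r * 10 + c
grid = pd.DataFrame(
    [[r * 10 + c for c in range(8)] for r in range(8)],
    columns=[f"c{c}" for c in range(8)],
)

# 1-based "rows 3 to 7, columns 3 to 6" as 0-based half-open slices
block = grid.iloc[2:7, 2:6]

# A list selects individual positions, here only the 3rd and 7th columns
two_cols = grid.iloc[2:7, [2, 6]]

print(block.shape, two_cols.columns.tolist())
```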

Assignment 2

Analysis of student alcohol consumption data
Step 1. Import the required modules
import pandas as pd
import numpy as np
from pandas import Series,DataFrame

Step 2. Load the data and assign it to the variable df
df = pd.read_csv("./datasets/Student_Alcohol.csv")
df
  school sex age address famsize Pstatus Medu Fedu Mjob Fjob absences G1 G2 G3
0 GP F 18 U GT3 A 4 4 at_home teacher 6 5 6 6
1 GP F 17 U GT3 T 1 1 at_home other 4 5 5 6
2 GP F 15 U LE3 T 1 1 at_home other 10 7 8 10
3 GP F 15 U GT3 T 4 2 health services 2 15 14 15
4 GP F 16 U GT3 T 3 3 other other 4 6 10 10
5 GP M 16 U LE3 T 4 3 services other 10 15 15 15
391 MS M 17 U LE3 T 3 1 services services 3 14 16 16
392 MS M 21 R GT3 T 1 1 other other 3 10 8 7
393 MS M 18 R LE3 T 3 2 services other 0 11 12 10
394 MS M 19 U LE3 T 1 1 other at_home 5 8 9 9

395 rows × 33 columns


Step 3. Contiguous slice (select the school and guardian columns and every column between them)
df.iloc[:, 0:12]
  school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason guardian
0 GP F 18 U GT3 A 4 4 at_home teacher course mother
1 GP F 17 U GT3 T 1 1 at_home other course father
2 GP F 15 U LE3 T 1 1 at_home other other mother
3 GP F 15 U GT3 T 4 2 health services home mother
4 GP F 16 U GT3 T 3 3 other other home father
391 MS M 17 U LE3 T 3 1 services services course mother
392 MS M 21 R GT3 T 1 1 other other course other
393 MS M 18 R LE3 T 3 2 services other course mother
394 MS M 19 U LE3 T 1 1 other at_home course father

395 rows × 12 columns
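The same slice can be written with column labels instead of positions, which avoids counting columns by hand. Unlike `iloc`, a label slice in `loc` includes its end label. A minimal sketch on a hypothetical five-column frame (not the real dataset):

```python
import pandas as pd

# Hypothetical five-column miniature of the student data
students = pd.DataFrame({
    "school": ["GP", "MS"],
    "sex": ["F", "M"],
    "age": [18, 17],
    "guardian": ["mother", "father"],
    "G3": [6, 16],
})

by_label = students.loc[:, "school":"guardian"]   # end label is included
by_position = students.iloc[:, 0:4]               # end position is excluded

print(by_label.columns.tolist())
```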


Step 5. Capitalize the first letter of every value in the Mjob and Fjob columns
capitalize = lambda x: x.capitalize()       # capitalize the first letter of a string
df["Mjob"] = df["Mjob"].map(capitalize)     # apply to every value in "Mjob"
df["Fjob"] = df["Fjob"].map(capitalize)     # apply to every value in "Fjob"
df         # check the result
  school sex age address famsize Pstatus Medu Fedu Mjob Fjob absences G1 G2 G3
0 GP F 18 U GT3 A 4 4 At_home Teacher 6 5 6 6
1 GP F 17 U GT3 T 1 1 At_home Other 4 5 5 6
2 GP F 15 U LE3 T 1 1 At_home Other 10 7 8 10
3 GP F 15 U GT3 T 4 2 Health Services 2 15 14 15
4 GP F 16 U GT3 T 3 3 Other Other 4 6 10 10
5 GP M 16 U LE3 T 4 3 Services Other 10 15 15 15
390 MS M 20 U LE3 A 2 2 Services Services 11 9 9 9
391 MS M 17 U LE3 T 3 1 Services Services 3 14 16 16
392 MS M 21 R GT3 T 1 1 Other Other 3 10 8 7
393 MS M 18 R LE3 T 3 2 Services Other 0 11 12 10
394 MS M 19 U LE3 T 1 1 Other At_home 5 8 9 9

395 rows × 33 columns
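pandas also provides a vectorized string accessor, `.str`, which applies string methods to a whole column and passes missing values through instead of raising an error. The sketch below is one possible alternative on a hypothetical frame (the name `jobs` and the missing value are illustrative assumptions):

```python
import pandas as pd

# Hypothetical job columns, including one missing value
jobs = pd.DataFrame({
    "Mjob": ["at_home", "health", None],
    "Fjob": ["teacher", "other", "services"],
})

for col in ["Mjob", "Fjob"]:
    # .str.capitalize() uppercases the first character and lowercases the rest
    jobs[col] = jobs[col].str.capitalize()

print(jobs)
```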


Step 6. Create a function named majority that returns a Boolean based on the age column, and store the result in a new column named legal_drinker (an age above 17 counts as legal drinking age)
majority = lambda x: x > 17    # True when the age is above 17
df["legal_drinker"] = df["age"].map(majority)
df
  school sex age address famsize Pstatus Medu Fedu Mjob Fjob G1 G2 G3 legal_drinker
0 GP F 18 U GT3 A 4 4 At_home Teacher 5 6 6 True
1 GP F 17 U GT3 T 1 1 At_home Other 5 5 6 False
2 GP F 15 U LE3 T 1 1 At_home Other 7 8 10 False
3 GP F 15 U GT3 T 4 2 Health Services 15 14 15 False
4 GP F 16 U GT3 T 3 3 Other Other 6 10 10 False
391 MS M 17 U LE3 T 3 1 Services Services 14 16 16 False
392 MS M 21 R GT3 T 1 1 Other Other 10 8 7 True
393 MS M 18 R LE3 T 3 2 Services Other 11 12 10 True
394 MS M 19 U LE3 T 1 1 Other At_home 8 9 9 True

395 rows × 34 columns
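A Boolean column like this can also be produced by comparing the whole column at once, without calling a function per row. The sketch below, on a hypothetical four-row frame, checks that a named `majority` function applied with `map` and the vectorized comparison give the same result; the threshold of 17 is taken from the exercise statement.

```python
import pandas as pd

# Hypothetical ages, for illustration only
students = pd.DataFrame({"age": [18, 17, 15, 21]})

def majority(age):
    """Return True when the age is above 17."""
    return age > 17

# Element-wise: call majority once per value
students["legal_drinker"] = students["age"].map(majority)

# Vectorized: compare the whole column in one step
vectorized = students["age"] > 17

print(students)
```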

Reposted from blog.csdn.net/wsp_1138886114/article/details/80768986