Learning Python -- Ten Skills for Selecting Rows and Columns in Pandas

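Here are ten skills for selecting rows and columns in Pandas. You will almost certainly need them whenever you work with Pandas, because you will want to manipulate a DataFrame in Python as freely as you would a sheet in Excel.

First, let's build the demo data.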

# Import the pandas module
import pandas as pd
# Create an example DataFrame about a fictional army
raw_data = {
    'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons',
    'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
    'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
    'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
    'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
    'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
    'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
    'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
    'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa', 'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia']
}

df = pd.DataFrame(
    raw_data, 
    columns = ['origin','regiment', 'company', 'deaths', 'battles', 'size', 'veterans', 'readiness', 'armored', 'deserters']
)

df = df.set_index('origin')

# Display the whole DataFrame (df.head() would show only the first 5 rows)
df
              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4
California  Nighthawks      1st      52       42   957         5          2        0         24
Texas       Nighthawks      2nd      25        2  1099        62          3        1         31
Florida     Nighthawks      2nd     616        2  1400        26          3        1          2
Maine         Dragoons      1st      43        4  1592        73          2        0          3
Iowa          Dragoons      1st     234        7  1006        37          1        1          4
Alaska        Dragoons      2nd     523        8   987       949          2        0         24
Washington    Dragoons      2nd      62        3   849        48          3        1         31
Oregon          Scouts      1st      62        4   973        48          2        0          2
Wyoming         Scouts      1st      73        7  1005       435          1        0          3
Louisana        Scouts      2nd      37        8  1099        63          2        1          2
Georgia         Scouts      2nd      35        9  1523       345          3        1          3

1. Select one column

df['size']

    Output

origin
Arizona       1045
California     957
Texas         1099
Florida       1400
Maine         1592
Iowa          1006
Alaska         987
Washington     849
Oregon         973
Wyoming       1005
Louisana      1099
Georgia       1523
Name: size, dtype: int64
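
A single label in the brackets gives back a Series, while a one-element list gives back a one-column DataFrame. The sketch below shows the difference on a hypothetical three-row frame named small (a stand-in for df, not part of the original article). It also shows why bracket access is the safer habit for this data: the column is called size, and attribute access collides with the built-in DataFrame.size property.

```python
import pandas as pd

# Hypothetical three-row stand-in for the article's df
small = pd.DataFrame(
    {'size': [1045, 957, 1099], 'veterans': [1, 5, 62]},
    index=pd.Index(['Arizona', 'California', 'Texas'], name='origin'),
)

as_series = small['size']     # single label -> Series
as_frame = small[['size']]    # one-element list -> one-column DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)

# small.size is NOT the 'size' column: it is the total number of cells
n_cells = small.size
print(n_cells)
```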

2. Select multiple columns

df[['size', 'veterans']]

    Output

            size  veterans
origin
Arizona     1045         1
California   957         5
Texas       1099        62
Florida     1400        26
Maine       1592        73
Iowa        1006        37
Alaska       987       949
Washington   849        48
Oregon       973        48
Wyoming     1005       435
Louisana    1099        63
Georgia     1523       345

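3. Select one row by its index label

# Select the row whose index label is 'Arizona'.
# (A slice such as df.loc[:'Arizona'] returns every row from the start through 'Arizona';
# it yields a single row here only because Arizona happens to be the first row.)
df.loc[['Arizona']]

   Output

              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4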
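
Label-based selection with .loc behaves differently depending on what goes inside the brackets. The following sketch, on a hypothetical four-row frame named small, contrasts a list of labels, a label slice (which includes its end label) and a bare label.

```python
import pandas as pd

# Hypothetical four-row stand-in for the article's df
small = pd.DataFrame(
    {'deaths': [523, 52, 25, 616]},
    index=pd.Index(['Arizona', 'California', 'Texas', 'Florida'], name='origin'),
)

one_row = small.loc[['Arizona']]       # list of labels -> DataFrame with exactly those rows
up_to_texas = small.loc[:'Texas']      # label slice -> the end label 'Texas' is INCLUDED
row_as_series = small.loc['Arizona']   # bare label -> the row as a Series

print(list(one_row.index))
print(list(up_to_texas.index))
print(type(row_as_series).__name__)
```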

4. Select the rows from the start up to a row position

# Select the first 2 rows (positions 0 and 1; the end position 2 is excluded)
df.iloc[:2]

   Output

              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4
California  Nighthawks      1st      52       42   957         5          2        0         24

5. Select the rows between two row positions

# Positional slices are half-open: 1:2 returns only the row at position 1
df.iloc[1:2]

   Output

              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
California  Nighthawks      1st      52       42   957         5          2        0         24

6. Select the rows from a row position through the end

df.iloc[2:]

    Output

              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Texas       Nighthawks      2nd      25        2  1099        62          3        1         31
Florida     Nighthawks      2nd     616        2  1400        26          3        1          2
Maine         Dragoons      1st      43        4  1592        73          2        0          3
Iowa          Dragoons      1st     234        7  1006        37          1        1          4
Alaska        Dragoons      2nd     523        8   987       949          2        0         24
Washington    Dragoons      2nd      62        3   849        48          3        1         31
Oregon          Scouts      1st      62        4   973        48          2        0          2
Wyoming         Scouts      1st      73        7  1005       435          1        0          3
Louisana        Scouts      2nd      37        8  1099        63          2        1          2
Georgia         Scouts      2nd      35        9  1523       345          3        1          3
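
Positional slicing with .iloc follows ordinary Python slice rules, so the end position is excluded and negative positions and steps also work. A minimal sketch on a hypothetical five-row frame named small:

```python
import pandas as pd

# Hypothetical five-row stand-in for the article's df
small = pd.DataFrame(
    {'deaths': [523, 52, 25, 616, 43]},
    index=pd.Index(['Arizona', 'California', 'Texas', 'Florida', 'Maine'], name='origin'),
)

first_two = small.iloc[:2]      # positions 0 and 1
middle = small.iloc[1:3]        # positions 1 and 2 (3 is excluded)
last_two = small.iloc[-2:]      # the final two rows
every_other = small.iloc[::2]   # positions 0, 2, 4

for part in (first_two, middle, last_two, every_other):
    print(list(part.index))
```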

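7. Select all columns from the first up to a column position

# Select the first three columns (origin is the index, so it does not count as a column)
df.iloc[:,:3]

   Output

              regiment  company  deaths
origin
Arizona     Nighthawks      1st     523
California  Nighthawks      1st      52
Texas       Nighthawks      2nd      25
Florida     Nighthawks      2nd     616
Maine         Dragoons      1st      43
Iowa          Dragoons      1st     234
Alaska        Dragoons      2nd     523
Washington    Dragoons      2nd      62
Oregon          Scouts      1st      62
Wyoming         Scouts      1st      73
Louisana        Scouts      2nd      37
Georgia         Scouts      2nd      35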
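
Row and column selectors can be combined in a single indexing call. The sketch below uses a hypothetical three-row, four-column frame named small to show a positional column slice, a positional row-and-column block, and a label-based column slice (which, like every .loc slice, includes its end label).

```python
import pandas as pd

# Hypothetical stand-in for the article's df
small = pd.DataFrame(
    {
        'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons'],
        'company': ['1st', '2nd', '1st'],
        'deaths': [523, 25, 43],
        'battles': [5, 2, 4],
    },
    index=pd.Index(['Arizona', 'Texas', 'Maine'], name='origin'),
)

first_three_cols = small.iloc[:, :3]            # all rows, column positions 0..2
block = small.iloc[:2, [0, 2]]                  # first two rows, column positions 0 and 2
by_label = small.loc[:, 'company':'battles']    # label slice, 'battles' included

print(list(first_three_cols.columns))
print(list(block.index), list(block.columns))
print(list(by_label.columns))
```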

8. Filter rows by condition

   1. Select the rows with more than 50 deaths

# Select the rows where df['deaths'] is greater than 50
df[df['deaths'] > 50]

   Output

              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4
California  Nighthawks      1st      52       42   957         5          2        0         24
Florida     Nighthawks      2nd     616        2  1400        26          3        1          2
Iowa          Dragoons      1st     234        7  1006        37          1        1          4
Alaska        Dragoons      2nd     523        8   987       949          2        0         24
Washington    Dragoons      2nd      62        3   849        48          3        1         31
Oregon          Scouts      1st      62        4   973        48          2        0          2
Wyoming         Scouts      1st      73        7  1005       435          1        0          3

   2. Select the rows with more than 500 or fewer than 50 deaths

# Select the rows where df['deaths'] is greater than 500 or less than 50.
# Each condition needs its own parentheses because | binds tighter than the comparisons.
df[(df['deaths'] > 500) | (df['deaths'] < 50)]

   Output

              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4
Texas       Nighthawks      2nd      25        2  1099        62          3        1         31
Florida     Nighthawks      2nd     616        2  1400        26          3        1          2
Maine         Dragoons      1st      43        4  1592        73          2        0          3
Alaska        Dragoons      2nd     523        8   987       949          2        0         24
Louisana        Scouts      2nd      37        8  1099        63          2        1          2
Georgia         Scouts      2nd      35        9  1523       345          3        1          3

   3. Select all regiments not named "Dragoons"

# Select the rows whose regiment is not 'Dragoons' (~ negates the boolean mask)
df[~(df['regiment'] == 'Dragoons')]

   Output

              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4
California  Nighthawks      1st      52       42   957         5          2        0         24
Texas       Nighthawks      2nd      25        2  1099        62          3        1         31
Florida     Nighthawks      2nd     616        2  1400        26          3        1          2
Oregon          Scouts      1st      62        4   973        48          2        0          2
Wyoming         Scouts      1st      73        7  1005       435          1        0          3
Louisana        Scouts      2nd      37        8  1099        63          2        1          2
Georgia         Scouts      2nd      35        9  1523       345          3        1          3
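
The same boolean-mask idea extends to "and" conditions, membership tests and string expressions. The following is only a sketch on a hypothetical four-row frame named small. It uses &, Series.isin and DataFrame.query, none of which appear in the examples above.

```python
import pandas as pd

# Hypothetical four-row stand-in for the article's df
small = pd.DataFrame(
    {
        'regiment': ['Nighthawks', 'Dragoons', 'Scouts', 'Dragoons'],
        'deaths': [523, 43, 62, 234],
    },
    index=pd.Index(['Arizona', 'Maine', 'Oregon', 'Iowa'], name='origin'),
)

# AND of two conditions: use & and parenthesise each comparison
both = small[(small['deaths'] > 50) & (small['regiment'] != 'Dragoons')]

# Membership test against a list of allowed values
in_set = small[small['regiment'].isin(['Dragoons', 'Scouts'])]

# The same AND filter written as a query string
via_query = small.query("deaths > 50 and regiment != 'Dragoons'")

print(list(both.index))
print(list(in_set.index))
print(list(via_query.index))
```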

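9. Select rows by their string index labels

# Select the rows labelled Arizona and Texas.
# (The original used df.ix, which was deprecated in pandas 0.20 and removed in pandas 1.0;
# .loc is the label-based replacement.)
df.loc[['Arizona', 'Texas']]

    Output

              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4
Texas       Nighthawks      2nd      25        2  1099        62          3        1         31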

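10. Select a single value by row label/position and column name/position

# The original used df.ix, which was deprecated in pandas 0.20 and removed in pandas 1.0.
# Use .loc for labels and .iloc for positions instead.
# Row label + column label: the 'deaths' cell of the row labelled Arizona
df.loc['Arizona', 'deaths']
523
# Row label + column position: the third cell of the row labelled Arizona
df.iloc[df.index.get_loc('Arizona'), 2]
523
# Row position + column label: the third cell of the 'deaths' column
df.iloc[2, df.columns.get_loc('deaths')]
25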
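
When only one scalar is needed, pandas also provides the .at (labels) and .iat (positions) accessors. The sketch below shows them on a hypothetical three-row frame named small, alongside a mixed position/label lookup built with Index.get_loc.

```python
import pandas as pd

# Hypothetical three-row stand-in for the article's df
small = pd.DataFrame(
    {
        'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks'],
        'company': ['1st', '1st', '2nd'],
        'deaths': [523, 52, 25],
    },
    index=pd.Index(['Arizona', 'California', 'Texas'], name='origin'),
)

by_labels = small.at['Arizona', 'deaths']    # row label, column label
by_positions = small.iat[0, 2]               # row position, column position
# Mixed lookup: translate the column label to a position, then use .iloc
mixed = small.iloc[2, small.columns.get_loc('deaths')]

print(by_labels, by_positions, mixed)
```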

Reference: original article


Reposted from blog.csdn.net/qq_38328378/article/details/81166518