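Learning Python: Ten Essential Techniques for Selecting Rows and Columns in Pandas

This article covers ten techniques for selecting rows and columns in Pandas. You will need them constantly when working with Pandas, because sooner or later you will want to manipulate a DataFrame in Python as freely as a spreadsheet in Excel.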

First, create the demo data.

# Import the pandas module
import pandas as pd
# Create an example DataFrame describing a fictional army
raw_data = {
    'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons',
    'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
    'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
    'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
    'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
    'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
    'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
    'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
    'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa', 'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia']
}

df = pd.DataFrame(
    raw_data, 
    columns = ['origin','regiment', 'company', 'deaths', 'battles', 'size', 'veterans', 'readiness', 'armored', 'deserters']
)

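# Use the origin column as the row index
df = df.set_index('origin')

# Display the whole DataFrame (df.head() would show only the first five rows)
df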

origin	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
California	Nighthawks	1st	52	42	957	5	2	0	24
Texas	Nighthawks	2nd	25	2	1099	62	3	1	31
Florida	Nighthawks	2nd	616	2	1400	26	3	1	2
Maine	Dragoons	1st	43	4	1592	73	2	0	3
Iowa	Dragoons	1st	234	7	1006	37	1	1	4
Alaska	Dragoons	2nd	523	8	987	949	2	0	24
Washington	Dragoons	2nd	62	3	849	48	3	1	31
Oregon	Scouts	1st	62	4	973	48	2	0	2
Wyoming	Scouts	1st	73	7	1005	435	1	0	3
Louisana	Scouts	2nd	37	8	1099	63	2	1	2
Georgia	Scouts	2nd	35	9	1523	345	3	1	3

1. Select a single column

# Single brackets with one column name return a Series
df['size']

Output:

origin
Arizona       1045
California     957
Texas         1099
Florida       1400
Maine         1592
Iowa          1006
Alaska         987
Washington     849
Oregon         973
Wyoming       1005
Louisana      1099
Georgia       1523
Name: size, dtype: int64

2. Select multiple columns

# A list of column names inside the brackets returns a DataFrame
df[['size', 'veterans']]

Output:

origin	size	veterans
Arizona	1045	1
California	957	5
Texas	1099	62
Florida	1400	26
Maine	1592	73
Iowa	1006	37
Alaska	987	949
Washington	849	48
Oregon	973	48
Wyoming	1005	435
Louisana	1099	63
Georgia	1523	345
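
The difference between single and double brackets is easy to miss, so here is a minimal sketch of it. It assumes a trimmed three-row copy of the demo data; the names `small`, `one_col_series`, `one_col_frame`, and `reordered` are illustrative and not part of the original example.

```python
import pandas as pd

# A trimmed, three-row copy of the demo data, for illustration only
small = pd.DataFrame(
    {'size': [1045, 957, 1099],
     'veterans': [1, 5, 62],
     'battles': [5, 42, 2]},
    index=pd.Index(['Arizona', 'California', 'Texas'], name='origin'),
)

one_col_series = small['size']           # single brackets -> Series
one_col_frame = small[['size']]          # double brackets -> one-column DataFrame
reordered = small[['veterans', 'size']]  # columns come back in the order requested

print(type(one_col_series).__name__)  # Series
print(type(one_col_frame).__name__)   # DataFrame
print(list(reordered.columns))        # ['veterans', 'size']
```

Passing a list is also a convenient way to reorder columns, since the result follows the order of the list rather than the order in the original DataFrame.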

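3. Select one row by its index label

# Select the row whose index label is 'Arizona'.
# Passing the label in a list returns a one-row DataFrame. The slice df.loc[:'Arizona']
# would instead return every row from the start through 'Arizona'.
df.loc[['Arizona']]

Output: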

origin	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
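
With `.loc`, the shape of the result depends on whether it receives a single label, a list of labels, or a label slice. The sketch below illustrates this, assuming a trimmed three-row copy of the demo data (the variable names are illustrative only).

```python
import pandas as pd

# A trimmed copy of the demo data (three rows, two columns), for illustration only
small = pd.DataFrame(
    {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks'],
     'deaths': [523, 52, 25]},
    index=pd.Index(['Arizona', 'California', 'Texas'], name='origin'),
)

as_series = small.loc['Arizona']    # single label -> Series holding that row
as_frame = small.loc[['Arizona']]   # list of labels -> one-row DataFrame
up_to = small.loc[:'California']    # label slice -> the end label is included

print(type(as_series).__name__)  # Series
print(as_frame.shape)            # (1, 2)
print(list(up_to.index))         # ['Arizona', 'California']
```

A slice such as `:'Arizona'` returns a single row only when that label happens to be first in the index, so a list of labels is the safer choice when exactly one row is wanted.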

4. Select rows from the start up to a row position

# Select the first two rows (positions 0 and 1; the end position 2 is excluded)
df.iloc[:2]

Output:

origin	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
California	Nighthawks	1st	52	42	957	5	2	0	24

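5. Select the rows between two row positions

# Select from position 1 up to, but not including, position 2, so one row is returned
df.iloc[1:2]

Output: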

origin	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
California	Nighthawks	1st	52	42	957	5	2	0	24
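
Positional slices with `.iloc` follow ordinary Python slicing rules: the start is included and the end is excluded. The following sketch, which assumes a four-row copy of the `deaths` column with illustrative variable names, shows a few common variants.

```python
import pandas as pd

# A trimmed copy of the demo data (four rows, one column), for illustration only
small = pd.DataFrame(
    {'deaths': [523, 52, 25, 616]},
    index=pd.Index(['Arizona', 'California', 'Texas', 'Florida'], name='origin'),
)

middle = small.iloc[1:3]    # positions 1 and 2; position 3 is excluded
last_two = small.iloc[-2:]  # negative positions count from the end
one_row = small.iloc[[1]]   # list of positions -> one-row DataFrame
row_series = small.iloc[1]  # bare integer -> Series for that row

print(list(middle.index))    # ['California', 'Texas']
print(list(last_two.index))  # ['Texas', 'Florida']
print(one_row.shape)         # (1, 1)
print(row_series['deaths'])  # 52
```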

6. Select rows from a row position through the end

# Select every row from position 2 onward
df.iloc[2:]

Output:

origin	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
Texas	Nighthawks	2nd	25	2	1099	62	3	1	31
Florida	Nighthawks	2nd	616	2	1400	26	3	1	2
Maine	Dragoons	1st	43	4	1592	73	2	0	3
Iowa	Dragoons	1st	234	7	1006	37	1	1	4
Alaska	Dragoons	2nd	523	8	987	949	2	0	24
Washington	Dragoons	2nd	62	3	849	48	3	1	31
Oregon	Scouts	1st	62	4	973	48	2	0	2
Wyoming	Scouts	1st	73	7	1005	435	1	0	3
Louisana	Scouts	2nd	37	8	1099	63	2	1	2
Georgia	Scouts	2nd	35	9	1523	345	3	1	3

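7. Select columns from the first column up to a column position

# Select the first three columns (origin is the index, so it is not counted)
df.iloc[:,:3]

Output: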

origin	regiment	company	deaths
Arizona	Nighthawks	1st	523
California	Nighthawks	1st	52
Texas	Nighthawks	2nd	25
Florida	Nighthawks	2nd	616
Maine	Dragoons	1st	43
Iowa	Dragoons	1st	234
Alaska	Dragoons	2nd	523
Washington	Dragoons	2nd	62
Oregon	Scouts	1st	62
Wyoming	Scouts	1st	73
Louisana	Scouts	2nd	37
Georgia	Scouts	2nd	35
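
The same column range can also be selected by name instead of by position. This is a small sketch assuming a two-row copy of the demo data; unlike `.iloc`, a label slice with `.loc` includes its end label.

```python
import pandas as pd

# A trimmed copy of the demo data (two rows, four columns), for illustration only
small = pd.DataFrame(
    {'regiment': ['Nighthawks', 'Dragoons'],
     'company': ['1st', '2nd'],
     'deaths': [523, 62],
     'battles': [5, 3]},
    index=pd.Index(['Arizona', 'Washington'], name='origin'),
)

first_three = small.iloc[:, :3]                    # columns at positions 0, 1, 2
same_by_label = small.loc[:, 'regiment':'deaths']  # label slice, end label included

print(list(first_three.columns))          # ['regiment', 'company', 'deaths']
print(first_three.equals(same_by_label))  # True
```

Selecting by label tends to be more robust when the column order may change, while selecting by position is handy when the column names are unknown or unwieldy.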

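8. Conditional filtering

8.1 Select rows where the number of deaths is greater than 50

# Select the rows where df['deaths'] is greater than 50
df[df['deaths'] > 50]

Output: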

origin	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
California	Nighthawks	1st	52	42	957	5	2	0	24
Florida	Nighthawks	2nd	616	2	1400	26	3	1	2
Iowa	Dragoons	1st	234	7	1006	37	1	1	4
Alaska	Dragoons	2nd	523	8	987	949	2	0	24
Washington	Dragoons	2nd	62	3	849	48	3	1	31
Oregon	Scouts	1st	62	4	973	48	2	0	2
Wyoming	Scouts	1st	73	7	1005	435	1	0	3

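8.2 Select rows where the number of deaths is greater than 500 or less than 50

# Select the rows where df['deaths'] is greater than 500 or less than 50.
# Each comparison needs its own parentheses when combined with |
df[(df['deaths'] > 500) | (df['deaths'] < 50)]

Output: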

origin	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
Texas	Nighthawks	2nd	25	2	1099	62	3	1	31
Florida	Nighthawks	2nd	616	2	1400	26	3	1	2
Maine	Dragoons	1st	43	4	1592	73	2	0	3
Alaska	Dragoons	2nd	523	8	987	949	2	0	24
Louisana	Scouts	2nd	37	8	1099	63	2	1	2
Georgia	Scouts	2nd	35	9	1523	345	3	1	3

8.3 Select all regiments not named "Dragoons"

# Select every row whose regiment is not 'Dragoons' (~ negates the boolean mask)
df[~(df['regiment'] == 'Dragoons')]

Output:

origin	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
California	Nighthawks	1st	52	42	957	5	2	0	24
Texas	Nighthawks	2nd	25	2	1099	62	3	1	31
Florida	Nighthawks	2nd	616	2	1400	26	3	1	2
Oregon	Scouts	1st	62	4	973	48	2	0	2
Wyoming	Scouts	1st	73	7	1005	435	1	0	3
Louisana	Scouts	2nd	37	8	1099	63	2	1	2
Georgia	Scouts	2nd	35	9	1523	345	3	1	3
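
Boolean masks can be combined and written in several equivalent ways. The sketch below assumes a four-row copy of the demo data with illustrative variable names, and shows an AND condition, a membership test with `isin`, and the same AND filter expressed through `DataFrame.query`.

```python
import pandas as pd

# A trimmed copy of the demo data (four rows, two columns), for illustration only
small = pd.DataFrame(
    {'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Scouts'],
     'deaths': [523, 25, 43, 62]},
    index=pd.Index(['Arizona', 'Texas', 'Maine', 'Oregon'], name='origin'),
)

# AND: each comparison needs its own parentheses, because & binds tighter than >
not_dragoons_high = small[(small['regiment'] != 'Dragoons') & (small['deaths'] > 50)]

# Membership test instead of chaining several == comparisons with |
selected = small[small['regiment'].isin(['Dragoons', 'Scouts'])]

# The same AND filter written as a query string
via_query = small.query("regiment != 'Dragoons' and deaths > 50")

print(list(not_dragoons_high.index))         # ['Arizona', 'Oregon']
print(list(selected.index))                  # ['Maine', 'Oregon']
print(not_dragoons_high.equals(via_query))   # True
```

Note that Python's `and`, `or`, and `not` keywords do not work on boolean Series outside a query string; use `&`, `|`, and `~` instead.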

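9. Select rows by their string index labels

# Select the rows labeled Arizona and Texas.
# .ix was removed in pandas 1.0; use .loc for label-based selection
df.loc[['Arizona', 'Texas']]

Output: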

origin	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
Texas	Nighthawks	2nd	25	2	1099	62	3	1	31

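10. Select a single value by row label or position and column name or position

# Select the 'deaths' value in the row labeled Arizona (label, label).
# .ix was removed in pandas 1.0, so .loc and .iloc are used instead
df.loc['Arizona', 'deaths']
523
# Select the third cell (column position 2) in the row labeled Arizona
df.loc['Arizona'].iloc[2]
523
# Select the third cell (row position 2) in the 'deaths' column
df['deaths'].iloc[2]
25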
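
For reading a single value, pandas also provides the scalar accessors `.at` (labels) and `.iat` (positions). Here is a minimal sketch, assuming a three-row copy of the demo data; when a label and a position must be mixed, one option is to translate the label into a position with `Index.get_loc` first. The variable names are illustrative.

```python
import pandas as pd

# A trimmed copy of the demo data (three rows, three columns), for illustration only
small = pd.DataFrame(
    {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks'],
     'company': ['1st', '1st', '2nd'],
     'deaths': [523, 52, 25]},
    index=pd.Index(['Arizona', 'California', 'Texas'], name='origin'),
)

by_labels = small.at['Arizona', 'deaths']  # row label, column label
by_positions = small.iat[2, 2]             # row position, column position

# Mixed access: translate the label to a position, then use iat
row_pos = small.index.get_loc('Arizona')
col_pos = small.columns.get_loc('deaths')
mixed = small.iat[row_pos, 2]       # row by label, column by position
also_mixed = small.iat[2, col_pos]  # row by position, column by label

print(by_labels, by_positions, mixed, also_mixed)  # 523 25 523 25
```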

