5. Accessing Data

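A DataFrame is organized into rows and columns. The intersection of a row and a column is a cell, and the row index and column index together determine its position. You can access elements through the at/iat or loc/iloc attributes. Each attribute is followed by square brackets, [row, col], where row is a row position or row label and col is a column position or column label. To select every row, write “:” in the row dimension, as in [ :, col], which accesses a given column across all rows. To select every column, omit col and write [row], which accesses all columns of the given row(s). The shortened forms apply only to loc/iloc; at/iat always need both row and col.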
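The three bracket forms described above can be sketched with loc. The sketch rebuilds the same sample frame that the next cell creates. The variable names `cell`, `all_rows` and `one_row` are only illustrative.

```python
import pandas as pd

# Same sample data as the cell below
data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

cell = df.loc['b', 'user']      # [row, col]: one cell
all_rows = df.loc[:, 'user']    # [:, col]: ':' keeps every row, giving the whole 'user' column
one_row = df.loc['b']           # [row]: column omitted, giving every column of row 'b'

print(cell)
print(all_rows.tolist())
print(one_row.tolist())
```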

In [39]:

import pandas as pd
data = {'user':['小王','小李','小明'],'shcool':['清华','北大','科大'],'class':['数学','历史','计算机']}
df = pd.DataFrame(data,index=['a','b','c'])
df

Out[39]:

	user	shcool	class
a	小王	清华	数学
b	小李	北大	历史
c	小明	科大	计算机

5.1. Accessing a Single Element

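You access a single cell through its (row, column) pair. The at and iat attributes can access only one cell at a time. The difference is that at takes labels, which may be strings or integers depending on the index, while iat takes only integer positions.

Both at and iat use the form [row, column]. The first dimension selects the row and the second selects the column.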
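An integer passed to at is treated as a label, not a position. The minimal sketch below uses a hypothetical frame, `scores`, whose integer labels are deliberately out of positional order, so the two accessors return different cells. It also shows that both accessors can assign a single value.

```python
import pandas as pd

# Hypothetical frame whose integer index labels do not match row positions
scores = pd.DataFrame({'score': [90, 85, 77]}, index=[10, 0, 5])

by_label = scores.at[0, 'score']   # label 0 -> the second row
by_position = scores.iat[0, 0]     # position 0 -> the first row

scores.at[5, 'score'] = 80         # at (and iat) can also set one cell; label 5 is the third row
print(by_label, by_position, scores.iat[2, 0])
```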

In [43]:

df.iat[1,1]

Out[43]:

'北大'

In [44]:

df.at['b','shcool']

Out[44]:

'北大'

5.2. Accessing Multiple Elements

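The loc and iloc attributes can access multiple cells. loc accepts labels and boolean (mask) arrays but not integer positions, unless the labels themselves are integers. iloc works only with integer positions, although it also accepts a plain boolean list or array.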

loc and iloc take the following forms:

# one or more rows, all columns
[row] 
# one or more rows, restricted to the columns selected by column
[row, column]
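The sketch below selects the same cells both ways: with iloc by position and with loc by label. It then shows that loc rejects a position when the index holds string labels. The sample frame matches the one above.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

rows_by_position = df.iloc[[0, 1], [0, 2]]               # positions
rows_by_label = df.loc[['a', 'b'], ['user', 'class']]    # the same cells by label

try:
    df.loc[0]                    # the index holds 'a', 'b', 'c', so there is no label 0
    loc_accepts_position = True
except KeyError:
    loc_accepts_position = False

print(rows_by_position.equals(rows_by_label), loc_accepts_position)
```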

5.2.1.iloc

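iloc works with the implicit index. This is the 0-based integer position of each row and column, regardless of their labels.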
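The shape of the result depends on how the position is written. The sketch below shows that a bare integer returns a Series, a list of integers returns a DataFrame, and negative positions count from the end, as they do in Python lists.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

row_series = df.iloc[0]    # scalar position -> Series (one row), named after its label 'a'
row_frame = df.iloc[[0]]   # list of positions -> DataFrame with a single row
last_row = df.iloc[-1]     # negative position -> last row

print(type(row_series).__name__, type(row_frame).__name__, last_row['user'])
```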

In [46]:

df.iloc[[0,1]]

Out[46]:

	user	shcool	class
a	小王	清华	数学
b	小李	北大	历史

In [45]:

df.iloc[[0,1],[1,2]]

Out[45]:

	shcool	class
a	清华	数学
b	北大	历史

5.2.2.loc

loc is primarily label-based, but it can also be used with boolean arrays.

Access by a single label or a list of labels:

In [47]:

df.loc[['a','b']]

Out[47]:

	user	shcool	class
a	小王	清华	数学
b	小李	北大	历史

In [48]:

df.loc[['a','b'],['shcool','class']]

Out[48]:

	shcool	class
a	清华	数学
b	北大	历史

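Slicing by label
Note that a label slice such as 'a':'b' includes both endpoints, 'a' and 'b', so the interval is closed on both ends. This applies to rows and columns alike.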
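The inclusive rule still applies when the labels are integers, which can cause off-by-one surprises. The minimal sketch below uses a hypothetical frame, `nums`, with the default integer labels 0 to 3.

```python
import pandas as pd

# Hypothetical frame with default integer labels 0..3
nums = pd.DataFrame({'x': [10, 20, 30, 40], 'y': [1, 2, 3, 4], 'z': [5, 6, 7, 8]})

# Labels 0 through 1 and columns 'x' through 'y': both ends are included
label_slice = nums.loc[0:1, 'x':'y']
print(label_slice)
```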

In [53]:

df.loc['a':'b','user':'class']

Out[53]:

	user	shcool	class
a	小王	清华	数学
b	小李	北大	历史

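You can also slice by implicit (positional) index with iloc.
When slicing by position, the end position 1 is excluded, so the interval is closed on the left and open on the right, and only position 0 is returned.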
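Positional slicing with iloc follows the same half-open rule as Python list slicing. The sketch below checks the selected row labels against a plain list slice of the index.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

labels = list(df.index)   # ['a', 'b', 'c']
head = df.iloc[0:2]       # positions 0 and 1; position 2 is excluded
tail = df.iloc[1:]        # omitted end runs to the last row

print(list(head.index), labels[0:2])
```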

In [60]:

df.iloc[0:1,0:1]

Out[60]:

	user
a	小王

A boolean array whose length matches the axis:
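In practice the row mask usually comes from a condition rather than a hand-written list. The sketch below builds the same selection as the cell that follows. A comparison supplies the row mask, and a list comprehension supplies the column mask. `mask` and `col_mask` are illustrative names.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

mask = df['class'] != '历史'                       # boolean Series, one flag per row
col_mask = [c != 'shcool' for c in df.columns]     # plain list, one flag per column
both = df.loc[mask, col_mask]
print(both)
```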

In [61]:

df.loc[[True, False, True],[True, False, True]]

Out[61]:

	user	class
a	小王	数学
c	小明	计算机

Some of these forms can be combined:
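loc does not accept integer positions, so you cannot directly mix row positions with column labels. One common workaround is to convert the positions to labels through `df.index` first, as in this minimal sketch.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

first_two = df.index[0:2]                        # positions 0..1 turned into labels 'a', 'b'
mixed = df.loc[first_two, ['user', 'class']]     # then select columns by label
print(mixed)
```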

In [84]:

df.loc['a':'b',[True, False, True]]

Out[84]:

	user	class
a	小王	数学
b	小李	历史

5.3. Accessing an Entire Column

You can select all rows of a particular column by the column's name. There are several forms:

5.3.1. Single-column indexing

In [62]:

df['user']

Out[62]:

a    小王
b    小李
c    小明
Name: user, dtype: object

5.3.2. Attribute access

In [63]:

df.user

Out[63]:

a    小王
b    小李
c    小明
Name: user, dtype: object
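Attribute access works only when the column name is a valid Python identifier that does not collide with an existing DataFrame attribute. The sketch below illustrates both limits. `df.class` would be a SyntaxError because `class` is a keyword, so it appears only in a comment. `counts_df` is a hypothetical frame whose column is named like the `count` method.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

same = df.user.equals(df['user'])   # attribute and bracket access return the same column

# df.class is a SyntaxError (keyword), so use brackets for this column
class_col = df['class']

# Hypothetical frame: 'count' clashes with DataFrame.count, so the attribute is the method
counts_df = pd.DataFrame({'count': [1, 2]})
print(same, callable(counts_df.count), counts_df['count'].tolist())
```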

5.3.3. Indexing with a list of columns

In [66]:

df[['user','class']]

Out[66]:

	user	class
a	小王	数学
b	小李	历史
c	小明	计算机
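The number of brackets decides the result type. A single label returns a Series, while a list of labels (even a one-element list) returns a DataFrame. The sketch below contrasts the two.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

single = df['user']     # one label -> Series named 'user'
double = df[['user']]   # list with one label -> DataFrame with one column
print(type(single).__name__, type(double).__name__, double.shape)
```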

5.4. Accessing Entire Rows

5.4.1.ix

Series.ix and DataFrame.ix were deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so this approach can no longer be used.
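With .ix gone, label lookups go through loc and positional lookups go through iloc. When a lookup needs a row position combined with a column label, one option is to convert the label to a position with `Index.get_loc`. The sketch below shows three equivalent ways to reach the same cell. These are common replacements, not a reproduction of every .ix behavior.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

by_label = df.loc['b', 'class']                        # labels only
by_position = df.iloc[1, 2]                            # positions only
mixed = df.iloc[1, df.columns.get_loc('class')]        # row position + column label -> position
print(by_label, by_position, mixed)
```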

5.4.2. Slicing

In [69]:

df[0:1]

Out[69]:

	user	shcool	class
a	小王	清华	数学

In [70]:

df['a':'b']

Out[70]:

	user	shcool	class
a	小王	清华	数学
b	小李	北大	历史
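Plain brackets on a DataFrame behave differently depending on what is inside them. A slice selects rows: a positional slice excludes its end, and a label slice includes it. A single label selects a column. The sketch below shows that `df['a']` therefore fails and that `loc` is the way to fetch one row by label.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

rows_by_position = df[0:2]     # positions 0 and 1
rows_by_label = df['a':'b']    # labels 'a' through 'b', end included

try:
    df['a']                    # single label -> column lookup, and there is no column 'a'
    found = True
except KeyError:
    found = False

single_row = df.loc['a']       # one row by label
print(rows_by_position.equals(rows_by_label), found, single_row['user'])
```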

A row slice can also be chained with column selection to access a single column or several columns:

In [75]:

df[0:1]['user']

Out[75]:

a    小王
Name: user, dtype: object

In [80]:

df[0:1][['user','class']]

Out[80]:

	user	class
a	小王	数学
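Chained indexing like this is fine for reading. For assignment, pandas may warn, because the intermediate result can be a copy. A single loc or iloc call does the same selection in one step. The sketch below confirms that the three forms return the same data.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

chained = df[0:1][['user', 'class']]                # two steps: row slice, then columns
by_position = df.iloc[0:1, [0, 2]]                  # one step, positions
by_label = df.loc['a':'a', ['user', 'class']]       # one step, labels
print(chained.equals(by_position), chained.equals(by_label))
```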

6. Iterating over a DataFrame

6.1.iterrows()

iterrows() yields (index, row) tuples, one row per iteration. Each row is a Series indexed by the column names.

In [87]:

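for index,row in df.iterrows():
    # row is a Series labeled by column name; .iloc makes the positional access explicit
    print(index,'----',row.iloc[0],row.iloc[1],row.iloc[2])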

a ---- 小王 清华 数学
b ---- 小李 北大 历史
c ---- 小明 科大 计算机
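Because each row is packed into a single Series, iterrows() does not preserve per-column dtypes. A row that mixes integers and floats comes back as all floats. The sketch below reads fields by column label and then shows that upcast on a hypothetical two-column frame, `mixed`.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

lines = []
for index, row in df.iterrows():
    # Access fields by column label rather than by position
    lines.append(f"{index} ---- {row['user']} {row['shcool']} {row['class']}")

# Hypothetical frame: int column 'n' and float column 'x'
mixed = pd.DataFrame({'n': [1, 2], 'x': [0.5, 1.5]})
first_row = next(mixed.iterrows())[1]   # the row Series is upcast to float64
print(lines[0], first_row.dtype)
```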

6.2.itertuples()

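itertuples() iterates over the DataFrame and yields each row as a namedtuple. The first field, row[0], is the index value. The class column appears as _3 because class is a Python keyword and cannot serve as a namedtuple field name, so it is renamed after its position.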

In [96]:

for row in df.itertuples():
    print(row,'--',row[0],'-',row.user)

Pandas(Index='a', user='小王', shcool='清华', _3='数学') -- a - 小王
Pandas(Index='b', user='小李', shcool='北大', _3='历史') -- b - 小李
Pandas(Index='c', user='小明', shcool='科大', _3='计算机') -- c - 小明
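itertuples() accepts `index` and `name` parameters. With `index=False` the index is dropped, and with `name=None` plain tuples are returned instead of namedtuples. The sketch below also checks the renamed `_3` field seen in the output above.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

plain = list(df.itertuples(index=False, name=None))   # plain tuples, no index
named = next(df.itertuples())                          # default: namedtuple called 'Pandas'
print(plain[0], named._fields)
```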

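6.3.items()

items() iterates over the DataFrame column by column and yields (column name, Series) tuples. The older iteritems() alias was deprecated in pandas 1.5 and removed in pandas 2.0.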

In [100]:

for col_name,col in df.items():
    # col is indexed by 'a', 'b', 'c'; .iloc selects by position explicitly
    print(col_name,col.iloc[0],col.iloc[1],col.iloc[2])

user 小王 小李 小明
shcool 清华 北大 科大
class 数学 历史 计算机
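Because items() yields each column as a Series, it is convenient for column-wise processing. The minimal sketch below collects every column into a dict of lists. `columns` is an illustrative name, and for this particular result `df.to_dict('list')` would be a built-in alternative.

```python
import pandas as pd

data = {'user': ['小王', '小李', '小明'], 'shcool': ['清华', '北大', '科大'], 'class': ['数学', '历史', '计算机']}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# One (name, Series) pair per column, in column order
columns = {name: col.tolist() for name, col in df.items()}
print(columns)
```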
