pandas: DataFrame

A DataFrame is a tabular data structure that holds an ordered set of columns.

A DataFrame can be thought of as a dict of Series that all share a single index.
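To make the "dict of Series" picture concrete, here is a minimal sketch. The names s1, s2 and df_demo and the values are made up for illustration.

```python
import pandas as pd

# Two Series that use the same index (illustrative data only)
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4.0, 5.0, 6.0], index=['a', 'b', 'c'])

# The dict keys become column names, the Series become columns
df_demo = pd.DataFrame({'x': s1, 'y': s2})

# Selecting one column gives back a Series
col = df_demo['x']
print(type(col).__name__)

# Every column carries the DataFrame's row index
print(col.index.equals(df_demo.index))
print(df_demo['y'].index.equals(df_demo.index))
```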

Creating a DataFrame

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'one':[1,2,3,4], 'two':[5,6,7,8]})

In [3]: df
Out[3]: 
   one  two
0    1    5
1    2    6
2    3    7
3    4    8

In [4]: df2 = pd.DataFrame({'one':pd.Series([1,2,3,4],index=['a','b','c','d']),
   ...: 'tow':pd.Series([4,5,6],index=['a','c','e'])})

In [5]: df2
Out[5]: 
   one  tow
a  1.0  4.0
b  2.0  NaN
c  3.0  5.0
d  4.0  NaN
e  NaN  6.0

In [6]: df3 = pd.DataFrame({'one':pd.Series([1,2,3,4],index=list('abcd')),
   ...: 'tow':pd.Series([4,5,6],index=list('acd'))})

In [7]: df3
Out[7]: 
   one  tow
a    1  4.0
b    2  NaN
c    3  5.0
d    4  6.0
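The outputs above differ in one detail: the 'one' column is shown as floats in df2 but as integers in df3. The sketch below rebuilds both frames to show why. When a column is missing a label that the combined index contains, pandas fills the gap with NaN and the column becomes float. The variable name one is introduced here only for illustration.

```python
import pandas as pd

one = pd.Series([1, 2, 3, 4], index=list('abcd'))

# 'tow' has a label ('e') that 'one' lacks, so 'one' gets a NaN and turns into float
df2 = pd.DataFrame({'one': one, 'tow': pd.Series([4, 5, 6], index=list('ace'))})

# Every label of 'tow' already exists in 'one', so 'one' keeps its integer dtype
df3 = pd.DataFrame({'one': one, 'tow': pd.Series([4, 5, 6], index=list('acd'))})

print(df2.dtypes)
print(df3.dtypes)
print(list(df2.index))  # the row index is the union of both Series indexes
```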

Reading and writing CSV files

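First cd into the directory that contains the CSV file.
df = pd.read_csv('filename.csv')  # Option 1: read the CSV file into a DataFrame

# Option 2: read the raw text with Python's built-in open()
# (this returns a plain string, not a DataFrame)
with open('filename.csv') as f:
    text = f.read()


# Write the DataFrame to a file
df.to_csv("newfilename.csv")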
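One detail worth seeing in a round trip: by default to_csv also writes the row index, which read_csv then reads back as an extra column. The sketch below uses a temporary directory and a hypothetical file name (demo.csv) so it can run anywhere; the data is made up.

```python
import os
import tempfile

import pandas as pd

df_demo = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]})

# Hypothetical file name inside a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.csv')

    # Default: the row index is written as an extra, unnamed first column
    df_demo.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False leaves the row index out of the file
    df_demo.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(list(with_index.columns))
print(list(without_index.columns))
```

If the index does carry information, an alternative is to keep the default to_csv call and pass index_col=0 to read_csv so the first column is restored as the index.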

pandas: Inspecting DataFrame data

Common attributes and methods for inspecting data

index  returns the row index
In [29]: df2
Out[29]: 
   one  tow
a  1.0  4.0
b  2.0  NaN
c  3.0  5.0
d  4.0  NaN
e  NaN  6.0

In [30]: df2.index
Out[30]: Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')

columns  returns the column index
In [32]: df2.columns
Out[32]: Index([u'one', u'tow'], dtype='object')

values  returns the data as an array
In [33]: df2.values
Out[33]: 
array([[ 1.,  4.],
       [ 2., nan],
       [ 3.,  5.],
       [ 4., nan],
       [nan,  6.]])

T  transpose  # swaps rows and columns
In [35]: df2.T
Out[35]: 
       a    b    c    d    e
one  1.0  2.0  3.0  4.0  NaN
tow  4.0  NaN  5.0  NaN  6.0

describe()  returns quick summary statistics
In [34]: df2.describe()
Out[34]: 
            one  tow
count  4.000000  3.0
mean   2.500000  5.0
std    1.290994  1.0
min    1.000000  4.0
25%    1.750000  4.5
50%    2.500000  5.0
75%    3.250000  5.5
max    4.000000  6.0

Column names: the name attribute of each DataFrame column

rename(columns={'old column name':'new column name'})

In [37]: df2
Out[37]: 
   one  tow
a  1.0  4.0
b  2.0  NaN
c  3.0  5.0
d  4.0  NaN
e  NaN  6.0

In [38]: df2.rename(columns={'one':'first'})
Out[38]: 
   first  tow
a    1.0  4.0
b    2.0  NaN
c    3.0  5.0
d    4.0  NaN
e    NaN  6.0
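Note that the session above only displays the renamed frame; rename returns a new DataFrame and leaves the original untouched unless the result is assigned. A minimal sketch with made-up data (df_demo and renamed are illustrative names):

```python
import pandas as pd

df_demo = pd.DataFrame({'one': [1.0, 2.0], 'tow': [4.0, None]}, index=['a', 'b'])

# rename returns a new DataFrame
renamed = df_demo.rename(columns={'one': 'first'})
print(list(renamed.columns))
print(list(df_demo.columns))   # the original still has the old names

# Assign the result back to keep the new names
df_demo = df_demo.rename(columns={'tow': 'second'})
print(list(df_demo.columns))
```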

pandas: DataFrame indexing and slicing

A DataFrame has both a row index and a column index.

Selecting by label

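df = pd.read_csv('601318.csv')
df['open']  # select a single column
df[['open', 'high']]  # select several columns with a list (fancy indexing)
df['open'][0]  # value of the 'open' column at row label 0
df[0:10]  # rows at positions 0 through 9 (the end of the slice is excluded)
df[0:10][['date', 'close']]  # rows at positions 0 through 9, columns 'date' and 'close' only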


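df.loc[:,['open','close','low']]  # all rows, columns 'open', 'close' and 'low'
df.loc[:,'open':'close']  # all rows, every column from 'open' through 'close' (both ends included)
df.loc[0,'open']  # row label 0, column 'open'
df.loc[0:10,['open','low']]  # row labels 0 through 10 (both ends included), columns 'open' and 'low'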

Selecting by position (integer index)

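df.iloc[3]  # row at position 3 (the fourth row)
df.iloc[3,3]  # row position 3, column position 3 (fourth row, fourth column)
df.iloc[0:3,4:6]  # rows at positions 0 to 2, columns at positions 4 and 5
df.iloc[1:5,:]  # rows at positions 1 to 4, all columns
df.iloc[[1,2,4],[0,3,6]]  # rows at positions 1, 2, 4 and columns at positions 0, 3, 6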
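The main trap when switching between loc and iloc is the end of a slice: label slices include it, position slices do not. The sketch below uses a small made-up price table instead of 601318.csv; only the column names mirror the examples above.

```python
import pandas as pd

# Made-up prices; only the column names follow the examples above
df_demo = pd.DataFrame({
    'date':  ['2007-03-01', '2007-03-02', '2007-03-05', '2007-03-06'],
    'open':  [21.0, 19.5, 20.5, 22.0],
    'close': [20.0, 20.5, 21.5, 21.0],
    'high':  [21.5, 21.0, 22.0, 22.5],
    'low':   [19.0, 19.0, 20.0, 20.5],
})

# Label slices include both ends: rows 0, 1, 2 and columns open, close, high
by_label = df_demo.loc[0:2, 'open':'high']

# Position slices exclude the end: rows 0, 1 and columns at positions 1, 2
by_position = df_demo.iloc[0:2, 1:3]

print(by_label.shape)
print(by_position.shape)
print(list(by_label.columns))
print(list(by_position.columns))

# Positions start at 0, so [3, 3] is the fourth row and fourth column ('high')
print(df_demo.iloc[3, 3])
```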

Filtering with boolean masks

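df[df['open']>20]  # rows where 'open' is greater than 20
df[df<50]  # keep values less than 50; every other cell is shown as NaN
df[df['date'].isin(['2007-03-01','2007-03-06'])]  # rows whose 'date' is one of ['2007-03-01','2007-03-06']

df[df<50].fillna(0)  # cells that fail the condition (not less than 50) show up as NaN; fillna(0) replaces them with 0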
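The two kinds of boolean filter behave differently: a boolean Series selects whole rows, while a boolean DataFrame masks individual cells. A minimal sketch on made-up numeric data (all names here are illustrative):

```python
import pandas as pd

# Purely numeric, made-up data so that df_demo < 50 is well defined for every column
df_demo = pd.DataFrame({'open': [18.0, 25.0, 60.0], 'close': [55.0, 30.0, 45.0]})

# Boolean Series: keeps or drops whole rows
rows = df_demo[df_demo['open'] > 20]

# Boolean DataFrame: keeps the shape, cells that fail the test become NaN
masked = df_demo[df_demo < 50]

# fillna(0) then replaces those NaN cells with 0
filled = masked.fillna(0)

print(rows)
print(masked)
print(filled)
```

In recent pandas versions under Python 3, comparing a frame that still contains a string column such as 'date' against a number is expected to raise a TypeError, so it is safer to select the numeric columns first.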
