Learning Python (6): pandas

I. Creating a Series
1. Direct creation

import numpy as np
import pandas as pd

s = pd.Series([1, 3, 6, np.nan, 44, 1])

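2. You can supply an index when creating a Series, and inspect it afterwards with Series.index. Note that when a Series is created from an array and an index is given, the index must have the same length as the data.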

k = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(k.index)
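
To make the length rule concrete, here is a minimal sketch. The variable names (`values`, `ok`, `mismatch_error`) are made up for illustration; the point is only that an array with three values and an index with two labels is rejected with a ValueError.

```python
import numpy as np
import pandas as pd

values = np.array([10.0, 20.0, 30.0])

# Matching lengths: three values, three labels.
ok = pd.Series(values, index=['a', 'b', 'c'])
print(ok.index.tolist())

# Mismatched lengths: three values but only two labels.
try:
    pd.Series(values, index=['a', 'b'])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```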

3. A Series can also be created from a dictionary (dict)

d = {'a': 0., 'b': 1, 'c': 2}
s = pd.Series(d)

Specifying the index at creation:

pd.Series(d, index=['b', 'c', 'd', 'a'])

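When a Series is created from a dictionary with an explicit index, the data is arranged in the order of that index, and the index does not have to be the same length as the dictionary. If the index contains labels that are not keys of the dictionary, pandas fills those entries with NaN (not a number); if it contains fewer labels, only the dictionary entries matching those labels are kept.

d = {'a': 0., 'b': 1, 'c': 2}
m = pd.Series(d)
# Passing an existing Series together with index= reindexes it the same way
pd.Series(m, index=['b', 'c', 'd', 'a'])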
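
As a small sketch of both cases described above (an index longer than the dictionary and one shorter than it), the names `longer` and `shorter` below are chosen only for illustration:

```python
import pandas as pd

d = {'a': 0.0, 'b': 1.0, 'c': 2.0}

# 'd' is not a key of the dict, so its value is NaN; row order follows the index.
longer = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(longer)

# Fewer labels than keys: only the requested entries are kept.
shorter = pd.Series(d, index=['c', 'a'])
print(shorter)
```

Because one entry is NaN, `longer` has a floating-point dtype even if the dictionary values were integers.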

If the data is a single scalar value, the Series repeats that value for every label in the index.

pd.Series(4, index=['a', 'b', 'c', 'd', 'e'])

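II. Accessing Series data
1. A Series can be accessed by position, by label (like a dictionary), and by boolean condition.

s = pd.Series(np.random.randn(10), index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
# First element by position (plain s[0] on a label index is deprecated in recent pandas versions)
print(s.iloc[0])
# First two elements, starting from position 0
s.iloc[:2]
# The 3rd, 1st and 5th elements, in that order
print(s.iloc[[2, 0, 4]])
# Elements selected by label, in the order given
s[['e', 'i']]
# Elements greater than 0.5
s[s > 0.5]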
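
Since the example above uses random data, its output changes on every run. The following sketch repeats the same three access styles on fixed values (chosen arbitrarily for illustration) so the results are easy to check by eye:

```python
import pandas as pd

s = pd.Series([0.2, 0.9, -0.4, 0.7, 0.1], index=['a', 'b', 'c', 'd', 'e'])

first = s.iloc[0]            # by position
by_label = s['d']            # by label, like a dict lookup
picked = s.iloc[[2, 0, 4]]   # positions 2, 0, 4, in that order
mask = s > 0.5               # boolean Series aligned with s
large = s[mask]              # keeps only the rows where mask is True

print(picked.index.tolist())
print(large.index.tolist())
```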

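III. Creating a DataFrame
A DataFrame is a two-dimensional data structure, essentially a collection of Series. We first create a dictionary whose values are Series and then convert it into a DataFrame:
Creating from Series:

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df2 = pd.DataFrame(d)
print(df2)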
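
The two Series above have different lengths, which is fine because pandas aligns them on their index labels. A short sketch of what that alignment produces (the variable name `df` is arbitrary):

```python
import pandas as pd

d = {
    'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
    'two': pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(d)

# The row index is the union of both Series' labels.
print(df.shape)
# Row 'd' exists only in 'two', so column 'one' holds NaN there.
print(df.loc['d'])
```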

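You can also specify which rows and columns you want; any row label or column name that has no corresponding data in the dictionary is filled with NaN.

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df2 = pd.DataFrame(d, index=['r', 'd', 'a'], columns=['two', 'three'])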
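
To see where the NaN values end up in this case, one option is to inspect the result with `isna()`. This is only a sketch for checking the behaviour described above, using the same data:

```python
import pandas as pd

d = {
    'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
    'two': pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd']),
}
df2 = pd.DataFrame(d, index=['r', 'd', 'a'], columns=['two', 'three'])

# Row 'r' and column 'three' do not exist in the source data.
print(df2.isna())
# Values that do exist are carried over: rows 'd' and 'a' of column 'two'.
print(df2.loc[['d', 'a'], 'two'])
```

Column 'one' is dropped entirely because it is not listed in `columns`.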

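A DataFrame can also be created from a dictionary whose values are arrays, but in that case all the arrays must have the same length. When the values are not arrays (for example Series, as above), this restriction does not apply, because pandas aligns them by index label.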
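
The original snippet for this case is missing, so the following is only a minimal sketch of what it might look like. The column names, values, and row labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Arrays of equal length: one column per key, optional row labels via index.
same_length = {
    'one': np.array([1.0, 2.0, 3.0]),
    'two': np.array([4.0, 3.0, 2.0]),
}
df = pd.DataFrame(same_length, index=['a', 'b', 'c'])
print(df)

# Arrays of different lengths are rejected.
try:
    pd.DataFrame({'one': np.array([1.0, 2.0]), 'two': np.array([1.0, 2.0, 3.0])})
    length_error = None
except ValueError as exc:
    length_error = exc
    print("ValueError:", exc)
```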


When working with real data, you sometimes need to create an empty DataFrame; calling pd.DataFrame() with no data does this.
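
The original code for this step is missing, so here is a hedged sketch of two common ways to do it. The second form, with predefined column names, is an assumption about what may have been intended, and the names 'name' and 'score' are hypothetical.

```python
import pandas as pd

# Completely empty: no rows and no columns.
empty = pd.DataFrame()
print(empty.empty)

# Empty, but with the column names fixed in advance (hypothetical names).
with_columns = pd.DataFrame(columns=['name', 'score'])
print(with_columns.shape)
```

The `empty` attribute is a convenient way to test later whether any data has been added.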
