Python Basics (Part 8): Pandas

Pandas is a powerful toolkit for processing tabular data, with particularly strong support for time series.

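1 Basic data structures

The most basic data structure in Pandas is the Series. It holds a single sequence of values together with an index, so it can be thought of as a labelled one-dimensional array.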

import pandas as pd
# A Series can be thought of as a one-dimensional array
s = pd.Series([1, 3, 5, 7, 9])

print(s)
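
As a small illustration of the Series described above, the sketch below gives the same values an explicit index. The labels 'a' to 'e' are made up for this example and are not part of the original code.

```python
import pandas as pd

# Illustrative data: the labels 'a'..'e' are an assumption for this example.
s = pd.Series([1, 3, 5, 7, 9], index=list("abcde"))

# Access one element by label and one by integer position.
by_label = s.loc["c"]
by_position = s.iloc[0]

# Arithmetic is applied element by element and keeps the index.
doubled = s * 2

print(by_label, by_position)
print(doubled)
```

When no index is passed, as in the original snippet, Pandas falls back to the default integer index 0, 1, 2, and so on.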

The other key data structure is the DataFrame, which represents a two-dimensional table.

# A DataFrame is a two-dimensional table. Its data is stored in NumPy arrays,
# and df.values returns the raw values. Every row and every column of a
# DataFrame is a Series. A row is accessed by its row index, a column by its name.

import numpy as np
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
print(df)
print(df.iloc[0])   # first row
print(df.A)         # column A, same as df['A']
# DataFrame.shape gives the dimensions of the data.
print(df.shape)
# DataFrame.head() and DataFrame.tail() return the first and last n rows.
print(df.head(3))
print(df.tail(3))
# DataFrame.index and DataFrame.columns hold the row index and the column index.
print(df.index)
print(df.columns)
# DataFrame.describe() gives simple summary statistics.

print(df.describe())
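
To make the relationship between a DataFrame, its NumPy values and its Series more concrete, here is a minimal sketch. It uses a small fixed table instead of random numbers; the values are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A small fixed table (illustrative values) so the results are reproducible.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

raw = df.to_numpy()   # the values as a NumPy array (same data as df.values)
row = df.iloc[0]      # first row as a Series, indexed by column name
col = df["A"]         # column A as a Series, indexed by row label

print(type(raw).__name__, raw.shape)
print(type(row).__name__, list(row.index))
print(type(col).__name__, col.tolist())
```

Note that the row Series is indexed by the column names, while the column Series is indexed by the row labels.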

2 Sorting data

# DataFrame.sort_index() sorts by index; with axis=1 it sorts the column labels.
print(df.sort_index(axis=1, ascending=False))
# DataFrame.sort_values() sorts rows by the values in a column.

print(df.sort_values(by='B'))
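
The two sorting methods are easier to tell apart on a tiny table with known values. The sketch below uses made-up data and a deliberately unordered index.

```python
import pandas as pd

# Illustrative data with an unordered row index.
df = pd.DataFrame(
    {"A": [3, 1, 2], "B": [0.5, 2.5, 1.5]},
    index=[10, 30, 20],
)

# Rows ordered by index label: 10, 20, 30.
by_index = df.sort_index()
# Columns ordered by label, descending: B, A.
cols_desc = df.sort_index(axis=1, ascending=False)
# Rows ordered by the values in column B, largest first.
by_b_desc = df.sort_values(by="B", ascending=False)

print(by_index)
print(cols_desc)
print(by_b_desc)
```

Both methods return a new DataFrame and leave the original unchanged.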

3 Accessing data

Pandas makes it easy to select and access data.

print(df)
print(df[3:5])               # slice rows by position
print(df[['A', 'B', 'C']])   # select several columns
# Use DataFrame.loc[] to select by label, or DataFrame.iloc[] to select by integer position
print(df.loc[3, 'A'])
print(df.iloc[3, 0])
print(df.iloc[2:5, 0:2])
# Rows can also be selected with a boolean condition
print(df[df.C > 0])
# Add a column
df['TAG'] = ['cat', 'dog', 'rabbit', 'cat', 'dog', 'rabbit']
print(df)
# Group by TAG and sum each group

print(df.groupby('TAG').sum())
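
With the default integer index, loc and iloc happen to give the same result, which hides the difference between them. The following sketch uses a made-up table whose row labels start at 100 to show label-based versus position-based access, followed by a boolean filter and a group sum.

```python
import pandas as pd

# Illustrative data; the row labels 100..103 are an assumption for this example.
df = pd.DataFrame(
    {"A": [1.0, -2.0, 3.0, -4.0], "TAG": ["cat", "dog", "cat", "dog"]},
    index=[100, 101, 102, 103],
)

by_label = df.loc[102, "A"]      # the row labelled 102
by_position = df.iloc[2, 0]      # third row, first column (the same cell)

label_slice = df.loc[100:102]    # label slices include both endpoints
position_slice = df.iloc[0:2]    # position slices exclude the stop position

positive = df[df["A"] > 0]       # keep only rows where A is positive

# Sum column A within each TAG group.
totals = df.groupby("TAG")["A"].sum()

print(by_label, by_position)
print(len(label_slice), len(position_slice))
print(positive)
print(totals)
```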

4 Time series

Pandas provides powerful time-series functionality. We can create a data set that is indexed by dates.

n_items = 366
ts = pd.Series(np.random.randn(n_items), index=pd.date_range('20000101', periods=n_items))
print(ts.shape)
print(ts.head())
# Aggregate the data by month ("ME" = month end; on pandas older than 2.2, use "M")

print(ts.resample("ME").sum())
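
Because the series above is random, it is hard to check the monthly totals by eye. As a sketch, the example below fills the same 366 days with the constant 1.0, so each monthly sum equals the number of days in that month. It uses the "MS" (month start) alias, which labels each bin with the first day of the month; that choice is specific to this example.

```python
import numpy as np
import pandas as pd

# 366 daily timestamps starting on 2000-01-01 (2000 is a leap year).
idx = pd.date_range("2000-01-01", periods=366, freq="D")
ts = pd.Series(np.ones(366), index=idx)

# Group the daily values into calendar months and sum each group.
# "MS" labels every bin with the first day of its month.
monthly = ts.resample("MS").sum()

print(monthly)
```

The result has twelve rows, one per month of 2000, with February summing to 29.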

5 Data visualization

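import matplotlib.pyplot as plt
# Starting from 2000-01-01, create 366 daily records
n_items = 366
ts = pd.Series(np.random.randn(n_items), index=pd.date_range('20000101', periods=n_items))
print(ts.shape)
print(ts.head())
# Aggregate the data by month ("ME" = month end; on pandas older than 2.2, use "M")
monthly = ts.resample("ME").sum()
print(monthly)
# Alternative: plot the cumulative sum as a line chart
# plt.figure(figsize=(10,6),dpi=144)
# cs=ts.cumsum()
# cs.plot()
# Plot the monthly totals as a bar chart
plt.figure(figsize=(10, 6), dpi=144)
monthly.plot.bar()
plt.show()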
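
The cumulative-sum line chart that appears commented out above could be written roughly as follows. This is only a sketch: the fixed random seed, the non-interactive "Agg" backend and the in-memory PNG buffer are assumptions made so that the example runs without a display, and they are not part of the original code.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (arbitrary choice) for reproducibility
ts = pd.Series(rng.standard_normal(366),
               index=pd.date_range("2000-01-01", periods=366))

cs = ts.cumsum()  # running total of the daily values

fig, ax = plt.subplots(figsize=(10, 6))
cs.plot(ax=ax, title="Cumulative sum of daily values")
ax.set_xlabel("Date")

buf = io.BytesIO()  # illustrative destination; a file path works as well
fig.savefig(buf, format="png")
plt.close(fig)
```

In an interactive session, the backend line and the buffer can be dropped and plt.show() called instead.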
