Pandas in Python: A First Look at Its Data Structures

1. About pandas

pandas has two main data structures: Series and DataFrame.

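A Series is a one-dimensional, array-like object with an index. Unlike a plain array of values, each element also carries a label, so data can be retrieved by label. A Series can also be thought of as an ordered dictionary that maps labels to values.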
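To make the "array plus labels" and "ordered dictionary" views concrete, here is a minimal sketch. The fruit prices are hypothetical example data, not part of the original post, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data, not from the original post.
prices = pd.Series([1.5, 0.25, 3.0], index=["apple", "banana", "cherry"])

# Dict-like behaviour: look up by label, test membership,
# and iterate over (label, value) pairs in index order.
apple_price = prices["apple"]
has_banana = "banana" in prices
pairs = list(prices.items())

# Array-like behaviour: positional access via .iloc and vectorized arithmetic.
first_price = prices.iloc[0]
doubled = prices * 2

# Converting back to a plain dict keeps the label order.
as_dict = prices.to_dict()
```

The same object therefore answers both "what is the value for this label?" and "what is the value at this position?", which is the main difference from a bare NumPy array.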

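A DataFrame is a tabular data structure containing an ordered collection of columns. Different columns can hold different data types, while the values within a single column share one data type.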
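As a rough illustration of the description above, the following sketch builds a small DataFrame whose columns hold text, integers, and floats. The column names and the values in the `ratio` column are made up for this example.

```python
import pandas as pd

# Hypothetical example data, not from the original post.
df = pd.DataFrame({
    "state": ["Ohio", "Texas", "Oregon"],   # text column
    "population": [35000, 71000, 16000],    # integer column
    "ratio": [0.35, 0.71, 0.16],            # float column
})

# Columns keep the order in which they were given.
column_names = list(df.columns)

# Each column has its own dtype; df.dtypes lists them per column.
dtypes = df.dtypes

# Selecting a single column returns a Series that shares the row index.
population = df["population"]
```

Printing `dtypes` shows one entry per column, which is a quick way to confirm that each column is typed independently of the others.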

2. Common Series operations

import numpy as np
import pandas as pd
import sys
from pandas import Series, DataFrame

obj = Series([4, 7, -5, 3])
obj
Out[129]:
0    4
1    7
2   -5
3    3
dtype: int64
In [130]:


obj.values
Out[130]:
array([ 4,  7, -5,  3], dtype=int64)
In [131]:

obj.index  # get the index
Out[131]:
RangeIndex(start=0, stop=4, step=1)
In [132]:


obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2
Out[132]:
d    4
b    7
a   -5
c    3
dtype: int64
In [133]:


obj2.index
Out[133]:
Index(['d', 'b', 'a', 'c'], dtype='object')
In [134]:


obj2['a']  # get the value for a given index label
Out[134]:
-5
In [135]:


obj2['d'] = 6
obj2[['c', 'a', 'd']]
Out[135]:
c    3
a   -5
d    6
dtype: int64
In [136]:


obj2[obj2 > 0]
Out[136]:
d    6
b    7
c    3
dtype: int64
In [137]:


obj2 * 2
Out[137]:
d    12
b    14
a   -10
c     6
dtype: int64
In [138]:


np.exp(obj2)
Out[138]:
d     403.428793
b    1096.633158
a       0.006738
c      20.085537
dtype: float64
In [139]:


'b' in obj2  # check whether a label is in the Series index
Out[139]:
True
In [140]:


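# create a Series from a dict
# (older pandas versions sort the keys, as in the output below; recent versions keep the dict's insertion order)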
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = Series(sdata)
obj3
Out[140]:
Ohio      35000
Oregon    16000
Texas     71000
Utah       5000
dtype: int64
In [141]:


states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = Series(sdata, index=states)
obj4
Out[141]:
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
In [142]:


pd.isnull(obj4)  # detect missing values
Out[142]:
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
In [143]:


pd.notnull(obj4)
Out[143]:
California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool
In [144]:


obj3 
Out[144]:
Ohio      35000
Oregon    16000
Texas     71000
Utah       5000
dtype: int64
In [145]:


obj4
Out[145]:
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
In [146]:


obj3 + obj4
Out[146]:
California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
In [147]:


obj4.name = 'population'
obj4.index.name = 'state'
obj4
Out[147]:
state
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: population, dtype: float64
In [148]:


# modify the index by assignment
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj
Out[148]:
Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64
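The `obj3 + obj4` result earlier in this section shows that arithmetic between two Series aligns on index labels, and that a label missing from either side produces NaN. The sketch below repeats that setup in a self-contained form and shows two possible ways of dealing with the NaN values. Using `Series.add` with `fill_value=0` and `dropna` is a suggestion for illustration, not something the original post does.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# The + operator aligns on labels; a label missing on either side gives NaN.
total = obj3 + obj4

# Series.add with fill_value treats a value missing on ONE side as 0.
# Utah (absent from obj4) becomes 5000, while California stays NaN
# because it is missing on both sides after alignment.
total_filled = obj3.add(obj4, fill_value=0)

# Alternatively, drop the labels whose result is NaN.
total_clean = total.dropna()
```

Which option fits depends on whether a missing label should count as zero or as unknown.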

3. Common DataFrame operations
