Python Libraries: pandas

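Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/sandalphon4869/article/details/100585362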


I. pandas

1. Import alias (as)

import pandas as pd

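2. Overview

pandas has two core data structures: the Series and the DataFrame.

  • A Series is similar to a one-dimensional NumPy array. It works with the functions and methods available for one-dimensional arrays, and it additionally supports access by index label and automatic index alignment.

  • A DataFrame is similar to a two-dimensional NumPy array, and it likewise works with NumPy array functions and methods.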
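
As a minimal sketch of the two points above (the values and the labels 'x', 'y', 'z' are illustrative assumptions, not from the original article), a Series accepts a NumPy function and supports access by label, and the data in a DataFrame can be handed to a NumPy function as well:

```python
import numpy as np
import pandas as pd

# Illustrative data; the labels 'x', 'y', 'z' are made up for this sketch.
s = pd.Series([1.0, 4.0, 9.0], index=['x', 'y', 'z'])

roots = np.sqrt(s)   # NumPy ufuncs apply element-wise and keep the index
y_value = s['y']     # access by index label

df = pd.DataFrame(np.arange(6).reshape(2, 3))
col_sums = np.sum(df.to_numpy(), axis=0)  # column sums of the underlying 2-D array

print(roots)
print(y_value)
print(col_sums)
```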

[Diagram: pandas has two data structures, Series (data structure 1) and DataFrame (data structure 2).]

II. Series and DataFrame

1. Creating a Series

When a Series is printed, the index is on the left and the corresponding values are on the right.

name=pd.Series(…)

(1) From a one-dimensional NumPy array

# a one-dimensional ndarray
import pandas as pd
import numpy as np

a=pd.Series(np.array([3,4,5]))
print(a)
'''
0    3
1    4
2    5
dtype: int32
'''
# The dtype follows the NumPy array: int32 on Windows with NumPy 1.x, int64 on most other setups.

print(type(a))
#<class 'pandas.core.series.Series'>

(2) From a list

# a one-dimensional Python list
import pandas as pd

a=pd.Series([3,4,5])
print(a)
'''
0    3
1    4
2    5
dtype: int64
'''

(3) From a dictionary

import pandas as pd

# named d rather than dict, to avoid shadowing the built-in dict type
d = {'a':3,'b':4,'c':5}

ds=pd.Series(d)
print(ds)
'''
a    3
b    4
c    5
dtype: int64
'''

(4) From a row or a column of a DataFrame

import pandas as pd

dic3 = {'one':{'a':1,'b':2,'c':3,'d':4},'two':{'a':5,'b':6,'c':7,'d':8},'three':{'a':9,'b':10,'c':11,'d':12}}
df3 = pd.DataFrame(dic3)

s3 = df3['one']
print(s3)
'''
a    1
b    2
c    3
d    4
Name: one, dtype: int64
'''
print(type(s3))
#<class 'pandas.core.series.Series'>
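
The example above takes a column. For the row case named in the heading, one possible sketch uses .loc with a row label (the data is shortened here for illustration):

```python
import pandas as pd

# Shortened version of dic3, for illustration only.
dic3 = {'one': {'a': 1, 'b': 2}, 'two': {'a': 5, 'b': 6}}
df3 = pd.DataFrame(dic3)

row_a = df3.loc['a']   # select the row labelled 'a'; the result is a Series
print(row_a)
print(type(row_a))
```

The column names of the DataFrame become the index of the resulting Series, and the row label becomes its name.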

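2. Creating a DataFrame

When a DataFrame is printed, the row labels run down the left side and the column labels run across the top.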

name=pd.DataFrame(…)

(1) From a two-dimensional NumPy array

# a two-dimensional ndarray
import pandas as pd
import numpy as np

arr1 = np.array(np.arange(12)).reshape(4,3)
df1 = pd.DataFrame(arr1)
print(df1)
'''
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
'''

print(type(df1))
#<class 'pandas.core.frame.DataFrame'>

(2) From a nested list

# a two-dimensional nested list
import pandas as pd
arr2 = [[1,2,3],[4,5,6]]
df2 = pd.DataFrame(arr2)
print(df2)
'''
   0  1  2
0  1  2  3
1  4  5  6
'''

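(3) From a dictionary

A DataFrame can be built from two kinds of dictionary: a dictionary of lists, or a nested dictionary (a dictionary of dictionaries).

# dictionary of lists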
import pandas as pd

dic1 = {'a':[1,2,3,4],'b':[5,6,7,8],'c':[9,10,11,12],'d':[13,14,15,16]}

df1 = pd.DataFrame(dic1)
print(df1)
'''
   a  b   c   d
0  1  5   9  13
1  2  6  10  14
2  3  7  11  15
3  4  8  12  16
'''

# nested dictionary
import pandas as pd

dic3 = {'one':{'a':1,'b':2,'c':3,'d':4},'two':{'a':5,'b':6,'c':7,'d':8},'three':{'a':9,'b':10,'c':11,'d':12}}

df3 = pd.DataFrame(dic3)
print(df3)
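'''
   one  two  three
a    1    5      9
b    2    6     10
c    3    7     11
d    4    8     12
'''
# pandas 0.23+ on Python 3.6+ keeps the dictionary's insertion order;
# older versions sorted the columns alphabetically (one, three, two).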
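
In the nested-dictionary case, the outer keys become column labels and the inner keys become row labels. If the outer keys should be rows instead, one option is DataFrame.from_dict with orient='index'. A small sketch with shortened data (not part of the original article):

```python
import pandas as pd

# Shortened version of dic3, for illustration only.
dic3 = {'one': {'a': 1, 'b': 2}, 'two': {'a': 5, 'b': 6}}

by_columns = pd.DataFrame(dic3)                          # outer keys -> columns
by_rows = pd.DataFrame.from_dict(dic3, orient='index')   # outer keys -> rows

print(by_columns)
print(by_rows)
```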

(4) From another DataFrame

import pandas as pd

dic3 = {'one':{'a':1,'b':2,'c':3,'d':4},'two':{'a':5,'b':6,'c':7,'d':8},'three':{'a':9,'b':10,'c':11,'d':12}}
df3 = pd.DataFrame(dic3)

df4 = df3[['one','three']]
print(df4)
'''
   one  three
a    1      9
b    2     10
c    3     11
d    4     12
'''
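
Selecting columns with a list of names gives a new DataFrame. When the result will be modified later, adding an explicit .copy() makes the independence from the source unambiguous. The sketch below uses shortened data, and the .copy() call is an addition that is not in the original example:

```python
import pandas as pd

# Shortened data, for illustration only.
df3 = pd.DataFrame({'one': [1, 2], 'two': [5, 6], 'three': [9, 10]})

df4 = df3[['one', 'three']].copy()  # explicit copy: df4 is independent of df3
df4['one'] = 0                      # modify the copy only

print(df3['one'].tolist())
print(df4)
```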

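III. Indexes

1. Setting an index

(1) Default index

If no index is specified, pandas generates an integer index that starts at 0 and increases by 1.

import pandas as pd
import numpy as np

s4 = pd.Series(np.array([0,1,2,3,4,5]))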
print(s4)
'''
0    0
1    1
2    2
3    3
4    4
5    5
dtype: int32
'''

import pandas as pd
import numpy as np

arr1 = np.array(np.arange(12)).reshape(4,3)

df1=pd.DataFrame(arr1)
print(df1)
'''
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
'''

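(2) Viewing the index

Series

The index attribute returns the index of a Series. For a default index this is a RangeIndex, which begins at start, ends before stop, and advances by step.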

print(s4.index)
#RangeIndex(start=0, stop=6, step=1)
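
As a small sketch with made-up values, the three parts of a RangeIndex can also be read individually, and the index can be expanded into a plain list:

```python
import pandas as pd

# Illustrative Series with a default index.
s = pd.Series([10, 20, 30, 40])
idx = s.index

print(idx.start, idx.stop, idx.step)
print(list(idx))   # the positions the RangeIndex stands for
```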

DataFrame

index returns the row labels, and columns returns the column labels.

import pandas as pd
import numpy as np

arr1 = np.array(np.arange(12)).reshape(4,3)

df1=pd.DataFrame(arr1,index=['a','b','c','d'],columns=[3,4,5])
print(df1)
'''
   3   4   5
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
'''

print(df1.index)
#Index(['a', 'b', 'c', 'd'], dtype='object')

print(df1.columns)
#Int64Index([3, 4, 5], dtype='int64')
#(pandas 2.0 and later print Index([3, 4, 5], dtype='int64') instead)

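(3) Custom index

About custom indexes

After a custom index is assigned, values can be looked up by the custom labels, and access by integer position is still available. In recent pandas releases, relying on plain s4[3] for positional access on a non-integer index is deprecated; s4.iloc[3] states the intent explicitly.

Assigning the index after creation

import pandas as pd
import numpy as np

s4 = pd.Series(np.array([0,1,2,3,4,5]))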
print(s4)
'''
0    0
1    1
2    2
3    3
4    4
5    5
dtype: int32
'''

s4.index = ['a','b','c','d','e','f']
print(s4)
'''
a    0
b    1
c    2
d    3
e    4
f    5
dtype: int32
'''

import pandas as pd
import numpy as np

arr1 = np.array(np.arange(12)).reshape(4,3)
df1 = pd.DataFrame(arr1)
print(df1)
'''
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
'''

df1.index=['a','b','c','d']
print(df1)
'''
   0   1   2
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
'''

df1.columns=[3,4,5]
print(df1)
'''
   3   4   5
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
'''

Setting the index at creation

import pandas as pd
import numpy as np

s=pd.Series(np.array([1,2,3]),index=['a','b','c'])
print(s)
'''
a    1
b    2
c    3
dtype: int32
'''
print(s.index)
#Index(['a', 'b', 'c'], dtype='object')

import pandas as pd
import numpy as np

arr1 = np.array(np.arange(12)).reshape(4,3)

df1=pd.DataFrame(arr1,index=['a','b','c','d'],columns=[3,4,5])
print(df1)
'''
   3   4   5
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
'''

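2. Getting data by index

Single element: pass one index value, such as s4[1] or s4['b'].

Multiple elements: pass a list of index values, such as s4[[1,3,5]] or s4[['b','d','f']].

Slicing: a slice over custom labels includes the end label, while a slice over integer positions excludes the stop position.
For the six-element Series below, [:4] and [:'d'] return the same elements, as do [2:] and ['c':]. However, ['b':'d'] runs from b through d inclusive, whereas [1:3] runs from position 1 up to but not including position 3.

import pandas as pd
import numpy as np

s4 = pd.Series(np.array([0,1,2,3,4,5]))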
print(s4)
'''
0    0
1    1
2    2
3    3
4    4
5    5
dtype: int32
'''

s4.index = ['a','b','c','d','e','f']
print(s4)
'''
a    0
b    1
c    2
d    3
e    4
f    5
dtype: int32
'''

# single element (s4[3] is a positional lookup here; newer pandas prefers s4.iloc[3])
print('s4[3]: ',s4[3])
print('s4[e]: ',s4['e'])
'''
s4[3]:  3
s4[e]:  4
'''

# multiple elements
print("s4[[1,3,5]]: ",s4[[1,3,5]])
'''
s4[[1,3,5]]:  b    1
d    3
f    5
dtype: int32
'''
print("s4[['b','d','f']]: ",s4[['b','d','f']])
'''
s4[['b','d','f']]:  b    1
d    3
f    5
dtype: int32
'''

# slicing: pairs that return the same result
print('s4[:4]: ',s4[:4])
print("s4[:'d']:",s4[:'d'])
print('s4[2:]',s4[2:])
print("s4['c':]: ",s4['c':])
'''
s4[:4]:  a    0
b    1
c    2
d    3
dtype: int32
s4[:'d']: a    0
b    1
c    2
d    3
dtype: int32
s4[2:] c    2
d    3
e    4
f    5
dtype: int32
s4['c':]:  c    2
d    3
e    4
f    5
dtype: int32
'''

# slicing: pairs that return different results
print("s4['b':'d']: ",s4['b':'d'])
print("s4[1:3]:",s4[1:3])
'''
s4['b':'d']:  b    1
c    2
d    3
dtype: int32
s4[1:3]: b    1
c    2
dtype: int32
'''
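
The same lookups can be written with the explicit accessors .loc (by label) and .iloc (by position), which avoids any ambiguity about which kind of index is meant. This is a sketch of an alternative spelling, not the approach used in the examples above:

```python
import numpy as np
import pandas as pd

s4 = pd.Series(np.arange(6), index=list('abcdef'))

by_label = s4.loc['b':'d']      # label slice: both endpoints are included
by_position = s4.iloc[1:3]      # positional slice: the stop position is excluded
single_label = s4.loc['e']      # one element by label
single_position = s4.iloc[3]    # one element by position
picked = s4.iloc[[1, 3, 5]]     # several elements by position

print(by_label)
print(by_position)
print(single_label, single_position)
print(picked)
```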

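3. Automatic alignment

When arithmetic is performed on two Series, the index shows its value: the operands are aligned by index label automatically.

Where a label exists in both operands, the result is computed from the two matching values.

Where a label is missing from either operand, the result is NaN.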

import pandas as pd
import numpy as np

s1=pd.Series(np.array([1,2,3]),index=['a','b','c'])
s2=pd.Series(np.array([10,20,30]),index=['b','a','d'])

print(s1+s2)
'''
a    21.0
b    12.0
c     NaN
d     NaN
dtype: float64
'''

import pandas as pd

d1=pd.DataFrame({'a':[1,2,3],'b':[4,5,6],'c':[7,8,9]})
d2=pd.DataFrame({'b':[10,10,10],'a':[20,20,20],'d':[30,30,30]})
print(d1+d2)
'''
    a   b   c   d
0  21  14 NaN NaN
1  22  15 NaN NaN
2  23  16 NaN NaN
'''
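
If NaN is not wanted for labels that appear in only one operand, the add method accepts a fill_value that stands in for the missing side. A minimal sketch reusing s1 and s2 from above (the choice of 0 as the fill value is an assumption for illustration):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'a', 'd'])

# Labels missing from one Series are treated as 0 instead of producing NaN.
total = s1.add(s2, fill_value=0)
print(total)
```

Here 'c' keeps its value from s1 and 'd' keeps its value from s2, while 'a' and 'b' are summed as before.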
