14 - Extended Indexing -- Data Analysis

Extended Indexing

In [2]:
import pandas as pd
import numpy as np
In [5]:
s = pd.Series(np.arange(5),index = np.arange(5)[::-1],dtype='int64')
s
Out[5]:
4    0
3    1
2    2
1    3
0    4
dtype: int64

isin([1,3,4]) checks, element by element, whether each value in the Series is one of the listed values

In [8]:
s.isin([1,3,4])
Out[8]:
4    False
3     True
2    False
1     True
0     True
dtype: bool
In [11]:
s[s.isin([1,3,4])]
Out[11]:
3    1
1    3
0    4
dtype: int64
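As a small sketch of the same idea (the names `mask`, `kept`, and `dropped` are illustrative and not from the original notebook), the boolean mask returned by isin() can be used to select the matching values, and its negation `~mask` selects everything else:

```python
import numpy as np
import pandas as pd

# Same Series as above: values 0..4 with index labels 4..0
s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')

# isin() tests the *values*, not the index labels
mask = s.isin([1, 3, 4])

kept = s[mask]       # rows whose value is 1, 3 or 4
dropped = s[~mask]   # all remaining rows

print(kept)
print(dropped)
```

Note that the result of `s[mask]` holds the original values, while `mask` itself holds only True/False.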

MultiIndex.from_product([[0,1],['a','b','c']]) builds a two-level index from the Cartesian product of the two lists

In [13]:
s2 = pd.Series(np.arange(6),index = pd.MultiIndex.from_product([[0,1],['a','b','c']]))
s2
Out[13]:
0  a    0
   b    1
   c    2
1  a    3
   b    4
   c    5
dtype: int32
In [14]:
s2.iloc[s2.index.isin([(1,'a'),(2,'b')])]
Out[14]:
1  a    3
dtype: int32

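Select the two rows labeled (1, 'a') and (0, 'b'). In the previous cell, (2, 'b') does not exist in the index, so only (1, 'a') was returned.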

In [15]:
s2.iloc[s2.index.isin([(1,'a'),(0,'b')])]
Out[15]:
0  b    1
1  a    3
dtype: int32
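Besides matching full tuples, `MultiIndex.isin` also accepts a `level` argument to match against a single level. The following is a minimal sketch; the names `by_tuple` and `by_level` are made up for illustration:

```python
import numpy as np
import pandas as pd

s2 = pd.Series(np.arange(6),
               index=pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']]))

# Full-tuple match: (2, 'b') is not in the index, so it simply matches nothing
by_tuple = s2[s2.index.isin([(1, 'a'), (0, 'b'), (2, 'b')])]

# Single-level match: every row whose second level is 'a' or 'c'
by_level = s2[s2.index.isin(['a', 'c'], level=1)]

print(by_tuple)
print(by_level)
```

Here the boolean array is passed to plain `[]` indexing; the `.iloc[...]` form used above gives the same rows.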
In [18]:
s = pd.Series(np.arange(5),index = np.arange(5)[::-1],dtype='int64')
s
Out[18]:
4    0
3    1
2    2
1    3
0    4
dtype: int64

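Filter for the data that meets a condition. Here this is done with boolean indexing, s[s>2]; the where() method itself appears further below.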

In [19]:
s[s>2]
Out[19]:
1    3
0    4
dtype: int64
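To make the difference between boolean indexing and where() concrete, here is a short sketch on the same Series (the names `filtered` and `same_shape` are illustrative). Boolean indexing drops the rows that fail the condition, whereas `Series.where` keeps every row and fills the failing ones with NaN:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')

# Boolean indexing: only the rows where the condition holds remain
filtered = s[s > 2]

# where(): same length as s; rows failing the condition become NaN,
# which also upcasts the integer data to float64
same_shape = s.where(s > 2)

print(filtered)
print(same_shape)
```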
In [20]:
dates = pd.date_range('20171022',periods=8)
In [22]:
df = pd.DataFrame(np.random.randn(8,4),index=dates,columns=['A','B','C','D'])
df
Out[22]:
                   A         B         C         D
2017-10-22 -0.429368 -0.097481  0.643293 -0.984100
2017-10-23  0.223474 -0.897888 -0.240347  0.016184
2017-10-24 -0.961633 -1.063981  1.074710  0.360836
2017-10-25  1.173998  1.105386 -0.412135 -1.536320
2017-10-26 -0.656568 -0.938052  0.188153 -0.979891
2017-10-27  1.153825 -0.255656  1.194725  0.686401
2017-10-28  1.778536  1.809798  0.915557 -0.805165
2017-10-29  1.565753  0.705639  0.398115  1.356791
In [23]:
# DataFrame.select() is deprecated; map the predicate over the column labels and pass the result to .loc instead
df.loc[:, df.columns.map(lambda x: x == 'A')]
Out[23]:
                   A
2017-10-22 -0.429368
2017-10-23  0.223474
2017-10-24 -0.961633
2017-10-25  1.173998
2017-10-26 -0.656568
2017-10-27  1.153825
2017-10-28  1.778536
2017-10-29  1.565753
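Since `DataFrame.select` is deprecated and not available in recent pandas releases, the sketch below shows three ways to pick column 'A' that should give the same result. The variable names are illustrative, and the random data will differ from the table above:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20171022', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

# Option 1: map a predicate over the column labels and index with .loc
via_map = df.loc[:, df.columns.map(lambda x: x == 'A')]

# Option 2: filter() by exact label
via_filter = df.filter(items=['A'])

# Option 3: plain label selection; the list keeps the result a DataFrame
via_labels = df[['A']]

print(via_map.equals(via_filter) and via_filter.equals(via_labels))
```

The predicate form is mainly useful when the condition is more involved than an equality test; for a fixed label, `df[['A']]` is the simplest choice.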
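  • where() keeps the values in the table that are less than 0; values that fail the condition (those greater than or equal to 0) are shown as NaN by default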
In [24]:
df.where(df<0)
Out[24]:
                   A         B         C         D
2017-10-22 -0.429368 -0.097481       NaN -0.984100
2017-10-23       NaN -0.897888 -0.240347       NaN
2017-10-24 -0.961633 -1.063981       NaN       NaN
2017-10-25       NaN       NaN -0.412135 -1.536320
2017-10-26 -0.656568 -0.938052       NaN -0.979891
2017-10-27       NaN -0.255656       NaN       NaN
2017-10-28       NaN       NaN       NaN -0.805165
2017-10-29       NaN       NaN       NaN       NaN
  • where() can also replace the values that do not meet the condition with another value instead of the default NaN
In [26]:
df.where(df<0,'A')
Out[26]:
                   A          B         C         D
2017-10-22 -0.429368 -0.0974806         A   -0.9841
2017-10-23         A  -0.897888 -0.240347         A
2017-10-24 -0.961633   -1.06398         A         A
2017-10-25         A          A -0.412135  -1.53632
2017-10-26 -0.656568  -0.938052         A -0.979891
2017-10-27         A  -0.255656         A         A
2017-10-28         A          A         A -0.805165
2017-10-29         A          A         A         A
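Replacing with a string such as 'A' turns the columns into object dtype, so a numeric replacement is often more practical. The sketch below uses a small hand-made frame (hypothetical data, not the random table above) to show a scalar replacement, a replacement taken from an aligned DataFrame, and `mask()`, which is the inverse of where():

```python
import pandas as pd

# Small deterministic frame, made up for illustration
df = pd.DataFrame({'A': [-1.0, 2.0, -3.0],
                   'B': [4.0, -5.0, 6.0]})

neg_only = df.where(df < 0)           # failing values become NaN
neg_or_zero = df.where(df < 0, 0.0)   # failing values become 0.0
flipped = df.where(df < 0, -df)       # failing values are taken from -df
masked = df.mask(df < 0)              # inverse: values that *meet* the condition become NaN

print(neg_or_zero)
print(flipped)
print(masked)
```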
In [27]:
df = pd.DataFrame(np.random.rand(10,3),columns = list('abc'))
df
Out[27]:
          a         b         c
0  0.364628  0.665812  0.403687
1  0.142510  0.243596  0.809217
2  0.135084  0.884445  0.148034
3  0.129592  0.768869  0.163191
4  0.680117  0.216741  0.375793
5  0.324470  0.770352  0.522526
6  0.228491  0.985218  0.771736
7  0.813559  0.095543  0.009304
8  0.065755  0.262572  0.142525
9  0.603809  0.427960  0.414141

query() selects rows using a boolean expression written as a string, in which column names can be used directly

In [28]:
df.query('(a<b)')
Out[28]:
          a         b         c
0  0.364628  0.665812  0.403687
1  0.142510  0.243596  0.809217
2  0.135084  0.884445  0.148034
3  0.129592  0.768869  0.163191
5  0.324470  0.770352  0.522526
6  0.228491  0.985218  0.771736
8  0.065755  0.262572  0.142525
In [30]:
df.query('(a<b)&(b<c)')
Out[30]:
         a         b         c
1  0.14251  0.243596  0.809217
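For comparison, the sketch below shows the same query written as ordinary boolean indexing, plus two conveniences of query(): the `and` keyword and a reference to a local Python variable via `@`. The seed, the variable `threshold`, and the result names are assumptions made for illustration, so the rows will not match the table above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
df = pd.DataFrame(rng.random((10, 3)), columns=list('abc'))

# query() with a string expression
via_query = df.query('(a < b) & (b < c)')

# Equivalent boolean indexing
via_mask = df[(df['a'] < df['b']) & (df['b'] < df['c'])]

# 'and' can be used in place of '&' inside the expression
via_and = df.query('a < b and b < c')

# '@name' refers to a local Python variable
threshold = 0.5
below = df.query('a < @threshold')

print(via_query.equals(via_mask))
```

The string form tends to read better once several columns are involved, since the `df[...]` prefixes and extra parentheses disappear.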

Reposted from blog.csdn.net/m0_38039437/article/details/80771134