The pandas library for Python: a first look at its data structures

1. About pandas

    The two main data structures in pandas are Series and DataFrame.

    A Series is a one-dimensional, array-like object with an index. Unlike a plain array, each value carries a label, so data can be retrieved by label. A Series can also be thought of as an ordered, fixed-length dictionary that maps index labels to values.

    A DataFrame is a tabular data structure made up of an ordered collection of columns. Different columns can hold different data types, while the values within a single column share one data type.
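
    As a minimal sketch of the DataFrame description above, the following builds a small table from a dictionary of columns. The column names and values are made up for illustration and do not come from the original session.

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes one column.
frame = pd.DataFrame({
    'state': ['Ohio', 'Texas', 'Oregon'],
    'year': [2000, 2001, 2002],
    'population': [1.5, 2.4, 0.9],
})

# Each column has its own data type.
print(frame.dtypes)

# Selecting a single column returns a Series that shares the row index.
population = frame['population']
print(population)
```

    Building from a dictionary is only one of several ways to construct a DataFrame; it is used here because it mirrors the dictionary view of a Series.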



2. Common Series operations

In [129]:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

obj = Series([4, 7, -5, 3])
obj
Out[129]:
0    4
1    7
2   -5
3    3
dtype: int64
In [130]:


obj.values
Out[130]:
array([ 4,  7, -5,  3], dtype=int64)
In [131]:

obj.index  # Get the index
Out[131]:
RangeIndex(start=0, stop=4, step=1)
In [132]:


obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2
Out[132]:
d    4
b    7
a   -5
c    3
dtype: int64
In [133]:


obj2.index
Out[133]:
Index(['d', 'b', 'a', 'c'], dtype='object')
In [134]:


obj2['a']  # Get the value for a given index label
Out[134]:
-5
In [135]:


obj2['d'] = 6
obj2[['c', 'a', 'd']]
Out[135]:
c    3
a   -5
d    6
dtype: int64
In [136]:


obj2[obj2 > 0]
Out[136]:
d    6
b    7
c    3
dtype: int64
In [137]:


obj2 * 2
Out[137]:
d    12
b    14
a   -10
c     6
dtype: int64
In [138]:


np.exp(obj2)
Out[138]:
d     403.428793
b    1096.633158
a       0.006738
c      20.085537
dtype: float64
In [139]:


# Check whether a label is in the Series index
'b' in obj2
Out[139]:
True
In [140]:


# Create a Series from a dictionary
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = Series(sdata)
obj3
Out[140]:
Ohio      35000
Oregon    16000
Texas     71000
Utah       5000
dtype: int64
In [141]:


states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = Series(sdata, index=states)
obj4
Out[141]:
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
In [142]:


pd.isnull(obj4)  # Detect missing values
Out[142]:
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
In [143]:


pd.notnull(obj4)
Out[143]:
California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool
In [144]:


obj3
Out[144]:
Ohio      35000
Oregon    16000
Texas     71000
Utah       5000
dtype: int64
In [145]:


obj4
Out[145]:
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
In [146]:


obj3 + obj4
Out[146]:
California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
In [147]:


obj4.name = 'population'
obj4.index.name = 'state'
obj4
Out[147]:
state
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: population, dtype: float64
In [148]:


# Replace the index by assignment
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj
Out[148]:
Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64
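
In the `obj3 + obj4` result earlier, California and Utah come out as NaN because arithmetic between two Series aligns values by index label, and a label missing from either operand produces a missing result. The sketch below reproduces that behavior on its own and shows one possible way to treat a one-sided gap as zero. The names `a`, `b`, `total`, and `filled`, and the use of `Series.add` with `fill_value`, are additions for illustration and are not part of the original session.

```python
import pandas as pd

# Same data as sdata in the session above.
a = pd.Series({'Ohio': 35000, 'Oregon': 16000, 'Texas': 71000, 'Utah': 5000})

# Reindexing introduces NaN for 'California' and drops 'Utah'.
b = a.reindex(['California', 'Ohio', 'Oregon', 'Texas'])

# Plain addition aligns on labels; labels missing on either side give NaN.
total = a + b
print(total)

# fill_value=0 substitutes 0 when only one side is missing.
# 'California' is missing on both sides, so it stays NaN.
filled = a.add(b, fill_value=0)
print(filled)
```

Whether filling with zero is appropriate depends on what a missing value means in the data; here it is only meant to make the alignment rule visible.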

3. Common DataFrame operations