1. About pandas
The two main data structures in pandas are Series and DataFrame.
A Series is a one-dimensional array of values paired with an index of labels. Unlike a plain array, each value carries a label, so data can be retrieved by label as well as by position. A Series can also be thought of as a fixed-length, ordered dictionary that maps labels to values.
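To make the array-versus-dictionary description concrete, here is a minimal sketch. The `prices` data and its labels are made up for illustration; only standard pandas calls (`Series`, `.iloc`, `.items()`, `.to_dict()`) are used.

```python
import pandas as pd

# Hypothetical example data: values labeled by product name.
prices = pd.Series([1.5, 0.25, 3.0], index=["apple", "banana", "cherry"])

# Label-based access, like a dictionary lookup.
apple_price = prices["apple"]

# Position-based access, like an array.
first_price = prices.iloc[0]

# Dictionary-style membership test and iteration over (label, value) pairs.
has_banana = "banana" in prices
pairs = list(prices.items())

# Conversion to a plain dict; key order follows the index order.
as_dict = prices.to_dict()
```

Both `prices["apple"]` and `prices.iloc[0]` reach the same value here, once by label and once by position.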
A DataFrame is a tabular data structure made up of an ordered collection of columns. Different columns can hold different data types, while the values within a single column share one data type.
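As a small sketch of that description, the following builds a DataFrame from a dictionary of columns. The column names and the `growth` values are hypothetical and chosen only to show three different dtypes side by side.

```python
import pandas as pd

# Hypothetical example data; "growth" values are invented for illustration.
frame = pd.DataFrame(
    {
        "state": ["Ohio", "Texas", "Oregon"],   # text column
        "population": [35000, 71000, 16000],    # integer column
        "growth": [0.5, 1.2, -0.3],             # float column
    }
)

# Columns keep the order in which they were supplied.
column_order = list(frame.columns)

# Each column is itself a Series with a single dtype.
population = frame["population"]
column_dtypes = frame.dtypes
```

Inspecting `column_dtypes` shows one dtype per column, which is the sense in which a column is homogeneous while the table as a whole is not.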
2. Some common operations of Series
In [129]:
import numpy as np
import pandas as pd
import sys
from pandas import Series, DataFrame

obj = Series([4, 7, -5, 3])
obj

Out[129]:
0    4
1    7
2   -5
3    3
dtype: int64

In [130]:
obj.values

Out[130]:
array([ 4,  7, -5,  3], dtype=int64)

In [131]:
obj.index  # Get the index

Out[131]:
RangeIndex(start=0, stop=4, step=1)

In [132]:
obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2

Out[132]:
d    4
b    7
a   -5
c    3
dtype: int64

In [133]:
obj2.index

Out[133]:
Index(['d', 'b', 'a', 'c'], dtype='object')

In [134]:
obj2['a']  # Get the value for a given label

Out[134]:
-5

In [135]:
obj2['d'] = 6
obj2[['c', 'a', 'd']]

Out[135]:
c    3
a   -5
d    6
dtype: int64

In [136]:
obj2[obj2 > 0]

Out[136]:
d    6
b    7
c    3
dtype: int64

In [137]:
obj2 * 2

Out[137]:
d    12
b    14
a   -10
c     6
dtype: int64

In [138]:
np.exp(obj2)

Out[138]:
d     403.428793
b    1096.633158
a       0.006738
c      20.085537
dtype: float64

In [139]:
# Check whether a label is in the Series index
'b' in obj2

Out[139]:
True

In [140]:
# Create a Series from a dictionary
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = Series(sdata)
obj3

Out[140]:
Ohio      35000
Oregon    16000
Texas     71000
Utah       5000
dtype: int64

In [141]:
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = Series(sdata, index=states)
obj4

Out[141]:
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

In [142]:
pd.isnull(obj4)  # Detect missing values

Out[142]:
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

In [143]:
pd.notnull(obj4)

Out[143]:
California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool

In [144]:
obj3

Out[144]:
Ohio      35000
Oregon    16000
Texas     71000
Utah       5000
dtype: int64

In [145]:
obj4

Out[145]:
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

In [146]:
obj3 + obj4

Out[146]:
California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64

In [147]:
obj4.name = 'population'
obj4.index.name = 'state'
obj4

Out[147]:
state
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: population, dtype: float64

In [148]:
# Modify the index in place by assignment
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj

Out[148]:
Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64
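The `obj3 + obj4` result above contains NaN wherever a label is missing on either side. As a hedged follow-up sketch, reusing the same `sdata` values, two common ways to handle that are dropping the missing entries with `dropna()` or supplying a default with `Series.add(..., fill_value=0)`. Choosing 0 as the fill value is an assumption made for illustration, not something the session above prescribes.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])

# Plain addition aligns on labels; a label missing on either side gives NaN.
total = obj3 + obj4

# Option 1: keep only the labels whose sum is known.
total_known = total.dropna()

# Option 2: treat a value missing on one side as 0 before adding.
# California is NaN in obj4 and absent from obj3, so it is missing on
# both sides and remains NaN even with fill_value.
total_filled = obj3.add(obj4, fill_value=0)
```

With `fill_value=0`, Utah keeps its value from `obj3`, while California stays missing because neither Series has data for it.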
3. Common operations of DataFrame