1. Overview
In the previous article we met Pandas for the first time and covered some of its basics. In this article we continue with a few more Pandas features.
2. Missing Values
Real-world data often contains missing values. Such data usually needs some basic processing before it can be used. Let's look at an example.
import numpy as np
from pandas import Series, DataFrame
s = Series(['1', '2', np.nan, '3'])
df = DataFrame([['1', '2'], ['3', np.nan], [np.nan, 4]])
print(s)
print(df)
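# Drop missing values (for a DataFrame, every row that contains a NaN is dropped)
print(s.dropna())
print(df.dropna())
# Fill every missing value with the same value
print(df.fillna('9'))
# Fill missing values per column: '5' for column 0, '6' for column 1
print(df.fillna({0:'5', 1:'6'}))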
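Before dropping or filling, it often helps to check how many values are actually missing. The sketch below is one possible way to do that with isna(), and then fills each column with its own mean. The sample data and the variable names are made up for illustration and are not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the calls
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, np.nan, 6.0]})

# Boolean mask: True where a value is missing
mask = df.isna()

# Number of missing values in each column
missing_per_column = mask.sum()
print(missing_per_column)

# Fill the gaps in each column with that column's mean
filled = df.fillna(df.mean())
print(filled)
```

Filling with the mean is only one possible choice. Whether it is appropriate depends on the data.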
3. Grouping and Aggregation
Let's look at an example of how to group data and run aggregation operations on the groups.
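from pandas import DataFrame
df = DataFrame({'name':['张三', '李四', '王五', '赵六'],
                'gender':['男', '女', '男', '女'],
                'age':[22, 11, 22, 33]})
# Group by age
gp1 = df.groupby('age')
# Group by age and gender
gp2 = df.groupby(['age', 'gender'])
# Group the gender column, using name as the grouping key
gp3 = df['gender'].groupby(df['name'])
# Show the groups
print(gp2.groups)
# Count the non-missing values in each group
print(gp2.count())
# Select a single group
print(gp2.get_group((22, '男')))
print('---------')
# Aggregation
gp4 = df.groupby(df['gender'])
# Sum (numeric_only=True skips the non-numeric name column)
print(gp4.sum(numeric_only=True))
# Mean (in pandas 2.x this raises an error on the string column unless numeric_only=True)
print(gp4.mean(numeric_only=True))
# Maximum
print(gp4.max())
# Minimum
print(gp4.min())
# Apply several aggregations at once (restricted to the numeric age column)
print(gp4['age'].agg(['sum', 'mean']))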
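When several aggregations are needed at once, named aggregation lets you choose the output column names directly. The following is a minimal sketch, assuming pandas 0.25 or later. The data and the output names (total_age, mean_age, members) are invented for illustration.

```python
import pandas as pd

# Hypothetical data, similar in shape to the example above
df = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cid', 'Dee'],
    'gender': ['F', 'M', 'F', 'M'],
    'age': [22, 11, 24, 33],
})

# Each keyword is an output column: (input column, aggregation function)
summary = df.groupby('gender').agg(
    total_age=('age', 'sum'),
    mean_age=('age', 'mean'),
    members=('name', 'count'),
)
print(summary)
```

Because each output column names its input column explicitly, the non-numeric name column is never passed to sum or mean.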
4. Merging Data
Pandas provides high-performance, in-memory join operations similar to those in SQL. The merge() function is the main entry point for joining DataFrame objects, and join() is a convenience method that joins on the index. Let's look at an example.
from pandas import DataFrame
import pandas as pd
df1 = DataFrame({'A':[2, 4, 5], 'B':[1, 2, 3], 'C':[2, 3, 6]})
df2 = DataFrame({'D':[1, 3, 6], 'E':[2, 5, 7], 'F':[3, 6, 8]})
df3 = DataFrame({'G':[2, 3, 6], 'H':[3, 5, 7], 'I':[4, 6, 8]})
df4 = DataFrame({'G':[1, 3, 5], 'H':[4, 6, 8], 'I':[5, 7, 9]})
# Left join on the index (keeps the index of df1)
print(df1.join(df2, how='left'))
# Right join
print(df1.join(df2, how='right'))
# Outer join
print(df1.join(df2, how='outer'))
# Join several DataFrames at once
print(df3.join([df1, df2]))
# Merge on the named column(s)
print(pd.merge(df3, df4, on='G'))
print(pd.merge(df3, df4, on=['G', 'H']))
# Without 'on', merge uses all shared columns (G, H, I) as keys
print(pd.merge(df3, df4, how='left'))
print(pd.merge(df3, df4, how='right'))
print(pd.merge(df3, df4, how='outer'))
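To see which rows each join type keeps, merge() accepts indicator=True, which adds a _merge column recording where each row came from. The sketch below uses two small made-up frames (left, right) with a shared key column. These names are hypothetical and not part of the example above.

```python
import pandas as pd

# Hypothetical frames that share the 'key' column
left = pd.DataFrame({'key': [1, 2, 3], 'x': ['a', 'b', 'c']})
right = pd.DataFrame({'key': [2, 3, 4], 'y': ['B', 'C', 'D']})

# Inner join (the default): only keys present in both frames
inner = pd.merge(left, right, on='key')
print(inner)

# Outer join: all keys; _merge is 'left_only', 'right_only' or 'both'
outer = pd.merge(left, right, on='key', how='outer', indicator=True)
print(outer)
```

Rows marked left_only or right_only have NaN in the columns that come from the other frame.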
5. Data Visualization
Series and DataFrame get their plotting capabilities from a plot() method that wraps the matplotlib library. Let's look at some examples.
5.1 Line Chart
The code for a line chart is as follows:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(10,2), columns=list('AB'))
df.plot()
plt.show()
The result is a line chart with one line for each of the columns A and B.
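The plot() call returns a matplotlib Axes object, so the chart can be adjusted afterwards with ordinary matplotlib calls. The following sketch shows one way to add a title and axis labels and to save the chart to a file instead of opening a window. The title, the labels, the file name line_chart.png and the choice of the Agg backend are assumptions made for this illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))

# df.plot() returns the matplotlib Axes that the lines were drawn on
ax = df.plot(title='Random values')
ax.set_xlabel('row index')
ax.set_ylabel('value')

# Save the figure to a temporary directory (hypothetical file name)
out_path = os.path.join(tempfile.gettempdir(), 'line_chart.png')
fig = ax.get_figure()
fig.savefig(out_path)
plt.close(fig)
```

The same pattern should apply to the other chart types below, since they also return matplotlib objects.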
5.2 Bar Chart
The code for a vertical bar chart is as follows:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(5,3), columns=list('ABC'))
df.plot.bar()
plt.show()
The result is a grouped vertical bar chart with one bar per column (A, B, C) for each of the five rows.
The code for a horizontal bar chart is as follows:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(5,3), columns=list('ABC'))
df.plot.barh()
plt.show()
The result is the same kind of chart with the bars drawn horizontally.
5.3 Histogram
The code for a histogram is as follows:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
df = pd.DataFrame({'A':np.random.randn(800)+1, 'B':np.random.randn(800)}, columns=list('AB'))
df.plot.hist(bins=10)
plt.show()
The result is a single chart with the histograms of A and B drawn on the same axes.
We can also draw A and B as separate histograms, with code as follows:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
df = pd.DataFrame({'A':np.random.randn(800)+1, 'B':np.random.randn(800)}, columns=list('AB'))
df.hist(bins=10)
plt.show()
The result is one subplot per column, each with its own histogram.
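To make the meaning of bins=10 concrete: the value range is split into 10 equal-width intervals, and the height of each bar is the number of values that fall into that interval. The sketch below computes the same kind of counts with numpy's histogram function, without drawing anything. The seed and the variable names are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Fixed seed so that the run is repeatable (an arbitrary choice)
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(800))

# counts[i] is the number of values between edges[i] and edges[i + 1]
counts, edges = np.histogram(s, bins=10)
print(counts)
print(edges)
```

With 10 bins there are 11 edges, and the counts add up to the number of values, 800 here.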
5.4 Scatter Plot
The code for a scatter plot is as follows:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(20, 2), columns=list('AB'))
df.plot.scatter(x='A', y='B')
plt.show()
The result is a scatter plot of the 20 points, with A on the x-axis and B on the y-axis.
5.5 Pie Chart
The code for a pie chart is as follows:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
df = pd.DataFrame([30, 20, 50], index=list('ABC'), columns=[''])
df.plot.pie(subplots=True)
plt.show()
The result is a pie chart with three slices, A, B and C, taking up 30%, 20% and 50% of the circle.