Python Data Analysis (D): Pandas Advanced

1. Overview

In the previous article we met Pandas for the first time and covered its basics. In this article we go a step further and look at some more of what Pandas can do.

2. Missing Values

Real-world data often contains missing values, and we usually need to do some basic processing on it before analysis. Let's look at an example.

import numpy as np
from pandas import Series, DataFrame

s = Series(['1', '2', np.nan, '3'])
df = DataFrame([['1', '2'], ['3', np.nan], [np.nan, 4]])
print(s)
print(df)
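# Drop missing entries
print(s.dropna())
print(df.dropna())
# Fill missing entries with a single value
print(df.fillna('9'))
# Fill missing entries per column: column 0 with '5', column 1 with '6'
print(df.fillna({0:'5', 1:'6'}))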
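
Before dropping or filling, it often helps to check how many values are actually missing. The following is a minimal sketch on the same DataFrame; the variable names missing_per_column and rows_without_gaps are chosen here for illustration and are not from the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['1', '2'], ['3', np.nan], [np.nan, 4]])

# Count missing entries in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# dropna() returns a new object; df itself still has three rows
rows_without_gaps = df.dropna()
print(len(df), len(rows_without_gaps))

# axis=1 drops columns instead of rows; both columns contain a gap here
print(df.dropna(axis=1).shape)

# how='all' only drops rows in which every value is missing
print(df.dropna(how='all'))
```

Note that dropna() and fillna() do not modify the original object unless their result is assigned back.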

3. Grouping and Aggregation

Let's look at an example of how to perform grouping and aggregation operations.

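from pandas import DataFrame

df = DataFrame({'name':['张三', '李四', '王五', '赵六'],
                'gender':['male', 'female', 'male', 'female'],
                'age':[22, 11, 22, 33]})
# Group by age
gp1 = df.groupby('age')
# Group by age and gender
gp2 = df.groupby(['age', 'gender'])
# Group the gender column, using name as the grouping key
gp3 = df['gender'].groupby(df['name'])
# Inspect the groups
print(gp2.groups)
# Number of entries in each group
print(gp2.count())
# Select a single group
print(gp2.get_group((22, 'male')))
print('---------')
# Aggregation
gp4 = df.groupby('gender')
# Sum
print(gp4.sum())
# Mean (numeric columns only; recent pandas versions raise an error on the string column otherwise)
print(gp4.mean(numeric_only=True))
# Maximum
print(gp4.max())
# Minimum
print(gp4.min())
# Several aggregations at once, applied to the numeric age column
print(gp4['age'].agg(['sum', 'mean']))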
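
When several aggregations are computed together, naming the result columns can make the output easier to read. The sketch below uses the same data with the gender labels written in English; the result names total, average and oldest are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'name': ['张三', '李四', '王五', '赵六'],
                   'gender': ['male', 'female', 'male', 'female'],
                   'age': [22, 11, 22, 33]})

# Select the numeric column first so the string column is not aggregated
age_by_gender = df.groupby('gender')['age']

# Named aggregation: each keyword becomes a column in the result
summary = age_by_gender.agg(total='sum', average='mean', oldest='max')
print(summary)

# size() gives the number of rows in each group
group_sizes = df.groupby('gender').size()
print(group_sizes)
```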

4. Data Merge

Pandas provides high-performance, in-memory join operations similar to those in SQL. The merge() function is the entry point for join operations between DataFrame objects. Let's look at an example.

from pandas import DataFrame
import pandas as pd

df1 = DataFrame({'A':[2, 4, 5], 'B':[1, 2, 3], 'C':[2, 3, 6]})
df2 = DataFrame({'D':[1, 3, 6], 'E':[2, 5, 7], 'F':[3, 6, 8]})
df3 = DataFrame({'G':[2, 3, 6], 'H':[3, 5, 7], 'I':[4, 6, 8]})
df4 = DataFrame({'G':[1, 3, 5], 'H':[4, 6, 8], 'I':[5, 7, 9]})
# Left join (based on df1)
print(df1.join(df2, how='left'))
# Right join
print(df1.join(df2, how='right'))
# Outer join
print(df1.join(df2, how='outer'))
# Join several DataFrames at once
print(df3.join([df1, df2]))
# Merge on the specified column(s)
print(pd.merge(df3, df4, on='G'))
print(pd.merge(df3, df4, on=['G', 'H']))
# Without on=, merge uses all shared columns (G, H, I); how= selects the join type
print(pd.merge(df3, df4, how='left'))
print(pd.merge(df3, df4, how='right'))
print(pd.merge(df3, df4, how='outer'))
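
To see which rows each join type keeps, the sketch below merges df3 and df4 on G only. The suffixes and indicator parameters are standard merge() options that the original example does not use; they are added here only to make the result easier to inspect.

```python
import pandas as pd

df3 = pd.DataFrame({'G': [2, 3, 6], 'H': [3, 5, 7], 'I': [4, 6, 8]})
df4 = pd.DataFrame({'G': [1, 3, 5], 'H': [4, 6, 8], 'I': [5, 7, 9]})

# Inner join on G: only G == 3 appears in both frames.
# suffixes renames the overlapping columns H and I.
inner = pd.merge(df3, df4, on='G', suffixes=('_left', '_right'))
print(inner)

# Outer join keeps every G value; indicator=True adds a _merge column
# recording whether a row came from the left frame, the right frame, or both
outer = pd.merge(df3, df4, on='G', how='outer', indicator=True)
print(outer[['G', '_merge']])

# With no on=, the default inner join matches on G, H and I together;
# no row agrees on all three, so the result is empty
default_inner = pd.merge(df3, df4)
print(len(default_inner))
```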

5. Data Visualization

The plotting capabilities of Series and DataFrame in Pandas are implemented as a wrapper around the plot() method of the matplotlib library. Let's look at some examples.
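
Because plot() is built on matplotlib, it returns a matplotlib Axes object that can be customized with the usual matplotlib calls. The sketch below assumes a non-interactive environment, so it selects the Agg backend and saves the figure to a file instead of calling plt.show(); the file name line_chart.png and the labels are placeholders.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no window required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))

# plot() returns a matplotlib Axes
ax = df.plot(title='Random data')
ax.set_xlabel('row index')
ax.set_ylabel('value')

# Save the figure to a file (placeholder name) and release it
ax.figure.savefig('line_chart.png')
plt.close(ax.figure)
```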

5.1 Line Chart

The code for a line chart is as follows:

import pandas as pd, numpy as np, matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(10,2), columns=list('AB'))
df.plot()
plt.show()

The result is a line chart with one line per column (A and B), plotted against the row index.

5.2 Bar Chart

The code for a vertical bar chart is as follows:

import pandas as pd, numpy as np, matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(5,3), columns=list('ABC'))
df.plot.bar()
plt.show()

The result is a vertical bar chart with one group of bars per row (index 0 to 4) and one bar per column (A, B, C).

The code for a horizontal bar chart is as follows:

import pandas as pd, numpy as np, matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(5,3), columns=list('ABC'))
df.plot.barh()
plt.show()

The result is the same kind of chart drawn with horizontal bars.

5.3 Histogram

The code for a histogram is as follows:

import pandas as pd, numpy as np, matplotlib.pyplot as plt

df = pd.DataFrame({'A':np.random.randn(800)+1, 'B':np.random.randn(800)}, columns=list('AB'))
df.plot.hist(bins=10)
plt.show()

The result is a single chart in which the histograms of A and B are drawn on the same axes.

We can also plot A and B separately; the code is as follows:

import pandas as pd, numpy as np, matplotlib.pyplot as plt

df = pd.DataFrame({'A':np.random.randn(800)+1, 'B':np.random.randn(800)}, columns=list('AB'))
df.hist(bins=10)
plt.show()

The result is two separate histograms, one subplot per column.
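
The difference between the two calls also shows up in what they return. The sketch below is a minimal illustration, again assuming a non-interactive Agg backend; the names shared_ax and separate_axes and the alpha value are choices made here, not part of the original example.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no window required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randn(800) + 1, 'B': np.random.randn(800)})

# DataFrame.plot.hist draws all columns on one Axes;
# alpha makes the overlapping bars semi-transparent
shared_ax = df.plot.hist(bins=10, alpha=0.5)

# DataFrame.hist returns an array of Axes, one per column
separate_axes = df.hist(bins=10)
print(type(shared_ax).__name__, separate_axes.size)

plt.close('all')
```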

5.4 Scatter Plot

The code for a scatter plot is as follows:

import pandas as pd, numpy as np, matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(20, 2), columns=list('AB'))
df.plot.scatter(x='A', y='B')
plt.show()

The result is a scatter plot of 20 points, with column A on the x-axis and column B on the y-axis.

5.5 Pie Chart

The code for a pie chart is as follows:

import pandas as pd, numpy as np, matplotlib.pyplot as plt

df = pd.DataFrame([30, 20, 50], index=list('ABC'), columns=[''])
df.plot.pie(subplots=True)
plt.show()

The result is a single pie chart with three wedges labeled A, B and C, sized 30%, 20% and 50%.



Origin blog.csdn.net/ityard/article/details/105010070