Using matplotlib:
Step 1. Create a blank canvas. plt.figure() returns the canvas as a Figure object (fig).
fig=plt.figure()
Step 2. Create a subplot.
ax = fig.add_subplot(1, 2, 1)  # (1, 2, 1): split the canvas into 1 row and 2 columns and return the first subplot (row 1, column 1)
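As a minimal sketch of how the three arguments of add_subplot are read (nrows, ncols, index), the snippet below builds a 1 x 2 grid and checks where each subplot lands. The Agg backend and the variable names are assumptions, chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 3))

# add_subplot(nrows, ncols, index): 1 row, 2 columns, index counts from 1
ax_left = fig.add_subplot(1, 2, 1)
ax_right = fig.add_subplot(1, 2, 2)

# The figure keeps its subplots in creation order
n_axes = len(fig.axes)

# Index 1 is the left cell of the grid, index 2 the right cell
left_x0 = ax_left.get_position().x0
right_x0 = ax_right.get_position().x0
print(n_axes, left_x0 < right_x0)
```

Changing the call to add_subplot(2, 1, 1) would instead stack the subplots vertically, with index 1 on top.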
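Step 3. Start drawing. Here we use seaborn, a high-level wrapper around matplotlib. It does not need the canvas to be passed in: by default it draws on the current axes, that is, the subplot created in the previous step. The arguments are the x and y data.
sns.barplot(x=missing[col], y=missing.index)  # pass x and y by keyword; seaborn 0.12 and later no longer accept them positionally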
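The following sketch illustrates the "draws on the current axes" behaviour described in step 3. The small `missing` table and its column names are hypothetical stand-ins for the real data, and the Agg backend is assumed so that no window is required.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the `missing` table built later in this post
missing = pd.DataFrame({"number": [3, 0, 5]}, index=["bin_0", "bin_1", "nom_0"])

fig = plt.figure(figsize=(10, 3))
ax = fig.add_subplot(1, 2, 1)  # this subplot becomes the current axes

# No ax= argument: seaborn falls back to the current axes and returns it
returned = sns.barplot(x=missing["number"], y=missing.index)
drawn_on_current_axes = returned is ax

# The target can also be named explicitly with ax=
ax2 = fig.add_subplot(1, 2, 2)
returned2 = sns.barplot(x=missing["number"], y=missing.index, ax=ax2)
print(drawn_on_current_axes, returned2 is ax2)
```

Passing ax= explicitly is a little more verbose, but it does not depend on which subplot happened to be created last.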
Step 4. ax lets us manipulate the subplot easily. Here we set the subplot's title. The f prefix marks an f-string: the variable col enclosed in {} is replaced by its value, so the column name shows up in the figure.
ax.set_title(f'Missing values on each columns({col})')
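To see what the f-string in the title expands to, here is a small sketch that needs no plotting at all. The value assigned to col and the numeric value are assumptions for illustration.

```python
col = "proportion"  # hypothetical column name

# The expression inside {} is evaluated and inserted into the string
title = f'Missing values on each columns({col})'
print(title)

# A format spec after ':' controls how a number is rendered
ratio = 0.0314  # hypothetical value
formatted = f'{ratio:.2f}'
print(formatted)
```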
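Step 5. Finally, display the figure. Calling fig.show() did not display the image for me, and I have not pinned down why (a likely explanation is that fig.show() does not start a blocking GUI event loop, so in a plain script the window closes at once). plt.show() works: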
plt.show()
Step 6. To save the image, call savefig. Call it before plt.show(): once the window has been closed, the saved file may come out blank.
plt.savefig("1.png")
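A minimal sketch of saving a figure, assuming the Agg backend and a temporary directory so that nothing is written next to your script. The data and the file location are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.barh(["bin_0", "bin_1"], [1, 2])  # hypothetical data
ax.set_title("demo")

# Save through the Figure object while the figure still exists
out_path = os.path.join(tempfile.mkdtemp(), "1.png")
fig.savefig(out_path)
file_size = os.path.getsize(out_path)
print(out_path, file_size)

# In an interactive session, plt.show() would come after this point
```

Using fig.savefig rather than plt.savefig makes it explicit which figure is written, which helps when several figures are open.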
The example comes from a Kaggle practice competition:
https://www.kaggle.com/c/cat-in-the-dat-ii
Reference: https://www.kaggle.com/warkingleo2000/first-step-on-kaggle/data
Complete code. There are two approaches. The first follows https://zhuanlan.zhihu.com/p/93423829
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def plot_missing_values(df):
    cols = df.columns
    # Key point: remember how isnull() is used
    count = [df[col].isnull().sum() for col in cols]
    percent = [i / len(df) for i in count]
    # Note how the DataFrame is built
    missing = pd.DataFrame({'number': count, 'proportion': percent}, index=cols)
    fig = plt.figure(figsize=(20, 7))
    for i, col in enumerate(missing.columns):
        ax = fig.add_subplot(1, 2, i + 1)
        ax.set_title(f'Missing values on each columns({col})')
        # x and y are passed by keyword, as seaborn 0.12 and later require
        sns.barplot(x=missing[col], y=missing.index)
    plt.show()


if __name__ == '__main__':
    raw_train = pd.read_csv("train.csv")
    raw_test = pd.read_csv("test.csv")
    plot_missing_values(raw_train)
    # plt.savefig("1.png")
    plot_missing_values(raw_test)
    # plt.savefig("2.png")
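The missing-value table can also be built without the list comprehensions. The sketch below is an equivalent vectorized form, not the method used in either referenced source, and the tiny DataFrame is hypothetical data made up for the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column a has 2 missing values, b has 1, c has none
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "y", None, "z"],
    "c": [1, 2, 3, 4],
})

# isnull() gives a boolean table; sum() counts the True values per column,
# and mean() gives the share of True values per column
missing = pd.DataFrame({
    "number": df.isnull().sum(),
    "proportion": df.isnull().mean(),
})
print(missing)
```

The index of the result is the column names of df, so it can be passed to the plotting loop in the same way as the table built above.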
The second approach follows the Kaggle notebook https://www.kaggle.com/warkingleo2000/first-step-on-kaggle/data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def plot_missing_values(df):
    cols = df.columns
    count = [df[col].isnull().sum() for col in cols]
    percent = [i / len(df) for i in count]
    missing = pd.DataFrame({'number': count, 'proportion': percent}, index=cols)
    fig, ax = plt.subplots(1, 2, figsize=(20, 7))
    for i, col in enumerate(missing.columns):
        plt.subplot(1, 2, i + 1)  # make the (i+1)-th subplot the current axes
        plt.title(f'Missing values on each columns({col})')
        # x and y are passed by keyword, as seaborn 0.12 and later require
        sns.barplot(x=missing[col], y=missing.index)
        mean = np.mean(missing[col])
        std = np.std(missing[col])
        plt.ylabel('Columns')
        # An empty, invisible line whose only purpose is to put text in the legend
        plt.plot([], [], ' ', label=f'Average {col} of missing values: {mean:.2f} \u00B1 {std:.2f}')
        plt.legend()
    plt.show()
    return missing.sort_values(by='number', ascending=False)
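The plt.plot([], [], ' ', label=...) call in the second approach is a way of putting plain text into the legend. A minimal sketch of that trick in isolation follows; the values, the bar labels and the Agg backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

values = np.array([0.03, 0.03, 0.04])  # hypothetical proportions
mean = np.mean(values)
std = np.std(values)  # for a NumPy array this is the population std (ddof=0)

fig, ax = plt.subplots()
ax.barh(["bin_0", "bin_1", "nom_0"], values)

# No data, and a format string of ' ' (no line, no marker):
# nothing is drawn, but the label still gets a legend entry
ax.plot([], [], ' ', label=f'Average: {mean:.2f} \u00B1 {std:.2f}')
legend = ax.legend()

legend_texts = [t.get_text() for t in legend.get_texts()]
print(legend_texts)
```

The bars themselves carry no label, so the summary text is the only entry that appears in the legend.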
Analysis of the resulting plots:
[Figure: train data]
[Figure: test data]
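From the two figures above we can see that, for both the training data and the test data, the proportion of missing values in each feature is very small, always in the 0.0x range.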