Seaborn library
1. Introduction
Seaborn is a Python library built on top of matplotlib for drawing attractive, informative statistical graphics.
Seaborn aims to make visualization a central part of exploring and understanding data, helping people take a closer look at the data sets they study. It shows up everywhere, from algorithm competitions on Kaggle to real-world data mining at internet companies.
2. Practice
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
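# tips.csv is assumed to be in the working directory;
# sns.load_dataset('tips') fetches the same data from seaborn's online data repository
tips = pd.read_csv('tips.csv')
tips.head()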
2.1 Correlation between attributes
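# Correlation matrix of the numeric columns
# (pandas >= 2.0 raises an error on text columns such as 'sex' unless numeric_only=True)
tips.corr(numeric_only=True)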
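By default, DataFrame.corr() computes Pearson's correlation coefficient: the covariance of two columns divided by the product of their standard deviations. The sketch below computes it by hand on a few made-up rows (the values are illustrative, not taken from tips.csv) and checks the result against pandas.

```python
import numpy as np
import pandas as pd

# Illustrative values only; not rows from tips.csv
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.0, 6.5],
})

def pearson(x, y):
    """Pearson r: sum of co-deviations over the product of the deviation norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_manual = pearson(df["total_bill"], df["tip"])
r_pandas = df.corr().loc["total_bill", "tip"]  # method='pearson' is the default
print(round(r_manual, 4), round(r_pandas, 4))
```

Values close to +1 mean the two columns rise together; values near 0 mean no linear relationship.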
2.1.1 pair plot
# Pair plot of all numeric columns (quite a sight)
sns.pairplot(tips)
Reading the plot: each panel shows the relationship between two of the three numeric variables in the data set (total bill, tip amount and party size); the diagonal panels show the distribution of each variable on its own.
# Pair plot colored by one categorical column (sex), with a different marker per group
sns.pairplot(tips, hue='sex', markers=["o", "s"])
2.1.2 Heat map
# Correlation heat map (numeric columns only)
sns.heatmap(tips.corr(numeric_only=True))
Reading the plot: a heat map shows the correlation between every pair of variables as a colored cell. With seaborn's default colormap, the lighter the cell, the higher (more positive) the correlation between the two variables.
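# Hierarchically clustered correlation heat map (similar variables are placed next to each other)
sns.clustermap(tips.corr(numeric_only=True))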
2.1.3 pair grid diagram
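g = sns.PairGrid(tips)     # uses the numeric columns by default
g.map_diag(sns.histplot)   # distplot is deprecated; histplot draws the diagonal histograms
g.map_upper(plt.scatter)   # scatter plots above the diagonal
g.map_lower(sns.kdeplot)   # 2-D density contours below the diagonal
A PairGrid lets you choose which kind of plot appears on the diagonal, in the upper triangle and in the lower triangle, so you can combine the plot types described in this article as needed.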
2.2 Distribution of a single attribute
2.2.1 dist plot
# distplot is deprecated since seaborn 0.11; histplot is its replacement
sns.histplot(tips['total_bill'], kde=True)   # histogram with a KDE curve
sns.histplot(tips['total_bill'])             # histogram only (the old kde=False)
Reading the plot: most customers' total bills fall between about 5 and 35 dollars.
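A histogram splits the value range into equal-width bins and counts how many values land in each one. The sketch below, on made-up bill amounts rather than the real data, shows the counting that histplot performs and how to check a reading such as "most bills are between 5 and 35" numerically.

```python
import numpy as np

# Illustrative bill amounts (made up, not the real tips data)
bills = np.array([3.1, 8.5, 12.0, 14.8, 16.3, 17.9, 19.5, 21.0, 24.6, 28.9, 33.2, 48.3])

# Bin the values into 5 equal-width intervals and count them
counts, edges = np.histogram(bills, bins=5)

for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:5.1f} to {hi:5.1f}: {n}")

# Share of bills inside a chosen range, e.g. 5 to 35
share = np.mean((bills >= 5) & (bills <= 35))
print(f"share in [5, 35]: {share:.0%}")
```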
2.2.2 count plot
sns.countplot(x='smoker', data=tips)
Reading the plot: there are more non-smokers than smokers among the restaurant's customers.
sns.countplot(x='size', data=tips)
Reading the plot: parties of two are the most common party size.
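A count plot draws one bar per category, and each bar's height is the number of rows in that category. The same numbers are available as a table from pandas' value_counts; the sketch below uses made-up party sizes for illustration.

```python
import pandas as pd

# Made-up party sizes, for illustration only
df = pd.DataFrame({"size": [2, 2, 3, 2, 4, 2, 1, 3, 2, 6]})

# One count per category, sorted by the category value (the bar order in countplot)
counts = df["size"].value_counts().sort_index()
print(counts)
print("most common size:", counts.idxmax())
```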
2.2.3 rug plot
sns.rugplot(tips['total_bill'])
Reading the plot: a rug plot draws one small tick on the axis for every total_bill value, so the density of the ticks shows where the bills are concentrated.
2.2.4 kde plot
sns.kdeplot(tips['total_bill'], fill=True)   # 'shade' was renamed to 'fill' in seaborn 0.11
Reading the plot: KDE stands for kernel density estimation. Instead of counting values in bins, it draws a smooth estimate of the distribution of the total bill.
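A Gaussian KDE places a small bell curve on every observation and averages them. The sketch below implements that idea directly with NumPy on made-up values; the bandwidth uses Scott's rule of thumb, which is assumed here to approximate seaborn's default and is not a reproduction of seaborn's exact code.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth):
    """Average of Gaussian bumps (std = bandwidth) centered on each observation."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] is the distance from grid point i to observation j, in bandwidths
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Illustrative values, not the real tips data
bills = np.array([10.3, 12.5, 15.0, 16.8, 18.2, 20.1, 24.7, 31.9])

# Scott's rule of thumb for one variable: sample std * n ** (-1/5)
h = bills.std(ddof=1) * len(bills) ** (-1 / 5)

grid = np.linspace(bills.min() - 4 * h, bills.max() + 4 * h, 2001)
density = gaussian_kde_1d(bills, grid, h)

# A density should integrate to about 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth={h:.2f}, area={area:.3f}")
```

A larger bandwidth gives a smoother curve; a smaller one follows individual observations more closely.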
2.3 Relationships between pairs of attributes
2.3.1 joint plot
sns.jointplot(x='total_bill', y='tip', data=tips)
Reading the plot: most customers spend between 10 and 30 dollars, and the corresponding tips are mostly between 1 and 5 dollars.
sns.jointplot(x='total_bill', y='tip', data=tips, kind='hex')
Another clear view: the darker a hexagon, the more observations fall inside it.
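The shading in a hex plot is a 2-D count: the plane is divided into cells and each cell is colored by how many points fall inside it. The sketch below does the same counting with rectangular cells via numpy.histogram2d, on made-up (total_bill, tip) pairs; the hexagonal cells in jointplot are drawn by matplotlib's hexbin, which is not reproduced here.

```python
import numpy as np

# Made-up (total_bill, tip) pairs, for illustration only
total_bill = np.array([12.0, 14.0, 15.5, 16.0, 18.0, 22.0, 25.0, 35.0])
tip = np.array([2.0, 2.1, 2.5, 2.4, 3.0, 3.5, 4.0, 5.0])

# Count points per cell on a 3 x 3 grid spanning the data range
counts, x_edges, y_edges = np.histogram2d(total_bill, tip, bins=3)
print(counts)  # rows follow total_bill bins, columns follow tip bins
```

The cell with the largest count is the one a hex plot would draw darkest.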
sns.jointplot(x='total_bill', y='tip', data=tips, kind='reg')
Reading the plot: the fitted regression line slopes upward, so the tip tends to increase as the total bill increases.
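With kind='reg', the joint plot overlays a straight line fitted by least squares (plus a shaded confidence band). The sketch below fits such a line to a few made-up (total_bill, tip) pairs with numpy.polyfit; a positive slope corresponds to the upward-sloping line described above.

```python
import numpy as np

# Made-up (total_bill, tip) pairs, for illustration only
total_bill = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
tip = np.array([1.6, 2.3, 3.1, 3.6, 4.4])

# Degree-1 least-squares fit: tip = slope * total_bill + intercept
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(f"tip = {slope:.3f} * total_bill + {intercept:.3f}")
```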
sns.jointplot(x='total_bill', y='tip', data=tips, kind='kde')
Another view of the joint distribution: the darker a region, the more observations it contains.
2.3.2 box plot
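sns.boxplot(x='day', y='total_bill', data=tips)
Reading the plot: bills tend to be somewhat higher on Saturday and Sunday than on Thursday and Friday. A box plot summarizes the amounts of the bills on each day (median, quartiles and outliers); how many bills were paid on each day is better read from a count plot.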
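Each box encodes a few summary statistics: the box spans the first to third quartile (Q1 to Q3), the inner line is the median, the whiskers reach the most extreme points within 1.5 IQR of the box, and anything beyond is drawn as an outlier. seaborn's boxplot uses whis=1.5 by default. The sketch below computes these numbers per day with pandas on made-up bills.

```python
import pandas as pd

# Made-up bills for two days, for illustration only
df = pd.DataFrame({
    "day": ["Thur"] * 6 + ["Sun"] * 6,
    "total_bill": [12.0, 14.0, 15.0, 16.0, 18.0, 40.0,
                   15.0, 18.0, 20.0, 21.0, 24.0, 27.0],
})

def box_stats(s):
    """Quartiles, whisker ends and outlier count using the 1.5 * IQR rule."""
    q1, med, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = s[(s >= lo_fence) & (s <= hi_fence)]
    return pd.Series({
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "n_outliers": int(((s < lo_fence) | (s > hi_fence)).sum()),
    })

# One row of statistics per day
stats = pd.DataFrame({day: box_stats(s) for day, s in df.groupby("day")["total_bill"]}).T
print(stats)
```

Here the 40.0 bill on Thursday lies above the upper fence, so a box plot would draw it as a separate point.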
sns.boxplot(x='day', y='total_bill', data=tips, hue='sex')
Reading the plot: with hue='sex', each day gets one box per sex; on Saturdays, women appear to pay more than men.
2.3.3 violin plot
sns.violinplot(x='day', y='total_bill', data=tips)
Reading the plot: a violin plot is very similar to a box plot, but it combines the box plot with a kernel density estimate drawn along each side.
sns.violinplot(x='day', y='total_bill', data=tips, hue='sex', split=True)
Reading the plot: the same plot with sex added; split=True draws the two sexes as the two halves of each violin.
2.3.4 strip plot
sns.stripplot(x='day', y='total_bill', data=tips)
Reading the plot: a strip plot is a scatter plot of the total bill for each of the four days in the data (Thursday, Friday, Saturday and Sunday).
sns.stripplot(x='day', y='total_bill', data=tips, jitter=True, hue='sex', dodge=True)
Reading the plot: the same as above, but the points are colored by sex, and dodge=True places the two groups side by side.
2.3.5 swarm plot
sns.swarmplot(x='day', y='total_bill', data=tips)
Reading the plot: a swarm plot is similar to a strip plot, but it positions the points so that they do not overlap.
2.3.6 factor plot
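# factorplot was renamed catplot in seaborn 0.9 and has since been removed
sns.catplot(x='day', y='total_bill', kind='box', data=tips)
Reading the plot: catplot (formerly factorplot) is a single interface to the categorical plots above; the kind parameter selects which one to draw ('strip', 'swarm', 'box', 'violin', 'boxen', 'point', 'bar' or 'count').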
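Because catplot is figure-level, switching plot type only means changing kind. The sketch below loops over three kinds on a synthetic DataFrame shaped like the tips columns (random values, not the real data) and renders off-screen with matplotlib's Agg backend; the DataFrame and the grids dictionary are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data shaped like the tips columns used above (not the real data)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 25),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=100),
})

# Same call each time; only the kind argument changes the plot type
grids = {}
for kind in ["strip", "box", "violin"]:
    grids[kind] = sns.catplot(data=df, x="day", y="total_bill", kind=kind)

plt.close("all")  # free the figures once they are no longer needed
```

Each call returns a FacetGrid, so options such as col= or row= can be added to split the same plot into several panels.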