[Machine Learning] Seaborn, a Python library for statistical data visualization (correlation plots, distribution plots, box plots, etc.)

1. Introduction

Seaborn is a Python library built on top of Matplotlib for producing rich, attractive statistical graphics.

Seaborn aims to make visualization a central part of exploring and understanding data, helping people take a closer look at the data sets they are studying. It shows up both in Kaggle competitions and in real-world data mining work at Internet companies.

2. Practice

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

tips = pd.read_csv('tips.csv')
tips.head()

2.1 Correlation of each attribute

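# Correlation matrix of the numeric columns
# (numeric_only=True skips text columns such as sex or day, which newer pandas versions would otherwise reject)
tips.corr(numeric_only=True)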
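
As a rough illustration of what the numbers in this table mean, the sketch below computes one Pearson coefficient by hand and compares it with the pandas result. The small DataFrame is made up for illustration (it is not the real tips.csv), and the helper name `pearson` is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up sample that mimics the layout of tips.csv (illustrative values only)
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip": [1.5, 3.0, 4.0, 6.5, 7.0],
    "sex": ["Female", "Male", "Male", "Female", "Male"],
})

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

manual = pearson(df["total_bill"], df["tip"])
corr = df.corr(numeric_only=True)  # the text column "sex" is skipped
print(round(manual, 4), round(corr.loc["total_bill", "tip"], 4))
```

Both numbers should agree, since pandas uses the Pearson coefficient by default.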

2.1.1 pair plot

# Pairwise relationship plot (quite a sight)
sns.pairplot(tips)

Reading the plot: the grid shows the pairwise relationships among the three numeric variables in the data set: total bill, tip amount, and party size.

# Pair plot colored by one column (here, sex)
sns.pairplot(tips, hue='sex', markers=['o', 's'])

2.1.2 Heat map

# Correlation heat map
sns.heatmap(tips.corr(numeric_only=True))

Reading the plot: a heat map displays the correlation between each pair of variables. With seaborn's default colormap, the lighter a cell, the higher the correlation coefficient between the two variables.
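
Because the meaning of light and dark depends on the colormap, it can help to print the coefficients into the cells and pin the color scale to the range -1 to 1. A minimal sketch, assuming a small made-up DataFrame in place of tips.csv; the `coolwarm` colormap is just one possible choice:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up numeric sample (illustrative values only)
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip": [1.5, 3.0, 4.0, 6.5, 7.0],
    "size": [2, 2, 3, 4, 3],
})
corr = df.corr()

# annot=True writes each coefficient into its cell;
# vmin/vmax fix the color scale so colors are comparable across plots
ax = sns.heatmap(corr, annot=True, fmt=".2f", vmin=-1, vmax=1, cmap="coolwarm")
ax.set_title("Correlation matrix")
```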

# Hierarchically clustered correlation heat map
sns.clustermap(tips.corr(numeric_only=True))

2.1.3 pair grid diagram

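g = sns.PairGrid(tips)
g.map_diag(sns.histplot)    # diagonal: histogram of each variable (histplot replaces the deprecated distplot)
g.map_upper(plt.scatter)    # upper triangle: scatter plots
g.map_lower(sns.kdeplot)    # lower triangle: bivariate density contours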

In a pair grid, you choose which kind of plot to draw on the diagonal, in the upper triangle, and in the lower triangle, so the plot types described above can be combined as needed.

2.2 Distribution of a single attribute

2.2.1 dist plot

# Histogram with a KDE curve (histplot replaces the deprecated distplot)
sns.histplot(tips['total_bill'], kde=True, stat='density')

# Histogram only, without the KDE curve
sns.histplot(tips['total_bill'])

Reading the plot: the histogram shows that most customers' total bills fall roughly in the range 5-35.
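
A statement like "most bills fall between 5 and 35" can also be checked numerically rather than by eye. The sketch below does so on a few made-up values; with the real file you would pass tips['total_bill'] instead. The helper name `share_in_range` is hypothetical.

```python
import numpy as np

def share_in_range(values, low, high):
    """Fraction of observations with low <= value <= high."""
    values = np.asarray(values, dtype=float)
    return float(np.mean((values >= low) & (values <= high)))

# Made-up bill amounts (illustrative values only)
bills = np.array([3.0, 10.0, 20.0, 30.0, 40.0])

# np.histogram returns the bin counts that a histogram would draw as bars
counts, edges = np.histogram(bills, bins=[0, 5, 35, 50])
share = share_in_range(bills, 5, 35)
print(counts, share)
```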

2.2.2 count plot

sns.countplot(x = 'smoker', data = tips)

Reading the plot: non-smokers outnumber smokers among the restaurant's customers.

sns.countplot(x = 'size',  data = tips)

Reading the plot: parties of two visit the restaurant more often than parties of any other size.
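
The bars of a count plot are simply the frequencies of each category, which pandas can report directly. A minimal sketch on made-up rows (the real counts come from tips.csv):

```python
import pandas as pd

# Made-up rows with the same column names as tips.csv (illustrative values only)
df = pd.DataFrame({
    "smoker": ["No", "Yes", "No", "No", "Yes", "No"],
    "size": [2, 2, 3, 2, 4, 2],
})

# value_counts() gives the bar heights that countplot would draw
smoker_counts = df["smoker"].value_counts()
size_counts = df["size"].value_counts().sort_index()

print(smoker_counts)
print(size_counts)
```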

2.2.3 rug plot

sns.rugplot(tips['total_bill'])

Reading the plot: a rug plot draws one tick per observation along the axis, so it shows where the total bill values are concentrated (the marginal distribution).

2.2.4 kde plot

# fill=True shades the area under the curve (it replaces the deprecated shade=True)
sns.kdeplot(tips['total_bill'], fill=True)

Reading the plot: KDE stands for kernel density estimation. The curve is a smoothed estimate of how the total bill amounts are distributed.
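
To make the idea behind kernel density estimation concrete, here is a minimal NumPy sketch: one Gaussian bump is placed on each observation and the bumps are averaged. The sample values and the fixed bandwidth of 3.0 are assumptions for illustration; seaborn chooses its bandwidth automatically, so its curve will not match this one exactly.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centered on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

# Made-up bill amounts (illustrative values only)
samples = np.array([8.0, 12.0, 15.0, 18.0, 21.0, 25.0, 40.0])
grid = np.linspace(-20.0, 80.0, 2001)
density = gaussian_kde_1d(samples, grid, bandwidth=3.0)

# A density should integrate to about 1 (simple Riemann sum over the grid)
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

A larger bandwidth gives a smoother curve, a smaller one follows the individual observations more closely.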

2.3 Correlation diagram of pairwise attributes

2.3.1 joint plot

sns.jointplot(x = 'total_bill', y = 'tip', data = tips)

Reading the plot: most customers spend between 10 and 30, and the corresponding tips are mostly between 1 and 5.

sns.jointplot(x='total_bill', y='tip', data=tips, kind='hex')

A hexbin version of the same plot: the darker a hexagon, the more observations fall into it.

sns.jointplot(x='total_bill', y='tip', data=tips, kind='reg')

Reading the plot: the fitted regression line shows that the tip grows as the total bill grows.
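
The line drawn by kind = 'reg' is a simple linear fit, and its slope can be estimated directly with NumPy. A minimal sketch on made-up values (not the real tips.csv); a positive slope corresponds to the upward-sloping line in the plot.

```python
import numpy as np

# Made-up bill and tip amounts (illustrative values only)
total_bill = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tip = np.array([1.5, 3.0, 4.0, 6.5, 7.0])

# Least-squares fit of tip = slope * total_bill + intercept
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(slope, intercept)
```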

sns.jointplot(x='total_bill', y='tip', data=tips, kind='kde')

A density version of the same plot: the darker a region, the higher the estimated density of observations there.

2.3.2 box plot

sns.boxplot(x = 'day', y= 'total_bill', data = tips)

Reading the plot: the box plot shows the distribution of bill amounts for each day; bills tend to be higher on Saturdays and Sundays.

sns.boxplot(x = 'day', y= 'total_bill', data = tips, hue = 'sex')

Reading the plot: with hue = 'sex' each day's box is split by sex; in this chart, women's bills on Saturdays appear higher than men's.
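
Each box summarizes a few statistics per group: the quartiles, the median, and the points beyond 1.5 times the interquartile range, which are drawn as outliers. The sketch below computes them with pandas on made-up data; the helper name `box_stats` is hypothetical.

```python
import pandas as pd

# Made-up bills for two days (illustrative values only)
df = pd.DataFrame({
    "day": ["Thur"] * 5 + ["Sat"] * 5,
    "total_bill": [10.0, 12.0, 14.0, 16.0, 18.0, 15.0, 20.0, 25.0, 30.0, 60.0],
})

def box_stats(values):
    """Quartiles, median and outlier count under the 1.5 * IQR rule."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < low) | (values > high)]
    return pd.Series({"q1": q1, "median": median, "q3": q3, "n_outliers": len(outliers)})

# One row of statistics per day
summary = pd.DataFrame(
    {day: box_stats(group) for day, group in df.groupby("day")["total_bill"]}
).T
print(summary)
```

Comparing the median column across days is a more reliable way to say which day has higher bills than judging box heights by eye.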

2.3.3 violin plot

sns.violinplot(x = 'day', y= 'total_bill', data = tips)

Reading the plot: a violin plot is very similar to a box plot, but it combines the box plot with a density trace.

sns.violinplot(x = 'day', y= 'total_bill', data = tips, hue = 'sex', split = True)

Reading the plot: the same violins, now split by sex, with one half of each violin per group.

2.3.4 strip plot

sns.stripplot(x = 'day', y = 'total_bill', data = tips)

Reading the plot: a scatter of the customers' total bills for each of the four days: Thursday, Friday, Saturday and Sunday.

sns.stripplot(x='day', y='total_bill', data=tips, jitter=True, hue='sex', dodge=True)

Reading the plot: the same as the previous plot, but with the points separated by sex.

2.3.5 swarm plot

sns.swarmplot(x = 'day', y = 'total_bill', data = tips)

Reading the plot: a swarm plot is similar to a strip plot, except that it adjusts the points so that they do not overlap.

2.3.6 factor plot (catplot)

# factorplot was renamed to catplot in seaborn 0.9
sns.catplot(x = 'day', y = 'total_bill', kind = 'box', data = tips)

Reading the plot: with catplot (formerly factorplot), the kind parameter selects which categorical plot to draw, so one function covers the plot types shown above.
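
As a small illustration of switching plot types through the kind parameter, the sketch below draws the same made-up data three ways with catplot. The data are random and only the column names follow tips.csv.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Made-up bills, six per day (illustrative values only)
df = pd.DataFrame({
    "day": np.tile(["Thur", "Fri", "Sat", "Sun"], 6),
    "total_bill": rng.uniform(5, 50, 24),
})

# Same data and same call, only the kind argument changes
grids = {
    kind: sns.catplot(data=df, x="day", y="total_bill", kind=kind)
    for kind in ("box", "violin", "strip")
}
print(sorted(grids))
```

Each call returns a FacetGrid with its own figure, so the three plots can be saved or customized separately.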

Origin blog.csdn.net/wzk4869/article/details/128746372