[Python Data Processing] seaborn

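Seaborn simplifies Matplotlib. These notes do not cover histograms (seaborn does provide them, for example through sns.histplot).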

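Seaborn is a Python data visualization library that lets you create elegant visualizations for statistical exploration and insight with simple code. Seaborn is built on Matplotlib, but improves on it in several ways:

  • Seaborn provides more visually appealing plot styles and a more concise syntax.
  • Seaborn natively understands Pandas DataFrames, which makes it easier to plot data loaded directly from a CSV file.
  • Seaborn can easily summarize Pandas DataFrames with many rows of data into aggregated charts.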

 

1. Import

import seaborn as sns

 

 

2. Usage

First load the CSV with pandas, then plot it with seaborn.

Example:

import codecademylib3_seaborn  # Codecademy environment only; remove when running locally
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

# Load results.csv here:
df = pd.read_csv('results.csv')
print(df)

sns.barplot(
	data=df,
	x='Gender',
	y='Mean Satisfaction'
)
 
plt.show()
Output of print(df):

       Gender  Mean Satisfaction
0        Male                7.2
1      Female                8.1
2  Non-binary                6.8

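3. Aggregation

Seaborn aggregates the data automatically: when several rows share the same x value, barplot draws a single bar for them (their mean, by default).

Example:

import codecademylib3_seaborn  # Codecademy environment only; remove when running locally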
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

gradebook = pd.read_csv("gradebook.csv")

print(gradebook.head())

sns.barplot(data=gradebook,
           x='assignment_name',
           y='grade')

plt.show()
Contents of gradebook (print(gradebook.head()) itself prints only the first five rows):

  student assignment_name  grade
0     Amy    Assignment 1     75
1     Amy    Assignment 2     82
2     Bob    Assignment 1     99
3     Bob    Assignment 2     90
4   Chris    Assignment 1     72
5   Chris    Assignment 2     66
6     Dan    Assignment 1     88
7     Dan    Assignment 2     82
8   Ellie    Assignment 1     91
9   Ellie    Assignment 2     85
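
As a minimal sketch of what this aggregation amounts to, the gradebook rows listed above can be grouped by assignment_name and averaged with pandas. These means are the bar heights that sns.barplot draws by default. The inline DataFrame stands in for gradebook.csv, and the Agg backend is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import pandas as pd
import seaborn as sns

# Inline copy of the gradebook rows listed above (stands in for gradebook.csv)
gradebook = pd.DataFrame({
    "student": ["Amy", "Amy", "Bob", "Bob", "Chris", "Chris",
                "Dan", "Dan", "Ellie", "Ellie"],
    "assignment_name": ["Assignment 1", "Assignment 2"] * 5,
    "grade": [75, 82, 99, 90, 72, 66, 88, 82, 91, 85],
})

# The aggregation barplot performs by default: one mean per x category
mean_grades = gradebook.groupby("assignment_name")["grade"].mean()
print(mean_grades)

# Ten rows go in, two bars come out (one per assignment)
ax = sns.barplot(data=gradebook, x="assignment_name", y="grade")
```

Computing the means by hand like this is a quick way to check that a bar plot shows what you expect.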

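4. barplot

By default each bar shows the mean. It can also show the median, or the number of times a value occurs. A bar plot hides part of the information: it cannot show how the values are distributed.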

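Parameters

(1) ci='sd'

Passing ci='sd' to sns.barplot makes the error bars span one standard deviation around each bar instead of the default confidence interval. (In seaborn 0.12 and later, ci is deprecated in favor of errorbar='sd'.)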
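
To make the standard-deviation error bar concrete, here is a rough sketch that computes, for each group, the mean and the interval of one standard deviation on either side of it with pandas. The data are made up for illustration, and the exact standard-deviation convention (sample or population) that seaborn applies may depend on its version, so treat the numbers as an approximation of what the plot shows.

```python
import pandas as pd

# Hypothetical survey data, for illustration only
df = pd.DataFrame({
    "Age Range": ["18-25", "18-25", "18-25", "26-35", "26-35", "26-35"],
    "Response": [2, 4, 6, 5, 5, 8],
})

# Per-group mean and sample standard deviation (pandas uses ddof=1)
stats = df.groupby("Age Range")["Response"].agg(["mean", "std"])

# The error bar would run from mean - std to mean + std
stats["lower"] = stats["mean"] - stats["std"]
stats["upper"] = stats["mean"] + stats["std"]
print(stats)
```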

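(2) estimator=np.median

Passing estimator=np.median to sns.barplot makes each bar show the median of y instead of the mean.

Passing estimator=len makes each bar show the number of values.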
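
The two estimators can be sketched side by side as follows. The DataFrame and its column names are hypothetical, and the Agg backend is an assumption so that the script runs without a display. The pandas lines at the end compute the same per-group statistics by hand for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "value": [1, 2, 9, 4, 6],
})

fig, (ax_median, ax_count) = plt.subplots(1, 2, figsize=(8, 4))

# Left: bar height is the median of each group
sns.barplot(data=df, x="group", y="value", estimator=np.median, ax=ax_median)

# Right: bar height is the number of rows in each group
sns.barplot(data=df, x="group", y="value", estimator=len, ax=ax_count)

# The same statistics computed by hand
medians = df.groupby("group")["value"].median()
counts = df.groupby("group")["value"].size()
print(medians)
print(counts)
```

Group a shows why the choice matters: the single large value pulls the mean up, while the median stays near the typical value.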

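(3) hue=: add a comparison variable

Originally only the age ranges were compared; adding hue='Gender' introduces a second variable to compare within each age range.

import codecademylib3_seaborn  # Codecademy environment only; remove when running locally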
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np

df = pd.read_csv("survey.csv")

sns.barplot(data=df,
            x="Age Range",
            y="Response",
            hue="Gender",
            ci='sd')

plt.show()

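5. kdeplot

KDE stands for kernel density estimator. A KDE plot draws a smoothed curve that estimates the distribution of a dataset.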
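
As a rough idea of what a kernel density estimate computes, the sketch below averages one Gaussian bump per data point using NumPy only. It is not seaborn's implementation: the function name is hypothetical, the data are randomly generated, and the bandwidth is a hand-picked assumption (seaborn chooses one automatically).

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps centered on each sample (illustrative only)."""
    samples = np.asarray(samples, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # One density value per grid point
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)  # made-up data
grid = np.linspace(-6, 6, 601)
density = gaussian_kde_sketch(samples, grid, bandwidth=0.3)

# A density should integrate to roughly 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(area)
```

A larger bandwidth gives a smoother curve, and a smaller one follows the individual points more closely.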

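sns.set_style("darkgrid")    # controls the plot style
sns.set_palette("pastel")    # controls the color palette

import codecademylib3_seaborn  # Codecademy environment only; remove when running locally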
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

# Take in the data from the CSVs as NumPy arrays:
set_one = np.genfromtxt("dataset1.csv", delimiter=",")
set_two = np.genfromtxt("dataset2.csv", delimiter=",")
set_three = np.genfromtxt("dataset3.csv", delimiter=",")
set_four = np.genfromtxt("dataset4.csv", delimiter=",")

# Creating a Pandas DataFrame:
n=500
df = pd.DataFrame({
    "label": ["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n,
    "value": np.concatenate([set_one, set_two, set_three, set_four])
})

# Setting styles:
sns.set_style("darkgrid")
sns.set_palette("pastel")

# Add your code below:
# fill=True shades the area under each curve
# (seaborn releases before 0.11 spell this shade=True)
sns.kdeplot(set_one, fill=True)
sns.kdeplot(set_two, fill=True)
sns.kdeplot(set_three, fill=True)
sns.kdeplot(set_four, fill=True)

plt.show()

6. boxplot (box plot)

import codecademylib3_seaborn  # Codecademy environment only; remove when running locally
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

# Take in the data from the CSVs as NumPy arrays:
set_one = np.genfromtxt("dataset1.csv", delimiter=",")
set_two = np.genfromtxt("dataset2.csv", delimiter=",")
set_three = np.genfromtxt("dataset3.csv", delimiter=",")
set_four = np.genfromtxt("dataset4.csv", delimiter=",")

# Creating a Pandas DataFrame:
n=500
df = pd.DataFrame({
    "label": ["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n,
    "value": np.concatenate([set_one, set_two, set_three, set_four])
})

# Setting styles:
sns.set_style("darkgrid")
sns.set_palette("pastel")

# Add your code below:
sns.boxplot(data=df, x='label', y='value')
plt.show()

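7. violinplot (violin plot)

A violin plot combines a box plot and a KDE plot.

A white dot marks the median.

The thick black line at the center of each violin shows the interquartile range.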

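The thin lines extending from the center are the whiskers of the embedded box plot (by default they reach the most extreme data points within 1.5 times the interquartile range). They are not the 95% confidence intervals that bar plots show.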
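
For reference, the median and the interquartile range marked inside each violin can be computed directly with NumPy. This is a minimal sketch on made-up values, not the code seaborn runs internally.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # made-up data

# First quartile, median, and third quartile
q1, median, q3 = np.percentile(values, [25, 50, 75])

# The interquartile range is the span of the middle 50% of the data
iqr = q3 - q1
print(q1, median, q3, iqr)
```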

import codecademylib3_seaborn  # Codecademy environment only; remove when running locally
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

# Take in the data from the CSVs as NumPy arrays:
set_one = np.genfromtxt("dataset1.csv", delimiter=",")
set_two = np.genfromtxt("dataset2.csv", delimiter=",")
set_three = np.genfromtxt("dataset3.csv", delimiter=",")
set_four = np.genfromtxt("dataset4.csv", delimiter=",")

# Creating a Pandas DataFrame:
n=500
df = pd.DataFrame({
    "label": ["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n,
    "value": np.concatenate([set_one, set_two, set_three, set_four])
})

# Setting styles:
sns.set_style("darkgrid")
sns.set_palette("pastel")

# Add your code below:
sns.violinplot(data=df, x="label", y="value")
plt.show()

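8. Styles

(1) sns.set_style

tips = sns.load_dataset("tips")  # seaborn's example dataset (fetched online on first use)
sns.set_style("darkgrid")
sns.stripplot(x="day", y="total_bill", data=tips)

darkgrid    dark (gray) background with grid lines

dark        dark (gray) background, no grid

white       white background, no grid

whitegrid   white background with grid lines

ticks       white background with tick marks on the axes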

(2) sns.despine()

Remove the right and top spines (borders):

sns.despine()

Remove all four spines:

sns.despine(left=True, bottom=True)
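
The effect of the two calls can be checked by inspecting the spine visibility on a Matplotlib axes. This is a small sketch; the plotted line is placeholder data, and the Agg backend is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import seaborn as sns
from matplotlib import pyplot as plt

sides = ["top", "right", "left", "bottom"]

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # placeholder data

# Default call: only the top and right spines are hidden
sns.despine(ax=ax)
visible_default = {side: ax.spines[side].get_visible() for side in sides}
print(visible_default)

# With left=True and bottom=True, all four spines are hidden
sns.despine(ax=ax, left=True, bottom=True)
visible_all_removed = {side: ax.spines[side].get_visible() for side in sides}
print(visible_all_removed)
```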

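(3) sns.set_context(): set the scale

This scales the plot elements. There are four preset contexts, from smallest to largest: paper, notebook, talk, and poster.

Each context scales the fonts and line widths.

The fonts can also be scaled further through set_context itself, using its font_scale argument.

If none of the presets fits, you can override individual values yourself (sns.plotting_context() returns the current ones):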

sns.set_style("ticks")
sns.set_context("poster")
sns.stripplot(x="day", y="total_bill", data=tips)
sns.plotting_context()


{'axes.labelsize': 17.6,
 'axes.titlesize': 19.200000000000003,
 'font.size': 19.200000000000003,
 'grid.linewidth': 1.6,
 'legend.fontsize': 16.0,
 'lines.linewidth': 2.8000000000000003,
 'lines.markeredgewidth': 0.0,
 'lines.markersize': 11.200000000000001,
 'patch.linewidth': 0.48,
 'xtick.labelsize': 16.0,
 'xtick.major.pad': 11.200000000000001,
 'xtick.major.width': 1.6,
 'xtick.minor.width': 0.8,
 'ytick.labelsize': 16.0,
 'ytick.major.pad': 11.200000000000001,
 'ytick.major.width': 1.6,
 'ytick.minor.width': 0.8}
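
To see how the presets differ, the sketch below compares a few of these values between the paper and poster contexts and shows the effect of font_scale. It only reads the settings through sns.plotting_context and does not change the global state. The exact numbers vary between seaborn versions, so none are listed here.

```python
import seaborn as sns

# Passing a context name returns its settings without applying them
paper = sns.plotting_context("paper")
poster = sns.plotting_context("poster")

for key in ["font.size", "axes.labelsize", "lines.linewidth"]:
    print(key, paper[key], poster[key])

# font_scale multiplies the font sizes of the chosen context
notebook = sns.plotting_context("notebook")
notebook_scaled = sns.plotting_context("notebook", font_scale=1.25)
print(notebook["font.size"], notebook_scaled["font.size"])
```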

 

 

9. Colors

sns.color_palette(): build a color palette

palette = sns.color_palette("bright")

# View the palette
sns.palplot(palette)

Besides seaborn's own palettes, Color Brewer palettes are also available.
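
As a small sketch, Color Brewer palettes are requested by name through the same sns.color_palette call. "Blues" and "Set2" are two Color Brewer names chosen here for illustration, and the second argument is the number of colors to return.

```python
import seaborn as sns

# Sequential Color Brewer palette with 5 shades
sequential = sns.color_palette("Blues", 5)

# Qualitative Color Brewer palette with 4 distinct colors
qualitative = sns.color_palette("Set2", 4)

# Each entry is an (r, g, b) tuple with components between 0 and 1
print(len(sequential), len(qualitative))
print(sequential[0])
```

The same names can be passed to a plotting function through its palette argument.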

Hands-on practice

import codecademylib3_seaborn  # Codecademy environment only; remove when running locally
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns


df=pd.read_csv('WorldCupMatches.csv')
df['Total Goals']=df['Home Team Goals']+df['Away Team Goals']
#print(df.head())
df_goals=pd.read_csv('goals.csv')
#print(df_goals.head())

sns.set_style('whitegrid')
sns.set_context('notebook',font_scale=1.25)

f, ax = plt.subplots(figsize=(12,7))

ax=sns.barplot(data=df,x='Year',y='Total Goals')
ax.set_title('Average Number Of Goals Scored In World Cup Matches By Year')

plt.show()


f, ax2 = plt.subplots(figsize=(12,7))
ax2=sns.boxplot(data=df_goals,x='year',y='goals',palette='Spectral')
ax2.set_title('Goals Visualizing')

plt.show()

 


Reposted from blog.csdn.net/yt627306293/article/details/84790723