Seaborn: a simpler interface on top of Matplotlib (histograms are not covered in these notes)
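Seaborn is a Python data visualization library that produces elegant visualizations for statistical exploration and insight with very little code. It is built on Matplotlib and improves on it in several ways:
- Seaborn provides more visually appealing plot styles and a more concise syntax.
- Seaborn understands Pandas DataFrames natively, which makes it easier to plot data loaded directly from a CSV file.
- Seaborn can easily summarize Pandas DataFrames with many rows into aggregated charts.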
1. Import
import seaborn as sns
2. Usage
First load the CSV file with pandas, then plot it with seaborn.
Example:
# import codecademylib3_seaborn  # only needed inside the Codecademy environment
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
# Load results.csv here:
df = pd.read_csv('results.csv')
print(df)
sns.barplot(
data=df,
x='Gender',
y='Mean Satisfaction'
)
plt.show()
| | Gender | Mean Satisfaction |
|---|---|---|
| 0 | Male | 7.2 |
| 1 | Female | 8.1 |
| 2 | Non-binary | 6.8 |
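To try this example without a results.csv file, the table can be rebuilt inline. This is only a minimal sketch: the values are copied from the printed table, and the Agg backend line is an assumption so that the snippet runs without a display (drop it and call plt.show() in an interactive session).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

# Same rows as the printed table
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Non-binary"],
    "Mean Satisfaction": [7.2, 8.1, 6.8],
})

fig, ax = plt.subplots()
sns.barplot(data=df, x="Gender", y="Mean Satisfaction", ax=ax)

# One bar per row, because each Gender value appears exactly once
bar_heights = [patch.get_height() for patch in ax.patches]
print(bar_heights)
```

Because every category has a single row here, each bar height is simply the value in that row.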
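3. Aggregation
Seaborn aggregates the data automatically: when several rows share the same x value, barplot draws a single bar for them (their mean, by default).
Example: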
# import codecademylib3_seaborn  # only needed inside the Codecademy environment
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
gradebook = pd.read_csv("gradebook.csv")
print(gradebook.head())
sns.barplot(data=gradebook,
x='assignment_name',
y='grade')
plt.show()
| | student | assignment_name | grade |
|---|---|---|---|
| 0 | Amy | Assignment 1 | 75 |
| 1 | Amy | Assignment 2 | 82 |
| 2 | Bob | Assignment 1 | 99 |
| 3 | Bob | Assignment 2 | 90 |
| 4 | Chris | Assignment 1 | 72 |
| 5 | Chris | Assignment 2 | 66 |
| 6 | Dan | Assignment 1 | 88 |
| 7 | Dan | Assignment 2 | 82 |
| 8 | Ellie | Assignment 1 | 91 |
| 9 | Ellie | Assignment 2 | 85 |
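The automatic aggregation can be checked by hand. The sketch below rebuilds the gradebook table inline (values copied from the table; the Agg backend line is an assumption for running without a display) and compares a pandas groupby mean with the bar heights that barplot draws.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

gradebook = pd.DataFrame({
    "student": ["Amy", "Amy", "Bob", "Bob", "Chris", "Chris",
                "Dan", "Dan", "Ellie", "Ellie"],
    "assignment_name": ["Assignment 1", "Assignment 2"] * 5,
    "grade": [75, 82, 99, 90, 72, 66, 88, 82, 91, 85],
})

# What barplot computes for each bar by default: the mean of y per x value
means = gradebook.groupby("assignment_name")["grade"].mean()
print(means)

fig, ax = plt.subplots()
sns.barplot(data=gradebook, x="assignment_name", y="grade", ax=ax)

# The bar heights should match the groupby means
bar_heights = [patch.get_height() for patch in ax.patches]
print(bar_heights)
```

For this data the means are 85 for Assignment 1 and 81 for Assignment 2, so the ten rows collapse into two bars.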
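4. barplot
By default each bar shows the mean of y. It can instead show the median, or the number of times each value occurs. Because a bar reduces every group to a single number, it hides part of the information: it cannot show how the values are distributed.
Parameters
(1) ci='sd'
Passing ci='sd' to sns.barplot draws each error bar as one standard deviation around the estimate, instead of the default 95% confidence interval. In seaborn 0.12 and later, ci is deprecated in favor of errorbar='sd'.
(2) estimator=np.median
Passing estimator=np.median to sns.barplot makes each bar show the median of y.
Passing estimator=len makes each bar show the number of rows in the group.
(3) hue= adds a second comparison variable
The plot originally compared only age ranges; adding hue='Gender' splits each age range into one bar per gender.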
# import codecademylib3_seaborn  # only needed inside the Codecademy environment
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
df = pd.read_csv("survey.csv")
sns.barplot(data=df,
x="Age Range",
y="Response",
hue="Gender",
ci='sd')
plt.show()
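The estimator parameter can be illustrated with the small gradebook data from section 3. This is a sketch under assumptions: the data is rebuilt inline rather than read from gradebook.csv, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

gradebook = pd.DataFrame({
    "assignment_name": ["Assignment 1", "Assignment 2"] * 5,
    "grade": [75, 82, 99, 90, 72, 66, 88, 82, 91, 85],
})

fig, (ax_median, ax_count) = plt.subplots(1, 2, figsize=(10, 4))

# Median of y per category instead of the default mean
sns.barplot(data=gradebook, x="assignment_name", y="grade",
            estimator=np.median, ax=ax_median)
ax_median.set_title("Median grade")

# Number of rows per category
sns.barplot(data=gradebook, x="assignment_name", y="grade",
            estimator=len, ax=ax_count)
ax_count.set_title("Number of grades")

median_heights = [patch.get_height() for patch in ax_median.patches]
count_heights = [patch.get_height() for patch in ax_count.patches]
print(median_heights, count_heights)
```

Any function that reduces a group of values to one number can be passed as estimator; np.median and len are just the two mentioned in these notes.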
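5. kdeplot
KDE stands for kernel density estimator: a KDE plot draws a smoothed estimate of a distribution.
sns.set_style("darkgrid") controls the plot style.
sns.set_palette("pastel") sets the color palette.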
# import codecademylib3_seaborn  # only needed inside the Codecademy environment
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
# Take in the data from the CSVs as NumPy arrays:
set_one = np.genfromtxt("dataset1.csv", delimiter=",")
set_two = np.genfromtxt("dataset2.csv", delimiter=",")
set_three = np.genfromtxt("dataset3.csv", delimiter=",")
set_four = np.genfromtxt("dataset4.csv", delimiter=",")
# Creating a Pandas DataFrame:
n=500
df = pd.DataFrame({
"label": ["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n,
"value": np.concatenate([set_one, set_two, set_three, set_four])
})
# Setting styles:
sns.set_style("darkgrid")
sns.set_palette("pastel")
# Add your code below:
# fill=True shades the area under each curve (seaborn 0.11+; older versions use shade=True)
sns.kdeplot(set_one, fill=True)
sns.kdeplot(set_two, fill=True)
sns.kdeplot(set_three, fill=True)
sns.kdeplot(set_four, fill=True)
plt.show()
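To make the idea of a kernel density estimate concrete, here is a minimal NumPy sketch of a one-dimensional Gaussian KDE. It is not how seaborn implements kdeplot; the function name gaussian_kde_1d, the Scott's-rule bandwidth, and the synthetic samples are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data (an assumption here)
        bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)  # synthetic stand-in data
grid = np.linspace(-6, 6, 601)
density = gaussian_kde_1d(samples, grid)

# Riemann-sum check: the area under a density curve should be close to 1
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

Each sample contributes one small bell curve, and the plotted line is their average. A smaller bandwidth gives a bumpier curve, a larger one a smoother curve.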
6. boxplot (box plot)
# import codecademylib3_seaborn  # only needed inside the Codecademy environment
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
# Take in the data from the CSVs as NumPy arrays:
set_one = np.genfromtxt("dataset1.csv", delimiter=",")
set_two = np.genfromtxt("dataset2.csv", delimiter=",")
set_three = np.genfromtxt("dataset3.csv", delimiter=",")
set_four = np.genfromtxt("dataset4.csv", delimiter=",")
# Creating a Pandas DataFrame:
n=500
df = pd.DataFrame({
"label": ["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n,
"value": np.concatenate([set_one, set_two, set_three, set_four])
})
# Setting styles:
sns.set_style("darkgrid")
sns.set_palette("pastel")
# Add your code below:
sns.boxplot(data=df, x='label', y='value')
plt.show()
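7. violinplot (violin plot)
A violin plot combines a box plot with a KDE plot.
The white dot marks the median.
The thick black bar at the center of each violin shows the interquartile range.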
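The thin lines extending from the center are the whiskers of the embedded box plot; by default they reach the most extreme data points within 1.5 times the interquartile range. Unlike the error bars on a bar plot, they are not a 95% confidence interval.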
# import codecademylib3_seaborn  # only needed inside the Codecademy environment
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
# Take in the data from the CSVs as NumPy arrays:
set_one = np.genfromtxt("dataset1.csv", delimiter=",")
set_two = np.genfromtxt("dataset2.csv", delimiter=",")
set_three = np.genfromtxt("dataset3.csv", delimiter=",")
set_four = np.genfromtxt("dataset4.csv", delimiter=",")
# Creating a Pandas DataFrame:
n=500
df = pd.DataFrame({
"label": ["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n,
"value": np.concatenate([set_one, set_two, set_three, set_four])
})
# Setting styles:
sns.set_style("darkgrid")
sns.set_palette("pastel")
# Add your code below:
sns.violinplot(data=df, x="label", y="value")
plt.show()
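The median and interquartile range that a violin plot marks can also be computed directly. The sketch below uses synthetic data in place of the dataset CSV files (an assumption, as is the Agg backend line) and prints the per-group quartiles next to drawing the violins.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the dataset CSV files
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "label": ["set_one"] * n + ["set_two"] * n,
    "value": np.concatenate([rng.normal(0, 1, n), rng.normal(2, 0.5, n)]),
})

# Quartiles per group: the numbers behind the median dot and the thick bar
summary = df.groupby("label")["value"].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ["q1", "median", "q3"]
summary["iqr"] = summary["q3"] - summary["q1"]
print(summary)

fig, ax = plt.subplots()
sns.violinplot(data=df, x="label", y="value", ax=ax)
```

The outline of each violin is a KDE of that group's values, so a wide region means many observations near that value.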
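8. Styling
(1) sns.set_style
tips = sns.load_dataset("tips")  # assumption: tips is seaborn's example dataset
sns.set_style("darkgrid")
sns.stripplot(x="day", y="total_bill", data=tips)
The available styles are:
darkgrid: dark background with grid lines
dark: dark background without grid lines
white: white background without grid lines
whitegrid: white background with grid lines
ticks: white background with tick marks on the axes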
(2) sns.despine()
Remove the top and right spines (the default):
sns.despine()
Remove all four spines:
sns.despine(left=True, bottom=True)
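The effect of the two despine calls can be inspected on the axes themselves. This sketch uses a small made-up dataset (an assumption, as is the Agg backend line) and records which spines remain visible after each call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up dataset, used only for illustration
data = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [12.5, 20.1, 15.3, 22.8, 30.4, 18.9],
})

sns.set_style("ticks")
fig, (ax_default, ax_all) = plt.subplots(1, 2, figsize=(10, 4))

sns.stripplot(x="day", y="total_bill", data=data, ax=ax_default)
sns.despine(ax=ax_default)  # removes the top and right spines

sns.stripplot(x="day", y="total_bill", data=data, ax=ax_all)
sns.despine(ax=ax_all, left=True, bottom=True)  # removes all four spines

# Which spines are still drawn on each axes
visible_default = {side: spine.get_visible()
                   for side, spine in ax_default.spines.items()}
visible_all = {side: spine.get_visible()
               for side, spine in ax_all.spines.items()}
print(visible_default)
print(visible_all)
```

Passing ax= limits the call to one axes; without it, despine acts on the axes of the current figure.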
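(3) sns.set_context() sets the scale
set_context scales the plot elements. There are four preset contexts, from smallest to largest: paper, notebook, talk, and poster.
Font sizes and line widths are scaled along with the context; fonts can also be scaled separately through the font_scale argument of set_context.
If none of the presets fits, individual parameters can be overridden through the rc argument.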
sns.set_style("ticks")
sns.set_context("poster")
sns.stripplot(x="day", y="total_bill", data=tips)
sns.plotting_context()  # returns the current context parameters, for example:
{'axes.labelsize': 17.6,
'axes.titlesize': 19.200000000000003,
'font.size': 19.200000000000003,
'grid.linewidth': 1.6,
'legend.fontsize': 16.0,
'lines.linewidth': 2.8000000000000003,
'lines.markeredgewidth': 0.0,
'lines.markersize': 11.200000000000001,
'patch.linewidth': 0.48,
'xtick.labelsize': 16.0,
'xtick.major.pad': 11.200000000000001,
'xtick.major.width': 1.6,
'xtick.minor.width': 0.8,
'ytick.labelsize': 16.0,
'ytick.major.pad': 11.200000000000001,
'ytick.major.width': 1.6,
'ytick.minor.width': 0.8}
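The preset contexts can be compared without drawing anything, because sns.plotting_context also accepts a context name and returns its parameters. The sketch below is illustrative only; exact numbers depend on the installed seaborn version, so none are shown.

```python
import seaborn as sns

# Base font size of each preset context, from smallest to largest
contexts = ["paper", "notebook", "talk", "poster"]
font_sizes = [sns.plotting_context(name)["font.size"] for name in contexts]
print(dict(zip(contexts, font_sizes)))

# font_scale multiplies the font-related entries of a context
base = sns.plotting_context("notebook")
scaled = sns.plotting_context("notebook", font_scale=1.25)
ratio = scaled["font.size"] / base["font.size"]
print(ratio)

# rc overrides individual parameters
custom = sns.plotting_context("notebook", rc={"lines.linewidth": 3})
print(custom["lines.linewidth"])
```

sns.set_context takes the same three arguments (context, font_scale, rc) and applies the result globally.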
9. Colors
sns.color_palette() builds a color palette.
palette = sns.color_palette("bright")
# Preview the palette
sns.palplot(palette)
Besides the built-in palettes, seaborn also supports Color Brewer palettes.
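A short sketch of both kinds of palette follows. The choice of "Set2" as the Color Brewer example and the palette sizes are assumptions made for illustration.

```python
import seaborn as sns

bright = sns.color_palette("bright", 6)  # built-in seaborn palette
brewer = sns.color_palette("Set2", 5)    # Color Brewer qualitative palette

# Each entry is an (r, g, b) tuple with values between 0 and 1
print(bright[0])

# Make a palette the default for later plots
sns.set_palette(brewer)

# With no arguments, color_palette returns the current default palette
current = sns.color_palette()
print(len(current))
```

sns.palplot(brewer) previews the Color Brewer palette in the same way as the built-in one.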
Hands-on practice
# import codecademylib3_seaborn  # only needed inside the Codecademy environment
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
# Load the match data and add a total-goals column for each match
df = pd.read_csv('WorldCupMatches.csv')
df['Total Goals'] = df['Home Team Goals'] + df['Away Team Goals']
# print(df.head())
df_goals = pd.read_csv('goals.csv')
# print(df_goals.head())
# Global style and context
sns.set_style('whitegrid')
sns.set_context('notebook', font_scale=1.25)
# Bar plot: mean number of goals per match for each year
f, ax = plt.subplots(figsize=(12, 7))
ax = sns.barplot(data=df, x='Year', y='Total Goals')
ax.set_title('Average Number Of Goals Scored In World Cup Matches By Year')
plt.show()
# Box plot: distribution of goals for each year
f, ax2 = plt.subplots(figsize=(12, 7))
ax2 = sns.boxplot(data=df_goals, x='year', y='goals', palette='Spectral')
ax2.set_title('Goals Visualizing')
plt.show()