Python Data Analysis: Seaborn, the "big brother" of visualization

Seaborn

Since we already have matplotlib, why do we need seaborn? Seaborn is built on top of matplotlib, and its goal is to make difficult things easier. The hardest part of Matplotlib is its large number of default parameters, and Seaborn largely avoids this problem. Seaborn is designed for statistical graphics: in general it covers roughly 90% of the plotting needs in data analysis, while complex custom figures still call for Matplotlib. Seaborn aims to make visualization a central part of exploring and understanding data. Its dataset-oriented plotting functions operate on dataframes and arrays that contain the whole dataset, and internally perform the semantic mapping and statistical aggregation needed to produce informative plots.

Five built-in theme styles

  • darkgrid
  • whitegrid
  • dark
  • white
  • ticks
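
As a rough illustration of how these five styles differ, the sketch below draws the same sine curve once per style. It assumes a non-interactive backend (Agg) so that it also runs without a display; the sine data and the figure size are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
x = np.linspace(0, 2 * np.pi, 100)  # illustrative data only

fig = plt.figure(figsize=(15, 3))
for i, style in enumerate(styles, start=1):
    # Used as a context manager, axes_style() applies the style
    # only to axes created inside the "with" block.
    with sns.axes_style(style):
        ax = fig.add_subplot(1, len(styles), i)
    ax.plot(x, np.sin(x))
    ax.set_title(style)

fig.tight_layout()
```

In an interactive session, calling plt.show() afterwards displays the five panels side by side. To switch the style globally instead, sns.set_style("whitegrid") can be used.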

Visualizing statistical relationships

Statistical analysis is the process of understanding how the variables in a dataset relate to each other and how those relationships depend on other variables. Two common ways to visualize statistical relationships are scatter plots and line plots.
The three most commonly used functions are:

  • relplot()
  • scatterplot() (equivalent to relplot with kind="scatter", the default)
  • lineplot() (equivalent to relplot with kind="line")
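
Commonly used parameters (name: meaning; accepted value):

  • x, y, hue: variables in the dataset; variable names
  • data: the dataset; dataset (DataFrame) name
  • row, col: additional categorical variables used to tile the figure into facets; variable names
  • col_wrap: maximum number of facets per row; integer
  • estimator: function that maps a vector to a scalar within each category; callable
  • ci: confidence interval; float or None
  • n_boot: number of bootstrap iterations used when computing the confidence interval; integer
  • units: identifier of the sampling units, used for multilevel bootstrapping and repeated-measures designs; data variable or vector data
  • order, hue_order: ordering of the corresponding levels; list of strings
  • row_order, col_order: ordering of the corresponding facets; list of strings
  • kind: plot type in the categorical interface: point (the default in the older factorplot(); catplot() defaults to strip), bar (bar chart), count (frequency), box (box plot), violin (violin plot), strip (scatter), swarm (non-overlapping scatter)
  • size: height of each facet in inches (renamed height in seaborn 0.9 and later); scalar
  • aspect: aspect ratio of each facet; scalar
  • orient: orientation; "v" or "h"
  • color: color; matplotlib color
  • palette: color palette; seaborn palette or dictionary
  • legend: whether to show the legend for hue; True/False
  • legend_out: whether to extend the figure and draw the legend outside the plot at center right; True/False
  • share{x,y}: whether facets share axes; True/False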
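
To make a few of these parameters concrete, here is a minimal sketch that combines x, y, hue, col and kind in a single relplot() call. The DataFrame and its column names (x, y, group, panel) are invented for illustration, so the example does not depend on downloading a dataset.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: one row per observation.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "x": rng.uniform(0, 10, n),
    "group": rng.choice(["a", "b"], n),         # mapped to color via hue
    "panel": rng.choice(["left", "right"], n),  # mapped to facets via col
})
df["y"] = 2.0 * df["x"] + rng.normal(0, 1, n)

# relplot() is figure-level: it returns a FacetGrid with one axes per "panel" value.
g = sns.relplot(data=df, x="x", y="y", hue="group", col="panel", kind="scatter")
```

Passing kind="line" to the same call would draw the panels with lineplot() instead of scatterplot().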
Relating variables with scatter plots

The scatter plot is a mainstay of statistical visualization. It depicts the joint distribution of two variables as a cloud of points, where each point represents one observation in the dataset. This makes the scatter plot a natural first choice for examining the relationship between two variables.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set the theme style
sns.set(style="darkgrid")

# Load the tips dataset
tips = sns.load_dataset("tips")

# Scatter plot: color by smoker, marker style by time; the legend reflects both
sns.relplot(x="total_bill", y="tip", hue="smoker", style="time", data=tips)
plt.show()

# Scatter plot with a gradient palette mapped to the numeric "size" column
sns.relplot(x="total_bill", y="tip", hue="size", palette="ch:r=-.5,l=.75", data=tips)
plt.show()

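When the dataset is large, scatter points overlap heavily, so a hexbin plot is a better choice.

Example:
# Remap matplotlib's shorthand color codes (e.g. "b", "g") to the seaborn palette
sns.set(color_codes=True)
mean, cov = [0, 1], [(1, .5), (.5, 1)]  # mean vector (2 values) and 2x2 covariance matrix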
x, y = np.random.multivariate_normal(mean, cov, 1000).T
with sns.axes_style("ticks"):
    sns.jointplot(x=x, y=y, kind="hex", color="k")
plt.show()

Histogram

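Histograms are mainly used to analyze the distribution of a single variable.

Example:
from scipy import stats  # needed for stats.gamma below

sns.set(style="darkgrid")
np.random.seed(sum(map(ord, "distributions")))
x = np.random.gamma(6, size=200)
# kde=False hides the density estimate; fit=stats.gamma overlays a fitted gamma curve
sns.distplot(x, kde=False, fit=stats.gamma)
plt.show()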

Pair plot

A pair plot is mainly used to observe the pairwise relationships between variables. The diagonal shows a histogram (counts) of each variable, and the other cells show scatter plots.

Example, using the built-in iris dataset:
sns.set(color_codes=True)
iris = sns.load_dataset("iris")
sns.pairplot(iris)
plt.show()

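Regression plots

Both regplot() and lmplot() can draw regression relationships; regplot() is recommended.

The main difference between the two is that regplot() accepts x and y in a variety of formats, including numpy arrays, pandas Series, or names of variables in a pandas DataFrame passed as data. In contrast, lmplot() requires data, and x and y must be given as strings. This data format is called "long-form" or "tidy". Apart from this flexibility in input, regplot() can be seen as having a subset of the features of lmplot().

# %matplotlib inline  (uncomment when running in a Jupyter notebook)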
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
 
import seaborn as sns
sns.set(color_codes=True)
np.random.seed(sum(map(ord, "regression")))

tips = sns.load_dataset("tips")
 
# Draw the regression with regplot
sns.regplot(x="total_bill", y="tip", data=tips)
plt.show()
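
The difference in accepted input between the two functions can be sketched as follows. The data is synthetic and the column names bill and tip are made up for the example: regplot() is given plain numpy arrays, while lmplot() is given a DataFrame plus column names as strings.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
x = rng.uniform(0, 50, 100)
y = 0.15 * x + rng.normal(0, 1, 100)

# regplot() is axes-level: it accepts raw arrays and returns the matplotlib Axes.
fig, ax = plt.subplots()
returned_ax = sns.regplot(x=x, y=y, ax=ax)

# lmplot() is figure-level: it needs a DataFrame plus column names as strings,
# and returns a FacetGrid.
df = pd.DataFrame({"bill": x, "tip": y})
g = sns.lmplot(data=df, x="bill", y="tip")
```

Because lmplot() returns a FacetGrid, it can also split the regression across facets with row and col, which regplot() cannot do on its own.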

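Robust regression plot: pass robust=True so that outliers are down-weighted in the fit (this option requires the statsmodels package).

Example: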
anscombe = sns.load_dataset("anscombe")
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"),
           robust=True, ci=None, scatter_kws={"s": 80})
plt.show()

Swarm plot

Similar to a scatter plot, it shows where each individual observation falls.

Example:
tips = sns.load_dataset("tips")
sns.swarmplot(x="day", y="total_bill",hue="sex",data=tips)
plt.show()

Origin www.cnblogs.com/cecilia-2019/p/11368248.html