How to draw a bivariate distribution graph in Seaborn?

Visualizing the bivariate distribution of two variables is also useful. The easiest way to do this in Seaborn is to use the jointplot() function, which creates a multi-panel figure: a central plot (a scatterplot, a 2D histogram, a kernel density estimate, etc.) shows the bivariate relationship between the two variables, and the side plots show the univariate distribution of each variable along its own axis.

The syntax of the jointplot() function is as follows.

seaborn.jointplot(x, y, data=None, 
                  kind='scatter', stat_func=None, color=None, 
                  ratio=5, space=0.2, dropna=True)

The meanings of the commonly used parameters of the above function are as follows:

(1) kind: Indicates the type of plot to draw, for example "scatter", "hex", or "kde".

(2) stat_func: A function used to compute a statistic about the relationship and annotate the plot with it. This parameter belongs to older Seaborn releases; newer releases no longer accept it.

(3) color: Indicates the color of the plot elements.

(4) size: Sets the size of the (square) figure. It is not shown in the signature above, and Seaborn 0.9 and later name this parameter height.

(5) ratio: Indicates the ratio of the height of the center plot to the height of the side plots. The larger the value, the larger the share of the figure taken by the center plot.

(6) space: Sets the space between the center plot and the side plots.

The following takes scatter plots, two-dimensional histograms, and kernel density estimation curves as examples to introduce how to use Seaborn to draw these graphics.

1. Draw a scatterplot

An example of calling the seaborn.jointplot() function to draw a scatterplot is as follows.

import numpy as np
import pandas as pd
import seaborn as sns
# Create a DataFrame object
dataframe_obj = pd.DataFrame({
    "x": np.random.randn(500),
    "y": np.random.randn(500)})
# Draw the scatter plot
sns.jointplot(x="x", y="y", data=dataframe_obj)

In the above example, a DataFrame object dataframe_obj is first created as the data for the scatter plot, with 500 random numbers drawn from a standard normal distribution for each of the x-axis and the y-axis. The jointplot() function is then called to draw the scatter plot, with the x-axis labeled "x" and the y-axis labeled "y".

The running result is shown in the figure.

[Figure: scatter plot of x against y, with a histogram of each variable on the top and right]

2. Draw a two-dimensional histogram

The two-dimensional histogram drawn here is a "hexbin" plot: it shows the number of observations that fall within each hexagonal bin, and it is suitable for larger data sets. When calling the jointplot() function, you only need to pass in kind="hex" to draw a two-dimensional histogram. The specific sample code is as follows.

# Draw the two-dimensional histogram
sns.jointplot(x="x", y="y", data=dataframe_obj, kind="hex")

The running result is shown in the figure.

[Figure: hexbin plot of x against y, with a histogram of each variable on the top and right]

The color depth of each hexagon shows how dense the data is there: the darker the hexagon, the more observations fall inside it. As before, histograms of the individual variables are drawn on the top and right of the graph. Note that a two-dimensional histogram looks best on a white background.

3. Draw kernel density estimation graphics

Bivariate distributions can also be viewed using kernel density estimation, which is represented by a contour plot. When calling the jointplot() function, you only need to pass in kind="kde" to draw the kernel density estimation graph. The specific sample code is as follows.

sns.jointplot(x="x", y="y", data=dataframe_obj, kind="kde")

In the example above, a contour plot of the kernel density is plotted, and additionally, plots of the kernel density are given above and to the right of the graph. The running result is shown in the figure.

[Figure: contour plot of the bivariate kernel density estimate, with a density curve for each variable on the top and right]

The color depth of the contours shows where the values are concentrated: the darkest region contains the most observations and the lightest region the fewest.

4. Plotting Paired Bivariate Distributions

To plot multiple pairs of bivariate distributions in a data set, you can use the pairplot() function, which creates a matrix of axes and displays the relationship between each pair of variables in the DataFrame object. In addition, the pairplot() function plots the univariate distribution of each variable on the diagonal axes.

Next, use the sns.pairplot() function to draw a graph of the relationship between variables in the data set. The sample code is as follows:

# Load a data set that ships with seaborn
dataset=sns.load_dataset("iris")

dataset.head()

[Figure: output of dataset.head(), the first five rows of the iris data set]

In the above example, the iris data set built into seaborn is loaded with the load_dataset() function and its first rows are displayed. Next, multiple paired bivariate distributions are drawn from the iris data set.

# Draw multiple paired bivariate distributions
sns.pairplot(dataset)

The result is shown in the figure below.

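[Figure: matrix of pairwise scatter plots for the iris variables, with the univariate distribution of each variable on the diagonal]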

Origin blog.csdn.net/cz_00001/article/details/131923076