Detailed explanation of the parameters of the pandas scatter_matrix plotting function

A scatter plot matrix shows the relationship between every pair of features.

scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', density_kwds=None, hist_kwds=None, range_padding=0.05, **kwds)

1. frame, the pandas DataFrame to plot
2. alpha, the transparency of the points, generally a value in (0, 1]
3. figsize, the figure size in inches, given as a tuple (width, height)
4. ax, an optional Matplotlib axes object, generally left as None
5. grid, a bool that turns grid lines on or off, False by default
6. diagonal, must be exactly one of {'hist', 'kde'}; 'hist' draws a histogram and 'kde' draws a kernel density estimate on the diagonal. This is the key parameter of the scatter_matrix function
7. marker, any marker type available in Matplotlib, such as '.', ',', 'o', etc.
8. density_kwds, an optional dictionary of keyword arguments passed to the kernel density plot
9. hist_kwds, an optional dictionary of keyword arguments passed to the histogram
10. range_padding, (float, optional), how far each axis range is extended beyond the data, relative to (x_max - x_min) or (y_max - y_min); the larger the value, the more empty space between the points and the edges of each subplot
11. kwds, any remaining keyword arguments, which are passed on to the underlying scatter plot call
12. c, the color of the points. It is not a named parameter of scatter_matrix; it is passed through kwds to the scatter plot, for example as an array of class labels
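As a rough illustration of how these parameters fit together, here is a minimal sketch on synthetic data. The DataFrame `demo`, its column names, and the specific values chosen for each argument are assumptions made for the example, not values from the original article. Using diagonal='kde' instead of 'hist' should also work, but it additionally requires SciPy to be installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window

import numpy as np
import pandas as pd

# Hypothetical demo data: three columns of normally distributed values
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
demo["b"] = demo["b"] + 2 * demo["a"]  # make column b correlate with column a

axes = pd.plotting.scatter_matrix(
    demo,
    alpha=0.6,               # slightly transparent points
    figsize=(6, 6),          # (width, height) in inches
    diagonal="hist",         # histograms on the diagonal
    marker="o",
    hist_kwds={"bins": 15},  # forwarded to the histograms
    range_padding=0.1,       # extra space around the data on every axis
)

# The return value is a 2-D NumPy array of Matplotlib axes, one per subplot
print(axes.shape)  # (3, 3)
```

The off-diagonal subplot for a and b should show a clear linear trend, while the subplots involving c should look like shapeless clouds.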

The following example uses the iris dataset from the Python sklearn library:

import mglearn
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the data and split it into training and test sets
iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

# Label the columns with the feature names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)

# Color each point by its class label; c and cmap are passed on to the scatter plot
grr = pd.plotting.scatter_matrix(iris_dataframe, marker='o', c=y_train,
                                 hist_kwds={'bins': 20}, cmap=mglearn.cm3)

The diagonal of the matrix shows the histogram of each feature. The points are colored by the training-set labels, and the plot shows that the features clearly separate the three classes.
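For readers who do not have mglearn installed, the same kind of plot can probably be produced with a built-in Matplotlib colormap. The sketch below is a variation on the example, not the original author's code; the choice of 'viridis' and of figsize=(8, 8) are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)

# c and cmap are not parameters of scatter_matrix itself;
# they are forwarded to the scatter plot that draws each subplot
axes = pd.plotting.scatter_matrix(
    iris_dataframe,
    c=y_train,
    cmap="viridis",          # assumed replacement for mglearn.cm3
    marker="o",
    hist_kwds={"bins": 20},
    figsize=(8, 8),
)

print(axes.shape)  # one subplot per pair of the four features
```

To view the result, call matplotlib.pyplot.show() with an interactive backend, or save the figure with axes[0, 0].figure.savefig(...).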

A new point learned along the way
NumPy provides several commonly used functions for generating random numbers; randn() and rand() are two of them.
numpy.random.randn(d0, d1, …, dn) returns an array of the given shape filled with samples from the standard normal distribution.
numpy.random.rand(d0, d1, …, dn) returns an array of the given shape filled with samples drawn uniformly from [0, 1).
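The difference between the two functions can be checked with a short sketch. The seed, the array shapes, and the variable names are arbitrary choices for this example.

```python
import numpy as np

np.random.seed(0)  # fixed seed so that repeated runs give the same numbers

uniform = np.random.rand(2, 3)   # 2x3 array, uniform on [0, 1)
normal = np.random.randn(1000)   # 1000 samples from the standard normal

print(uniform.shape)                                 # (2, 3)
print(uniform.min() >= 0.0 and uniform.max() < 1.0)  # True
# Sample mean and standard deviation should be close to 0 and 1
print(normal.mean(), normal.std())
```

Both functions take the dimensions as separate arguments rather than as a single tuple, so rand(2, 3) is the correct form, not rand((2, 3)).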

Refer to https://blog.csdn.net/hurry0808/article/details/78573585?locationNum=7&fps=1
