Python multidimensional data visualization

Data with more than three dimensions is hard to visualize with conventional plots. This article introduces several ways to display multi-dimensional data on a two-dimensional plane using Python.

1. Data

Take the classic iris dataset as an example (original data download: CSDN or GitHub).
Five rows of the formatted data are shown below. The data was reformatted to make the later plots easier to produce (formatted dataset download: GitHub).

Sepal Length  Sepal Width  Petal Length  Petal Width  Species
6.4           2.8          5.6           2.2          virginica
5             2.3          3.3           1            versicolor
4.9           2.5          4.5           1.7          virginica
4.9           3.1          1.5           0.1          setosa
5.7           3.8          1.7           0.3          setosa

The first four columns are the four features of each iris sample, and the last column is its species, which is one of three classes (setosa, versicolor, virginica).
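To make the expected layout concrete, here is a minimal sketch that parses the five sample rows with Python's standard csv module. It assumes the formatted file is comma-separated and uses the header names shown above, with `Species` as the class column. The `SAMPLE` string is only an illustration, not the actual file.

```python
import csv
import io
from collections import Counter

# Illustrative copy of the five sample rows (assumed comma-separated layout).
SAMPLE = """Sepal Length,Sepal Width,Petal Length,Petal Width,Species
6.4,2.8,5.6,2.2,virginica
5,2.3,3.3,1,versicolor
4.9,2.5,4.5,1.7,virginica
4.9,3.1,1.5,0.1,setosa
5.7,3.8,1.7,0.3,setosa
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Every column except the class column is a numeric feature.
feature_names = [name for name in rows[0] if name != "Species"]
features = [[float(row[name]) for name in feature_names] for row in rows]

# Count how many sample rows belong to each species.
species_counts = Counter(row["Species"] for row in rows)

print(feature_names)
print(species_counts)
```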

2. Data visualization

2.1 Parallel coordinates

Each vertical axis in the figure represents one feature. Each row of the table is drawn as a polyline across the axes, and the line color indicates the category.

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Load the formatted iris data (adjust the path to your own copy)
data = pd.read_csv('D:\\iris.csv')

plt.figure('multi-dimensional - parallel_coordinates')
plt.title('parallel_coordinates')
# 'Species' is the class column; one color per species
parallel_coordinates(data, 'Species', color=['blue', 'green', 'red'])
plt.show()

[Figure: Parallel coordinates]
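Note that parallel_coordinates draws every feature against one shared y-axis, so a feature with a small numeric range can look almost flat next to one with a large range. One common remedy, which is not part of the original example, is to rescale each feature to [0, 1] before plotting. A minimal sketch in plain Python, where `min_max_scale` is a hypothetical helper and the petal-width values come from the sample rows in section 1:

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no spread, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


petal_width = [2.2, 1.0, 1.7, 0.1, 0.3]
scaled = min_max_scale(petal_width)
print(scaled)
```

Applying the same rescaling to every feature column puts all axes on a comparable scale, at the cost of hiding the original units.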

2.2 RadViz (radial coordinate visualization)

The four features correspond to four anchor points evenly spaced on the unit circle, and each scatter point inside the circle represents one row of the table. Imagine each scatter point attached to the four anchors by four springs, where the normalized feature value sets how strongly each spring pulls. The point is drawn at the position where these forces balance.

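import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import radviz

# Load the formatted iris data (adjust the path to your own copy)
data = pd.read_csv('D:\\iris.csv')

plt.figure('multi-dimensional - radviz')
plt.title('radviz')
# 'Species' is the class column; one color per species
radviz(data, 'Species', color=['blue', 'green', 'red'])
plt.show()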

[Figure: RadViz]
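The force-balance description can be made concrete. If anchor i sits at position s_i and pulls with strength w_i (the normalized feature value), the forces cancel at p = sum(w_i * s_i) / sum(w_i), a weighted average of the anchors. The following is a minimal sketch of that calculation in plain Python. It assumes the feature values have already been scaled to [0, 1], and `radviz_point` is a hypothetical helper name, not a pandas function.

```python
import math


def radviz_point(normalized_row):
    """Return the (x, y) balance point for one row of normalized feature values."""
    m = len(normalized_row)
    # Anchors are spaced evenly around the unit circle, one per feature.
    anchors = [
        (math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m))
        for i in range(m)
    ]
    total = sum(normalized_row)
    if total == 0:
        # Assumption: with no pull from any anchor, place the point at the center.
        return (0.0, 0.0)
    x = sum(w * ax for w, (ax, _) in zip(normalized_row, anchors)) / total
    y = sum(w * ay for w, (_, ay) in zip(normalized_row, anchors)) / total
    return (x, y)


# A row dominated by the first feature lands on the first anchor;
# a row with equal values everywhere lands at the center.
print(radviz_point([1.0, 0.0, 0.0, 0.0]))
print(radviz_point([1.0, 1.0, 1.0, 1.0]))
```

This also shows a limitation of RadViz: rows whose features are all proportionally equal collapse toward the center regardless of their magnitude.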

2.3 Andrews curve

Each row's feature values are used as the coefficients of a Fourier series, so every row becomes one curve. Curves of different colors represent different categories.

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import andrews_curves

# Load the formatted iris data (adjust the path to your own copy)
data = pd.read_csv('D:\\iris.csv')

plt.figure('multi-dimensional - andrews_curves')
plt.title('andrews_curves')
# 'Species' is the class column; one color per species
andrews_curves(data, 'Species', color=['blue', 'green', 'red'])
plt.show()

[Figure: Andrews curve]
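For a row (x1, x2, x3, x4), the Andrews curve is usually defined as f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) for t between -pi and pi, with further sine and cosine terms added for more features. A minimal sketch that evaluates this function in plain Python follows. `andrews_curve` is a hypothetical helper written for illustration, and the row is the first sample row from section 1.

```python
import math


def andrews_curve(row, t):
    """Evaluate the Andrews curve of one data row at angle t (in radians)."""
    # The first feature is the constant term of the Fourier series.
    value = row[0] / math.sqrt(2)
    # Remaining features alternate between sine and cosine terms:
    # sin(t), cos(t), sin(2t), cos(2t), ...
    for k, coeff in enumerate(row[1:], start=1):
        harmonic = (k + 1) // 2
        if k % 2 == 1:
            value += coeff * math.sin(harmonic * t)
        else:
            value += coeff * math.cos(harmonic * t)
    return value


row = [6.4, 2.8, 5.6, 2.2]
# Sample the curve at a few angles between -pi and pi.
samples = [andrews_curve(row, t) for t in (-math.pi, 0.0, math.pi)]
print(samples)
```

Rows with similar feature values produce similar curves, which is why curves of the same species tend to bundle together in the plot.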

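2.4 Scatter plot matrix

A scatter plot matrix shows the pairwise relationships between features: each off-diagonal panel plots one feature against another, and the diagonal shows each feature's own distribution.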

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv('D:\\iris.csv')

sns.pairplot(data, hue='Species')
plt.show()

[Figure: Scatter plot matrix]

2.5 Correlation coefficient heat map

Shows the Pearson correlation coefficient between each pair of features. Values range from -1 to 1: the closer the absolute value is to 1, the stronger the linear correlation, and the sign gives its direction.

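import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the formatted iris data (adjust the path to your own copy)
data = pd.read_csv('D:\\iris.csv')

# Drop the non-numeric Species column before computing correlations
corr = data.drop(columns='Species').corr()
# annot=True writes each coefficient into its cell
sns.heatmap(corr, annot=True)
plt.show()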

[Figure: Correlation coefficient heat map]
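Each cell of the heat map is one Pearson coefficient: the covariance of two features divided by the product of their standard deviations. The sketch below computes it by hand in plain Python. `pearson` is a hypothetical helper, it assumes neither input is constant, and the two lists are the petal length and petal width of the five sample rows from section 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of the deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumption: neither list is constant, so the denominator is non-zero.
    return cov / math.sqrt(var_x * var_y)


petal_length = [5.6, 3.3, 4.5, 1.5, 1.7]
petal_width = [2.2, 1.0, 1.7, 0.1, 0.3]
r = pearson(petal_length, petal_width)
print(r)
```

Five rows are far too few for a meaningful estimate. The sketch only shows what the numbers in the heat map represent.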

Origin blog.csdn.net/michael_f2008/article/details/107494667