Multidimensional data visualization
Data with more than three dimensions is hard to visualize with conventional plots. This article introduces several ways to display multidimensional data on a two-dimensional plane using Python.
1. Data
We use the classic iris flower dataset as an example (original data download: CSDN or GitHub).
Five rows of the formatted data are shown below. The data was reformatted to make the later plots easier to produce (formatted dataset download: GitHub).
Sepal Length | Sepal Width | Petal Length | Petal Width | Species |
---|---|---|---|---|
6.4 | 2.8 | 5.6 | 2.2 | virginica |
5 | 2.3 | 3.3 | 1 | versicolor |
4.9 | 2.5 | 4.5 | 1.7 | virginica |
4.9 | 3.1 | 1.5 | 0.1 | setosa |
5.7 | 3.8 | 1.7 | 0.3 | setosa |
The first four columns are the four features of each iris flower, and the last column is its species, which takes one of three values.
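For readers who do not have the CSV file at hand, the five rows above can be typed into a DataFrame directly. This is only a minimal sketch: the variable name `sample` is an assumption, the column names are copied from the table header, and the species of the last two rows is written as setosa, the standard name of that iris species. Five rows are far too few for meaningful plots, but they are enough to check the expected layout.

```python
import pandas as pd

# The five sample rows from the table, in the same column order.
sample = pd.DataFrame(
    [
        [6.4, 2.8, 5.6, 2.2, "virginica"],
        [5.0, 2.3, 3.3, 1.0, "versicolor"],
        [4.9, 2.5, 4.5, 1.7, "virginica"],
        [4.9, 3.1, 1.5, 0.1, "setosa"],
        [5.7, 3.8, 1.7, 0.3, "setosa"],
    ],
    columns=["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Species"],
)

# Four numeric feature columns and one text column holding the class label.
print(sample.dtypes)
print(sample["Species"].value_counts())
```

A frame with this layout can be passed to the plotting functions in the following sections in place of the result of `pd.read_csv`.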
2. Data visualization
2.1 Parallel coordinates
Each vertical axis in the figure represents one feature. Each row of the table is drawn as a polyline that crosses every axis at that row's value for the feature, and the line color indicates the species.
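import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
# Load the formatted iris data; adjust the path to where iris.csv is saved.
data = pd.read_csv('D:\\iris.csv')
plt.figure('multidimensional-parallel_coordinates')
plt.title('parallel_coordinates')
# One polyline per row, colored by the 'Species' column (one color per species).
parallel_coordinates(data, 'Species', color=['blue', 'green', 'red'])
plt.show()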
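To make the row-to-polyline mapping concrete, the sketch below computes the vertices of the line for two rows of the sample table. It is an illustration only, not how pandas implements the plot; the names `features`, `rows`, and `polyline` are hypothetical.

```python
# Feature names in table order; each one becomes a vertical axis at x = 0, 1, 2, 3.
features = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]

# Two rows taken from the sample table.
rows = [
    {"Sepal Length": 6.4, "Sepal Width": 2.8, "Petal Length": 5.6,
     "Petal Width": 2.2, "Species": "virginica"},
    {"Sepal Length": 5.0, "Sepal Width": 2.3, "Petal Length": 3.3,
     "Petal Width": 1.0, "Species": "versicolor"},
]


def polyline(row):
    """Return the (axis index, feature value) vertices of the line for one row."""
    return [(i, row[name]) for i, name in enumerate(features)]


for row in rows:
    print(row["Species"], polyline(row))
```

Because all four axes share one vertical scale in this kind of plot, features with very different ranges are often rescaled first so that no single feature dominates the picture.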
2.2 RadViz
The four features correspond to four anchor points on the unit circle, and each scattered point inside the circle represents one row of the table. Imagine four springs connecting each scattered point to the four anchor points, where the normalized feature value determines how strongly each spring pulls. Each point is drawn at the position where these four forces balance.
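import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import radviz
# Load the formatted iris data; adjust the path to where iris.csv is saved.
data = pd.read_csv('D:\\iris.csv')
plt.figure('multidimensional-radviz')
plt.title('radviz')
# One scattered point per row, colored by the 'Species' column.
radviz(data, 'Species', color=['blue', 'green', 'red'])
plt.show()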
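The force-balance description can be sketched in a few lines. If anchor i pulls the point with strength w_i, the forces cancel at the weighted average of the anchor positions. The sketch below assumes each feature has already been scaled to the range [0, 1] and that the anchors are evenly spaced on the unit circle; the function name `radviz_point` is hypothetical, and the code is an illustration rather than the pandas implementation.

```python
import math


def radviz_point(weights):
    """Return the balance position for one row of normalized feature values."""
    m = len(weights)
    # Anchor i sits at angle 2*pi*i/m on the unit circle.
    anchors = [
        (math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m))
        for i in range(m)
    ]
    total = sum(weights)  # must be non-zero, otherwise no spring pulls at all
    # Forces sum(w_i * (anchor_i - p)) vanish at the weighted mean of the anchors.
    x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
    y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
    return x, y


# Equal pulls from all four anchors: the point lies (numerically) at the center.
print(radviz_point([1.0, 1.0, 1.0, 1.0]))
# Only the first feature pulls: the point lies on the first anchor.
print(radviz_point([1.0, 0.0, 0.0, 0.0]))
```

A row with one dominant feature is therefore drawn close to that feature's anchor, while rows with balanced feature values gather near the center.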
2.3 Andrews curves
The feature values of each row are used as the coefficients of a Fourier series, so each row becomes one curve. Curves of different colors represent different species.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import andrews_curves
# Load the formatted iris data; adjust the path to where iris.csv is saved.
data = pd.read_csv('D:\\iris.csv')
plt.figure('multidimensional-andrews_curves')
plt.title('andrews_curves')
# One curve per row, colored by the 'Species' column.
andrews_curves(data, 'Species', color=['blue', 'green', 'red'])
plt.show()
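As a rough sketch of the conversion, the usual definition of an Andrews curve for a row (x1, x2, x3, x4, ...) is f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ..., evaluated for t between -pi and pi. The function name `andrews_curve` below is hypothetical, and the code illustrates the formula rather than reproducing the library's implementation.

```python
import math


def andrews_curve(x, t):
    """Evaluate the Andrews curve of feature vector x at angle t."""
    value = x[0] / math.sqrt(2)
    for k, coeff in enumerate(x[1:], start=1):
        harmonic = (k + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if k % 2 == 1:
            value += coeff * math.sin(harmonic * t)  # x2, x4, ... pair with sine
        else:
            value += coeff * math.cos(harmonic * t)  # x3, x5, ... pair with cosine
    return value


# First row of the sample table: sepal length, sepal width, petal length, petal width.
row = [6.4, 2.8, 5.6, 2.2]

# Sample the curve at nine evenly spaced angles from -pi to pi.
ts = [-math.pi + i * (2 * math.pi / 8) for i in range(9)]
for t in ts:
    print(round(t, 3), round(andrews_curve(row, t), 3))
```

Rows with similar feature values produce similar coefficients and therefore similar curves, which is why rows of the same species tend to form a visible band.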
2.4 Matrix diagram
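Shows the pairwise relationships between features. Each off-diagonal panel is a scatter plot of one feature against another, and each diagonal panel shows the distribution of a single feature. Points are colored by species.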
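import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the formatted iris data; adjust the path to where iris.csv is saved.
data = pd.read_csv('D:\\iris.csv')
# Grid of pairwise plots for all numeric columns, colored by 'Species'.
sns.pairplot(data, hue='Species')
plt.show()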
2.5 Correlation coefficient heat map
Shows the Pearson correlation coefficient between each pair of features. The coefficient ranges from -1 to 1; the closer its absolute value is to 1, the stronger the linear correlation, and the sign gives the direction.
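import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the formatted iris data; adjust the path to where iris.csv is saved.
data = pd.read_csv('D:\\iris.csv')
# 'Species' is text, so drop it before computing the Pearson correlations;
# recent pandas versions raise an error on non-numeric columns in corr().
corr = data.drop(columns='Species').corr()
# annot=True writes the coefficient into each cell of the heat map.
sns.heatmap(corr, annot=True)
plt.show()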
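For reference, each cell of the heat map holds a value that can be computed by hand as the covariance of two features divided by the product of their standard deviations. The sketch below applies this to the petal columns of the five sample rows; the function name `pearson` is hypothetical, and five rows are only enough to illustrate the arithmetic.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so plain sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Petal length and petal width from the five sample rows.
petal_length = [5.6, 3.3, 4.5, 1.5, 1.7]
petal_width = [2.2, 1.0, 1.7, 0.1, 0.3]

print(pearson(petal_length, petal_width))
```

A feature compared with itself always gives 1, which is why the diagonal of the heat map is uniformly at the top of the color scale.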