A very informative chart: the parallel coordinate plot

Author: Wu Zhengxiang 

Source: AI introductory learning

1. Chart overview

Parallel coordinates are a common method for visualizing high-dimensional multivariate data. To represent a set of points in an N-dimensional space, N parallel axes are drawn, and each point is represented as a polyline whose vertices lie on the N axes. The position of the vertex on the K-th axis represents the point's value in the K-th dimension.

Parallel coordinates are an important information-visualization technique. The traditional Cartesian coordinate system quickly runs out of room and has difficulty showing data with more than three dimensions. Parallel coordinates overcome this by representing each variable of the high-dimensional data as one of a series of parallel axes, with each value mapped to a position on its axis. To show trends and relationships between variables, the points belonging to one record are connected into a polyline. In essence, a parallel coordinate plot maps a point Xi = (xi1, xi2, ..., xim) in m-dimensional Euclidean space to a polyline in the two-dimensional plane.
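
A minimal sketch of this mapping in R, assuming each axis is min-max scaled to [0, 1] using its own range (one common choice, not necessarily what any particular tool does). The helper name `pc_vertices` is hypothetical. Axis k sits at horizontal position k, and row i of the data becomes the vertices (k, scaled xik).

```r
# Hypothetical helper: map each row of a numeric matrix to the vertices of its
# parallel-coordinate polyline. Axis k is placed at x = k; the vertex height is
# the row's value on dimension k, min-max scaled with that axis's own range.
pc_vertices <- function(X) {
  X <- as.matrix(X)
  col_min <- apply(X, 2, min)
  col_max <- apply(X, 2, max)
  span <- col_max - col_min
  span[span == 0] <- 1  # avoid dividing by zero on constant columns
  # subtract each column's minimum, then divide by that column's span
  scaled <- sweep(sweep(X, 2, col_min, "-"), 2, span, "/")
  # one data frame of (axis, height) vertices per row of X
  lapply(seq_len(nrow(X)), function(i)
    data.frame(axis = seq_len(ncol(X)), height = scaled[i, ]))
}

X <- rbind(c(1, 10, 100),
           c(3, 20, 300),
           c(2, 30, 200))
v <- pc_vertices(X)
v[[2]]  # vertices of the second record: heights 1.0, 0.5, 1.0
```

Each axis is scaled independently, which is why values on different axes are not directly comparable. Drawing the columns of `t(scaled)` with base R's `matplot(..., type = "l")` would give a bare-bones parallel coordinate plot of the same polylines.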

Parallel coordinate plots can represent very high-dimensional data. A significant advantage of parallel coordinates is their solid mathematical foundation: their projective-geometry interpretation and point-line duality make them well suited to visual data analysis. Let's look at some concrete applications.
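
The duality mentioned here can be checked numerically. The sketch below relies on a standard result of parallel-coordinate geometry that the article itself does not derive: with axis X1 at horizontal position 0 and axis X2 at position d, every point (a, m*a + b) on the line y = m*x + b becomes a segment, and for m != 1 all of these segments pass through a single point (d / (1 - m), b / (1 - m)). The function names are hypothetical.

```r
# Dual point of the Cartesian line y = m*x + b in a two-axis parallel plot,
# with axis X1 at horizontal position 0 and axis X2 at position d.
dual_point <- function(m, b, d = 1) {
  stopifnot(m != 1)  # m = 1 gives parallel segments (a point at infinity)
  c(x = d / (1 - m), y = b / (1 - m))
}

# Height at horizontal position t of the segment that represents the point
# (a, m*a + b): linear interpolation from (0, a) to (d, m*a + b).
segment_height <- function(a, m, b, t, d = 1) {
  a + (m * a + b - a) * t / d
}

p <- dual_point(m = -2, b = 3)  # x = 1/3, y = 1
# Segments for several points on the line all reach the same height at p["x"]
heights <- sapply(c(-5, 0, 1, 7.5), segment_height, m = -2, b = 3, t = p[["x"]])
heights  # all equal to 1
```

So a whole line in the Cartesian plane collapses to one point in parallel coordinates, which is what makes linear relationships between adjacent axes visible as a crossing pattern.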

2. Case study

Millward Brown publishes a ranking of the world's most valuable brands every year. Valerio Pellegrini visualized how the rankings of the top 100 brands changed from 2010 to 2015 using a parallel coordinate plot, shown in the figure below. The figure shows that the rankings of Google, IBM, Apple, and Microsoft are relatively stable, while brands in the middle and lower ranks fluctuate significantly from year to year, and new brands enter every year. The chart achieves a very clear multi-sample, multi-dimensional comparative analysis.

100 MOST VALUABLE BRANDS 2010-15

The following parallel coordinate plot visualizes the rankings of global immigration destinations and origins from 1990 to 2013.

"Global Immigration Roadmap: The United States is the Preferred Destination for Immigrants" NetEase Digital Reading

The figure below shows how the per capita GDP rankings of the mainland provinces changed from 1978 to 2017. It contains a great deal of information:

1) For 40 years, Beijing, Shanghai, and Tianjin have held the top three spots, although they have swapped positions.

2) Tianjin once held first place.

3) Heilongjiang and Gansu started high and fell low, like waterfalls.

4) Fujian started low and rose rapidly. Fujianese are said to be good at business, and this data seems to bear that out.

5) Guizhou sank to the very bottom but has climbed back up in recent years, probably thanks to the development of the big data industry in Guiyang.

6) Hainan soared and then fell back, almost to where it started.

The chart holds even more information, for example whether particular provinces received major support during the tenure of each top leader...

The figure below shows how the overall GDP rankings of the mainland provinces changed from 1978 to 2017. It also contains a lot of information for you to analyze.

(GDP rankings of all provinces and regions of China, 1978-2017, excluding Hong Kong, Macau, and Taiwan. Data sources: National Bureau of Statistics and provincial statistical yearbooks. Chart: @张靖/星星研究)

In a parallel coordinate plot, each variable has its own axis, and all axes are placed parallel to each other. Each axis can have a different scale and unit of measurement. Each record is drawn as a series of connected line segments crossing all the axes, and its position on each axis represents its value for that variable.

In addition, although the axes can be arranged in any order, adjacent variables are easier to compare than non-adjacent ones, so the order of the axes can affect how readers understand the data.
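
Because only adjacent axes are easy to compare, one simple heuristic (an assumption for illustration, not a method from the original article) is to place strongly correlated variables next to each other. The sketch below, with the hypothetical helper `greedy_axis_order`, starts from one variable and repeatedly appends the remaining variable with the largest absolute correlation to the last axis placed.

```r
# Hypothetical greedy axis ordering: each next axis is the remaining variable
# most strongly correlated (in absolute value) with the previous axis.
greedy_axis_order <- function(df, start = names(df)[1]) {
  cm <- abs(cor(df))
  axes <- start
  remaining <- setdiff(names(df), start)
  while (length(remaining) > 0) {
    last <- axes[length(axes)]
    nxt <- remaining[which.max(cm[last, remaining])]
    axes <- c(axes, nxt)
    remaining <- setdiff(remaining, nxt)
  }
  axes
}

ord <- greedy_axis_order(iris[1:4])
ord  # "Sepal.Length" "Petal.Length" "Petal.Width" "Sepal.Width"
```

The resulting order can be passed straight to lattice, for example `parallelplot(~ iris[ord], iris, groups = Species, horizontal.axis = FALSE)`. A greedy pass is cheap but not guaranteed to find the best overall ordering.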

Because the axes of a parallel coordinate plot usually have different units, values generally cannot be compared across axes. In the ranking charts above, however, every axis shows the same variable (a rank in a given year), so cross-axis comparison is valid. When reading a parallel coordinate plot, always check the unit of each axis.
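
Ranking charts like the GDP examples can be built by converting each year's raw values into ranks, which puts every axis on the same 1-to-n scale. A minimal sketch, using made-up values for three hypothetical regions (illustrative numbers, not real GDP data) and a hypothetical helper `to_ranks`:

```r
# Made-up values for three hypothetical regions in three years
gdp <- data.frame(
  y1978 = c(50, 80, 20),
  y1998 = c(90, 70, 40),
  y2017 = c(60, 75, 95),
  row.names = c("A", "B", "C")
)

# Rank within each year (1 = largest value). Every column now shares the same
# unit, a rank from 1 to nrow(df), so positions are comparable across axes.
to_ranks <- function(df) {
  as.data.frame(lapply(df, function(col) rank(-col, ties.method = "min")),
                row.names = rownames(df))
}

ranks <- to_ranks(gdp)
ranks  # A: 2, 1, 3   B: 1, 2, 2   C: 3, 3, 1
```

With `ties.method = "min"`, tied values share the better rank. The rank table can then be drawn with the same `parallelplot()` call used for iris below.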

3. Drawing guide

1. Drawing in R

To be honest, the plots drawn with this R package are ugly. Can anyone recommend a better package? The cases above all show signs of post-editing in image software; I have not found a tool that produces better results directly.

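# Install (once) and load the package
# install.packages('lattice')
library(lattice)
data(iris)

# Parallel coordinate plot of the four numeric iris columns, grouped by species
parallelplot(
  ~ iris[1:4],
  data = iris,
  groups = Species,
  horizontal.axis = FALSE,           # FALSE lays the parallel axes out vertically
  scales = list(x = list(rot = 90))  # rotate the x-axis labels by 90 degrees
)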

2. Drawing online with ECharts

URL: http://echarts.baidu.com/examples/

Modify the code of an example to produce the chart you want.

 


Origin blog.csdn.net/yoggieCDA/article/details/109113804