This article explains the data thoroughly (5): Data visualization (part 2)

I. Introduction

In the previous articles, we learned that "data" is a huge system (as shown in the figure below). We used a vegetable market to explain what data sources are, buying vegetables to explain the steps of data collection, and washing and sorting vegetables to explain data cleaning.

Today I will explain how, once we have learned to cook, we can teach the recipe to others in a simple, easy-to-understand way; that is, the process of data visualization.

Insert picture description here

II. Principles for selecting visualization charts (with reference to the McKinsey series book "Speaking with Charts")

First, here is the practical chart selection guide (see the figure below). After that, we will go through how to use each chart type one by one.

Insert picture description here

Image source: WeChat official account [high-end business report]

1. Component comparison

(1) Pie chart

Under normal circumstances, we choose a pie chart to show how the components of a whole compare. Pie charts also come with the following unwritten rules:

a. Follow reading habits: arrange the components clockwise in order of importance

When reading a pie chart, people subconsciously start at 12 o'clock and read clockwise. To match this habit, arrange the slices clockwise from 12 o'clock in order of their share and importance.
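The clockwise rule can be sketched as a small data-preparation step. The function name `clockwise_slices` and the sample shares are made up for illustration, and the sketch assumes that "importance" simply means the larger share comes first.

```python
def clockwise_slices(shares):
    """Return (label, start_deg, end_deg) tuples, largest share first.

    Angles are measured clockwise from 12 o'clock (0 degrees = top).
    Assumption: importance is equated with the size of the share.
    """
    total = sum(shares.values())
    ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    slices = []
    start = 0.0
    for label, value in ordered:
        sweep = 360.0 * value / total  # angular width of this slice
        slices.append((label, start, start + sweep))
        start += sweep
    return slices


# Hypothetical market shares, deliberately given out of order.
market = {"B": 25, "A": 50, "C": 15, "D": 10}
for label, start, end in clockwise_slices(market):
    print(f"{label}: {start:.0f} to {end:.0f} degrees")
```

If the chart is drawn with matplotlib, passing the sorted values to `pie` with `startangle=90` and `counterclock=False` should give the same layout.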

Insert picture description here

b. Keep the number of slices to 7 or fewer. If a pie chart is split into very fine categories, the audience will struggle to compare them. Therefore, when the number of slices would exceed 6, merge the remaining, less important items into a single "Other" slice.
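A minimal sketch of the merging step, assuming the rule is read as "keep the 6 largest items and fold the rest into Other", which yields at most 7 slices. That reading, the function name `merge_small_slices`, and the sample data are assumptions made for illustration.

```python
def merge_small_slices(shares, max_named=6, other_label="Other"):
    """Keep the max_named largest items and merge the rest into one slice.

    Returns a list of (label, value) pairs sorted from largest to smallest,
    with the merged slice (if any) placed last.
    """
    ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    if len(ordered) <= max_named:
        return ordered  # nothing to merge
    head = ordered[:max_named]
    tail_total = sum(value for _, value in ordered[max_named:])
    return head + [(other_label, tail_total)]


# Hypothetical shares for eight categories.
raw = {"A": 30, "B": 20, "C": 15, "D": 10, "E": 8, "F": 7, "G": 6, "H": 4}
for label, value in merge_small_slices(raw):
    print(label, value)
```

Placing "Other" last keeps it at the end of the clockwise reading order, even when its merged total is larger than some named slices.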

Insert picture description here

c. Avoid three-dimensional pie charts unless you are confident you can prevent the visual distortion they cause.

As the figure below shows, when the audience views this three-dimensional pie chart at an angle, the two smallest sectors are hard to tell apart. Viewed from directly above, they are much easier to distinguish.

Insert picture description here

2. Item comparison

First, we need to be clear about what item comparison means and which dimensions are being compared. Generally speaking, the dimensions compared are magnitude and proportion. The most commonly used chart type for item comparison is the bar chart, so let's look at the principles for using it.

(1) Follow the default sort order (largest to smallest, highest to lowest, best to worst) unless another logical order is needed

When people look at a bar chart, they usually want to compare the leading items; few pay close attention to the shares of the items at the tail. To match this reading habit, draw the bars in the default order: largest to smallest, highest to lowest, best to worst.
Insert picture description here

Of course, it is also fine to follow another specific logical order. For example, when a company compares the market shares of competitors in its industry, it may deliberately sort the bars by how long each competitor has been in the market. In the figure below, the bars are ordered by brand logic.
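Both orderings can be sketched with one small helper. The function `sort_bars`, the brand names, and the numbers are hypothetical and only illustrate the idea.

```python
def sort_bars(items, key=None, descending=True):
    """Sort bar-chart items; by default, largest value first."""
    if key is None:
        key = lambda item: item["value"]
    return sorted(items, key=key, reverse=descending)


# Made-up market-share data with the year each brand entered the market.
brands = [
    {"name": "Brand X", "value": 18, "entry_year": 2015},
    {"name": "Brand Y", "value": 42, "entry_year": 2019},
    {"name": "Brand Z", "value": 27, "entry_year": 2008},
]

# Default order: largest share first.
by_share = sort_bars(brands)
print([b["name"] for b in by_share])

# Alternative logical order: longest time in the market first.
by_entry = sort_bars(brands, key=lambda b: b["entry_year"], descending=False)
print([b["name"] for b in by_entry])
```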

Insert picture description here

(2) Gap between bars < width of a bar

In McKinsey bar charts, the gap between categories is generally set to 20%-50% of the bar width. The Economist uses a gap of 50%-80%, and reports from some other large companies use 10%-30%. The figure below shows a scatter chart and a bar chart from an article in The Economist.

Insert picture description here

Although the spacing varies, the industry almost never uses a gap that is wider than the bar itself.
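The relationship between gap and bar width can be worked out with a little arithmetic. The sketch below assumes equal-width bars with no outer margins; `bar_layout` and its parameters are illustrative names, not part of any plotting library.

```python
def bar_layout(n_bars, axis_length, gap_ratio=0.5):
    """Split axis_length into n_bars equal bars separated by equal gaps.

    gap_ratio is gap / bar width. It must stay below 1 so that the gap
    is always narrower than a bar.
    Returns (bar_width, gap_width).
    """
    if n_bars < 1:
        raise ValueError("n_bars must be at least 1")
    if not 0 <= gap_ratio < 1:
        raise ValueError("gap_ratio must be in [0, 1): gap must be narrower than the bar")
    # n bars and (n - 1) gaps fill the axis:
    #   n * w + (n - 1) * gap_ratio * w = axis_length
    bar_width = axis_length / (n_bars + (n_bars - 1) * gap_ratio)
    return bar_width, bar_width * gap_ratio


# Five bars on a 100-unit axis with a gap of 50% of the bar width.
width, gap = bar_layout(5, 100, gap_ratio=0.5)
print(f"bar width = {width:.2f}, gap = {gap:.2f}")
```

A `gap_ratio` of 0.2 to 0.5 corresponds to the McKinsey range quoted above, and 0.5 to 0.8 to the range quoted for The Economist.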

3. Time series, frequency distribution

A time series, as the name implies, shows how categories and variables change along a timeline. Frequency distribution may be a little harder to grasp, but the following example should look familiar.

You probably met this function in a college "Probability Theory" course: the probability density function. It represents a frequency distribution.
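For discrete data, a frequency distribution is just a count of how many values fall into each interval. The sketch below is a hedged illustration; `frequency_distribution` and the exam scores are made up.

```python
def frequency_distribution(values, bin_width):
    """Count how many values fall into each bin of width bin_width.

    Each bin is identified by its lower edge, so with bin_width=10
    the key 60 covers values from 60 up to (but not including) 70.
    """
    counts = {}
    for value in values:
        bin_start = (value // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical exam scores.
scores = [52, 58, 61, 64, 67, 68, 71, 73, 75, 76, 78, 82, 85, 91]
for bin_start, count in frequency_distribution(scores, 10).items():
    print(f"{bin_start}-{bin_start + 9}: {'#' * count}")
```

As the bins get narrower and the sample gets larger, the normalized counts approach the smooth curve of a probability density function.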

Insert picture description here

When plotting time series and frequency distributions, we usually choose between a column chart and a line chart. How should we choose between the two?

McKinsey's "Speaking with Charts" recommends 8 time points as the dividing line. With more than 8 time points, use a line chart to reduce the visual fatigue of comparing many columns; with fewer than 8, use a column chart.

The choice also depends on the characteristics of the data. When you want to highlight how indicators such as output or sales change over a specific period, a column chart is more appropriate because it also emphasizes magnitude and quantity. A line chart pays more attention to change and trend, and is more common for performance trends and time series forecasting. The usage rules for the two chart types follow.
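The two criteria above can be combined into a small decision helper. This is only a sketch: the text does not say what to do with exactly 8 points (the code assigns that case to the column chart), and letting the stated emphasis override the point count is an assumption. The name `choose_time_chart` and the `emphasis` values are hypothetical.

```python
def choose_time_chart(n_points, emphasis="auto", threshold=8):
    """Suggest 'column' or 'line' for a time series.

    emphasis: 'quantity' forces a column chart, 'trend' forces a line
    chart, and 'auto' decides from the number of time points.
    Assumption: exactly `threshold` points falls back to a column chart.
    """
    if emphasis == "quantity":
        return "column"
    if emphasis == "trend":
        return "line"
    if emphasis != "auto":
        raise ValueError("emphasis must be 'auto', 'quantity', or 'trend'")
    return "line" if n_points > threshold else "column"


print(choose_time_chart(12))              # many time points
print(choose_time_chart(5))               # few time points
print(choose_time_chart(12, "quantity"))  # emphasis overrides the count
```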

(1) Column chart

a. Gap between columns < width of a column

This is essentially the same rule as for bar charts, so I won't repeat it here.

b. Use stacked column charts with care to avoid confusing comparisons

A stacked column chart is suited to comparing the composition of multiple items and is an important way to show component relationships.

However, take special care: in a percentage stacked chart, try not to split each item into more than 3 components, or the comparison becomes confusing. In the figure below, the stacked column chart has more than 3 components, and the comparison is very hard to read.
Insert picture description here
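The percentage conversion behind a percentage stacked column, together with a check for the 3-component guideline, can be sketched as follows. `percent_stack` and the channel names are illustrative assumptions.

```python
def percent_stack(components, max_components=3):
    """Convert one column's component values into percentage shares.

    Returns (shares, too_many), where too_many is True when the column
    has more components than the guideline allows.
    """
    total = sum(components.values())
    if total <= 0:
        raise ValueError("component values must sum to a positive number")
    shares = {name: 100.0 * value / total for name, value in components.items()}
    return shares, len(components) > max_components


# Hypothetical sales by channel for one year.
shares, too_many = percent_stack({"Online": 50, "Retail": 30, "Wholesale": 20})
print(shares, "too many components:", too_many)
```

When the flag is set, one option is to merge the smallest components before plotting, in the same way small pie slices are merged.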

(2) Line chart

a. The trend line should stand out (color, thickness, etc.)

Of the charts above, the line chart most directly shows whether data is rising, falling, fluctuating, or holding steady. It emphasizes the shape of change and trend, and is used to show how data develops.

As the main element of a line chart, the trend line must stand out. It can stand out through color, for example red or another accent color, or through thickness: the trend line should be thicker than the grid lines and tick marks.

Insert picture description here

4. Correlation

In correlation analysis, we often use scatter plots to show the strength and direction of the correlation between variables. In exploratory data analysis, we also fit curves and regression equations to the points and make tentative predictions of future trends.

(1) Scatter chart and bubble chart

a. Point and bubble sizes should be intuitive

Humans are visual creatures, so a chart should make the best use of human vision. In scatter charts and bubble charts, each point or bubble represents one sample, and its size can express only the magnitude of that sample. We should take full advantage of this when making charts and use point and bubble sizes to show the relationships between samples intuitively.
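The strength and direction of a linear correlation, and the fitted line, can be computed with standard formulas. The sketch below uses the Pearson correlation coefficient and ordinary least squares. That is one common choice, not necessarily the method the book has in mind, and the data are made up.

```python
from math import sqrt


def pearson_and_fit(xs, ys):
    """Return (r, slope, intercept) for paired samples xs and ys.

    r is the Pearson correlation coefficient; slope and intercept
    describe the least-squares line y = slope * x + intercept.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical advertising spend (x) and sales (y).
spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]
r, slope, intercept = pearson_and_fit(spend, sales)
print(f"r = {r:.3f}, fit: y = {slope:.2f} * x + {intercept:.2f}")
```

The sign of `r` gives the direction of the correlation and its absolute value the strength. The fitted line supports only tentative extrapolation beyond the observed range.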
Insert picture description here
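One way to keep bubble sizes intuitive is to scale the bubble's area, rather than its radius, in proportion to the value. This is a common practice but an assumption here, since the text does not say which to use. `bubble_radii` and its parameters are illustrative names.

```python
from math import sqrt


def bubble_radii(values, max_radius=30.0):
    """Return one radius per value so that bubble AREA is proportional to the value.

    The largest value gets max_radius; the others scale with the square
    root of their ratio to the largest value.
    """
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    largest = max(values)
    if largest == 0:
        raise ValueError("at least one value must be positive")
    return [max_radius * sqrt(v / largest) for v in values]


# A sample a quarter the size of the largest gets half the radius.
print(bubble_radii([100, 25, 0]))
```

Scaling the radius directly would make a sample twice as large look four times as big, which exaggerates the differences between samples.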

III. Conclusion

That is all for this issue on the principles of chart selection. Our [This article explains the data thoroughly] series ends here. Hooray!

What would you like to read about next?

Origin blog.csdn.net/amumuum/article/details/113242817