How to Use Statistical Charts Reasonably

Charts help us extract information from large amounts of data and present it in an intuitive, engaging form that reveals the general characteristics of the data distribution. They are among the simplest and most commonly used statistical analysis tools, and their biggest advantage is that they are intuitive and visual. To help readers understand the data and avoid creating false impressions, pay attention to the following points when drawing charts:

  1. Understand the purpose of each chart type and choose the chart accordingly;
  2. Display the characteristics of the data truthfully and arrange the chart structure sensibly;
  3. Follow chart-making conventions.

Next, we introduce how to use bar charts, pie charts, donut charts, histograms, box plots, scatter diagrams, and radar charts.

Bar chart

A bar chart uses bars of equal width to represent how often each category occurs in the data. The categorical variable can be placed on either the horizontal or the vertical axis, and the other axis shows the frequency (or percentage). The bar chart retains the specific numbers of the frequency distribution table while showing the differences in frequency between categories more intuitively. Depending on the analysis needs, two or more categorical variables can also be drawn in the same bar chart for cross-comparison; this is called a compound bar chart. A bar chart is convenient for observing the absolute frequency of each category in a set of data, but to reveal the structure of the data, that is, each category's frequency as a percentage of the total frequency, a pie chart is more suitable.
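As a rough illustration of the frequency counting behind a bar chart, the sketch below tallies categories and prints one text bar per category. The survey answers and the helper names `frequency_table` and `text_bar_chart` are made up for this example; a plotting library would normally draw the bars.

```python
from collections import Counter

# Hypothetical survey answers (illustrative data, not from the article).
answers = ["tea", "coffee", "tea", "juice", "tea", "coffee"]

def frequency_table(values):
    """Count how often each category occurs, most frequent first."""
    return Counter(values).most_common()

def text_bar_chart(table):
    """Render one bar per category; the bar length equals the frequency."""
    label_width = max(len(label) for label, _ in table)
    lines = []
    for label, count in table:
        lines.append(f"{label.ljust(label_width)} | {'#' * count} {count}")
    return "\n".join(lines)

table = frequency_table(answers)
print(text_bar_chart(table))
# tea    | ### 3
# coffee | ## 2
# juice  | # 1
```

Every bar uses the same character width per unit, so only the bar length carries information, which mirrors the equal-width rule described above.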

Pie chart

A pie chart divides a circle into sectors and uses the area (equivalently, the central angle) of each sector to represent each category's frequency as a percentage of the total frequency. A pie chart makes it easier to examine the composition and relative frequency of the categories in a set of data (the absolute frequency of each category can of course also be labeled in the figure). Even if the amount of data changes, as long as the internal structure does not change, the proportion of each sector in the pie chart stays the same. Example: analyzing the preference structure of milk tea brands among consumers of different genders.
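The claim that sector proportions depend only on the internal structure can be checked with a small sketch. The counts and the function name `pie_sectors` are assumptions for illustration: each category is mapped to its percentage and central angle, and doubling every count leaves the result unchanged.

```python
def pie_sectors(frequencies):
    """Map each category to (percentage, central angle in degrees)."""
    total = sum(frequencies.values())
    return {
        label: (100 * count / total, 360 * count / total)
        for label, count in frequencies.items()
    }

# Hypothetical category counts (illustrative only).
counts = {"A": 50, "B": 30, "C": 20}
# Same structure, twice as much data.
doubled = {label: 2 * count for label, count in counts.items()}

sectors = pie_sectors(counts)
print(sectors["A"])                     # (50.0, 180.0)
print(sectors == pie_sectors(doubled))  # True
```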

Donut chart

A donut chart (ring chart) is formed by stacking two or more pie charts and then "digging out" the middle. In a donut chart, each ring represents a different sample, and the components of one sample are represented by different segments of its ring. When the researcher needs to compare the structure of several samples at the same time, a donut chart displays the results more intuitively and concisely.
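To make the "one ring per sample" idea concrete, the following sketch computes the angular segments of each ring. The brand names, the counts, and the helper `ring_segments` are hypothetical; the point is only that each sample is normalized by its own total, so rings of different sample sizes remain comparable.

```python
def ring_segments(frequencies):
    """Return (label, start_angle, end_angle) in degrees for one ring."""
    total = sum(frequencies.values())
    segments, start = [], 0.0
    for label, count in frequencies.items():
        end = start + 360 * count / total
        segments.append((label, start, end))
        start = end
    return segments

# Hypothetical brand preferences for two samples (illustrative only).
samples = {
    "female": {"Brand X": 30, "Brand Y": 20, "Brand Z": 10},
    "male": {"Brand X": 10, "Brand Y": 10, "Brand Z": 20},
}

# One ring per sample, each ring covering the full 360 degrees.
rings = {name: ring_segments(freq) for name, freq in samples.items()}
for name, segments in rings.items():
    print(name, segments)
```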

Histogram

A histogram uses the width and height (i.e., the area) of rectangles to represent the frequency distribution of numerical data. The abscissa is the value of the numerical variable, the width of each rectangle corresponds to the class interval of each group after the data is grouped, and the ordinate can be frequency or percentage. Histograms and bar charts look similar and are easily confused, but they differ in nature and function. As mentioned earlier, a bar chart mainly describes the frequency distribution of categorical data: each rectangle represents a category, so its width has no practical meaning, and the rectangles of different categories are usually drawn apart from one another. A histogram mainly describes the frequency distribution of numerical data: the width of each rectangle represents the class interval of the group, which has an actual numerical meaning. Therefore, in a histogram the rectangles must be arranged contiguously.
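The grouping step behind a histogram can be sketched as follows. The height values, the class boundaries, and the helper `histogram_counts` are assumptions for illustration, and equal-width intervals are used for simplicity.

```python
def histogram_counts(values, start, width, bins):
    """Count values per class interval of equal width.

    Interval i covers [start + i*width, start + (i+1)*width); the last
    interval also includes its right edge.
    """
    counts = [0] * bins
    for value in values:
        index = int((value - start) // width)
        if value == start + bins * width:
            index = bins - 1  # put the upper bound in the last interval
        if 0 <= index < bins:
            counts[index] += 1
    return counts

# Hypothetical heights in cm (illustrative only).
heights = [152, 158, 161, 163, 165, 167, 168, 171, 174, 180]
counts = histogram_counts(heights, start=150, width=10, bins=3)

# Adjacent intervals share a boundary, so the bars are contiguous.
for i, count in enumerate(counts):
    low = 150 + i * 10
    print(f"{low}-{low + 10}: {'#' * count}")
```

Unlike the categories of a bar chart, the intervals here are ordered and adjacent, which is why the rectangles of a histogram touch.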

Box plot

A box plot is another common graphic for displaying the distribution characteristics of ungrouped numerical data. The drawing steps are as follows.

First, find the three quartiles of the data and draw the box. As the name implies, when a set of data is sorted from small to large, the three values located at the 25%, 50% and 75% positions divide the data into four equal parts. These three values are called the lower quartile, the median and the upper quartile, denoted Q25%, Q50% and Q75% respectively. The upper and lower quartiles form the boundaries of a closed box with the median inside it. The length of the box is the difference between the upper and lower quartiles, called the interquartile range (IQR), and it represents the range over which the middle 50% of the data varies.

Then, calculate the inner fences and the adjacent values, and draw the whiskers. The inner fences are the two values lying 1.5 times the IQR below the lower quartile and 1.5 times the IQR above the upper quartile: Q25% - 1.5 × IQR is the lower inner fence, and Q75% + 1.5 × IQR is the upper inner fence. The inner fences are generally not displayed in the box plot; they serve only as boundaries for identifying outliers. The maximum and minimum of the data lying between the two inner fences (that is, the maximum and minimum of the non-outlier points) are called the upper and lower adjacent values. Each adjacent value is connected to the box with a straight line to form a whisker, and the whiskers represent the range of variation of all data except the outliers.

Finally, mark the outliers. Outliers are values greater than the upper inner fence or smaller than the lower inner fence, and they are usually marked with a small circle ("o") in the figure.
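The box plot steps can be sketched in a few lines. The data and the function name `box_plot_summary` are made up for illustration, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions; other conventions may give slightly different quartiles and fences.

```python
import statistics

def box_plot_summary(values):
    """Compute the quantities needed to draw a box plot."""
    data = sorted(values)
    # Assumption: the "inclusive" quartile convention of the stdlib.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Adjacent values: extremes of the data that lie within the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "lower_adjacent": inside[0], "upper_adjacent": inside[-1],
        "outliers": outliers,
    }

# Hypothetical measurements with one unusually large value.
summary = box_plot_summary([2, 4, 5, 7, 8, 9, 10, 12, 30])
print(summary["q1"], summary["median"], summary["q3"])    # 5.0 8.0 10.0
print(summary["lower_fence"], summary["upper_fence"])     # -2.5 17.5
print(summary["lower_adjacent"], summary["upper_adjacent"], summary["outliers"])  # 2 12 [30]
```

Here the whiskers would run from 2 to 12, and the value 30 would be drawn as a separate outlier point because it lies above the upper inner fence of 17.5.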

Scatter diagram

A scatter diagram is a commonly used graphic for showing the relationship between two numerical variables. If paired observations of variables x and y are collected, and the abscissa and ordinate represent the two variables respectively, each pair (xi, yi) can be plotted as a point in a two-dimensional coordinate system; the graph formed by all the data points is called a scatter diagram.
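As a minimal sketch of how each pair (xi, yi) becomes a point, the code below places the pairs on a small character grid. The data, the grid size, and the helper `scatter_grid` are assumptions for illustration; a real scatter diagram would be drawn with a plotting library.

```python
def scatter_grid(xs, ys, width=5, height=5):
    """Place each (x, y) pair on a character grid; the top row is the largest y."""
    x_lo, y_lo = min(xs), min(ys)
    x_span = (max(xs) - x_lo) or 1  # avoid dividing by zero for constant data
    y_span = (max(ys) - y_lo) or 1
    grid = [["."] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_lo) / x_span * (width - 1))
        row = (height - 1) - round((y - y_lo) / y_span * (height - 1))
        grid[row][col] = "*"
    return ["".join(cells) for cells in grid]

# Hypothetical paired observations (illustrative only).
xs = [0, 1, 2, 3, 4]
ys = [0, 2, 1, 3, 4]
grid = scatter_grid(xs, ys)
print("\n".join(grid))
# ....*
# ...*.
# .*...
# ..*..
# *....
```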

Radar chart

A radar chart starts from a single point and uses rays in different directions to represent different variables. The points where one sample's variable values fall on the rays are connected to form a "spider web", and multiple samples form multiple "spider webs". Radar charts are therefore also called spider web charts or star charts, and they can conveniently display multivariate data on a two-dimensional plane.
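The geometry of one "spider web" can be sketched as follows. The function name `radar_vertices` and the sample values are hypothetical, and the sketch assumes the variables are already on comparable scales and that the rays are evenly spaced starting from the positive x direction.

```python
import math

def radar_vertices(values):
    """Return the (x, y) vertex on each ray for one sample.

    Ray i points in direction 2*pi*i/k, and the vertex sits at a
    distance equal to the value of variable i.
    """
    k = len(values)
    vertices = []
    for i, value in enumerate(values):
        angle = 2 * math.pi * i / k
        vertices.append((value * math.cos(angle), value * math.sin(angle)))
    return vertices

# Hypothetical sample with four variables (illustrative only).
web = radar_vertices([1, 2, 3, 4])
for x, y in web:
    print(f"({x:.2f}, {y:.2f})")
```

Connecting the vertices in order, and closing the polygon back to the first vertex, gives the web for one sample; repeating this for each sample overlays several webs on the same rays.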

Origin blog.csdn.net/lenovo96166/article/details/118332957