Summary of statistical charts for scientific research

1. Common statistical charts

The applicable conditions and examples of commonly used statistical charts are as follows:

For example: a scatter plot is suited to showing the relationship between two variables; a histogram is suited to showing the distribution of a continuous variable and judging whether it is approximately normal; a box plot shows the median, quartiles, maximum, and minimum of a data set and can be used to examine the distribution or check for outliers.

2. Drawing ideas

First, identify the data types. A chart usually reflects the relationship between X and Y, so you first need to know the data type of each. The following table summarizes the options:

Following this X-and-Y approach, first identify the data types of X and Y, then pick a suitable statistical chart.
Example: suppose you want to analyze how rice yield differs across varieties and fertilization methods, and to show the difference visually. Here the two X variables are qualitative (categorical) data, while rice yield is quantitative data, so a clustered chart can be used for the visual analysis.
Upload the data to SPSSAU, select "Cluster Chart", and drag the variables into the corresponding analysis boxes on the right, as shown in the figure below:

The resulting clustered charts are as follows:
 

Clustered Line Chart

Clustered Column Chart

Clustered Bar Chart


SPSSAU currently provides 30 chart types in total, many of which are generated automatically by the corresponding analysis methods. Some complex or specialized charts still have to be drawn by yourself.

3. Automatic plotting in SPSSAU

When you run an analysis in SPSSAU, the statistical charts that match the chosen method are output automatically alongside the analysis results.
Example 1: For a frequency analysis of categorical data, SPSSAU outputs a pie chart, donut chart, column chart, and bar chart by default. To switch between chart types, use the selector in the upper right corner of the chart, as shown below:


Example 2: When an independent-samples t-test is used to study differences between groups in the reduction of fasting blood glucose, SPSSAU outputs a column chart, bar chart, and line chart by default, as shown below:

Example 3: When a chi-square test is used to study differences in treatment effect between therapies, SPSSAU outputs a stacked column chart, stacked bar chart, and so on by default, as shown below:

Tip: SPSSAU's visualization is built around the logic of each data analysis method and provides a fitting chart by default; this is an integral part of the SPSSAU intelligent system. The first chart output automatically is usually the most suitable one.

4. Detailed description of the statistical charts

1. Scatter plot

The scatter plot is used to investigate the relationship between quantitative data, that is, to view the relationship between X and Y. Scatterplots are often used in exploratory research to visually show the relationship between data.

  • Use scenarios
    (1) Before correlation analysis, check the relationship between X and Y.
    (2) When checking the model after regression analysis, inspect the relationship between the residuals and the independent variables [the regression model assumes that the residuals are unrelated to the independent variables; a visible pattern in the plot points to a problem such as heteroscedasticity].
    (3) Other scenarios for visually displaying relationships in the data.
  • Scatterplot example
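
Scenario (1) can be made concrete with a small calculation. The sketch below, in plain Python, computes the Pearson correlation coefficient that a roughly linear scatter plot would motivate. The `hours` and `score` data and the `pearson_r` helper are hypothetical illustrations, not SPSSAU features.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 74]
r = pearson_r(hours, score)
print(f"r = {r:.3f}")
```

A value of r close to 1 or -1 matches a scatter plot whose points lie near a straight line; a curved or shapeless cloud suggests that a linear correlation coefficient is not the right summary.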

2. Histogram


The histogram visually displays the distribution of the data. It is used to see whether the data look normally distributed and to check whether the data satisfy normality.

  • Use scenarios
    (1) During correlation analysis, check the normality of the data.
    (2) When checking the model after regression analysis, test the residuals for normality. Approximately normal residuals indicate that the model is well constructed; otherwise the model is poorly constructed.
    (3) Other scenarios for viewing the data distribution, checking normality, and so on.

  • Histogram display
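
A histogram is built by splitting the value range into intervals and counting the observations in each one. The following is a minimal sketch, assuming equal-width bins and hypothetical height data; the helper name `histogram_counts` is illustrative, and SPSSAU's own binning rule is not described in this article.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals.

    Assumes the values are not all identical (otherwise the bin width is 0).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical heights in cm
heights = [150, 152, 155, 158, 160, 161, 163, 165,
           165, 166, 168, 170, 172, 175, 180]
counts = histogram_counts(heights, bins=5)
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")
```

A roughly symmetric, single-peaked pattern of bars is what one would look for when judging normality by eye.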

3. Box plot

The box plot (also known as a box-and-whisker plot) was invented by the American statistician John Tukey in 1977. The data analyzed must be quantitative. A box plot lets you explore the characteristics of the data at a glance.

  • Use scenarios
    (1) Check for possible outliers.
    (2) During non-parametric tests, compare the distribution of Y across the categories of X.
    (3) Other scenarios that involve viewing the data distribution or checking for outliers.

  • Box plot display
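
The outlier check in scenario (1) is commonly based on Tukey's 1.5 x IQR rule. The sketch below assumes that rule and the "inclusive" quartile method of Python's `statistics.quantiles`; SPSSAU may compute quartiles differently, and the data and the `box_summary` helper are hypothetical.

```python
import statistics

def box_summary(values):
    """Quartiles, Tukey fences, and the points that fall outside the fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "outliers": outliers,
    }

data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]  # hypothetical measurements
summary = box_summary(data)
print(summary)
```

In a box plot, the box spans q1 to q3, the line inside it marks the median, and points beyond the fences are drawn individually as outliers.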

4. Word Cloud Map

The word cloud was proposed by Rich Gordon, a professor of journalism at Northwestern University in the United States. A word cloud visually highlights the keywords that appear frequently in a text, forming a "keyword cloud" or "keyword rendering", so that viewers can grasp the gist of the text at a glance.

  • Use scenarios
    (1) Display text information visually and highlight key information.
    (2) Display weighted data as a word cloud.

  • Word cloud display
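
The weights behind a word cloud are usually word frequencies. As a rough sketch, assuming English text, a tiny hypothetical stop-word list, and an illustrative helper named `word_weights`, the counting step might look like this:

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for"}

def word_weights(text, top=5):
    """Return the `top` most frequent non-stop-words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top)

text = ("Charts make data easy to read. Good charts show the data, "
        "and the data tells the story.")
weights = word_weights(text)
for word, count in weights:
    print(word, count)
```

A renderer would then scale each word's font size by its count, which is also how pre-weighted data in scenario (2) can be displayed.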

5. Error bar graph

Error bar charts illustrate the degree of uncertainty in the data, showing the potential error or uncertainty for each data marker. The variability of the sample data is measured by the standard deviation, so here the error bar around each sample mean is drawn from the standard deviation.

  • Error bar graph display
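
The numbers behind an error bar are simple to compute. The sketch below uses hypothetical groups and an illustrative helper, `error_bar`. It computes the standard deviation described above and, for comparison, the standard error of the mean, which is another common choice for error bars.

```python
import math
import statistics

def error_bar(values):
    """Return (mean, standard deviation, standard error) for one group."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample standard deviation (n - 1)
    se = sd / math.sqrt(len(values))     # standard error of the mean
    return mean, sd, se

# Hypothetical measurements for two groups
groups = {
    "control": [4.0, 5.0, 6.0],
    "treated": [6.5, 7.0, 8.5, 9.0],
}
bars = {name: error_bar(vals) for name, vals in groups.items()}
for name, (mean, sd, se) in bars.items():
    print(f"{name}: mean={mean:.2f}, SD bar=+/-{sd:.2f}, SE bar=+/-{se:.2f}")
```

Whichever quantity is drawn, the chart caption should say so, because SD bars describe the spread of the data while SE bars describe the precision of the mean.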

6. P-P plot / Q-Q plot

P-P plots and Q-Q plots are often used to check visually whether data are normally distributed. The two plots serve basically the same purpose but differ in principle: a P-P plot compares cumulative probabilities, while a Q-Q plot compares quantiles.

  • Use scenarios
    (1) Judge whether the Y values in an analysis of variance are normally distributed.
    (2) Before regression analysis, check whether the Y values are normally distributed; after regression analysis, use the P-P plot and Q-Q plot to check whether the residuals are normally distributed.
    (3) Check whether the residuals after binary Logit regression are normally distributed.
    (4) Other scenarios for visually demonstrating the normality of the data.

  • P-P plot/Q-Q plot display
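
The difference in principle between the two plots shows up in how their points are computed. This is a minimal sketch using Python's `statistics.NormalDist`, assuming a normal distribution fitted by the sample mean and standard deviation and the plotting positions (i - 0.5) / n, which is one of several conventions. The helper name `qq_pp_points` and the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_pp_points(values):
    """Return Q-Q and P-P point lists against a fitted normal distribution."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n: an assumed convention, others exist.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    # Q-Q: theoretical quantile vs. observed value
    qq = [(fitted.inv_cdf(p), x) for p, x in zip(probs, xs)]
    # P-P: empirical cumulative probability vs. theoretical cumulative probability
    pp = [(p, fitted.cdf(x)) for p, x in zip(probs, xs)]
    return qq, pp

sample = [4.1, 4.8, 5.0, 5.3, 5.9, 6.2, 6.4, 7.0]  # hypothetical data
qq, pp = qq_pp_points(sample)
for (tq, obs), (ep, tp) in zip(qq, pp):
    print(f"Q-Q: ({tq:.2f}, {obs:.2f})   P-P: ({ep:.3f}, {tp:.3f})")
```

In both plots, points lying close to the diagonal suggest that the data are approximately normal.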

7. ROC curve

The ROC curve (receiver operating characteristic curve) was originally used in the military and is now widely used in medicine to study how accurately X predicts Y.

  • ROC curve display
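
An ROC curve plots the true positive rate against the false positive rate as the decision threshold on X is varied, and the area under the curve (AUC) summarizes the prediction accuracy. The sketch below shows that construction with hypothetical labels and scores; `roc_points` and `auc` are illustrative helpers, not SPSSAU functions.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) obtained by sweeping a threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical data: 1 = diseased, 0 = healthy; score = test measurement
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
points = roc_points(labels, scores)
print(f"AUC = {auc(points):.2f}")
```

An AUC of 0.5 corresponds to guessing, and values closer to 1 indicate better discrimination.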

8. Quadrant diagram

Quadrant charts provide a visual representation of how data are categorized. Horizontal and vertical dividing lines split the chart area into four quadrants, and each quadrant presents the data that fall into it. In general, the purpose of a quadrant chart is to show directly which region each data point belongs to.

  • Quadrant display
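
Assigning points to quadrants is a simple comparison against the two dividing lines. In the sketch below the dividing lines are placed at the mean of each variable, which is an assumption made for illustration (the article does not say where SPSSAU draws them). The survey items and the `quadrant` helper are hypothetical.

```python
from statistics import mean

def quadrant(x, y, x_split, y_split):
    """Return the quadrant number (1 = upper right, counted counter-clockwise)."""
    if y >= y_split:
        return 1 if x >= x_split else 2
    return 4 if x >= x_split else 3

# Hypothetical survey items: (importance, satisfaction)
items = {
    "price": (4.5, 3.0),
    "service": (4.2, 4.4),
    "packaging": (2.8, 4.1),
    "delivery": (3.0, 2.9),
}
x_split = mean([x for x, _ in items.values()])  # assumed dividing line for X
y_split = mean([y for _, y in items.values()])  # assumed dividing line for Y
placement = {name: quadrant(x, y, x_split, y_split)
             for name, (x, y) in items.items()}
print(placement)
```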

9. Pareto chart

The Pareto chart is a graphical embodiment of the 80/20 rule (the Pareto principle): 80% of problems are caused by 20% of the causes. A Pareto chart is typically used to show the share that each cause contributes to a problem and so to identify the most important causes.

  • Pareto chart display
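
A Pareto chart combines bars sorted in descending order with a cumulative-percentage line. The following sketch computes both from hypothetical defect counts; `pareto_table` is an illustrative helper name.

```python
def pareto_table(counts):
    """Return rows of (cause, count, cumulative percent), largest cause first."""
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += n
        rows.append((cause, n, 100 * cumulative / total))
    return rows

# Hypothetical defect counts by cause
defects = {"scratches": 45, "dents": 30, "misalignment": 12,
           "discoloration": 8, "other": 5}
rows = pareto_table(defects)
for cause, n, cum_pct in rows:
    print(f"{cause:<14} {n:>3}  {cum_pct:5.1f}%")
```

The causes at which the cumulative line approaches 80% are the ones usually treated as the most important.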

10. Clustered chart

By chart type, clustered charts can be divided into clustered line charts, clustered column charts, clustered bar charts, and so on, all of which SPSSAU provides by default. SPSSAU offers four summary values: mean, count, sum, and median. SPSSAU produces the corresponding chart whether the number of categorical variables is 0, 1, or 2.

  • Cluster diagram display
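
The values plotted in a clustered chart are summaries of Y within each combination of the categorical variables. As a sketch of that grouping step, using hypothetical rice-yield records and an illustrative `summarize` helper, the four summary values could be computed as follows:

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (variety, fertilization method, yield)
records = [
    ("A", "organic", 5.0), ("A", "organic", 6.0),
    ("A", "chemical", 7.0), ("A", "chemical", 8.0),
    ("B", "organic", 4.0), ("B", "organic", 5.0),
    ("B", "chemical", 6.0), ("B", "chemical", 9.0),
]

def summarize(rows, func):
    """Group yields by (variety, fertilizer) and apply one summary function."""
    groups = defaultdict(list)
    for variety, fertilizer, value in rows:
        groups[(variety, fertilizer)].append(value)
    return {key: func(values) for key, values in groups.items()}

summaries = {
    "mean": summarize(records, mean),
    "count": summarize(records, len),
    "sum": summarize(records, sum),
    "median": summarize(records, median),
}
for key, value in summaries["mean"].items():
    print(key, value)
```

Each (variety, fertilizer) pair becomes one bar or line point, with bars for the same variety drawn side by side.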

11. Combination diagram

The combination chart is used when the two kinds of values differ greatly in magnitude. It has two axes, a primary axis and a secondary axis. Usually the primary axis carries the item with the larger values and the secondary axis carries the item with the smaller values. For example, to display GDP together with the GDP growth rate, the primary axis is GDP and the secondary axis is the GDP growth rate.

  • Combination chart display

12. Bubble chart

Bubble charts show the relationship between X and Y and use a third variable, Z, to set the size of the bubble at each point. SPSSAU also supports displaying labels directly inside the bubbles, and different colors can be used to distinguish the bubbles.
 

  • Bubble chart display

13. Kernel density plot

The kernel density plot is based on a non-parametric estimation method and can be seen as a smoothed abstraction of the histogram that is easier to read; the area under its curve is 1. It is usually used to display continuous data, such as the distribution of age or height.

  • Kernel Density Map Display
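
The claim that the area under the curve is 1 can be checked numerically. The sketch below assumes a Gaussian kernel and a hand-picked bandwidth, both chosen for illustration (the article does not state which kernel or bandwidth SPSSAU uses). The age data and the `gaussian_kde` helper are hypothetical.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a density function built from Gaussian kernels of the given bandwidth."""
    n = len(values)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
                   for v in values) / norm

    return density

ages = [22, 25, 27, 30, 31, 33, 35, 38, 41, 47]   # hypothetical ages
density = gaussian_kde(ages, bandwidth=4.0)        # bandwidth chosen by hand

# Approximate the area under the curve with a Riemann sum on a wide grid.
step = 0.1
grid = [i * step for i in range(801)]              # 0.0 to 80.0
area = sum(density(x) * step for x in grid)
print(f"area under the curve = {area:.4f}")
```

A smaller bandwidth follows the data more closely and gives a bumpier curve; a larger one gives a smoother curve, much like choosing narrower or wider histogram bins.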

14. Violin plot

The violin plot combines a box plot and a kernel density plot. It shows the percentiles of the data in the manner of a box plot, while the kernel density outline shows the shape of the distribution: the wider the outline at a given value, the more the data are concentrated there, and the narrower it is, the fewer data there are.

  • Violin diagram display


Origin blog.csdn.net/m0_37228052/article/details/132089963