Initial data analysis of the COVID-19 outbreak in 2019-2020

1. Datasets and tasks

2019-nCoV (as of 2020.2.9)

This analysis focuses on how the cumulative numbers of confirmed cases, discharged cases, and deaths relate to the provinces and cities where the cases were reported.

2. Composition of the data set

The raw data set contains the following fields: province, city, time, cumulative number of confirmed cases, cumulative number of discharged cases, cumulative number of deaths, number of newly confirmed cases, number of newly discharged cases, and number of new deaths.

3. Data set preprocessing and field calculation

Preprocessing: The raw data contain missing values; records with missing values are filtered out.
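The filtering step can be sketched in a few lines of plain Python. This is only an illustration: the field names (`province`, `city`, `date`, `confirmed`, `discharged`, `deaths`) and all numbers are hypothetical placeholders, and the original work was most likely done in a visualization tool rather than in code.

```python
# Hypothetical records: field names and numbers are placeholders, not the real data.
records = [
    {"province": "Hubei", "city": "Wuhan", "date": "2020-02-01",
     "confirmed": 100, "discharged": 5, "deaths": 3},
    {"province": "Hubei", "city": "City A", "date": "2020-02-01",
     "confirmed": None, "discharged": 1, "deaths": 0},   # missing count
    {"province": "Province B", "city": "", "date": "2020-02-01",
     "confirmed": 12, "discharged": 0, "deaths": 0},     # missing city
    {"province": "Province B", "city": "City C", "date": "2020-02-01",
     "confirmed": 7, "discharged": 0, "deaths": 0},
]

REQUIRED_FIELDS = ("province", "city", "date", "confirmed", "discharged", "deaths")


def drop_incomplete(rows, required=REQUIRED_FIELDS):
    """Keep only rows in which every required field is present and non-empty.

    A count of 0 is a valid value and is kept; only None and "" count as missing.
    """
    return [
        row for row in rows
        if all(row.get(field) not in (None, "") for field in required)
    ]


clean_records = drop_incomplete(records)
```

Note that zero counts are treated as real values here; only `None` and empty strings are considered missing.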

4. Design and explanation of story points

The story uses a linked dashboard in which four worksheets interact with one another: Worksheet 1 (top left: national map of confirmed cases by province), Worksheet 2 (bar chart of confirmed cases and epidemic level for each province and city), Worksheet 3 (fitted trend chart of confirmed cases over time, Hubei versus other provinces and cities), and Worksheet 4 (scatter chart of discharge rates for each province and city).

1. The epidemic situation in Hubei Province and its surrounding provinces is relatively serious

A time slider filter shows roughly how the number of confirmed cases in each province and city changes over time. The map uses a diverging red-blue palette, reversed so that red marks the more serious areas, with the center of the scale adjusted to make the contrast more visible.

Because Hubei Province has far more confirmed cases than any other province, it is excluded from the main view and displayed separately in the lower right corner, which makes the epidemic situation in the other provinces and cities easier to observe.
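The effect of excluding Hubei and re-centering the diverging palette can be illustrated as follows. This is a rough sketch under stated assumptions: the totals are made-up placeholders, the center value is chosen by hand, and `diverging_position` is a hypothetical helper that mimics a piecewise-linear color scale, not the mapping any particular tool uses.

```python
# Placeholder totals per province (not real figures).
totals = {"Hubei": 20000, "Province A": 1000, "Province B": 400, "Province C": 10}

# Exclude Hubei so that its very large count does not flatten the color scale.
others = {name: count for name, count in totals.items() if name != "Hubei"}


def diverging_position(value, vmin, center, vmax):
    """Map value to [0, 1] so that `center` lands exactly on 0.5.

    0.0 is one end of the palette (e.g. blue), 1.0 the other (e.g. red).
    The two halves are scaled independently (piecewise linear).
    """
    if not vmin < center < vmax:
        raise ValueError("expected vmin < center < vmax")
    value = min(max(value, vmin), vmax)  # clamp into the scale's range
    if value <= center:
        return 0.5 * (value - vmin) / (center - vmin)
    return 0.5 + 0.5 * (value - center) / (vmax - center)


vmin, vmax = min(others.values()), max(others.values())
center = 200  # assumed, hand-picked center of the scale

positions = {
    name: diverging_position(count, vmin, center, vmax)
    for name, count in others.items()
}
```

Moving `center` toward the low end spreads the lightly affected provinces over more of the palette, which is the kind of adjustment described above.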

2. Wuhan ranks first in the number of confirmed cases and deaths

Word clouds and tree maps show intuitively that Wuhan, Hubei Province, has the largest number of confirmed cases and deaths, far more than any other city. Wuhan is the outbreak point of the epidemic in China.
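The ranking behind such a chart amounts to taking each city's most recent cumulative count and sorting. A minimal sketch, assuming rows of the hypothetical form `(city, date, cumulative_confirmed, cumulative_deaths)` with placeholder numbers:

```python
# Placeholder rows: (city, ISO date, cumulative confirmed, cumulative deaths).
rows = [
    ("Wuhan", "2020-02-08", 900, 40),
    ("Wuhan", "2020-02-09", 1000, 50),
    ("City A", "2020-02-09", 120, 2),
    ("City B", "2020-02-09", 80, 0),
]

# Keep the most recent record per city; ISO dates sort correctly as strings.
latest = {}
for city, date, confirmed, deaths in rows:
    if city not in latest or date > latest[city][0]:
        latest[city] = (date, confirmed, deaths)

# Rank cities by their latest cumulative confirmed count, largest first.
ranking = sorted(latest.items(), key=lambda item: item[1][1], reverse=True)
top_city = ranking[0][0]
```

Cumulative counts should not be summed across dates, which is why only the latest record per city is used.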

3. The epidemic situation in other cities (except Hubei) has stabilized

A line chart places Hubei, which has the largest number of confirmed cases, in one group and all other provinces and cities in a second group. Trend lines are added to both groups to observe and analyze the development of the epidemic.

From January 24 to February 3, the cumulative number of confirmed cases in Hubei Province was below the trend line, that is, the situation had improved. After February 3, however, the curve changed from flat to rising. Overall, the number of confirmed cases increased slowly from January 10 to January 24 and then surged, so January 24 was the outbreak point.

In the other provinces, the situation improved from January 20 to February 1 and then showed an upward trend from February 1 to February 7. After that it became relatively stable, with the overall number of confirmed cases increasing slowly.
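Reading a series as "below the trend line" can be made concrete by fitting a line and checking the sign of the residuals. The sketch below assumes an ordinary least-squares straight line, which may not be the trend model used in the original chart, and the counts are made-up placeholders.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Day index and placeholder cumulative counts (not real data).
days = [0, 1, 2, 3, 4]
cumulative = [10, 12, 20, 40, 68]

slope, intercept = fit_line(days, cumulative)

# Residual = observed value minus the value on the trend line.
residuals = [y - (slope * x + intercept) for x, y in zip(days, cumulative)]

# Days on which the observed count lies below the fitted trend line.
days_below_trend = [day for day, r in zip(days, residuals) if r < 0]
```

For a cumulative curve that accelerates, the middle of the period falls below a straight trend line and the two ends lie above it, so the residual sign is better treated as a hint than as proof that the situation improved.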

4. There is a certain correlation between the number of confirmed cases and the discharge rate and mortality rate.

This chart analyzes how the discharge rate and the death rate relate to the number of confirmed cases. The figure shows that discharge rates are similar when the number of confirmed cases is between 0 and 1500, and are again similar between 1500 and 3000; the same holds for the death rate. The degree of fit is also very high. It can be inferred that there is a certain relationship between the number of confirmed cases, the mortality rate, and the discharge rate.
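The article does not define the two rates. A common choice, assumed here, is to divide the cumulative discharged and cumulative death counts by the cumulative confirmed count. The city names and numbers below are placeholders.

```python
def rates(confirmed, discharged, deaths):
    """Return (discharge_rate, death_rate) as fractions of confirmed cases.

    Assumed definitions: cumulative discharged / cumulative confirmed and
    cumulative deaths / cumulative confirmed. Returns (None, None) when
    there are no confirmed cases, to avoid dividing by zero.
    """
    if confirmed <= 0:
        return None, None
    return discharged / confirmed, deaths / confirmed


# Placeholder cumulative counts: (confirmed, discharged, deaths).
cities = {
    "City A": (200, 20, 2),
    "City B": (2000, 100, 40),
    "City C": (0, 0, 0),
}

city_rates = {name: rates(*counts) for name, counts in cities.items()}
```

Early in an outbreak both rates are biased low, because many confirmed cases have not yet reached an outcome.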

 

5. When the number of confirmed cases is greater than 300, the discharge rate decreases as the number of confirmed cases increases.

A box-and-whisker plot is used to analyze the relationship between the discharge rate and the number of confirmed cases. The figure shows many outliers in the discharge rate for cities with fewer than 300 confirmed cases; cities with fewer than 100 confirmed cases differ greatly from one another and also have many outliers. In cities with more than 300 confirmed cases, the discharge rate has steadily increased.
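The grouping and outlier detection behind a box-and-whisker plot can be sketched as follows. The bin edges (100 and 300) come from the thresholds mentioned above, but their exact boundaries, the 1.5 × IQR whisker rule, and the quartile method are assumptions; the discharge rates are placeholders.

```python
from statistics import quantiles


def bin_label(confirmed):
    """Assign a city to a bin by its confirmed-case count (assumed edges)."""
    if confirmed < 100:
        return "<100"
    if confirmed <= 300:
        return "100-300"
    return ">300"


def tukey_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


# Placeholder data: (confirmed cases, discharge rate) per city.
cities = [
    (20, 0.05), (35, 0.06), (50, 0.07), (80, 0.08), (5, 0.60),
    (400, 0.10), (650, 0.11), (900, 0.12), (1200, 0.13),
]

# Group discharge rates by bin.
groups = {}
for confirmed, discharge_rate in cities:
    groups.setdefault(bin_label(confirmed), []).append(discharge_rate)

outliers = {label: tukey_outliers(sorted(vals)) for label, vals in groups.items()}
```

Small cities tend to produce extreme rates because a single discharged patient changes the ratio a lot, which is consistent with the many outliers seen in the low-count bins.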

 


Origin blog.csdn.net/m0_51744718/article/details/132398588