Python data analysis and visualization (2021-01-09)

Python data analysis and visualization examples

1. Data source

There are generally two ways to source data: one is a data file crawled from the web by a crawler, and the other is an existing original data set.

1. Getting data with a crawler

1. The first job a crawler needs to do is to fetch the web page, that is, to obtain its source code. You can use the browser's developer tools to find the header information needed for the request, and check the source to determine the encoding format. The source code contains the useful information of the page, so once you have it you can extract whatever you want from it. This is the concept of request and response: you send a request to the website's server, and the response body it returns is the source code of the page. The key part, therefore, is to construct a request, send it to the server, and then receive and parse the response. Python provides many libraries to help us do this, such as urllib and requests, which implement HTTP request operations for us. Both requests and responses can be represented by the data structures these libraries provide; after getting the response, we only need to read its body to get the page source. In this way, the whole process of fetching web pages can be done by a program.
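As a minimal sketch of this step (the URL and header values below are placeholders, not from the original article), fetching a page with requests might look like this:

```python
import requests

# Placeholder URL and headers; real crawling usually needs the header
# fields found in the browser's developer tools.
url = "https://example.com"
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(url, headers=headers, timeout=10)
response.encoding = response.apparent_encoding  # match the page's encoding
html = response.text  # the response body, i.e. the page source
print(html[:200])
```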

2. After obtaining the web page source code, the next step is to analyze the web page source code and extract the data we want from it. First of all, the most common method is to use regular expression extraction, which is a versatile method, but it is more complicated and error-prone when constructing regular expressions. Extracting information is a very important part of the crawler, it can make the messy data organized and clear, so that we can process and analyze the data later.
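A small illustration of regular-expression extraction; the HTML snippet and the pattern here are hypothetical examples:

```python
import re

# A hypothetical fragment of page source, used only for illustration.
html = '<div class="item"><a href="/news/1">Headline one</a></div>'

# Capture each link target and its text.
pattern = re.compile(r'<a href="(.*?)">(.*?)</a>')
for href, text in pattern.findall(html):
    print(href, text)
```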

3. After extracting the information, we generally save the data somewhere for later use. There are many ways to save it: as simple TXT or JSON text, into a database such as MySQL or MongoDB, or onto a remote server, for example via SFTP.
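For instance, a minimal sketch of saving extracted records as JSON (the records themselves are made up for illustration):

```python
import json

# Hypothetical records extracted in the previous step.
records = [{"title": "Headline one", "url": "/news/1"}]

# Write them to a JSON file, keeping non-ASCII text readable.
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```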

4. "Automated program" means that a crawler can perform these operations in place of a person. We could, of course, extract the information by hand, but when the amount of data is particularly large, or we want to obtain a lot of data quickly, we have to use a program. A crawler is an automated program that does this crawling work for us; it can handle exceptions and retry on errors during the crawl to keep the crawling running efficiently.
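A minimal retry sketch, assuming transient network errors; the fetch function and its parameters are illustrative, not from the article:

```python
import time
import requests

def fetch(url, retries=3, delay=2):
    """Try a request several times before giving up."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()  # treat HTTP errors as failures
            return resp.text
        except requests.RequestException as exc:
            print(f"attempt {attempt} failed: {exc}")
            time.sleep(delay)  # wait a bit before retrying
    return None
```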

2. Original data

Usually the original data is saved in a CSV file, and that is what the example here uses, as shown in the figure below.
[Figure: preview of the CSV file]
This is the epidemic data from 2020-01-22 to 2020-02-04. Confirmed is the number of confirmed cases on that day, Deaths is the number of deaths on that day, and Recovered is the number of people who recovered on that day.
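A minimal sketch of loading such a file with pandas; the file name covid19.csv and the exact schema are assumptions based on the description above:

```python
import pandas as pd

# Assumed file name; the columns Confirmed, Deaths, Recovered follow
# the description in the text.
df = pd.read_csv("covid19.csv")
print(df.head())
print(df[["Confirmed", "Deaths", "Recovered"]].describe())
```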

2. Data analysis and visualization

1. Obtain histograms of two days for comparison

First, read the file, take out the epidemic data for Mainland China and save it to data, then take out the epidemic data of a certain day and save it to data1, dropping the unneeded columns to make visualization easier. The source code is as follows:
[Figure: source code for reading and filtering the data]
The result of running it on the data for 2020-01-23 is shown below:
[Figure: histogram for 2020-01-23]
Use the same method to take out the data for 2020-02-01:
[Figure: histogram for 2020-02-01]
You can see that the shape of the histogram does not change much, but the ordinate scale goes from about 400 to about 8000, nearly 20 times higher, which shows that the virus spread very quickly and urgently needed prevention. Fortunately, the lockdown started at about this time to stop the spread; because there is an incubation period, the numbers kept growing rapidly for another ten days or more.
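The article shows its source code only as an image. As a hedged reconstruction of what that step likely does (the file name, the Country/Region, Province/State, and Date columns, and the date string format are all assumptions), a sketch:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed file name and column names; not the article's exact code.
df = pd.read_csv("covid19.csv")

# Keep only Mainland China and save it to data.
data = df[df["Country/Region"] == "Mainland China"]

# Take one day's records (date format assumed) and drop unneeded columns.
data1 = data[data["Date"] == "2020-01-23"]
data1 = data1[["Province/State", "Confirmed", "Deaths", "Recovered"]]

# Bar chart of confirmed cases per province for that day.
data1.plot.bar(x="Province/State", y="Confirmed", legend=False)
plt.ylabel("Confirmed")
plt.title("Confirmed cases by province, 2020-01-23")
plt.tight_layout()
plt.show()
```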

2. Obtain a line chart for the province with the most severe epidemic

Next, add the following source code to find the data for Hubei Province and draw a line chart of the case numbers over time.
[Figure: source code for the Hubei line chart]

The results of running it are as follows.
[Figure: line chart of confirmed cases in Hubei over time]
It can be seen that the number of confirmed cases grew almost exponentially over time. To deal with this kind of epidemic, we should go out less, avoid gathering activities, and follow the relevant policy arrangements.
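Again as a hedged reconstruction (the same assumed file and column names as in the sketch above), the Hubei line chart might be produced like this:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed schema, continuing the previous sketch.
df = pd.read_csv("covid19.csv")
hubei = df[df["Province/State"] == "Hubei"]

# Confirmed cases over time for Hubei Province.
plt.plot(hubei["Date"], hubei["Confirmed"], marker="o")
plt.xlabel("Date")
plt.ylabel("Confirmed")
plt.title("Confirmed cases in Hubei over time")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```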

3. Obtain pie charts of the data at the beginning and of the totals for analysis

First, get the total number of people diagnosed during this period. The source code is as follows:
[Figure: source code for summing the data]
The result of running it:
[Figure: pie chart of the totals]
And the data at the beginning of the period:
[Figure: pie chart of the first day's data]
The ratios are not much different. The other regions, with fewer cases, may have been a little negligent, so their share has grown a bit; in general, the prevention measures were quite good.
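As a hedged sketch of this step (same assumed file and columns as above; the original code is shown only as an image):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed schema, continuing the previous sketches.
df = pd.read_csv("covid19.csv")
china = df[df["Country/Region"] == "Mainland China"]

# Sum the confirmed numbers per province over the whole period,
# following the article's description of totaling the data.
totals = china.groupby("Province/State")["Confirmed"].sum()

# Pie chart of each province's share of the total.
totals.plot.pie(autopct="%.1f%%", figsize=(6, 6))
plt.title("Share of total confirmed cases by province")
plt.ylabel("")
plt.show()
```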
Now the epidemic seems to be showing signs of a comeback, so everyone should stay at home and go out as little as possible, for a good New Year.

Origin: blog.csdn.net/zzl_101/article/details/112399499