Python crawler in practice: using the requests and pyecharts modules to visualize COVID-19 epidemic data (with source code)


Today I will show how to use Python to crawl COVID-19 epidemic data and visualize it. The code is provided here for anyone who needs it, along with a few tips.

First of all, before crawling, make your requests look as much like a browser as possible so that they are not identified as a crawler. The basic step is to add request headers. However, many people crawl this kind of plain-text data, so it is also worth considering rotating proxy IPs and randomly switching request headers when crawling the data.
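The idea of switching request headers and proxy IPs at random can be sketched as follows. This is only a minimal illustration: the User-Agent pool, the proxy addresses, and the helper name `build_request_kwargs` are all assumptions and not part of the original code, and the proxy addresses are placeholders rather than working proxies.

```python
import random

# Hypothetical User-Agent pool; replace with strings copied from real browsers.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

# Hypothetical proxy pool; placeholder addresses, not working proxies.
PROXIES = ["http://192.0.2.10:8080", "http://192.0.2.11:8080"]


def build_request_kwargs(rng=random):
    """Return keyword arguments for requests.get with a random header and proxy."""
    proxy = rng.choice(PROXIES)
    return {
        "headers": {"User-Agent": rng.choice(USER_AGENTS)},
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 10,
    }


kwargs = build_request_kwargs()
print(kwargs["headers"]["User-Agent"])
# With the requests module installed, each call would then be:
#   response = requests.get(url, **kwargs)
```

Calling `build_request_kwargs()` before every request gives each request a different combination, which is usually enough for small crawls.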

Before writing any crawler code, the first and most important step is to analyze the target web page.

Through analysis, we found that crawling is relatively slow, so we can also speed up the crawler by disabling image loading, JavaScript, and so on in the Chrome browser.

Development tools

Python version: 3.8

Related modules:

requests module

lxml module

openpyxl module

pandas module

pyecharts module

Environment setup

Install Python and add it to the PATH environment variable, then use pip to install the required modules (pip install requests lxml openpyxl pandas pyecharts).

Approach

Open the page we want to crawl in the browser and press F12 to open the developer tools, then look for where the epidemic data we want is located. What we need here is the JSON data embedded in the page.
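To make the target structure concrete, here is a small sketch that pulls JSON out of a script tag in a made-up page fragment. The sample HTML and its values are invented for illustration, and the sketch uses the standard library's html.parser instead of lxml so that it runs without extra installs; the field names follow the ones used in the crawler code.

```python
import json
from html.parser import HTMLParser

# Made-up page fragment that mimics the assumed structure of the real page.
SAMPLE_HTML = """
<html><body>
<script type="application/json">
{"component": [{"caseList": [
  {"area": "A", "confirmed": "10", "died": "1", "crued": "8"},
  {"area": "B", "confirmed": "5", "died": "", "crued": "5"}
]}]}
</script>
</body></html>
"""


class JsonScriptParser(HTMLParser):
    """Collect the text of <script type="application/json"> elements."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


parser = JsonScriptParser()
parser.feed(SAMPLE_HTML)
case_list = json.loads(parser.blocks[0])["component"][0]["caseList"]
print([row["area"] for row in case_list])
```

Note that some fields can be empty strings (the second row's "died" here), which is why the crawler has to handle empty values before writing to Excel.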

Source code structure



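import json

import openpyxl
import requests
from lxml import etree

url = ''
headers = {
    "User-Agent": "replace with your own browser's User-Agent"
}
response = requests.get(url=url, headers=headers).text
html = etree.HTML(response)
# Use XPath to get the JSON data we found on the page earlier
json_text = html.xpath('//script[@type="application/json"]/text()')
json_text = json_text[0]

result = json.loads(json_text)
result = result["component"]
result = result[0]['caseList']

wb = openpyxl.Workbook()
ws = wb.active  # assumed: use the workbook's default sheet
ws.title = "国内疫情"  # "Domestic epidemic"
# Header row; the visualization code below reads these column names
# (province, cumulative confirmed, deaths, recovered)
ws.append(['省份', '累计确诊', '死亡', '治愈'])
for line in result:
    line_name = [line["area"], line["confirmed"], line["died"], line["crued"]]
    # Replace empty strings with 0 before writing the row
    line_name = [0 if ele == '' else ele for ele in line_name]
    ws.append(line_name)
wb.save('china.xlsx')  # assumed file name; openpyxl writes .xlsx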
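One detail about replacing empty values is easy to get wrong: assigning to the loop variable inside a for loop does not change the list itself, so the replacement has to build a new list. A minimal sketch with a made-up row:

```python
# Made-up row: area, confirmed, died, recovered
row = ["Province A", "120", "", "100"]

# Rebinding the loop variable leaves the list unchanged.
for ele in row:
    if ele == "":
        ele = 0
unchanged = list(row)

# Building a new list applies the replacement.
cleaned = [0 if ele == "" else ele for ele in row]

print(unchanged)
print(cleaned)
```

The first print still shows the empty string, while the second shows 0 in its place.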

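How to get the User-Agent

In the browser's developer tools (F12), open the Network tab, refresh the page, click any request, and copy the User-Agent value from the request headers.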

Problem encountered: "Excel xlsx file; not supported", and how to solve it

Reason: xlrd versions after 1.2.0 no longer support the xlsx format; they only support the xls format.

Method one:

Uninstall the new version: pip uninstall xlrd

Install the old version: pip install xlrd==1.2.0 (or earlier)

Method two:

Save the Excel file in xls format so that the installed xlrd can read it (to be safe, always save it as xls).
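Before choosing between the two methods, it can help to check which xlrd version is installed. The following is a small sketch using only the standard library; the helper names `xlrd_reads_xlsx` and `installed_xlrd_version` are made up for this example.

```python
from importlib import metadata


def xlrd_reads_xlsx(version: str) -> bool:
    """xlrd 2.0 and later read only .xls; version 1.2.0 can still read .xlsx."""
    major = int(version.split(".")[0])
    return major < 2


def installed_xlrd_version():
    """Return the installed xlrd version string, or None if xlrd is missing."""
    try:
        return metadata.version("xlrd")
    except metadata.PackageNotFoundError:
        return None


version = installed_xlrd_version()
if version is None:
    print("xlrd is not installed")
else:
    print(f"xlrd {version} reads .xlsx: {xlrd_reads_xlsx(version)}")
```

If the check reports that .xlsx is not readable, either of the two methods above applies.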

Epidemic data results

[Image: epidemic data]

import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Map, Page

# Align East Asian characters correctly when printing the DataFrame
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel('china.xls')
data2 = df['省份']  # province
data2_list = list(data2)
data3 = df['累计确诊']  # cumulative confirmed
data3_list = list(data3)
data4 = df['死亡']  # deaths
data4_list = list(data4)
data5 = df['治愈']  # recovered
data5_list = list(data5)

# Each chart is a China map whose data is a list of [province, value] pairs
c = (
    Map()
    .add("治愈", [list(z) for z in zip(data2_list, data5_list)], "china")
)

Cumulative = (
    Map()
    .add("累计确诊", [list(z) for z in zip(data2_list, data3_list)], "china")
)
death = (
    Map()
    .add("死亡", [list(z) for z in zip(data2_list, data4_list)], "china")
)
cure = (
    Map()
    .add("治愈", [list(z) for z in zip(data2_list, data5_list)], "china")
)
# Put the three maps on one draggable page
page = Page(layout=Page.DraggablePageLayout)
page.add(Cumulative, death, cure)
page.render('china.html')  # assumed output file name
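The data passed to each map is a list of [province, value] pairs built with zip. A quick sketch with made-up numbers shows the resulting shape; the province names and counts below are placeholders, not real data.

```python
# Hypothetical sample columns standing in for the DataFrame columns.
provinces = ["A", "B", "C"]
confirmed = [120, 45, 300]

# Same pairing expression as in the map code.
data_pair = [list(z) for z in zip(provinces, confirmed)]

# The largest value is a natural upper bound for a colour scale,
# for example via opts.VisualMapOpts(max_=visual_max) in pyecharts.
visual_max = max(confirmed)

print(data_pair)
print(visual_max)
```

If the two columns have different lengths, zip silently stops at the shorter one, so it is worth checking that both lists have the same length before plotting.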

Epidemic data visualization

[Image: epidemic data visualization]

Finally

To thank my readers, I would like to share some of the programming resources I have collected recently, as a way of giving back to every reader. I hope they help you.

There are practical Python tutorials suitable for beginners~

Come and grow with Xiaoyu!

① More than 100 Python PDFs (the mainstream and classic books should all be there)

② Python standard library (the most complete Chinese edition)

③ Source code of crawler projects (forty to fifty interesting, classic practice projects with source code)

④ Videos on Python basics, crawlers, web development, and big data analysis (suitable for beginners)

⑤ Python learning roadmap (say goodbye to unfocused learning)
