Python crawler in practice: using the requests and pyecharts modules to visualize COVID-19 epidemic data (with source code)

Foreword

Today I will show how to use Python to crawl COVID-19 epidemic data and visualize it. The full code is given below for anyone who needs it, along with some tips.

First of all, before crawling, your program should impersonate a browser as much as possible so it is not recognized as a crawler. The basic step is to add a request header; but since many people crawl this kind of plain-text data, you should also consider rotating proxy IPs and randomly switching request headers when crawling the data.
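One common way to do this is to keep a pool of User-Agent strings (and, optionally, proxies) and pick one at random for each request. A minimal sketch, where the User-Agent strings and the proxy address are illustrative placeholders, not values from this article:

```python
import random

# A small pool of User-Agent strings (examples only; in practice, copy
# real ones from your own browsers).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:107.0) Gecko/20100101 Firefox/107.0",
]

# Hypothetical proxy pool; replace with proxies you actually control.
PROXIES = [
    {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"},
]

def random_headers():
    """Build a request-header dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}
```

You would then call, for example, `requests.get(url, headers=random_headers(), proxies=random.choice(PROXIES))` so that successive requests do not all look identical.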

Before writing any crawler code, the first and most important step is always to analyze the target web page.

During testing we found that crawling was relatively slow, so crawl speed can also be improved by disabling images, JavaScript, and so on in Chrome (relevant when driving a real browser rather than using plain requests).

Development tools

Python version: 3.8

Related modules:

requests module

lxml module

openpyxl module

pandas module

pyecharts module

Environment setup

Install Python, add it to the PATH environment variable, and use pip to install the required modules.
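With Python on the PATH, all of the modules listed above can be installed in one command:

```shell
pip install requests lxml openpyxl pandas pyecharts
```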

Approach

Open the page we want to crawl in the browser and
press F12 to open the developer tools, then look for where the epidemic data lives.
Here, the data we need is embedded directly in the page itself.
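What the developer tools reveal is that the epidemic data is embedded in the page as JSON inside a `<script type="application/json">` tag. The extraction step can be demonstrated offline with a tiny stand-in for the real page (the sample HTML and its numbers are made up for illustration):

```python
import json
from lxml import etree

# A tiny stand-in for the real page: the epidemic data sits as JSON
# inside a <script type="application/json"> tag.
sample_html = """
<html><body>
<script type="application/json">{"component": [{"caseList": [{"area": "Hubei", "confirmed": "68149"}]}]}</script>
</body></html>
"""

tree = etree.HTML(sample_html)
# XPath pulls out the raw text content of the JSON script tag
raw = tree.xpath('//script[@type="application/json"]/text()')[0]
data = json.loads(raw)
# Drill down the same way the real crawler does
case_list = data["component"][0]["caseList"]
print(case_list[0]["area"])  # → Hubei
```

The real crawler below does exactly this, just against the live page instead of a sample string.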

Source code structure

Code

Epidemic crawler.py

import requests
from lxml import etree
import json
import openpyxl

# Basic crawler setup
url = 'https://voice.baidu.com/act/newpneumonia/newpneumonia'
headers = {
    "User-Agent": "replace with your own browser's User-Agent"
}
response = requests.get(url=url, headers=headers).text
# lxml needs the response parsed into a tree before we can use XPath
html = etree.HTML(response)
# Use XPath to grab the embedded JSON data we found earlier, and print it to check
json_text = html.xpath('//script[@type="application/json"]/text()')
json_text = json_text[0]
print(json_text)

# Parse the JSON string with Python's built-in json library
result = json.loads(json_text)
print(result)
# Printing the parsed object shows that the data we want sits under the
# "component" key, so take that value out
result = result["component"]
# Print again to check the result
print(result)
# Get the current domestic (China) data
result = result[0]['caseList']
print(result)

# Create a workbook
wb = openpyxl.Workbook()
# Get the active worksheet
ws = wb.active
# Set the sheet title
ws.title = "国内疫情"
# Write the header row
ws.append(["省份", "累计确诊", "死亡", "治愈"])
# Get each province's data and write it in
for line in result:
    line_name = [line["area"], line["confirmed"], line["died"], line["crued"]]
    # Replace empty strings with 0; reassigning the loop variable would not
    # modify the list, so index into it instead
    for i, ele in enumerate(line_name):
        if ele == '':
            line_name[i] = 0
    ws.append(line_name)
# Save to Excel
wb.save('./china.xls')

How to get User-Agent


Problem encountered: the error "Excel xlsx file; not supported", and how to solve it

Reason: xlrd versions after 1.2.0 no longer support the xlsx format, only xls

Method one:

Uninstall the new version: pip uninstall xlrd

Install an old version: pip install xlrd==1.2.0 (or earlier)

Method Two:

Change the Excel file to the xls format that xlrd can read (to be safe, use "Save As" and explicitly choose xls)
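A third option, if you would rather not downgrade xlrd: save the workbook with the .xlsx extension that openpyxl actually produces, and have pandas read it with its openpyxl engine, which sidesteps xlrd entirely (the `engine` parameter is available in pandas 1.2 and later). A small sketch; the file name demo.xlsx and its contents are just examples:

```python
import openpyxl
import pandas as pd

# Write a minimal workbook the same way the crawler does, but with the
# .xlsx extension that openpyxl actually produces
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["province", "confirmed"])
ws.append(["Hubei", 68149])
wb.save("demo.xlsx")

# Reading with engine="openpyxl" avoids xlrd entirely
df = pd.read_excel("demo.xlsx", engine="openpyxl")
print(df.shape)  # → (1, 2)
```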

Epidemic data results display

Epidemic data

Visualization.py

# Visualization part
import pandas as pd
from pyecharts.charts import Map, Page
from pyecharts import options as opts

# Align columns when printing East Asian text
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
# Open the file produced by the crawler
df = pd.read_excel('china.xls')
# Pull each column out as a list
data2 = df['省份']
data2_list = list(data2)
data3 = df['累计确诊']
data3_list = list(data3)
data4 = df['死亡']
data4_list = list(data4)
data5 = df['治愈']
data5_list = list(data5)

c = (
    Map()
    .add("治愈", [list(z) for z in zip(data2_list, data5_list)], "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(),
        visualmap_opts=opts.VisualMapOpts(max_=200),
    )
)
# Render to its own file so page.render() below does not overwrite it
c.render('cured.html')

Cumulative = (
    Map()
    .add("累计确诊", [list(z) for z in zip(data2_list, data3_list)], "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(),
        visualmap_opts=opts.VisualMapOpts(max_=200),
    )
)

death = (
    Map()
    .add("死亡", [list(z) for z in zip(data2_list, data4_list)], "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(),
        visualmap_opts=opts.VisualMapOpts(max_=200),
    )
)

cure = (
    Map()
    .add("治愈", [list(z) for z in zip(data2_list, data5_list)], "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(),
        visualmap_opts=opts.VisualMapOpts(max_=200),
    )
)

page = Page(layout=Page.DraggablePageLayout)
page.add(
    Cumulative,
    death,
    cure,
)
# Generate the render.html file
page.render()
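The `[list(z) for z in zip(...)]` expression used throughout the script above simply pairs each province with its count, producing the `[name, value]` lists that pyecharts' `Map.add` expects. A quick illustration with made-up numbers:

```python
provinces = ["Hubei", "Guangdong", "Henan"]
cured = [63623, 1396, 1254]

# zip pairs each province with its value; list() turns each tuple into
# the [name, value] list shape that pyecharts Map.add expects
pairs = [list(z) for z in zip(provinces, cured)]
print(pairs)  # → [['Hubei', 63623], ['Guangdong', 1396], ['Henan', 1254]]
```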

Epidemic data visualization

Finally

To thank my readers, I'd like to share some of my favorite recent programming resources as a way of giving back to every reader; I hope they help you.

There are practical Python tutorials suitable for beginners, too!

Come and grow together with Xiaoyu!

① More than 100 Python PDFs (the mainstream and classic books should all be there)

② The Python standard library (the most complete Chinese edition)

③ Crawler project source code (forty or fifty interesting, classic hands-on projects with source code)

④ Videos on the basics of Python, crawlers, web development, and big data analysis (suitable for beginners)

⑤ A Python learning roadmap (say goodbye to aimless learning)

Origin blog.csdn.net/Modeler_xiaoyu/article/details/128256360