Scraping WHO per-country case data

Still struggling to get hold of official case data?

WHO publishes its per-country case data on this dashboard:
https://experience.arcgis.com/experience/685d0ace521648f8a5beeeee1b9125cd

Our goal is to scrape the data shown in this chart:
[Figure: the WHO dashboard chart]

Inspect element

First, click any country to open its case chart:

[Figure: the case chart for one country]

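Taking China as an example: after clicking it, open the browser's developer tools, go to the Network panel and find the request URL: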
https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27CHINA%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2Ccum_conf%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true
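The query string is easier to read once it is decoded. The sketch below uses only the standard library's urllib.parse to split the China URL into its parameters; CHINA_URL and decode_query are illustrative names, not part of the original script.

```python
from urllib.parse import urlsplit, parse_qs

CHINA_URL = (
    "https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/"
    "Historic_adm0_v3/FeatureServer/0/query?f=json"
    "&where=ADM0_NAME%3D%27CHINA%27&returnGeometry=false"
    "&spatialRel=esriSpatialRelIntersects"
    "&outFields=OBJECTID%2Ccum_conf%2CDateOfDataEntry"
    "&orderByFields=DateOfDataEntry%20asc"
    "&resultOffset=0&resultRecordCount=2000&cacheHint=true"
)

def decode_query(url):
    """Return the query parameters of url as a dict of decoded single values."""
    parts = urlsplit(url)
    # parse_qs percent-decodes each value and returns a list per key
    return {key: values[0] for key, values in parse_qs(parts.query).items()}

params = decode_query(CHINA_URL)
for key, value in params.items():
    print(f"{key:18} {value}")
```

Decoded, where becomes ADM0_NAME='CHINA' (the filter on the country name), outFields lists the returned columns, orderByFields sorts by date, and resultOffset/resultRecordCount control paging, here up to 2000 rows per request.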

The Preview tab shows:

[Figure: the Preview tab showing the JSON response]
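The response is a JSON object whose features list holds one attributes dict per day, and the script below reads exactly these keys. A minimal sketch of pulling out (timestamp, value) pairs, run on a small hand-written sample; the numbers are invented, not real WHO figures, and extract_series is a hypothetical helper.

```python
import json

# Hypothetical sample shaped like the query response; values are made up
SAMPLE = """
{"features": [
  {"attributes": {"OBJECTID": 1, "cum_conf": 10, "DateOfDataEntry": 1579651200000}},
  {"attributes": {"OBJECTID": 2, "cum_conf": 15, "DateOfDataEntry": 1579737600000}}
]}
"""

def extract_series(payload, field):
    """Return (DateOfDataEntry, field) tuples from a parsed query response."""
    return [
        (feat["attributes"]["DateOfDataEntry"], feat["attributes"][field])
        for feat in payload["features"]
    ]

series = extract_series(json.loads(SAMPLE), "cum_conf")
print(series)  # [(1579651200000, 10), (1579737600000, 15)]
```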

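This is exactly the data we want, but the time format is unfamiliar. Taking the difference between consecutive values reveals the pattern:

Consecutive timestamps differ by 864 followed by five zeros, i.e. 86,400,000, which is the number of milliseconds in one day. So DateOfDataEntry is a Unix timestamp in milliseconds.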
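Since 86,400,000 ms is one day, the dates can be decoded with the standard library. This sketch assumes the values are milliseconds since 1970-01-01 UTC, which is how ArcGIS REST JSON encodes date fields; the sample timestamps are illustrative.

```python
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000

def to_utc_date(ms):
    """Convert an epoch-milliseconds timestamp to a UTC calendar date."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date()

stamps = [1579651200000, 1579737600000, 1579824000000]  # illustrative values
# Difference between consecutive entries; daily data gives MS_PER_DAY each time
diffs = [b - a for a, b in zip(stamps, stamps[1:])]
print(diffs)                                         # [86400000, 86400000]
print([to_utc_date(s).isoformat() for s in stamps])  # ['2020-01-22', '2020-01-23', '2020-01-24']
```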

The URL above returns cumulative confirmed cases (field cum_conf). The URL for new cases (field NewCase) is:
https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27CHINA%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2CNewCase%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true
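The two URLs differ only in outFields (cum_conf versus NewCase), so one helper can build either. This is a sketch assuming the parameter order and values of the URLs above; build_query_url is a hypothetical name. Encoding with urlencode(..., quote_via=quote) also handles multi-word names such as United States of America, which the script below treats as a special case.

```python
from urllib.parse import urlencode, quote

BASE = ("https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/"
        "Historic_adm0_v3/FeatureServer/0/query")

def build_query_url(country, field):
    """Build the query URL for one country; field is 'cum_conf' or 'NewCase'."""
    params = {
        "f": "json",
        "where": f"ADM0_NAME='{country}'",
        "returnGeometry": "false",
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": f"OBJECTID,{field},DateOfDataEntry",
        "orderByFields": "DateOfDataEntry asc",
        "resultOffset": 0,
        "resultRecordCount": 2000,
        "cacheHint": "true",
    }
    # quote (instead of the default quote_plus) encodes spaces as %20, like the site
    return BASE + "?" + urlencode(params, quote_via=quote)

print(build_query_url("CHINA", "NewCase"))
print(build_query_url("United States of America", "cum_conf"))
```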

Using a few countries as an example, here is the code (for now it only handles countries whose names are a single word):

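#coding:utf-8
import urllib.request
import json
import pandas as pd

# Accumulates one column per country, indexed by DateOfDataEntry
res = pd.DataFrame()

def Open(url):
    """Fetch url with a browser User-Agent and return the body as text."""
    heads = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
    req = urllib.request.Request(url, headers=heads)
    # Pass the Request object, not the bare URL, so the header is actually sent
    response = urllib.request.urlopen(req)
    html = response.read()
    return html.decode('utf-8')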

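def conserve(html, name):
    """Append one country's cumulative confirmed counts to res as column `name`."""
    global res
    time, confirm = [], []
    temp = pd.DataFrame(columns=['time', name])
    # Each feature holds one day's record in its 'attributes' dict
    for i in html['features']:
        time.append(i['attributes']['DateOfDataEntry'])  # epoch milliseconds
        confirm.append(i['attributes']['cum_conf'])
    temp['time'] = time
    temp[name] = confirm
    temp = temp.set_index('time')
    # axis=1 aligns the new column with the existing ones on the time index
    res = pd.concat([res, temp], axis=1)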


def main():
    global res
    for name in ['China', 'Italy', 'Spain', 'France', 'Germany', 'Switzerland', 'Netherlands', 'Norway', 'Belgium', 'Sweden', 'Australia', 'Brazil', 'Egypt']:
        print(name)
        url = 'https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27' + name + '%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2Ccum_conf%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true'
        html = json.loads(Open(url))
        conserve(html, name)
        print('--------------------------------------------------------------------------')

    # The US is handled separately: its ADM0_NAME, "United States of America", contains spaces
    name = 'America'
    url = 'https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27United%20States%20of%20America%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2Ccum_conf%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true'
    html = json.loads(Open(url))
    conserve(html, name)

    # Label rows with calendar dates; the range must contain exactly one date per row
    res['Datetime'] = pd.date_range(start='20200122', end='20200316')
    res.to_csv('conform.csv', encoding='utf_8_sig')

if __name__ == '__main__':
    main()
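conserve relies on pd.concat(..., axis=1) aligning each country's column on the shared time index, so a country whose reports start later simply gets NaN for the earlier days. A small illustration with made-up numbers; the frames a and b are hypothetical stand-ins for two countries.

```python
import pandas as pd

# Two hypothetical per-country frames indexed by epoch-ms timestamps
a = pd.DataFrame({"A": [1, 2, 3]}, index=[1579651200000, 1579737600000, 1579824000000])
b = pd.DataFrame({"B": [5, 8]}, index=[1579737600000, 1579824000000])
a.index.name = b.index.name = "time"

# axis=1 places the columns side by side and outer-joins the row labels
table = pd.concat([a, b], axis=1)
print(table)
```

Because of this outer join, res ends up with one row per date that appears for any country, which is also why the hard-coded date range at the end has to match the number of rows.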

After some simple data processing, the result looks like this:

[Figure: the processed result]

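Note: if the line res['Datetime'] = pd.date_range(start='20200122', end='20200316') raises an error, it is because this code was written on March 17, 2020. The date range must contain exactly one date per row of res, so change the end date to the latest date covered by the data when you run it.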
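One way to avoid editing the end date by hand is to decode the time index itself, since it already holds epoch-millisecond timestamps. A sketch, assuming res has been built by conserve as in the script; the small frame here, with made-up numbers, only stands in for it.

```python
import pandas as pd

# Stand-in for the script's res: rows indexed by DateOfDataEntry (epoch ms)
res = pd.DataFrame({"China": [10, 15]}, index=[1579651200000, 1579737600000])

# unit="ms" interprets the integers as milliseconds since 1970-01-01 UTC,
# so the Datetime column always has exactly one entry per row
res["Datetime"] = pd.to_datetime(res.index, unit="ms")
print(res)
```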


Reposted from blog.csdn.net/qq_44315987/article/details/104925840