Automatically crawling COVID-19 epidemic data with Python and e-mailing it to yourself every day (no selenium)

I. Introduction

This winter has been long and exhausting; I have probably not left the house for two weeks. To give myself something to do, I spent two days playing with a web crawler. If anything here is not rigorous, I would welcome corrections.

II. Project

The project is a crawler that runs on a schedule every morning, fetches the day's epidemic data, and sends it to my mailbox, so that I can see the latest situation as soon as I wake up. For simplicity, I only crawl three cities, Wuhan, Haikou and Nanjing, together with their provinces (Hubei, Hainan and Jiangsu): the first is the main battleground of the epidemic, the second is where I currently live, and the third is where my school is. The principle is the same for any other place you want to crawl.

III. Principle

The project is divided into three parts: the crawler, the e-mail, and the cloud server. I will introduce them one by one.

1. Crawler

I use the following libraries:

import urllib
from bs4 import BeautifulSoup
import json

urllib fetches the source code of the page, bs4 parses the HTML tags, and json converts the string obtained after parsing into a list. Note that the code in this post is Python 2; in Python 3, urlopen lives in urllib.request.

My data source is DXY (Lilac Garden), whose epidemic page displays the relevant data. In the page source, the data for each region sits in a script tag whose id is "getAreaStat", which is easy to locate with the findAll method.

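# Fetch the page source (Python 2 urllib)
html = urllib.urlopen("https://ncov.dxy.cn/ncovh5/view/pneumonia?scene=2&clicktime=1579582238&enterid=1579582238&from=singlemessage&isappinstalled=0")
soup = BeautifulSoup(html.read(), "html.parser")  # name the parser explicitly

# Locate the script tag with id "getAreaStat" that holds the per-region data
block = soup.findAll("script",{"id":"getAreaStat"})
block0 = block[0]
blo_text = block0.get_text()
# Cut off the 27-character head and the 11-character tail around the JSON list
goal_text = blo_text[27:-11]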

Here block is of type bs4.element.ResultSet and block0 is of type bs4.element.Tag; the Tag is what we need (my knowledge of bs4 is not deep, so I will not go into detail). Extracting its text gives a string that contains the list of per-province data we want, but the string also has a head and a tail that we do not need. Slicing them off gives goal_text.
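
The fixed offsets 27 and -11 break as soon as the wrapper around the JSON changes. As a hedged alternative, the sketch below cuts from the first "[" to the last "]" instead. The function name extract_json_array and the sample string are illustrative only; the wrapper text in the sample is an assumption inferred from the slice offsets, not copied from the site, and the numbers are made up.

```python
import json


def extract_json_array(script_text):
    """Return the JSON array embedded in script_text as a Python list."""
    start = script_text.find('[')   # first opening bracket
    end = script_text.rfind(']')    # last closing bracket
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON array found in script text")
    return json.loads(script_text[start:end + 1])


# Assumed wrapper: 27 characters before the list and 11 after it.
sample = ('try { window.getAreaStat = '
          '[{"provinceShortName": "X", "confirmedCount": 3, "cities": []}]'
          '}catch(e){}')

areas = extract_json_array(sample)
print(areas[0]["confirmedCount"])
```

On this sample the result matches json.loads(sample[27:-11]), but the bracket search keeps working if the head or tail changes length.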

The JSON string is then converted into a Python list.

goal_list = json.loads(goal_text)

This list has 34 elements, corresponding to the 34 provincial-level regions (provinces, municipalities, autonomous regions and special administrative regions). Each element is a dictionary containing the province name, the confirmed count, a 'cities' key and so on. The value under 'cities' is itself a list with one dictionary per city.

In short, once the data structure is clear, we can find the cities we want by traversing it.

for i in range(len(goal_list)):  # the Chinese names are written as unicode escapes
    if(goal_list[i]['provinceShortName']==u'\u6e56\u5317'):
        hubei_num = i  # Hubei
    if(goal_list[i]['provinceShortName']==u'\u6d77\u5357'):
        hainan_num = i  # Hainan
    if(goal_list[i]['provinceShortName']==u'\u6c5f\u82cf'):
        jiangsu_num = i  # Jiangsu

# confirmed counts for the three provinces
hubei_confirmedCount = goal_list[hubei_num]['confirmedCount']
hainan_confirmedCount = goal_list[hainan_num]['confirmedCount']
jiangsu_confirmedCount = goal_list[jiangsu_num]['confirmedCount']

# Wuhan
for i in range(len(goal_list[hubei_num]['cities'])):
    if(goal_list[hubei_num]['cities'][i]['cityName']==u'\u6b66\u6c49'):
        wuhan_num = i
wuhan_confirmedCount = goal_list[hubei_num]['cities'][wuhan_num]['confirmedCount']

# Haikou
for i in range(len(goal_list[hainan_num]['cities'])):
    if(goal_list[hainan_num]['cities'][i]['cityName']==u'\u6d77\u53e3'):
        haikou_num = i
haikou_confirmedCount = goal_list[hainan_num]['cities'][haikou_num]['confirmedCount']

# Nanjing
for i in range(len(goal_list[jiangsu_num]['cities'])):
    if(goal_list[jiangsu_num]['cities'][i]['cityName']==u'\u5357\u4eac'):
        nanjing_num = i
nanjing_confirmedCount = goal_list[jiangsu_num]['cities'][nanjing_num]['confirmedCount']
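
The loops above repeat the same search three times, and they leave the index variable undefined if a name is missing from the data. One possible way to fold them into a helper is sketched below. The names find_by_name and confirmed_counts are hypothetical, and the sample data uses made-up numbers; only the keys provinceShortName, cities, cityName and confirmedCount come from the code above.

```python
def find_by_name(items, key, name):
    """Return the first dict in items whose value under key equals name, else None."""
    for item in items:
        if item.get(key) == name:
            return item
    return None


def confirmed_counts(goal_list, province_name, city_name):
    """Return (province confirmed count, city confirmed count)."""
    province = find_by_name(goal_list, 'provinceShortName', province_name)
    if province is None:
        raise KeyError(province_name)
    city = find_by_name(province.get('cities', []), 'cityName', city_name)
    if city is None:
        raise KeyError(city_name)
    return province['confirmedCount'], city['confirmedCount']


# Made-up sample with the same nesting as goal_list.
sample = [
    {'provinceShortName': u'\u6e56\u5317', 'confirmedCount': 100,
     'cities': [{'cityName': u'\u6b66\u6c49', 'confirmedCount': 60}]},
]

hubei_count, wuhan_count = confirmed_counts(sample, u'\u6e56\u5317', u'\u6b66\u6c49')
print(hubei_count, wuhan_count)
```

Raising KeyError with the missing name makes a renamed or absent region fail with a clear message instead of a NameError later in the script.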


2. E-mail

The e-mail code is fairly routine, so I will only introduce it briefly.

Two libraries are used, smtplib and email: the former handles the transport protocol and sends the mail, the latter is used to compose the message.

import smtplib
from email.mime.text import MIMEText
from email.header import Header

Write the message content. I write it in HTML format so that it looks better.

message = MIMEText('<html><body><h1>今日疫情</h1>' +
                   '<p><b>湖北确诊人数:'+str(hubei_confirmedCount) + '</b>' +
                   '<br>&nbsp;&nbsp;&nbsp;&nbsp;武汉确诊人数:'+str(wuhan_confirmedCount) + '</p>' +
                   '<p><b>海南确诊人数:'+str(hainan_confirmedCount) + '</b>' +
                   '<br>&nbsp;&nbsp;&nbsp;&nbsp;海口确诊人数:'+str(haikou_confirmedCount) + '</p>' +
                   '<p><b>江苏确诊人数:'+str(jiangsu_confirmedCount) + '</b>' +
                   '<br>&nbsp;&nbsp;&nbsp;&nbsp;南京确诊人数:'+str(nanjing_confirmedCount) + '</p>' +
                   '</body></html>','html','utf-8')
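
If more regions are added later, the long string concatenation gets hard to maintain. As a rough sketch, the same layout could be built from a list of rows. The function name build_report_html and the numbers in the example are made up for illustration.

```python
def build_report_html(rows):
    """Build the HTML body from (province, province_count, city, city_count) rows."""
    parts = ['<html><body><h1>今日疫情</h1>']
    for province, province_count, city, city_count in rows:
        # Province line in bold, then the indented city line.
        parts.append('<p><b>{0}确诊人数:{1}</b>'.format(province, province_count))
        parts.append('<br>&nbsp;&nbsp;&nbsp;&nbsp;{0}确诊人数:{1}</p>'.format(city, city_count))
    parts.append('</body></html>')
    return ''.join(parts)


# Made-up numbers, only to show the call.
body = build_report_html([('湖北', 100, '武汉', 60),
                          ('海南', 10, '海口', 4)])
print(body)
```

The resulting string can be passed to MIMEText(body, 'html', 'utf-8') in place of the concatenated literal.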

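Then send the message in four steps.

server = smtplib.SMTP()
server.connect('smtp.qq.com',587)  # SMTP server and port; the port is fixed
server.login(from_addr,'xxx')  # the second argument is the mailbox authorization code, which you get from your mailbox settings
server.sendmail(from_addr,to_addr,message.as_string())  # the sender and the recipient are both my own address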
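
For readers on Python 3, a hedged variant of the sending step is sketched below. It assumes the server accepts STARTTLS on port 587, which is an assumption about the mail provider rather than something shown in this post. The function names build_message and send_report and the example.com addresses are placeholders.

```python
import smtplib
from email.header import Header
from email.mime.text import MIMEText


def build_message(html_body, from_addr, to_addr, subject):
    """Wrap an HTML body in a MIME message with the usual headers."""
    message = MIMEText(html_body, 'html', 'utf-8')
    message['From'] = from_addr
    message['To'] = to_addr
    message['Subject'] = Header(subject, 'utf-8')
    return message


def send_report(message, from_addr, to_addr, auth_code,
                host='smtp.qq.com', port=587):
    """Send the message; assumes the server supports STARTTLS on this port."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()  # upgrade the connection to TLS before logging in
        server.login(from_addr, auth_code)
        server.sendmail(from_addr, [to_addr], message.as_string())
    # Leaving the with-block sends QUIT and closes the connection.


# Building the message needs no network access; the addresses are placeholders.
demo = build_message('<p>hello</p>', 'sender@example.com',
                     'receiver@example.com', 'Daily epidemic report')
```

Calling send_report(demo, 'sender@example.com', 'receiver@example.com', auth_code) would then perform the actual delivery, with auth_code being the mailbox authorization code.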

For the remaining details, see the complete code in Part IV.

3. Cloud server

For the program to run automatically at a set time every day, it has to be hosted on a cloud server. I use an Alibaba Cloud student server, which is cheap and good enough.

To run the program on a schedule, open:
Control Panel → System and Security → Administrative Tools → Scheduled Tasks → Task Scheduler (Local) → Task Scheduler Library → Create Task
Then fill in the task name, the trigger, and the action to perform. For the action, you can follow my settings:
[Screenshot: the author's settings for the action]

IV. Complete code

# -*- coding: UTF-8 -*-

import urllib
from bs4 import BeautifulSoup
import json
import smtplib
from email.mime.text import MIMEText
from email.header import Header


html = urllib.urlopen("https://ncov.dxy.cn/ncovh5/view/pneumonia?scene=2&clicktime=1579582238&enterid=1579582238&from=singlemessage&isappinstalled=0")
soup = BeautifulSoup(html.read(), "html.parser")  # name the parser explicitly

block = soup.findAll("script",{"id":"getAreaStat"})
block0 = block[0]
blo_text = block0.get_text()
goal_text = blo_text[27:-11]

#goal_list = eval(goal_text.decode())
goal_list = json.loads(goal_text)

for i in range(len(goal_list)):  # the Chinese names are written as unicode escapes
    if(goal_list[i]['provinceShortName']==u'\u6e56\u5317'):
        hubei_num = i  # Hubei
    if(goal_list[i]['provinceShortName']==u'\u6d77\u5357'):
        hainan_num = i  # Hainan
    if(goal_list[i]['provinceShortName']==u'\u6c5f\u82cf'):
        jiangsu_num = i  # Jiangsu

# confirmed counts for the three provinces
hubei_confirmedCount = goal_list[hubei_num]['confirmedCount']
hainan_confirmedCount = goal_list[hainan_num]['confirmedCount']
jiangsu_confirmedCount = goal_list[jiangsu_num]['confirmedCount']

# Wuhan
for i in range(len(goal_list[hubei_num]['cities'])):
    if(goal_list[hubei_num]['cities'][i]['cityName']==u'\u6b66\u6c49'):
        wuhan_num = i
wuhan_confirmedCount = goal_list[hubei_num]['cities'][wuhan_num]['confirmedCount']

# Haikou
for i in range(len(goal_list[hainan_num]['cities'])):
    if(goal_list[hainan_num]['cities'][i]['cityName']==u'\u6d77\u53e3'):
        haikou_num = i
haikou_confirmedCount = goal_list[hainan_num]['cities'][haikou_num]['confirmedCount']

# Nanjing
for i in range(len(goal_list[jiangsu_num]['cities'])):
    if(goal_list[jiangsu_num]['cities'][i]['cityName']==u'\u5357\u4eac'):
        nanjing_num = i
nanjing_confirmedCount = goal_list[jiangsu_num]['cities'][nanjing_num]['confirmedCount']



#################################################email


from_addr = '[email protected]'
to_addr = '[email protected]'

message = MIMEText('<html><body><h1>今日疫情</h1>' +
                   '<p><b>湖北确诊人数:'+str(hubei_confirmedCount) + '</b>' +
                   '<br>&nbsp;&nbsp;&nbsp;&nbsp;武汉确诊人数:'+str(wuhan_confirmedCount) + '</p>' +
                   '<p><b>海南确诊人数:'+str(hainan_confirmedCount) + '</b>' +
                   '<br>&nbsp;&nbsp;&nbsp;&nbsp;海口确诊人数:'+str(haikou_confirmedCount) + '</p>' +
                   '<p><b>江苏确诊人数:'+str(jiangsu_confirmedCount) + '</b>' +
                   '<br>&nbsp;&nbsp;&nbsp;&nbsp;南京确诊人数:'+str(nanjing_confirmedCount) + '</p>' +
                   '</body></html>','html','utf-8')

message['From'] = Header(from_addr)
message['To'] = Header(to_addr)
message['Subject'] = Header(u'新冠肺炎疫情自动通报')

server = smtplib.SMTP()
server.connect('smtp.qq.com',587)  # SMTP server and port; the port is fixed
server.login(from_addr,'xxx')  # the second argument is the mailbox authorization code, which you get from your mailbox settings
server.sendmail(from_addr,to_addr,message.as_string())

server.quit()


V. Results

[Screenshot: the resulting e-mail]

Origin blog.csdn.net/qq_38679612/article/details/104264843