Obtaining Seismic Information Using Python

At 22:55 on June 17, a magnitude 6.0 earthquake struck Changning County, Yibin, Sichuan Province. The earthquake early warning network, jointly built by the Chengdu Hi-tech Institute for Disaster Reduction and emergency management departments, successfully issued warnings 10 seconds ahead of the shaking in Yibin and 61 seconds ahead in Chengdu.

Today I'd like to share a Python web-crawling project. Instead of the regular expressions everyone dislikes, it uses a parsing library to parse HTML and extract information, and finally uses the wxpy library from an earlier project to implement all the features: fetching earthquake information with Python and pushing it in real time to your WeChat groups or friends.

1. Preparation

1. Basic web-crawling knowledge, such as the requests and lxml libraries;

2. Using XPath to parse HTML;

Some earlier, simple projects used regular expressions to extract information from pages, but as a project grows more complex, regular expressions become cumbersome; one mistake can make a match fail, so extracting page information with them is somewhat inconvenient.

From recent study, I learned that one or more nodes in a web page can be located with CSS or XPath selectors, and then the appropriate method can be called to get their text content or attributes, which makes it quick and easy to extract the information we want.

3. The wxpy library, needed for real-time WeChat push;

4. The project crawls earthquake information from the China Earthquake Networks Center; the link is http://news.ceic.ac.cn/index.html?time={int(time.time())}, where the time query parameter is filled in at request time.
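The XPath extraction described above can be sketched offline. The HTML fragment below is a simplified stand-in for the CEIC page (the exact markup is an assumption; only the `id="news"` table structure mirrors the XPath used later):

```python
from lxml import etree

# Simplified stand-in for the CEIC earthquake list page
html = """
<div id="news"><table>
  <tr><td>5.8</td><td>2019-06-17 23:36:01</td><td>28.27</td><td>104.97</td>
      <td>16</td><td>Changning, Yibin, Sichuan</td></tr>
</table></div>
"""

tree = etree.HTML(html)
# //*[@id="news"]//td//text() collects the text of every <td> under the #news node
fields = tree.xpath('//*[@id="news"]//td//text()')
print(fields)
```

The result is a flat Python list of cell texts, which is exactly the shape the main script below indexes into.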


2. Code Integration

import time

import requests
from lxml import etree
from wxpy import *

# Log in to WeChat (a QR code pops up to scan)
bot = Bot()

# Find the friend or group to push to -- replace 'stormwen' with your own
my_friend = bot.friends().search('stormwen')[0]

# log.txt stores the time of the last pushed earthquake;
# create an empty log.txt before the first run
with open('log.txt', 'r') as f:
    rember = f.readline()

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
    'cookie': 'Hm_lvt_e0025cd5d352165f8a646ccea5beb27d=1543211803; Hm_lpvt_e0025cd5d352165f8a646ccea5beb27d=1543211803',
}

while True:
    try:
        url = f'http://news.ceic.ac.cn/index.html?time={int(time.time())}'
        # Request the page (re-encode to fix the mis-declared charset)
        res = requests.get(url, headers=headers).text.encode('ISO-8859-1').decode('utf8')
        html_ele = etree.HTML(res)
        # Flat list of cell texts: magnitude, time, latitude, longitude, depth, location, ...
        res = html_ele.xpath('//*[@id="news"]//td//text()')
        # If the log is empty, send the latest earthquake entry
        if rember == '':
            msg = (f'Beijing time: {res[1]}, a magnitude {res[0]} earthquake occurred at '
                   f'latitude {res[2]}, longitude {res[3]}, focal depth {res[4]} km, '
                   f'reference location: {res[5]} (updated every 5 minutes)')
            # Send the message
            my_friend.send(msg)
            print('Log empty, msg:', msg)
        # If the log is non-empty, send every entry newer than the logged one
        else:
            i = res.index(rember)
            while i > 1:
                i -= 6
                msg = (f'Beijing time: {res[i]}, a magnitude {res[i-1]} earthquake occurred at '
                       f'latitude {res[i+1]}, longitude {res[i+2]}, focal depth {res[i+3]} km, '
                       f'reference location: {res[i+4]} (updated every 5 minutes)')
                # Send the message
                my_friend.send(msg)
                print('Log non-empty, msg:', msg)
        time.sleep(300)
        rember = res[1]
        # Update the log (record the time of the latest pushed earthquake)
        with open('log.txt', 'w') as f:
            f.write(res[1])
    except Exception:
        time.sleep(60)
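The index arithmetic above relies on each earthquake occupying six consecutive entries in the flattened list (magnitude, time, latitude, longitude, depth, location). A small helper — hypothetical, not part of the original script — makes that grouping explicit:

```python
def group_quakes(fields, record_size=6):
    """Split the flat list of <td> texts into per-earthquake records."""
    records = []
    for start in range(0, len(fields) - record_size + 1, record_size):
        mag, when, lat, lon, depth, place = fields[start:start + record_size]
        records.append({'magnitude': mag, 'time': when, 'latitude': lat,
                        'longitude': lon, 'depth_km': depth, 'location': place})
    return records

# Two fabricated records in the flat layout the XPath query returns
flat = ['5.8', '2019-06-17 23:36:01', '28.27', '104.97', '16', 'Changning',
        '6.0', '2019-06-17 22:55:43', '28.34', '104.90', '16', 'Changning']
print(group_quakes(flat)[0])  # the newest earthquake comes first
```

Working with such dictionaries instead of raw offsets would make the message-building loop harder to get wrong, at the cost of a few extra lines.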

3. Results

4. Summary

I have always believed that a language is just a tool; its value only shows when you use it to do something concrete. Today's project used Python's crawling knowledge, avoided the regular expressions everyone dislikes, and instead used a parsing library to parse HTML and extract information; finally, it used the wxpy library from an earlier project to implement all the features.


Origin blog.csdn.net/huasdsadsa/article/details/93916051