Python Crawler Project: Scraping Minimum Temperatures from China Weather Network

Explanation: this project replaces checking the day's temperatures across the country on a webpage or through other channels. It only scrapes the lowest temperature for each region of the country; further features can be built on top of it.

Let's first look at the final result:

Modules used:

  1. requests
  2. pyecharts
  3. bs4

One point must be explained here (and it matters): most of us run Python 3.6 or above, and the `Bar` API in pyecharts changed between releases, so the import used below can fail. To solve this, either pin the older release or upgrade; note that pyecharts 1.x only supports Python 3.6+.

Run the following three commands from cmd:


1. pip install wheel
 
2. pip install pyecharts==0.1.9.4  or  pip install pyecharts -U
 
3. pip install pyecharts-snapshot

Note: this problem is caused by the pyecharts version; I confirmed it by reading the official GitHub repository.
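Because the two pyecharts lines expose `Bar` from different locations (0.x at the package top level, 1.x under `pyecharts.charts`), a small import guard lets the same script run under either. This is a hedged sketch, not part of the original code:

```python
# Handle both documented pyecharts APIs:
# 0.x exposes Bar at the top level, 1.x moved chart classes
# into pyecharts.charts.
try:
    from pyecharts.charts import Bar  # pyecharts >= 1.0
    PYECHARTS_V1 = True
except ImportError:
    try:
        from pyecharts import Bar     # pyecharts 0.x (e.g. 0.1.9.4)
        PYECHARTS_V1 = False
    except ImportError:
        Bar = None                    # pyecharts not installed at all
        PYECHARTS_V1 = None
```

Keep in mind the 1.x `Bar` is also configured differently (chained `add_xaxis`/`add_yaxis` calls instead of a single `add`), so pinning 0.1.9.4 as above is the path of least change for this article's code.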

Here is the complete code (the detailed walk-through will follow in later articles):

import requests
from bs4 import BeautifulSoup
from pyecharts import Bar  # pyecharts 0.x API; see the version note above
# Target site: http://www.weather.com.cn/textFC/hb.shtml

ALL_DATA = []

def parse_url(url):
    # Replace this with the User-Agent of your own browser
    headers = {'User-Agent': 'replace this with your own User-Agent'}
    response = requests.get(url, headers=headers)
    text = response.content.decode('utf-8')
    # html5lib tolerates the site's malformed table markup
    soup = BeautifulSoup(text, 'html5lib')
    con_midtab = soup.find('div', class_='conMidtab')
    tables = con_midtab.find_all('table')
    for table in tables:
        trs = table.find_all('tr')[2:]  # skip the two header rows
        for index, tr in enumerate(trs):
            tds = tr.find_all('td')
            city_td = tds[0]
            # In the first data row of each table, tds[0] holds the
            # province name, so the city name sits in tds[1]
            if index == 0:
                city_td = tds[1]
            city = list(city_td.stripped_strings)[0]
            tempt_td = tds[-2]  # second-to-last cell: minimum temperature
            min_tempt = list(tempt_td.stripped_strings)[0]
            ALL_DATA.append({'city': city, 'min_tempt': int(min_tempt)})
    ALL_DATA.sort(key=lambda x: x['min_tempt'])
    data = ALL_DATA[0:20]  # keep the 20 coldest cities
    cities = list(map(lambda x: x['city'], data))
    tempts = list(map(lambda x: x['min_tempt'], data))
    chart = Bar("Real-time minimum temperatures in North China")
    chart.add("", cities, tempts)
    chart.render('realtime_weather.html')


def get_url():
    urls = ['http://www.weather.com.cn/textFC/hb.shtml',
            'http://www.weather.com.cn/textFC/db.shtml',
            'http://www.weather.com.cn/textFC/hd.shtml',
            'http://www.weather.com.cn/textFC/hz.shtml',
            'http://www.weather.com.cn/textFC/hn.shtml',
            'http://www.weather.com.cn/textFC/xb.shtml',
            'http://www.weather.com.cn/textFC/xn.shtml',
            'http://www.weather.com.cn/textFC/gat.shtml'
            ]
    for url in urls:
        parse_url(url)
        break  # to save time, we only fetch the first region (North China)


if __name__ == '__main__':
    get_url()

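The selection step in the code above (sort all scraped rows by minimum temperature, then keep the coldest 20 for the chart axes) can be isolated and checked without any scraping. A minimal sketch with made-up city records (the sample data here is hypothetical, not scraped):

```python
def coldest(records, n=20):
    """Return the n records with the lowest 'min_tempt', coldest first."""
    return sorted(records, key=lambda r: r['min_tempt'])[:n]

# Hypothetical sample in the same shape the crawler appends to its list
sample = [
    {'city': 'Shijiazhuang', 'min_tempt': 2},
    {'city': 'Beijing', 'min_tempt': -3},
    {'city': 'Tianjin', 'min_tempt': -1},
]

top = coldest(sample, n=2)
cities = [r['city'] for r in top]       # x-axis labels for the bar chart
tempts = [r['min_tempt'] for r in top]  # y-axis values
# cities -> ['Beijing', 'Tianjin'], tempts -> [-3, -1]
```

Working with a returned list rather than sorting a module-level accumulator in place also avoids the global `ALL_DATE`-style state growing across calls if `parse_url` is ever invoked for several regions.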
Open the generated HTML file in a browser and the chart displays normally.


Origin blog.csdn.net/m0_48915964/article/details/115576168