Python Web Scraping: Fetching Weather Information (2)

Contents

1. Introduction

2. Request Information

3. Writing the Spider

4. Testing and Verification


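1. Introduction

This article continues from Python Web Scraping: Fetching Weather Information (1) and shows how to fetch today's weather information for a given region.

Python Web Scraping: Fetching Weather Information (1), 代码写不完了的博客 on CSDN

You can also visit my homepage for more articles:

代码写不完了的博客 on CSDN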

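2. Request Information

(1) As shown in the figure below, we locate the response that contains today's weather information:

(2) The corresponding header information is shown below:

Note that the request URL closely resembles the URL used to fetch real-time weather in Chapter 3. The two differ only in the path segment (dingzhi versus sk_2d), so take care to tell them apart.

Today's weather URL: http://d1.weather.com.cn/dingzhi/101190101.html?_=1687251340643

Real-time weather URL: http://d1.weather.com.cn/sk_2d/101190101.html?_=1687251340642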
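
Because the two URLs are so easy to confuse, it can help to build them from one helper that names the endpoint explicitly. The following is a minimal sketch, not part of the original spider: the function name `build_weather_url` is hypothetical, and treating the `_` query parameter as a millisecond timestamp is an assumption based on the shape of the values above.

```python
import time
from typing import Optional

BASE = "http://d1.weather.com.cn"


def build_weather_url(kind: str, area_id: str, ts_ms: Optional[int] = None) -> str:
    """Build a URL for 'dingzhi' (today's weather) or 'sk_2d' (real-time weather)."""
    if kind not in ("dingzhi", "sk_2d"):
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    if ts_ms is None:
        # Assumption: "_" carries the current time in milliseconds.
        ts_ms = int(time.time() * 1000)
    return f"{BASE}/{kind}/{area_id}.html?_={ts_ms}"


# Reproduce the two URLs listed above.
today_url = build_weather_url("dingzhi", "101190101", 1687251340643)
realtime_url = build_weather_url("sk_2d", "101190101", 1687251340642)
print(today_url)
print(realtime_url)
```

Rejecting unknown endpoint names up front means a typo in the path segment fails immediately instead of silently requesting the wrong page.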

3. Writing the Spider

(1) Write the spider that fetches today's weather, dingzhi_weather_spider.py:

'''
Scrape the weather for today.
'''

import re
import requests
import json
import datetime

# Request headers sent with every call: a Referer and a browser User-Agent
UA = {
    'Referer': 'http://www.weather.com.cn/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.43'
}

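class GetDingZhiWeather():

    # The method takes no self argument, so declare it static
    @staticmethod
    def get_dingzhi_weather(area_id):

        # Request URL
        url = f'http://d1.weather.com.cn/dingzhi/{area_id}.html'

        # Send the request; the timeout keeps a stalled connection from hanging forever
        req = requests.get(url, headers=UA, timeout=10)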

        # print(req.text)
        # print(req)
        if req.status_code == 200:
            # Decode the response body as UTF-8
            req.encoding = 'utf-8'
            # Get the current date
            today = datetime.date.today()

            # Match today's weather information
            dingzhi_weather = re.search(r'(\{"city".*?\})', req.text)
            # Match the weather alert information
            alarm_weather = re.search(r'(\{"w1".*?\})', req.text)

            # Today's weather information
            weather_info = ''
            # Weather alert information
            alarm_info = ''

            if dingzhi_weather:
                # Convert the JSON string into the corresponding Python object
                weather_json = json.loads(dingzhi_weather.group())

                weather_info = f'''
                Date: {today}
                Region: {weather_json['cityname']}
                Weather today: {weather_json['weather']}
                High temperature: {weather_json['temp']}
                Low temperature: {weather_json['tempn']}
                Wind direction today: {weather_json['wd']}
                Wind force today: {weather_json['ws']}
                '''

            if alarm_weather:
                # Convert the JSON string into the corresponding Python object
                alarm_json = json.loads(alarm_weather.group())

                alarm_info = f'''
                Alert region: {alarm_json['w1']}
                Alert type: {alarm_json['w13']}
                Issued at: {alarm_json['w8']}
                Alert details: {alarm_json['w9']}
                '''

            return weather_info + alarm_info

        else:

            return "Data request failed"
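
The regex-then-JSON step in the spider can be tried offline, without touching the network. The sketch below is only an illustration: the `SAMPLE` body is hypothetical (its layout and values are assumptions, and the real response may carry different or additional fields), and `extract_object` is a helper name introduced here, not part of the original code.

```python
import json
import re
from typing import Optional

# Hypothetical response body; real values and extra fields will differ.
SAMPLE = (
    'var cityDZ101190101 ={"weatherinfo":{"city":"101190101","cityname":"Nanjing",'
    '"temp":"30","tempn":"22","weather":"Cloudy","wd":"East","ws":"<3"}};'
    'var alarmDZ101190101 ={"w":[{"w1":"Jiangsu","w13":"Heat",'
    '"w8":"2023-06-20 10:00","w9":"..."}]}'
)


def extract_object(text: str, first_key: str) -> Optional[dict]:
    """Return the first flat JSON object whose first key is first_key, or None."""
    # Non-greedy match from the opening brace to the nearest closing brace.
    match = re.search(r'\{"' + re.escape(first_key) + r'".*?\}', text)
    return json.loads(match.group()) if match else None


weather = extract_object(SAMPLE, "city")
alarm = extract_object(SAMPLE, "w1")
missing = extract_object(SAMPLE, "nope")

print(weather["cityname"], weather["temp"], weather["tempn"])
print(alarm["w1"], alarm["w13"])
print(missing)
```

The non-greedy pattern stops at the first closing brace, so it only works while the matched object is flat and no string value contains a `}`. It also returns only the first alert; `re.findall` with the same pattern would be one way to collect several.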

(2) Write the test code, dingzhi_weather_test.py:

from spider.dingzhi_weather_spider import GetDingZhiWeather

if __name__ == '__main__':

    # Call get_dingzhi_weather to fetch today's weather for area ID 101130501
    weather = GetDingZhiWeather.get_dingzhi_weather(101130501)
    print(weather)

4. Testing and Verification

(1) Run the test code dingzhi_weather_test.py and check the console output:

The console shows today's weather for area ID 101130501, as expected.


Reposted from blog.csdn.net/spx_0108/article/details/131310792