[python] Get historical weather data

This post is part of my Python learning log.
Today I did not write a crawler to get the data. Instead, because the target page presents its data as an HTML table, I used pd.read_html() to read the table directly.

Preparation

Environment : Python 3 (any recent 3.x version). The main libraries needed are pandas and openpyxl (for saving the data).
Target website : the historical weather pages of 2345 Weather King (tianqi.2345.com)

After writing this post, I also searched online for related approaches and found many similar or more concise methods, so I am linking one here for everyone to learn from. There is still a long road ahead~

Reference blog post : [Python Strange Skills] Using the read_html function of pandas to implement a web crawler with only one line of code

Note : When pulling data from any website, avoid sending many requests to its server in a short period of time. This shows respect for the site that provides the data; do not repay its kindness with harm.
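
One way to follow this advice is to pause between consecutive requests. The sketch below is only an illustration: fetch_politely, its parameters, and the 3-second delay are hypothetical names and values chosen for this example, not part of pandas or of the original code. The demonstration uses stand-in functions so that nothing is actually requested.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append(fetch(url))
    return results


# Offline demonstration with stand-ins: the "fetch" just upper-cases the
# string, and the "sleep" records the requested pause instead of waiting.
pauses = []
pages = fetch_politely(
    ["page-1", "page-2", "page-3"],
    fetch=lambda url: url.upper(),
    delay_seconds=3.0,
    sleep=pauses.append,
)
print(pages)   # ['PAGE-1', 'PAGE-2', 'PAGE-3']
print(pauses)  # [3.0, 3.0]
```

In real use, one could pass something like fetch=lambda url: pd.read_html(url)[0] and leave sleep at its default, so that each page is read only after the delay has passed.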

Full code

The code reads the table from the web page and stores the data directly in an Excel sheet.

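# import lxml  # pd.read_html uses lxml as its default parser, so it must be installed
from openpyxl import load_workbook
import pandas as pd

url = "https://tianqi.2345.com/wea_history/58453.htm"  # target URL
tables = pd.read_html(url)[0]  # first <table> on the page, as a DataFrame

# Prepare storage
workbook = load_workbook(filename='数据.xlsx')  # open the workbook (the file must already exist)
sheet = workbook.create_sheet('天气')  # create a new worksheet named "天气" (weather)
sheet.append(list(tables))  # header row: the column names
for i in range(len(tables)):  # read each row in turn
    row = tables.iloc[i, :].tolist()  # convert the row to plain Python values
    print(row)  # print for inspection
    sheet.append(row)
workbook.save(filename="数据.xlsx")  # save the workbook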
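
As one of the more concise alternatives mentioned earlier, pandas can write a DataFrame to Excel itself through DataFrame.to_excel. The sketch below assumes openpyxl is installed and uses a small hand-made DataFrame with made-up values as a stand-in for the one returned by pd.read_html; the column names, sheet name, and temporary file path are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the DataFrame returned by pd.read_html(url)[0] (made-up values).
weather = pd.DataFrame(
    {
        "date": ["2022-07-01", "2022-07-02"],
        "high": [33, 35],
        "low": [26, 27],
    }
)

# Write to a temporary location so the example does not touch real files.
output_path = os.path.join(tempfile.mkdtemp(), "weather.xlsx")

# index=False leaves out the 0, 1, 2, ... row labels.
weather.to_excel(output_path, sheet_name="weather", index=False, engine="openpyxl")

# Read the sheet back to check what was stored.
saved = pd.read_excel(output_path, sheet_name="weather")
print(saved)
```

Note that to_excel with a file path creates or overwrites the file, whereas the code above adds a sheet to an existing workbook. To keep that behaviour with pandas, pd.ExcelWriter can be opened with mode="a" and engine="openpyxl".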

Code Analysis

Note that pd.read_html(url) only picks up data that sits inside a <table> tag on the web page.
The target page presents its weather data as a table, so that data can be read directly with pd.read_html(url).
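
To make the <table> restriction concrete, here is a small offline sketch. The HTML fragment and its values are made up for illustration, and the example assumes lxml (or another parser supported by pandas) is installed. The text inside the <div> is ignored; only the <table> becomes a DataFrame.

```python
from io import StringIO

import pandas as pd

# Hypothetical page fragment: one <table> plus some data in a plain <div>.
html = """
<html><body>
  <div class="summary">Average high: 34</div>
  <table>
    <thead>
      <tr><th>date</th><th>high</th><th>low</th></tr>
    </thead>
    <tbody>
      <tr><td>2022-07-01</td><td>33</td><td>26</td></tr>
      <tr><td>2022-07-02</td><td>35</td><td>27</td></tr>
    </tbody>
  </table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> element found.
tables = pd.read_html(StringIO(html))
weather = tables[0]

print(len(tables))            # number of tables found
print(list(weather.columns))  # column names taken from the <thead> row
print(weather)
```

This is also why the full code indexes the result with [0]: read_html always returns a list, even when the page contains a single table.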

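According to reference material:
Pandas can generate an HTML table directly from a DataFrame, and it can likewise read HTML files.
The read_html() function parses an HTML page and looks for HTML tables.
If it finds any, it converts them into DataFrame objects that can be used directly for data analysis.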
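
Both directions can be tried in a short round trip: DataFrame.to_html() produces the table markup and pd.read_html() parses it back. This is a minimal sketch with made-up values, assuming lxml (or another parser supported by pandas) is installed.

```python
from io import StringIO

import pandas as pd

# Small hand-made DataFrame (made-up values).
df = pd.DataFrame({"date": ["2022-07-01", "2022-07-02"], "high": [33, 35]})

# DataFrame -> HTML: to_html returns the <table> markup as a string.
html = df.to_html(index=False)
print(html)

# HTML -> DataFrame: read_html finds the table and rebuilds the DataFrame.
restored = pd.read_html(StringIO(html))[0]
print(restored)
```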

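This makes read_html() a very powerful function: when the data already sits in an HTML table, a single call takes the place of a hand-written crawler.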

Feel free to share ideas so we can learn from each other.


Origin blog.csdn.net/CBCY_csdn/article/details/125735198