Foreword
(。・∀・)ノ゙Hi everyone
Let me chat with you again~
The temperature has dropped sharply lately, and I ended up with a cold and a stomachache for a day. I don't know what it's like where you are, but in Changsha the temperature fell off a cliff, and it has been miserable.
So I'm pulling out an earlier crawler + data analysis case to show you how to collect weather data, with a bit of data analysis along the way (this can also serve as a final project or a classroom assignment).
Knowledge points
- Dynamic data capture
- Sending requests with requests
- Structured + unstructured data analysis
Development environment
- Python 3.8, to run the code
- PyCharm 2022.3.2 Professional Edition, to help write the code
- requests, to send requests: pip install requests
- parsel, to parse data: pip install parsel
How to install third-party Python modules:
- Press Win + R, type cmd and click OK, then type the installation command pip install followed by the module name (for example, pip install requests) and press Enter
- Or click Terminal in PyCharm and type the installation command there
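To confirm that the installation reached the interpreter you are actually running, a small check like the one below may help. It is only a sketch: the helper name missing_modules is made up for this example, and it relies on nothing beyond the standard library's importlib.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # requests and parsel are the two third-party modules this article uses.
    missing = missing_modules(["requests", "parsel"])
    if missing:
        print("Not installed for this interpreter:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If a module shows up as missing even though pip reported success, the interpreter configured in PyCharm is probably not the one pip installed into.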
Code:
- Send the request
- Retrieve the data
- Parse the data
- Save the data
Crawler case implementation
1. Approach
Find the data source: is the data static or dynamic?
Analyze the page by capturing its network requests
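Static data sits in the page's own HTML source, while dynamic data is loaded by a separate request that often returns JSON. After copying a response body out of the browser's network panel, a rough check like the following can tell the two apart. This is a sketch under assumptions: the function name classify_body and the sample bodies are invented for illustration.

```python
import json


def classify_body(body: str) -> str:
    """Label a captured response body as 'json' or 'html/text'."""
    try:
        json.loads(body)
        return "json"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "html/text"


# Made-up sample bodies, not real responses from any site.
print(classify_body('{"data": "<table></table>"}'))
print(classify_body("<html><body><table></table></body></html>"))
```

The code in this article calls response.json(), which suggests that the weather table arrives through a dynamic request of the first kind.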
Complete code
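import requests  # third-party module, install it beforehand; sends requests (acts as the browser inside Python). A red underline means the module is not installed
# If it is installed but still underlined in red, the interpreter is configured incorrectly in PyCharm
import parsel
import csv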
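f = open('天气数据.csv', mode='a', newline='', encoding='utf-8')  # "weather data.csv", opened in append mode
csv_writer = csv.writer(f)
# Header columns: date, highest temperature, lowest temperature, weather, wind direction, city
csv_writer.writerow(['日期','最高温度','最低温度','天气', '风向','城市'])
areaList = [54511, 58362, 59287, 59493]
for areaId in areaList:
    if areaId == 54511:
        area = "北京"  # Beijing
    elif areaId == 58362:
        area = "上海"  # Shanghai
    elif areaId == 59287:
        area = "广州"  # Guangzhou
    else:
        area = "深圳"  # Shenzhen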
    for year in range(2011, 2023):
        for month in range(1, 13):
            url = f'https://'
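The if/elif chain and the nested loops above can also be written with a dictionary lookup and itertools.product. This is just one possible rewrite, not the article's own code: the names AREA_NAMES and crawl_jobs are invented here, and the ids and city names are copied from the loop above.

```python
from itertools import product

# Station ids and city names copied from the loop above.
AREA_NAMES = {54511: "北京", 58362: "上海", 59287: "广州", 59493: "深圳"}


def crawl_jobs(area_names, first_year=2011, last_year=2022):
    """Yield (area_id, area, year, month) for every page the loops visit."""
    years = range(first_year, last_year + 1)
    months = range(1, 13)
    for (area_id, area), year, month in product(area_names.items(), years, months):
        yield area_id, area, year, month


jobs = list(crawl_jobs(AREA_NAMES))
print(len(jobs))  # 4 cities * 12 years * 12 months = 576
print(jobs[0])    # (54511, '北京', 2011, 1)
```

One difference worth noting: the original else branch labels any unknown id as 深圳, whereas the dictionary only knows the four ids it lists.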
- Send the request
            response = requests.get(url)
- Retrieve the data
.text: get the text content
.content: get binary data (image/audio/video)
.json(): get JSON data, a string shaped like {"":"", "":"", "":""}
            json_data = response.json()
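To see how these three accessors relate to each other, here is a standard-library analogy that needs no network access. It is a sketch, not how requests is implemented: the bytes in raw are a made-up stand-in for a server reply, and the real response.text also has to work out the encoding before decoding.

```python
import json

# Hypothetical raw bytes standing in for a server reply; not real weather data.
raw = '{"data": "<table><tr><td>2011-01-01</td></tr></table>"}'.encode("utf-8")

content = raw                 # like response.content: the undecoded bytes
text = raw.decode("utf-8")    # like response.text: the bytes decoded to str
json_data = json.loads(text)  # like response.json(): the text parsed into a dict

print(type(content).__name__, type(text).__name__, type(json_data).__name__)
print(json_data["data"])
```

In other words, .json() only works when the body really is JSON; for an ordinary HTML page, .text is the one to reach for.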
- Parse the data
What kind of data is being parsed?
Structured data: JSON data, extracted by reading dictionary values by key
Unstructured data: web page source code, parsed with css/xpath/re through bs4/lxml/parsel/re…
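The two kinds of parsing can be tried out on a small made-up payload. The sketch below deliberately uses re, one of the options listed above, instead of parsel so that it runs with the standard library alone. The payload shape (a JSON object whose "data" field holds an HTML table) is assumed from the code in this article, and all values are invented.

```python
import json
import re

# Hypothetical payload: a JSON object whose "data" field holds an HTML table.
payload = json.dumps({
    "data": (
        "<table>"
        "<tr><th>日期</th><th>最高温度</th></tr>"
        "<tr><td>2011-01-01</td><td>3</td></tr>"
        "<tr><td>2011-01-02</td><td>5</td></tr>"
        "</table>"
    )
})

# Structured step: the JSON text becomes a dict and the value is read by key.
html_data = json.loads(payload)["data"]

# Unstructured step: pull the cell text out of the HTML source, skipping the first row.
rows = []
for tr in re.findall(r"<tr>(.*?)</tr>", html_data, flags=re.S)[1:]:
    rows.append(re.findall(r"<td>(.*?)</td>", tr, flags=re.S))

print(rows)  # [['2011-01-01', '3'], ['2011-01-02', '5']]
```

Regular expressions are fine for a toy table like this one, but they break easily on real pages with attributes or nested tags, which is a good reason to prefer CSS selectors as the article's code does.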
            html_data = json_data['data']
            # tr
            select = parsel.Selector(html_data)
            trs = select.css('tr')[1:]  # skip the first row
            for tr in trs:
                # .get(): get a single tag
                # .getall(): get all tags
                td = tr.css('td::text').getall()
                td.append(area)
                csv_writer.writerow(td)
                print(td)
# After all loops have finished, close the file so every row is flushed to disk
f.close()
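The save step can be tested on its own, without crawling anything, by writing to an in-memory buffer. This is a sketch with assumptions: the helper write_rows is invented for this example, the cell values are made up, and stripping whitespace from each cell is an extra step the original code does not perform.

```python
import csv
import io


def write_rows(out, rows, area):
    """Append `area` to each row and write it with csv.writer."""
    writer = csv.writer(out)
    for cells in rows:
        # Assumption: cells may carry stray whitespace, so strip it first.
        writer.writerow([cell.strip() for cell in cells] + [area])


# Made-up cell values; io.StringIO stands in for the real CSV file.
buffer = io.StringIO()
write_rows(buffer, [["2011-01-01 ", "3", "-5", "晴", "北风"]], "北京")
print(buffer.getvalue())
```

With a real file, passing the handle from a with open(...) block to the same helper would also close the file automatically once the block ends.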
Results
Here is a small sample of the results.
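As a starting point for the analysis part, the saved CSV can be summarized with the standard library alone. The sketch below rests on several assumptions: the rows are invented, the column order follows the header written by the crawler, and the temperature cells are assumed to be plain numbers optionally followed by a unit sign. The function name average_high_by_city is made up for this example.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows in the column order of the crawler's header; real values will differ.
sample = io.StringIO(
    "日期,最高温度,最低温度,天气,风向,城市\n"
    "2011-01-01,3,-5,晴,北风,北京\n"
    "2011-01-02,5,-3,多云,北风,北京\n"
    "2011-01-01,8,2,阴,东风,上海\n"
)


def average_high_by_city(csv_file):
    """Return the mean of the 最高温度 (highest temperature) column per city."""
    highs = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Assumption: the cell is a number, possibly followed by a unit sign.
        highs[row["城市"]].append(float(row["最高温度"].rstrip("°C℃ ")))
    return {city: mean(values) for city, values in highs.items()}


print(average_high_by_city(sample))  # {'北京': 4.0, '上海': 8.0}
```

Because the crawler opens the file in append mode and writes the header on every run, a file produced by several runs will contain repeated header rows that need to be dropped before a summary like this works.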
Finally
That's all for today's case.
Friends, if you have questions about the article, leave a message in the comment area.