Wu Yuxiong - "Tiansheng Ziran" Python study notes: writing a web crawler to get Beijing's real-time PM2.5 data

Having mastered the regular expressions, page parsing and basic BeautifulSoup scraping covered earlier, you can now write a web crawler to fetch data.
Here we crawl Beijing's real-time PM2.5 data from the http://www.pm25x.com/ website.
Crawling Beijing's real-time PM2.5 data
Our goal is now clear: retrieve Beijing's real-time PM2.5 value. Because the value changes in real time, the number you get will differ from the one I captured when writing this, but the scraping process is exactly the same.


In many cases the data we want is not on the site's first page, so it cannot be crawled directly and has to be fetched step by step. Open the source code of the http://www.pm25x.com/ home page and press Ctrl+F to search for the keyword "北京" (Beijing). You will find that the key entry is an <a> tag whose title attribute is "北京PM2.5" (Beijing PM2.5).


The contents of this tag are easy to grab with BeautifulSoup's find() method by matching on the title attribute, as in the complete program at the end of this note.
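The lookup can be tried without touching the live site. The sketch below parses a hard-coded HTML fragment instead of the real home page; the fragment, including the href "/city/beijing.html", is invented for illustration and is only an assumption about what the site's markup looks like.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the home page; the real markup may differ.
SAMPLE_HOME = """
<div class="citys">
  <a href="/city/shanghai.html" title="上海PM2.5">上海</a>
  <a href="/city/beijing.html" title="北京PM2.5">北京</a>
</div>
"""

sp1 = BeautifulSoup(SAMPLE_HOME, "html.parser")
# find() returns the first <a> tag whose title attribute matches exactly, or None.
city = sp1.find("a", {"title": "北京PM2.5"})
# get() reads an attribute and returns None if it is absent.
citylink = city.get("href")
print(city.text, citylink)
```

If the title is not found, find() returns None and the following city.get() call fails with an AttributeError, so a missing tag is worth checking for when the site changes.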


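From the page itself we can see that Beijing's PM2.5 value is currently 31. Next, open the source code of the second-level page and search for "31" (when you do this exercise, don't search for 31 as well; search for whatever the site's real-time PM2.5 value is at that moment). It is easy to see that the value sits in a <div> tag whose class is "aqivalue" (see the figure below). That makes things easy: the following two statements solve the problem:
data1=sp2.select(".aqivalue")  #select the tag containing Beijing's PM2.5 value by its class name aqivalue
pm25=data1[0].text  #get the PM2.5 value from the tag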
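Both steps of this paragraph, finding which tag holds the number you can see and then selecting by class, can be reproduced on a saved fragment. This is a sketch under assumptions: the HTML below is invented, and the value 31 only mirrors the number quoted in the text.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment of the Beijing page; 31 mirrors the value quoted in the text.
SAMPLE_CITY = """
<div class="aqi-box">
  <span class="label">PM2.5</span>
  <div class="aqivalue">31</div>
</div>
"""

sp2 = BeautifulSoup(SAMPLE_CITY, "html.parser")

# Step 1: locate the text node equal to the value shown on the page
# (the "search for 31" step), then look at the tag that contains it.
hit = sp2.find(string=re.compile(r"^\s*31\s*$"))
print(hit.parent.name, hit.parent.get("class"))  # div ['aqivalue']

# Step 2: once the class is known, select by CSS class as in the text.
data1 = sp2.select(".aqivalue")  # always a list, possibly empty
pm25 = data1[0].text.strip() if data1 else None
print("Current PM2.5 value in Beijing:", pm25)
```

select() always returns a list, which is why the original code indexes it with [0]; on a page without that class the list is empty and data1[0] would raise an IndexError.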

Complete code for crawling Beijing's real-time PM2.5 data
import requests
from bs4 import BeautifulSoup

url1 = 'http://www.pm25x.com/'  #link to the home page
html = requests.get(url1)  #fetch the home page
sp1 = BeautifulSoup(html.text, 'html.parser')  #parse the fetched data
city = sp1.find("a",{"title":"北京PM2.5"})  #find the <a> tag whose title attribute is "北京PM2.5" (Beijing PM2.5)
print(city)

citylink=city.get("href")  #read the href attribute of the tag we found
print(citylink)

url2=url1+citylink  #build the full URL of the second-level page
print(url2)
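Joining the two strings with + only gives a clean URL if the href is a plain relative path. The text does not say which form the site's href takes, so the three link forms below are hypothetical; the sketch compares concatenation with urllib.parse.urljoin, which resolves a link against a base URL the way a browser does.

```python
from urllib.parse import urljoin

url1 = "http://www.pm25x.com/"

# Hypothetical href forms: the text does not say which one the real page uses.
links = [
    "city/beijing.html",                       # relative path
    "/city/beijing.html",                      # root-relative path
    "http://www.pm25x.com/city/beijing.html",  # absolute URL
]

results = {}
for citylink in links:
    naive = url1 + citylink           # plain string concatenation, as in the code above
    joined = urljoin(url1, citylink)  # resolve the link against the base URL
    results[citylink] = (naive, joined)
    print(f"naive:  {naive}\njoined: {joined}\n")
```

For a root-relative link, concatenation produces a double slash after the host, and for an absolute link it produces a nonsense URL; urljoin yields http://www.pm25x.com/city/beijing.html in all three cases.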

html2=requests.get(url2)   #fetch the second-level page
sp2=BeautifulSoup(html2.text,"html.parser")   #parse the second-level page
print(sp2)

data1=sp2.select(".aqivalue")  #select the tag holding Beijing's PM2.5 value by its class name aqivalue
pm25=data1[0].text   #get the PM2.5 value from the tag
print("Beijing's current PM2.5 value: "+pm25) #display the PM2.5 value
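For reuse, the same steps can be wrapped into functions. This is a sketch, not the author's code: it assumes the site still serves the markup described above, and the names find_city_link, extract_pm25, fetch and beijing_pm25 are hypothetical. Compared with the program above it adds a request timeout, an HTTP status check, urljoin for the second-level URL, and a clear ValueError when a tag is missing instead of an AttributeError or IndexError.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "http://www.pm25x.com/"
CITY_TITLE = "北京PM2.5"  # title attribute of the Beijing link, as described in the text


def find_city_link(home_html: str, title: str = CITY_TITLE) -> str:
    """Return the href of the <a> tag whose title attribute equals `title`."""
    soup = BeautifulSoup(home_html, "html.parser")
    tag = soup.find("a", {"title": title})
    if tag is None or not tag.get("href"):
        raise ValueError(f"no link with title {title!r} on the home page")
    return tag["href"]


def extract_pm25(city_html: str) -> str:
    """Return the stripped text of the first element with class 'aqivalue'."""
    soup = BeautifulSoup(city_html, "html.parser")
    tag = soup.select_one(".aqivalue")
    if tag is None:
        raise ValueError("no element with class 'aqivalue' on the city page")
    return tag.get_text(strip=True)


def fetch(url: str) -> str:
    """Download a page, failing on HTTP errors or after a 10-second timeout."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    # Assumption: requests falls back to ISO-8859-1 for text/* responses without
    # a charset header, which would garble Chinese text, so guess from the body.
    resp.encoding = resp.apparent_encoding
    return resp.text


def beijing_pm25() -> str:
    """Fetch the home page, follow the Beijing link, and return the PM2.5 value."""
    home_html = fetch(BASE_URL)
    city_url = urljoin(BASE_URL, find_city_link(home_html))
    return extract_pm25(fetch(city_url))
```

Keeping parsing separate from downloading means the two parsing functions can be checked against saved HTML without network access; calling print(beijing_pm25()) performs two live requests and only works while the site keeps this page structure.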


Source: www.cnblogs.com/tszr/p/12021654.html