Copyright notice: original work takes effort. If you repost this article, please credit the source! https://blog.csdn.net/MG1723054/article/details/81875649
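In the previous post we used the requests library and regular expressions to extract the information (link: https://mp.csdn.net/postedit/81865681), and mentioned that selenium can also scrape the hotel data. The advantage of selenium here is that the extracted text needs no cleaning or filtering; only exceptions (such as a missing element) have to be handled. The end result is the same as before. The drawback is that scraping with selenium is much slower.
The page analysis is similar to the previous post. The only difference is that elements are located with find_element(s)_by_id, find_element(s)_by_class_name, and the other locator methods. The code is given directly below. If you have questions, leave a comment and we can discuss them.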
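To illustrate the point about only needing to handle exceptions, here is a minimal sketch of a helper that returns a default value when an element is missing. The names `text_or_default` and `CSS` are hypothetical and do not appear in the original script. The sketch assumes the generic `find_element(by, value)` call, which is available on both drivers and elements, and the `ImportError` fallback exists only so the snippet can be tried without selenium installed.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Stand-in (assumption for illustration) so the sketch runs without selenium.
    class NoSuchElementException(Exception):
        pass

# String value of selenium's By.CSS_SELECTOR locator strategy.
CSS = "css selector"


def text_or_default(root, css, default=""):
    """Return the text of the first element under `root` matching `css`.

    `root` may be a driver or an element; if nothing matches,
    `default` is returned instead of raising.
    """
    try:
        return root.find_element(CSS, css).text
    except NoSuchElementException:
        return default
```

With a helper like this, each optional field (for example the hotel level or the recommendation text) could be read in one line instead of a separate try/except block.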
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException  # raised when an element cannot be found
from pymongo import MongoClient

# Note: the find_element_by_* helpers used below were removed in Selenium 4.3.0,
# so this script needs an older Selenium release.
driver = webdriver.Chrome()
url = 'http://hotels.ctrip.com/hotel/nanjing12#ctm_ref=ctr_hp_sb_lst'
driver.get(url)
time.sleep(2)

# Close the pop-up overlay that covers the page
button = driver.find_element_by_css_selector('#appd_wrap_close')
button.click()

# Locate the total number of pages
page = driver.find_element_by_css_selector('#page_info > div.c_page_list.layoutfix > a:nth-child(9)')

# Connect to MongoDB once and reuse the collection for every record
client = MongoClient()
db = client['ctrip']
collection = db['hotels information']

for m in range(int(page.text)):
    if m:
        # From the second page on, go to the next page first
        next_button = driver.find_element_by_css_selector('.c_down')
        next_button.send_keys(Keys.ENTER)
    # Scroll down so the whole hotel list is loaded
    driver.execute_script('window.scrollBy(0,5800)')
    time.sleep(2)
    infor = driver.find_element_by_css_selector('#hotel_list').find_elements_by_class_name('hotel_new_list')
    # print(len(infor))
    for data in infor:
        # The level and recommendation fields are optional, so fall back to a default
        try:
            hotel_level = data.find_element_by_css_selector('ul > li.hotel_item_judge.no_comment > div.hotelitem_judge_box > a > span.hotel_level').text
        except NoSuchElementException:
            hotel_level = ''
        try:
            recommend = data.find_element_by_css_selector('ul > li.hotel_item_judge.no_comment > div.hotelitem_judge_box > a > span.recommend').text
        except NoSuchElementException:
            recommend = ' '
        hotel_information = {
            'title': data.find_element_by_css_selector('ul > li.pic_medal > div > a').get_attribute('title'),
            # The original argument, '地图'and'地图街景', evaluates to '地图街景';
            # rstrip treats it as a set of trailing characters to remove.
            'adress': data.find_element_by_css_selector('ul > li.hotel_item_name > p.hotel_item_htladdress').text.rstrip('地图街景'),
            'hotle_score': hotel_level + data.find_element_by_css_selector('ul > li.hotel_item_judge.no_comment > div.hotelitem_judge_box > a > span.hotel_value').text,
            'lowestprice': data.find_element_by_css_selector('ul > li.hotel_price_icon > div > div > div > a > span').text,
            'recommend': recommend,
        }
        print(hotel_information)
        collection.insert_one(hotel_information)
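One detail of the address field is worth a closer look: `rstrip` removes a set of trailing characters, not a whole suffix, so an address that itself ends in one of those characters can lose it. The sketch below shows the effect and one possible alternative. The helper `strip_suffixes` and the sample address are hypothetical and used only for illustration; they are not part of the original script.

```python
def strip_suffixes(text, suffixes):
    """Remove whole trailing suffixes (plus surrounding whitespace) from text."""
    text = text.rstrip()
    changed = True
    while changed:
        changed = False
        for suffix in suffixes:
            if suffix and text.endswith(suffix):
                text = text[:-len(suffix)].rstrip()
                changed = True
    return text


# `and` returns its second operand when the first is truthy,
# so this is simply the string '地图街景'.
chars = '地图' and '地图街景'

# Hypothetical address whose last character is also in the strip set.
raw = '太平南路步行街地图街景'

print(raw.rstrip(chars))                               # the trailing '街' of the address is lost
print(strip_suffixes(raw, ['地图街景', '地图', '街景']))  # only the link labels are removed
```

Whether this matters depends on the actual page text; for addresses that do not end in one of those four characters, both approaches give the same result.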
A screenshot of part of the output from running the script is shown below: