Scraping Ctrip hotel information with Selenium

Copyright notice: this is original work. If you repost it, please credit the source! https://blog.csdn.net/MG1723054/article/details/81875649

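In the previous post we used the requests library and regular expressions to extract the information (link: https://mp.csdn.net/postedit/81865681), and mentioned that Selenium can also scrape the hotel data. The advantage of Selenium here is that the extracted text needs no further processing or filtering. The only thing to handle is the exception raised when an element is missing, and the end result is the same. The drawback is that scraping with Selenium is much slower.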
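To make "only handle exceptions" concrete: with Selenium, an optional field such as the hotel level either exists on the page or the lookup raises `NoSuchElementException`. A missing value therefore turns into a single `try/except` that returns a default. Below is a minimal sketch of that pattern in plain Python. `value_or_default`, `MissingElement`, `find_text` and the dictionary standing in for a hotel entry are all hypothetical, chosen so the snippet runs without a browser.

```python
from typing import Callable


class MissingElement(Exception):
    """Hypothetical stand-in for selenium's NoSuchElementException."""


def value_or_default(lookup: Callable[[], str], default: str = '') -> str:
    """Run a lookup that may fail and return `default` if the element is absent."""
    try:
        return lookup()
    except MissingElement:
        return default


# Hypothetical hotel entry: this one has a score but no "recommend" badge.
fake_hotel = {'span.hotel_value': '4.6'}


def find_text(selector: str) -> str:
    # Mimics find_element(...).text: raise when the selector matches nothing.
    if selector not in fake_hotel:
        raise MissingElement(selector)
    return fake_hotel[selector]


score = value_or_default(lambda: find_text('span.hotel_value'))
recommend = value_or_default(lambda: find_text('span.recommend'))
print(repr(score), repr(recommend))  # '4.6' ''
```

Compared with the regex approach, there is no step that cleans or filters matched text; the only decision is what default to use when a field is absent.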

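The analysis is similar to the one in the previous post. The only difference is that elements are located with methods such as find_element(s)_by_id and find_element(s)_by_class_name (in Selenium 4 these were replaced by find_element(s)(By.ID, ...), find_element(s)(By.CLASS_NAME, ...) and so on). The code follows directly. If you have questions, feel free to leave a comment and we can discuss them.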

import time

from pymongo import MongoClient
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException  # raised when an element cannot be found
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# The CSS selectors below match the Ctrip page layout at the time of writing.
JUDGE = 'ul > li.hotel_item_judge.no_comment > div.hotelitem_judge_box > a > '


def text_or_default(element, selector, default=''):
    """Return the text of a child element, or `default` if it does not exist."""
    try:
        return element.find_element(By.CSS_SELECTOR, selector).text
    except NoSuchElementException:
        return default


def strip_map_links(address):
    """Remove the trailing '地图' / '地图街景' (map / street view) link text."""
    # str.rstrip() strips a set of characters, not a suffix, so compare suffixes explicitly
    for suffix in ('地图街景', '地图'):
        if address.endswith(suffix):
            return address[:-len(suffix)].rstrip()
    return address.rstrip()


driver = webdriver.Chrome()
client = MongoClient()  # one connection for the whole run instead of one per hotel
collection = client['ctrip']['hotels information']

try:
    url = 'http://hotels.ctrip.com/hotel/nanjing12#ctm_ref=ctr_hp_sb_lst'
    driver.get(url)
    time.sleep(2)
    # Close the app-download pop-up if it is shown
    try:
        driver.find_element(By.CSS_SELECTOR, '#appd_wrap_close').click()
    except NoSuchElementException:
        pass
    # Locate the element holding the total number of pages
    page = driver.find_element(By.CSS_SELECTOR, '#page_info > div.c_page_list.layoutfix > a:nth-child(9)')
    for m in range(int(page.text)):
        if m:
            # Page 1 is already loaded; from then on, move to the next page
            next_button = driver.find_element(By.CSS_SELECTOR, '.c_down')
            next_button.send_keys(Keys.ENTER)
        # Scroll down the page, presumably so the lower hotel entries get loaded
        driver.execute_script('window.scrollBy(0,5800)')
        time.sleep(2)
        infor = driver.find_element(By.CSS_SELECTOR, '#hotel_list').find_elements(By.CLASS_NAME, 'hotel_new_list')
        for data in infor:
            # Optional fields: a missing element becomes an empty string
            hotel_level = text_or_default(data, JUDGE + 'span.hotel_level')
            recommend = text_or_default(data, JUDGE + 'span.recommend')
            hotel_information = {
                'title': data.find_element(By.CSS_SELECTOR, 'ul > li.pic_medal > div > a').get_attribute('title'),
                'address': strip_map_links(data.find_element(By.CSS_SELECTOR, 'ul > li.hotel_item_name > p.hotel_item_htladdress').text),
                'hotel_score': hotel_level + data.find_element(By.CSS_SELECTOR, JUDGE + 'span.hotel_value').text,
                'lowestprice': data.find_element(By.CSS_SELECTOR, 'ul > li.hotel_price_icon > div > div > div > a > span').text,
                'recommend': recommend,
            }
            print(hotel_information)
            collection.insert_one(hotel_information)
finally:
    # Always release the browser and the database connection
    driver.quit()
    client.close()
    

A screenshot of part of the output from a run is shown below:



Reposted from blog.csdn.net/MG1723054/article/details/81875649