Using the Selenium automated testing tool to simulate login and crawl Dangdang's top 500 bestseller list

The Selenium automated testing tool can fairly be called a crawler's weapon: it handles dynamically loaded pages out of the box. Of course, as large sites have been updated, anti-crawling measures aimed at Selenium have appeared too; some sites can recognize that you are visiting with Selenium and will then restrict you.
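
One common signal such sites look at is the navigator.webdriver flag that a Selenium-driven browser exposes to the page's JavaScript. As a minimal, optional sketch (not needed for Dangdang, and only an illustration of the idea), Chrome can be started with options that reduce this fingerprint:

from selenium import webdriver

options = webdriver.ChromeOptions()
# Hide the "Chrome is being controlled by automated test software" banner
options.add_experimental_option("excludeSwitches", ["enable-automation"])
# Ask Chrome not to set navigator.webdriver, which some sites check
options.add_argument("--disable-blink-features=AutomationControlled")

browser = webdriver.Chrome(options=options)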

Dangdang currently has no restrictions of this kind, so today we will use it as an exercise to get familiar with operating Selenium. We will try to crawl the information for the books on Dangdang's top 500 bestseller list; the page looks like this:

(screenshot of the bestseller list page)

Although this page can be viewed without logging in, we can easily try a simulated login anyway. Clicking Login at the top of this page takes you to the login screen, where a window will pop up.

This pop-up appears every single time, so we have to simulate a click to close it before we can enter the account number and password and then log in.
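
A rough sketch of those two clicks, reusing the element locators from the full script below (assuming browser is the webdriver instance created there) but with explicit waits instead of fixed sleeps:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 10)

# Open the login screen from the link at the top of the page
wait.until(EC.element_to_be_clickable(
    (By.XPATH, "//span[@id='nickname']/a[@class='login_link']"))).click()

# Close the pop-up window that covers the login form
wait.until(EC.element_to_be_clickable((By.ID, "J_loginMaskClose"))).click()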

 

 

Then what remains is the verification code. To be honest, it is now hard to crack Dangdang's verification code with basic code alone, but you can handle that step manually: I pause here for ten seconds, complete the verification by hand, and then let the program keep running. That makes it easy to get past; once you are through the verification hurdle, the data below is no trouble to collect.
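
A small sketch of that manual pause, using the same submit button ID as the full script below; the commented-out input() line is just a hypothetical alternative if you prefer pressing Enter over a fixed ten-second wait:

import time

# Give yourself time to finish the verification by hand
time.sleep(10)
# input("Finish the verification in the browser, then press Enter...")  # alternative to a fixed sleep

browser.find_element_by_id("submitLoginBtn").click()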

 

The code is posted below:

from selenium import webdriver
import time
from lxml import etree
import csv

browser = webdriver.Chrome()
browser.get("http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-recent7-0-0-1-1")
# browser.get_cookies()
time.sleep(1)

# Open the login screen from the link at the top of the page
button_login1 = browser.find_element_by_xpath("//span[@id='nickname']/a[@class='login_link']")
button_login1.click()

# Close the pop-up window that covers the login form
close_button = browser.find_element_by_id("J_loginMaskClose")
close_button.click()

# Fill in the account number and password (replace with your own)
input_phone_number = browser.find_element_by_id("txtUsername")
input_phone_number.send_keys('your account')  # 自己账号
time.sleep(0.2)
input_password = browser.find_element_by_id("txtPassword")
input_password.send_keys('your password')  # 自己密码

# Pause ten seconds to complete the verification by hand, then log in
time.sleep(10)
button_login2 = browser.find_element_by_id("submitLoginBtn")
button_login2.click()

# button_book = browser.find_element_by_name("nav1")
# button_book.click()
# button_list = browser.find_element_by_xpath("//div[@class='book_top ']/a[@class='more_top']")
# button_list.click()

# The top 500 list is spread over 25 pages
for i in range(25):
    time.sleep(5)
    text = browser.page_source
    # print(text)
    html = etree.HTML(text)
    book_name = html.xpath("//div[@class='name']/a/text()")
    price = html.xpath("//span[@class='price_n']/text()")
    original_price = html.xpath("//span[@class='price_r']/text()")
    publisher = html.xpath("//div[@class='publisher_info'][2]/a/text()")
    # auther = html.xpath("//div[@class='publisher_info'][1]/text()")
    time1 = html.xpath("//div[@class='publisher_info'][2]/span/text()")

    # Materialize the rows so they can be both written to the CSV and printed
    result = list(zip(book_name, publisher, price, original_price, time1))
    with open('book.csv', 'a', newline='') as csvfile:
        writer = csv.writer(csvfile, dialect='excel')
        writer.writerows(result)
    for row in result:
        print(row)

    # Move on to the next page of the list
    next_button = browser.find_element_by_xpath(
        "//div[@class='bang_list_box']/div[@class='paginating']/ul[@class='paging']/li[@class='next']/a")
    next_button.click()
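
Each pass of the loop appends one page of rows (book name, publisher, current price, original price, and the date scraped alongside the publisher) to book.csv. One note if you run this today: the find_element_by_id / find_element_by_xpath helpers used above come from Selenium 3 and have been removed in Selenium 4, so on a current install each call would need the By-locator form instead, for example:

from selenium.webdriver.common.by import By

close_button = browser.find_element(By.ID, "J_loginMaskClose")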


Origin www.cnblogs.com/lattesea/p/11746485.html