Today I am learning the selenium library by using it to crawl a paid Word document from Baidu Wenku (Baidu Library).
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from pyquery import PyQuery as pq
from selenium.webdriver.support.ui import WebDriverWait
import time

options = webdriver.ChromeOptions()
options.add_argument('user-agent="Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19"')
driver = webdriver.Chrome('D:/chromedriver.exe', options=options)
url = "https://wenku.baidu.com/view/aa31a84bcf84b9d528ea7a2c.html"
driver.get(url)
html = driver.page_source
# use page to record the position to scroll down to on the Baidu Wenku page
page = driver.find_elements_by_xpath("/html/body/div[2]/div[2]/div[6]/div[2]/div[2]/div[1]/div/div[1]")
driver.execute_script('arguments[0].scrollIntoView();', page)
Running this code raises an error.
Baidu Wenku only loads the full document after you click "Continue reading" at the bottom of the page, so these two lines of code are needed:
# use page to record the position to scroll down to on the Baidu Wenku page
page = driver.find_elements_by_xpath("/html/body/div[2]/div[2]/div[6]/div[2]/div[2]/div[1]/div/div[1]")
driver.execute_script('arguments[0].scrollIntoView();', page)
They scroll the browser to the position of the "Continue reading" button, so that the button can then be clicked.
But it was exactly these two lines that raised the error. After searching for a long time, I finally found the answer on Stack Overflow. I have to say, Stack Overflow is still impressive.
The answerer explained:
scrollIntoView()
This function is part of the DOM API, so it must be called on a single web element, not on a list of web elements.
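To make the list-versus-element point concrete, here is a minimal sketch. The helper name scroll_into_view is hypothetical (it is not part of selenium), and it assumes driver is any Selenium WebDriver object. It accepts either one element or the list returned by a find_elements_* call and always hands a single element to the script.

```python
def scroll_into_view(driver, target):
    """Scroll one element into view (hypothetical helper, not a selenium API).

    `target` may be a single WebElement or the list returned by a
    find_elements_* call. A list is reduced to its first item, because
    scrollIntoView() is a method of one DOM element, not of a list.
    """
    if isinstance(target, (list, tuple)):
        if not target:
            raise ValueError("no element matched the locator")
        target = target[0]
    # arguments[0] is now a single element, so the DOM call is valid
    driver.execute_script("arguments[0].scrollIntoView();", target)
    return target
```

Passing the whole list straight to execute_script, as in the failing code, makes arguments[0] an array in the browser, and an array has no scrollIntoView method.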
That made me realize I had never located a single element: find_elements_by_xpath returns a list. So I located the element again with find_element_by_xpath and changed the code as follows:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome('D:/chromedriver.exe')
driver.get("https://wenku.baidu.com/view/aa31a84bcf84b9d528ea7a2c.html")
page = driver.find_element_by_xpath("//*[@id='html-reader-go-more']/div[2]/div[1]/span/span[2]")
driver.execute_script('arguments[0].scrollIntoView();', page)  # scroll until the element is visible
driver.find_element_by_xpath("//*[@id='html-reader-go-more']/div[2]/div[1]/p").click()
After that, the whole document loads automatically.
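One caveat: newer Selenium 4 releases removed the find_element_by_xpath style helpers in favor of find_element(by, value). The sketch below wraps the same scroll-then-click steps in that form. The function name click_continue_reading and the two constant names are my own, and the XPaths are copied from the code above, so they only work while Baidu Wenku keeps that page structure.

```python
# XPaths copied from the post; they depend on Baidu Wenku's page markup.
MORE_TEXT_XPATH = "//*[@id='html-reader-go-more']/div[2]/div[1]/span/span[2]"
MORE_BUTTON_XPATH = "//*[@id='html-reader-go-more']/div[2]/div[1]/p"


def click_continue_reading(driver):
    """Scroll to the "Continue reading" area and click its button.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver
    that has already opened the document page.
    """
    # "xpath" is the value of selenium's By.XPATH constant,
    # so this sketch needs no selenium import of its own.
    anchor = driver.find_element("xpath", MORE_TEXT_XPATH)
    # find_element returns one element, which scrollIntoView() requires
    driver.execute_script("arguments[0].scrollIntoView();", anchor)
    button = driver.find_element("xpath", MORE_BUTTON_XPATH)
    button.click()
    return button
```

Call it right after driver.get(...) with the document URL. If the button has not rendered yet, the lookup can fail, so a wait before the call may be needed.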