Using Selenium in Python to crawl a Word document from Baidu Wenku

Today I am learning how to use the Selenium library to crawl a paid Word document on Baidu Wenku.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from pyquery import PyQuery as pq
from selenium.webdriver.support.ui import WebDriverWait
import time
options = webdriver.ChromeOptions()
# Send a mobile (Android, Galaxy Nexus) user-agent string
options.add_argument('user-agent="Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19"')
driver = webdriver.Chrome('D:/chromedriver.exe', options=options)
url = "https://wenku.baidu.com/view/aa31a84bcf84b9d528ea7a2c.html"
driver.get(url)
html = driver.page_source
page = driver.find_elements_by_xpath("/html/body/div[2]/div[2]/div[6]/div[2]/div[2]/div[1]/div/div[1]")  # use this page element to mark where to scroll down to on the Baidu Wenku page
driver.execute_script('arguments[0].scrollIntoView();', page)

Running this code raises an error.

At the bottom of a Baidu Wenku page you have to click "Continue reading" before the full document is loaded, so these two lines are needed:

page = driver.find_elements_by_xpath("/html/body/div[2]/div[2]/div[6]/div[2]/div[2]/div[1]/div/div[1]")  # use this page element to mark where to scroll down to on the Baidu Wenku page
driver.execute_script('arguments[0].scrollIntoView();', page)

They scroll the browser down to the "Continue reading" position, so that the button can then be clicked.

But this is exactly where the error (the part highlighted in yellow) occurred. After searching for a long time I finally found the answer on Stack Overflow. I have to say, Stack Overflow is still great.

One answer there said:

scrollIntoView()

This function is part of the DOM API, so you must call it on a single web element, not on a list of web elements.
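To make the answer concrete: `find_elements_by_xpath` returns a Python list even when only one element matches, and Selenium passes that list to the browser as a JavaScript array, which has no `scrollIntoView()` method. Below is a minimal sketch of a helper that takes one element out of the list before scrolling. The name `scroll_into_view` is hypothetical, not part of Selenium; it uses the generic `find_elements("xpath", ...)` call, which exists in both Selenium 3 and Selenium 4.

```python
def scroll_into_view(driver, xpath):
    """Scroll the first element matching `xpath` into view and return it.

    Hypothetical helper. find_elements() always returns a list; a list reaches
    the browser as a JavaScript array, and arrays have no scrollIntoView(),
    so a single element is taken out of the list first.
    """
    elements = driver.find_elements("xpath", xpath)  # a list, possibly empty
    if not elements:
        raise LookupError(f"no element matches {xpath!r}")
    element = elements[0]  # one WebElement, not the whole list
    driver.execute_script("arguments[0].scrollIntoView();", element)
    return element
```

Using the singular `find_element_by_xpath` instead, which returns one element directly, achieves the same thing.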

That was when I realized that I had not located a single element: `find_elements_by_xpath` returns a list. So I located the element again and changed the code as follows:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome('D:/chromedriver.exe')
driver.get("https://wenku.baidu.com/view/aa31a84bcf84b9d528ea7a2c.html")
page = driver.find_element_by_xpath("//*[@id='html-reader-go-more']/div[2]/div[1]/span/span[2]")
driver.execute_script('arguments[0].scrollIntoView();', page)  # scroll the element into view
driver.find_element_by_xpath("//*[@id='html-reader-go-more']/div[2]/div[1]/p").click()

After that, the full content of the document loads automatically.
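For reuse, the working steps above can be wrapped in a function. This is a minimal sketch, not the author's code: the function name `expand_full_document` and the two constants are hypothetical, the XPaths are the ones from this post (Baidu Wenku's markup may have changed since it was written), and the function takes an already-created driver so it does not depend on how chromedriver is set up. It uses `find_element("xpath", ...)`, which works in Selenium 3 and 4, whereas the `find_element_by_xpath` helpers were removed in later Selenium 4 releases.

```python
# XPaths copied from the post; the page structure may have changed since.
MARKER_XPATH = "//*[@id='html-reader-go-more']/div[2]/div[1]/span/span[2]"
BUTTON_XPATH = "//*[@id='html-reader-go-more']/div[2]/div[1]/p"


def expand_full_document(driver, url):
    """Open `url`, scroll to "Continue reading", click it, return the HTML.

    Hypothetical helper. Content may still be loading right after the click;
    an explicit WebDriverWait on some element of the newly loaded pages would
    be more robust (not shown, since the post does not name such an element).
    """
    driver.get(url)
    marker = driver.find_element("xpath", MARKER_XPATH)  # a single WebElement
    driver.execute_script("arguments[0].scrollIntoView();", marker)
    driver.find_element("xpath", BUTTON_XPATH).click()  # "Continue reading"
    return driver.page_source
```

With a real browser this would be called as `expand_full_document(driver, "https://wenku.baidu.com/view/aa31a84bcf84b9d528ea7a2c.html")`, where `driver` is a `webdriver.Chrome` instance created as in the code above.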



Source: www.cnblogs.com/gaoshiguo/p/11614266.html