Python3 crawler (13): crawling dynamic pages with Selenium

 Infi-chu:

http://www.cnblogs.com/Infi-chu/

Python has several libraries for simulating browser operation, such as Selenium and Splash.

1. Commonly used imports

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

2. Commonly declared browser objects

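# Create only the one that matches the installed browser and driver
browser = webdriver.Chrome()
browser = webdriver.Firefox()
browser = webdriver.Edge()
browser = webdriver.PhantomJS()  # no longer available in Selenium 4
browser = webdriver.Safari()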

3. Access the page
Use the get() method to open a URL

from selenium import webdriver
browser = webdriver.Chrome()
browser.get('http://www.baidu.com')
print(browser.page_source)
browser.close()

4. Find nodes
Single node

find_element_by_name() # Get according to the name value
find_element_by_id() # Get according to the id value
find_element_by_xpath()
find_element_by_link_text()
find_element_by_partial_link_text()
find_element_by_tag_name()
find_element_by_class_name()
find_element_by_css_selector() # Select according to css

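# Another way of writing, and the only one available since Selenium 4.3, which removed the find_element_by_* helpers
find_element(By.ID, id) is equivalent to find_element_by_id(id)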

Multiple nodes
Use the find_elements family of methods, which return a list of all matching nodes

find_elements_by_name() # Get according to the name value
find_elements_by_id() # Get according to the id value
find_elements_by_xpath()
find_elements_by_link_text()
find_elements_by_partial_link_text()
find_elements_by_tag_name()
find_elements_by_class_name()
find_elements_by_css_selector() # Select according to css
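
As a minimal sketch of the difference between the single-node and multiple-node lookups, the hypothetical helper below uses the find_element(By, value) form, which is available in both Selenium 3 and 4. To keep the sketch free of imports, the locator strategies are written as plain strings; these are the same values that By.ID and By.CSS_SELECTOR hold. The function name and its parameters are illustrative and not part of Selenium.

```python
# Locator strategies as plain strings. selenium.webdriver.common.by.By
# defines the same values: By.ID == "id", By.CSS_SELECTOR == "css selector".
ID = "id"
CSS_SELECTOR = "css selector"


def find_one_and_all(browser, node_id, css):
    """Return the node with the given id and every node matching a CSS selector.

    `browser` is assumed to be an already created WebDriver object.
    """
    # find_element returns the first match and raises
    # NoSuchElementException when nothing matches.
    single = browser.find_element(ID, node_id)
    # find_elements returns a list, which is empty when nothing matches.
    many = browser.find_elements(CSS_SELECTOR, css)
    return single, many
```

With a real driver this could be called as find_one_and_all(browser, 'kw', 'a'), where 'kw' is only an assumed id on the page being crawled.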

5. Node interaction
Node interaction means making the browser perform an action on a node, such as entering text in an input box or clicking a submit button.
Use the send_keys() method to enter text
Use the clear() method to clear text
Use the click() method to click a button
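
The three methods are usually combined. The sketch below assumes a page with one input box and one button; the helper name and the locator parameters are hypothetical, and `browser` is assumed to be an existing WebDriver object.

```python
def type_and_submit(browser, box_locator, button_locator, text):
    """Replace the text in an input box, then click a button.

    Each locator is a (strategy, value) tuple such as (By.ID, 'kw').
    """
    box = browser.find_element(*box_locator)
    box.clear()          # remove any text already in the box
    box.send_keys(text)  # type the new text
    button = browser.find_element(*button_locator)
    button.click()       # submit
```

For example, type_and_submit(browser, (By.ID, 'kw'), (By.ID, 'su'), 'Python') would fill and submit a search form whose box and button have the ids 'kw' and 'su'; both ids are assumptions about the target page.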

6. Action chain
An action chain extends node interaction: node interaction is a single momentary action, whereas an action chain is a sequence of continuous actions, such as dragging an image.

# Mouse drag
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = 'http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
browser.switch_to.frame('iframeResult')  # the demo runs inside this iframe
# find_element(By.CSS_SELECTOR, ...) works in both Selenium 3 and 4
source = browser.find_element(By.CSS_SELECTOR, '#draggable')
target = browser.find_element(By.CSS_SELECTOR, '#droppable')
actions = ActionChains(browser)
actions.drag_and_drop(source, target)  # drag_and_drop() takes the node to drag and the node to drop it on
actions.perform()  # perform() executes the queued actions

7. Execute JavaScript
Use the execute_script() method; its argument is a string of JavaScript code
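
A common use is scrolling, which Selenium has no dedicated method for. The sketch below is one possible helper, with a hypothetical name, that scrolls to the bottom of the page and reads a value back from JavaScript; `browser` is assumed to be an existing WebDriver object.

```python
def scroll_to_bottom(browser):
    """Scroll to the end of the page and return the page height in pixels."""
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    # A script that uses `return` hands its value back to Python.
    return browser.execute_script("return document.body.scrollHeight")
```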

8. Get node information
Get attributes
Use the get_attribute() method to get an attribute; the node must be selected first

Get the text value
Read the text property; again, the node must be selected first

Get id, location, tag and size
Use the id property to get the id
Use the location property to get the location
Use the tag_name property to get the tag_name
Use the size property to get the size
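
The properties above can be gathered in one place. The following sketch assumes `node` is a node already returned by one of the find methods; the helper name and the choice of the class attribute are only illustrative.

```python
def describe_node(node):
    """Collect the commonly used information about a located node."""
    return {
        "class": node.get_attribute("class"),  # None if the attribute is absent
        "text": node.text,
        "id": node.id,              # Selenium's internal id, not the HTML id attribute
        "location": node.location,  # dict with the keys 'x' and 'y'
        "tag_name": node.tag_name,
        "size": node.size,          # dict with the keys 'height' and 'width'
    }
```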

9. Delayed waiting
Implicit waiting
If a node is not yet in the DOM when it is searched for, the driver keeps retrying for a set period before raising an exception. The default period is 0 and can be changed with implicitly_wait().

Explicit waiting
Wait for a specified node and give a maximum waiting time. If the node is loaded within this period, it is returned; otherwise, an exception is thrown.

Wait conditions and their meanings

Wait condition                          Meaning
title_is                                The title equals the given text
title_contains                          The title contains the given text
presence_of_element_located             The node is loaded; pass in a locator tuple such as (By.ID, 'p')
visibility_of_element_located           The node is visible; pass in a locator tuple
visibility_of                           The node is visible; pass in the node object
presence_of_all_elements_located        All matching nodes are loaded
text_to_be_present_in_element           The node's text contains the given text
text_to_be_present_in_element_value     The node's value contains the given text
frame_to_be_available_and_switch_to_it  The frame is loaded, then switch to it
invisibility_of_element_located         The node is not visible
element_to_be_clickable                 The node is clickable
staleness_of                            The node is no longer in the DOM; can be used to tell whether the page has been refreshed
element_to_be_selected                  The node is selected; pass in the node object
element_located_to_be_selected          The node is selected; pass in a locator tuple
element_selection_state_to_be           Pass in the node object and a state; True if they match, otherwise False
element_located_selection_state_to_be   Pass in a locator tuple and a state; True if they match, otherwise False
alert_is_present                        An alert is present
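
Both kinds of waiting can be sketched as two small helpers. The helper names and the 10-second default are assumptions, and `browser` is assumed to be an existing WebDriver object. The explicit wait pairs WebDriverWait with one of the conditions listed above.

```python
def set_implicit_wait(browser, seconds=10):
    """Implicit wait: later find_element calls retry for up to `seconds`."""
    browser.implicitly_wait(seconds)


def wait_for_node(browser, locator, timeout=10):
    """Explicit wait: return the node once it is in the DOM.

    `locator` is a tuple such as (By.ID, 'p').
    """
    # Imported inside the function so that defining the helper
    # does not by itself require Selenium to be installed.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait

    wait = WebDriverWait(browser, timeout)
    # until() polls the condition and raises TimeoutException
    # if it is still not met after `timeout` seconds.
    return wait.until(EC.presence_of_element_located(locator))
```

An explicit wait is usually the better fit when only one specific node loads late, because the implicit wait applies to every lookup made by the driver.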

10. Forward and backward
Use the back() method to go back and the forward() method to go forward in the browser history
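
A minimal sketch of moving through the history, assuming `browser` is an existing WebDriver object. The helper name and both URL parameters are hypothetical.

```python
def visit_then_go_back(browser, first_url, second_url):
    """Open two pages in turn, step back, then step forward again.

    Returns the URL seen after back() and the URL seen after forward().
    """
    browser.get(first_url)
    browser.get(second_url)
    browser.back()                     # return to the first page
    after_back = browser.current_url
    browser.forward()                  # move on to the second page again
    return after_back, browser.current_url
```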

11. Cookie operations
Use the get_cookies() method to get all cookies
Use the add_cookie() method to add a cookie
Use the delete_all_cookies() method to delete all cookies
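
The three calls can be tried together as below. This is only a sketch: the helper name is hypothetical, and `browser` is assumed to be a WebDriver object that has already opened a page, since a cookie can only be added for the domain currently loaded.

```python
def cookie_round_trip(browser, name, value):
    """Add a cookie, list the cookie names, then delete every cookie.

    Returns the names seen after adding and the cookies left after deleting.
    """
    # add_cookie() takes a dict; 'name' and 'value' are the required keys.
    browser.add_cookie({"name": name, "value": value})
    # get_cookies() returns a list of dicts, one per cookie.
    names = [cookie["name"] for cookie in browser.get_cookies()]
    browser.delete_all_cookies()
    return names, browser.get_cookies()
```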

 
