Selenium module application
Introduction
Selenium was originally an automated testing tool. In crawlers it is mainly used to work around the fact that requests cannot execute JavaScript. Selenium drives a real browser and simulates user operations such as navigating, typing, clicking, and scrolling, so you get the page as it looks after rendering. It supports multiple browsers.
Environment setup
Download and install selenium: pip install selenium
Download the browser driver: http://chromedriver.storage.googleapis.com/index.html
Check which driver version matches your browser version: http://blog.csdn.net/huilan_same/article/details/51896672
Basic usage / demo
from selenium import webdriver
from time import sleep

# Pass the path to your browser driver; the r prefix makes it a raw string so backslashes are not treated as escapes
driver = webdriver.Chrome(r'driver path')
# Open the Baidu home page
driver.get("http://www.baidu.com")
# Locate the "settings" link and click it (the literal must match the link text exactly as the page renders it)
driver.find_elements_by_link_text('set')[0].click()
sleep(2)
# In the settings menu, find the "Search Settings" option so we can show 50 results per page
driver.find_elements_by_link_text('Search Settings')[0].click()
sleep(2)
# Select 50 results per page
m = driver.find_element_by_id('nr')
sleep(2)
m.find_element_by_xpath('//*[@id="nr"]/option[3]').click()
m.find_element_by_xpath('.//option[3]').click()
sleep(2)
# Click "Save Settings"
driver.find_elements_by_class_name("prefpanelgo")[0].click()
sleep(2)
# Handle the alert that pops up: accept() confirms, dismiss() cancels
driver.switch_to.alert.accept()
sleep(2)
# Find the Baidu search box and type "beauty"
driver.find_element_by_id("kw").send_keys('beauty')
sleep(2)
# Click the search button
driver.find_element_by_id('su').click()
sleep(2)
# On the results page, find the "beauty _ Baidu Pictures" link and open it
driver.find_elements_by_link_text('beauty _ Baidu Pictures')[0].click()
sleep(3)
# Close the browser
driver.quit()
Case: Getting dynamically loaded page data
from selenium import webdriver
from lxml import etree

driver = webdriver.Chrome(executable_path='./chromedriver.exe')
driver.get('http://125.35.6.84:81/xk/')
# page_source holds the HTML after the browser has rendered the page
dir_txt = driver.page_source
tree = etree.HTML(dir_txt)
title_list = tree.xpath('//*[@id="gzlist"]/li')
for i in title_list:
    title = i.xpath('./dl/@title')[0]
    print(title)
driver.quit()
Method description
Browser creation
Selenium supports many browsers, including Chrome, Firefox, and Edge, as well as mobile browsers such as Android and BlackBerry. PhantomJS, a headless browser (one with no visible interface), is also supported.
from selenium import webdriver

# Create one of the following, depending on the browser you want to drive
browser = webdriver.Chrome()
browser = webdriver.Firefox()
browser = webdriver.Edge()
browser = webdriver.PhantomJS()
browser = webdriver.Safari()
Element locating
webdriver provides a family of element-locating methods. The commonly used ones are:
find_element_by_id()
find_element_by_name()
find_element_by_class_name()
find_element_by_tag_name()
find_element_by_link_text()
find_element_by_partial_link_text()
find_element_by_xpath()
find_element_by_css_selector()
Notes
1. find_element_by_xxx returns the first element that matches; find_elements_by_xxx returns all elements that match.
2. Locating the same element by ID, by CSS selector, or by XPath returns the same result.
3. Selenium also provides a general method, find_element(), which takes two parameters: the lookup strategy By and the value. It is the general form of methods such as find_element_by_id(). For example, find_element_by_id(id) is equivalent to find_element(By.ID, id), and both return the same result.
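The equivalence described in note 3 can be sketched as a small lookup table. This is an illustrative sketch, not part of Selenium: `LEGACY_TO_BY`, `find_by`, and `find_all_by` are hypothetical names, and the strings are the values held by the `By` constants in `selenium.webdriver.common.by` (for example, `By.ID` is `"id"`), written out literally so the snippet can be read and run without a browser.

```python
# Hypothetical helper: maps the suffix of each find_element_by_xxx() method
# to the locator string held by the matching By constant.
LEGACY_TO_BY = {
    "id": "id",                                # By.ID
    "name": "name",                            # By.NAME
    "class_name": "class name",                # By.CLASS_NAME
    "tag_name": "tag name",                    # By.TAG_NAME
    "link_text": "link text",                  # By.LINK_TEXT
    "partial_link_text": "partial link text",  # By.PARTIAL_LINK_TEXT
    "xpath": "xpath",                          # By.XPATH
    "css_selector": "css selector",            # By.CSS_SELECTOR
}


def find_by(driver, how, value):
    """Call the general find_element() for a legacy-style suffix such as 'id'."""
    return driver.find_element(LEGACY_TO_BY[how], value)


def find_all_by(driver, how, value):
    """Same as find_by(), but return every match via find_elements()."""
    return driver.find_elements(LEGACY_TO_BY[how], value)
```

With a real driver, `find_by(driver, 'id', 'kw')` would perform the same lookup as `driver.find_element(By.ID, 'kw')`.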
Node interaction
Selenium can drive the browser to perform operations, that is, it can make the browser simulate user actions. The most common are the send_keys() method for entering text, the clear() method for clearing text, and the click() method for clicking a button. For example:
from selenium import webdriver
import time

browser = webdriver.Chrome()
browser.get('https://www.taobao.com')
# Locate the search box, type a query, then clear it and type another
search_box = browser.find_element_by_id('q')
search_box.send_keys('MAC')
time.sleep(1)
search_box.clear()
search_box.send_keys('IPhone')
# Locate the search button and click it
button = browser.find_element_by_class_name('btn-search')
button.click()
browser.quit()
Execute JavaScript
Selenium does not provide an API for every operation. Scrolling the page down, for example, has no dedicated method, but it can be done by running JavaScript directly with the execute_script() method. The code is as follows:
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.jd.com/')
# Scroll to the bottom of the page
browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
# Show a JavaScript alert
browser.execute_script('alert("123")')
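When a page keeps loading content as you scroll, a single scroll may not reach the real bottom. The sketch below repeats the same `scrollTo` call until `document.body.scrollHeight` stops growing. `scroll_to_bottom`, `pause`, and `max_rounds` are hypothetical names chosen for illustration; only `execute_script()` comes from Selenium.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops changing; return the final height."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so this is the bottom
        last_height = new_height
    return last_height
```

The `max_rounds` cap is there so that a page with endless content cannot keep the loop running forever.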
Get page source data
You can get the source code of the page through the page_source attribute, and then use a parsing tool (such as regular expressions, Beautiful Soup, or pyquery) to extract the information.
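As a minimal sketch of that extraction step, the snippet below pulls `title` attributes out of an HTML string with Python's built-in `html.parser`, so it needs no third-party library. The `TitleCollector` class, the `extract_titles` function, and the sample markup are made up for illustration; in practice the string would come from `browser.page_source`.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the title attribute of every <dl> tag in the document."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "dl":
            title = dict(attrs).get("title")
            if title is not None:
                self.titles.append(title)


def extract_titles(html):
    """Return the list of <dl title="..."> values found in an HTML string."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


# Stand-in for browser.page_source (hypothetical markup)
sample = (
    '<ul id="gzlist">'
    '<li><dl title="Company A"></dl></li>'
    '<li><dl title="Company B"></dl></li>'
    '</ul>'
)
print(extract_titles(sample))
```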
Forward and backward
# Simulate the browser's back and forward buttons
import time
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.baidu.com')
browser.get('https://www.taobao.com')
browser.get('http://www.sina.com.cn/')
browser.back()
time.sleep(10)
browser.forward()
browser.close()
Case: Browser automation
from selenium import webdriver
import time

driver = webdriver.Chrome('./chromedriver.exe')
driver.get('https://www.taobao.com')
# Locate the element
class_dir = driver.find_element_by_id('q')  # by id
# Interact with the element
class_dir.send_keys('umbrella')
# Execute JavaScript
driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
time.sleep(2)
# Click the search button
# but = driver.find_element_by_css_selector('.btn-search')  # by class, via CSS selector
but = driver.find_element_by_xpath('//*[@id="J_TSearchForm"]/div[1]/button')  # by XPath
but.click()  # click event
driver.get('https://www.baidu.com')
time.sleep(2)
# Go back one page
driver.back()
# Go forward one page
driver.forward()
time.sleep(3)
driver.close()  # close the browser
iframe processing & action chain
In the examples above, each interaction targets a specific node: for an input box we call its text-entry and clear methods, and for a button we call its click method. Other operations, such as mouse dragging and key presses, have no specific target object. They are performed another way: with an action chain.
Handling iframes with Selenium
If the element you want to locate is inside an iframe tag, you must first switch into that frame with switch_to.frame(id).
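Once the driver has switched into a frame, later lookups stay inside it until you switch back with `switch_to.default_content()`. One possible way to keep the two calls paired is a small context manager. `inside_frame` is a hypothetical helper written for this sketch, not a Selenium function.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block, then switch back."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Return to the top-level document even if the block raised
        driver.switch_to.default_content()
```

A call such as `with inside_frame(driver, 'iframeResult'):` would then scope every lookup in the block to that frame and restore the top-level document afterwards.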
Action chain (drag): from selenium.webdriver import ActionChains
1. Instantiate an action chain object: action = ActionChains(bro)
2. click_and_hold(div): press and hold the mouse button on the element
3. move_by_offset(x, y): move the mouse by the given offset
4. perform(): execute the queued actions immediately
5. release(): release the held mouse button
For example, to drag a node from one place to another:
# Action chain
from selenium import webdriver
from selenium.webdriver import ActionChains
from time import sleep

driver = webdriver.Chrome('./chromedriver.exe')
driver.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')
# If the target element is inside an iframe, switch into the iframe before locating it
driver.switch_to.frame('iframeResult')  # changes the scope in which elements are located
div = driver.find_element_by_id('draggable')
# Create the action chain
action = ActionChains(driver)
# Press and hold the mouse button on the element
action.click_and_hold(div)
for i in range(5):
    # perform() executes the queued actions immediately
    action.move_by_offset(17, 0).perform()
    sleep(0.3)
# Release the mouse button; the release only takes effect once perform() is called
action.release().perform()
driver.close()
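The same drag can also be queued in full and executed with a single perform() call. The sketch below assumes `action` is an ActionChains instance and `element` is an element that has already been located. `drag_horizontally`, `step_px`, and `steps` are hypothetical names used only for illustration.

```python
def drag_horizontally(action, element, step_px=17, steps=5):
    """Queue a press, several small moves, and a release, then run them once."""
    action.click_and_hold(element)         # press and hold on the element
    for _ in range(steps):
        action.move_by_offset(step_px, 0)  # queue one small move to the right
    action.release()                       # queue the mouse-button release
    action.perform()                       # execute everything queued so far
```

Queuing everything first keeps the press, the moves, and the release in one sequence, at the cost of not being able to pause between steps with sleep().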
Case: Simulating a QQ space login
from selenium import webdriver
from selenium.webdriver import ActionChains
from time import sleep

bor = webdriver.Chrome('./chromedriver.exe')
bor.get('https://qzone.qq.com')
# The login form is inside an iframe, so change the locating scope first
bor.switch_to.frame('login_frame')
# Switch to account/password login
zm = bor.find_element_by_id('switcher_plogin')
zm.click()
# Fill in the account and password
user = bor.find_element_by_id('u')
pawd = bor.find_element_by_id('p')
user.send_keys('111222333')
sleep(1)
pawd.send_keys('123456')
sleep(1)
# Click the login button
but = bor.find_element_by_id('login_button')
but.click()
sleep(3)
bor.close()
Cookie processing
With Selenium you can also work with cookies easily: reading, adding, and deleting them. For example:
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.zhihu.com/explore')
# Read all cookies
print(browser.get_cookies())
# Add a cookie
browser.add_cookie({'name': 'name', 'domain': 'www.zhihu.com', 'value': 'germey'})
print(browser.get_cookies())
# Delete all cookies
browser.delete_all_cookies()
print(browser.get_cookies())
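get_cookies() returns a list of dictionaries, each of which includes a 'name' and a 'value' key. If the cookies need to be passed on to another HTTP client, they usually have to be flattened first. The helpers below are a sketch of that conversion; `cookies_to_dict` and `cookie_header` are hypothetical names, and the sample list only mimics the shape of what get_cookies() returns.

```python
def cookies_to_dict(cookies):
    """Turn the list of cookie dicts from get_cookies() into a name -> value mapping."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def cookie_header(cookies):
    """Build the value of a Cookie request header from the same list."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# Stand-in for browser.get_cookies() (made-up values)
sample_cookies = [
    {"name": "name", "value": "germey", "domain": "www.zhihu.com"},
    {"name": "session", "value": "abc123", "domain": "www.zhihu.com"},
]
print(cookies_to_dict(sample_cookies))
print(cookie_header(sample_cookies))
```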
Exception handling
from selenium import webdriver
from selenium.common.exceptions import TimeoutException, NoSuchElementException, NoSuchFrameException

# Create the browser before the try block so that finally can always close it
browser = webdriver.Chrome()
try:
    browser.get('http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')
    # This frame name does not exist, so NoSuchFrameException is raised
    browser.switch_to.frame('iframssseResult')
except TimeoutException as e:
    print(e)
except NoSuchFrameException as e:
    print(e)
finally:
    browser.close()
Headless browser & evasion detection
Chrome headless browser
Since PhantomJS is no longer updated or maintained, it is recommended to use Chrome's headless mode instead, which is Chrome running without a visible interface.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=chrome_options)
Evading Selenium detection
Many large websites now detect Selenium. For example, when a normal browser visits Taobao or a similar site, the value of window.navigator.webdriver is undefined. When the page is opened through Selenium, the value is true. So how can this be solved?
It only requires setting a ChromeDriver startup option. Before starting ChromeDriver, set the experimental option excludeSwitches for Chrome, with the value ['enable-automation']. The complete code is as follows:
from selenium.webdriver import Chrome, ChromeOptions

option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = Chrome(options=option)
Code demo:
from selenium import webdriver
from time import sleep
from selenium.webdriver import ChromeOptions

# One options object carries both settings; passing two separate option objects to Chrome() would let one override the other
option = ChromeOptions()
# Run without a visible interface
option.add_argument('--headless')
option.add_argument('--disable-gpu')
# Evade detection
option.add_experimental_option('excludeSwitches', ['enable-automation'])

bor = webdriver.Chrome('./chromedriver.exe', options=option)
bor.get('https://i.qq.com')
bor_txt = bor.page_source
print(bor_txt)
sleep(3)
bor.close()
Super Eagle
1. Register and log in as an ordinary user
2. Check your points balance (top up if needed)
3. Create the software
4. Download the sample code
Case: Using Super Eagle to simulate logging in to 12306