Introduction
Selenium was originally an automated testing tool. Web scrapers use it mainly to solve one problem: requests cannot execute JavaScript code.
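To see why this matters, the sketch below parses a made-up page whose result text is written into the page by a script. A plain HTTP client such as requests receives exactly this raw HTML and never runs the script, so that text never shows up. To keep the example offline, it uses Python's built-in `html.parser` in place of requests. The page content and the `TextCollector` class are illustrative only.

```python
from html.parser import HTMLParser

# A made-up page: the heading is static, but the result text only exists
# after a browser executes the <script>.
RAW_HTML = """
<html><body>
  <h1>Results</h1>
  <div id="result"></div>
  <script>document.getElementById('result').textContent = 'Loaded by JS';</script>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collect visible text outside <script> tags, i.e. what a non-JS client sees."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Skip script source and whitespace-only runs between tags.
        if not self.in_script and data.strip():
            self.texts.append(data.strip())


collector = TextCollector()
collector.feed(RAW_HTML)
print(collector.texts)  # ['Results'] -- 'Loaded by JS' never appears
```

A real browser driven by Selenium would run the script, and its rendered page would contain "Loaded by JS" inside the div.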
Selenium works by driving a real browser and fully simulating what a user does in it, such as navigating, typing, clicking, and choosing from drop-downs. You get the result after the page has been rendered. Selenium supports multiple browsers:
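```python
from selenium import webdriver

# Pick one of these; each call starts a separate browser
browser = webdriver.Chrome()     # Google Chrome
browser = webdriver.Firefox()    # Firefox
browser = webdriver.PhantomJS()  # PhantomJS (not available in Selenium 4)
browser = webdriver.Safari()     # Safari
browser = webdriver.Edge()       # Edge
```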
Installation
```
pip3 install selenium
```
With a browser window
Download chromedriver.exe and put it in the Scripts directory of your Python installation. Note that the latest version is 2.38, not 2.9 (version numbers compare numerically, so 2.38 is newer than 2.9).
Mirror site in China: http://npm.taobao.org/mirrors/chromedriver/2.38/
For the latest version, check the official site: https://sites.google.com/a/chromium.org/chromedriver/downloads

Verify the installation:
```
C:\Users\Administrator>python3
Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from selenium import webdriver
>>> driver=webdriver.Chrome()  # a browser window pops up
>>> driver.get('https://www.baidu.com')
>>> driver.page_source
```
Note: the default webdriver supported by Selenium 3 is Firefox, and Firefox requires geckodriver. Download link: https://github.com/mozilla/geckodriver/releases
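The 2.38 vs 2.9 confusion comes from sorting version strings as text: character by character, "3" sorts before "9", so "2.38" looks older. Splitting a version into integer parts fixes the comparison. A minimal sketch in plain Python (`version_key` is a hypothetical helper name):

```python
def version_key(version: str) -> tuple:
    """Turn '2.38' into (2, 38) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


versions = ["2.9", "2.38", "2.10"]

# Text sorting compares character by character: '1' < '3' < '9'.
print(sorted(versions))                   # ['2.10', '2.38', '2.9']
# Numeric sorting compares 9 < 10 < 38, so 2.38 is the newest.
print(sorted(versions, key=version_key))  # ['2.9', '2.10', '2.38']
print(max(versions, key=version_key))     # 2.38
```

This sketch assumes purely numeric, dot-separated versions. Suffixes such as "-beta" would need extra handling.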
It is recommended to put chromedriver.exe in the project directory instead.
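If chromedriver lives in the project directory rather than on PATH, Selenium has to be told where to find it. Below is a minimal sketch that assumes Selenium 4, where the driver path is passed through a `Service` object (Selenium 3 used an `executable_path=` argument instead). The helper names `project_driver_path` and `open_chrome`, and the assumption that the driver sits directly in the project root, are illustrative only.

```python
import sys
from pathlib import Path


def project_driver_path(project_dir: Path) -> Path:
    """Expected chromedriver location in the project root (hypothetical layout)."""
    name = "chromedriver.exe" if sys.platform.startswith("win") else "chromedriver"
    return project_dir / name


def open_chrome(project_dir: Path):
    """Start Chrome using the chromedriver stored in project_dir (assumes Selenium 4)."""
    # Imported here so the path helper above works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver_path = project_driver_path(project_dir)
    if not driver_path.is_file():
        raise FileNotFoundError(f"chromedriver not found at {driver_path}")
    return webdriver.Chrome(service=Service(executable_path=str(driver_path)))


# Usage (launches a real browser):
#   driver = open_chrome(Path.cwd())
#   driver.get("https://www.baidu.com")
#   driver.quit()
```

Keeping the driver next to the code means the project no longer depends on how each machine's Python installation or PATH is set up.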
Without a browser window
Download PhantomJS, unzip it, and add its bin directory (which contains phantomjs.exe) to the PATH environment variable. Download link: http://phantomjs.org/download.html

Verify the installation:
```
C:\Users\Administrator>phantomjs
phantomjs> console.log('egon gaga')
egon gaga
undefined
phantomjs> ^C
C:\Users\Administrator>python3
Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from selenium import webdriver
>>> driver=webdriver.PhantomJS()  # headless browser
>>> driver.get('https://www.baidu.com')
>>> driver.page_source
```
Note: PhantomJS is no longer maintained, and Selenium 4 has dropped support for it.
Selenium + Chrome headless mode
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--window-size=1920,3000')  # browser resolution
chrome_options.add_argument('--disable-gpu')  # Google's docs mentioned adding this to work around a bug
chrome_options.add_argument('--hide-scrollbars')  # hide scroll bars, to handle some special pages
chrome_options.add_argument('--blink-settings=imagesEnabled=false')  # don't load images, to speed things up
chrome_options.add_argument('--headless')  # no visible window; on Linux systems without a display, Chrome fails to start without this
chrome_options.add_argument('--disable-infobars')  # hide the "Chrome is being controlled by automated test software" bar (may be ignored by newer Chrome)
# Only needed if Chrome is not installed in the default location:
# chrome_options.binary_location = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"

# bro = webdriver.PhantomJS()
bro = webdriver.Chrome(options=chrome_options)
bro.get('https://www.baidu.com')  # open the target URL
print(bro.page_source)  # HTML of the rendered target page
bro.quit()  # close the browser and release its resources
```
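If anything between starting the browser and the final cleanup line raises, for example a failed page load, that cleanup line never runs and the Chrome process keeps running. One way to guarantee cleanup is a context manager built on try/finally. Below is a minimal sketch that assumes Selenium 4 for the `options=` keyword. `browser_session` and `make_headless_chrome` are hypothetical helper names, not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Create a driver with make_driver() and always quit it, even if the body raises."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()  # quit() ends the whole session, including the driver process


def make_headless_chrome():
    """Build a headless Chrome driver (assumes Selenium 4 and a working chromedriver)."""
    # Imported lazily so browser_session can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    return webdriver.Chrome(options=options)


# Usage (launches a real headless Chrome):
#   with browser_session(make_headless_chrome) as bro:
#       bro.get("https://www.baidu.com")
#       print(bro.page_source)
```

The factory is passed in rather than hard-coded. That way the same cleanup logic works for any browser, and it can be exercised with a stand-in object when no browser is installed.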
Usage
Basic usage
```python
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

chrome_options = Options()
chrome_options.add_argument('--disable-infobars')  # hide the "Chrome is being controlled by automated test software" bar
bro = webdriver.Chrome(options=chrome_options)

bro.get('https://www.baidu.com')     # open the target URL
inp = bro.find_element(By.ID, 'kw')  # locate the search input box on the page
inp.send_keys('Python')              # type into the input box
inp.send_keys(Keys.ENTER)            # simulate pressing Enter
print(bro.page_source)
time.sleep(5)
bro.quit()                           # close the browser
```
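In the basic example, page_source is read right after pressing Enter, so the search results may not have rendered yet. Selenium's built-in answer is an explicit wait. `WebDriverWait(driver, timeout).until(condition)` calls a condition with the driver repeatedly, every 0.5 seconds by default, until it returns something truthy or the timeout expires. The plain-Python sketch below only illustrates that polling idea. `wait_until` is a hypothetical helper, not Selenium's API, and in real code `WebDriverWait` is the better choice.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Usage with the basic example (assumes Baidu renders its results into an
# element with id "content_left"; find_elements returns [] until it exists):
#   results = wait_until(lambda: bro.find_elements(By.ID, "content_left"), timeout=10)
```

Unlike a fixed `time.sleep`, a polling wait returns as soon as the page is ready and fails loudly if it never becomes ready.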