First, what is Selenium?
Selenium was originally an automated testing tool. Crawlers use it mainly to solve the problem that requests cannot execute JavaScript code.
Second, why use Selenium?
Selenium can drive a browser to execute custom logic automatically. In other words, code can fully simulate a human using a browser to visit the target site and operate it, so we can use it to build crawlers.
Selenium essentially works by driving the browser. It fully simulates browser operations such as navigating, typing, clicking, and scrolling down, and then gets the page as it looks after rendering. It supports multiple browsers.
So what good does it do for a crawler? The benefit is that it helps us avoid a series of complex communication flows. Take the requests module we learned earlier: when requests simulates a request, we first have to analyze the whole communication flow before the request goes through and a response comes back. If the target site has a complex flow, such as slider verification during login, using requests becomes especially troublesome. You do not need to worry too much, though: the stricter a site's anti-crawling strategy, the worse its user experience, so sites end up relaxing their security policies under pressure from users.
Now, can the requests library execute JS? It cannot! So if a site sends Ajax requests and renders asynchronously fetched data into the page, JS is needed to send those requests. And what does a browser do? It visits the target site directly, fetches the data, and renders it into the page. These are the benefits of using Selenium!
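To make the point about JS concrete, here is a minimal sketch using only the standard library. The page source in RAW_HTML, the id "price", and the helper static_div_text are all made up for illustration; they stand in for what an HTTP client without a JS engine would receive from a page that fills in its data with a script.

```python
from html.parser import HTMLParser

# Hypothetical page source, as a client that cannot run JS would receive it.
# The <div> is empty in the markup; only the script would fill it in.
RAW_HTML = """
<html><body>
  <div id="price"></div>
  <script>
    document.getElementById("price").textContent = "99.00";
  </script>
</body></html>
"""


class DivTextCollector(HTMLParser):
    """Collects the text found inside the <div> with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data.strip()


def static_div_text(html, target_id):
    """Return the text of the target <div> as it appears in the raw markup."""
    parser = DivTextCollector(target_id)
    parser.feed(html)
    return parser.text


# The script is never executed, so the raw markup holds no price.
print(repr(static_div_text(RAW_HTML, "price")))  # prints ''
```

A browser driven by Selenium would run the script first, so the same element would contain the rendered value.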
Are there any downsides to using it? Selenium essentially drives the browser to send requests to the target site, and when a browser visits a site it has to finish loading all the static resources. The HTML, CSS, and JS files all have to be waited for, which makes it especially slow. So the downside is that it is very inefficient! That is why we generally use it only for login authentication.
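One common way to limit that cost, sketched here as an assumption rather than something this article prescribes, is to let Selenium handle only the login and then pass the resulting cookies to a faster HTTP client. The helper names cookies_to_dict and cookie_header and the sample data are hypothetical; the sample mimics the list of dicts that driver.get_cookies() returns.

```python
def cookies_to_dict(selenium_cookies):
    """Convert Selenium-style cookie records into a plain {name: value} dict."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def cookie_header(selenium_cookies):
    """Build the value of a Cookie request header from the same records."""
    pairs = cookies_to_dict(selenium_cookies)
    return "; ".join(f"{name}={value}" for name, value in pairs.items())


# Hypothetical sample in the shape returned by driver.get_cookies().
sample = [
    {"name": "sessionid", "value": "abc123", "domain": ".example.com", "path": "/"},
    {"name": "token", "value": "xyz", "domain": ".example.com", "path": "/"},
]

print(cookie_header(sample))  # prints sessionid=abc123; token=xyz
```

With a real logged-in driver, the dict from cookies_to_dict(driver.get_cookies()) could be passed as the cookies argument of requests.get, so the remaining pages are fetched without loading static resources in a browser.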
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Chrome()

try:
    driver.implicitly_wait(10)
    driver.get('https://www.jd.com/')
    # find_element(By.ID, ...) replaces the older find_element_by_id,
    # which recent Selenium 4 releases no longer provide
    input_tag = driver.find_element(By.ID, 'key')
    input_tag.send_keys('哈利波特')
    input_tag.send_keys(Keys.ENTER)
    time.sleep(10)
except Exception as e:
    print(e)
finally:
    driver.close()
from selenium import webdriver
from selenium.webdriver.common.by import By
# Import the keyboard Keys
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Chrome()

# Guard the whole run so the browser is always closed
try:
    # Implicit wait: wait up to 10 seconds for tags to load
    driver.implicitly_wait(10)

    # Send a request to the JD home page
    driver.get('https://www.jd.com/')

    # Find the search input box by its id
    # (find_element(By.ID, ...) replaces the older find_element_by_id)
    input_tag = driver.find_element(By.ID, 'key')

    # send_keys passes a value to the current tag
    input_tag.send_keys('Chinese Dictionary')

    # Press the keyboard's Enter key
    input_tag.send_keys(Keys.ENTER)

    time.sleep(3)

    '''
    Crawl JD product information:
        product name
        url
        price
        number of reviews
    '''
    # find_element finds one element
    # find_elements finds many elements
    # Find the list of all products
    good_list = driver.find_elements(By.CLASS_NAME, 'gl-item')
    # print(good_list)

    # Loop through each product
    for good in good_list:
        # Find the product detail page URL with a CSS selector
        good_url = good.find_element(By.CSS_SELECTOR, '.p-img a').get_attribute('href')
        print(good_url)

        # Name
        good_name = good.find_element(By.CSS_SELECTOR, '.p-name em').text
        print(good_name)

        # Price
        good_price = good.find_element(By.CLASS_NAME, 'p-price').text
        print(good_price)

        # Number of reviews
        good_commit = good.find_element(By.CLASS_NAME, 'p-commit').text
        print(good_commit)

        str1 = f'''
        URL: {good_url}
        Name: {good_name}
        Price: {good_price}
        Reviews: {good_commit}
        \n
        '''
        # Append the product information to a text file
        with open('jd.txt', 'a', encoding='utf-8') as f:
            f.write(str1)

    time.sleep(10)

# Catch any exception
except Exception as e:
    print(e)

# Finally, close the driven browser
finally:
    driver.close()