Selenium basics 2: advantages and disadvantages, three wait methods and their limitations

Recently I used Selenium for data collection and ran into some trouble, which gave me a deeper understanding of its advantages and disadvantages.

Let me start with the advantages:

  1. The browser runs visibly, so it is easy for beginners to learn.
  2. Data can be collected without a deep understanding of how dynamic loading interacts with the backend.
  3. It matches the way ordinary people use web pages: open, copy, paste.

The disadvantages are also obvious:

  1. Pages load slowly and scripts block easily, so collection throughput is low.
  2. It is not easy to migrate; running on a Linux server without a GUI requires further changes.
  3. Collecting through a visible browser is prone to browser problems and is not robust.

I believe all of these problems can be solved; running into them is mostly down to my own limited skill.

The previous post was a simple introduction to Selenium. Today I worked through several problems I hit in practice; this post focuses on the problem of page elements loading.

When Selenium collects data, just as when we visit a page ourselves, it has to wait for the page to load before collecting. Selenium does this by default, but page loading sometimes runs into problems (for example, some internal JS has not finished loading): the data can already be collected, yet the page has to wait a while before it can be operated on. In addition, to reduce pressure on the server and lower the request frequency, time.sleep() is usually combined with random to control the interval between collections and refreshes.
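The randomized pause mentioned above can be sketched as a small helper. This is a minimal sketch, not code from the original script: the name polite_pause and the default range of 1 to 3 seconds are assumptions chosen for illustration.

```python
import random
import time


def polite_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it.

    `sleep` can be swapped out (for example in tests) so no real waiting happens.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)  # a different delay on each call
    sleep(delay)
    return delay
```

Calling polite_pause() between page visits, for example right after each driver.get(url), spaces the requests out unevenly, which is gentler on the server than a fixed interval.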

From what I have read, there are usually three ways to handle the loading problem:

  1. Forced wait: sleep()

    sleep(5)  # wait 5 s after opening the page
    
  2. Implicit wait: implicitly_wait()

    driver.implicitly_wait(10)  # implicit wait of up to 10 s
    

    This method is provided by WebDriver. Once set, the implicit wait stays in effect for the entire life cycle of the WebDriver instance. It does not target one particular element; it is a global setting: whenever an element is located, WebDriver keeps polling the page until the element appears and then runs the next statement. If the set time is exceeded, an exception (NoSuchElementException) is thrown.

    Put simply: if the element has not loaded yet, it waits, and it reports an exception on timeout.

    If you don't set it, the default is implicitly_wait(0): the lookup happens once, immediately, so a target element that has not been rendered yet will not be found, which leads to a series of errors (see the link below).

    https://stackoverflow.com/questions/53588966/python-selenium-difference-between-driver-implicitly-wait-and-time-sleep

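    Disadvantages: in my experience, when some JS on the page fails to load but the element we want has already appeared, the script still waits until the page finishes loading (the spinner on the browser tab stops turning) before it runs the next statement. This blocking appears to come from the page load itself (driver.get() waits for it by default) rather than from implicitly_wait(), which returns as soon as the element is found; either way, the implicit wait cannot shorten it, and in some cases script execution slows down.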
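As a usage sketch of the global nature of the implicit wait: it is set once on the driver and then applies to every later lookup. The function name read_texts is hypothetical, driver is assumed to be a Selenium WebDriver instance created elsewhere, and the string "id" is the value of By.ID.

```python
def read_texts(driver, element_ids, timeout=10):
    """Return {element id: text} for each id in element_ids.

    Assumes `driver` is a Selenium WebDriver instance created elsewhere.
    """
    # Set once: every find_element call below may now wait up to `timeout` seconds.
    driver.implicitly_wait(timeout)
    texts = {}
    for element_id in element_ids:
        # "id" is the value of By.ID. If the element never appears, Selenium
        # raises NoSuchElementException once the implicit wait runs out.
        element = driver.find_element("id", element_id)
        texts[element_id] = element.text
    return texts
```

For example, read_texts(driver, ["kw"]) after driver.get(url) waits up to 10 seconds for the element with id "kw" and returns its text.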

  3. Explicit wait: WebDriverWait()

An explicit wait sets a maximum waiting time for one specific element, instead of for every lookup.

https://blog.csdn.net/sinat_41774836/article/details/88965281

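    wait = WebDriverWait(driver, 10, 0.5)  # wait up to 10 s, polling every 0.5 s
    # EC is selenium.webdriver.support.expected_conditions
    element = wait.until(EC.presence_of_element_located((By.ID, "kw")), message="")
    # Note: the locator is the tuple (By.ID, "kw"), so By.ID always sits inside two
    # layers of parentheses, with or without message="". presence_of_element_located
    # only checks that the element is present in the DOM, not that it is visible.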
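To make the timeout and the polling interval concrete, here is a simplified, Selenium-free sketch of the kind of polling loop an explicit wait performs. It is an illustration, not Selenium's actual implementation: wait_until, element_ready and attempts are hypothetical names, and the built-in TimeoutError stands in for Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, message=""):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError(message) once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(message)
        time.sleep(poll)


# Hypothetical stand-in for an element that shows up on the third check.
attempts = []


def element_ready():
    attempts.append(1)
    return "element" if len(attempts) >= 3 else None


found = wait_until(element_ready, timeout=2.0, poll=0.01)
```

Unlike sleep(5), which always costs the full five seconds, this kind of loop returns as soon as the condition is met and only pays the full timeout when the element never appears.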

Packages involved:

    import random
    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC  # the code above uses the EC alias
    from selenium.webdriver.support.ui import WebDriverWait


Origin blog.csdn.net/u010472858/article/details/104282065