Selenium and Python crawler (2): waiting and multiple windows (goal 1)

A clear purpose

1. Our goal is first to open a web page.
2. Locate one or more elements to perform the corresponding operations on.
3. Operate on the elements as needed and process the data obtained.
4. Open further pages as needed and repeat the previous operations.

In general, our work divides into these four major steps, though each step can be subdivided into several smaller ones. So now let's tackle goal one.

Open the webpage (subdivision one: opening the page)

This was already covered in the previous blog:
use the get() method to open the page.

from selenium import webdriver

drive = webdriver.Chrome()
drive.maximize_window()  # maximize the browser window
drive.get('https://www.baidu.com/')
print(len(drive.page_source))  # page_source is the HTML source of the current page

Open the webpage (subdivision two: opening a second page)

First, a common misconception to demonstrate: that each call to the get() method opens a brand-new window.
For example:

from selenium import webdriver

drive = webdriver.Chrome()
drive.maximize_window()  # maximize the browser window
drive.get('https://www.baidu.com/')
drive.get('https://123.sogou.com/')
print(len(drive.page_source))  # page_source is the HTML source of the current page

On the surface two web pages are opened, but in fact only one is open at the end: the second get() navigates the same window, which is equivalent to closing the previous page and opening the next one in its place. In real crawling work, however, we often need to jump from one page to another and then back again. That is, we want the browser to keep the original page open and open another one alongside it.
The correct approach:
1. Use the execute_script() method to open a new window.
2. Get the handle of the desired window from window_handles.
3. Use switch_to.window() to switch to it.

The code is as follows:

from selenium import webdriver

drive = webdriver.Chrome()
drive.maximize_window()  # maximize the browser window
drive.get('https://www.baidu.com/')
print(len(drive.page_source))  # page_source is the HTML source of the current page

drive.execute_script("window.open('https://123.sogou.com/')")
# drive.window_handles is the list of window handles; index it to pick a window
drive.switch_to.window(drive.window_handles[1])

The effect: the Baidu page stays open, and the Sogou page opens in a second tab of the same browser.
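Conceptually, window_handles is just an ordered list of opaque handle strings, and switch_to.window() tells the driver which handle is current; hold on to the first handle if you want to hop back later. Here is a toy model of that bookkeeping (FakeBrowser is made up purely for illustration, it is not part of Selenium):

```python
class FakeBrowser:
    """Toy model of window-handle bookkeeping (illustrative, not Selenium itself)."""

    def __init__(self):
        self.window_handles = ['handle-0']      # the first window's handle
        self.current_window_handle = 'handle-0'

    def open_window(self):
        # window.open() appends a new handle to the ordered handle list
        handle = 'handle-%d' % len(self.window_handles)
        self.window_handles.append(handle)
        return handle

    def switch_to_window(self, handle):
        # switching only changes which handle is "current"
        if handle not in self.window_handles:
            raise ValueError('no such window: %s' % handle)
        self.current_window_handle = handle

browser = FakeBrowser()
browser.open_window()                                # like execute_script("window.open(...)")
browser.switch_to_window(browser.window_handles[1])  # work in the new tab
browser.switch_to_window(browser.window_handles[0])  # then hop back to the first
```

With a real driver the same pattern is drive.switch_to.window(drive.window_handles[0]) to return to the original page.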

Open the webpage (subdivision three: waiting for the page to load) [implicit wait]

Opening a web page can be fast, but fully loading it is another matter. If you rush to locate an element before the page has finished loading it, you will certainly get an error.
1. An implicit wait is roughly equivalent to importing the time module and using the sleep() method, but Selenium's version is smarter: instead of blocking for a fixed time, the driver keeps retrying the lookup until the element appears or the timeout expires. (A bare sleep() is of little use here, since the page loads at its own pace regardless of how long you sleep.)
Just call the implicitly_wait() method directly,
using the example from the previous blog:

from selenium import webdriver
from selenium.webdriver.common.by import By

drive = webdriver.Chrome()
drive.get('https://www.baidu.com/')
############
drive.implicitly_wait(10)  # wait up to ten seconds for elements to appear
InputTag = drive.find_element(By.NAME, 'wd')
InputTag.send_keys('python')
############
SubmitBut = drive.find_element(By.XPATH, '//input[@type="submit" and @value="百度一下"]')
SubmitBut.click()
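To see why implicitly_wait() beats a bare sleep(), note that it behaves like a polling loop: retry until the element appears or time runs out. A minimal plain-Python sketch of the idea (wait_until is an illustrative name, not Selenium's internal implementation):

```python
import time

def wait_until(predicate, timeout=10.0, poll=0.5):
    """Poll `predicate` until it returns a truthy value or `timeout` elapses.

    Returns True as soon as the condition holds, False on timeout --
    the same spirit as Selenium retrying find_element under an implicit wait.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True       # condition met early: stop waiting immediately
        time.sleep(poll)      # otherwise nap briefly and check again
    return False              # timed out without the condition ever holding
```

Once implicitly_wait() is set, Selenium applies this kind of retry automatically to every element lookup for the lifetime of the driver.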


Open the webpage (subdivision three: waiting for the page to load) [explicit wait]

This one is a little smarter: you can attach a condition to the wait instead of waiting blindly. Of course, it requires importing a bit more:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as Ec
# Ec is where the ready-made wait conditions live

The code is as follows:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as Ec

drive=webdriver.Chrome()
drive.maximize_window()
drive.get('https://www.baidu.com/')
drive.implicitly_wait(10)
print(len(drive.page_source))


try:
    InputTag=WebDriverWait(drive,10).until(
        Ec.presence_of_element_located((By.ID,'kw'))
    )
    # wait up to ten seconds; if the element appears sooner, stop waiting early
    InputTag.send_keys('python')


except Exception as error:
    print(error)

Enter=drive.find_element(By.ID,'su')
Enter.click()
print(len(drive.page_source))

In addition, expected_conditions provides many other ready-made conditions (for example title_contains, element_to_be_clickable, and visibility_of_element_located); their names are plain English, so they are easy to guess.
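Each name in expected_conditions is a factory: calling it returns a predicate, and WebDriverWait invokes that predicate with the driver on every poll until it returns something truthy. A stripped-down sketch of how a condition like title_contains works (FakeDriver is a stand-in used to exercise the idea, not a real WebDriver):

```python
class FakeDriver:
    """Minimal stand-in for a WebDriver: just enough state to call a condition."""
    def __init__(self, title):
        self.title = title

def title_contains(substring):
    """Factory in the style of expected_conditions: returns a predicate that
    WebDriverWait would call with the driver on every poll."""
    def predicate(driver):
        return substring in driver.title
    return predicate

# Build the condition once; the wait loop calls it repeatedly.
cond = title_contains('python')
```

Real usage is the same shape: WebDriverWait(drive, 10).until(Ec.title_contains('python')).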

Final demo code

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as Ec

drive=webdriver.Chrome()
drive.maximize_window()
drive.get('https://www.baidu.com/')
drive.implicitly_wait(10)
print(len(drive.page_source))


try:
    InputTag=WebDriverWait(drive,10).until(
        Ec.presence_of_element_located((By.ID,'kw'))
    )
    InputTag.send_keys('python')


except Exception as error:
    print(error)

Enter=drive.find_element(By.ID,'su')
Enter.click()
print(len(drive.page_source))

'''switch to the new window'''

drive.execute_script("window.open('https://123.sogou.com/')")
drive.switch_to.window(drive.window_handles[1])


One more small detail: the two print(len(drive.page_source)) calls give different lengths, because the page keeps loading content via Ajax after the initial response. This is exactly why you would use Selenium to get Ajax content; of course, you can also find the underlying interface directly and work out its request parameters instead.


Origin blog.csdn.net/FUTEROX/article/details/108428656