Start with a clear purpose
1. Open a web page.
2. Locate one or more elements to operate on.
3. Operate on the elements as needed, read their content, and process the acquired data.
4. Open further pages as needed and repeat the previous steps on each of them.
In general, the work divides into these four major steps, and each step can be subdivided into several smaller ones. Let's start with goal one.
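The four steps can be sketched as a small loop. This is only an outline under assumptions: crawl, locate, and process are hypothetical names that do not appear in the original, and driver is assumed to be a Selenium WebDriver (or any object with a get() method).

```python
def crawl(driver, urls, locate, process):
    """Run the four steps for each URL and collect the processed results.

    locate(driver) is a hypothetical callback that returns a list of elements;
    process(data) is a hypothetical callback that turns their text into a result.
    """
    results = []
    for url in urls:
        driver.get(url)                       # step 1: open the page
        elements = locate(driver)             # step 2: locate the elements
        data = [el.text for el in elements]   # step 3: read their content...
        results.append(process(data))         # ...and process the acquired data
    return results                            # step 4 is the loop itself: repeat per page
```

With a real driver, locate could be something like lambda d: d.find_elements(By.TAG_NAME, 'a'), and process could simply return the list unchanged.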
Open the webpage (subdivision one, open the webpage)
This was covered in the previous blog post.
Use the get() method to open the webpage
from selenium import webdriver
drive = webdriver.Chrome()
drive.maximize_window()  # maximize the window
drive.get('https://www.baidu.com/')
print(len(drive.page_source))  # page_source returns the source of the current page
Open the webpage (subdivision two, open another page)
A common misunderstanding: every call to the get() method opens a brand-new web page in addition to the current one.
For example:
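from selenium import webdriver
drive = webdriver.Chrome()
drive.maximize_window()  # maximize the window
drive.get('https://www.baidu.com/')
drive.get('https://123.sogou.com/')  # navigates the same window; the first page is replaced
print(len(drive.page_source))  # page_source returns the source of the current page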
On the surface two webpages are opened, but in fact only one is open at the end: the second get() is equivalent to closing the previous page and opening the next one in the same window. In actual crawling work, however, you often jump from one page to another and then back. That is, we want the browser to keep the original page and open another one alongside it.
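The correct approach
1. Use the execute_script() method to open the new page in a new window.
2. Get the handle of the target window from window_handles (index 0 is the first window).
3. Use switch_to.window() to switch to it (older Selenium releases spelled this switch_to_window()).
The code is as follows: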
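from selenium import webdriver
drive = webdriver.Chrome()
drive.maximize_window()  # maximize the window
drive.get('https://www.baidu.com/')
print(len(drive.page_source))  # page_source returns the source of the current page
drive.execute_script("window.open('https://123.sogou.com/')")
# drive.window_handles lists the open windows; pick one by index
# switch_to_window() was removed in Selenium 4; use switch_to.window() instead
drive.switch_to.window(drive.window_handles[1])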
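The effect is as follows: the browser now has two pages open, the original and the new one, and the driver is switched to the second.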
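Indexing window_handles works for two windows, but when jumping back and forth it can help to keep the handles themselves. The helper below is a minimal sketch of that idea; open_in_new_window is a hypothetical name, and it assumes the new handle shows up in window_handles as soon as the script returns (if it does not, a wait would be needed first).

```python
def open_in_new_window(driver, url):
    """Open url in a new window, switch to it, and return (original, new) handles."""
    original = driver.current_window_handle
    before = set(driver.window_handles)
    # Pass the URL as a script argument instead of pasting it into the JavaScript string.
    driver.execute_script("window.open(arguments[0]);", url)
    # The new handle is the one that was not present before the call.
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new window was opened")
    driver.switch_to.window(new_handles[0])
    return original, new_handles[0]
```

After working on the new page, drive.switch_to.window(original) jumps back to the first one.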
Open the webpage (subdivision three, waiting for loading) [implicit wait]
Opening a webpage can be fast, but loading it is not necessarily so. If you rush to locate an element before the page has loaded it, the lookup fails with an error.
The so-called implicit wait looks similar to importing the time module and calling sleep(), but it is not the same thing. sleep() always blocks for the full time and knows nothing about the page, which loads on its own in the browser. implicitly_wait(10) instead sets a timeout for element lookups: the driver keeps looking for the element for up to ten seconds and continues as soon as it is found.
Call implicitly_wait() directly; the example below reuses the one from the previous blog post.
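from selenium import webdriver
from selenium.webdriver.common.by import By
drive = webdriver.Chrome()
drive.get('https://www.baidu.com/')
############
drive.implicitly_wait(10)  # element lookups now wait up to ten seconds
# find_element_by_name() was removed in Selenium 4; use find_element(By.NAME, ...)
InputTag = drive.find_element(By.NAME, 'wd')
InputTag.send_keys('python')
############
SubmitBut = drive.find_element(By.XPATH, '//input[@type="submit" and @value="百度一下"]')
SubmitBut.click()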
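A wait with a timeout differs from a fixed sleep() in that it returns as soon as the element shows up. The pure-Python sketch below models that behavior; it is not Selenium's implementation, and poll_until, lookup, and interval are hypothetical names used only for illustration.

```python
import time

def poll_until(lookup, timeout=10.0, interval=0.5):
    """Call lookup() repeatedly until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        found = lookup()
        if found:
            return found              # stop waiting as soon as something is found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        time.sleep(interval)          # short pause, then look again
```

If the element appears after one second, this returns after roughly one second, whereas time.sleep(10) would still block for the full ten.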
Open the webpage (subdivision three, waiting for loading) [explicit wait]
The explicit wait is a little smarter: you attach a condition, and the wait ends as soon as the condition is met instead of blocking for the whole time. The cost is that a few more things have to be imported.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as Ec
# Ec provides the conditions to wait on
The code is as follows:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as Ec
drive = webdriver.Chrome()
drive.maximize_window()
drive.get('https://www.baidu.com/')
# note: Selenium's documentation advises against mixing implicit and explicit waits
drive.implicitly_wait(10)
print(len(drive.page_source))
try:
    InputTag = WebDriverWait(drive, 10).until(
        Ec.presence_of_element_located((By.ID, 'kw'))
    )
    # waits up to ten seconds; returns early if the element appears sooner
    InputTag.send_keys('python')
except Exception as error:
    print(error)
Enter = drive.find_element(By.ID, 'su')
Enter.click()
print(len(drive.page_source))
In addition, expected_conditions offers many other conditions. Their names are in plain English and largely self-explanatory.
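Besides built-in conditions such as visibility_of_element_located, element_to_be_clickable, and title_contains, WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value once the wait should end. A minimal sketch of a custom condition follows; source_contains is a hypothetical name, not part of Selenium.

```python
def source_contains(text):
    """Build a condition that becomes truthy once the page source contains text."""
    def condition(driver):
        # until() calls this repeatedly with the driver until it returns a truthy value
        return text in driver.page_source
    return condition
```

It would be used like a built-in condition, for example WebDriverWait(drive, 10).until(source_contains('python')).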
Final demo code
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as Ec
drive = webdriver.Chrome()
drive.maximize_window()
drive.get('https://www.baidu.com/')
drive.implicitly_wait(10)
print(len(drive.page_source))
try:
    InputTag = WebDriverWait(drive, 10).until(
        Ec.presence_of_element_located((By.ID, 'kw'))
    )
    InputTag.send_keys('python')
except Exception as error:
    print(error)
Enter = drive.find_element(By.ID, 'su')
Enter.click()
print(len(drive.page_source))
# switch to another page
drive.execute_script("window.open('https://123.sogou.com/')")
# switch_to_window() was removed in Selenium 4; use switch_to.window() instead
drive.switch_to.window(drive.window_handles[1])
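In addition, there is a small detail here: in practice, page_source reflects the page as the browser currently holds it, including content filled in by JavaScript after loading.
This is why Selenium is used to get Ajax content. Of course, you can also find the underlying interface directly and work out the parameters it expects.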
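The alternative of calling the interface directly might look like the sketch below. Everything specific in it is an assumption: the endpoint and parameter names are placeholders, and a real site's interface has to be found in the browser's network panel first.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_request_url(endpoint, params):
    """Append URL-encoded query parameters to an endpoint."""
    return f"{endpoint}?{urlencode(params)}"

def fetch_json(endpoint, params, timeout=10):
    """Request the endpoint directly and decode a JSON response body."""
    with urlopen(build_request_url(endpoint, params), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For a hypothetical endpoint, build_request_url('https://example.com/api/search', {'wd': 'python', 'page': 2}) yields 'https://example.com/api/search?wd=python&page=2'. This skips the browser entirely, at the price of having to reproduce whatever parameters the site expects.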