Hello everyone!
Some websites only show their data after login, and the only way to log in is with an SMS verification code, which is hard to automate.
In that case, we can log in manually in a browser that is already open, then attach a program to that same browser and let it carry on with the data crawling.
The specific steps are as follows:
1-1 Install dependencies
# Install dependencies
pip3 install selenium
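The script later in this article passes a Service object to webdriver.Chrome, which, as far as I know, needs Selenium 4 or newer. A minimal sketch for checking what pip actually installed; the helper names installed_version and is_selenium_4_or_newer are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_selenium_4_or_newer(version_string):
    """True when the major component of a version such as '4.10.0' is at least 4."""
    return int(version_string.split(".")[0]) >= 4


found = installed_version("selenium")
if found is None:
    print("selenium is not installed; run: pip3 install selenium")
else:
    print(f"selenium {found} installed, Selenium 4+: {is_selenium_4_or_newer(found)}")
```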
1-2 Chrome application full path
Right-click the Chrome shortcut and open its properties to see the full path of the Chrome executable
For example: C:\Program Files\Google\Chrome\Application\chrome.exe
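If you would rather find the path from a script, here is a small sketch that tries a list of candidate locations. The two paths below are only assumed typical install locations (the first is the one from this article), and first_existing is a hypothetical helper name.

```python
from pathlib import Path

# Assumed typical install locations; adjust them for your machine.
CANDIDATE_PATHS = [
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
]


def first_existing(paths):
    """Return the first path in `paths` that is an existing file, else None."""
    for candidate in paths:
        if Path(candidate).is_file():
            return str(candidate)
    return None


print(first_existing(CANDIDATE_PATHS) or "Chrome not found in the assumed locations")
```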
1-3 Start the browser from the command line
Next, start the Chrome browser through the command line in the CMD terminal
# Start the browser
cd /d "C:\Program Files\Google\Chrome\Application" && chrome.exe --remote-debugging-port=1234 --user-data-dir="C:\selenum\user_data"
Where:
--remote-debugging-port
Specifies the browser's debugging port number
PS: Any port number works here, as long as it is not already in use.
--user-data-dir
The user profile directory
Specify a separate folder here (it will be created if it does not exist). If you leave this parameter out, Chrome runs against the browser's default profile and pollutes it.
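The same launch can be sketched from Python. This is only an illustration under assumptions: is_port_free and build_chrome_command are hypothetical helpers, the port check simply tests whether anything accepts a connection on the port, and the paths are the ones used in this article.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing is currently accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) != 0


def build_chrome_command(chrome_path, port, user_data_dir):
    """Build the argument list for starting Chrome with remote debugging enabled."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",
    ]


command = build_chrome_command(
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    1234,
    r"C:\selenum\user_data",
)
print(command)
# To actually launch the browser: import subprocess; subprocess.Popen(command)
```

Passing the arguments as a list means the profile path needs no extra quoting, even if it contains spaces. Calling is_port_free(1234) first tells you whether the port is still available.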
1-4 Download ChromeDriver
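Download the ChromeDriver release that matches your Chrome browser version and move it to a directory of your choice.
Download link:
http://chromedriver.storage.googleapis.com/index.html
Note: this index only lists ChromeDriver releases up to version 114. Drivers for newer Chrome versions are published through the Chrome for Testing project.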
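To compare the two versions, you can read Chrome's version from chrome://version and the driver's from chromedriver --version. The sketch below compares only the major number, which is a simplifying assumption on my part; the function names and the sample version strings are made up for illustration.

```python
def major_version(version_string):
    """Extract the major number from a version such as '114.0.5735.90'."""
    return int(version_string.strip().split(".")[0])


def versions_match(chrome_version, driver_version):
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Example version strings, made up for illustration.
print(versions_match("114.0.5735.199", "114.0.5735.90"))  # True
print(versions_match("114.0.5735.199", "113.0.5672.63"))  # False
```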
1-5 Operate the opened browser
Assuming the browser started above has the Baidu homepage open, we now write a simple program that continues operating that same browser
Note that you need to use debuggerAddress to specify the browser's address and port number.
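Before attaching, it can help to confirm that something is really listening on that address. Chrome started with a remote debugging port serves a small JSON document at /json/version, so a rough check might look like the sketch below. The helper names are hypothetical and the address is the one used in this article.

```python
import http.client
import json
import urllib.error
import urllib.request


def debugger_version_url(address):
    """Build the URL of the DevTools version endpoint for 'host:port'."""
    return f"http://{address}/json/version"


def read_debugger_info(address, timeout=2.0):
    """Return the parsed JSON from the endpoint, or None if nothing usable answers."""
    url = debugger_version_url(address)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None


info = read_debugger_info("127.0.0.1:1234")
print("Chrome is listening" if info else "No debuggable browser on 127.0.0.1:1234")
```

If this reports no browser, check the port number and make sure Chrome was started from the command line as in step 1-3. The attach script itself follows.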
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_options = Options()

# Address and port of the browser that is already open
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:1234")

# Note: the Chrome version and the ChromeDriver version must match
# Download link: http://chromedriver.storage.googleapis.com/index.html
s = Service(r"C:\Users\xingag\Desktop\111\chromedriver.exe")
driver = webdriver.Chrome(service=s, options=chrome_options)

# Operate the browser
input_element = driver.find_element(By.ID, 'kw')
if input_element:
    # Type the search keyword
    input_element.send_keys("AirPython")

submit_element = driver.find_element(By.ID, 'su')
if submit_element:
    # Click the search button
    submit_element.click()

# Release resources
# driver.close()
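One caveat about the element checks in the script: find_element raises NoSuchElementException when nothing matches, so it never hands back an empty value for an if to test. If you want a value you can test, one option is to go through find_elements, which returns an empty list instead of raising. A minimal sketch, where find_or_none is a hypothetical helper and not part of Selenium:

```python
def find_or_none(driver, by, value):
    """Return the first element matching the locator, or None when none match.

    Relies on find_elements, which returns an empty list instead of raising.
    """
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None
```

For example, find_or_none(driver, By.ID, "kw") gives the Baidu search box when the page has one, and None otherwise, so the if checks become meaningful.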