How to use Selenium to control an already-open browser

Hello everyone!

When crawling a website, you sometimes find that the data is only shown after login, and that login is only possible with an SMS verification code.

In that case, we can log in manually in a browser that is already open, and then have a program take over that browser to finish crawling the data.

The specific steps are as follows:

1-1 Install dependencies

# Install dependencies
pip3 install selenium
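
To confirm that the install worked, a quick check along these lines may help. This is only a sketch using the Python standard library; the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


selenium_version = installed_version("selenium")
if selenium_version is None:
    print("selenium is not installed; run: pip3 install selenium")
else:
    print("selenium", selenium_version, "is installed")
```

The script in step 1-5 uses the Service class and By locators, so a Selenium 4 release is what you want to see here.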

1-2 Chrome application full path

Right-click to view the full path of the Chrome browser

For example: C:\Program Files\Google\Chrome\Application\chrome.exe
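
If you would rather locate the executable from a script, something like the following could work. It is a rough sketch: the two candidate paths are assumed typical Windows install locations (only the first appears in this article), and find_chrome is a hypothetical helper name.

```python
from pathlib import Path

# Assumed typical install locations on Windows; adjust them for your machine.
CANDIDATE_PATHS = [
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
]


def find_chrome(candidates=CANDIDATE_PATHS):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


print(find_chrome())
```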


1-3 Start the browser from the command line

Next, start the Chrome browser through the command line in the CMD terminal

# Start the browser
cd C:\Program Files\Google\Chrome\Application && chrome.exe --remote-debugging-port=1234 --user-data-dir="C:\selenum\user_data"

Where:

--remote-debugging-port

Specifies the browser's debugging port number.

PS: You can pick any port number here, as long as it is not already in use.

--user-data-dir

The user profile directory.

Specify a dedicated folder here (it will be created if it does not exist). If you do not set this parameter explicitly, the run will pollute the browser's default profile.
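
The same launch can be scripted. The sketch below checks that the chosen port is free and then builds the command with the two flags described above. The helper names (is_port_free, build_chrome_command, launch_chrome) are invented for this example, and the paths are the ones used in this article.

```python
import socket
import subprocess


def is_port_free(port, host="127.0.0.1"):
    """Return True if we can bind to host:port, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def build_chrome_command(chrome_path, port, user_data_dir):
    """Build the argument list for starting Chrome with remote debugging."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",
    ]


def launch_chrome(chrome_path, port, user_data_dir):
    """Start Chrome in the background; refuse if the port is already taken."""
    if not is_port_free(port):
        raise RuntimeError(f"port {port} is already in use")
    return subprocess.Popen(build_chrome_command(chrome_path, port, user_data_dir))


# Only print the command here; call launch_chrome(...) to actually start Chrome.
print(build_chrome_command(
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    1234,
    r"C:\selenum\user_data",
))
```

Passing the arguments as a list means the paths do not need extra quoting, even when they contain spaces.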

1-4 Download ChromeDriver

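Download the ChromeDriver that matches your Chrome browser version and move it to a directory of your choice.

Download link:

http://chromedriver.storage.googleapis.com/index.html

Note that this index only lists drivers up to Chrome 114; drivers for newer Chrome versions are published on the Chrome for Testing availability page.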
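
To double-check that the two versions line up, you can compare their major version numbers. The sketch below assumes that matching the major version is enough, and the sample version strings are purely illustrative (in practice, read them from chrome://version and from running chromedriver --version). The helper names are hypothetical.

```python
import re


def major_version(version_text):
    """Extract the major version number from text such as '114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_version, driver_version):
    """Return True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Illustrative strings only; substitute the values from your own machine.
print(versions_match("Google Chrome 114.0.5735.90", "ChromeDriver 114.0.5735.90"))
```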

1-5 Operate the opened browser

Assuming the browser opened above is showing the Baidu homepage, we now write a simple program that continues to operate that browser.

Note that you need to use debuggerAddress to specify the address and port number of the running browser.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_options = Options()

# Address and port of the browser that is already open
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:1234")

# Note: the Chrome version and the ChromeDriver version must match
# Download link: http://chromedriver.storage.googleapis.com/index.html
s = Service(r"C:\Users\xingag\Desktop\111\chromedriver.exe")

driver = webdriver.Chrome(service=s, options=chrome_options)

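# Operate the browser. find_element raises NoSuchElementException when
# nothing matches, so no truthiness check on the result is needed.
input_element = driver.find_element(By.ID, 'kw')

# Type the search keyword
input_element.send_keys("AirPython")

submit_element = driver.find_element(By.ID, 'su')

# Click the search button
submit_element.click()

# Release resources (uncomment to close the current browser window)
# driver.close()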
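
If the script fails to attach, it is worth confirming that the browser from step 1-3 is really listening on the debugging port. A browser started with --remote-debugging-port serves a small JSON document at /json/version, so a check could look roughly like this. It is a sketch using only the standard library; debugger_info is a made-up helper name, and the default address is the one used in this article.

```python
import http.client
import json
import urllib.error
import urllib.request


def debugger_info(address="127.0.0.1:1234", timeout=2.0):
    """Return the browser's /json/version data as a dict, or None if unreachable."""
    url = f"http://{address}/json/version"
    # Bypass any system proxy settings: the endpoint is on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        return None


info = debugger_info()
if info is None:
    print("No browser is listening on 127.0.0.1:1234")
else:
    print("Found browser:", info.get("Browser"))
```

When this prints the "No browser" message, start Chrome with the command from step 1-3 first, then run the Selenium script again.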


Origin blog.csdn.net/qq_48811377/article/details/132760460