Advanced Python crawling: using a proxy with Selenium on Windows and Linux

Table of contents

Windows Selenium configuration

Download links

Chrome / ChromeDriver version correspondence

Hands-on test

Element operations

Browser operations

Get element information

Mouse operations

Practical demo

Adding a proxy to Selenium

Linux Selenium configuration

Check the server environment

Download and install third-party libraries (the simplest version)

Hands-on test

Code test

Check the screenshot PNG created in the working directory

Getting Selenium to run in headed mode on Linux

Introduction to Xvfb

Practical test


Windows Selenium configuration

Download links (click through to each site)

Selenium
ChromeDriver
Chrome
GeckoDriver
Firefox

Chrome / ChromeDriver version correspondence

The ChromeDriver project maintains multiple versions of ChromeDriver. Which one you need depends on the version of Chrome you are using.

  1. Specifically, ChromeDriver uses the same version number scheme as Chrome. See https://www.chromium.org/developers/version-numbers for more details.
  2. Each version of ChromeDriver supports Chrome with the same major, minor, and build numbers. For example, ChromeDriver 73.0.3683.20 supports all Chrome versions starting with 73.0.3683.
  3. Before a new major version of Chrome goes into beta, a matching version of ChromeDriver is released.
  4. After the initial release of a new major version, patches are released as needed. These patches may or may not coincide with Chrome browser updates.

Here are the steps to choose which version of ChromeDriver to download:

  1. First, find out which version of the Chrome browser you're using. Let's say your Chrome is 72.0.3626.81.
  2. Take the Chrome version number, drop the last part, and append the result to the URL "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_". For example, using Chrome version 72.0.3626.81, you will get a URL "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_72.0.3626".
  3. Use the URL created in the last step to retrieve a small file containing the version of ChromeDriver to use. For example, the above URL will result in a file containing "72.0.3626.69". (Of course, actual numbers may change in the future).
  4. Use the version number obtained from the previous step to construct the URL to download ChromeDriver. For version 72.0.3626.69, the URL will be "https://chromedriver.storage.googleapis.com/index.html?path=72.0.3626.69/".
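  5. After the initial download, repeat the process above occasionally to check for bugfix releases.

Note that these LATEST_RELEASE_ files only cover Chrome 114 and earlier. From Chrome 115 on, ChromeDriver is published through the Chrome for Testing availability dashboard, and Selenium 4.6 and later can fetch a matching driver automatically with Selenium Manager.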
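The lookup steps above can be scripted with the standard library. A minimal sketch: the function names are hypothetical, and it assumes the `chromedriver.storage.googleapis.com` endpoints from the steps still serve your Chrome version.

```python
import urllib.request

BASE = "https://chromedriver.storage.googleapis.com"


def latest_release_url(chrome_version: str) -> str:
    """Step 2: drop the last component of the Chrome version and append the rest."""
    major_minor_build = ".".join(chrome_version.split(".")[:3])
    return f"{BASE}/LATEST_RELEASE_{major_minor_build}"


def download_page_url(driver_version: str) -> str:
    """Step 4: build the download index URL for a given ChromeDriver version."""
    return f"{BASE}/index.html?path={driver_version}/"


def resolve_driver_version(chrome_version: str, timeout: float = 10.0) -> str:
    """Step 3: fetch the small text file that names the matching ChromeDriver."""
    with urllib.request.urlopen(latest_release_url(chrome_version), timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()
```

For example, `resolve_driver_version("72.0.3626.81")` fetches the file from step 3, and passing its result to `download_page_url()` gives the page from step 4.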

Hands-on test

Element operations

1. .send_keys()  # type text into the element
2. .click()  # click the element
3. .clear()  # clear the element's text
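The three element methods are usually combined: clear a field, type into it, then click a button. The sketch below assumes Baidu's homepage uses the IDs `kw` (search box) and `su` (search button); `search` is a hypothetical helper, and the string `"id"` is simply the value of `By.ID`, so no extra import is needed.

```python
def search(driver, keyword: str) -> None:
    """Clear the search box, type a keyword, and click the search button.

    Assumes Baidu's element IDs "kw" and "su"; adjust the locators for other sites.
    """
    box = driver.find_element("id", "kw")  # "id" is the value of By.ID
    box.clear()                             # .clear(): remove any existing text
    box.send_keys(keyword)                  # .send_keys(): type into the input
    driver.find_element("id", "su").click()  # .click(): submit the search
```

After `driver.get("https://www.baidu.com/")`, call `search(driver, "selenium")`.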

Browser operations

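1. driver.maximize_window()  # maximize the browser window
2. driver.set_window_size(w, h)  # set the window size in pixels [for reference]
3. driver.set_window_position(x, y)  # set the window position [for reference]
4. driver.back()  # go back
5. driver.forward()  # go forward
6. driver.refresh()  # reload the page
7. driver.close()  # close the current window (by default the main window, i.e. the one opened at startup)
8. driver.quit()  # close every window the driver opened and end the session
9. driver.title  # title of the current page
10. driver.current_url  # URL of the current page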
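A short sketch that exercises the navigation methods in order. `visit_and_report` is a hypothetical helper that should work with any `webdriver.Chrome` instance; the two URLs you pass are up to you.

```python
def visit_and_report(driver, first_url: str, second_url: str) -> dict:
    """Visit two pages, step back and forward, and report where the browser ended up."""
    driver.maximize_window()
    driver.get(first_url)
    driver.get(second_url)
    driver.back()      # now on first_url
    after_back = driver.current_url
    driver.forward()   # back on second_url
    driver.refresh()   # reload second_url
    return {
        "after_back": after_back,
        "final_url": driver.current_url,
        "title": driver.title,
    }
```

Call it inside `try: ... finally: driver.quit()` so the browser is closed even if a page fails to load. Real browsers may normalize URLs (for example by adding a trailing slash), so compare `current_url` values with that in mind.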

Get element information

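1. text: the element's text, e.g. element.text
2. size: the element's size, e.g. element.size
3. get_attribute: an attribute value, e.g. element.get_attribute("id"); the argument is the attribute name
4. is_displayed: whether the element is visible, e.g. element.is_displayed()
5. is_enabled: whether the element is enabled, e.g. element.is_enabled()
6. is_selected: whether the element is selected, e.g. element.is_selected()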
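The element properties can be gathered into one dict for logging. In this sketch `describe` is a hypothetical helper; note that `text` and `size` are properties while the `is_*` checks are methods, and `is_selected()` is mainly meaningful for checkboxes, radio buttons and `<option>` elements.

```python
def describe(element) -> dict:
    """Collect the element information listed above into a single dict."""
    return {
        "text": element.text,               # visible text
        "size": element.size,               # dict with "height" and "width"
        "id": element.get_attribute("id"),  # any attribute, looked up by name
        "displayed": element.is_displayed(),
        "enabled": element.is_enabled(),
        "selected": element.is_selected(),
    }
```

For example, `describe(driver.find_element("id", "su"))` on Baidu's homepage, assuming that ID exists on the page you load.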

Mouse operations

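These are ActionChains methods, used as ActionChains(driver).<method>(...):
1. context_click(element)  # right-click
2. double_click(element)  # double-click
3. drag_and_drop(source, target)  # drag and drop
4. move_to_element(element)  # hover [important]
5. perform()  # execute the queued actions above [important]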
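Mouse actions are queued on an `ActionChains` object and only run when `perform()` is called. The sketch below hovers over a menu and then clicks an item in it. `hover_then_click` is hypothetical, and the `action_chains_cls` parameter exists only so the chaining can be checked without a browser; by default it imports Selenium's `ActionChains`.

```python
def hover_then_click(driver, menu, item, action_chains_cls=None) -> None:
    """Hover over `menu` so its dropdown opens, then click `item`.

    action_chains_cls defaults to Selenium's ActionChains; it is a parameter
    only so the chaining can be exercised without launching a browser.
    """
    if action_chains_cls is None:
        from selenium.webdriver.common.action_chains import ActionChains
        action_chains_cls = ActionChains
    # Each call only queues an action; nothing happens in the browser until perform()
    action_chains_cls(driver).move_to_element(menu).click(item).perform()
```

`ActionChains(driver).drag_and_drop(source, target).perform()` follows the same queue-then-perform pattern.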

Practical demo

# demo
from selenium import webdriver
from selenium.webdriver.common.by import By
import time


options = webdriver.ChromeOptions()

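# --no-sandbox is needed when Chrome runs as root (common on servers);
# --disable-dev-shm-usage avoids crashes when /dev/shm is small (e.g. in Docker)
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument('--disable-dev-shm-usage')
# options.add_argument('--proxy-server=http://{0}'.format(ip))  # uncomment to route traffic through proxy `ip`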
driver = webdriver.Chrome(options=options)

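# navigator.webdriver is false when a normal user visits a page, but true under Selenium.
# The CDP call below redefines it to return undefined on every new document, hiding that flag.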
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
               Object.defineProperty(navigator, 'webdriver', {
                 get: () => undefined
               })
               """
})
driver.get("https://www.baidu.com/")
time.sleep(5)
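# Take a screenshot to check that Baidu was actually loaded
driver.save_screenshot("baidu.png")
driver.quit()  # close the browser and end the ChromeDriver session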
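To confirm that the CDP script took effect, you can read `navigator.webdriver` back from the page after `driver.get(...)`. A minimal check, assuming that a JavaScript `undefined` result comes back from `execute_script` as Python `None`; `webdriver_flag_hidden` is a hypothetical name. This only hides one automation signal, so passing it does not mean a site cannot detect Selenium.

```python
def webdriver_flag_hidden(driver) -> bool:
    """Return True if navigator.webdriver reads as undefined on the current page.

    JavaScript `undefined` is returned by execute_script as Python None.
    """
    return driver.execute_script("return navigator.webdriver") is None
```

Call it right after `driver.get("https://www.baidu.com/")` and before `driver.quit()`.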

Adding a proxy to Selenium

Whatever kind of crawler you write, you will need proxies. Even with browser automation, a single IP address cannot visit a site thousands or tens of thousands of times a day without being blocked.

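# Add a proxy that needs no authentication, passed as a command-line argument
from selenium import webdriver

chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument('--proxy-server=http://ip:port')  # replace ip:port with your proxy's address
driver = webdriver.Chrome(options=chromeOptions)  # chrome_options= is deprecated; use options=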

Get the ip and port yourself, typically straight from your proxy provider's API, and plug them in.
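When proxies come from a provider's API, it helps to build the `--proxy-server` argument in one place and rotate through the list. A sketch under these assumptions: the API gives you `ip:port` strings, the proxies need no authentication (as far as I know Chrome ignores credentials in this flag), and the helper names are hypothetical.

```python
import itertools
from typing import Iterable, Iterator, List, Optional


def proxy_argument(ip: str, port: int, scheme: str = "http") -> str:
    """Build Chrome's --proxy-server switch for an unauthenticated proxy."""
    if not 0 < int(port) < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"--proxy-server={scheme}://{ip}:{int(port)}"


def chrome_args(proxy: Optional[str] = None, headless: bool = False) -> List[str]:
    """Collect the flags used in this article, plus an optional proxy switch."""
    args = ["--no-sandbox", "--disable-gpu", "--disable-dev-shm-usage"]
    if headless:
        args.append("--headless")
    if proxy:
        args.append(proxy)
    return args


def rotate(proxies: Iterable[str]) -> Iterator[str]:
    """Cycle endlessly through "ip:port" strings, yielding ready-made switches."""
    for entry in itertools.cycle(proxies):
        ip, port = entry.rsplit(":", 1)
        yield proxy_argument(ip, int(port))
```

Then run `for arg in chrome_args(proxy=next(pool)): options.add_argument(arg)` before creating `webdriver.Chrome(options=options)`. Command-line flags are read only when the browser launches, so the simple way to switch IPs is to start a new driver per proxy.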

Linux Selenium configuration

Check the server environment

[root@aa /]# lsb_release -a
Distributor ID: CentOS
Release:        7.9.2009

[root@aa /]# python -V
Python 2.7.5

[root@aa /]# python3 -V
Python 3.6.8
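The same checks can be done from Python, which also shows whether Chrome and ChromeDriver are already on the `PATH`. A small sketch; `environment_report` is hypothetical and `google-chrome` is assumed to be the binary name installed by the Chrome RPM. On this server, run it with `python3`, since `python` is 2.7.

```python
import platform
import shutil


def environment_report() -> dict:
    """Gather OS and Python versions plus the locations of the browser binaries."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "google-chrome": shutil.which("google-chrome"),  # None if not installed
        "chromedriver": shutil.which("chromedriver"),    # None if not on PATH
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{:15} {}".format(key, value))
```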

Download and install third-party libraries (the simplest version)

# install selenium

pip3 install selenium

# install Google Chrome, then the OSMesa library and fonts (wqy-zenhei renders Chinese text)

yum install https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
yum install mesa-libOSMesa-devel gnu-free-sans-fonts wqy-zenhei-fonts

# Download the ChromeDriver build matching your Chrome version (the URL below, for 110.0.5481.30, is correct) and unzip it:
# https://chromedriver.storage.googleapis.com/index.html?path=110.0.5481.30/

# Move it into /usr/bin so it is on the PATH

mv chromedriver /usr/bin/

# Give execute permission

chmod +x /usr/bin/chromedriver
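Before running anything, it is worth checking that the two installs agree. Following the rule quoted earlier (ChromeDriver supports Chrome with the same major, minor and build numbers), this sketch compares the first three version components printed by `google-chrome --version` and `chromedriver --version`. The helper names are hypothetical; `check_output(..., universal_newlines=True)` is used instead of `capture_output` so it still runs on the server's Python 3.6.

```python
import re
import subprocess
from typing import Optional


def build_prefix(version_output: str) -> Optional[str]:
    """Return 'major.minor.build' from output like 'Google Chrome 110.0.5481.77'."""
    match = re.search(r"(\d+\.\d+\.\d+)\.\d+", version_output)
    return match.group(1) if match else None


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and ChromeDriver share major, minor and build numbers."""
    chrome, driver = build_prefix(chrome_output), build_prefix(driver_output)
    return chrome is not None and chrome == driver


def check_installed() -> bool:
    """Run both binaries with --version and compare (both must be on PATH)."""
    chrome = subprocess.check_output(["google-chrome", "--version"], universal_newlines=True)
    driver = subprocess.check_output(["chromedriver", "--version"], universal_newlines=True)
    return versions_match(chrome, driver)
```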

Hands-on test

Code test

# demo
from selenium import webdriver
from selenium.webdriver.common.by import By
import time


options = webdriver.ChromeOptions()
# The server has no display, so Chrome must run headless or it fails to start;
# the Xvfb section below removes the need for this
options.add_argument("headless")

options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument('--disable-dev-shm-usage')
# options.add_argument('--proxy-server=http://{0}'.format(ip))
driver = webdriver.Chrome(options=options)
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
               Object.defineProperty(navigator, 'webdriver', {
                 get: () => undefined
               })
               """
})
driver.get("https://www.baidu.com/")
time.sleep(5)
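# Take a screenshot to check that Baidu was actually loaded
driver.save_screenshot("aaaaaaaaaaaaaaaaaa.png")
driver.quit()  # close the browser and end the ChromeDriver session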

Check the screenshot PNG created in the working directory
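Instead of opening the image by hand, a quick script can confirm that the screenshot exists and really is a PNG by checking the 8-byte PNG file signature. `screenshot_ok` is a hypothetical helper; Selenium's `save_screenshot()` itself also returns `False` if it could not write the file.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def screenshot_ok(path: str) -> bool:
    """True if the file exists and starts with the PNG signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(8) == PNG_SIGNATURE
```

For example, `screenshot_ok("aaaaaaaaaaaaaaaaaa.png")` after running the test script.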

Getting Selenium to run in headed mode on Linux

Introduction to Xvfb

Xvfb (X virtual framebuffer) implements the X11 display server protocol on machines with no display hardware. It provides the same interfaces as a real graphical environment, but renders into memory instead of onto a screen.

So when a program performs GUI operations under Xvfb, they run in virtual memory and nothing is visible.

With Xvfb, Selenium or Puppeteer can be made to believe it is running on a system with a graphical interface, so headed mode works normally.
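Since headed Chrome needs an X display, one way to keep a single script working both with and without Xvfb is to look at the `DISPLAY` environment variable, which `xvfb-run` sets for the program it launches. A minimal sketch; `needs_headless` is hypothetical.

```python
import os
from typing import Mapping, Optional


def needs_headless(env: Optional[Mapping[str, str]] = None) -> bool:
    """Headed Chrome needs an X display; without DISPLAY, fall back to headless."""
    env = os.environ if env is None else env
    return not env.get("DISPLAY")
```

`if needs_headless(): options.add_argument("headless")` then enables headless mode only when no display is available, so the same script runs headed under `xvfb-run`.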

# Install

yum install Xvfb

Practical test

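# Modify the demo

# On a server without a display this line was required, otherwise Chrome failed to start;
# comment it out so Chrome runs in normal headed mode under Xvfb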
# options.add_argument("headless")


xvfb-run <command>
# for example
xvfb-run python3 selenium_test.py

Run it and check the screenshot: success.
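If you launch the test from another Python script (for example a scheduler), the command line can be built there. This sketch assumes `xvfb-run` from the Xvfb install above; `-a` picks a free display number and `-s` passes arguments to the Xvfb server, here an assumed 1920x1080 virtual screen at 24-bit depth so screenshots are not cramped. The helper names are hypothetical.

```python
import subprocess
import sys
from typing import List


def xvfb_command(script: str, screen: str = "1920x1080x24") -> List[str]:
    """Build an xvfb-run command line that runs a Python script on a virtual display.

    -a picks a free display number; -s passes server args to Xvfb
    (a virtual screen of WIDTHxHEIGHTxDEPTH).
    """
    return ["xvfb-run", "-a", "-s", f"-screen 0 {screen}", sys.executable, script]


def run_under_xvfb(script: str) -> int:
    """Run the script under Xvfb and return its exit code."""
    return subprocess.call(xvfb_command(script))
```

For example, `run_under_xvfb("selenium_test.py")` is roughly equivalent to the `xvfb-run python3 selenium_test.py` command above, using the current interpreter instead of `python3`.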

----------

2023.2.20


Source: blog.csdn.net/qq_52213943/article/details/129048320