Python (Crawler Era) - Crawler Development 03 (Selenium)

Selenium basic operations

Selenium

Selenium is a web automation testing tool, originally developed for automated website testing. It drives a real browser directly and supports all mainstream browsers (including headless browsers such as PhantomJS). It can receive instructions that make the browser load a page automatically, extract the required data, and even take screenshots of the page.

Selenium

Install
- pip install selenium
Load a web page
- from selenium import webdriver
- from selenium.webdriver.common.by import By
- driver = webdriver.Chrome()
- driver.get("http://www.baidu.com/")
- driver.save_screenshot("aa.png")
Locate and interact
- driver.find_element(By.ID, "kw").send_keys("Xiaobai")
- driver.find_element(By.ID, "su").click()
Inspect the page
- driver.page_source
- driver.get_cookies()
- driver.current_url
Quit
- driver.close()  # close the current page
- driver.quit()  # exit the browser

Example

from selenium import webdriver

# Create a Chrome browser object
wb = webdriver.Chrome()
# Visit the Baidu home page
wb.get("http://www.baidu.com")

Running this opens the Chrome browser automatically and loads the Baidu home page.

If execution fails with an error (typically a message that the chromedriver executable needs to be in PATH), Chrome's WebDriver has not been configured.

The solution is as follows:

Visit chrome://version/ to check the browser version.
Open the driver download index at http://chromedriver.storage.googleapis.com/index.html, find the driver version that matches your Chrome version, and download the build for your operating system. (This index covers ChromeDriver up to version 114; drivers for newer Chrome releases are published on the Chrome for Testing site.)
Extract the downloaded driver file and place it in the browser's installation directory on the C drive. Also put a copy in the Python installation directory.
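
Before launching the browser, it can help to confirm that the driver is actually discoverable. The following is a minimal sketch, assuming the driver executable is named chromedriver and is expected to be found through the PATH environment variable. The helper name find_driver is hypothetical and is not part of Selenium.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    # shutil.which searches the directories listed in PATH, like a shell would
    return shutil.which(name)


path = find_driver()
if path is None:
    print("chromedriver was not found on PATH; Selenium may fail to start Chrome")
else:
    print("chromedriver found at", path)
```

If the check reports that the driver is missing, copying it into a directory that is already on PATH (such as the Python installation directory mentioned above) is one way to make it visible.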

The browser can also run in headless mode, without opening a window:

from selenium import webdriver

# Run without a browser window (headless mode)
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')

# Create a Chrome browser object with these options
wb = webdriver.Chrome(options=chrome_options)
# Visit the Baidu home page
wb.get("http://www.baidu.com")

Common browser parameters

Parameters can be found at: https://peter.sh/experiments/chromium-command-line-switches/

# Start maximized
--start-maximized

# Set the disk cache directory
--disk-cache-dir="[PATH]"

# Set the cache size, in bytes
--disk-cache-size=100

# Start in incognito mode
--incognito

# Disable pop-up blocking
--disable-popup-blocking

# Disable plug-ins
--disable-plugins

# Disable images
--disable-images
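
Several of these switches are usually combined. The sketch below shows one way to collect them into a list before handing them to ChromeOptions. The function build_chrome_args and its parameter names are hypothetical helpers for illustration; only the switch strings themselves come from the list above.

```python
def build_chrome_args(headless=False, incognito=False, maximized=False,
                      cache_dir=None, cache_size=None):
    """Collect Chrome command-line switches as a list of strings."""
    args = []
    if headless:
        args.append("--headless")
    if maximized:
        args.append("--start-maximized")
    if incognito:
        args.append("--incognito")
    if cache_dir is not None:
        args.append(f"--disk-cache-dir={cache_dir}")
    if cache_size is not None:
        # Size is given in bytes
        args.append(f"--disk-cache-size={cache_size}")
    return args


args = build_chrome_args(headless=True, incognito=True, cache_size=100)
print(args)
```

Each entry can then be passed to chrome_options.add_argument() in a loop before calling webdriver.Chrome(options=chrome_options).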

Set browser proxy

chrome_options.add_argument('--proxy-server=http://{ip}:{port}')
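
Here {ip} and {port} are placeholders that must be replaced with a real proxy address. A small sketch of filling them in with an f-string follows. The helper proxy_argument is hypothetical, and 127.0.0.1:8080 is only a placeholder address.

```python
def proxy_argument(ip, port, scheme="http"):
    """Build the --proxy-server switch for a proxy listening at ip:port."""
    return f"--proxy-server={scheme}://{ip}:{port}"


# 127.0.0.1:8080 is a placeholder address used only for illustration
print(proxy_argument("127.0.0.1", 8080))
```

The resulting string can be passed to chrome_options.add_argument() in the same way as the other switches.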

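Operating on page elements with Selenium

Selenium offers two ways to find elements:

The first is to call a method named after the lookup strategy, such as find_element_by_css_selector() for a CSS selector or find_element_by_xpath() for an XPath expression. These helper methods were deprecated in Selenium 4 and removed in Selenium 4.3, so they only work on older versions.
The second is to call find_element() directly. The first argument is the lookup strategy (a constant from the By class) and the second is the value to search for. This form works on both old and new Selenium versions.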

Example

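from selenium import webdriver
from selenium.webdriver.common.by import By

# Create a Chrome browser object
wb = webdriver.Chrome()
# Visit the Baidu home page
wb.get("http://www.baidu.com")

# --- Find a single element ---
# Note: the find_element_by_* helpers below were removed in Selenium 4.3;
# on newer versions use the find_element(By.ID, ...) form shown further down.
# Find by id
element = wb.find_element_by_id("kw")
print(element.tag_name)
# Find by name
element = wb.find_element_by_name("wd")
print(element.tag_name)
# Find by XPath
element = wb.find_element_by_xpath('//*[@id="kw"]')
print(element.tag_name)

# The second way: pass the lookup strategy and the value to find_element()
element = wb.find_element(By.ID, "kw")
print(element.tag_name)
element = wb.find_element(By.NAME, "wd")
print(element.tag_name)


# --- Find multiple elements ---
print("Finding multiple elements by class attribute")
elements = wb.find_elements_by_class_name("s-isindex-wrap")
# Selenium 4.3+ equivalent: wb.find_elements(By.CLASS_NAME, "s-isindex-wrap")
for ele in elements:
    print(ele.tag_name)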

Page interaction with Selenium: simulating mouse clicks and keyboard input

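import time
from selenium import webdriver
from selenium.webdriver.common.by import By

# Create a Chrome browser object
wb = webdriver.Chrome()
# Visit the Baidu home page
wb.get("http://www.baidu.com")

# Get the Baidu search box element
element = wb.find_element(By.ID, "kw")
# Type the keyword "python" into the search box
element.send_keys("python")
# Click the "百度一下" (Baidu search) button
wb.find_element(By.XPATH, '//*[@id="su"]').click()

# Sleep for 10 seconds
time.sleep(10)

# Close the current page
wb.close()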

Browser operations

A requested page may load part of its content asynchronously through AJAX. The get() call returns once the main page has loaded and does not wait for AJAX requests to finish, so you need to wait for that content to appear before working with it.

Implicit wait

With an implicit wait, if WebDriver does not find the specified element immediately, it keeps looking until the element appears or the configured time is exceeded. If the element has still not been found by then, an element-not-found exception (NoSuchElementException) is thrown. The default waiting time is 0.
An implicit wait is not tied to one element: it applies to every element lookup for the entire lifetime of the driver, so it only needs to be set once.

from selenium import webdriver
from selenium.webdriver.common.by import By

# Create a Chrome browser object
wb = webdriver.Chrome()

# Set the implicit wait time, in seconds
wb.implicitly_wait(10)
# Visit the Baidu home page
wb.get("http://www.baidu.com")

# Enter the search keyword and submit
element = wb.find_element(By.ID, "kw")
element.send_keys("Python")
wb.find_element(By.XPATH, '//*[@id="su"]').click()

# The "百度热榜" (Baidu hot list) on the right side of the page
element2 = wb.find_element(By.XPATH, '//*[@id="con-ar"]/div[2]/div/div/table/tbody[1]/tr[1]/td[1]/a')
print(element2)

Explicit wait

An explicit wait waits for a specific condition on a specific element. It first checks whether the condition holds. If it does, the wait returns immediately; if not, it keeps checking for at most the configured waiting time. If the condition is still not met when that time runs out, a timeout exception (TimeoutException) is thrown.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Create a Chrome browser object
wb = webdriver.Chrome()
# Visit the Baidu home page
wb.get("http://www.baidu.com")

# Enter the search keyword and submit
element = wb.find_element(By.ID, "kw")
element.send_keys("Python")
wb.find_element(By.XPATH, '//*[@id="su"]').click()

# Explicitly wait up to 10 seconds for the "百度热榜" (Baidu hot list) rows on the right to appear
WebDriverWait(wb, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "toplist1-tr")))

# Get the first entry of the hot list on the right side of the page
element2 = wb.find_element(By.XPATH, '//*[@id="con-ar"]/div[2]/div/div/table/tbody[1]/tr[1]/td[1]/a')
print(element2)
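
The check-then-poll behaviour described for explicit waits can be illustrated without a browser. The following is a simplified sketch in plain Python, not Selenium's actual implementation: the function wait_until, its poll_interval parameter, and the simulated element_appeared condition are all hypothetical, and it raises the built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Condition met: return immediately without waiting any longer
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulate an element that only "appears" on the third check
checks = {"count": 0}


def element_appeared():
    checks["count"] += 1
    return "element" if checks["count"] >= 3 else None


print(wait_until(element_appeared, timeout=2, poll_interval=0.1))
```

The key point is that the wait ends as soon as the condition holds, so a generous timeout only costs time when the element never shows up.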

Browser forward and backward

import time
from selenium import webdriver

# Create a Chrome browser object
wb = webdriver.Chrome()

# Set the implicit wait time, in seconds
wb.implicitly_wait(10)
# Visit the Baidu home page
wb.get("http://www.baidu.com")

time.sleep(5)
# Visit Douban
wb.get("https://www.douban.com/")
time.sleep(5)
# Go back to the previous page
wb.back()
time.sleep(5)
# Go forward to the next page
wb.forward()

Adding cookies to the browser

from selenium import webdriver

# Create a Chrome browser object
wb = webdriver.Chrome()

# Visit the Baidu home page
wb.get("http://www.baidu.com")

# Print the current cookies
print(wb.get_cookies())

# Add a cookie
wb.add_cookie({'name': 'my_cookie', 'value': 'myCookie'})

# Read back the cookie that was just set
print(wb.get_cookie('my_cookie'))

# Delete the cookie that was just set
wb.delete_cookie('my_cookie')
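
A common follow-up is to keep cookies between runs, for example to reuse a logged-in session. The sketch below assumes a driver object that offers get_cookies() and add_cookie() as used above. The function names save_cookies and load_cookies and the JSON file format are hypothetical choices for illustration.

```python
import json


def save_cookies(driver, path):
    """Write all cookies of the current session to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(driver.get_cookies(), f)


def load_cookies(driver, path):
    """Read cookies from a JSON file and add them to the current session."""
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)
    # Return how many cookies were restored
    return len(cookies)
```

With a real browser, call wb.get() on the target site before load_cookies(wb, "cookies.json"), because a cookie can only be added while the browser is on a page of the matching domain.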

The blog "The Record of Programmers and Investment Life" has been renamed "Programmer Zhiqiu", the same name as its WeChat official account. You are welcome to follow it!
