Python crawler tutorial 30: 8 ways to locate web page elements with Selenium

Selenium drives a real browser and simulates user operations such as clicks. Before you can operate on an element, you first have to locate it. People have characteristics (attributes) such as an ID number, a name, or a home address, and we can find a person through them. In the same way, an element has attributes, and we can find the element through them.

1. What are elements?
Element: consists of an opening tag, a closing tag, and the text content between them.
Element information: the element's tag name and its attributes.
Element hierarchy: the structure formed by nesting elements inside one another.
Locating an element always comes down to using either the element's information or its position in the hierarchy.
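
To make the three parts of an element concrete, here is a minimal sketch that uses Python's standard html.parser module (not Selenium) to split one element into its tag name, attributes, and text. The class name ElementInfo and its attribute names are made up for this illustration; the HTML snippet is the Baidu news link used later in this article.

```python
from html.parser import HTMLParser


class ElementInfo(HTMLParser):
    """Record the tag name, attributes, and text of a single element."""

    def __init__(self):
        super().__init__()
        self.tag_name = None      # e.g. "a"
        self.attributes = {}      # e.g. {"href": "...", "class": "..."}
        self.inner_text = ""      # text between the opening and closing tag

    def handle_starttag(self, tag, attrs):
        # Called for the opening tag; attrs is a list of (name, value) pairs.
        self.tag_name = tag
        self.attributes = dict(attrs)

    def handle_data(self, data):
        # Called for the text content between the tags.
        self.inner_text += data


snippet = ('<a href="http://news.baidu.com" target="_blank" '
           'class="mnav c-font-normal c-color-t">新闻</a>')

parser = ElementInfo()
parser.feed(snippet)
parser.close()

print("tag name:  ", parser.tag_name)
print("attributes:", parser.attributes)
print("text:      ", parser.inner_text)
```

Every locator strategy described below works on one of these pieces: the tag name, an attribute value, the text, or the element's place in the hierarchy.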

2. View element information: In the browser, select the element, right-click and choose "Inspect", and the element information appears in the Elements panel. Taking the search box on the Baidu homepage as an example, clicking "Inspect" jumps straight to the search box's element in the developer tools.
3. Element positioning methods. Selenium provides a series of methods for locating elements. The following 8 are commonly used: id, name, class name, tag name, link text, partial link text, XPath, and CSS selector.
The find_element_by_* methods were deprecated in Selenium 4.0 and removed in 4.3, so calling them in a current release raises an AttributeError. Use find_element() together with a By locator instead. The code in this article was run with version 4.13.

import selenium

print(selenium.__version__)
# 4.13.0
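
When porting older scripts, the mapping from the legacy method names to By locators is mechanical. The sketch below only builds the replacement call as a string, so it runs without a browser; the names LEGACY_TO_BY and modern_call are invented for this illustration and are not part of Selenium.

```python
# Each legacy method name and the By constant that replaces it.
LEGACY_TO_BY = {
    "find_element_by_id": "By.ID",
    "find_element_by_name": "By.NAME",
    "find_element_by_class_name": "By.CLASS_NAME",
    "find_element_by_tag_name": "By.TAG_NAME",
    "find_element_by_link_text": "By.LINK_TEXT",
    "find_element_by_partial_link_text": "By.PARTIAL_LINK_TEXT",
    "find_element_by_xpath": "By.XPATH",
    "find_element_by_css_selector": "By.CSS_SELECTOR",
}


def modern_call(legacy_method: str, value: str) -> str:
    """Return the find_element() call that replaces a legacy call."""
    by = LEGACY_TO_BY[legacy_method]
    return f"driver.find_element({by}, {value!r})"


print(modern_call("find_element_by_id", "kw"))
# driver.find_element(By.ID, 'kw')
```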

(1) Positioning by id: In the Baidu homepage HTML, the search box has id="kw", so it can be located by id. The following code opens Baidu and types 李白 (Li Bai) into the input box.

# @Author: 小红牛
# WeChat official account: WdPython
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.baidu.com')

# Find the input box and type the keyword 李白
element = driver.find_element(By.ID, 'kw')
element.send_keys('李白')
time.sleep(5)
# Close the browser
driver.quit()

(2) Positioning by name: In HTML, the name attribute serves much the same identifying purpose as id, except that it is not required to be unique on the page. If an element has no id attribute, consider locating it by its name attribute instead.
HTML structure of the Baidu search box:

<input type="text" class="s_ipt" name="wd" id="kw" maxlength="100" autocomplete="off">

Element positioning:

element = driver.find_element(By.NAME, 'wd')
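
The "try id first, then fall back to name" advice can be written as a small helper. This is a sketch rather than a Selenium API: find_first is a made-up name, and it relies only on driver.find_elements(by, value), which returns an empty list when nothing matches instead of raising an exception.

```python
def find_first(driver, locators):
    """Return the first element matched by any (by, value) pair, or None.

    The pairs are tried in order, so put the most specific locator first.
    """
    for by, value in locators:
        matches = driver.find_elements(by, value)
        if matches:
            return matches[0]
    return None


# Intended use with a real driver (By is selenium.webdriver.common.by.By):
#   box = find_first(driver, [(By.ID, 'kw'), (By.NAME, 'wd')])
#   if box is not None:
#       box.send_keys('李白')
```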

(3) Positioning by class name: Elements can also be located by their class attribute. Parallel elements such as the items of a list usually share the same class, so a class name often matches more than one element; find_element() returns the first match.
HTML structure of the Baidu search box:

<input type="text" class="s_ipt" name="wd" id="kw" maxlength="100" autocomplete="off">

Element positioning:

element = driver.find_element(By.CLASS_NAME, 's_ipt')
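
One caveat: as far as I know, By.CLASS_NAME expects a single class name, so a class attribute holding several space-separated names (such as class="mnav c-font-normal c-color-t" on the Baidu news link) will not match if passed as is. A common workaround is to turn the attribute value into a CSS selector and use By.CSS_SELECTOR. The helper below is a hypothetical sketch of that conversion and assumes the class names contain no characters that need CSS escaping.

```python
def classes_to_css(class_attr: str, tag: str = "") -> str:
    """Build a CSS selector that requires every class in class_attr."""
    names = class_attr.split()
    if not names:
        raise ValueError("class attribute is empty")
    return tag + "".join("." + name for name in names)


print(classes_to_css("s_ipt", tag="input"))
# input.s_ipt
print(classes_to_css("mnav c-font-normal c-color-t"))
# .mnav.c-font-normal.c-color-t

# Intended use with a real driver:
#   driver.find_element(By.CSS_SELECTOR, classes_to_css("mnav c-font-normal c-color-t"))
```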

(4) Positioning by tag name: HTML uses tags to define a type of function, such as input, table, and tbody, and every element is an instance of a tag. Because a page usually contains many elements with the same tag, tag names alone rarely distinguish one element from another.
HTML structure of the Bilibili search box:

<input class="nav-search-input" type="text" autocomplete="off" accesskey="s" maxlength="100" x-webkit-speech="" x-webkit-grammar="builtin:translate" value="" placeholder="爬虫" title="爬虫">

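Element positioning: Open Bilibili and automatically type 周杰伦 (Jay Chou) into the search box. find_element() returns the first input element on the page, which this example relies on being the search box.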

# @Author: 小红牛
# WeChat official account: WdPython
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.bilibili.com/')
element = driver.find_element(By.TAG_NAME, 'input')
element.send_keys('周杰伦')
time.sleep(5)
# Close the browser
driver.quit()
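
Because the first input on a page is not guaranteed to be the one you want, a more defensive variant collects every input and filters the list. The function below is a sketch under assumptions: visible_text_inputs is a made-up name, and the string "tag name" is the value of By.TAG_NAME, written out so that the snippet has no imports.

```python
def visible_text_inputs(driver):
    """Return the displayed <input> elements whose type is text or search."""
    candidates = driver.find_elements("tag name", "input")  # same as By.TAG_NAME
    return [
        el for el in candidates
        if el.is_displayed()
        and (el.get_attribute("type") or "text") in ("text", "search")
    ]


# Intended use with a real driver:
#   boxes = visible_text_inputs(driver)
#   if boxes:
#       boxes[0].send_keys('周杰伦')
```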

(5) Positioning by link text: locate a hyperlink by its visible text. For example, open the Baidu homepage and simulate a click on the 新闻 (News) link, which opens a new page.
HTML structure of the 新闻 (News) link on the Baidu homepage:

<a href="http://news.baidu.com" target="_blank" class="mnav c-font-normal c-color-t">新闻</a>

Element positioning:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.baidu.com')
# On the Baidu homepage, click the 新闻 (News) link
element = driver.find_element(By.LINK_TEXT, '新闻')
element.click()
time.sleep(5)
# Close the browser
driver.quit()

(6) Positioning by partial link text: Sometimes a hyperlink's text is very long, and typing all of it is tedious and clutters the code. In that case we can pass only part of the string and match on a substring.

HTML structure of the 新闻 (News) link on the Baidu homepage:

<a href="http://news.baidu.com" target="_blank" class="mnav c-font-normal c-color-t">新闻</a>

Element positioning: Click the 新闻 (News) link again, this time locating it with just the character 闻.

element = driver.find_element(By.PARTIAL_LINK_TEXT, '闻')
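
The difference between the two link strategies can be modeled in plain Python. This is a simplified sketch, not Selenium's implementation: the list of link texts is hypothetical, and the two functions only mimic "whole text equals the query" versus "query is a substring".

```python
# Hypothetical link texts, chosen only to illustrate the two matching rules.
link_texts = ["新闻", "地图", "热点新闻", "视频"]


def by_link_text(texts, query):
    """Simplified model of By.LINK_TEXT: the whole text must equal the query."""
    return [t for t in texts if t == query]


def by_partial_link_text(texts, query):
    """Simplified model of By.PARTIAL_LINK_TEXT: the query must be a substring."""
    return [t for t in texts if query in t]


print(by_link_text(link_texts, "新闻"))         # ['新闻']
print(by_partial_link_text(link_texts, "闻"))   # ['新闻', '热点新闻']
```

A short fragment such as 闻 can match several links, and find_element() returns only the first match in document order, so pick a fragment that is distinctive enough for the page.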

(7) XPath positioning: XPath is a language for finding information in XML and HTML documents. An XPath expression can be an absolute path or a relative path. The following uses the Baidu search box as an example. You do not need to write the XPath syntax yourself: in the developer tools, right-click the element and choose "Copy XPath" to get a relative path, or "Copy full XPath" to get an absolute path.
Absolute path: the element is found by walking the page's HTML structure layer by layer from the root node. An absolute path starts with a single forward slash (/), and each step is separated by a slash.

element = driver.find_element(By.XPATH, '/html/body/div[2]/div[2]/div[5]/div[1]/div/form/span[1]/input')
element.send_keys('李白')

The XPath expression above starts from the root node (the html node) of the HTML DOM tree, searches layer by layer, and finally reaches the input node. The path is unambiguous but fragile: even a slight change to the page structure can invalidate a previously working expression.

Relative path: starts with a double slash (//) and selects matching nodes anywhere in the document, regardless of their position. Relative XPath expressions are more concise and are the recommended form.

element = driver.find_element(By.XPATH, '//*[@id="kw"]')
element.send_keys('李白')
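
The fragility of absolute paths can be demonstrated without a browser. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath and evaluates paths relative to the root element, so "./body/..." stands in for "/html/body/...". Both page snippets are invented for this illustration and are much simpler than the real Baidu page.

```python
import xml.etree.ElementTree as ET

# A toy page, and the same page after a redesign adds one wrapper <div>.
PAGE_V1 = """
<html><body>
  <div><form><span><input id="kw" name="wd"/></span></form></div>
</body></html>
"""
PAGE_V2 = """
<html><body>
  <div><div><form><span><input id="kw" name="wd"/></span></form></div></div>
</body></html>
"""

ABSOLUTE = "./body/div/form/span/input"  # step by step from the root <html>
RELATIVE = ".//*[@id='kw']"              # any descendant with id="kw"


def locate(page, path):
    """Return the first element matching path, or None if nothing matches."""
    root = ET.fromstring(page.strip())
    return root.find(path)


for label, page in (("before redesign", PAGE_V1), ("after redesign", PAGE_V2)):
    absolute_hit = locate(page, ABSOLUTE) is not None
    relative_hit = locate(page, RELATIVE) is not None
    print(f"{label}: absolute found={absolute_hit}, relative found={relative_hit}")
```

The absolute path stops matching as soon as one extra wrapper appears, while the relative expression keyed on the id attribute keeps working.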

(8) CSS selector positioning: CSS selectors can also be used to locate page elements, through id selectors, class selectors, tag selectors, and attribute selectors.

HTML structure of the Baidu search box:

<input type="text" class="s_ipt" name="wd" id="kw" maxlength="100" autocomplete="off">

Element positioning:

# @Author: 小红牛
# WeChat official account: WdPython
# 1. id selector, written with #
element = driver.find_element(By.CSS_SELECTOR, '#kw')
# 2. class selector, written with .
# element = driver.find_element(By.CSS_SELECTOR, '.s_ipt')
# 3. Attribute selector, format: [attr="value"] or tag[attr="value"]
# element = driver.find_element(By.CSS_SELECTOR, 'input[id="kw"]')
# element = driver.find_element(By.CSS_SELECTOR, '[autocomplete="off"]')
# 4. Combined selectors
# element = driver.find_element(By.CSS_SELECTOR, 'input.s_ipt')
# element = driver.find_element(By.CSS_SELECTOR, 'input#kw')
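
Since CSS selectors can express id, class, name, and tag lookups, the four simple strategies from earlier sections each have a CSS equivalent. The helper below is a hypothetical sketch of that translation (to_css is not a Selenium function); it assumes the value is a plain identifier with no quotes or other characters that would need escaping.

```python
def to_css(strategy: str, value: str) -> str:
    """Translate a simple locator into an equivalent CSS selector string."""
    if strategy == "id":
        return f"#{value}"
    if strategy == "class name":
        return f".{value}"
    if strategy == "name":
        return f'[name="{value}"]'
    if strategy == "tag name":
        return value
    raise ValueError(f"no simple CSS equivalent for strategy {strategy!r}")


examples = [("id", "kw"), ("class name", "s_ipt"), ("name", "wd"), ("tag name", "input")]
for strategy, value in examples:
    print(f"{strategy:<10} {value:<6} -> {to_css(strategy, value)}")
```

Link text and partial link text have no CSS counterpart, because CSS selectors cannot match on an element's text.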

Done! Thanks for reading.

Origin blog.csdn.net/gxz888/article/details/135288073