1. Introduction to XPath syntax
XPath (XML Path Language) is a language standardized by the W3C for selecting nodes in XML and HTML documents.
All mainstream browsers (Chrome, Firefox, Edge, Safari) support XPath. XPath has several versions (1.0, 2.0 and 3.x); browsers currently support only XPath 1.0 syntax.
Why learn XPath?
- In some scenarios it is awkward to select web elements by CSS selector, id and so on, while XPath is more convenient.
- XPath is also used in other fields, such as the crawler framework Scrapy and the mobile app automation framework Appium.
- XPath is used frequently in UI automation, so it is worth learning.
In actual project work, elements are located almost exclusively with XPath and CSS selectors. Which one should you choose? Whichever you are more comfortable with, or a combination of the two. CSS syntax is more concise and usually runs slightly faster, but XPath offers more functions and, unlike CSS, can locate elements by their text.
Practice code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Xpath选择</title>
</head>
<style>
/**{margin: 0; padding: 0;}*/
body{width: 600px; height: 1000px; margin: 0 auto;}
body{line-height: 30px;}
h3{color: brown;}
h4{color: rgb(22, 118, 173)}
.select_city{color: brown; font-weight: bold;}
select option{line-height: 30px;}
</style>
<body>
<h3>select框</h3>
<div>
<h4>单选</h4>
<p>姓名:</p>
<select class="choose_1" >
<option value="小江老师">小江老师</option>
<option value="小雷老师">小雷老师</option>
<option value="小凯老师" selected="selected">小凯老师</option>
</select>
<h4>多选</h4>
<p>课程:</p>
<select class="choose_2" multiple>
<option value="小江老师">小江老师</option>
<option value="小雷老师">小雷老师</option>
<option value="小王老师">小王老师</option>
<option value="小凯老师" selected="selected">小凯老师</option>
</select>
<p class="select_city">城市选择</p>
<div id="china">
<p id="beijing" class='capital huge-city'>北京</p>
<div><p>深圳</p></div>
<p id="shanghai" class='huge-city'>上海</p>
</div>
<div id="us">
<span id="west" style="color:darkgreen">
<a id="newyork">纽约</a>
<a id="huston">休斯顿</a>
</span>
<span id="east" style="color:darkred">
<a id="chigaco">芝加哥</a>
</span>
</div>
</div>
<form action="" method="post">
<h4>登录模块</h4>
<p><span name='user'>账号</span><input type="text" name="user" id="user" value="1" placeholder="请输入账号" /></p>
<p><span>密码</span><input type="password" name="pwd" id="" placeholder="请输入密码" /></p>
<button><span>登录</span></button>
</form>
</body>
</html>
Press F12 in the browser to open the developer tools and click the Elements tab.
To check that an XPath expression selects the intended elements, press Ctrl + F in the Elements tab; a search box appears that accepts XPath expressions.
In XPath syntax, the root node of the whole HTML document is represented by "/". To select the html node directly below the root node, enter this in the search box:
/html
If you enter the following expression
/html/body/div
This expression selects the div elements that are direct children of body, which is in turn a direct child of html.
/ is similar to > in CSS: it indicates a direct child relationship.
1. Absolute path selection
An absolute path starts at the root node and writes out every level down to the target node, with the levels separated by /.
The XPath expression /html/body/div above is an absolute path; it is equivalent to the CSS expression html>body>div
To select web elements by XPath in an automation program, call the find_element or find_elements method of the WebDriver object with By.XPATH (the older find_element_by_xpath and find_elements_by_xpath methods were deprecated and then removed in Selenium 4), like this:
elements = driver.find_elements(By.XPATH, "/html/body/div")
2. Relative path selection
Sometimes we need to select an element wherever it is on the page.
For example, to select every element with the tag name div on the sample page with a CSS expression, just write div.
How does XPath do the same? Prefix the expression with //, which searches all descendants of the current node, at any depth.
So the XPath expression is: //div
// can also appear later in the expression. For example, to select all p elements anywhere inside any div element, wherever the div itself is, write //div//p
In an automation program, the elements are located like this:
elements = driver.find_elements(By.XPATH, "//div//p")
With a CSS selector, the corresponding code is:
elements = driver.find_elements(By.CSS_SELECTOR, "div p")
To select only the p elements that are direct children of a div, write the XPath as //div/p
The equivalent CSS selector is div > p:
elements = driver.find_elements(By.CSS_SELECTOR, "div > p")
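The difference between / and // can be checked without a browser. The sketch below is only an illustration and is not part of the original tutorial: it uses Python's standard-library xml.etree.ElementTree, which supports a limited subset of XPath and expects paths to start with "." relative to the element they are called on. PAGE is a trimmed, well-formed copy of the city block from the practice page.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the "china" block from the practice page.
PAGE = """
<html><body>
  <div id="china">
    <p id="beijing">北京</p>
    <div><p>深圳</p></div>
    <p id="shanghai">上海</p>
  </div>
</body></html>
"""
root = ET.fromstring(PAGE)  # root is the <html> element

# Absolute path /html/body/div, written relative to <html>.
top_divs = [e.get("id") for e in root.findall("./body/div")]

# //*[@id='china']/p : only the p elements that are direct children.
child_p = [e.text for e in root.findall(".//*[@id='china']/p")]

# //*[@id='china']//p : p elements at any depth below the same div.
descendant_p = [e.text for e in root.findall(".//*[@id='china']//p")]

print(top_divs)
print(child_p)
print(descendant_p)
```

The nested p (深圳) appears only in the // result, because it is a grandchild of the china div rather than a direct child.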
3. Wildcard
To select all direct children of every div node, use the expression //div/*
* is a wildcard that matches an element with any tag name; the expression is equivalent to the CSS selector div > *
The code is as follows:
elements = driver.find_elements(By.XPATH, "//div/*")
for element in elements:
    print(element.get_attribute('outerHTML'))
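As a browser-free check of what the wildcard matches, here is a minimal sketch using the standard-library xml.etree.ElementTree (an assumption of this example; the tutorial itself uses Selenium). PAGE is a trimmed copy of the city block, and ET.tostring stands in for the outerHTML attribute.

```python
import xml.etree.ElementTree as ET

PAGE = """
<body>
  <div id="china">
    <p id="beijing">北京</p>
    <div><p>深圳</p></div>
    <p id="shanghai">上海</p>
  </div>
</body>
"""
root = ET.fromstring(PAGE)

# //div/* : every direct child of every div, whatever its tag name.
children = root.findall(".//div/*")
tags = [child.tag for child in children]

for child in children:
    # Rough equivalent of element.get_attribute('outerHTML') in Selenium.
    print(ET.tostring(child, encoding="unicode").strip())
```

The outer div contributes three children (p, div, p) and the inner div contributes one (p), so four elements are selected in total.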
2. Select based on attributes
XPath can select elements based on their attributes.
This is done with a predicate of the form [@attribute_name='attribute_value']
Note:
- The attribute name must be preceded by @.
- The attribute value must be quoted, with either single or double quotes.
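Because the value must be quoted and XPath 1.0 has no escape character, a value that itself contains quotes needs some care when the expression is built in code. The sketch below shows one common way to handle this; it is not something the original tutorial prescribes, and xpath_literal and attr_equals are hypothetical helper names.

```python
def xpath_literal(value: str) -> str:
    """Return value as a quoted XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types are present: split on the single quotes and glue
    # the pieces back together with concat(), quoting each "'" with "...".
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def attr_equals(tag: str, attr: str, value: str) -> str:
    """Build //tag[@attr=<literal>] (hypothetical helper)."""
    return f"//{tag}[@{attr}={xpath_literal(value)}]"


print(attr_equals("*", "id", "west"))
print(attr_equals("a", "title", "it's"))
print(attr_equals("a", "title", 'it\'s "ok"'))
```

The resulting string can be passed to driver.find_elements(By.XPATH, ...) like any hand-written expression.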
1. Select based on id attribute
To select the element with id west, write //*[@id='west']
2. Select based on class attribute
To select the select elements whose class is choose_1, write //select[@class='choose_1']
If an element has multiple classes, such as
<p id="beijing" class='capital huge-city'>北京</p>
then the XPath that selects it must give the whole attribute value: //p[@class="capital huge-city"]
Writing only one of the classes, as in //p[@class="capital"], does not work, because @class is compared as a single string.
//p[@class="capital huge-city"]
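When only one of several classes is known, a widely used XPath 1.0 idiom pads the class attribute with spaces and looks for the padded token. The sketch below builds that expression and mirrors its logic in plain Python; has_class and class_matches are hypothetical helper names, and the idiom is an addition to the tutorial rather than part of it.

```python
def has_class(tag: str, token: str) -> str:
    """Build an XPath 1.0 test that token is one of the element's classes."""
    if not token or any(ch.isspace() for ch in token) or "'" in token:
        raise ValueError("token must be a single class name without quotes")
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {token} ')]"
    )


def class_matches(class_attr: str, token: str) -> bool:
    """Plain-Python mirror of the predicate built by has_class."""
    normalized = " ".join(class_attr.split())  # like normalize-space()
    return f" {token} " in f" {normalized} "


print(has_class("p", "capital"))
print(class_matches("capital huge-city", "capital"))
print(class_matches("huge-city", "city"))
```

A bare contains(@class,'city') would also match huge-city, which is why the token is padded with spaces on both sides.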
3. Select based on other attributes
In the same way, we can select by any other attribute.
For example, to select all page elements that have a multiple attribute, whatever its value, write //*[@multiple]
//*[@multiple]
4. Attribute value contains a string
To select page elements whose style attribute value contains the string color, write //*[contains(@style,'color')]
To select page elements whose style attribute value starts with the string color, write //*[starts-with(@style,'color')]. A shorter prefix works too, for example //*[starts-with(@style,'c')]
To select page elements whose style attribute value ends with a certain string, you might guess //*[ends-with(@style,'color')]. Unfortunately, ends-with is XPath 2.0 syntax, which browsers do not currently support.
//*[contains(@style,'color')]
//*[starts-with(@style,'color')]
//*[starts-with(@style,'c')]
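Although ends-with is unavailable, XPath 1.0 can express the same test with substring and string-length. The following sketch builds such a predicate; ends_with is a hypothetical helper name, and the workaround is a common idiom rather than something taken from the original tutorial.

```python
def ends_with(attr: str, suffix: str) -> str:
    """Build an XPath 1.0 predicate equivalent to ends-with(@attr, suffix)."""
    if "'" in suffix:
        raise ValueError("this sketch does not handle single quotes")
    # Take the last len(suffix) characters of the attribute and compare.
    return (
        f"substring(@{attr}, string-length(@{attr}) - {len(suffix)} + 1)"
        f" = '{suffix}'"
    )


xpath = f"//*[{ends_with('style', 'green')}]"
print(xpath)
```

On the practice page this expression is intended to match the span whose style is color:darkgreen and to skip the one whose style is color:darkred.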
Examples from an actual project (the project runs on an intranet and cannot be reached from the Internet): http://192.168.1.171/
//input[@placeholder="请输入用户名"]
# Match input elements whose type attribute contains the given value
//input[contains(@type, 'password')]
//input[contains(@type, 'pa')] # a partial value also works
# Match when the text contains 广州市
//div[@role='combobox']/div/div[contains(text(), '广州市')]
//div[contains(text(), '广州市')]
# Match a elements whose href attribute contains the given value
//a[contains(@href, 'news')]
# Match the beginning of the value; ends-with is not available
//a[starts-with(@href,'https')]
# Match when the class attribute contains input
//input[contains(@class, 'input')]
In XPath, * can stand for a node whose name is unknown. For example, //div/*/span selects the span elements that are children of any child element of a div.
3. Select in order
CSS expressions can select an element by its position within its parent, which is very practical. With nth-child, the following selector matches an element that is the second child of its parent and is a span:
span:nth-child(2)
XPath can also select elements by position. The syntax is more concise than CSS: a number in square brackets gives the position.
1. The nth child element of a given type
For example, to select the second span child element of its parent, write
//span[2]
Note that this selects the second child of type span, not a span that happens to be the second child of its parent; it corresponds to the CSS selector span:nth-of-type(2) rather than span:nth-child(2).
As another example, to select the second child of type p whose parent is a div, write
//div/p[2]
2. The nth child element of any type
You can also select the second child element regardless of its type, using the wildcard.
For example, to select the second child element, of any type, whose parent is a div, write
//div/*[2]
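The difference between p[2] and *[2] is easy to see on the city block of the practice page. This is a sketch under stated assumptions: it uses the standard-library xml.etree.ElementTree, whose documentation requires a tag name before a positional predicate, so the wildcard case is emulated by indexing the child list directly.

```python
import xml.etree.ElementTree as ET

PAGE = """
<body>
  <div id="china">
    <p id="beijing">北京</p>
    <div><p>深圳</p></div>
    <p id="shanghai">上海</p>
  </div>
</body>
"""
root = ET.fromstring(PAGE)

# //div/p[2] : the second p among the p children of a div.
second_p = [e.text for e in root.findall(".//div/p[2]")]

# //div/*[2] : the second child of a div, whatever its tag.
# Emulated in Python: index 1 of each div's child list.
second_any = [list(div)[1].tag for div in root.iter("div") if len(div) > 1]

print(second_p)
print(second_any)
```

上海 is the third child of the china div but its second p child, so p[2] selects it, while the second child of any type is the nested div.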
3. The nth-to-last child element of a given type
You can also select child elements counted from the end.
For example:
- Select the last child element of type p
//p[last()]
- Select the second-to-last child element of type p
//p[last()-1]
- Select the third-to-last child element of type p whose parent is a div
//div/p[last()-2]
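The counting from the end can be verified on the multi-select options of the practice page. The sketch below assumes the standard-library xml.etree.ElementTree, which supports last() and last()-n after a tag name; PAGE is a trimmed copy of the choose_2 select.

```python
import xml.etree.ElementTree as ET

PAGE = """
<select class="choose_2">
  <option value="小江老师">小江老师</option>
  <option value="小雷老师">小雷老师</option>
  <option value="小王老师">小王老师</option>
  <option value="小凯老师">小凯老师</option>
</select>
"""
root = ET.fromstring(PAGE)  # root is the <select> element

last = root.find("./option[last()]").get("value")
second_to_last = root.find("./option[last()-1]").get("value")
third_to_last = root.find("./option[last()-2]").get("value")

print(last, second_to_last, third_to_last)
```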
XPath can also select all descendant elements (children, grandchildren and so on) of the current node, using the descendant:: axis.
For example, to select every p element among the descendants of the element with id china, write
//*[@id='china']/descendant::p
4. Range selection
XPath can also select child elements by a range of positions.
For example:
- Select the 1st to 2nd child elements of type option
//option[position()<=2]
or, equivalently,
//option[position()<3]
- Select the first 3 child elements of the element whose class attribute is choose_1
//*[@class='choose_1']/*[position()<=3]
- Select the last 3 child elements of the element whose class attribute is choose_1
//*[@class='choose_1']/*[position()>=last()-2]
Why last()-2 rather than last()-3? Because
- last() is the last element
- last()-1 is the second-to-last element
- last()-2 is the third-to-last element
so position()>=last()-2 covers exactly three elements.
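The arithmetic behind last()-2 can be replayed in plain Python. The sketch below does not evaluate XPath at all; select_positions is a hypothetical helper that applies a condition on the 1-based position and on last (the number of siblings), in the same way a position() predicate does.

```python
options = ["小江老师", "小雷老师", "小王老师", "小凯老师"]


def select_positions(items, keep):
    """Keep items whose 1-based position satisfies keep(position, last)."""
    last = len(items)
    return [item for pos, item in enumerate(items, start=1) if keep(pos, last)]


# position()<=2
first_two = select_positions(options, lambda pos, last: pos <= 2)
# position()>=last()-2
last_three = select_positions(options, lambda pos, last: pos >= last - 2)
# position()>=last()-3 would select one element too many
last_four = select_positions(options, lambda pos, last: pos >= last - 3)

print(first_two)
print(last_three)
print(len(last_four))
```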
4. Group selection, parent node, sibling node
1. Group selection
CSS supports group selection: several expressions are used together, and the result is every element matched by any of them.
In CSS group selection, the expressions are separated by commas.
XPath also supports group selection, with the expressions separated by a vertical bar (|).
For example, to select all option elements and all h4 elements, use
//option | //h4
This is equivalent to the CSS selector
option , h4
As another example, to select all elements with class choose_1 and all elements with class choose_2, use
//*[@class='choose_1'] | //*[@class='choose_2']
This is equivalent to the CSS selector
.choose_1 , .choose_2
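The | operator returns the union of the matched nodes: no duplicates, in document order. The standard-library xml.etree.ElementTree does not support |, so the sketch below emulates it to make that behaviour visible; union is a hypothetical helper, and PAGE is a trimmed copy of the select block of the practice page.

```python
import xml.etree.ElementTree as ET

PAGE = """
<div>
  <h4>单选</h4>
  <select class="choose_1">
    <option value="小江老师">小江老师</option>
    <option value="小雷老师">小雷老师</option>
  </select>
  <h4>多选</h4>
  <select class="choose_2">
    <option value="小王老师">小王老师</option>
  </select>
</div>
"""
root = ET.fromstring(PAGE)


def union(root, *paths):
    """Emulate path1 | path2: merge matches, drop duplicates, keep document order."""
    matched = {elem for path in paths for elem in root.findall(path)}
    return [elem for elem in root.iter() if elem in matched]


# //option | //h4
tags = [e.tag for e in union(root, ".//option", ".//h4")]
# //*[@class='choose_1'] | //*[@class='choose_2']
classes = [e.get("class") for e in union(
    root, ".//*[@class='choose_1']", ".//*[@class='choose_2']")]

print(tags)
print(classes)
```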
2. Select the parent node
XPath can select parent nodes, which CSS selectors cannot do directly.
The parent node of an element is written /..
For example, to select the parent node of the node with id china, write //*[@id='china']/..
This is useful when an element has no distinguishing features of its own and cannot be selected directly, but has a child node that does: select the child node first, then step up to the parent.
You can keep stepping up to higher ancestors, for example //*[@id='china']/../../..
//*[@id='china']/..
//*[@id='china']/../../..
XPath can also select the parent node of the current node with the parent:: axis.
For example, to select the parent of the element with id china, provided that parent is a div, write
//*[@id='china']/parent::div
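Stepping up with .. can be tried on the us block of the practice page. This is a minimal sketch that assumes the standard-library xml.etree.ElementTree, which supports .. but not the parent:: axis; the id values are copied from the practice page as they are.

```python
import xml.etree.ElementTree as ET

PAGE = """
<body>
  <div id="us">
    <span id="west">
      <a id="newyork">纽约</a>
      <a id="huston">休斯顿</a>
    </span>
    <span id="east">
      <a id="chigaco">芝加哥</a>
    </span>
  </div>
</body>
"""
root = ET.fromstring(PAGE)

# //*[@id='newyork']/.. : the parent of the a element.
parent = root.find(".//*[@id='newyork']/..")
# //*[@id='newyork']/../.. : one more level up.
grandparent = root.find(".//*[@id='newyork']/../..")

print(parent.tag, parent.get("id"))
print(grandparent.tag, grandparent.get("id"))
```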
3. Sibling node selection
In CSS, the subsequent sibling nodes of a node are selected with a tilde (~).
XPath can also select subsequent sibling nodes, using the following-sibling:: axis (pronounced /ˈfɑːloʊɪŋ-ˈsɪblɪŋ/).
For example, to select all subsequent sibling nodes of the element with class choose_1, write //*[@class='choose_1']/following-sibling::*
This is equivalent to the CSS selector .choose_1 ~ *
To select only the div nodes among the subsequent siblings, write //*[@class='choose_1']/following-sibling::div
XPath can also select preceding sibling nodes, using the preceding-sibling:: axis (pronounced /prɪˈsiːdɪŋ-ˈsɪblɪŋ/).
For example, to select all preceding sibling nodes of the element with class choose_1, write
//*[@class='choose_1']/preceding-sibling::*
XPath can also select all nodes in the document after the closing tag of the current node, using the following:: axis.
For example, to select all option nodes after the closing tag of any p element, write
//p/following::option
XPath can also select all nodes in the document before the start tag of the current node, using the preceding:: axis.
For example, to select all option nodes before the start tag of any h4 element, write
//h4/preceding::option
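The sibling axes amount to splitting the parent's child list around the current node. The standard-library xml.etree.ElementTree does not implement these axes, so the sketch below only emulates following-sibling::* and preceding-sibling::* with list slicing; PAGE is a trimmed copy of the select block of the practice page.

```python
import xml.etree.ElementTree as ET

PAGE = """
<div>
  <h4>单选</h4>
  <p>姓名:</p>
  <select class="choose_1"><option value="小江老师">小江老师</option></select>
  <h4>多选</h4>
  <p>课程:</p>
  <select class="choose_2"><option value="小王老师">小王老师</option></select>
</div>
"""
root = ET.fromstring(PAGE)  # root is the shared parent of all siblings

target = root.find("./select[@class='choose_1']")
siblings = list(root)
index = siblings.index(target)

# //*[@class='choose_1']/following-sibling::*
following = [e.tag for e in siblings[index + 1:]]
# //*[@class='choose_1']/preceding-sibling::*
preceding = [e.tag for e in siblings[:index]]

print(following)
print(preceding)
```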
CSS currently has no dedicated selector for preceding sibling nodes.
To learn more about XPath selection syntax, see an XPath selector reference manual.
5. XPath summary
1. Locate by the element's own attributes and text
//span[text()='登录']
//div[contains(@class,'select_city')]
//input[@type='radio' and @value='1']  # multiple conditions
//span[@name='user'][text()='账号'][1]  # multiple conditions
//span[@id='user' or text()='账号']  # matches several elements
//span[text()='账号' or text()='密码']  # matches several elements
2. Locate via the parent node
//div[@class='x-grid-col-name x-grid-cell-inner']/div
//div[@id='dynamicGridTestInstanceformclearuxformdiv']/div
//div[@id='test']/input
3. Locate via a child node
//div[div[@id='navigation']]
//div[div[@name='listType']]
//div[p[@name='testname']]
4. Mixed
//div[div[@name='listType']]//img
//td[a//font[contains(text(),'Xpath 视频')]]//input[@type='checkbox']
5. Advanced
//input[@id='test']/following-sibling::input  # following sibling nodes
//input[@id='test']/preceding-sibling::span  # preceding sibling nodes
//input[starts-with(@id,'test')]  # id starts with the given string
//span[not(contains(text(),'xpath'))]  # span elements whose text does not contain xpath
6. Index
//div/input[2]
//div[@id='position']/span[3]
//div[@id='position']/span[position()=3]
//div[@id='position']/span[position()>3]
//div[@id='position']/span[position()<3]
//div[@id='position']/span[last()]
//div[@id='position']/span[last()-1]
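7. Substring matching
//*[substring(@id,4,5)='Every']/@id  # substring of the attribute starting at the 4th character, 5 characters long
//*[substring(@id,4)='EveryCookieWrap']  # substring of the attribute from the 4th character to the end
//*[substring-before(@id,'C')='swfEvery']/@id  # the part of the attribute before 'C'
//*[substring-after(@id,'C')='ookieWrap']/@id  # the part of the attribute after 'C'
8. Wildcard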
//span[@*='bruce']
//*[@name='bruce']
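The substring functions in item 7 count positions from 1, which is easy to get wrong when coming from Python. The sketch below mirrors them in plain Python for simple cases (integer arguments, start of at least 1, non-empty separator). The xp_ function names are hypothetical, and the id value swfEveryCookieWrap is an assumption inferred from the expressions in item 7.

```python
def xp_substring(s: str, start: int, length=None) -> str:
    """Mirror XPath substring() for integer arguments with start >= 1."""
    begin = start - 1  # XPath positions are 1-based, Python's are 0-based
    return s[begin:] if length is None else s[begin:begin + length]


def xp_substring_before(s: str, sep: str) -> str:
    """Mirror substring-before(): text before the first sep, else ''."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def xp_substring_after(s: str, sep: str) -> str:
    """Mirror substring-after(): text after the first sep, else ''."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


element_id = "swfEveryCookieWrap"  # assumed id, inferred from item 7

print(xp_substring(element_id, 4, 5))
print(xp_substring(element_id, 4))
print(xp_substring_before(element_id, "C"))
print(xp_substring_after(element_id, "C"))
```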
9. Axes
//div[span[text()='测试']]/parent::div  # parent node
//div[span[text()='测试']]/ancestor::div  # ancestor nodes
10. Descendant nodes
//div[span[text()='测试']]/descendant::div/span[text()='测试']
//div[span[text()='测试']]//div/span[text()='测试']  # both expressions mean the same thing
Disadvantages of XPath positioning
- XPath needs to traverse the page, so locating elements with it tends to be slower than with other methods.
- It is not robust: an XPath expression can stop matching when the page structure changes.
- Compatibility is imperfect: XPath implementations differ between browsers.