XPATH

Today I learned the spiders part of Scrapy: the spider's name, the start_urls it begins crawling from, and the XPath syntax:

Expression   Description
nodename     Selects all child elements of the current node that are named nodename.
/            Selects from the root node.
//           Selects matching nodes in the document, starting from the current node, no matter where they are.
.            Selects the current node.
..           Selects the parent of the current node.
@            Selects attributes.
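
To try the basic expressions without setting up a Scrapy project, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. It implements only a subset of XPath: paths are evaluated relative to an element, so "//" is written ".//", and a leading "/" is not accepted on an element. The bookstore XML is a hypothetical sample made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
  <book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Everyday Italian</title><price>30.00</price></book>
</bookstore>"""

root = ET.fromstring(SAMPLE)  # the current node is <bookstore>

books = root.findall("book")          # nodename: child elements named "book"
titles = root.findall(".//title")     # //: every <title> at any depth below
current = root.find(".")              # . : the current node itself
parent = root.find("book/title/..")   # ..: parent of the first matching <title>
first_lang = titles[0].get("lang")    # @ : ElementTree reads attributes with .get()

print(len(books), [t.text for t in titles], current.tag, parent.tag, first_lang)
```

In a spider, a similar query would go through response.xpath(...), for example response.xpath("//book/title/text()").getall(); ElementTree has no text() step, which is why the sketch reads .text instead.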

Path expression   Result
bookstore         Selects all bookstore elements that are children of the current node.
/bookstore        Selects the root element bookstore.

Note: if a path starts with a forward slash (/), it always represents an absolute path to an element.

bookstore/book    Selects all book elements that are children of bookstore.
//book            Selects all book elements, no matter where they are in the document.
bookstore//book   Selects all book elements that are descendants of the bookstore element, no matter where they are below bookstore.
//@lang           Selects all attributes named lang.
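
These path expressions can be checked with ElementTree too, with two caveats: ElementTree has no document node, and calling findall with an absolute path such as "/bookstore" on an element raises SyntaxError. This sketch therefore wraps the root in a hypothetical <doc> element that stands in for the document node, and it collects lang values with .get() because ElementTree cannot return attribute nodes. The sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
  <book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Everyday Italian</title><price>30.00</price></book>
</bookstore>"""

root = ET.fromstring(SAMPLE)  # the <bookstore> element

# Hypothetical <doc> wrapper standing in for the XPath document node.
doc = ET.Element("doc")
doc.append(root)

# bookstore: bookstore children of the current node
from_doc = doc.findall("bookstore")     # matches the root element
from_root = root.findall("bookstore")   # no match: <bookstore> has no such child

# bookstore/book: book children of bookstore
books = doc.findall("bookstore/book")

# //book and bookstore//book: book elements at any depth
any_book = doc.findall(".//book")
under_bookstore = doc.findall("bookstore//book")

# //@lang: no attribute nodes in ElementTree, so read the values instead
lang_values = [el.get("lang") for el in doc.iter() if "lang" in el.attrib]

# Absolute paths are rejected when evaluated on an element
try:
    root.findall("/bookstore")
    absolute_path = "accepted"
except SyntaxError:
    absolute_path = "rejected"
```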

Predicates in square brackets narrow a selection down to specific nodes:

Path expression                     Result
/bookstore/book[1]                  Selects the first book element that is a child of bookstore.
/bookstore/book[last()]             Selects the last book element that is a child of bookstore.
/bookstore/book[last()-1]           Selects the second-to-last book element that is a child of bookstore.
/bookstore/book[position()<3]       Selects the first two book elements that are children of bookstore.
//title[@lang]                      Selects all title elements that have an attribute named lang.
//title[@lang='eng']                Selects all title elements whose lang attribute has the value 'eng'.
/bookstore/book[price>35.00]        Selects all book elements of bookstore whose price element has a value greater than 35.00.
/bookstore/book[price>35.00]/title  Selects all title elements of the book elements of bookstore whose price element has a value greater than 35.00.
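
ElementTree understands the positional predicates [1], [last()] and [last()-1] and the attribute tests [@lang] and [@lang='eng'], but not position()<3 or numeric comparisons such as price>35.00. In this sketch those two are written in plain Python (a list slice, and an any() over the price children), which mirrors what the XPath predicates select. The sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
  <book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Everyday Italian</title><price>30.00</price></book>
</bookstore>"""

root = ET.fromstring(SAMPLE)  # the current node is <bookstore>

# Positional predicates (1-based, as in XPath) that ElementTree supports directly
first = root.find("book[1]")
last = root.find("book[last()]")
second_to_last = root.find("book[last()-1]")

# position()<3 is outside ElementTree's subset; a slice selects the same books
first_two = root.findall("book")[:2]

# Attribute predicates
with_lang = root.findall(".//title[@lang]")
lang_eng = root.findall(".//title[@lang='eng']")

# price>35.00 is also unsupported. XPath treats it as true when any <price>
# child converts to a number greater than 35.00, so test exactly that.
pricey = [
    book for book in root.findall("book")
    if any(float(p.text) > 35.00 for p in book.findall("price"))
]
# .../book[price>35.00]/title
pricey_titles = [t for book in pricey for t in book.findall("title")]
```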
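
Wildcards select unknown nodes:

Wildcard  Description
*         Matches any element node.
@*        Matches any attribute node.
node()    Matches any node of any kind.

Path expression   Result
/bookstore/*      Selects all child elements of the bookstore element.
//*               Selects all elements in the document.
//title[@*]       Selects all title elements that have at least one attribute.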
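
A sketch of the wildcard rows with ElementTree, again on a hypothetical sample. ElementTree supports * in paths, but it stores text in .text/.tail instead of separate text nodes (so node() has no direct counterpart here), and it rejects [@*], so attributes are read from each element's .attrib dict. Note that //* includes the root element: root.iter() yields it first, while root.findall(".//*") leaves it out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
  <book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Everyday Italian</title><price>30.00</price></book>
</bookstore>"""

root = ET.fromstring(SAMPLE)  # the <bookstore> element

# /bookstore/* : every child element of bookstore
children = root.findall("*")

# //* : every element in the document, the root included
every_element = list(root.iter())
below_root = root.findall(".//*")  # same, minus <bookstore> itself

# @* : attribute nodes show up as (name, value) pairs in each .attrib dict
all_attributes = [
    (el.tag, name, value)
    for el in root.iter()
    for name, value in el.attrib.items()
]

# //title[@*] : titles with at least one attribute (an empty dict is falsy)
titles_with_attrs = [t for t in root.iter("title") if t.attrib]
```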
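
The | operator selects several paths at once:

Path expression                   Result
//book/title | //book/price       Selects all title and price elements of book elements.
//title | //price                 Selects all title and price elements in the document.
/bookstore/book/title | //price   Selects all title elements of the book elements of bookstore, plus all price elements in the document.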
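
ElementTree has no | operator. The hypothetical union() helper below imitates it: it merges the matches of several paths, drops duplicates, and returns them in document order, which is how XPath orders a node-set. A hypothetical <doc> wrapper again stands in for the document node so /bookstore/book/title can be written relative to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
  <book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Everyday Italian</title><price>30.00</price></book>
</bookstore>"""

root = ET.fromstring(SAMPLE)
doc = ET.Element("doc")  # hypothetical stand-in for the document node
doc.append(root)


def union(context, *paths):
    """Hypothetical helper imitating XPath's "|": merge the matches of several
    paths, drop duplicates, and keep document order."""
    selected = {id(el) for path in paths for el in context.findall(path)}
    return [el for el in context.iter() if id(el) in selected]


book_fields = union(doc, ".//book/title", ".//book/price")   # //book/title | //book/price
titles_and_prices = union(doc, ".//title", ".//price")       # //title | //price
mixed = union(doc, "bookstore/book/title", ".//price")       # /bookstore/book/title | //price
no_duplicates = union(doc, ".//title", ".//book/title")      # overlapping paths

print([el.tag for el in book_fields])
```

Filtering doc.iter() by the collected ids is what restores document order: titles and prices come back interleaved book by book, not all titles first.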

XPath Axes

An axis defines a node-set relative to the current node.

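Axis name            Result
ancestor             Selects all ancestors (parent, grandparent, etc.) of the current node.
ancestor-or-self     Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself.
attribute            Selects all attributes of the current node.
child                Selects all children of the current node.
descendant           Selects all descendants (children, grandchildren, etc.) of the current node.
descendant-or-self   Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself.
following            Selects everything in the document after the closing tag of the current node.
following-sibling    Selects all siblings after the current node.
namespace            Selects all namespace nodes of the current node.
parent               Selects the parent of the current node.
preceding            Selects all nodes in the document before the opening tag of the current node, excluding its ancestors.
preceding-sibling    Selects all siblings before the current node.
self                 Selects the current node.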
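
Scrapy's selectors are built on lxml, which accepts these axes directly (for example //title/ancestor::book). ElementTree only offers .. for the parent, so the sketch below emulates a few element-only axes in plain Python with a child-to-parent map. The helper names and sample XML are hypothetical. Reverse axes (ancestor, preceding-sibling) are returned nearest-first, the order XPath uses when counting positions on them, so ancestor::*[1] is the parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
  <book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Everyday Italian</title><price>30.00</price></book>
</bookstore>"""

root = ET.fromstring(SAMPLE)

# ElementTree elements do not store their parent, so map child -> parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor(el):
    """ancestor::*  parent, grandparent, ... up to the root (nearest first)."""
    result = []
    while el in parent_of:
        el = parent_of[el]
        result.append(el)
    return result


def following_sibling(el):
    """following-sibling::*  siblings after el, in document order."""
    siblings = list(parent_of[el])
    return siblings[siblings.index(el) + 1:]


def preceding_sibling(el):
    """preceding-sibling::*  siblings before el (nearest first)."""
    siblings = list(parent_of[el])
    return siblings[:siblings.index(el)][::-1]


def descendant(el):
    """descendant::*  children, grandchildren, ... (el.iter() starts with el)."""
    return list(el.iter())[1:]


def following(el):
    """following::*  later elements in document order that are not inside el."""
    nodes = list(root.iter())
    inside = set(el.iter())
    return [n for n in nodes[nodes.index(el) + 1:] if n not in inside]


books = root.findall("book")
second_title = books[1].find("title")
print([a.tag for a in ancestor(second_title)])
```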

