CSS/XPath selectors: nth child nodes, parent nodes, and sibling nodes

1. Initialization

In [325]: from scrapy import Selector

In [326]: text="""
     ...: <div>
     ...:     <a>1a</a>
     ...:     <p>2p</p>
     ...:     <p>3p</p>
     ...: </div>"""

In [327]: sel=Selector(text=text)

In [328]: print(sel.extract())
<html><body><div>
    <a>1a</a>
    <p>2p</p>
    <p>3p</p>
</div></body></html>

 

 

2. Parent node / preceding and following sibling nodes

In [329]: sel.xpath('//a/parent::*/p').extract()
Out[329]: ['<p>2p</p>', '<p>3p</p>']

In [330]: sel.xpath('//p/preceding-sibling::a').extract()
Out[330]: ['<a>1a</a>']

In [331]: sel.xpath('//a/following-sibling::p').extract()
Out[331]: ['<p>2p</p>', '<p>3p</p>']

 

3. CSS: selecting the nth child node

3.1 General

# Position is counted over the complete list of child nodes, starting from the first child; the node must also match the tag
In [332]: sel.css('a:nth-child(1)').extract()
Out[332]: ['<a>1a</a>']

# Same as above, but counting from the last child
In [333]: sel.css('a:nth-last-child(1)').extract()
Out[333]: []


In [334]: sel.css('p:nth-child(1)').extract()
Out[334]: []

In [335]: sel.css('p:nth-child(2)').extract()
Out[335]: ['<p>2p</p>']

In [336]: sel.css('p:nth-child(3)').extract()
Out[336]: ['<p>3p</p>']

In [337]: sel.css('p:nth-last-child(1)').extract()
Out[337]: ['<p>3p</p>']

In [338]: sel.css('p:nth-last-child(2)').extract()
Out[338]: ['<p>2p</p>']

In [339]: sel.css('p:nth-last-child(3)').extract()
Out[339]: []

 

3.2 Shorthands: first-child and last-child

In [340]: sel.css('a:first-child').extract()
Out[340]: ['<a>1a</a>']

In [341]: sel.css('a:last-child').extract()
Out[341]: []

In [342]: sel.css('p:first-child').extract()
Out[342]: []

In [343]: sel.css('p:last-child').extract()
Out[343]: ['<p>3p</p>']

 

3.3 Replacing -child with -of-type in the selectors above counts positions only within the filtered list of child nodes that have the matching tag

4. XPath: selecting the nth child node

In [344]: sel.xpath('//div').extract()
Out[344]: ['<div>\n    <a>1a</a>\n    <p>2p</p>\n    <p>3p</p>\n</div>']

In [345]: sel.xpath('//div/*').extract()
Out[345]: ['<a>1a</a>', '<p>2p</p>', '<p>3p</p>']

In [346]: sel.xpath('//div/node()').extract()
Out[346]: ['\n    ', '<a>1a</a>', '\n    ', '<p>2p</p>', '\n    ', '<p>3p</p>', '\n']

In [347]: sel.xpath('//div/a').extract()
Out[347]: ['<a>1a</a>']

In [348]: sel.xpath('//div/p').extract()
Out[348]: ['<p>2p</p>', '<p>3p</p>']

In [349]:

In [349]: sel.xpath('//div/a[1]').extract()
Out[349]: ['<a>1a</a>']

In [350]: sel.xpath('//div/a[last()]').extract()
Out[350]: ['<a>1a</a>']

In [351]:

In [351]: sel.xpath('//div/p[1]').extract()  # index into the filtered list of p child nodes
Out[351]: ['<p>2p</p>']

In [352]: sel.xpath('//div/p[last()]').extract()
Out[352]: ['<p>3p</p>']

In [353]: sel.xpath('//div/p[last()-1]').extract()
Out[353]: ['<p>2p</p>']

In [354]:

In [354]: sel.xpath('//div/*[1]').extract()  # index into the complete list of child elements
Out[354]: ['<a>1a</a>']

In [355]: sel.xpath('//div/*[last()]').extract()
Out[355]: ['<p>3p</p>']

In [356]:

In [356]: sel.xpath('//div/node()[1]').extract()  # node() also includes plain text nodes
Out[356]: ['\n    ']

In [357]: sel.xpath('//div/node()[last()]').extract()
Out[357]: ['\n']

 
