HTML Sample Code
# Note: there are two <br> tags between the two links
html = """<div class="update_details">
<a href="xxxx">2019</a>
<br>
<br>
<a href="xxxx">2020</a>
</div>"""
Suppose we want the text of the second <a> tag, 2020. You might try this:
from pyquery.pyquery import PyQuery as pq
doc = pq(html)
second_a = doc(".update_details a:nth-child(2)").text()
The result is empty: nothing is selected.
Why? Because you probably assume the selector works in this order:
- First, find all <a> tags
- Then take the second of those <a> tags
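The actual order is this:
- Find all child elements of the element with class="update_details"
- Take the second child element
- Check whether that second child is the element you specified (here, <a>); if it is not, nothing is selected, and no error is raised to tell you that no text was found
In the sample, the second child is the first <br>, so the result is empty.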
Correct usage
The value of n should be 4, because the second <a> is the fourth child of the div (a, br, br, a):
from pyquery.pyquery import PyQuery as pq
doc = pq(html)
second_a = doc(".update_details a:nth-child(4)").text()
nth-child(n) has another pitfall, shown below:
html = """<div class="update_details">
<a href="xxxx">2019</a>
<br>
<br>
<a href="xxxx">2020</a>
<div class="inner">
<a href="xxxx">2018</a>
<a href="xxxx">2018</a>
<a href="xxxx">2018</a>
<a href="xxxx">2018</a>
</div>
</div>"""
If an inner element also contains <a> tags, and one of them is likewise in position 4 of its parent, the selector above returns two values.
What if you want the value of the inner <a> tag and the value of the outer <a> tag separately? Proceed as follows:
- First take the inner value
- Remove the inner element
- Then take the content of the outer tag
inner_a = doc(".inner a:nth-child(4)").text()  # get the inner value
outer_a = doc(".update_details").remove(".inner").find("a:nth-child(4)").text()  # remove the inner element, then get the outer <a> value