pyquery: why nth-child(n) does not select the element you expect

HTML Sample Code

# Note: there are two <br> tags between the two <a> tags
html = """<div class="update_details">
			<a href="xxxx">2019</a>
			<br>
			<br>
			<a href="xxxx">2020</a>
		</div>"""

Suppose we want the text of the second <a> tag, 2020. You might write:

from pyquery import PyQuery as pq
doc = pq(html)
second_a = doc(".update_details a:nth-child(2)").text()

The result is empty.
Why? Because you probably assume the selector works in this order:

  1. Find all <a> tags
  2. Take the second <a> tag

What actually happens is this:

  1. Find all child elements of class="update_details"
  2. Take the second child element
  3. Check whether that second child element is the kind of element you asked for (<a>). Here it is a <br>, so nothing matches. No error is raised; you simply get no text

The correct approach

The value of n should be 4, because the second <a> is the fourth child element of the div.

from pyquery import PyQuery as pq
doc = pq(html)
second_a = doc(".update_details a:nth-child(4)").text()

There is another pitfall with nth-child(n); see below:

html = """<div class="update_details">
			<a href="xxxx">2019</a>
			<br>
			<br>
			<a href="xxxx">2020</a>
			<div class="inner">
				<a href="xxxx">2018</a>
				<a href="xxxx">2018</a>
				<a href="xxxx">2018</a>
				<a href="xxxx">2018</a>
			</div>
		</div>"""

If an inner element also contains an <a> tag at position 4, the selector above returns two values.
What if you want the value of the inner <a> tag and the value of the outer one separately? Proceed as follows:

  1. First take the inner value
  2. Delete the inner element
  3. Then take the content of the outer tag

inner_a = doc(".inner a:nth-child(4)").text()  # take the inner value
outer_a = doc(".update_details").remove(".inner").find("a:nth-child(4)").text()  # remove the inner element, then take the outer <a> value

Origin blog.csdn.net/qq_41621362/article/details/104936140