Using regular expressions to extract the href data from an <a> tag

<div class="share-person-data-top">
  <a href="/share/home?uk=3924974212&suk=mOZidGjjyKS6Y6NecksgaQ" target="_blank" title="去Ta的分享主页" class="share-person-username global-ellipsis">ç¯å**å享</a>
  <a href="//yun.baidu.com/buy/center?tag=1&from=sicon" class="unvip-icon sicon">
  <em></em>
  </a>
</div>

In the HTML above, there is an <a href=...> tag inside the div. We need to extract the value of its href attribute.

 
First, use a regular expression to grab the contents of the div. Here response is the returned page content as text, i.e., the HTML shown above.
 
import re
tr_content = re.findall('<div class="share-person-data-top">(.*?)</div', response, re.S)[0]  # re.S lets "." match newlines

 

print(tr_content)
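The step above can be sketched as a self-contained script. This is only an illustration: the response string below is a hand-written stand-in for the fetched page, the title and link text are placeholders, and extract_div is a hypothetical helper name. It returns None when the div is missing instead of raising an IndexError on [0].

```python
import re

# Hypothetical stand-in for the fetched page text; the title and link text
# are placeholders, not the values from the real page.
response = '''
<div class="share-person-data-top">
  <a href="/share/home?uk=3924974212&suk=mOZidGjjyKS6Y6NecksgaQ" target="_blank" title="placeholder" class="share-person-username global-ellipsis">placeholder-user</a>
  <a href="//yun.baidu.com/buy/center?tag=1&from=sicon" class="unvip-icon sicon">
  <em></em>
  </a>
</div>
'''

def extract_div(html):
    """Return the inner HTML of the first matching div, or None if absent."""
    # re.S lets "." match newlines, so the lazy group can span several lines.
    matches = re.findall(r'<div class="share-person-data-top">(.*?)</div', html, re.S)
    return matches[0] if matches else None

tr_content = extract_div(response)
print(tr_content)
```

The lazy (.*?) stops at the first </div, which is fine here because the target div has no nested div inside it.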

 

Next, use a second regular expression to extract the href value

td_content = re.findall('<a href.*?="(.+?)".*?>(.*?)</a>', tr_content, re.S)  # each match is a (href value, link text) tuple

print(td_content)

 

Take the first element to get rid of the outermost "[]"

print(td_content[0])
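When a pattern has two capture groups, re.findall returns a list of tuples, one tuple per matched link. A minimal sketch of that, assuming a simplified stand-in for tr_content with placeholder link text and an illustrative variable named links:

```python
import re

# Hypothetical stand-in for tr_content (placeholder link text).
tr_content = '''
  <a href="/share/home?uk=3924974212&suk=mOZidGjjyKS6Y6NecksgaQ" target="_blank" class="share-person-username global-ellipsis">placeholder-user</a>
  <a href="//yun.baidu.com/buy/center?tag=1&from=sicon" class="unvip-icon sicon">
  <em></em>
  </a>
'''

# Two capture groups: (href value, link text). Both are lazy, so each match
# stops at the first closing quote and the first </a>.
links = re.findall(r'<a href="(.+?)".*?>(.*?)</a>', tr_content, re.S)

for href, text in links:
    print(repr(href), repr(text.strip()))

# Indexing [0] drops the list brackets; unpacking then splits the tuple.
first_href, first_text = links[0]
```

If the first group were greedy, (.+), it would run on to the last usable quote in the block and swallow both links into one match, which is why the lazy form is used here.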

 

 Extract "3924974212" from the href and print it

td_content = re.findall(r"\d+", td_content[0][0])  # re.findall needs a string: td_content[0][0] is the first href
print(td_content[0])
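Note that \d+ returns every run of digits in the href, including the single digits inside the suk value, so taking [0] relies on uk coming first. As an alternative (not the approach used in this post), the query string can be parsed with the standard library's urllib.parse; the sketch below assumes the href has already been extracted as a plain string.

```python
from urllib.parse import urlsplit, parse_qs

# The href value extracted in the previous step.
href = "/share/home?uk=3924974212&suk=mOZidGjjyKS6Y6NecksgaQ"

# Split off the query string, then parse it into a dict of lists.
params = parse_qs(urlsplit(href).query)

# Look the parameter up by name instead of by position.
uk = params["uk"][0]
print(uk)
```

This keeps working if the parameter order changes or another numeric parameter is added before uk.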

 


Origin www.cnblogs.com/becks/p/12499345.html