Web scraping: finding tags by their text content

Article URL: https://goodgoodstudy.blog.csdn.net/article/details/108585966


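from bs4 import BeautifulSoup

# `html` holds the page source fetched earlier (the fetching step is not shown in this post)
bs = BeautifulSoup(html, 'html.parser')  # name the parser explicitly; omitting it triggers a GuessedAtParserWarning

# the <div class="col"> that contains the paper's links
col = bs.find('div', {'class': 'col'})

col.find_all('a')  # findAll is the older camelCase alias of find_all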
"""
[<a href="/paper/2020">Proceedings of the International Conference on Machine Learning 1  pre-proceedings (ICML 2020)</a>,
 <a class="btn btn-light btn-sm btn-spacer disabled" download="" href="/paper/2020/file/ec7f346604f518906d35ef0492709f78-Bibtex.bib">Bibtex »</a>,
 <a class="btn btn-light btn-sm btn-spacer" href="/paper/2020/file/ec7f346604f518906d35ef0492709f78-Metadata.json">Metadata »</a>,
 <a class="btn btn-light btn-sm btn-spacer" href="/paper/2020/file/ec7f346604f518906d35ef0492709f78-Paper.pdf">Paper »</a>,
 <a class="btn btn-light btn-sm btn-spacer" href="/paper/2020/file/ec7f346604f518906d35ef0492709f78-Supplemental.pdf">Supplemental »</a>]
"""

Next, we want only the <a> tag whose text contains "Supplemental":

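import re

# string= is matched against each tag's .string (older bs4 code writes text=, the earlier name for the same argument);
# a regex is applied with re.search, so the trailing .* is optional
col.find_all('a', string=re.compile('Supplemental.*'))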
"""
[<a class="btn btn-light btn-sm btn-spacer" href="/paper/2020/file/ec7f346604f518906d35ef0492709f78-Supplemental.pdf">Supplemental »</a>]
"""

It works!
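One caveat: when string= is combined with a tag name, BeautifulSoup compares your value against the tag's .string, which is None whenever the tag has more than one child. A link whose text is split by a nested element is therefore skipped. The sketch below uses hypothetical markup and hrefs to compare a case-insensitive regex (handy since the page text is "Supplemental" while you may search for "supplement") with a function filter built on get_text(); the helper name is_supplement_link is also hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the last link's text follows an empty <span>,
# so that <a> has two children and its .string is None.
HTML = """
<div class="col">
  <a href="/x-Paper.pdf">Paper »</a>
  <a href="/x-Supplemental.pdf">Supplemental »</a>
  <a href="/y-Supplemental.pdf"><span class="icon"></span>Supplemental »</a>
</div>
"""

col = BeautifulSoup(HTML, "html.parser").find("div", {"class": "col"})

# Regex via string=: case-insensitive, but only checks tags whose .string is set,
# so the <a> with the nested <span> is not returned.
by_regex = col.find_all("a", string=re.compile("supplement", re.IGNORECASE))


# Function filter: called with every descendant Tag; checks the full text via get_text().
def is_supplement_link(tag):
    return tag.name == "a" and "supplement" in tag.get_text().lower()


by_func = col.find_all(is_supplement_link)

print([a["href"] for a in by_regex])
print([a["href"] for a in by_func])
```

The function filter costs a get_text() call per tag, but it does not depend on how the link text is nested.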


Reposted from blog.csdn.net/itnerd/article/details/108585966