1. Searching HTML content with the bs4 library

1.1 <>.find_all() and re (the regular expression library)

(1) The argument is a single string: returns every tag whose name equals that string
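A minimal sketch of the single-string case. The HTML below is a hypothetical stand-in (not the real demo page), used so the example runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in HTML, assumed for illustration only.
html = """<html><head><title>Demo</title></head>
<body>
<p class="title"><b>Bold text</b></p>
<p class="course">See <a href="http://example.com/1" id="link1">Basic</a>
and <a href="http://example.com/2" id="link2">Advanced</a>.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# A single string matches tags whose name is exactly that string.
links = soup.find_all("a")
link_ids = [tag.get("id") for tag in links]
print(link_ids)
```

find_all() returns a list-like result, so it can be iterated or indexed like an ordinary list.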

(2) The argument is a list: returns every tag whose name matches any item in the list
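A short sketch of the list case, again on a hypothetical stand-in HTML string rather than the real demo page. Matches come back in document order, not in the order of the list.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in HTML, assumed for illustration only.
html = """<html><head><title>Demo</title></head>
<body>
<p class="title"><b>Bold text</b></p>
<p class="course">See <a href="http://example.com/1" id="link1">Basic</a>
and <a href="http://example.com/2" id="link2">Advanced</a>.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# A list matches any tag whose name is one of the listed names.
names = [tag.name for tag in soup.find_all(["a", "b"])]
print(names)
```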

(3) The argument is True: returns every tag in the document
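A sketch of the True case on a hypothetical stand-in HTML string. Passing True matches every tag but not the text nodes between them.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in HTML, assumed for illustration only.
html = """<html><head><title>Demo</title></head>
<body>
<p class="title"><b>Bold text</b></p>
<p class="course">See <a href="http://example.com/1" id="link1">Basic</a>
and <a href="http://example.com/2" id="link2">Advanced</a>.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# True matches every tag; collect the names in document order.
all_names = [tag.name for tag in soup.find_all(True)]
print(all_names)
```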

(4) Show tags whose names begin with b, such as b and body (using re, the regular expression library)

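import re

import requests
from bs4 import BeautifulSoup

# Fetch the demo page and parse it with Python's built-in HTML parser
r = requests.get("http://python123.io/ws/demo.html")
r.raise_for_status()
demo = r.text

soup = BeautifulSoup(demo, "html.parser")

# bs4 applies the pattern with search(), so anchor it with ^ to match
# only tag names that begin with "b"
for tag in soup.find_all(re.compile('^b')):
    print(tag.name)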
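One detail worth checking: Beautiful Soup tests a compiled pattern with its search() method, so an unanchored 'b' matches any tag name that contains b, while '^b' matches only names that begin with b. The sketch below shows the difference on a hypothetical HTML snippet that includes a table tag.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML chosen so that one tag name ("table") contains "b"
# without starting with it.
html = "<html><body><table><tr><td><b>x</b></td></tr></table></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Unanchored pattern: matches names containing "b" anywhere.
contains_b = [tag.name for tag in soup.find_all(re.compile("b"))]

# Anchored pattern: matches only names that begin with "b".
starts_with_b = [tag.name for tag in soup.find_all(re.compile("^b"))]

print(contains_b)
print(starts_with_b)
```

On a page whose only b-containing tag names are b and body, both patterns give the same result, which is why the difference is easy to miss.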

(5) attrs in find_all: returns the tags with the given name that also carry the specified attribute value
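A minimal sketch of attribute filtering, using a hypothetical stand-in HTML string. It shows two forms: the attrs dictionary combined with a tag name, and a keyword argument such as id.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in HTML, assumed for illustration only.
html = """<html><head><title>Demo</title></head>
<body>
<p class="title"><b>Bold text</b></p>
<p class="course">See <a href="http://example.com/1" id="link1">Basic</a>
and <a href="http://example.com/2" id="link2">Advanced</a>.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# Name plus attrs: only <p> tags whose class is "course".
course_paragraphs = soup.find_all("p", attrs={"class": "course"})

# Keyword form: any tag whose id attribute equals "link1".
by_id = soup.find_all(id="link1")

print(len(course_paragraphs))
print(by_id[0].name, by_id[0].get_text())
```

The attrs dictionary is the safer form for class, because class is a reserved word in Python and cannot be used directly as a keyword argument (bs4 offers class_ for that purpose).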
