Python crawler notes (5): Web crawler extraction - Information organization and extraction methods (3): Searching HTML content with the bs4 library

1. Searching HTML content with the bs4 library

1.1 <>.find_all() and re (the regular expression library)

 

(1) The argument is a single string: returns every tag with that name.
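A minimal sketch of the single-string case. The HTML snippet below is a hypothetical stand-in, not the demo page used later in these notes, so the example runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration only).
html = """<html><head><title>Demo</title></head>
<body>
<p class="title"><b>Bold heading</b></p>
<p class="course">Courses: <a href="http://example.com/1" id="link1">Basic</a>
and <a href="http://example.com/2" id="link2">Advanced</a>.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# A single string matches tags whose name equals that string exactly.
links = soup.find_all("a")
link_ids = [tag.get("id") for tag in links]
print(link_ids)
```

find_all() always returns a list-like result, so it can be iterated even when nothing matches.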

 

(2) The argument is a list: returns every tag whose name appears in the list.
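A short sketch of the list case, again on a hypothetical HTML snippet rather than a live page. Matches come back in document order, not in the order of the list.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration only).
html = """<html><head><title>Demo</title></head>
<body>
<p class="title"><b>Bold heading</b></p>
<p class="course">Courses: <a href="http://example.com/1" id="link1">Basic</a>
and <a href="http://example.com/2" id="link2">Advanced</a>.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# A list matches any tag whose name equals one of the listed strings.
names = [tag.name for tag in soup.find_all(["a", "b"])]
print(names)
```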

 

(3) The argument is True: returns all tags in the document.
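A sketch of the True case on a hypothetical HTML snippet. Passing True matches every tag, while plain text strings are not included in the result.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration only).
html = """<html><head><title>Demo</title></head>
<body>
<p class="title"><b>Bold heading</b></p>
<p class="course">Courses: <a href="http://example.com/1" id="link1">Basic</a>
and <a href="http://example.com/2" id="link2">Advanced</a>.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# True matches every tag; collect the names in document order.
all_names = [tag.name for tag in soup.find_all(True)]
print(all_names)
```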

 

(4) Show only the tags whose names begin with b, such as b and body (using re, the regular expression library).

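import re

import requests
from bs4 import BeautifulSoup

# Fetch the demo page and parse it with the built-in HTML parser.
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")

# find_all() applies the pattern with search(), so re.compile('b') would match
# any tag name containing "b"; the ^ anchor requires the name to start with "b".
for tag in soup.find_all(re.compile('^b')):
    print(tag.name)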

(5) The attrs argument of find_all: returns the name tags that carry the given attribute value.
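A minimal sketch of attribute filtering, using a hypothetical HTML snippet in place of a live page. It shows the attrs dictionary, the keyword-argument form, and a regular expression as the attribute value.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration only).
html = """<html><head><title>Demo</title></head>
<body>
<p class="title"><b>Bold heading</b></p>
<p class="course">Courses: <a href="http://example.com/1" id="link1">Basic</a>
and <a href="http://example.com/2" id="link2">Advanced</a>.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# Name plus attrs: only <p> tags whose class is "course".
course_paragraphs = soup.find_all("p", attrs={"class": "course"})

# Keyword form: any tag whose id is exactly "link1".
exact_id = soup.find_all(id="link1")

# Regular expression as the value: any tag whose id contains "link".
pattern_ids = [tag["id"] for tag in soup.find_all(id=re.compile("link"))]

print(len(course_paragraphs), len(exact_id), pattern_ids)
```

A plain string value must match the attribute exactly, so id="link" on its own would find nothing here; the regular expression is what allows a partial match.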

 

Origin www.cnblogs.com/douzujun/p/12241185.html