First, install
Install it by running the following command directly in cmd: pip install beautifulsoup4
Second, the principle
Build the parse tree: BeautifulSoup(html)
Get nodes: find(), find_all() / select()
Get attributes: attrs
Get text: text
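The four operations listed above can be tried without any network access on a small inline HTML string. This is a minimal sketch: the HTML snippet is made up for illustration, and Python's built-in "html.parser" is assumed as the parser backend.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
html = """
<html><body>
  <h2>Notice</h2>
  <ul class="news">
    <li><a href="/news/1.shtml" title="first">Item one</a></li>
    <li><a href="/news/2.shtml" title="second">Item two</a></li>
  </ul>
</body></html>
"""

# Build the parse tree; "html.parser" ships with Python
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag (or None if nothing matches)
first_link = soup.find("a")

# find_all() returns a list of every matching tag
all_links = soup.find_all("a")

# select() takes a CSS selector and also returns a list
news_links = soup.select("ul.news a")

# attrs holds the tag's attributes as a dict
print(first_link.attrs)          # dict of all attributes (href, title)
print(first_link.attrs["href"])  # /news/1.shtml

# text is the text content inside the tag
print(soup.find("h2").text)         # Notice
print([a.text for a in all_links])  # ['Item one', 'Item two']
print(len(news_links))              # 2
```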
Principle: beautifulsoup4 converts a complex HTML document into a tree structure in which every node is a Python object.
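To make the tree idea concrete, the short sketch below (using a made-up HTML fragment) checks the Python types of the nodes and walks up and down the tree. Tags are Tag objects, the text between them is NavigableString objects, and attributes such as .parent and .children move between nodes.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag, NavigableString

# Hypothetical fragment for illustration
html = "<div id='box'><p>Hello <b>world</b></p></div>"
soup = BeautifulSoup(html, "html.parser")

p = soup.find("p")

# Elements are Tag objects; the text pieces are NavigableString objects
print(type(p).__name__)              # Tag
print(type(p.contents[0]).__name__)  # NavigableString

# Move up the tree with .parent and read an attribute of the parent
print(p.parent.name)   # div
print(p.parent["id"])  # box

# Move down the tree: keep only the child nodes that are tags
print([c.name for c in p.children if isinstance(c, Tag)])  # ['b']
print(p.b.string)  # world
```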
Third, usage
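from bs4 import BeautifulSoup
import requests

url = "http://wsjkw.sc.gov.cn/scwsjkw/gzbd/fyzt.shtml"
res = requests.get(url)
res.encoding = "utf-8"  # decode the response body as UTF-8
html = res.text
soup = BeautifulSoup(html, "html.parser")  # parse the page into a tree
print(soup.find("h2").text)  # text of the first <h2>
a = soup.find("a")  # first <a> tag on the page
print(a)
print(a.attrs)  # all attributes of the tag as a dict
print(a.attrs["href"])  # the link target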
The code above fetches the page with requests, parses res.text with BeautifulSoup(html), then finds the first <a> tag and prints the tag, its attributes, and its href.
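One caveat with this approach: find() returns None when no matching tag exists, so a.attrs would then fail with an AttributeError. A defensive variant is sketched below as a hypothetical helper, extract_first_link, that takes HTML that has already been fetched so it can be tried offline.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_first_link(html: str) -> Optional[str]:
    """Hypothetical helper: return the href of the first <a> tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    a = soup.find("a")
    if a is None:
        return None  # the page has no <a> tag at all
    # .get() returns None instead of raising KeyError when href is missing
    return a.get("href")


print(extract_first_link('<p><a href="/x.shtml">x</a></p>'))  # /x.shtml
print(extract_first_link("<p>no links here</p>"))             # None
print(extract_first_link("<a name='anchor'>no href</a>"))     # None
```

In a real script the HTML would come from requests.get(url).text. Calling res.raise_for_status() before parsing is also a reasonable guard against HTTP error pages.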
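# Build the full URL of the linked page from the href found above
url_new = "http://wsjkw.sc.gov.cn" + a.attrs["href"]
res = requests.get(url_new)
res.encoding = "utf-8"
soup = BeautifulSoup(res.text, "html.parser")
print(soup.find("p"))  # first <p> element on the linked page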
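Here the href found above is joined to the site's domain to form the full URL of the linked page. That page is fetched and parsed the same way, and the needed information is then located by its tag (here the first <p> element).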
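The string concatenation above works only when the href starts with "/". A somewhat more robust sketch swaps in urllib.parse.urljoin from the standard library, which resolves an href against the URL of the page it was found on. The href values below are hypothetical.

```python
from urllib.parse import urljoin

base = "http://wsjkw.sc.gov.cn/scwsjkw/gzbd/fyzt.shtml"

# Hypothetical href values, for illustration only
root_relative = "/scwsjkw/gzbd/detail.shtml"
page_relative = "detail.shtml"
absolute = "http://example.com/other.shtml"

# urljoin resolves each kind of href correctly
print(urljoin(base, root_relative))  # http://wsjkw.sc.gov.cn/scwsjkw/gzbd/detail.shtml
print(urljoin(base, page_relative))  # http://wsjkw.sc.gov.cn/scwsjkw/gzbd/detail.shtml
print(urljoin(base, absolute))       # http://example.com/other.shtml

# Plain concatenation breaks for a page-relative href
print("http://wsjkw.sc.gov.cn" + page_relative)  # http://wsjkw.sc.gov.cndetail.shtml
```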
Fourth, results
Fifth, summary
beautifulsoup4 parses the page text fetched with requests. The desired content can then be extracted by locating the relevant tags, either by tag name with find()/find_all() or with CSS selectors via select().
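To contrast the two lookup styles in the summary, the sketch below runs on a made-up page fragment. It compares a plain tag-name lookup with find() against CSS selectors (class, id, and descendant) passed to select_one() and select().

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment for illustration
html = """
<div id="content">
  <h2 class="title">Daily update</h2>
  <p class="summary">First paragraph.</p>
  <p>Second paragraph.</p>
</div>
<p>Footer text.</p>
"""
soup = BeautifulSoup(html, "html.parser")

# By tag name: find() returns the first <p> anywhere in the document
print(soup.find("p").text)  # First paragraph.

# Class selectors with select_one(), which returns the first match
print(soup.select_one("h2.title").text)   # Daily update
print(soup.select_one("p.summary").text)  # First paragraph.

# id + descendant selector with select(), which returns all matches
print([p.text for p in soup.select("#content p")])  # ['First paragraph.', 'Second paragraph.']

# A bare tag selector matches every <p>, including the footer
print(len(soup.select("p")))  # 3
```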