Parsing content with beautifulsoup4

First, installation

  Run pip install beautifulsoup4 directly in cmd to install it.

Second, the principle

  BeautifulSoup(html)

    Get a node: find(), find_all() / select()

    Get attributes: attrs

    Get the text: text
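
  The three operations listed above can be sketched on a small inline document. The HTML string and its contents are hypothetical and exist only for illustration; "html.parser" is Python's built-in parser, chosen here so that no extra package is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only for illustration
html = """
<html><body>
  <h2>News</h2>
  <a href="/page1.html" class="link">First</a>
  <a href="/page2.html" class="link">Second</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a")        # first matching node only
all_links = soup.find_all("a")     # list of every matching node
css_links = soup.select("a.link")  # list of nodes matching a CSS selector

print(first_link.attrs)            # attributes of the node as a dict
print(first_link.attrs["href"])    # a single attribute value
print(soup.find("h2").text)        # text inside the node
print(len(all_links), len(css_links))
```

  find() returns a single node, while find_all() and select() both return lists, so the latter two are usually looped over.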

  Principle: beautifulsoup4 converts a complex HTML document into a tree structure in which each node is a Python object.
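
  To make the tree idea concrete, here is a minimal sketch that walks up and down a tiny, hypothetical document. It only assumes that parsed elements are bs4 Tag objects exposing name, parent and children.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical HTML, used only for illustration
html = "<div><p>Hello <b>world</b></p><p>Bye</p></div>"
soup = BeautifulSoup(html, "html.parser")

div = soup.find("div")
bold = soup.find("b")

# Each element in the tree is a Tag object
print(type(div).__name__)

# Walk down: direct children of the div that are tags (text nodes are skipped)
child_names = [child.name for child in div.children if isinstance(child, Tag)]
print(child_names)

# Walk up: from the b tag to its parent and grandparent
print(bold.parent.name)
print(bold.parent.parent.name)
```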

Third, usage

  

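from bs4 import BeautifulSoup
import requests

url = "http://wsjkw.sc.gov.cn/scwsjkw/gzbd/fyzt.shtml"
res = requests.get(url)
res.encoding = "utf-8"  # decode the response body as UTF-8
html = res.text
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, "html.parser")
print(soup.find("h2").text)  # text of the first h2 tag
a = soup.find("a")  # first a tag in the document
print(a)
print(a.attrs)  # all attributes of the tag as a dict
print(a.attrs["href"])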

  The code above parses res.text with BeautifulSoup, then finds the first a tag and prints its href.
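
  One thing worth noting about this lookup: find() returns None when nothing matches, and indexing attrs with a missing key raises KeyError. A defensive variant might look like the sketch below. The helper name first_href and the sample HTML are hypothetical, not part of the original code.

```python
from bs4 import BeautifulSoup

def first_href(soup):
    """Return the href of the first a tag, or None if it is absent.

    first_href is a hypothetical helper written for this example.
    """
    a = soup.find("a")
    if a is None:          # no a tag in the document at all
        return None
    return a.get("href")   # Tag.get returns None when the attribute is missing

# Hypothetical documents, used only for illustration
no_href = BeautifulSoup('<p>No links here</p><a name="anchor">x</a>', "html.parser")
with_href = BeautifulSoup('<a href="/x.html">x</a>', "html.parser")

print(first_href(no_href))
print(first_href(with_href))
```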

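# Join the site address and the href found above (assumes the href starts with "/")
url_new = "http://wsjkw.sc.gov.cn" + a.attrs["href"]
res = requests.get(url_new)
res.encoding = "utf-8"
soup = BeautifulSoup(res.text, "html.parser")
print(soup.find("p"))  # first p tag on the new page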

  The code above joins the href found earlier onto the site address to form a new URL, requests that page, and then looks up the needed information by tag (here, the first p tag).
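
  Plain string concatenation only works when the href is a path starting with "/". As an alternative sketch (not what the original code does), urljoin from the standard library handles absolute paths, relative paths and full URLs alike. The href values below are hypothetical.

```python
from urllib.parse import urljoin

page_url = "http://wsjkw.sc.gov.cn/scwsjkw/gzbd/fyzt.shtml"

# Hypothetical href values, used only for illustration
absolute_path = urljoin(page_url, "/other/detail.shtml")
relative_path = urljoin(page_url, "detail.shtml")
full_url = urljoin(page_url, "http://example.com/a.html")

print(absolute_path)  # path replaces the whole path of page_url
print(relative_path)  # resolved against the directory of page_url
print(full_url)       # a full URL is returned unchanged
```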

Fourth, the results

 

Fifth, summary

  beautifulsoup4 parses the text of the response that requests fetches; the desired content can then be extracted by looking up different tags or CSS selectors.
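
  As a rough illustration of how different selectors pull different content out of the same page, the sketch below runs offline on a hypothetical HTML string. The helper extract_text and the ids and class names in the sample are assumptions made for this example; in practice the string would be the res.text of a real response.

```python
from bs4 import BeautifulSoup

def extract_text(html, selector):
    """Return the stripped text of every node matching a CSS selector.

    extract_text is a hypothetical helper written for this example.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [node.get_text(strip=True) for node in soup.select(selector)]

# Hypothetical HTML standing in for a downloaded page
html = """
<div id="content">
  <h2 class="title">Daily report</h2>
  <p>First paragraph.</p>
  <p class="note">Second paragraph.</p>
</div>
"""

print(extract_text(html, "h2.title"))    # by tag and class
print(extract_text(html, "#content p"))  # every p inside the element with id "content"
print(extract_text(html, "p.note"))      # only p tags with class "note"
```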


Origin www.cnblogs.com/renleiblog/p/12627381.html