One, installing and importing the Beautiful Soup library
The Beautiful Soup library is a library for parsing, traversing, and maintaining the "tag tree".
Installation:
Windows: open cmd with "Run as administrator" and run: pip install beautifulsoup4
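Importing the module:
The Beautiful Soup library is also called beautifulsoup4 or bs4.
By convention it is imported in one of the two ways shown here; the class mainly used is BeautifulSoup.
from bs4 import BeautifulSoup    # import the BeautifulSoup class from the bs4 library
import bs4                       # import the whole bs4 library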
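A quick way to confirm that the installation and both import styles work is to parse a tiny document. This is only a sketch: the one-line HTML string is made up, and the built-in html.parser is assumed as the parser.

```python
import bs4                      # the whole package
from bs4 import BeautifulSoup   # the class used most of the time

# Parse a tiny, made-up document to confirm the install works.
soup = BeautifulSoup("<p>data</p>", "html.parser")

print(bs4.__version__)   # version string of the installed package
print(soup.p.string)     # text inside the first <p> tag
```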
Two, the basic parsing principle of the BeautifulSoup class
A parser turns the HTML/XML document into a tag tree, from which the desired information is then extracted.
Parser:
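The parser is selected by the second argument to the BeautifulSoup constructor. The sketch below assumes Python's built-in html.parser and a made-up HTML string; "lxml" and "html5lib" are other parser names bs4 accepts, but each needs its own pip install.

```python
from bs4 import BeautifulSoup

# Made-up sample document for illustration only.
html = "<html><body><p class='title'>Hello</p></body></html>"

# "html.parser" ships with Python, so nothing extra has to be installed.
soup = BeautifulSoup(html, "html.parser")

title_text = soup.p.string   # text of the first <p> in the tag tree
print(title_text)
```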
Three, the basic elements of the BeautifulSoup class
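The elements usually meant here are the tag, its name, its attributes, the string inside it, and comments. The sketch below shows how each can be reached; the HTML snippet is made up and html.parser is assumed.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Made-up snippet containing a tag, an attribute, a string and a comment.
html = '<p class="title"><b>Bold text</b><!--a comment--></p>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.p                  # Tag: the first <p> in the document
print(tag.name)               # Name: the tag's name as a string
print(tag.attrs)              # Attributes: a dict (class is stored as a list)
print(tag.b.string)           # NavigableString: the text inside <b>...</b>

comment = tag.contents[1]     # Comment: a special kind of string node
print(type(comment))
```

Because Comment is a subclass of NavigableString, checking the type is the reliable way to tell a comment apart from ordinary text.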
Four, traversing HTML content with the bs4 library (call form: soup.<tag>.<attribute>)
- Downward traversal of the tag tree
Iterate over the direct child nodes:
for child in soup.body.children:
    print(child)
Iterate over all descendant nodes:
for child in soup.body.descendants:
    print(child)
- Upward traversal of the tag tree
note:
- Sibling (parallel) traversal of the tag tree
note:
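Upward and sibling traversal can be sketched as follows. The HTML is made up and deliberately contains no whitespace between tags; in real pages, a neighbouring sibling is often a whitespace string rather than a tag.

```python
from bs4 import BeautifulSoup

# Made-up document with three sibling <p> tags and no whitespace between them.
html = "<html><body><p>one</p><p>two</p><p>three</p></body></html>"
soup = BeautifulSoup(html, "html.parser")
middle = soup.find_all("p")[1]          # the <p>two</p> tag

# Upward: .parent is the direct parent, .parents iterates over all ancestors.
parent_name = middle.parent.name
ancestor_names = [p.name for p in middle.parents]

# Sibling: nodes that share the same parent.
next_text = middle.next_sibling.string
prev_text = middle.previous_sibling.string
later_texts = [s.string for s in middle.next_siblings]

print(parent_name, ancestor_names)
print(prev_text, next_text, later_texts)
```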
Five, formatted HTML output with the bs4 library
- The prettify() method of the bs4 library (call form: soup.prettify())
- Encoding in the bs4 library
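Both points can be illustrated with a short sketch. It assumes Python 3, where prettify() returns an ordinary Unicode str, and uses a made-up snippet containing non-ASCII text; encode() is shown as one way to get bytes in a chosen encoding.

```python
from bs4 import BeautifulSoup

# Made-up snippet with non-ASCII content.
soup = BeautifulSoup("<p>中文</p>", "html.parser")

# prettify() puts each tag and string on its own line with indentation.
pretty = soup.prettify()
print(pretty)

# encode() serialises a tag to bytes in the requested encoding.
encoded = soup.p.encode("utf-8")
print(encoded)
```

prettify() can also be called on a single tag, for example soup.p.prettify().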
Six, search methods provided by the bs4 library
<>.find_all(name, attrs, recursive, string, **kwargs)
- name: string to match against tag names
- attrs: string to match against tag attribute values; attribute searches can also be written as keyword arguments, e.g. id="", class_="" (class_ with a trailing underscore, because class is a Python keyword)
- recursive: whether to search all descendants; default True
- string: string to match against the text between <> and </>
Returns a list holding the search results.
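The parameters above can be tried out on a small document. The HTML, URLs and ids below are made up for illustration, and html.parser is assumed.

```python
import re
from bs4 import BeautifulSoup

# Made-up document; the URLs and ids are placeholders.
html = """
<html><body>
<p class="title"><b>Demo</b></p>
<p class="course">Learn
<a href="http://example.com/basic" id="link1">Basic Python</a> and
<a href="http://example.com/advanced" id="link2">Advanced Python</a>.</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")                       # name: one tag name
by_names = soup.find_all(["a", "b"])             # name: several tag names
course = soup.find_all("p", "course")            # attrs: a plain string matches the class
by_id = soup.find_all(id="link1")                # keyword argument: exact attribute value
by_id_re = soup.find_all(id=re.compile("link"))  # keyword argument: regular expression
direct = soup.find_all("a", recursive=False)     # recursive=False: direct children only
texts = soup.find_all(string=re.compile("Python"))  # string: match text nodes

for link in links:
    print(link["href"])
```

With recursive=False the search looks only at the direct children of soup, so no a tag is found in this document.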
Note: because this search function is used so often:
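One shortcut that bs4 does provide is calling a soup or tag object directly, which behaves like calling find_all() on it. Whether this is the shortcut the note had in mind is an assumption; the HTML in the sketch is made up.

```python
from bs4 import BeautifulSoup

# Made-up snippet with two <a> tags inside a <div>.
soup = BeautifulSoup("<div><a>x</a><a>y</a></div>", "html.parser")

full = soup.find_all("a")        # explicit call
short = soup("a")                # calling the object is the same search
nested_short = soup.div("a")     # the same shortcut works on any tag

print(full == short == nested_short)
```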