Beautiful Soup library: parsing HTML / XML pages

One, installing and importing the Beautiful Soup library

The Beautiful Soup library is a library for parsing, traversing, and maintaining the "tag tree".
  1. Installation:
    Windows: open cmd with "Run as administrator" and run pip install beautifulsoup4

  2. Importing the module
    The Beautiful Soup library is also called beautifulsoup4 or bs4.
    By convention it is imported as follows, since the BeautifulSoup class is the part mainly used:


from bs4 import BeautifulSoup    # import the BeautifulSoup class from the bs4 library

import bs4    # import the whole bs4 library

Two, the basic principle of parsing with the BeautifulSoup class

A parser turns the HTML / XML document into a tag tree, from which the desired information is obtained.
Parser: chosen by the second argument to BeautifulSoup(), for example "html.parser", which ships with Python.
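
As a minimal sketch of the parsing step described above, the snippet below builds a tag tree with the built-in "html.parser". The sample markup and variable names are made up for illustration. Other parser names such as "lxml", "xml", and "html5lib" can be passed the same way, but they rely on separately installed packages.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p class='intro'>Hello</p></body></html>"
)

# The second argument names the parser; "html.parser" comes with Python.
soup = BeautifulSoup(html, "html.parser")

# Once parsed, tags are reachable as attributes of the soup object.
print(soup.title.string)  # text inside <title>
print(soup.p["class"])    # class is multi-valued, so a list is returned
```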

Three, the basic elements of the BeautifulSoup class

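
The following sketch shows the element types that bs4 exposes on a parsed document: Tag, its name and attrs, NavigableString, and Comment. The markup is a hypothetical example, and the selection of elements shown here is an assumption about what this section covers.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical sample markup, used only for illustration.
html = '<p class="title" id="top"><b>Bold text</b><!-- a comment --></p>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.p                # a Tag object
print(tag.name)             # the tag's name
print(tag.attrs)            # the tag's attributes as a dict
print(tag.b.string)         # the text inside <b>, a NavigableString

# A comment is stored as a Comment, a special kind of NavigableString.
comment = tag.contents[1]
print(type(comment))
```

Because Comment is a subclass of NavigableString, it is worth checking the type before treating a .string value as ordinary text.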

Four, HTML content traversal methods based on the bs4 library (call form: soup.<tag>.<attribute>)


  1. Traversing down the tag tree

# iterate over the direct children
for child in soup.body.children:
    print(child)

# iterate over all descendants
for child in soup.body.descendants:
    print(child)
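
A self-contained version of the downward traversal, assuming a small made-up document, might look like the sketch below. It also uses .contents, which returns the direct children as a list rather than an iterator.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup, used only for illustration.
html = "<body><p>One <b>bold</b></p><p>Two</p></body>"
soup = BeautifulSoup(html, "html.parser")
body = soup.body

# .contents: direct children as a list
print(len(body.contents))

# .children: direct children as an iterator
child_names = [child.name for child in body.children]

# .descendants: every node below body, text nodes included;
# keep only Tag objects to skip the strings.
descendant_names = [d.name for d in body.descendants if isinstance(d, Tag)]

print(child_names)
print(descendant_names)
```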

  2. Traversing up the tag tree
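
Upward traversal can be sketched with .parent and .parents, as below. The markup is hypothetical. The walk ends at the BeautifulSoup object itself, whose name is "[document]" and whose own parent is None.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<html><body><p><b>deep</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")
b = soup.b

# .parent: the immediate parent tag
print(b.parent.name)

# .parents: an iterator over all ancestors, nearest first
ancestor_names = [ancestor.name for ancestor in b.parents]
print(ancestor_names)

# The soup object is the root, so it has no parent.
print(soup.parent)
```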
  3. Traversing across the tag tree (siblings)
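
Sibling traversal can be sketched with .next_sibling, .previous_sibling, and their plural iterator forms, as below. The markup is a made-up example. Siblings share the same parent, and a sibling may be a text node rather than a tag.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical sample markup, used only for illustration.
html = "<p><a>first</a> and <b>second</b><i>third</i></p>"
soup = BeautifulSoup(html, "html.parser")
a = soup.a
b = soup.b

# The node right after <a> is the text " and ", not the <b> tag.
print(repr(a.next_sibling))

# <b> is followed directly by the <i> tag.
print(b.next_sibling.name)

# Iterator forms: everything after <a>, and everything before <i>.
following = list(a.next_siblings)
preceding = list(soup.i.previous_siblings)
print(len(following), len(preceding))
```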

Five, HTML formatted output based on the bs4 library

  1. The prettify() method of the bs4 library (called as: soup.prettify())
  2. Encoding in the bs4 library
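
The sketch below illustrates both points under stated assumptions. prettify() returns the document as a string with one node per line, and it can be called on a single tag as well. For encoding, Beautiful Soup converts incoming markup to Unicode, so non-ASCII text comes back as an ordinary Python str. The sample text and the explicit from_encoding="utf-8" argument are choices made here for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup containing non-ASCII text.
html = "<html><body><p>中文</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Whole document, indented with line breaks
pretty = soup.prettify()
print(pretty)

# A single tag can be prettified too
pretty_p = soup.p.prettify()

# Bytes input is decoded to Unicode; the encoding is stated explicitly
# here so that no guessing is involved.
raw = html.encode("utf-8")
soup_from_bytes = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")
text = soup_from_bytes.p.string
print(text)
```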

Six, search methods provided by the bs4 library

<>.find_all(name, attrs, recursive, string, **kwargs)

  1. name: a string matched against tag names

  2. attrs: a string matched against tag attribute values; a specific attribute can also be named,
    e.g. id="", class_="" (class_ with a trailing underscore, because class is a Python keyword)

  3. recursive: whether to search all descendants; default True

  4. string: a string matched against the text between <> ... </>
    find_all returns a list holding the search results
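
The four parameters above can be exercised as in the following sketch. The markup, URLs, and variable names are hypothetical. The regular-expression search at the end is an extra illustration that relies on find_all accepting compiled patterns.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = """
<html><body>
<p class="title" id="head"><b>Python</b></p>
<a class="course" id="link1" href="http://example.com/basic">Basic Python</a>
<a class="course" id="link2" href="http://example.com/advanced">Advanced Python</a>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# name: one tag name, or a list of names
links = soup.find_all("a")
links_and_bold = soup.find_all(["a", "b"])

# attrs: a plain string as second argument is matched against the class
courses = soup.find_all("a", "course")

# keyword form: search by a specific attribute
by_id = soup.find_all(id="link1")
by_class = soup.find_all(class_="course")

# recursive=False looks only at direct children of soup, so no <a> is found
direct_only = soup.find_all("a", recursive=False)

# string: match text nodes, exactly or with a regular expression
exact_text = soup.find_all(string="Basic Python")
pattern_text = soup.find_all(string=re.compile("Python"))

print(len(links), len(courses), len(direct_only), len(pattern_text))
```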

Note: because find_all is used so often, calling a tag or the soup object directly is a shorthand for it:
<tag>(..) is equivalent to <tag>.find_all(..), and soup(..) is equivalent to soup.find_all(..).
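
A short sketch of this call shorthand follows, using made-up markup. It also shows find(), a related method that returns only the first match, or None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
soup = BeautifulSoup("<div><a>x</a><a>y</a><p>z</p></div>", "html.parser")

# Calling the soup object is the same as calling find_all on it.
via_call = soup("a")
via_find_all = soup.find_all("a")

# The same shorthand works on any tag.
inside_div = soup.div("a")

# find() returns the first match only, or None if there is none.
first_link = soup.find("a")
missing = soup.find("table")

print(len(via_call), first_link.string, missing)
```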

Origin blog.csdn.net/L_xiao_jie/article/details/104253806