The Python bs4 library

Beautiful Soup parses anything you give it, and does the tree traversal stuff for you.

The BeautifulSoup library is a function library for parsing, traversing, and maintaining the "tag tree" (traversal means following some search route so that every node of the tree is visited exactly once). https://www.crummy.com/software/BeautifulSoup

The BeautifulSoup library is usually referred to as bs4. Import it with: from bs4 import BeautifulSoup. Here, BeautifulSoup is the main class of the bs4 package, and it is the class we work with most.
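As a minimal sketch of the import in action, the snippet below builds a soup from a string instead of a web page. The HTML text is made up for illustration, and 'html.parser' (the parser bundled with Python) is used on the assumption that lxml may not be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# The second argument names the parser. 'html.parser' ships with Python;
# 'lxml' can be used instead if that package is installed.
soup = BeautifulSoup(html, "html.parser")

print(type(soup))          # the BeautifulSoup class imported above
print(soup.title.string)   # the text inside the <title> tag
```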

Parsers for the bs4 library

The basic elements of the BeautifulSoup class

import requests
from bs4 import BeautifulSoup

res = requests.get('http://www.pmcaff.com/site/selection')
soup = BeautifulSoup(res.text, 'lxml')
print(soup.a)
# Any tag in the HTML can be accessed as soup.<tag>; when the document contains several identical <tag>s, soup.<tag> returns the first one.

print(soup.a.name)
# Every <tag> has a name, available as <tag>.name; it is a string.

print(soup.a.attrs)
print(soup.a.attrs['class'])
# A <tag> may have one or more attributes; <tag>.attrs is a dictionary.

print(soup.a.string)
# <tag>.string returns the non-attribute string inside the tag.

soup1 = BeautifulSoup('<p><!--Here is a comment--></p>', 'lxml')
print(soup1.p.string)
print(type(soup1.p.string))
# A comment is a special type; it can also be retrieved through <tag>.string.

Output:

<a class="no-login" href="">登录</a>

a

{'href': '', 'class': ['no-login']}

['no-login']

登录

Here is a comment

<class 'bs4.element.Comment'>
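The same elements can be tried without network access. The sketch below is not the original author's script: it uses a made-up HTML string and 'html.parser' in place of the live page and lxml. It also shows one thing worth checking in practice: .string does not tell you whether you got ordinary text or a comment, so the type has to be tested.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical markup that imitates the login link from the page above.
html = '<div><a class="no-login" href="">Log in</a><p><!--Here is a comment--></p></div>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.a              # the first <a> tag in the document
print(tag.name)           # tag name, a str
print(tag.attrs)          # attributes, a dict
print(tag["class"])       # multi-valued attributes such as class are lists
print(tag.string)         # the text inside the tag

comment = soup.p.string   # here .string returns the comment, not ordinary text
print(isinstance(comment, Comment))
```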

Traversing HTML content with the bs4 library

The basic structure of HTML

Traversing down the tag tree

Here, the BeautifulSoup object is the root node of the tag tree.

# Traverse the child nodes
for child in soup.body.children:
    print(child.name)

# Traverse the descendant nodes
for child in soup.body.descendants:
    print(child.name)
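To see the difference between the two loops, here is a small sketch on a made-up HTML string (parsed with 'html.parser', an assumption made so that lxml is not required). Text nodes are filtered out with isinstance so that only tags are listed.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical document, used only for illustration.
html = "<body><div><p>one</p><p>two</p></div><span>three</span></body>"
soup = BeautifulSoup(html, "html.parser")

# .children yields only the direct children of <body>.
child_names = [c.name for c in soup.body.children if isinstance(c, Tag)]

# .descendants walks the whole subtree below <body>, depth first.
descendant_names = [d.name for d in soup.body.descendants if isinstance(d, Tag)]

print(child_names)        # ['div', 'span']
print(descendant_names)   # ['div', 'p', 'p', 'span']
```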

Traversing up the tag tree

# Traversing all the ancestor nodes includes soup itself, hence the if ... else ... check
for parent in soup.a.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)

Output:

div

div

body

html

[document]
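The result above can be reproduced offline. The sketch below assumes a made-up HTML string with the same nesting (two div elements around a link) and uses 'html.parser' rather than lxml. It also shows that the soup object itself has no parent.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the same nesting as the page above.
html = "<html><body><div><div><a href=''>link</a></div></div></body></html>"
soup = BeautifulSoup(html, "html.parser")

# .parents walks from the nearest ancestor up to the BeautifulSoup object.
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)   # ['div', 'div', 'body', 'html', '[document]']

# The root of the tree has no parent of its own.
print(soup.parent)      # None
```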

Traversing the tag tree sideways (sibling nodes)

# Traverse the following sibling nodes
for sibling in soup.a.next_siblings:
    print(sibling)

# Traverse the preceding sibling nodes
for sibling in soup.a.previous_siblings:
    print(sibling)
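A quick sketch of sibling traversal on a made-up HTML string (parsed with 'html.parser', an assumption made to avoid the lxml dependency). Note that siblings may be plain text as well as tags, and that the preceding siblings come out nearest first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <a> tag has text and tags on both sides.
html = "<p><b>first</b>middle<a href=''>link</a><i>last</i></p>"
soup = BeautifulSoup(html, "html.parser")

# The plural forms are iterators over all siblings in one direction.
following = [str(s) for s in soup.a.next_siblings]
preceding = [str(s) for s in soup.a.previous_siblings]

print(following)   # ['<i>last</i>']
print(preceding)   # ['middle', '<b>first</b>']

# The singular forms return just one node (or None at the edge).
print(soup.a.next_sibling)
```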

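The prettify() method of the bs4 library

The prettify() method formats the markup in a more standard, readable layout, with each tag and string on its own line; call it as soup.prettify(). In PyCharm, use print(soup.prettify()) to display the result.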
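As a rough illustration, the sketch below runs prettify() on a one-line, made-up HTML string (parsed with 'html.parser', an assumption) and prints the result, which comes back as an ordinary str.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line markup, used only for illustration.
html = "<div><p>Hello</p></div>"
soup = BeautifulSoup(html, "html.parser")

# prettify() returns a str with one tag or string per line, indented by depth.
pretty = soup.prettify()
print(pretty)
```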

Origin www.cnblogs.com/ltn26/p/10983836.html