Basic usage of the Beautiful Soup library

First, basic use of the BeautifulSoup library

import requests
from bs4 import BeautifulSoup  # import the BeautifulSoup class

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")  # parse with the built-in html.parser
print(soup.prettify())  # print the parsed tree with indentation

A BeautifulSoup object corresponds to the entire content of an HTML/XML document.

HTML document <=> tag tree <=> BeautifulSoup class
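To see that the BeautifulSoup object stands for the whole document, here is a minimal sketch. It parses a short inline HTML string (a hypothetical stand-in for the downloaded demo page) so that it runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, used here instead of a network request.
html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p class='title'><b>Hello</b></p></body></html>"
)

soup = BeautifulSoup(html, "html.parser")

# The BeautifulSoup object is the root of the tag tree.
print(type(soup).__name__)   # BeautifulSoup
print(soup.name)             # [document]
print(soup.title.string)     # Demo page

# prettify() returns the whole tree as an indented string.
pretty = soup.prettify()
print(pretty)
```

The root object reports the special name [document], and every tag in the page is reachable from it.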

Second, basic use of BeautifulSoup tags

The .name, .attrs, and related attributes give access to a tag's name and attributes:
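A small sketch of these attributes on a single tag may help. The HTML string and the example.com URL below are hypothetical and not taken from the demo page.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical markup with one link inside a paragraph.
html = (
    '<p class="title">'
    '<a href="http://www.example.com/course" class="py1" id="link1">Basic Python</a>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")
tag = soup.a

print(tag.name)            # a
print(tag.parent.name)     # p
print(tag.attrs)           # dict of all attributes of the tag
print(tag.attrs["class"])  # ['py1'] (class is multi-valued, so it is a list)
print(tag["href"])         # shorthand for tag.attrs["href"]
print(tag.string)          # Basic Python
print(isinstance(tag.string, NavigableString))  # True

# get() returns None for a missing attribute instead of raising KeyError.
print(tag.get("title"))    # None
```

Using tag.get() is the safer choice when an attribute may be absent, since tag["title"] would raise KeyError here.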

Methods the bs4 library provides for traversing HTML content:

A simple example:

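import requests
from bs4 import BeautifulSoup  # import the BeautifulSoup class

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")  # parse with the built-in html.parser
print(soup.title)  # if several tags of this kind exist, only the first is returned
print(soup.a)
print(soup.a.name)   # the tag's name
print(soup.a.parent.name)  # the name of the parent tag

tag = soup.a
print(tag.attrs)   # the tag's attributes, as a dict
print(tag.attrs['class'])
print(tag.string)   # the text between the opening and closing tags
print(type(tag.string))

print(soup.head)
print(soup.head.contents)   # .contents returns the list of direct children

print(soup.title.parent)  # .parent returns the parent node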
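The code above only touches .contents and .parent. As a sketch of the wider traversal family (downward, upward, and sideways), the following uses a hypothetical inline HTML string so the results are easy to check by hand:

```python
from bs4 import BeautifulSoup

# Hypothetical markup without whitespace between tags, so no stray text nodes appear.
html = (
    "<html><head><title>Demo</title></head>"
    "<body><p>one</p><p>two</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Downward: .children iterates direct children, .descendants walks the whole subtree.
child_names = [child.name for child in soup.body.children]
descendant_count = len(list(soup.body.descendants))
print(child_names)        # ['p', 'p']
print(descendant_count)   # 4 (two <p> tags plus their two strings)

# Upward: .parents walks from the tag up to the document root.
parent_names = [parent.name for parent in soup.title.parents]
print(parent_names)       # ['head', 'html', '[document]']

# Sideways: siblings share the same parent.
first_p = soup.body.p
print(first_p.next_sibling)       # <p>two</p>
print(first_p.previous_sibling)   # None
print(soup.head.next_sibling.name)  # body
```

In real pages, newlines between tags also count as sibling and child nodes (as NavigableString objects), so it is worth checking the node type when iterating.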

Third, searching HTML content with the bs4 library

For example:

import re  # needed for re.compile below
soup.find_all(string='Basic Python')  # strings that match exactly
soup.find_all(id=re.compile('link'))   # fuzzy match using a regular expression
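The two calls above can be tried end to end with a self-contained sketch. The HTML string and the example.com URLs are hypothetical; the find_all arguments used (a tag name, the id and string keywords, and compiled regular expressions) are standard bs4 features.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with two links whose ids share the prefix "link".
html = (
    '<p>Courses: '
    '<a href="http://www.example.com/basic" id="link1">Basic Python</a> and '
    '<a href="http://www.example.com/advanced" id="link2">Advanced Python</a></p>'
)
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find_all("a")                         # every <a> tag
by_exact_id = soup.find_all(id="link1")              # exact attribute match
by_regex_id = soup.find_all(id=re.compile("link"))   # id contains "link"
by_string = soup.find_all(string="Basic Python")     # exact string match
by_string_regex = soup.find_all(string=re.compile("Python"))  # strings containing "Python"

print(len(by_name))                    # 2
print([a["id"] for a in by_regex_id])  # ['link1', 'link2']
print(by_string)                       # ['Basic Python']
print(len(by_string_regex))            # 2
```

Note that searching by string returns the matching strings themselves, not the tags that contain them; use .parent on a result to get back to the tag.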

Extended methods:
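bs4 provides several relatives of find_all, such as find, find_parent, find_next_sibling, and find_previous_sibling. That these are the methods meant by this heading is an assumption. A minimal sketch with a hypothetical HTML string:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three sibling paragraphs inside one div.
html = (
    "<div id='box'>"
    "<p id='first'>one</p><p id='second'>two</p><p id='third'>three</p>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first match, or None when nothing matches.
first = soup.find("p")
missing = soup.find("table")
print(first["id"])   # first
print(missing)       # None

second = soup.find(id="second")

# Search upward among ancestors.
print(second.find_parent("div")["id"])          # box

# Search sideways among siblings that come after or before the tag.
print(second.find_next_sibling("p")["id"])      # third
print(second.find_previous_sibling("p")["id"])  # first

# The plural forms return lists, like find_all().
later_ids = [p["id"] for p in first.find_next_siblings("p")]
print(later_ids)     # ['second', 'third']
```

Because find() can return None, it is worth checking the result before indexing into it.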

 


Origin blog.csdn.net/LY_624/article/details/105149023