- Pretty-print HTML with prettify():
from bs4 import BeautifulSoup
html = """<html>
<head>
<title>
Page title
</title>
</head>
<body>
<p align="center" id="firstpara">
This is paragraph
<b>
one
</b>
</p>
<p align="blah" id="secondpara">
This is paragraph
<b>
two
</b>
</p>
</body>
</html>"""
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())
- Get the first tag with a given name: soup.<tag name>
print(soup.head)
Output: <head><title>Page title</title></head>
- Get the text content of a tag (the first tag with that name): soup.title.string
- Get all p tags:
soup = BeautifulSoup(html, 'lxml')
print(soup.find_all('p'))
- Find a tag based on attributes:
soup.find(id='firstpara')
- Get all the text in the HTML, excluding tags:
soup.get_text()
- Modify the content of a tag with replace_with:
soup = BeautifulSoup(html, 'lxml')
tag = soup.title
tag.string.replace_with('hello word hh')
- Get the child nodes of a tag as a list:
soup.head.contents
- Get the parent node:
soup.title.parent
- Search with CSS selectors using select(); # selects by id and . selects by class:
soup.select('#firstpara')
- Find by attribute value:
soup.select('p[id="secondpara"]')
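The calls listed above can be combined into one short script. This is only a minimal sketch under a few assumptions: it uses a compact version of the sample page, the built-in "html.parser" instead of 'lxml' so that nothing beyond bs4 needs to be installed, and a made-up replacement string "New title".

```python
from bs4 import BeautifulSoup

# Compact version of the sample page used in the list above.
html = (
    '<html><head><title>Page title</title></head>'
    '<body>'
    '<p align="center" id="firstpara">This is paragraph <b>one</b></p>'
    '<p align="blah" id="secondpara">This is paragraph <b>two</b></p>'
    '</body></html>'
)

# Assumption: the built-in parser is used here; 'lxml' should behave the
# same way for this page if it is installed.
soup = BeautifulSoup(html, "html.parser")

# First tag with a given name, and its text content.
title_text = str(soup.title.string)

# All <p> tags; collect their id attributes.
paragraph_ids = [p["id"] for p in soup.find_all("p")]

# Look up one tag by attribute, and by CSS selector.
first_by_find = soup.find(id="firstpara")
first_by_css = soup.select("#firstpara")[0]
second_by_css = soup.select('p[id="secondpara"]')[0]

# Navigate the tree: children of <head>, parent of <title>.
head_children = soup.head.contents
title_parent = soup.title.parent.name

# Text only, with all tags stripped.
all_text = soup.get_text()

# Replace the text of <title> in place ("New title" is a placeholder).
soup.title.string.replace_with("New title")
new_title_markup = str(soup.title)

print(title_text)
print(paragraph_ids)
print(title_parent)
print(new_title_markup)
```

Note that find() returns a single tag (or None when nothing matches), while find_all() and select() always return a list, which is why the select() results are indexed with [0].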
For details, see:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
Basic usage of BeautifulSoup4
Origin blog.csdn.net/xxy_yang/article/details/92766424