- Pretty-print HTML: "<html>
<head>
<title>
Page title
</title>
</head>
<body>
<p align="center" id="firstpara">
This is paragraph
<b>
one
</b>
</p>
<p align="blah" id="secondpara">
This is paragraph
<b>
two
</b>
</p>
</body>
</html>"
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())
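- Get the first tag with a given name: soup.<tag name>
print(soup.head)  # output: <head><title>Page title</title></head>
- Get a tag's text content (only the first matching tag is used): soup.title.string
- Get all p tags:
soup = BeautifulSoup(''.join(doc), 'lxml')
print(soup.find_all('p'))
- Find a tag by attribute:
soup.find(id='firstpara')
- Get all the text in the HTML, without the tags: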
soup.get_text()
- Change a tag's content with replace_with:
soup = BeautifulSoup(''.join(doc), 'lxml')
tag = soup.title
tag.string.replace_with('hello word hh')
- Get a tag's child nodes as a list:
soup.head.contents
- Get the parent node:
soup.title.parent
- Find with CSS selectors (# selects by id, . selects by class):
soup.select('#firstpara')
- Find by attribute value:
soup.select('p[id="secondpara"]')
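The snippets above can be combined into one runnable Python 3 sketch. Two things here are assumptions, not part of the original: `doc` is defined as a list of strings holding the sample page (the snippets call `''.join(doc)` without showing it), and the built-in `html.parser` is used in place of `lxml` so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Assumed definition of `doc`: the sample page as a list of strings.
doc = [
    '<html><head><title>Page title</title></head>',
    '<body><p id="firstpara" align="center">This is paragraph <b>one</b></p>',
    '<p id="secondpara" align="blah">This is paragraph <b>two</b></p>',
    '</body></html>',
]

# "html.parser" ships with Python; swap in "lxml" if it is installed.
soup = BeautifulSoup(''.join(doc), 'html.parser')

pretty = soup.prettify()                    # indented, one tag per line
head_html = str(soup.head)                  # first <head> tag, as markup
title_text = soup.title.string              # text inside the first <title>
paragraphs = soup.find_all('p')             # every <p> tag, in document order
first_para = soup.find(id='firstpara')      # lookup by attribute
all_text = soup.get_text()                  # all text, tags stripped
head_children = soup.head.contents          # direct children as a list
title_parent = soup.title.parent            # the enclosing <head> tag
by_id = soup.select('#firstpara')           # CSS selector: id
by_attr = soup.select('p[id="secondpara"]') # CSS selector: attribute value

# Replace the title text in place, then read it back.
soup.title.string.replace_with('hello word hh')
new_title = soup.title.string

print(head_html)
print(title_text, '->', new_title)
print([p['id'] for p in paragraphs])
```

Note that `find` returns a single tag (or `None`), while `find_all` and `select` always return a list, so their results need indexing before attributes can be read.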
For details, see:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
BeautifulSoup4 basic usage
Reposted from blog.csdn.net/xxy_yang/article/details/92766424