Python: The BeautifulSoup Parsing Library

I. Introduction

II. Installation Command

pip install beautifulsoup4
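
A quick way to confirm the install worked is to import the package and print its version. This is only a minimal sketch: the package is installed as beautifulsoup4 but imported as bs4. The examples in this post also pass 'lxml' as the parser name, and lxml is a separate package (pip install lxml) that this command does not install.

```python
import bs4

# The distribution is named beautifulsoup4, but the import name is bs4.
version = bs4.__version__
print(version)
```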

III. Basic Usage

1. Basic usage

html = '''
<!DOCTYPE html>
<html>
<head>
    <title>Story</title>
</head>
<body>
   <p class="title" name="dromouse"><b>This is dromouse</b></p>
   <p class="story">Once upon a time there were three little sisters;
       and their names were
       <a href="http://www.baidu.com" class="sister" id="link1"><!--GH--></a>
       <a href="http://www.baidu.com/oracle" class="sister" id="link2">Local</a> and
       <a href="http://www.baidu.com/title" class="sister" id="link3">Tillie</a>;
   and they lived at the bottom of a well.</p>
   <p class="story">...</p>

</body>
</html>

'''

from bs4 import BeautifulSoup

# Parse the HTML string with the lxml parser
soup = BeautifulSoup(html, 'lxml')

# Print the document in a standard, indented format
# (prettify() returns a string; it does not print by itself)
print(soup.prettify())

# Get the text content of the title node
title = soup.title.string

print(title)
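
The .string attribute behaves differently depending on what a node contains, which matters for the sample document because its first link holds only a comment. The sketch below uses a shortened copy of the sample HTML (the snippet variable is an assumption made for illustration) and the built-in 'html.parser' instead of 'lxml' so that it runs without extra packages.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Shortened copy of the sample document, for illustration only.
snippet = '''
<html><head><title>Story</title></head>
<body>
<p class="story">Once upon a time
<a href="http://www.baidu.com" class="sister" id="link1"><!--GH--></a>
<a href="http://www.baidu.com/oracle" class="sister" id="link2">Local</a>
</p>
</body></html>
'''

# html.parser ships with Python; 'lxml' could be passed here instead.
soup = BeautifulSoup(snippet, 'html.parser')

# A node with a single text child: .string returns that text.
title_text = soup.title.string

# The first <a> contains only a comment, so .string returns a Comment object.
first_link_text = soup.a.string
is_comment = isinstance(first_link_text, Comment)

# <p> has several children (text and tags), so .string is None.
story_text = soup.p.string

print(title_text)
print(first_link_text, is_comment)
print(story_text)
```

Checking isinstance(..., Comment) is one way to avoid treating comment text as visible page content.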

2. Node selectors

html = '''
<!DOCTYPE html>
<html>
<head>
    <title>Story</title>
</head>
<body>
   <p class="title" name="dromouse"><b>This is dromouse</b></p>
   <p class="story">Once upon a time there were three little sisters;
       and their names were
       <a href="http://www.baidu.com" class="sister" id="link1"><!--GH--></a>
       <a href="http://www.baidu.com/oracle" class="sister" id="link2">Local</a> and
       <a href="http://www.baidu.com/title" class="sister" id="link3">Tillie</a>;
   and they lived at the bottom of a well.</p>
   <p class="story">...</p>

</body>
</html>

'''

from bs4 import BeautifulSoup

# Parse the HTML string with the lxml parser
soup = BeautifulSoup(html, 'lxml')

# Print the document in a standard, indented format
# (prettify() returns a string; it does not print by itself)
print(soup.prettify())

# Get the text content of the title node
title = soup.title.string

# Get the name of the node (the tag name, here 'title')
name = soup.title.name

print(name)
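
Node selectors can also be chained and used to read attributes. The following is a minimal sketch on a shortened copy of the sample HTML (the snippet variable is an assumption for illustration, and 'html.parser' is used in place of 'lxml' to avoid the extra dependency). Note that .name is the tag name, while the HTML attribute called name has to be read with square brackets.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document, for illustration only.
snippet = '''
<html><head><title>Story</title></head>
<body>
<p class="title" name="dromouse"><b>This is dromouse</b></p>
<p class="story">...</p>
</body></html>
'''

soup = BeautifulSoup(snippet, 'html.parser')

# Attribute-style access returns only the FIRST matching tag.
first_p = soup.p

# .name is the tag name, not the HTML attribute called "name".
tag_name = first_p.name

# .attrs holds all attributes; class is multi-valued, so it is a list.
p_attrs = first_p.attrs
p_name_attr = first_p['name']
p_classes = first_p['class']

# Selectors can be chained to walk down the tree.
nested_name = soup.head.title.name
bold_text = soup.p.b.string

print(tag_name, p_attrs)
print(p_name_attr, p_classes)
print(nested_name, bold_text)
```

Because soup.p stops at the first match, the second <p class="story"> is not reachable this way; a method such as find_all('p') is needed to collect every match.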

Reposted from www.cnblogs.com/Crown-V/p/12726000.html