Using the BeautifulSoup Module

Category: Other | 2018-07-16 14:55:41

"DOM"

# -*- encoding:utf-8 -*-

# Working with the DOM using BeautifulSoup (Python 3).
# The original post used the Python 2 module urllib2; urllib.request is its Python 3 counterpart.
import re
import urllib.request

from bs4 import BeautifulSoup

# Sample document from the official BeautifulSoup documentation
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# Fetch the Baidu home page. The result is only stored here; the examples below parse html_doc.
request = urllib.request.Request("http://www.baidu.com")
response = urllib.request.urlopen(request)
baidu_html = response.read()
# print(baidu_html)

# Build the DOM object from html_doc. from_encoding is omitted because it only
# applies to bytes input, and html_doc is already a str.
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the name, href and text of every <a> node.
# link['href'] raises KeyError when the attribute is missing, so catch that case.
try:
    for link in soup.find_all('a'):
        print(link.name, link['href'], link.get_text())
except KeyError:
    print('key wrong')

# Use a regular expression to find a single <p> node whose class contains "to" (here: "story")
p_node = soup.find('p', class_=re.compile('to'))
print(p_node.name, p_node.get_text())
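
The listing above relies on try/except to cope with nodes that lack an attribute, and it assumes that find() always returns a node. As a minimal sketch of an alternative (not part of the original post), the snippet below uses Tag.get(), which returns None for a missing attribute, and checks the result of find() before using it. SAMPLE_HTML and the helper names extract_links and first_paragraph_text are hypothetical and exist only for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup; the second link deliberately has no href.
SAMPLE_HTML = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a>
</p>
"""


def extract_links(soup):
    """Return (tag name, href or None, text) for every <a> tag."""
    # Tag.get() returns None instead of raising KeyError for a missing attribute.
    return [(a.name, a.get("href"), a.get_text()) for a in soup.find_all("a")]


def first_paragraph_text(soup, class_pattern):
    """Return the text of the first <p> whose class matches the pattern, or None."""
    node = soup.find("p", class_=re.compile(class_pattern))
    # find() returns None when nothing matches, so check before calling get_text().
    return node.get_text() if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = extract_links(soup)
title_text = first_paragraph_text(soup, "^title$")
missing = first_paragraph_text(soup, "no-such-class")

print(links)
print(title_text, missing)
```

With this approach a link without an href still shows up in the result with None in place of the URL, rather than aborting the whole loop on the first KeyError.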

Reposted from www.cnblogs.com/activecode/p/9317533.html
