Parsing HTML with Beautiful Soup

Category: Other. Posted 2019-01-07 08:50:53

Demo code:

# -*- coding: UTF-8 -*-
import re
from bs4 import BeautifulSoup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Create a BeautifulSoup parse object. html_doc is already a str, so
# from_encoding (which only applies to bytes input) is not passed.
soup = BeautifulSoup(html_doc, "html.parser")
# Get all the links
links = soup.find_all('a')
print("All links:")
for link in links:
    print(link.name, link['href'], link.get_text())
print()
print("Link with a specific URL:")
link_node = soup.find('a', href="http://example.com/elsie")
print(link_node.name, link_node['href'], link_node['class'], link_node.get_text())
print()
print("Regular expression match:")
link_node = soup.find('a', href=re.compile(r"ti"))
print(link_node.name, link_node['href'], link_node['class'], link_node.get_text())
print()
print("Text of a <p> paragraph:")
p_node = soup.find('p', class_='story')
print(p_node.name, p_node['class'], p_node.get_text())
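
The demo indexes attributes directly (link['href']) and uses the result of find() without checking it. That works for this document, but on real pages find() returns None when nothing matches, and indexing a missing attribute raises KeyError. Below is a minimal sketch of a more defensive version; the helper name extract_links and the sample markup are made up for illustration and are not part of the original post.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return (href, text) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for a in soup.find_all("a"):
        # Tag.get() returns None for a missing attribute instead of raising KeyError
        href = a.get("href")
        if href is not None:
            pairs.append((href, a.get_text(strip=True)))
    return pairs


# Hypothetical sample: the second <a> has no href attribute
sample = (
    '<p><a href="http://example.com/elsie">Elsie</a> '
    '<a id="anchor">no href</a></p>'
)
print(extract_links(sample))

# find() returns None when no tag matches, so test the result before using it
missing = BeautifulSoup(sample, "html.parser").find(
    "a", href="http://example.com/none"
)
print(missing is None)
```

Using get() keeps the loop running over anchors without an href, which are common on real pages.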

Screenshot of the run: [image not available]
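
The same lookups can also be written with CSS selectors through select() and select_one(). The sketch below is an alternative to the find()/find_all() calls in the demo, not the original author's approach; it reuses a shortened copy of the demo markup, and the variable names are chosen here for illustration.

```python
from bs4 import BeautifulSoup

html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Every <a> with class "sister", comparable to find_all('a', class_='sister')
names = [a.get_text() for a in soup.select("a.sister")]

# First <a> whose href contains "ti", comparable to href=re.compile(r"ti")
tillie = soup.select_one('a[href*="ti"]')

# Lookup by id
lacie = soup.select_one("#link2")

print(names)
print(tillie["href"])
print(lacie.get_text())
```

Like find(), select_one() returns None when nothing matches, so the same None check applies on real pages.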

Reposted from blog.csdn.net/qq_32534441/article/details/85846517
