Learning Web Scraping from Scratch 7: A Brief Introduction to the BeautifulSoup Module - 代码天地


Category: Other, 2019-07-17 12:04:55

Reference documentation:

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/

# Install beautifulsoup4

(pytools) D:\python\pytools>pip install beautifulsoup4
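A quick way to confirm the install worked is to import the package and parse a tiny document. This is a minimal sketch: the HTML string is made up for illustration, and it uses Python's built-in `html.parser`, so no third-party parser such as lxml is needed.

```python
from bs4 import BeautifulSoup

# A tiny, hypothetical document used only to check that bs4 imports and parses
html = "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>"

# "html.parser" is the parser that ships with Python's standard library
soup = BeautifulSoup(html, "html.parser")

title = soup.title.string
print(title)  # Hello
```

If this prints `Hello`, the package is installed and ready for the script below.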

# coding=utf-8

from bs4 import BeautifulSoup as bs
import re


html_doc = """
<html><head><title>The Dormouse's story</title></head>

<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
soup = bs(html_doc, "html.parser")

# Print the whole parse tree, indented one tag per line
# print(soup.prettify())

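# Text of the <title> tag
# print(soup.title.string)
# The first <a> tag in the document
# print(soup.a)
# The first tag whose id attribute is "link2"
# print(soup.find(id='link2'))

# .string returns the tag's single text child
# print(soup.find(id='link2').string)

# get_text() returns all the text inside the tag, including nested tags
# print(soup.find(id='link2').get_text())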

# Print the text of every <a> tag
# for link in soup.find_all('a'):
#     print(link.get_text())

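# .string cannot get the text of this <p>: it has several children (text and <a> tags),
# so .string is None. Use get_text() instead.
# print(soup.find("p", {"class": "story"}).get_text())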

# Find all tags whose name starts with "b"
# for tag in soup.find_all(re.compile("^b")):
#     print(tag.get_text())

# Find every <a> tag whose href starts with http://example.com/.
# In a regular expression "." matches any character, so escape it as \. to match a literal dot.
data = soup.find_all("a", href=re.compile(r"^http://example\.com/"))
print(data)
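The two subtler points in the comments above, why `.string` returns `None` for a paragraph with mixed content and why the dot in the `href` pattern should be escaped, can be checked directly. This is a sketch on a small made-up snippet; the `http://exampleXcom/fake` link is a hypothetical decoy added only to show what an unescaped `.` lets through.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical snippet: one example.com link and one decoy whose host is not example.com
html_doc = """
<p class="story">Two links:
<a href="http://example.com/elsie" id="link1">Elsie</a> and
<a href="http://exampleXcom/fake" id="link2">Fake</a>.</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

p = soup.find("p", {"class": "story"})
# The <p> has several children (text nodes and <a> tags), so .string is None
print(p.string)
# get_text() joins all the text inside the <p>, including the link text
print(p.get_text())

# Unescaped "." matches any character, so the decoy link matches too
loose = soup.find_all("a", href=re.compile(r"^http://example.com/"))
# Escaped "\." matches only a literal dot
strict = soup.find_all("a", href=re.compile(r"^http://example\.com/"))

print([a["id"] for a in loose])   # ['link1', 'link2']
print([a["id"] for a in strict])  # ['link1']
```

On the tutorial's own `html_doc` both patterns happen to return the same three links, which is why the unescaped version looks correct; the difference only shows up when a URL like the decoy is present.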


Reposted from www.cnblogs.com/reblue520/p/11200073.html
