Python Web Scraping Study Notes 2: Using Beautiful Soup

Posted 2018-07-07 01:02:45

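Beautiful Soup provides a set of simple, Pythonic functions for navigating, searching, and modifying the parse tree. It is a toolkit that parses a document and hands you the data you want to scrape. Because the API is simple, a complete application takes very little code.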

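Beautiful Soup automatically converts input documents to Unicode and output documents to UTF-8. You do not need to think about encodings unless the document does not declare one and Beautiful Soup cannot detect it. In that case, you only need to state the original encoding.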
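
As a minimal sketch of stating the original encoding, the example below passes `from_encoding` to the `BeautifulSoup` constructor. The byte string `raw` and the choice of ISO-8859-1 are assumptions made up for illustration; they do not come from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical input: bytes encoded as ISO-8859-1 with no declared charset.
raw = "<p>café</p>".encode("iso-8859-1")

# from_encoding tells Beautiful Soup how to decode the incoming bytes.
soup = BeautifulSoup(raw, "html.parser", from_encoding="iso-8859-1")

text = soup.p.string                 # already decoded to a Unicode str
detected = soup.original_encoding    # the encoding Beautiful Soup used
utf8_bytes = soup.encode("utf-8")    # serialize the document as UTF-8 bytes

print(text)
print(detected)
print(utf8_bytes)
```

When the input is already a `str`, no decoding takes place and `from_encoding` is not needed.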

Beautiful Soup sits on top of popular Python parsers such as lxml and html5lib, so you can switch between parsing strategies or trade flexibility for speed.
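
The sketch below shows one way to compare parsers on the same malformed input. It assumes that `lxml` and `html5lib` may or may not be installed, and skips any that are missing; the sample string `markup` is hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound

markup = "<a></p>"   # deliberately malformed sample (hypothetical)

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        # The second argument names the underlying parser to use.
        results[parser] = str(BeautifulSoup(markup, parser))
    except FeatureNotFound:
        # Raised when the requested parser library is not installed.
        results[parser] = None

for parser, output in results.items():
    print(parser, "->", output)
```

`html.parser` ships with Python's standard library, so it is always available; the other two must be installed separately.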

1. soup.prettify(): returns the document indented by HTML tag nesting level, which makes it easy to read

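from bs4 import BeautifulSoup

# Parse a local HTML file; naming the parser explicitly avoids a warning,
# and the with-block closes the file afterwards
with open('test.html') as f:
    soup = BeautifulSoup(f, 'html.parser')
print(soup.prettify())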
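
The post does not show the contents of test.html, so the following is a self-contained sketch that uses a hypothetical inline document in its place. It only illustrates that `prettify()` returns a string with one tag or text node per line.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for test.html.
html = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p>Hi</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

pretty = soup.prettify()   # a str, indented by nesting depth
print(pretty)
```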

2. Getting a tag

print(type(soup.title))
print(soup.title.name)

<class 'bs4.element.Tag'>
title

3. Getting the string inside a tag

print(type(soup.title.string))
print(soup.title.string)

<class 'bs4.element.NavigableString'>
The Dormouse's story
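
One detail worth knowing about `.string`: it returns a value only when the tag has a single child. The sketch below uses a hypothetical snippet (not the post's test.html) to contrast it with `get_text()`, which concatenates all nested text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = "<title>The Dormouse's story</title><p>Once <b>upon</b> a time</p>"
soup = BeautifulSoup(html, "html.parser")

title_name = soup.title.name     # tag name as a str
title_text = soup.title.string   # NavigableString: <title> has one child
p_string = soup.p.string         # None: <p> has several children
p_text = soup.p.get_text()       # all nested text joined together

print(title_name, title_text, p_string, p_text, sep=" | ")
```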

4. Getting a comment

# Here the first <a> in test.html contains only an HTML comment
print(type(soup.a.string))
print(soup.a.string)

<class 'bs4.element.Comment'>
 Elsie

for item in soup.body.contents:   # iterate over the direct children of body
    print(item)
    print(item.name)   # name of each direct child only, not of deeper descendants
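
Because a `Comment` prints like ordinary text, it is easy to scrape commented-out content by accident. The sketch below shows one possible way to filter comments with `isinstance`, and to keep only tag children when walking `contents`. The markup and the helper `visible_text` are hypothetical, introduced here for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, Tag

# Hypothetical sample: the first link's text is wrapped in a comment.
html = (
    '<body><a href="http://example.com/elsie"><!-- Elsie --></a>'
    '<a href="http://example.com/lacie">Lacie</a></body>'
)
soup = BeautifulSoup(html, "html.parser")

def visible_text(tag):
    """Return the tag's string, or None if it is missing or a comment."""
    s = tag.string
    if s is None or isinstance(s, Comment):
        return None
    return str(s)

texts = [visible_text(a) for a in soup.find_all("a")]

# Keep only Tag children; text nodes between tags are skipped.
child_names = [c.name for c in soup.body.contents if isinstance(c, Tag)]

print(texts)
print(child_names)
```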

5. CSS selectors

Note the difference between soup.a and soup.select('a')

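a_s = soup.a         # attribute access returns only the first <a> tag
for a in a_s:        # iterating over a Tag walks its children, not the other <a> tags
    print(a)

Elsie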

a_s = soup.select('a')       # select() returns a list of every <a> tag
for a in a_s:
    print(a)
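
The difference can be sketched on a small hypothetical document with three links (the `id` values are made up for illustration). `select_one()` is included as the CSS-selector counterpart of `soup.a`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with three links.
html = (
    '<p><a id="link1">Elsie</a>'
    '<a id="link2">Lacie</a>'
    '<a id="link3">Tillie</a></p>'
)
soup = BeautifulSoup(html, "html.parser")

first = soup.a                       # a single Tag: the first <a>
all_links = soup.select("a")         # a list of all matching Tags
first_css = soup.select_one("a")     # first match of a CSS selector

ids = [a["id"] for a in all_links]
print(first["id"], first_css["id"], ids)
```

If only the first match is needed, `soup.a` or `select_one()` avoids building the full list.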

Reposted from blog.csdn.net/zhuzuwei/article/details/80870252
