Beginner's Web Scraping Notes 5---Basic Elements of the BeautifulSoup Library - 代码天地


Category: Programming languages | Posted 2018-07-19 18:09:14

Basic Elements of the Beautiful Soup Library

Beautiful Soup is a library for parsing, traversing, and maintaining a tag tree.

<p class="title">..</p> is a Tag.
p is its Name.
class="title" is an Attribute; a tag's attributes are made up of key-value pairs.
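To make the anatomy above concrete, here is a minimal sketch that parses a one-line fragment (the fragment's text is made up for illustration) and reads back the Tag, its Name, and its Attributes.

```python
from bs4 import BeautifulSoup

# A made-up fragment: one <p> tag carrying a single class attribute
html = '<p class="title"><b>The demo</b></p>'
soup = BeautifulSoup(html, "html.parser")

p = soup.p              # the Tag object for <p>...</p>
print(p.name)           # the tag's Name: p
print(p.attrs)          # key-value pairs: {'class': ['title']} (class is multi-valued, so a list)
print(p["class"])       # read one attribute like a dict key
print(p.get("id"))      # .get() returns None for an attribute the tag does not have
```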

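Importing Beautiful Soup: from bs4 import BeautifulSoup, or import bs4
An HTML document, its tag tree, and a BeautifulSoup object are equivalent: one BeautifulSoup object holds the whole tag tree of one document.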
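Because the BeautifulSoup object stands for the whole tag tree, it can be navigated like a top-level tag and serialized back into HTML. A small sketch, using a made-up, already well-formed HTML string:

```python
from bs4 import BeautifulSoup

html = "<html><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.name)           # the root of the tree is named [document]
print(soup.body.p)         # walk down the tree: <p>Hello</p>
print(str(soup) == html)   # True here, because this input was already well-formed
print(soup.prettify())     # indented view of the same tree
```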

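from bs4 import BeautifulSoup
# parse an HTML string
soup = BeautifulSoup("<html>data</html>", "html.parser")
# parse a local file; the with-block closes it afterwards (utf-8 is an assumed encoding)
with open("D://demo.html", encoding="utf-8") as f:
    soup2 = BeautifulSoup(f, "html.parser")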
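BeautifulSoup accepts either a string or an open file-like object. So that the sketch runs anywhere, an in-memory io.StringIO stands in for the author's D://demo.html file; the markup is made up.

```python
import io
from bs4 import BeautifulSoup

markup = "<html><head><title>demo</title></head></html>"

# 1) Parse a string directly
soup_from_str = BeautifulSoup(markup, "html.parser")

# 2) Parse a file-like object; io.StringIO stands in for open("D://demo.html")
with io.StringIO(markup) as fake_file:
    soup_from_file = BeautifulSoup(fake_file, "html.parser")

# Both inputs produce the same tree
print(soup_from_str.title.string)                 # demo
print(str(soup_from_str) == str(soup_from_file))  # True
```

For a real file on disk, passing the encoding to open() explicitly avoids depending on the operating system's default.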

Parsers

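A parser is chosen by passing its argument as the second parameter, e.g. BeautifulSoup(markup, 'lxml').

Parser               Argument        Requirement
bs4's HTML parser    'html.parser'   the bs4 library (uses Python's built-in html.parser)
lxml's HTML parser   'lxml'          pip install lxml
lxml's XML parser    'xml'           pip install lxml
html5lib's parser    'html5lib'      pip install html5lib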
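The parsers agree on well-formed HTML but can build different trees from broken markup. The sketch below feeds the same invalid fragment to each HTML parser and skips any that are not installed (bs4 raises FeatureNotFound in that case). The fragment "<a></p>" is the one the Beautiful Soup documentation uses for this comparison; only the html.parser result is asserted here.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately broken markup: a stray </p> inside an <a> that is never closed
broken = "<a></p>"

results = {}
for parser in ["html.parser", "lxml", "html5lib"]:
    try:
        results[parser] = str(BeautifulSoup(broken, parser))
    except FeatureNotFound:
        # lxml and html5lib are optional installs; record the ones that are missing
        results[parser] = "(not installed)"

for parser, output in results.items():
    print(f"{parser:12} {output}")
```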

Basic elements of the BeautifulSoup class

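Tag: a tag, the most basic unit for organizing information, delimited by <> and </>
Name: the tag's name, e.g. p; read it with .name
Attributes: the tag's attributes, e.g. class; read them with .attrs
NavigableString: the non-attribute string inside a tag, i.e. its text content; read it with .string
Comment: the comment part of a string inside a tag; a special type of NavigableString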
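Name and Attributes are plain Python values on a Tag (a string and a dictionary), while Tag, NavigableString and Comment are classes. A short sketch checking how those classes relate, plus one .string edge case worth knowing (the markup is made up):

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# The element classes are related by inheritance
print(issubclass(BeautifulSoup, Tag))        # True: the whole document is a special Tag
print(issubclass(NavigableString, str))      # True: tag text behaves like a Python str
print(issubclass(Comment, NavigableString))  # True: a comment is a special NavigableString

# .string only gives text when a tag has a single child
soup = BeautifulSoup("<p>one</p><div><b>x</b><i>y</i></div>", "html.parser")
print(soup.p.string)    # one
print(soup.div.string)  # None, because <div> has two children
```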

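from bs4 import BeautifulSoup
# demo: a string holding the demo page's HTML (assumed to be defined already)
soup = BeautifulSoup(demo, "html.parser")
soup.title    # the first <title> tag
tag = soup.a  # the first <a> tag
tag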
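Note that soup.<tagname> returns only the first matching tag, or None if there is none; find_all() returns every match. A minimal sketch with made-up markup (the example.com URLs are placeholders):

```python
from bs4 import BeautifulSoup

html = """
<p class="course">
  <a href="https://example.com/one" id="link1">One</a>
  <a href="https://example.com/two" id="link2">Two</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.a["id"])             # link1: soup.a is only the FIRST <a>
print(len(soup.find_all("a")))  # 2: find_all() returns every <a>
print(soup.table)               # None: the document has no <table>
```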

Getting a tag's name, attributes, and strings

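from bs4 import BeautifulSoup
soup = BeautifulSoup(demo, "html.parser")  # demo: the demo page's HTML text, as above
# Name
soup.a.name                 # 'a'
soup.a.parent.name          # name of the tag that contains the first <a>
soup.a.parent.parent.name   # name of the tag one level further up
# Attributes
tag = soup.a
tag.attrs                   # a dictionary of the tag's attributes
tag.attrs['class']          # class is multi-valued, so this is a list
tag.attrs['href']           # the link target
type(tag.attrs)             # dict
type(tag)                   # bs4.element.Tag
# NavigableString
soup.a.string               # the text inside the first <a>
soup.p.string               # None if the tag has more than one child
type(soup.p.string)         # bs4.element.NavigableString
# Comment: check the type to filter out comment text
newsoup = BeautifulSoup("<b><!--This is a comment--></b><p>This is not a comment</p>", "html.parser")
newsoup.b.string            # 'This is a comment' (the <!-- --> markers are not included)
type(newsoup.b.string)      # bs4.element.Comment
type(newsoup.p.string)      # bs4.element.NavigableString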
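The type check mentioned in the Comment step can be used to separate real text from comments. A minimal sketch over the same newsoup fragment: it walks every node with .descendants and tests for Comment first, since a Comment is also a NavigableString.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

newsoup = BeautifulSoup(
    "<b><!--This is a comment--></b><p>This is not a comment</p>", "html.parser"
)

texts, comments = [], []
for node in newsoup.descendants:         # every tag and string, in document order
    if isinstance(node, Comment):         # check Comment first: it is also a NavigableString
        comments.append(str(node))
    elif isinstance(node, NavigableString):
        texts.append(str(node))

print(texts)     # ['This is not a comment']
print(comments)  # ['This is a comment']
```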


Reposted from blog.csdn.net/paleyellow/article/details/81079346
