Python BeautifulSoup HTML parsing - 代码天地


Category: Other, 2018-08-09 14:06:24

* Signatures of BeautifulSoup's .find() and .findAll()

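findAll(tag, attributes, recursive, text, limit, keywords)
find(tag, attributes, recursive, text, keywords)

These prototypes are simplified. In bs4 the parameters are named name, attrs, recursive (default True), string (called text before Beautiful Soup 4.4.0), limit and **kwargs, where the keyword arguments filter on attributes such as id="text". find() has no limit because it returns only the first match, or None. find_all() is the preferred spelling; findAll() is an alias kept from Beautiful Soup 3.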
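The difference between the two calls can be seen on a tiny document. This is a minimal sketch that assumes a made-up inline HTML string instead of a real page, and uses the snake_case name find_all (findAll is its older alias).

```python
from bs4 import BeautifulSoup

# Made-up document, used only to illustrate the parameters
html = """
<div id="outer">
  <p>first</p>
  <div><p>nested</p></div>
  <p>last</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
outer = soup.find("div", id="outer")   # keyword argument filters on an attribute

# find_all() returns a list (ResultSet) of every match among all descendants
all_p = outer.find_all("p")
print([p.get_text() for p in all_p])                                 # ['first', 'nested', 'last']

# limit caps the number of results
print([p.get_text() for p in outer.find_all("p", limit=2)])          # ['first', 'nested']

# recursive=False only examines the direct children of outer
print([p.get_text() for p in outer.find_all("p", recursive=False)])  # ['first', 'last']

# find() returns the first match, or None when nothing matches
print(outer.find("p").get_text())   # first
print(outer.find("table"))          # None
```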

* Get span.green (every <span> whose class includes "green")

bsObj.findAll("span", {"class":"green"})

#!/usr/local/bin/python
# -*- coding: UTF-8 -*-
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

def getBsObj(url):
    """Download url and return a BeautifulSoup object, or None on failure."""
    try:
        # data=None means a plain GET request; give up after 3 seconds
        html = urlopen(url, None, 3)
    except (HTTPError, URLError) as e:
        print(e)
        return None
    # Parse with Python's built-in "html.parser"
    return BeautifulSoup(html.read(), "html.parser")

bsObj = getBsObj("http://www.pythonscraping.com/pages/warandpeace.html")
# getBsObj() returns None on a network error, so check before searching
if bsObj is not None:
    # Every <span class="green"> on the page
    nameList = bsObj.findAll("span", {"class":"green"})
    for name in nameList:
        print(name.get_text())
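The script above needs network access to pythonscraping.com. The sketch below runs offline against a small made-up HTML fragment (an assumption, not the real page) and shows two other ways to express the same span.green search: the class_ keyword argument, and a CSS selector through select(), which bs4 delegates to the soupsieve package.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real page, so the example runs without a network
html = """
<div id="text">
  <span class="red">Well, Prince,</span>
  <span class="green">Anna Pavlovna</span> said
  <span class="green big">the prince</span>
</div>
"""
bsObj = BeautifulSoup(html, "html.parser")

# Three equivalent searches for <span> tags that carry the class "green"
by_dict = bsObj.find_all("span", {"class": "green"})
by_kwarg = bsObj.find_all("span", class_="green")   # class_ because class is a Python keyword
by_css = bsObj.select("span.green")                 # CSS selector (backed by soupsieve)

# class is multi-valued, so class="green big" also matches "green"
names = [tag.get_text() for tag in by_dict]
print(names)   # ['Anna Pavlovna', 'the prince']
```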

* Get all headings h1 through h6

bsObj.findAll(["h1","h2","h3","h4","h5","h6"])

// JavaScript: build a string that wraps each comma-separated element in double quotes

function quote(s) {
    return "\"" + s.split(",").join("\",\"") + "\"";
}
var s = "h1,h2,h3,h4,h5,h6"
console.log(quote(s))
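In Python itself no quoting helper is needed: the list of names can be produced with str.split() or a comprehension, and find_all() also accepts a compiled regular expression as the name filter. A minimal sketch on a made-up fragment:

```python
import re
from bs4 import BeautifulSoup

# Made-up fragment; <header> shows why the regex below is anchored
html = "<header><h1>Title</h1></header><p>intro</p><h2>Part</h2><h3>Sub</h3>"
soup = BeautifulSoup(html, "html.parser")

# Build the tag-name list directly instead of quoting a string by hand
heading_names = "h1,h2,h3,h4,h5,h6".split(",")
# equivalently: heading_names = [f"h{i}" for i in range(1, 7)]
by_list = soup.find_all(heading_names)

# bs4 applies a regex filter with .search(), so anchor it with ^ and $
by_regex = soup.find_all(re.compile(r"^h[1-6]$"))

print([t.name for t in by_list])   # ['h1', 'h2', 'h3']
```

Both forms return the headings in document order.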

* Get span.green and span.red

bsObj.findAll("span", {"class":["green", "red"]})
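A list of class values matches a tag that carries any one of them, and because class is a multi-valued attribute a tag such as class="red big" still matches "red". The sketch below, on made-up markup, compares this with the CSS selector group "span.green, span.red".

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration
html = """
<span class="green">a</span>
<span class="red">b</span>
<span class="blue">c</span>
<span class="red big">d</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches a <span> whose classes include "green" OR "red"
both = soup.find_all("span", {"class": ["green", "red"]})
print([t.get_text() for t in both])   # ['a', 'b', 'd']

# CSS equivalent: a comma-separated selector group
css = soup.select("span.green, span.red")
print([t.get_text() for t in css])    # ['a', 'b', 'd']
```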

* Count the strings in the page whose text is exactly "the prince" (an exact match, not a substring search)

nameList = bsObj.findAll(text="the prince")
print(len(nameList))
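The text/string filter compares against the whole content of each string node, and the results are NavigableString objects rather than tags (use .parent to reach the enclosing tag). For a substring search, pass a compiled regex. A sketch on made-up markup, using the newer argument name string:

```python
import re
from bs4 import BeautifulSoup

# Made-up markup for illustration
html = """
<p><span>the prince</span> said to <span>the prince's</span> guest</p>
<p>Hello, the prince</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match against each string node's full text
exact = soup.find_all(string="the prince")
print(len(exact))               # 1
print(exact[0].parent.name)     # span

# A regex is applied with .search(), i.e. a substring match
partial = soup.find_all(string=re.compile("the prince"))
print(len(partial))             # 3
```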

* Find the element with id="text" (#text)

allText = bsObj.find(id="text")
print(allText.get_text())
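find() returns None when nothing matches, so calling .get_text() on the result raises AttributeError on a page without #text. A minimal sketch of a guard; text_of_id() is a hypothetical helper, not part of bs4:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div id="text">Some <b>bold</b> words</div>', "html.parser")

def text_of_id(soup, element_id):
    """Hypothetical helper: text of the element with this id, or None if absent."""
    tag = soup.find(id=element_id)
    # find() gives None when no element matches; avoid calling .get_text() on None
    return tag.get_text() if tag is not None else None

print(text_of_id(soup, "text"))      # Some bold words
print(text_of_id(soup, "missing"))   # None
```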

* Find div#text (a <div> with id="text")

allText = bsObj.find("div", {"id":"text"})
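Adding the tag name matters only when another element shares the id. Ids should be unique, but real pages do not always follow that, so the made-up markup below gives a <p> the same id. The dict form, the keyword form and the CSS selector all return the same <div> object.

```python
from bs4 import BeautifulSoup

# Made-up markup with a duplicated id, for illustration only
html = '<p id="text">a paragraph</p><div id="text">the div</div>'
soup = BeautifulSoup(html, "html.parser")

# Without a tag name, find() returns the first element with that id
print(soup.find(id="text").name)   # p

# With the tag name, the search is restricted to <div>
a = soup.find("div", {"id": "text"})
b = soup.find("div", id="text")
c = soup.select_one("div#text")
print(a.get_text(), b is a, c is a)   # the div True True
```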

* Find the first span.red that is a direct child of div#text (the first match for div#text > span.red)

red = bsObj.find("div", {"id":"text"}).find("span", {"class":"red"}, recursive=False)
print(red.get_text())
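With recursive=False, find() looks only at the div's direct children, which corresponds to the CSS child combinator ">". It is not the same as :first-child, which also requires the span to be the parent's first element child. A sketch on made-up markup:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration
html = """
<div id="text">
  <p><span class="red">nested red</span></p>
  <span class="green">green</span>
  <span class="red">direct red</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"id": "text"})

# Default recursive=True: the first span.red at any depth inside the div
deep = div.find("span", {"class": "red"})
# recursive=False: only the div's direct children are examined
direct = div.find("span", {"class": "red"}, recursive=False)
# CSS equivalent of the recursive=False search
css = soup.select_one("div#text > span.red")
# :first-child is stricter; here the div's first element child is the <p>
first_child = soup.select_one("div#text > span.red:first-child")

print(deep.get_text(), "|", direct.get_text(), "|", css.get_text())
# nested red | direct red | direct red
print(first_child)   # None
```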

Reposted from blog.csdn.net/fareast_mzh/article/details/81463890