Parsing HTML pages with the Python pyquery library - 代码天地


Other | 2019-01-25 11:26:20

Copyright notice: notes on everyday knowledge points. https://blog.csdn.net/cz9025/article/details/85302724

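pyquery works like a Python version of jQuery: built on lxml, it parses HTML and lets you query it with jQuery-style CSS selectors and methods.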

Some of its methods are shown in the code below:


from pyquery import PyQuery

# Accepts an HTML string, an HTML file (filename=...), or a URL (url=...)
"""
Usage:
    PyQuery("<html><title>hello</title></html>")
    PyQuery(filename=html_file)
    PyQuery(url='http://www.baidu.com')
"""
pq = PyQuery(url='https://news.baidu.com/guonei')

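# When several elements match, html() returns only the first element's inner HTML,
# whereas text() returns the text of all matched elements joined by spaces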

# filter() narrows the matched <ul> elements down to those with class "ulist"
# print(pq('ul').filter('.ulist').html())


# Select elements directly by class or id (here: id="col_focus")
# print(pq('#col_focus').html())


# attr() reads an attribute value (here: the href of the first <a>)
# print(pq('a').eq(0).attr('href'))


# Prints only the first <p>'s inner HTML
# print(pq('p').html())


# find() searches descendants
# Print the first <ul> found inside a <div>
# print(pq('div').find('ul').eq(0).html())


# Parent elements: parent() returns the direct parent, parents() all matching ancestors
# print(pq('.ulist').parent('div'))
# print(pq('.ulist').parents('div'))


# Child elements
# print(pq('.ulist').children())


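# end() steps back out of the last traversal, so this prints the ul elements
# (with their children) rather than the li elements
# print(pq('ul.ulist').find('li').end())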


# empty() removes the contents (text and child nodes) of every matched <li>
# pq('.ulist').find('li').empty()


# Print the text and href of every <a> tag under elements with class "ulist"
lista = pq('.ulist a')
print(len(lista))
for a in lista.items():
    print(a.text(), a.attr('href'))
