Python in Practice (3): Using BeautifulSoup, a Python Parser


Other 2018-05-11 05:11:26

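I. Installation

1. Go to the Python installation directory. If the Scripts folder contains pip.exe, open cmd and run pip install beautifulsoup4 to start the installation.

2. Verify that the installation succeeded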

# coding: utf-8

import bs4
print(bs4)   # prints the bs4 module info, e.g. <module 'bs4' from 'C:\Python27\lib\site-packages\bs4\__init__.pyc'>

If the bs4 module information is printed, beautifulsoup was installed successfully.
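The same check can be done without triggering an ImportError when the package is absent. The following is a minimal sketch, assuming Python 3; the helper name `is_installed` is hypothetical and not part of bs4 or the original article.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("bs4"):
    import bs4
    # bs4 exposes its version string as bs4.__version__
    print("bs4 version:", bs4.__version__)
else:
    print("bs4 is missing; run: pip install beautifulsoup4")
```

Note that the package is installed as beautifulsoup4 but imported as bs4, so the check has to use the import name.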

II. Parsing a web page

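from bs4 import BeautifulSoup
import re

# html_doc holds the HTML markup to be parsed
html_doc = """

"""

# Create the BeautifulSoup object.
# Arguments: HTML content, parser name, encoding of the input.
# from_encoding only applies to byte-string input; for a str it is ignored with a warning.
soup = BeautifulSoup(html_doc, 'html.parser', from_encoding='utf-8')

# Get all links
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# Get the link with a specific URL
link = soup.find('a', href='http://baidu.com')
print(link.name, link['href'], link.get_text())

# Filter with a regular expression: the first <a> tag whose href matches "titl"
link = soup.find('a', href=re.compile(r"titl"))
print(link.name, link['href'], link.get_text())

# Get the text of a paragraph: the <p> tag whose class is "title"
p_node = soup.find('p', class_='title')
print(p_node.name, p_node.get_text())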
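Because the snippet above leaves html_doc empty, here is a minimal self-contained sketch of the same lookups run against a small sample page. The sample markup, the example.com URLs, the id values, and the helper name `extract_links` are assumptions made up for illustration; they are not the author's original input. Only http://baidu.com and the class name "title" come from the text.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup; the article's own html_doc is not shown.
html_doc = """
<html><head><title>Sample page</title></head>
<body>
<p class="title"><b>Sample title</b></p>
<p class="story">
<a href="http://baidu.com" class="site" id="link1">Baidu</a>
<a href="http://example.com/title-page" class="site" id="link2">Title page</a>
<a href="http://example.com/other" class="site" id="link3">Other</a>
</p>
</body></html>
"""

# html_doc is already a str, so no from_encoding argument is passed.
soup = BeautifulSoup(html_doc, "html.parser")


def extract_links(soup):
    """Return (tag name, href, text) for every <a> element in the document."""
    return [(a.name, a["href"], a.get_text()) for a in soup.find_all("a")]


all_links = extract_links(soup)

# Exact match on the href attribute.
baidu = soup.find("a", href="http://baidu.com")

# Regex match: the first <a> whose href contains "titl".
regex_link = soup.find("a", href=re.compile(r"titl"))

# class_ has a trailing underscore because class is a Python keyword.
p_node = soup.find("p", class_="title")

# find() returns None when nothing matches, so check before indexing.
missing = soup.find("a", href="http://no-such-link.invalid")

for name, href, text in all_links:
    print(name, href, text)
print(baidu.get_text(), regex_link["href"], p_node.get_text())
print("missing link found:", missing is not None)
```

Since find() returns None for a miss, indexing the result directly (as in link['href']) raises a TypeError when the page does not contain the expected tag; checking for None first avoids that.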


Reposted from blog.csdn.net/daybreak1209/article/details/60869520
