Python3---BeautifulSoup - 代码天地

Category: Other | Published: 2018-06-25 05:13:28

 
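    # How a crawler sends requests: urllib (module), requests (library), scrapy and pyspider (frameworks)
    # How a crawler extracts data: regular expressions, bs4, lxml, XPath, CSS selectors
    from bs4 import BeautifulSoup

    # Argument 1: the HTML source (a string or an open file), which is parsed into a document-tree object.
    # Argument 2: the parser to use; here the lxml library parses the HTML source.
    with open('index.html', encoding='utf-8') as f:   # close the file once it has been parsed
        html = BeautifulSoup(f, 'lxml')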
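For readers without an index.html at hand, the same constructor also accepts a plain string. The following is a minimal sketch under two assumptions: the markup in SAMPLE_HTML is a made-up stand-in for the author's file, and the parser is swapped from lxml to Python's built-in 'html.parser' so that no extra package is required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the author's index.html.
SAMPLE_HTML = """
<html>
  <head><title>Demo page</title></head>
  <body><a href="/" id="result_logo">Home</a></body>
</html>
"""

# Assumption: 'html.parser' (bundled with Python) replaces 'lxml' here.
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

title_text = soup.title.string   # text inside the <title> tag
first_link = soup.a              # the first <a> tag in the document

print(title_text)
print(first_link)
```

Different parsers can build slightly different trees from malformed markup, so results on real pages may vary between 'lxml' and 'html.parser'.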
  
    # print(html.title)
    # print(html.a)   # the first <a> tag in the document

    # Get all attributes of a tag as a dict, for example:
    # {'href': '/', 'id': 'result_logo', 'onmousedown': "return c({'fm':'tab','tab':'logo'})"}
    # print(html.a.attrs)

    # Get a single attribute
    # print(html.a.get('id'))
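The attribute access shown above can be tried on a one-line document. This is a small sketch, assuming a made-up `<a>` tag and the built-in 'html.parser'; it also shows that `get` returns None for a missing attribute, whereas indexing with `link['title']` would raise KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical tag used only for illustration.
soup = BeautifulSoup(
    '<a href="/" id="result_logo" class="nav main">Home</a>',
    'html.parser',
)
link = soup.a

attrs = link.attrs            # dict holding every attribute of the tag
link_id = link.get('id')      # value of one attribute
missing = link.get('title')   # attribute is absent, so get() returns None
classes = link.get('class')   # class is multi-valued, so bs4 returns a list

print(attrs)
print(link_id, missing, classes)
```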
  
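    # To get several tags, traverse the document tree
    # print(html.head.contents)   # list of the direct children
    # print(html.head.children)   # an iterator (list_iterator object), not a list
    # for ch in html.head.children:
    #     print(ch)

    # descendants: iterates over every nested node, not only the direct children
    # print(html.head.descendants)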
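The difference between the three traversal attributes is easier to see on a tiny tree. A minimal sketch, assuming a made-up `<head>` fragment and the built-in 'html.parser':

```python
from bs4 import BeautifulSoup

# Hypothetical <head> with two direct children; <title> also holds a text node.
soup = BeautifulSoup(
    '<head><title>Demo</title><meta charset="utf-8"/></head>',
    'html.parser',
)
head = soup.head

direct = head.contents                                  # list of direct children
child_names = [child.name for child in head.children]   # same nodes, via an iterator
descendant_count = len(list(head.descendants))          # also counts the text inside <title>

print(len(direct), child_names, descendant_count)
```

Here `descendants` yields one node more than `children`, because it descends into `<title>` and reaches its text.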
  
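    # find_all
    # find
    # get_text: all text inside a tag, including the text of its child tags
    # select
    # string: only works when the tag does not mix text with other tags; otherwise it is None
    print(html.select('.two')[0].get_text())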
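To illustrate the note on `string` versus `get_text`, here is a short sketch. The two paragraphs and their class names `one` and `two` are assumptions made for the example (the author's index.html is not shown), and the parser is the built-in 'html.parser'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: .one holds only text, .two mixes text with a child tag.
soup = BeautifulSoup(
    '<p class="one">plain</p><p class="two">mixed <b>bold</b> text</p>',
    'html.parser',
)
one = soup.select('.one')[0]
two = soup.select('.two')[0]

one_string = one.string     # the single text node of the tag
two_string = two.string     # None, because the tag has several children
two_text = two.get_text()   # concatenated text, child tags included

print(one_string, two_string, two_text)
```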
  
    # print(help(html))

    # find_all: find a group of elements by tag name
    res = html.find_all('a')
    # print(res)
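As a quick comparison of `find_all` and `find`, consider the sketch below. The markup is invented for the example and the parser is the built-in 'html.parser'. `find_all` always returns a list-like result, while `find` returns the first match or None.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two links and no images.
soup = BeautifulSoup(
    '<div><a href="/a">A</a><a href="/b">B</a></div>',
    'html.parser',
)

links = soup.find_all('a')                    # every <a> tag, in document order
hrefs = [link.get('href') for link in links]
first = soup.find('a')                        # only the first <a> tag
no_image = soup.find('img')                   # nothing matches, so None
no_images = soup.find_all('img')              # nothing matches, so an empty result

print(hrefs, first, no_image, len(no_images))
```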
  
    # select: takes CSS selector syntax (which selectors are available depends on the installed bs4 version)
    res = html.select('.one')[0]
    # print(res.get_text())
    # print(res.get('class'))

    res = html.select('.two')[0]
    print(res)
    print('----', res.next_sibling)
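One thing to watch with `next_sibling` is that it often returns the whitespace between tags rather than the next element. The sketch below assumes a made-up list with classes `one`, `two` and `three` and the built-in 'html.parser'; `find_next_sibling` is used to skip the text node.

```python
from bs4 import BeautifulSoup

# Hypothetical list; each <li> is followed by a newline in the source.
HTML_DOC = """<ul>
<li class="one">first</li>
<li class="two">second</li>
<li class="three">third</li>
</ul>"""

soup = BeautifulSoup(HTML_DOC, 'html.parser')
two = soup.select('.two')[0]

raw_sibling = two.next_sibling            # the newline text node after the tag
next_item = two.find_next_sibling('li')   # the next <li> element

print(repr(raw_sibling))
print(next_item)
```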
  
    import os

    os.mkdir('abc')             # create abc in the current directory (6-7)
    os.chdir('abc')             # enter abc
    os.mkdir('123')             # create directory 123 inside abc
    os.chdir(os.path.pardir)    # go back to the parent directory
    os.mkdir('erf')
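The same directory layout can also be built without changing the working directory. The sketch below is one possible alternative, not the author's method: it uses `os.makedirs` with `exist_ok=True`, so a second run does not fail the way `os.mkdir` does when the directory already exists, and it works inside a temporary directory (an assumption made so that the example cleans up after itself).

```python
import os
import tempfile

# Assumption: a throwaway base directory replaces the author's working directory.
with tempfile.TemporaryDirectory() as base:
    # Create abc/123 in one call; intermediate directories are created as needed.
    os.makedirs(os.path.join(base, 'abc', '123'), exist_ok=True)
    os.makedirs(os.path.join(base, 'erf'), exist_ok=True)
    # Repeating the call is harmless because of exist_ok=True.
    os.makedirs(os.path.join(base, 'erf'), exist_ok=True)

    created = sorted(os.listdir(base))
    nested = os.listdir(os.path.join(base, 'abc'))

print(created, nested)
```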

Reposted from blog.csdn.net/qq_42336542/article/details/80697435
