Python爬虫笔记5 | BeautifulSoup

BeautifulSoup

from bs4 import BeautifulSoup
soup = BeautifulSoup("<html>data</html>", 'html.parser')
soup2 = BeautifulSoup(open("D://demo.html"), 'html.parser')
  • Beautiful Soup 的解析器
解析器 使用方法 条件
bs4 的HTML解析器 BeautifulSoup(mk,‘html.parser’) 安装bs4库
lxml的HTML解析器 BeautifulSoup(mk,‘lxml’) pip sinstall lxml
lxml的XML解析器 BeautifulSoup(mk,‘xml’) pip install lxml
html5lib 的解析器 BeautifulSoup(mk,‘html5lib’) pip install html5lib
  • Beautiful Soup 的基本元素
    在这里插入图片描述

[例一]

>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get("https://python123.io/ws/demo.html")
>>> demo = r.text
>>> demo
'<html><head><title>This is a python demo page</title></head>\r\n<body>\r\n<p class="title"><b>The demo python introduces several python courses.</b></p>\r\n<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:\r\n<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>\r\n</body></html>'
>>> soup= BeautifulSoup(demo,'html.parser')
>>> soup
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
</body></html>
>>> print(soup.prettify())
<html>
 <head>
  <title>
   This is a python demo page
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The demo python introduces several python courses.
   </b>
  </p>
  <p class="course">
   Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
   <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">
    Basic Python
   </a>
   and
   <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">
    Advanced Python
   </a>
   .
  </p>
 </body>
</html>
>>> soup.a
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
>>> soup.a.name
'a'
>>> soup.a.attrs
{'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}
>>> soup.a.attrs['href']
'http://www.icourse163.org/course/BIT-268001'
>>> soup.a.string
'Basic Python'
>>> 

发布了51 篇原创文章 · 获赞 5 · 访问量 4196

猜你喜欢

转载自blog.csdn.net/qq_43519498/article/details/93603388