Beautiful Soup module
Features
In short: it parses HTML/XML documents and extracts data from them.
Why learn this module? When I was just starting out with Python, I kept hearing how important it is, and I personally find it fun (it's the only way I know to scrape pictures of pretty girls, ha ha ha). I followed a teacher's course on Bilibili (sorry, I can't credit them by name) and took these notes while typing along with the code. It felt useful, so I wanted to write the notes down.
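To make "parse and extract" concrete, here is a minimal sketch. It assumes a small hypothetical HTML string (not the course's demo page) so that it runs without network access. Parsing turns the markup into a tree, and extracting reads individual values out of that tree.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML so the sketch runs without network access
html = """<html><head><title>Demo page</title></head>
<body><p class="course">Learn Python:
<a href="http://example.com/basic" class="py1" id="link1">Basic Python</a></p>
</body></html>"""

# Parse: turn the markup into a tree of Python objects using the built-in html.parser
soup = BeautifulSoup(html, "html.parser")

# Extract: pull individual pieces of data out of the tree
title = soup.title.string        # text of <title>
link = soup.a                    # first <a> tag in the document
link_text = link.string          # text inside <a>
link_href = link["href"]         # shorthand for link.attrs["href"]

print(title)                     # Demo page
print(link_text, link_href)      # Basic Python http://example.com/basic
```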
Basic elements

| Basic element | Explanation |
| --- | --- |
| Tag | The most basic unit of information organization; `<>` and `</>` mark its beginning and end |
| Name | The tag's name; the name of `<p>…</p>` is `'p'`. Format: `<tag>.name` |
| Attributes | The tag's attributes, organized as a dictionary of keys and values. Format: `<tag>.attrs` |
| NavigableString | The non-attribute string inside a tag, i.e. the text between `<>…</>`. Format: `<tag>.string` |
| Comment | The comment part of a tag's string (the text inside `<!-- -->`); a special kind of string |
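The five basic elements can be seen side by side in the sketch below. The HTML snippets are hypothetical, loosely modeled on the course demo page. The point is that `.string` returns a `Comment` object when the tag contains only a comment, and since `Comment` is a subclass of `NavigableString`, you have to check for `Comment` explicitly to tell the two apart.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical snippets, loosely modeled on the course demo page
soup = BeautifulSoup('<p class="title"><b>The demo page</b></p>', "html.parser")
comment_soup = BeautifulSoup("<b><!--this is a comment--></b>", "html.parser")

tag = soup.p                     # Tag: the first <p> in the document
name = tag.name                  # Name: "p"
attrs = tag.attrs                # Attributes: a dict; "class" is multi-valued, so its value is a list
text = tag.string                # NavigableString: <p> has a single child <b>, so .string reaches through it
comment = comment_soup.b.string  # Comment: the text inside <!-- -->, without the markers

print(isinstance(tag, Tag), name)          # True p
print(attrs)                               # a dict such as {'class': ['title']}
print(text, type(text).__name__)           # The demo page NavigableString
print(comment, type(comment).__name__)     # this is a comment Comment

# Comment is a subclass of NavigableString, so test for Comment explicitly
# when comments must be told apart from ordinary text
print(isinstance(comment, NavigableString), isinstance(comment, Comment))  # True True
```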
Downward traversal of the tag tree

| Attribute | Explanation |
| --- | --- |
| `.contents` | List of child nodes: all of `<tag>`'s child nodes collected in a list |
| `.children` | Iterator over child nodes, similar to `.contents`; used in a `for` loop to iterate over the child nodes |
| `.descendants` | Iterator over descendant nodes, including all descendants; used in a `for` loop to traverse them |
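A sketch of the three downward-traversal attributes, assuming a small hypothetical `<body>`. The newlines between tags become string children, which is why the sketch filters with `isinstance(node, Tag)` when only tags are wanted.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical snippet: the newlines between tags become NavigableString children
html = """<body>
<p class="title"><b>Title</b></p>
<p class="course">Courses: <a id="link1">Basic</a> and <a id="link2">Advanced</a>.</p>
</body>"""
body = BeautifulSoup(html, "html.parser").body

contents = body.contents         # a plain list, so len() and indexing work
print(len(contents))             # 5: "\n", <p>, "\n", <p>, "\n"

# .children yields the same nodes as .contents, but one at a time (an iterator)
child_tags = [c.name for c in body.children if isinstance(c, Tag)]
print(child_tags)                # ['p', 'p']

# .descendants walks the whole subtree depth-first: children, grandchildren, ...
descendant_tags = [d.name for d in body.descendants if isinstance(d, Tag)]
print(descendant_tags)           # ['p', 'b', 'p', 'a', 'a']
```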
Upward traversal of the tag tree

| Attribute | Explanation |
| --- | --- |
| `.parent` | The node's parent tag |
| `.parents` | Iterator over the node's ancestor tags; used in a `for` loop to traverse them |
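A sketch of upward traversal on a hypothetical snippet. In Beautiful Soup 4 the `.parents` iterator stops after the top-level `BeautifulSoup` object (whose `.name` is `'[document]'`) rather than yielding `None`, and that object's own `.parent` is `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with a three-level nesting around <a>
html = "<html><body><p><a href='#'>link</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

a = soup.a
print(a.parent.name)             # p

# .parents walks upward until the top; the last ancestor is the
# BeautifulSoup object itself, whose name is "[document]"
ancestor_names = [parent.name for parent in a.parents]
print(ancestor_names)            # ['p', 'body', 'html', '[document]']

print(soup.parent)               # None: the document root has no parent
```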
Parallel traversal of the tag tree (sibling nodes share the same parent)

| Attribute | Explanation |
| --- | --- |
| `.next_sibling` | The next sibling node, in HTML text order |
| `.previous_sibling` | The previous sibling node, in HTML text order |
| `.next_siblings` | Iterator over all following sibling nodes, in HTML text order |
| `.previous_siblings` | Iterator over all preceding sibling nodes, starting from the nearest and moving backward |
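A sketch of sibling traversal, assuming a hypothetical `<p>` that mirrors the demo page's two links. It shows why `.next_sibling` of the first `<a>` is not a tag: the text `" and "` between the links is itself a sibling node.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical snippet mirroring the demo page: two links separated by the text " and "
html = '<p>Courses: <a id="link1">Basic</a> and <a id="link2">Advanced</a>.</p>'
soup = BeautifulSoup(html, "html.parser")

first = soup.a
print(repr(str(first.next_sibling)))        # ' and ' (a string, not a tag)
second = first.next_sibling.next_sibling    # skip over the string to reach the next <a>
print(second["id"])                         # link2
print(repr(str(first.previous_sibling)))    # 'Courses: '

# Keep only Tag siblings when the text between tags is not wanted
following_tags = [s["id"] for s in first.next_siblings if isinstance(s, Tag)]
preceding_tags = [s["id"] for s in second.previous_siblings if isinstance(s, Tag)]
print(following_tags, preceding_tags)       # ['link2'] ['link1']
```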
Examples
I have combined the teacher's code into one script, and every statement has a basic comment, so friends learning Python who don't want to watch the videos can simply type the code along. Note that almost every line here is an output statement, so take it slowly: run one print statement at a time and compare it with its output, analyzing step by step. When you finish one block of knowledge, remember to comment it out before moving on to the next one; learning this way felt great ~~~~ Just don't comment out the setup lines at the top (the imports, the request, and building `soup`), since every block depends on them.
The script is organized into four exercise blocks:
Exercise block 1: the basic elements
Exercise block 2: downward traversal of the tag tree
Exercise block 3: upward traversal of the tag tree
Exercise block 4: parallel traversal of the tag tree
Here is the full source. Don't copy and paste it, and don't just read it: type it in line by line.
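```python
import requests
from bs4 import BeautifulSoup

url = "http://python123.io/ws/demo.html"
r = requests.get(url)
demo = r.text
soup = BeautifulSoup(demo, "html.parser")  # parse with the html.parser parser
print(demo)
print(soup.prettify())  # compare with the raw text above to see what html parsing does

# Using the five basic elements
tag = soup.a
print(tag)                   # the first <a> tag
print(tag.name)              # the tag's name
print(tag.parent.name)       # the name of <a>'s parent tag
print(tag.attrs)             # the tag's attributes, as a dict
print(tag.attrs['class'])    # prints ['py1'], the value of the class attribute
print(tag.attrs['href'])     # the value of the href attribute
print(type(tag.attrs))       # the type of the attributes: a dict
print(type(tag))             # the tag's type: bs4.element.Tag
print(tag.string)            # the non-attribute string inside <a>
print(soup.p.string)         # the string inside the first <p>
print(type(soup.p.string))   # NavigableString; .string reaches through the single <b> child of <p>, so the <b> tag itself is not printed
newsoup = BeautifulSoup("<b><!--this is a comment--></b><p>this is not a comment</p>", "html.parser")
# Comments are written as <!--comment text-->
print(newsoup)
# Compare the types of the strings in <b> and <p>
print(newsoup.b.string, type(newsoup.b.string))
print(newsoup.p.string, type(newsoup.p.string))

# Downward traversal of the tag tree
tag = soup.body
print(tag)
print(tag.contents)          # the child nodes of <body>; .contents returns a list
print(len(tag.contents))     # the number of child nodes; since it is a list, you can index into it
print(tag.contents[1])       # the item at index 1 (the first <p>); index 0 is the "\n" string before it
for child in tag.children:
    print(child)             # iterate over all child nodes
for child in tag.descendants:
    print(child)             # iterate over all descendant nodes

# Upward traversal of the tag tree
tag = soup.title
print(tag.parent)            # the parent of <title>
for parent in soup.a.parents:
    if parent is None:       # defensive check from the course; in bs4, .parents stops before None
        print(parent)
    else:
        print(parent.name)   # print the names of all ancestors of <a>

# Parallel traversal of the tag tree
tag = soup.a
print(tag.next_sibling)      # note that the output is not a tag
print(tag.next_sibling.next_sibling)  # the sibling after that: the second <a>
print(tag.previous_sibling)  # the sibling just before <a>
for sibling in tag.next_siblings:
    print(sibling)           # iterate over the following siblings
for sibling in tag.previous_siblings:
    print(sibling)           # iterate over the preceding siblings
```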
I'm recording my learning journey as I go, and I hope these notes help you too ~~~