bs4 (the BeautifulSoup module): parsing web pages

Parsing a web page fetched with the requests module:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36',
           'Host': 'movie.douban.com'}
link = 'https://movie.douban.com/top250'
r = requests.get(link, headers=headers, timeout=2)
soup = BeautifulSoup(r.text, 'lxml')  # parse the page text with the lxml parser
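Because the page comes over the network, it can help to keep the parsing step separate so it can be tried on a plain string first. The sketch below assumes a hypothetical helper `make_soup` and a hand-written stand-in for `r.text`. It uses Python's built-in `html.parser`, since `lxml` (used above) has to be installed separately.

```python
from bs4 import BeautifulSoup


def make_soup(html_text, parser="html.parser"):
    """Hypothetical helper: parse raw HTML text (pass r.text in real use).

    "html.parser" is built into Python; "lxml" is faster but must be
    installed separately (pip install lxml).
    """
    return BeautifulSoup(html_text, parser)


# Hypothetical stand-in for r.text, so the parsing step can be tried offline.
sample = '<html><head><title>豆瓣电影 Top 250</title></head><body></body></html>'
soup = make_soup(sample)
print(soup.title.get_text())  # 豆瓣电影 Top 250
```

Swapping in `make_soup(r.text, "lxml")` reproduces the call in the listing above.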

BeautifulSoup has two commonly used search methods:

The first is find(), which returns the first element that matches the given criteria (or None if nothing matches).

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36',
           'Host': 'movie.douban.com'}
link = 'https://movie.douban.com/top250'
r = requests.get(link, headers=headers, timeout=2)
soup = BeautifulSoup(r.text, 'lxml')  # parse the page text with the lxml parser
find_result = soup.find('div', class_='hd')  # first <div class="hd"> only
print(find_result)

Here is the output:

C:\python3.5\python.exe C:/Users/MR/Desktop/test.py
<div class="hd">
<a class="" href="https://movie.douban.com/subject/1292052/">
<span class="title">肖申克的救赎</span>
<span class="title"> / The Shawshank Redemption</span>
<span class="other"> / 月黑高飞(港)  /  刺激1995(台)</span>
</a>
<span class="playable">[可播放]</span>
</div>
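The Tag returned by `find()` can itself be searched, which is how individual fields are usually pulled out of a block like the one above. A minimal sketch, using a hard-coded copy of that output instead of a live request (the snippet is an assumption for offline testing):

```python
from bs4 import BeautifulSoup

# Hard-coded copy of the <div class="hd"> block printed above (offline stand-in).
html = """
<div class="hd">
<a class="" href="https://movie.douban.com/subject/1292052/">
<span class="title">肖申克的救赎</span>
<span class="title"> / The Shawshank Redemption</span>
</a>
<span class="playable">[可播放]</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

hd = soup.find("div", class_="hd")  # first matching Tag
title = hd.find("span", class_="title").get_text(strip=True)  # first span.title
link = hd.a["href"]  # attribute lookup on the first <a> inside the div
print(title, link)

# find() returns None when nothing matches, so check before chaining calls.
missing = soup.find("div", class_="no-such-class")
print(missing is None)  # True
```

Chaining `.get_text()` directly onto a `find()` that returned None raises `AttributeError`, which is the usual symptom when a site changes its markup.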

The second is find_all(), which returns every matching element as a list (strictly, a ResultSet, which is a subclass of list).

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36',
           'Host': 'movie.douban.com'}
link = 'https://movie.douban.com/top250'
r = requests.get(link, headers=headers, timeout=2)
soup = BeautifulSoup(r.text, 'lxml')  # parse the page text with the lxml parser
find_result = soup.find_all('div', class_='hd')  # every <div class="hd"> on the page
print(find_result)

Here is the output. Note that the return value is a list:

C:\python3.5\python.exe C:/Users/MR/Desktop/test.py
[<div class="hd">
<a class="" href="https://movie.douban.com/subject/1292052/">
<span class="title">肖申克的救赎</span>
<span class="title"> / The Shawshank Redemption</span>
<span class="other"> / 月黑高飞(港)  /  刺激1995(台)</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1291546/">
<span class="title">霸王别姬</span>
<span class="other"> / 再见,我的妾  /  Farewell My Concubine</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1295644/">
<span class="title">这个杀手不太冷</span>
<span class="title"> / Léon</span>
<span class="other"> / 杀手莱昂  /  终极追杀令(台)</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1292720/">
<span class="title">阿甘正传</span>
<span class="title"> / Forrest Gump</span>
<span class="other"> / 福雷斯特·冈普</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1292063/">
<span class="title">美丽人生</span>
<span class="title"> / La vita è bella</span>
<span class="other"> / 一个快乐的传说(港)  /  Life Is Beautiful</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1292722/">
<span class="title">泰坦尼克号</span>
<span class="title"> / Titanic</span>
<span class="other"> / 铁达尼号(港 / 台)</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1291561/">
<span class="title">千与千寻</span>
<span class="title"> / 千と千尋の神隠し</span>
<span class="other"> / 神隐少女(台)  /  Spirited Away</span>
</a>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1295124/">
<span class="title">辛德勒的名单</span>
<span class="title"> / Schindler's List</span>
<span class="other"> / 舒特拉的名单(港)  /  辛德勒名单</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/3541415/">
<span class="title">盗梦空间</span>
<span class="title"> / Inception</span>
<span class="other"> / 潜行凶间(港)  /  全面启动(台)</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/2131459/">
<span class="title">机器人总动员</span>
<span class="title"> / WALL·E</span>
<span class="other"> / 瓦力(台)  /  太空奇兵·威E(港)</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/3011091/">
<span class="title">忠犬八公的故事</span>
<span class="title"> / Hachi: A Dog's Tale</span>
<span class="other"> / 忠犬小八(台)  /  秋田犬八千(港)</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/3793023/">
<span class="title">三傻大闹宝莱坞</span>
<span class="title"> / 3 Idiots</span>
<span class="other"> / 三个傻瓜(台)  /  作死不离3兄弟(港)</span>
</a>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1292001/">
<span class="title">海上钢琴师</span>
<span class="title"> / La leggenda del pianista sull'oceano</span>
<span class="other"> / 声光伴我飞(港)  /  一九零零的传奇</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1291549/">
<span class="title">放牛班的春天</span>
<span class="title"> / Les choristes</span>
<span class="other"> / 歌声伴我心(港)  /  唱诗班男孩</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1292213/">
<span class="title">大话西游之大圣娶亲</span>
<span class="title"> / 西遊記大結局之仙履奇緣</span>
<span class="other"> / 西游记完结篇仙履奇缘  /  齐天大圣西游记</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1292064/">
<span class="title">楚门的世界</span>
<span class="title"> / The Truman Show</span>
<span class="other"> / 真人Show(港)  /  真人戏</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1291841/">
<span class="title">教父</span>
<span class="title"> / The Godfather</span>
<span class="other"> / Mario Puzo's The Godfather</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1889243/">
<span class="title">星际穿越</span>
<span class="title"> / Interstellar</span>
<span class="other"> / 星际启示录(港)  /  星际效应(台)</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1291560/">
<span class="title">龙猫</span>
<span class="title"> / となりのトトロ</span>
<span class="other"> / 邻居托托罗  /  邻家的豆豆龙</span>
</a>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/5912992/">
<span class="title">熔炉</span>
<span class="title"> / 도가니</span>
<span class="other"> / 无声呐喊(港)  /  漩涡</span>
</a>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1307914/">
<span class="title">无间道</span>
<span class="title"> / 無間道</span>
<span class="other"> / Infernal Affairs  /  Mou gaan dou</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1849031/">
<span class="title">当幸福来敲门</span>
<span class="title"> / The Pursuit of Happyness</span>
<span class="other"> / 寻找快乐的故事(港)  /  追求快乐</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/6786002/">
<span class="title">触不可及</span>
<span class="title"> / Intouchables</span>
<span class="other"> / 闪亮人生(港)  /  逆转人生(台)</span>
</a>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/3319755/">
<span class="title">怦然心动</span>
<span class="title"> / Flipped</span>
<span class="other"> / 萌动青春  /  青春萌动</span>
</a>
<span class="playable">[可播放]</span>
</div>, <div class="hd">
<a class="" href="https://movie.douban.com/subject/1300267/">
<span class="title">乱世佳人</span>
<span class="title"> / Gone with the Wind</span>
<span class="other"> / 飘</span>
</a>
<span class="playable">[可播放]</span>
</div>]
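In practice the list is usually looped over to extract the fields you need rather than printed raw. A sketch under the same assumption as before (a hard-coded two-entry snippet stands in for the live page):

```python
from bs4 import BeautifulSoup

# Hard-coded two-entry snippet shaped like the output above (offline stand-in).
html = """
<div class="hd"><a href="https://movie.douban.com/subject/1292052/">
<span class="title">肖申克的救赎</span><span class="title"> / The Shawshank Redemption</span></a></div>
<div class="hd"><a href="https://movie.douban.com/subject/1291546/">
<span class="title">霸王别姬</span></a></div>
"""
soup = BeautifulSoup(html, "html.parser")

results = soup.find_all("div", class_="hd")  # ResultSet, a subclass of list
print(len(results))  # 2

movies = []
for div in results:
    # The first span.title holds the Chinese title; a second one, if present,
    # holds the original-language title.
    title = div.find("span", class_="title").get_text(strip=True)
    movies.append((title, div.a["href"]))

for title, href in movies:
    print(title, href)

# limit= stops the search after that many matches; find() behaves like
# find_all(..., limit=1) except it returns the element itself (or None).
first_only = soup.find_all("div", class_="hd", limit=1)
print(first_only[0] == soup.find("div", class_="hd"))  # True
```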


Reposted from www.cnblogs.com/muouran0120/p/10029646.html