requests + regular expressions: scraping the Douban movie Top 100 - 代码天地

Category: Other | 2018-08-09 22:06:31

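import json
import re

import requests
from requests.exceptions import RequestException

# Browser-like User-Agent so the site serves the normal HTML page.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
}

# One match per <dd> entry on the Maoyan Top 100 board.
# Every wildcard is the lazy .*?; re.S lets . match newlines inside an entry.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?<a.*?title="(.*?)".*?<img.*?-src="(.*?)"'
    r'.*?="star">\n(.*?)\n.*?releasetime">(.*?)</p>.*?integer">(.*?)</i>'
    r'.*?fraction">(\d+).*?</dd>',
    re.S,
)


def get_page_source(url):
    """Return the page HTML, or None on a non-200 status or a network error."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=10)
    except RequestException:
        return None
    # Compare with ==, not `is`: identity checks on ints are unreliable.
    if response.status_code == 200:
        return response.text
    return None


def parse_page_source(html):
    """Yield one dict per movie entry found in the page."""
    for item in PATTERN.findall(html):
        yield {
            "index": item[0],
            "title": item[1],
            "image": item[2],
            "actor": item[3].strip(),    # cast line without surrounding whitespace
            "time": item[4][5:],         # drop the 5-character label 上映时间：
            "score": item[5] + item[6],  # integer part "9." + fraction "5" -> "9.5"
        }


def write_to_txt(content):
    """Append one record per line as JSON; ensure_ascii=False keeps Chinese readable."""
    with open("maoyan.txt", "a", encoding="utf-8") as f:
        f.write(json.dumps(content, ensure_ascii=False) + "\n")


def main(offset):
    url = "http://www.maoyan.com/board/4?offset=" + str(offset)
    html = get_page_source(url)
    if html is None:  # request failed; skip this page
        return
    for item in parse_page_source(html):
        write_to_txt(item)
        print(item)


if __name__ == "__main__":
    # 10 entries per page: offsets 0, 10, ..., 90 cover the Top 100.
    for i in range(10):
        main(i * 10)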
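To see what the regular expression extracts without touching the network, the parsing step of the script above can be run against a local snippet. The markup below is a hypothetical `<dd>` entry shaped after the fields the pattern expects (board index, `title` attribute, `data-src` poster, `star`, `releasetime`, `integer`/`fraction` score); it is not copied from the live Maoyan page, whose structure may well have changed since 2018. The helper name `parse_entries`, the placeholder cast names, and the `example.invalid` URLs are illustrative only.

```python
import re

# Same pattern as the script: every wildcard is the lazy .*?
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?<a.*?title="(.*?)".*?<img.*?-src="(.*?)"'
    r'.*?="star">\n(.*?)\n.*?releasetime">(.*?)</p>.*?integer">(.*?)</i>'
    r'.*?fraction">(\d+).*?</dd>',
    re.S,  # let . match newlines so one match can span a whole <dd> block
)

# Hypothetical markup for one board entry (assumed structure, not real data).
SAMPLE_HTML = """<dd>
    <i class="board-index board-index-1">1</i>
    <a href="/films/1" title="Example Movie" class="image-link">
      <img src="//example.invalid/loading.png" class="poster-default" />
      <img data-src="https://example.invalid/poster1.jpg" class="board-img" />
    </a>
    <p class="star">
                主演：演员甲,演员乙
        </p>
    <p class="releasetime">上映时间：2000-01-01</p>
    <p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>
</dd>"""


def parse_entries(html):
    """Yield one dict per <dd> entry, shaping fields the same way as the script."""
    for index, title, image, actor, release, integer, fraction in PATTERN.findall(html):
        yield {
            "index": index,
            "title": title,
            "image": image,
            "actor": actor.strip(),       # drop the indentation around the cast line
            "time": release[5:],          # skip the 5-character label 上映时间：
            "score": integer + fraction,  # "9." + "5" -> "9.5"
        }


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE_HTML):
        print(entry)
```

The lazy `.*?` matters here: with greedy `.*` the first wildcard would run on to the last `</dd>` on the page, so `findall` would typically return a single match instead of one per movie.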

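`write_to_txt` stores one JSON object per line (the JSON Lines convention), and `ensure_ascii=False` is what keeps titles and cast names as readable Chinese instead of `\uXXXX` escapes. A minimal sketch of writing such a file and reading it back, using the hypothetical helpers `append_record` and `read_records` and a temporary directory in place of the script's `maoyan.txt`:

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single JSON line, keeping non-ASCII text unescaped."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Read the file back, parsing one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    record = {"index": "1", "title": "示例电影", "score": "9.5"}
    print(json.dumps(record))                      # default: non-ASCII escaped as \uXXXX
    print(json.dumps(record, ensure_ascii=False))  # readable UTF-8 text
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "maoyan.txt")
        append_record(path, record)
        append_record(path, {"index": "2", "title": "Another", "score": "9.4"})
        print(read_records(path))
```

Because each line is a complete JSON document, the file can be appended to page by page and still be parsed line by line afterwards.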

Reposted from blog.csdn.net/qiushuidongshi/article/details/81253885
