A simple crawler built with Python regular expressions: scraping Zhihu topics, upvote counts, and URLs

import requests
import re
#---------------------
# Author: qyqin
# Date: 20170617
# Description: a simple crawler that scrapes topics, upvote counts, and hyperlinks from the Zhihu home page

#---------------------


# Set the request headers
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 BIDUBrowser/8.7 Safari/537.36',
}

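# Zhihu request URL, found via the browser's developer tools (F12); limit caps the number of items returned
response = requests.get('https://www.zhihu.com/api/v3/explore/guest/feeds?limit=40', headers=headers)
response.encoding = 'utf-8'


# Pull out each field with a non-greedy regex; each list keeps the order of the response text
voteup_count_list = re.findall(r'"voteup_count":(.*?),', response.text, re.S)
url_list = re.findall(r',"url":"(https://api\.zhihu\.com/questions/.*?)",', response.text, re.S)
title_list = re.findall(r'"title":"(.*?)",', response.text, re.S)

# Pair the three lists position by position
data = zip(title_list, voteup_count_list, url_list)

for index, item in enumerate(data):
    print(index, item)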
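
To see how the three patterns behave without sending a request, here is a minimal offline sketch. The `SAMPLE_TEXT` payload and the `extract_rows` helper are hypothetical and written only for illustration; the field layout is an assumption and is not meant to mirror the real Zhihu response.

```python
import re

# Hypothetical, hand-written payload for illustration only; the field layout
# is an assumption, not the real shape of the Zhihu API response.
SAMPLE_TEXT = (
    '{"data":[{"voteup_count":12,"question":{"id":1,'
    '"url":"https://api.zhihu.com/questions/1",'
    '"title":"First question?","type":"question"}},'
    '{"voteup_count":345,"question":{"id":2,'
    '"url":"https://api.zhihu.com/questions/2",'
    '"title":"Second question?","type":"question"}}]}'
)


def extract_rows(text):
    """Apply the three patterns to text and pair the matches by position."""
    votes = re.findall(r'"voteup_count":(.*?),', text, re.S)
    urls = re.findall(r',"url":"(https://api\.zhihu\.com/questions/.*?)",', text, re.S)
    titles = re.findall(r'"title":"(.*?)",', text, re.S)
    # zip stops at the shortest list, so a missed match silently drops rows
    return list(zip(titles, votes, urls))


rows = extract_rows(SAMPLE_TEXT)
for index, item in enumerate(rows):
    print(index, item)
```

Because each pattern is matched independently, an item that lacks one of the fields would shift every later row out of alignment. Comparing the lengths of the three lists before zipping them is a cheap safeguard.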





Output:

0 ('最理性的暗恋是什么样子?', '5526', 'https://api.zhihu.com/questions/60140238')
1 ('如何评价 2018 俄罗斯世界杯 D 组冰岛 1:1 逼平阿根廷?', '3286', 'https://api.zhihu.com/questions/281284075')
2 ('如何评价《创造101》的第九期?', '3520', 'https://api.zhihu.com/questions/277065160')
3 ('如何评价电影《侏罗纪世界2》(Jurassic World2:Fallen Kingdom)?', '120', 'https://api.zhihu.com/questions/280716385')
4 ('如何看待《创造101》第九期,孟美岐选择了盛放?', '143', 'https://api.zhihu.com/questions/281271173')
5 ('你们讨厌「小哥哥」「小姐姐」这种称呼吗?', '18684', 'https://api.zhihu.com/questions/276768080')
6 ('熬夜看世界杯有哪些「神器」?', '511', 'https://api.zhihu.com/questions/275730383')
7 ('你认识的那些女博士都是怎样的?她们毕业都干什么去了?过的好吗?', '562', 'https://api.zhihu.com/questions/279383125')
8 ('如何看待李健至今不买房?', '955', 'https://api.zhihu.com/questions/278703678')
9 ('男生眼里   长发女生和短发女生给人的感觉有啥不同?', '1022', 'https://api.zhihu.com/questions/276216403')

...

Reposted from blog.csdn.net/fitz_p/article/details/80721371