20 lines of Python: scraping Zhihu's wittiest replies


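A simple Zhihu crawler

Basic approach

1. Install and import BeautifulSoup (the bs4 package)
2. Fetch each page URL with urllib.request
3. Parse the returned HTML with bs4
4. Use soup.findAll to collect every element of a given class
5. Extract the tags of interest from each of those elements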
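
Steps 3 to 5 can be tried offline before touching the network. The following is a minimal sketch, not the post's exact code: `SAMPLE_HTML` and `extract_items` are hypothetical names, and the markup is invented to mimic the class names used in this post. It also uses `find_all` and `find`, the snake_case spellings in current bs4, where the post uses `findAll` and `findNext`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modeled on the class names this post relies on.
SAMPLE_HTML = """
<div class="zm-item">
  <h2 class="zm-item-title">Why is the sky blue?</h2>
  <div class="zh-summary summary clearfix">Because it is not green.
显示全部</div>
</div>
<div class="zm-item">
  <h2 class="zm-item-title">What is 1 + 1?</h2>
  <div class="zh-summary summary clearfix">A window.</div>
</div>
"""


def extract_items(html):
    """Return (title, summary) pairs for every element with class 'zm-item'."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for item in soup.find_all(class_="zm-item"):
        # Search inside the current item only.
        title = item.find(class_="zm-item-title")
        summary = item.find(class_="zh-summary")
        if title is None or summary is None:
            continue  # skip items missing either part
        # '显示全部' is the "show all" link text appended to long summaries.
        text = summary.get_text().replace("显示全部", "").replace("\n", "")
        pairs.append((title.get_text().strip(), text.strip()))
    return pairs


for number, (title, summary) in enumerate(extract_items(SAMPLE_HTML), start=1):
    print(f"{number}. Question: {title}")
    print(f"   Reply: {summary}")
```

Searching with `item.find` rather than from the whole document keeps each title paired with the summary from the same item.
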
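import time
import urllib.request
from bs4 import BeautifulSoup

for p in range(1, 76):  # pages 1 through 75 of the collection
    url = "http://www.zhihu.com/collection/27109279?page=" + str(p)
    page = urllib.request.urlopen(url)
    soup = BeautifulSoup(page, 'html.parser')  # parse with the built-in HTML parser
    allp = soup.findAll(class_='zm-item')  # one element per collected answer
    print('                                 Page ' + str(p) + '\n')
    # enumerate gives a stable item number; allp.index(each) compares tags
    # by content and returns the wrong position when two items are identical
    for i, each in enumerate(allp, start=1):
        answer = each.findNext(class_='zh-summary summary clearfix')
        problem = each.findNext(class_='zm-item-title')
        if answer is None or problem is None:
            continue  # skip items without a summary or a title
        # '显示全部' is the "show all" link text appended to long summaries
        answer = answer.text.replace('显示全部', '').replace('\n', '')
        if len(answer) > 200:
            continue  # keep only short replies
        print(str(i) + '. Question: ' + problem.text)
        print('   Witty reply: ' + answer)
    time.sleep(5)  # pause between pages to avoid hammering the server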
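
The URL construction and the text cleanup in the loop can also be pulled into small helpers, which makes them easy to check without a network connection. This is only a sketch under stated assumptions: `clean_summary`, `build_request`, and `MAX_LEN` are hypothetical names, and the browser-like `User-Agent` header is an assumption added here; the code above sends urllib's default header.

```python
import urllib.request

MAX_LEN = 200  # same length cutoff as the loop above


def clean_summary(raw, max_len=MAX_LEN):
    """Remove the '显示全部' link text and newlines; return None if too long."""
    text = raw.replace("显示全部", "").replace("\n", "")
    return text if len(text) <= max_len else None


def build_request(page, collection_id="27109279"):
    """Build, but do not send, a request for one page of the collection."""
    url = f"http://www.zhihu.com/collection/{collection_id}?page={page}"
    # Assumption: a browser-like User-Agent; not part of the original code.
    return urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})


request = build_request(1)
print(request.full_url)
print(clean_summary("A short reply\n显示全部"))
```

The object returned by `build_request` can be passed to `urllib.request.urlopen` in place of the plain URL string.
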



Source code download: 20 lines of Python code to scrape Zhihu

The scraped replies are very funny: have a laugh


Reposted from blog.csdn.net/qq_30803353/article/details/78415475