Crawling Perfect World (《完美世界》) with 20 Lines of Python

Introduction

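I didn't enjoy Martial Peak (《武炼巅峰》), and I've finished A Will Eternal (《一念永恒》), so next I plan to read Perfect World (《完美世界》). Out of habit, I don't download a txt copy from the web; I scrape the novel myself with Python. I just finished writing the code and found it is only about 20 lines long, which reminded me how convenient Python really is!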

Code

from urllib.parse import urljoin

from requests_html import HTMLSession


def write_2_txt(title, content):
    """ Append a chapter's title and body to the txt file """
    with open('完美世界.txt', 'a', encoding='utf-8') as file:
        file.write('\n')
        file.write(title)
        file.write('\n')
        file.write(content)


def crawl():
    """ Main routine for crawling Perfect World """
    session = HTMLSession()
    r = session.get('https://www.qu.la/book/14/', timeout=10)
    _list = r.html.find('#list', first=True)
    links = _list.find('dl dd a')
    # Collect the chapter links; the first 12 entries are skipped
    # (presumably the site's "latest chapters" block, which repeats later chapters)
    catalogs = [link.attrs['href'] for link in links[12:]]
    # Fetch each chapter and append it to the txt file
    domain = 'https://www.qu.la'
    for catalog in catalogs:
        # urljoin handles both relative ("/book/14/...") and absolute hrefs
        url = urljoin(domain, catalog)
        o = session.get(url, timeout=10)
        content = o.html.find('#content', first=True).text
        title = o.html.find('h1', first=True).text
        print(title + '----------------start writing')
        write_2_txt(title, content)
        print(title + '----------------done')


if __name__ == '__main__':
    crawl()
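The fiddliest step in the script is turning the chapter list into chapter URLs: collect the "dl dd a" links under #list, drop the first N, and make each href absolute. The sketch below reproduces just that step offline with the standard library's html.parser instead of requests_html, so it runs without network access. The sample HTML, the assumption that #list is a div element, and the helper names CatalogParser and chapter_urls are hypothetical illustrations, not the real page markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CatalogParser(HTMLParser):
    """Collect hrefs of <a> tags inside <dd> elements within <div id="list">.

    Assumption: the chapter list is a <div id="list"> (hypothetical markup).
    """

    def __init__(self):
        super().__init__()
        self.list_depth = 0   # > 0 while inside <div id="list">
        self.in_dd = False    # True while inside a <dd> under #list
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.list_depth:
                self.list_depth += 1      # nested <div> inside #list
            elif attrs.get("id") == "list":
                self.list_depth = 1       # entering #list
        elif tag == "dd" and self.list_depth:
            self.in_dd = True
        elif tag == "a" and self.list_depth and self.in_dd and "href" in attrs:
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.list_depth:
            self.list_depth -= 1          # leaving a div (possibly #list itself)
        elif tag == "dd":
            self.in_dd = False


def chapter_urls(catalog_html, base_url, skip=12):
    """Return absolute chapter URLs, skipping the first `skip` links."""
    parser = CatalogParser()
    parser.feed(catalog_html)
    return [urljoin(base_url, href) for href in parser.hrefs[skip:]]


# Hypothetical catalog page: one "latest chapter" link, then the full list.
sample = """
<div id="list"><dl>
  <dt>Latest</dt>
  <dd><a href="/book/14/3.html">Ch 3</a></dd>
  <dd><a href="/book/14/1.html">Ch 1</a></dd>
  <dd><a href="/book/14/2.html">Ch 2</a></dd>
  <dd><a href="/book/14/3.html">Ch 3</a></dd>
</dl></div>
<a href="/other">not a chapter</a>
"""

print(chapter_urls(sample, "https://www.qu.la/book/14/", skip=1))
# → ['https://www.qu.la/book/14/1.html', 'https://www.qu.la/book/14/2.html', 'https://www.qu.la/book/14/3.html']
```

Here skip=1 plays the role of links[12:] in the script: the sample has a single "latest" entry, whereas the real page was apparently assumed to have twelve.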

Closing

I'm sure Perfect World will be good: I've just read the first 5 chapters, and so far it feels pretty good.


Reposted from blog.csdn.net/ClassLjx/article/details/89515887