Python takes you to realize the novel downloader and realize the freedom of novels! !

foreword

My roommate shouted that I had no novels to read, and asked me to recommend a few books to him. At first I thought about it, but I didn’t read any novels, but can this stump me?

I used python to download all the novels on the entire website for him in minutes, as expected of me!

Without further ado, let's start straight away!
insert image description here

environment

The python 3.6 interpreter
pycharm editor professional version requires an activation code

requests >>> pip install requests
parsel >>> pip install parsel
pandas >>> pip install pandas
tqdm >>> pip install tqdm

Data source analysis

  1. determine needs

    Crawl the content of the novel
    and perform the search function (when entering the name of a novel or the author chooses to download the corresponding novel content)
    2. Data source analysis

code implementation process

Send a requestSend a request for the novel chapter list page

Get data Get the source code of the web page (the text data response.txt of the response body)

Parsing data to extract novel name/url address

Send a request Send a request to the url address of the novel chapter

Get data Get the source code of the web page (the text data response.txt of the response body)

Parse the data to extract the content of the novel / novel chapter name

save data

implement a search function
insert image description here

All modules\environment\source code\tutorials in this article can
be clicked here to jump for free
Jump to get free


import requests  # pip install requests
import parsel  # 数据解析 pip install parsel
import pandas as pd # pip install pandas
from tqdm import tqdm  # pip install tqdm

headers = {
    
    
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}


def get_response(html_url):
 '''发送请求'''
    response = requests.get(url=html_url, headers=headers)
    response.encoding = response.apparent_encoding
    return response


def save(name, title, content):
    """保存小说"""
    with open(name + '.txt', mode='a', encoding='utf-8') as f:
        f.write(title)
        f.write('\n')
        f.write(content)
        f.write('\n')
        # print(title)


def get_novel_url(html_url):
    """获取小说章节url"""
    response = get_response(html_url)
    # 把获取到的html字符串数据 转成 selector 对象
    selector = parsel.Selector(response.text)
    name = selector.css('#info h1::text').get()
    href = selector.css('#list dd a::attr(href)').getall()
    # https://www.biquges.com/10_10770/6896120.html
    href = ['https://www.biquges.com/' + i for i in href]
    for index in tqdm(href):
        """获取小说的内容"""
        response = get_response(index)
        selector = parsel.Selector(response.text)
        title = selector.css('.bookname h1::text').get()
        # getall() 获取所有的标签内容 返回的是列表 get 获取一个 返回的是字符串
        content_list = selector.css('#content::text').getall()
        # 把列表转换成字符串
        content = ''.join(content_list)
        save(name, title, content)
 #python学习群:748989764

if __name__ == '__main__':
    # url = 'https://www.biquges.com/10_10770/index.html'
    # get_novel_url(url)
    # print(response.text)
    while True:
        print('输入0即可退出程序')
        word = input('请输入你要下载的小说名字(作者): ')
        if word == '0':
            break
        search_url = 'https://www.biquges.com/modules/article/search.php'
        data = {
    
    
            'searchkey': word,
            'searchtype': 'articlename'
        }
        response_1 = requests.post(url=search_url, data=data, headers=headers)
        response_1.encoding = response_1.apparent_encoding
        selector_1 = parsel.Selector(response_1.text)
        # 第一次提取
        trs = selector_1.css('#nr')
        lis = []
        if trs:
            for tr in trs:
                novel_name = tr.css('td:nth-child(1) a::text').get()
                novel_id = tr.css('td:nth-child(1) a::attr(href)').get().replace('/', '')
                author = tr.css('td:nth-child(3)::text').get()
                dit = {
    
    
                    '书名': novel_name,
                    '作者': author,
                    '书ID': novel_id,
                }
                lis.append(dit)
            print(f'一共搜索到{
      
      len(lis)}数据内容')
            search_result = pd.DataFrame(lis)
            print(search_result)
            num = input('请输入你要下载的小说序号: ')
            # 序号对应的就是列表里面索引位置
            num_id = lis[int(num)]['书ID'] # 直接获取小说ID
            link_url = f'https://www.biquges.com/{
      
      num_id}/index.html'
            get_novel_url(link_url)
            print(f"{
      
      lis[int(num)]['书名']}已经下载完成了")
        else:
            print('抱歉,搜索没有结果^_^')

Guess you like

Origin blog.csdn.net/weixin_45841831/article/details/129881465