I've been running short of novels to read lately.
Today I will use Python to search for novels on the Tomato novel site, download them, and save them as text files.
Preparation
Environment
- Python 3.8
- PyCharm 2023
Modules used
- requests
- re
- parsel
- prettytable
- tqdm
requests, parsel, prettytable and tqdm are third-party modules. Press Win + R, enter cmd, and then run the command pip install requests parsel prettytable tqdm to install them. re is a built-in module and does not need to be installed.
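Before running the script, it can help to confirm that the third-party modules are actually installed. The following is only a minimal sketch: the helper name missing_modules and the REQUIRED list are made up for illustration, and the list is taken from the import lines of the source code in this article.

```python
import importlib.util

# Third-party modules imported by the script (hypothetical list for this sketch)
REQUIRED = ["requests", "parsel", "prettytable", "tqdm"]


def missing_modules(names):
    """Return the names that cannot be found for import in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        # Suggest the pip command for whatever is not installed yet
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All third-party modules are available.")
```

If anything is reported as missing, install it with pip from the cmd window and run the check again.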
Source code
import requests
import re
import parsel
from prettytable import PrettyTable
from tqdm import tqdm

while True:
    # Browser-like User-Agent sent with every request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'
    }
    key = input('Enter the novel you want to download (enter 00 to quit): ')
    if key == '00':
        break
    # Table used to display the search results
    tb = PrettyTable()
    tb.field_names = ['No.', 'Title', 'Author', 'Category', 'Latest chapter', 'ID']
    num = 0
    info = []
    print('Searching, please wait.....')
    # Request 30 result pages, 10 books per page
    for page in tqdm(range(30)):
        # The host name is a placeholder left by the author ("replace the address yourself")
        search_url = 'https://大家自己替换一下地址.com/api/author/search/search_book/v1'
        search_params = {
            'filter': '127,127,127,127',
            'page_count': '10',
            'page_index': page,
            'query_type': '0',
            'query_word': key,
        }
        search_data = requests.get(url=search_url, params=search_params, headers=headers).json()
        for i in search_data['data']['search_book_data_list']:
            book_name = i['book_name']
            author = i['author']
            book_id = i['book_id']
            category = i['category']
            last_chapter_title = i['last_chapter_title']
            dit = {
                'book_name': book_name,
                'author': author,
                'category': category,
                'last_chapter_title': last_chapter_title,
                'book_id': book_id,
            }
            # Keep the record so the book can be looked up by its number later
            info.append(dit)
            tb.add_row([num, book_name, author, category, last_chapter_title, book_id])
            num += 1
    print(tb)
    book = input('Enter the number of the novel you want to download: ')
    # Placeholder host again; the book ID comes from the selected search result
    url = f'https://大家自己替换一下.com/page/{info[int(book)]["book_id"]}'
    response = requests.get(url=url, headers=headers)
    html_data = response.text
    # Book title extracted with a regular expression
    name = re.findall('<div class="info-name"><h1>(.*?)</h1', html_data)[0]
    selector = parsel.Selector(html_data)
    # The same title extracted with a CSS selector (alternative to the regex, unused below)
    css_name = selector.css('.info-name h1::text').get()
    # Links to every chapter of the book
    href = selector.css('.chapter-item a::attr(href)').getall()
    print(f'{name}, download in progress, please wait....')
    for index in tqdm(href):
        # The chapter ID is the last segment of the chapter link
        chapter_id = index.split('/')[-1]
        link = f'https://替换掉了.com/api/novel/book/reader/full/v1/?device_platform=android&parent_enterfrom=novel_channel_search.tab.&aid=2329&platform_id=1&group_id={chapter_id}&item_id={chapter_id}'
        # The chapter body is returned as an HTML string inside the JSON response
        json_data = requests.get(url=link, headers=headers).json()['data']['content']
        title = re.findall('<div class="tt-title">(.*?)</div>', json_data)[0]
        content = '\n'.join(re.findall('<p>(.*?)</p>', json_data))
        # Append each chapter to a single text file named after the book
        with open(f'{name}.txt', mode='a', encoding='utf-8') as f:
            f.write(title)
            f.write('\n')
            f.write(content)
            f.write('\n')
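The chapter parsing step at the end of the source code can be tried offline without touching the network. The following is a minimal sketch under stated assumptions: the function name parse_chapter and the sample HTML string are made up for illustration, and only the two regular expressions come from the source code. Unlike the original, it returns an empty title instead of raising IndexError when no title is found.

```python
import re


def parse_chapter(html):
    """Extract (title, content) from a chapter's HTML string.

    Uses the same two patterns as the source code above.
    """
    titles = re.findall('<div class="tt-title">(.*?)</div>', html)
    # Fall back to an empty title rather than failing on titles[0]
    title = titles[0] if titles else ''
    # Each <p>...</p> becomes one line of the saved text
    content = '\n'.join(re.findall('<p>(.*?)</p>', html))
    return title, content


# Invented sample; the real response content may be structured differently
sample = (
    '<div class="tt-title">Chapter 1</div>'
    '<p>First line.</p><p>Second line.</p>'
)

title, content = parse_chapter(sample)
print(title)
print(content)
```

Keeping the parsing in its own function makes it easy to check the patterns against a saved response before starting a long download.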
Result
Search and download
The whole thing is still quite simple.
Okay, that's all for this post. See you next time~