Crawling a novel website with Python and converting the novels into audio files

Foreword

As a tech geek, I approached this for the sake of my eyes: I want to look at the screen as little as possible, yet I am also a fan of novels. So I put my mind to converting novels into speech, so that I can listen to them instead of reading them.

Chapter 1: Crawling Fiction Files

The target is a website with a fairly large collection of novels: Qidian Chinese Portal (qidian.com).

A crawler would normally crawl everything, but that is a lot of data, so I added a couple of lines that let the user choose the novel they want. You can of course still crawl all of them. Here is the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2018/12/30 23:20
# @Author  : Fang
# @E-mail  : [email protected]
# @Site    : 
# @File    : qidian.py
# @Software: PyCharm


import requests
# from lxml import etree
import os
import python_baidu_api  # text-to-speech module, explained in Chapter 2; comment this out until that file exists, otherwise the import fails

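# With Python 3.7, "from lxml import etree" failed for me even though lxml was installed;
# after some searching, importing it through lxml.html as below works.
import lxml.html
etree = lxml.html.etree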

class Spider(object):
    def start_request(self):
        response = requests.get("https://www.qidian.com/all")
        html = etree.HTML(response.content.decode())
        Bigtit_list=html.xpath('//div[@class="book-mid-info"]/h4/a/text()')
        Bigsrc_list=html.xpath('//div[@class="book-mid-info"]/h4/a/@href')
        # print(Bigsrc_list)
        book_name = input("Enter the title of the book to crawl (e.g. 凡人修仙传之仙界篇): ")
        for Bigsrc,Bigtit in zip(Bigsrc_list,Bigtit_list):
            if Bigtit in book_name  or book_name in Bigtit:
                if not os.path.exists(Bigtit):
                    os.mkdir(Bigtit)  # create a folder named after the novel to store its chapters
                    print("Target folder created")
                self.xpath_data(Bigsrc,Bigtit)


    def xpath_data(self,Bigsrc,Bigtit):
        response = requests.get("https:"+Bigsrc)
        html = etree.HTML(response.content.decode())
        Littit_list = html.xpath('//ul[@class="cf"]/li/a/text()')
        Litsrc_list = html.xpath('//ul[@class="cf"]/li/a/@href')
        for Litsrc,Littit in zip(Litsrc_list,Littit_list):
            self.finally_file(Littit,Litsrc,Bigtit)

    def finally_file(self,tit,url,Bigtit):
        response = requests.get("https:"+url)
        html = etree.HTML(response.content.decode())
        content = "\n".join(html.xpath('//div[@class="read-content j_readContent"]/p/text()'))
        file_name = os.path.join(Bigtit, tit + ".txt")
        audio_name = os.path.join(Bigtit, tit + ".mp3")  # name of the audio file
        print("Crawling chapter: " + file_name)
        with open(file_name, "a", encoding="utf-8") as f:
            f.write(content)
        python_baidu_api.convert(file_name,audio_name)  # call the text-to-speech module

if __name__ == '__main__':
    spider=Spider()
    spider.start_request()
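
The script above builds each URL by prefixing a scheme to the href and uses chapter titles directly as file names. Here is a minimal sketch of two helpers that make those steps a little more robust. It assumes the hrefs are protocol-relative (they start with "//", as the string concatenation in the script suggests) and that titles may contain characters that are not allowed in file names. The names absolute_url and safe_filename are hypothetical and not part of the original script, and the book path in the demo is made up:

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.qidian.com/all"  # the listing page requested by the script


def absolute_url(href, base=BASE_URL):
    """Turn a relative or protocol-relative href into an absolute URL."""
    return urljoin(base, href)


def safe_filename(title, replacement="_"):
    """Replace characters that Windows (and most other systems) reject in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', replacement, title).strip()
    return cleaned or "untitled"


if __name__ == "__main__":
    # "//book.qidian.com/info/123" is an invented example path
    print(absolute_url("//book.qidian.com/info/123"))  # https://book.qidian.com/info/123
    print(safe_filename("Chapter 1: What? A/B"))        # Chapter 1_ What_ A_B
```

Using urljoin means the scheme is taken from the page that was actually requested, so the code does not have to hard-code "http:" or "https:" in several places.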

If you don't need the voice conversion, comment out the following two lines; the script will then only save the novel as txt files:

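import python_baidu_api  # text-to-speech module, explained in Chapter 2; comment this out until that file exists, otherwise the import fails
python_baidu_api.convert(file_name,audio_name)  # call the text-to-speech module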
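
Instead of commenting lines in and out, the voice step can also be made optional in code. This is a minimal sketch, assuming a hypothetical helper save_chapter that receives the converter as an argument. In the real script you would pass python_baidu_api.convert, or None to skip the conversion:

```python
import os
import tempfile


def save_chapter(content, file_name, audio_name, converter=None):
    """Write the chapter text and, if a converter is given, turn it into audio.

    converter is any callable taking (text_path, audio_path), for example
    python_baidu_api.convert from Chapter 2. Passing None skips the audio step.
    Note: this sketch opens the file with "w", so a re-run overwrites the text
    instead of appending to it.
    """
    with open(file_name, "w", encoding="utf-8") as f:
        f.write(content)
    if converter is not None:
        converter(file_name, audio_name)
    return file_name


if __name__ == "__main__":
    # Demonstration with a stand-in converter that only records its arguments.
    calls = []
    folder = tempfile.mkdtemp()
    txt = os.path.join(folder, "chapter1.txt")
    mp3 = os.path.join(folder, "chapter1.mp3")
    save_chapter("some text", txt, mp3)  # text only, no audio
    save_chapter("some text", txt, mp3, converter=lambda a, b: calls.append((a, b)))
    print(len(calls))  # 1
```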

Chapter 2: Text to Speech

This step uses the speech synthesis service of the Baidu AI open platform, which requires registration. You can try it for free first. After registering you get an APPID, an API Key (AK) and a Secret Key (SK); fill them into the corresponding places in the code below. I won't publish mine here, so they are replaced with xxx in the code. Substitute your own after applying.
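
Since the keys should stay private, one option is to read them from environment variables rather than writing them into the file. This is only a minimal sketch. The variable names BAIDU_APP_ID, BAIDU_API_KEY and BAIDU_SECRET_KEY and the helper load_credentials are my own choices for illustration, not something the Baidu SDK defines:

```python
import os


def load_credentials(env=os.environ):
    """Return (APP_ID, API_KEY, SECRET_KEY) from the environment, or raise if one is missing."""
    names = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)
```

The returned tuple can then be unpacked into the client constructor, for example client = AipSpeech(*load_credentials()).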

Create another file, python_baidu_api.py, in the same folder. This is the voice module imported earlier. Its code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2018/12/30 22:53
# @Author  : Fang
# @E-mail  : [email protected]
# @Site    : 
# @File    : python_baidu_api.py
# @Software: PyCharm
from aip import AipSpeech
import os

# Replace the xxx values here with your own APPID, AK and SK
APP_ID = 'xxx'
API_KEY = 'xxx'
SECRET_KEY = 'xxx'
client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

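def convert(file,audio_name):
    with open(file,"r",encoding="utf-8") as file_object:
        contents = file_object.read()
    print("Converting {}".format(file))
    # Synthesize 2000 characters at a time; "while contents" also covers the final, shorter piece
    while contents:
        tmp = contents[:2000]
        result = client.synthesis(tmp,"zh",1,{
            "vol":5,   # volume, 0-15, default 5 (medium)
            "spd":4,   # speed, 0-9, default 5 (medium)
            "pit":9,   # pitch, 0-9, default 5 (medium)
            "per":3,   # voice: 0 female, 1 male, 3 emotional synthesis (Du Xiaoyao), 4 emotional synthesis (Du Yaya); default is the standard female voice
        })
        contents = contents[2000:]
        # synthesis() returns the audio bytes on success and a dict describing the error on failure
        if isinstance(result, dict):
            print("error: {}".format(result))
            continue
        # audio_name already ends in .mp3, so append to it directly
        with open(audio_name,"ab") as f:
            f.write(result)
    print("{} converted".format(audio_name))

if __name__ == '__main__':
    import sys
    # Usage: python python_baidu_api.py chapter.txt chapter.mp3
    convert(sys.argv[1], sys.argv[2])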
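
The conversion works by cutting the text into pieces and synthesizing them one after another. Here is a minimal sketch of that splitting step on its own, assuming a hypothetical helper split_text. The default size of 2000 characters simply mirrors the script above and is not a documented API limit, so adjust it if the service rejects long inputs:

```python
def split_text(text, size=2000):
    """Split text into consecutive pieces of at most `size` characters.

    Unlike a `while len(text) >= size` loop, this also returns the final,
    shorter piece, so the end of a chapter is not lost.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


if __name__ == "__main__":
    pieces = split_text("abcdefg", size=3)
    print(pieces)  # ['abc', 'def', 'g']
```

Each piece can then be passed to client.synthesis in turn and the returned audio appended to the same mp3 file.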

With that, speech synthesis is ready. Go back to the first .py file you wrote and run it. Each txt file gets a corresponding mp3 file.

Then you can enjoy the novel with your ears.

Origin blog.csdn.net/Running_free/article/details/85553373