Crawler source code: crawl the novel you want to read

Foreword:

Novels are a good way to pass free time. When we read them on a website, though, the experience is often spoiled by advertisements and other clutter. A crawler lets us download the novels we want to read, so we no longer have to put up with the ads.

One: Environment configuration

Python version: 3.7.3

IDE: PyCharm

Required libraries: requests and lxml (third-party), plus time (standard library)
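Before running the crawler, it can help to confirm that the third-party libraries are importable. The sketch below is an illustrative check rather than part of the original tutorial; the helper name `missing_modules` is hypothetical. It uses only the standard library (`importlib.util.find_spec`), which is available in Python 3.7.

```python
import importlib.util
import sys

# Third-party libraries the crawler imports; `time` ships with Python itself
REQUIRED = ["requests", "lxml"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing), "(install with: pip install " + " ".join(missing) + ")")
    else:
        print("All required libraries are installed.")
```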

Two: Preparation

1: Install the libraries we need (requests and lxml, for example with pip install requests lxml).

2: Create a folder at a location of your choice on the computer to store the crawled novels.

3: Install an XPath browser plug-in, which helps us work out the XPath expressions that locate the novel's name, its chapter links, and the chapter text (the author has uploaded the plug-in as a resource; download and install it yourself).
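Steps 2 and 3 meet when chapters are saved: the folder must exist, and the chapter title scraped with XPath becomes the file name. A title can contain characters that Windows does not allow in file names (\ / : * ? " < > |), which would make `open()` fail. The sketch below assumes one `.txt` file per chapter; the helper names `safe_filename` and `save_chapter` are hypothetical, not from the original code.

```python
import os
import re

# Characters that Windows forbids in file names
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title):
    """Replace forbidden characters with '_' and trim surrounding whitespace (hypothetical helper)."""
    return _INVALID_CHARS.sub("_", title).strip()


def save_chapter(folder, title, content):
    """Create `folder` if needed and write one chapter to <folder>/<safe title>.txt."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    path = os.path.join(folder, safe_filename(title) + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(title + "\n" + content + "\n")
    return path
```

Inside the chapter loop it would be called as `save_chapter(folder, novel_chapter, novel_content)`, with `folder` pointing at the directory from step 2.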

Three: Code implementation

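import os
import time
from urllib.parse import urljoin

import requests
import urllib3
from lxml import etree

# verify=False below skips SSL certificate checks; silence the warning it would print
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

url = 'https://www.biquge365.net/newbook/33411/'
head = {
    'Referer': 'https://www.biquge365.net/book/33411/',
    # The header must be named 'User-Agent'; a misspelled name such as 'users-agent' is sent
    # as an unrelated header, and requests keeps its default python-requests agent
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.39'
}
# Folder for the downloaded chapters (adjust to the folder you created in step 2)
save_dir = 'E:\\python源码\\爬虫教程\\小说'
os.makedirs(save_dir, exist_ok=True)

response = requests.get(url, headers=head, verify=False, timeout=10)
# print(response.text)
html = etree.HTML(response.text)
novel_name = html.xpath('/html/body/div[1]/div[3]/div[1]/h1/text()')[0]
novel_directory = html.xpath('/html/body/div[1]/div[4]/ul/li[*]/a/@href')
print('Novel: ' + novel_name)
for i in novel_directory:
    # The site may have anti-crawling measures, so wait before each chapter request
    time.sleep(6)
    com_url = urljoin('https://www.biquge365.net', i)
    response2 = requests.get(com_url, headers=head, verify=False, timeout=10)
    html2 = etree.HTML(response2.text)
    novel_chapter = html2.xpath('//*[@id="neirong"]/h1/text()')[0]
    novel_content = '\n'.join(html2.xpath('//*[@id="txt"]/text()'))
    # One .txt file per chapter inside save_dir
    file_path = os.path.join(save_dir, novel_chapter + '.txt')
    # The with block closes the file automatically, so no file.close() is needed
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(novel_chapter + '\n' + novel_content + '\n')
    print('Downloaded: ' + novel_chapter)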
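The chapter expression `//*[@id="txt"]/text()` returns a list of text nodes rather than a single string, for example when the chapter body is split by `<br>` tags (assumed here, not confirmed against the live site); that is why the code joins the list with `'\n'`. The sketch below runs the same two chapter expressions on a small hand-written HTML snippet. The markup is a hypothetical stand-in modelled on the ids the crawler uses, and the helper `parse_chapter` is illustrative. Testing expressions copied from the XPath plug-in this way avoids sending any requests to the site.

```python
from lxml import etree

# Hypothetical chapter page, modelled on the ids used by the crawler ("neirong", "txt")
SAMPLE = """
<html><body>
  <div id="neirong">
    <h1>Chapter 1</h1>
    <div id="txt">First paragraph.<br/>Second paragraph.<br/>Third paragraph.</div>
  </div>
</body></html>
"""


def parse_chapter(page_html):
    """Return (title, text) using the same XPath expressions as the crawler."""
    html = etree.HTML(page_html)
    title = html.xpath('//*[@id="neirong"]/h1/text()')[0]
    # text() yields one string per text node between the <br/> tags
    lines = html.xpath('//*[@id="txt"]/text()')
    return title, "\n".join(lines)


title, text = parse_chapter(SAMPLE)
print(title)
print(text)
```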

Four: Results display

[Screenshot of the results]


Original article: blog.csdn.net/qq_52351946/article/details/132679514