Using Scrapy, I wrote a simple Python crawler project that collects all the novels from a novel website and saves them locally.
Sharing it here for friends who have just started learning Scrapy.
Part of the Python code (the item pipeline):
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import os


class BiqugePipeline(object):
    def process_item(self, item, spider):
        curPath = 'E:/novel/'
        tempPath = str(item['name'])
        # One directory per novel, named after the novel
        targetPath = curPath + tempPath
        if not os.path.exists(targetPath):
            os.makedirs(targetPath)
        # One .txt file per chapter inside the novel's directory;
        # open in append mode so content scraped across pages accumulates
        filename_path = targetPath + '/' + str(item['chapter_name']) + '.txt'
        with open(filename_path, 'a', encoding='utf-8') as f:
            f.writelines(item['chapter_content'])
        return item
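As the comment at the top of the file says, the pipeline only runs if it is registered in the project's settings. A minimal sketch of the relevant `settings.py` entry, assuming the Scrapy project is named `biquge` (adjust the dotted path to match your own project layout):

```python
# settings.py (sketch) -- the project name "biquge" is an assumption;
# use your own project's module path.
ITEM_PIPELINES = {
    # The number (0-1000) is the pipeline's priority; lower runs earlier.
    'biquge.pipelines.BiqugePipeline': 300,
}
```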
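One caveat worth knowing before handing this to beginners: the file name is built directly from the scraped chapter title, and titles can contain characters that are illegal in Windows file names (`? * : " < > | / \`), which would make `open()` fail. A small hypothetical helper (not part of the original pipeline) that you could apply to `item['chapter_name']` before building the path:

```python
import re

def safe_filename(name):
    """Replace characters that are invalid in Windows file names
    with an underscore, and trim surrounding whitespace."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()

print(safe_filename('Chapter 1: The Beginning?'))  # Chapter 1_ The Beginning_
```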