Summary of the Python learning crawler project ScrapyProject

Project name: ScrapyProject

Project Introduction:

1. Crawl books from http://www.shicimingju.com:

1). Modify the parse(self, response) function that requests the book detail page - ScrapyProject/ScrapyProject/spiders/book.py
2). Modify the parse_chapter_detail function that parses the chapter detail page - ScrapyProject/ScrapyProject/spiders/book.py
3). Store the collected data in a file via the pipeline component - ScrapyProject/ScrapyProject/pipelines.py
4). Enable the pipeline component in the settings file - ScrapyProject/ScrapyProject/settings.py
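Step 3 can be sketched as a minimal pipeline class. This is a sketch only, assuming the item carries hypothetical `title` and `content` fields; the actual field names live in the project's items.py:

```python
# ScrapyProject/pipelines.py (sketch)
# A Scrapy item pipeline is a plain class; Scrapy calls these hook
# methods by name, so the pipeline itself needs no scrapy import.

class BookFilePipeline:
    """Write each crawled chapter to a local text file."""

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.fp = open('book.txt', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # item behaves like a dict; 'title' and 'content' are assumed names.
        self.fp.write(item['title'] + '\n')
        self.fp.write(item['content'] + '\n\n')
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        # Called once when the spider finishes: close the file.
        self.fp.close()
```

Step 4 then registers the pipeline in settings.py, e.g. `ITEM_PIPELINES = {'ScrapyProject.pipelines.BookFilePipeline': 300}` (lower numbers run earlier).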

2. Crawl the detailed product information and store it:

 1. Use sqlalchemy (ORM) to add the data to the database
 2. Configure logging and image downloading
 3. Parse the product detail information according to its Python data types
 4. Serialize the size information as a JSON string, and store the product information if total stock exists
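Steps 1 and 4 above might look like the following sketch. The model and field names (`Product`, `sizes`, `total_stock`) are illustrative assumptions rather than the project's actual schema, and an in-memory SQLite database stands in for the real one:

```python
import json

from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Product(Base):
    """Illustrative product model; the real column names may differ."""
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String(128))
    sizes = Column(Text)           # size info serialized as a JSON string
    total_stock = Column(Integer)

engine = create_engine('sqlite:///:memory:')  # a real project would point at MySQL etc.
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

def save_product(session, name, size_info, total_stock):
    """Store the product only when total stock exists (step 4)."""
    if not total_stock:
        return None
    product = Product(
        name=name,
        # Serialize the size dict to a JSON string before storing it.
        sizes=json.dumps(size_info, ensure_ascii=False),
        total_stock=total_stock,
    )
    session.add(product)
    session.commit()
    return product
```

Keeping the sizes as one JSON column avoids a separate size table, at the cost of not being able to query individual sizes in SQL.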

Technical Difficulties:

1). How to process the parsed data?
2). How to obtain the links to the novel chapter detail pages and download them to local files?
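For difficulty 2), one approach is to collect the chapter links out of the book detail page's anchor tags. In the real spider this would be done with Scrapy selectors and `response.follow`, but the idea can be shown with the standard library alone; the `/book/` URL pattern here is an illustrative assumption:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ChapterLinkParser(HTMLParser):
    """Collect href values from <a> tags that look like chapter pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            # '/book/' is an assumed URL pattern for chapter detail pages.
            if name == 'href' and value and '/book/' in value:
                # Resolve relative hrefs against the page URL.
                self.links.append(urljoin(self.base_url, value))

def extract_chapter_links(html, base_url):
    parser = ChapterLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Inside the spider, each extracted link would then become a new request, e.g. `yield response.follow(link, callback=self.parse_chapter_detail)`, and `parse_chapter_detail` would hand the chapter text to the pipeline that writes it to disk.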

Difficulties I encountered while writing the project:

1. How to identify the important information to crawl on a website
2. How to understand the role of items in the project
3. How to decide which crawled resources to write to the database, and why
4. How to log in when the target URL requires a username, password, and verification code

Project URL: https://gitee.com/huojin181/ScrapyProject.git

Origin blog.51cto.com/13810716/2489376