Original post: https://www.dazhuanlan.com/2019/08/25/5d6235bb190fd/
Writing a comic download crawler with koa2.x
This post uses the async/await support that comes with koa2.x to handle asynchronous control flow and builds a crawler that downloads comics. The code is included.
Project setup
- Install Node.js > 7.6 and koa-generator
- Run koa2 spider to generate the project
- Install request, request-promise, cheerio and mkdirp
- Run npm install to install the dependencies
Approach
The idea behind an image or comic crawler is simple. First observe the pattern in the site's URLs and add every URL that follows the pattern to the list of download tasks. Each task requests the HTML content, parses it, and locates the URLs of the images to download (usually the src attribute of img tags). Those URLs are saved into an array, and async/await controls the tasks until every image has been downloaded.
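The first two steps (building the page URLs from a pattern, then pulling image URLs out of the HTML) can be sketched roughly as follows. This is only an illustration: buildPageUrls and extractImageUrls are hypothetical helper names, the example.com URL pattern is made up, and a regular expression stands in for cheerio so the sketch runs without any dependencies.

```javascript
// Sketch only: hypothetical helpers, made-up URL pattern.

// Build the list of page URLs from an observed pattern such as
// http://example.com/comic/1.html, http://example.com/comic/2.html, ...
function buildPageUrls(template, first, last) {
  const urls = [];
  for (let page = first; page <= last; page++) {
    urls.push(template.replace('{page}', String(page)));
  }
  return urls;
}

// Collect the src attribute of every <img> tag in an HTML string.
// A regular expression stands in for cheerio here; with cheerio this
// would be a selector query such as $('img') instead.
function extractImageUrls(html, pageUrl) {
  const imgSrc = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
  const urls = [];
  let match;
  while ((match = imgSrc.exec(html)) !== null) {
    // Resolve relative paths against the page they were found on.
    urls.push(new URL(match[1], pageUrl).href);
  }
  return urls;
}
```

For real pages a proper HTML parser such as cheerio is the safer choice, since a regular expression will miss or misread unusual markup.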
Difficulty
Node.js is asynchronous by nature, though, so simply starting the downloads inside a for loop will not work well. The key is controlling how the asynchronous tasks execute.
Writing the crawler is easy; handling the asynchrony is the hard part. Here I use ES7 async/await together with promises to solve the problem. You could also use an asynchronous control library such as the async module or eventproxy.
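To make the difference concrete, here is a minimal sketch of two ways to control the downloads with async/await. It assumes a stand-in fakeDownload function that simulates a request with a timer; the function names and the batching strategy are illustrative assumptions, not taken from the original code.

```javascript
// Sketch only: fakeDownload simulates a network request with a timer;
// a real crawler would call an HTTP client here instead.
function fakeDownload(url, delayMs, log) {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(url); // record the order in which downloads finish
      resolve(url);
    }, delayMs);
  });
}

// Awaiting inside for...of runs the downloads strictly one after another.
async function downloadInOrder(tasks, log) {
  for (const task of tasks) {
    await fakeDownload(task.url, task.delayMs, log);
  }
  return log;
}

// Promise.all on fixed-size batches gives limited concurrency:
// downloads within a batch run in parallel, batches run in sequence.
async function downloadInBatches(tasks, batchSize, log) {
  for (let i = 0; i < tasks.length; i += batchSize) {
    const batch = tasks.slice(i, i + batchSize);
    await Promise.all(batch.map((t) => fakeDownload(t.url, t.delayMs, log)));
  }
  return log;
}
```

Calling the download function in a plain loop without await would start every request at once, which is what the sequential and batched versions are meant to avoid.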
Core code, spider.js
(The spider.js listing is missing from this copy.)
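As a rough idea of what the download-and-save part of such a crawler might look like, here is a minimal sketch. It is not the original spider.js: fetchBuffer, saveImage and downloadAll are hypothetical names, and Node's built-in http, https and fs modules stand in for request-promise and mkdirp. It also assumes a newer Node.js than the 7.6 minimum (fs.promises and recursive mkdir need 10.12 or later).

```javascript
// Sketch only, not the original spider.js. All function names are
// hypothetical; built-in modules replace request-promise and mkdirp.
const fs = require('fs');
const path = require('path');
const http = require('http');
const https = require('https');

// Fetch a URL and resolve with the response body as a Buffer.
function fetchBuffer(url) {
  const client = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    client.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`GET ${url} failed with status ${res.statusCode}`));
        return;
      }
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => resolve(Buffer.concat(chunks)));
      res.on('error', reject);
    }).on('error', reject);
  });
}

// Download one image into dir, named after the last URL path segment.
// fetchFn can be swapped out, for example to test without a network.
async function saveImage(url, dir, fetchFn = fetchBuffer) {
  const fileName = path.basename(new URL(url).pathname);
  const target = path.join(dir, fileName);
  const data = await fetchFn(url);
  await fs.promises.writeFile(target, data);
  return target;
}

// Create the output directory, then download the images one at a time.
async function downloadAll(imageUrls, dir, fetchFn = fetchBuffer) {
  await fs.promises.mkdir(dir, { recursive: true });
  const saved = [];
  for (const url of imageUrls) {
    saved.push(await saveImage(url, dir, fetchFn));
  }
  return saved;
}
```

A call such as downloadAll(imageUrls, './download') would then resolve with the list of saved file paths once every image has been written.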
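Basic configuration
A crawler's complexity depends on the target site, and different tasks vary widely, so only a few commonly used variables have been extracted into config.js.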
(The config.js listing is missing from this copy.)
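For illustration, a config.js along these lines might collect the values that change from site to site. Every key and value below is a hypothetical placeholder chosen for this sketch, not the contents of the original file.

```javascript
// Sketch only: every key and value is a hypothetical placeholder,
// not the contents of the original config.js.
const config = {
  // URL pattern of the pages to crawl; {page} is replaced by a page number.
  pageUrlTemplate: 'http://example.com/comic/{page}.html',
  // First and last page numbers to visit (inclusive).
  startPage: 1,
  endPage: 10,
  // Directory where downloaded images are written.
  downloadDir: './download',
  // Pause between requests, in milliseconds, to go easy on the site.
  requestDelayMs: 500,
};

module.exports = config;
```

Keeping these values in one file means that pointing the crawler at a different site mostly comes down to editing the configuration rather than the crawling logic.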
Running the code
- Download the code I uploaded, koa-spider
- Run npm install, then npm start
Summary
Whether you are writing a crawler or some other program, a large part of working with Node.js is asynchronous processing, so learning Node.js means learning how to handle asynchrony.