Crawler tools that need no programming

Train Collector

URL: http://www.locoy.com/
Train Collector has been around for 13 years and is a veteran collection tool. Besides crawling, it can also handle data cleaning, data analysis, data mining, and visualization. It works with the vast majority of web pages: any content you can see on a page can be captured by writing collection rules.
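To make the idea of a collection rule concrete, here is a minimal sketch in Python. It does not show Train Collector's actual rule format; the field names, the regular expressions, and the sample HTML are all assumptions made for illustration. Each rule simply maps a field name to a pattern that picks one piece of visible content out of a page.

```python
import re

# Hypothetical rule set: each field name maps to a regex with one capture group.
# Real tools let you build such rules in a GUI; this only illustrates the idea.
RULES = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "price": r'<span class="price">(.*?)</span>',
}

# Made-up page content standing in for a downloaded web page.
SAMPLE_HTML = """
<html><body>
  <h1 id="name"> Blue Widget </h1>
  <span class="price">19.90</span>
</body></html>
"""


def apply_rules(html, rules):
    """Apply every rule to the page and return one record of extracted fields."""
    record = {}
    for field, pattern in rules.items():
        match = re.search(pattern, html, flags=re.DOTALL)
        # A rule that matches nothing yields None instead of raising an error.
        record[field] = match.group(1).strip() if match else None
    return record


record = apply_rules(SAMPLE_HTML, RULES)
print(record)
```

Regular expressions keep the sketch dependency-free, but they are fragile on real pages; an HTML parser with CSS or XPath selectors is usually the safer choice.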

Octopus

URL: https://www.bazhuayu.com/
Octopus is also a well-known collection tool. It comes in two versions: free collection templates and cloud collection (paid).

The free collection templates are really just ready-made content collection rules. They cover e-commerce, local services, social media, and forum sites, and are very easy to use. Of course, you can also define your own custom tasks.

So what is cloud collection? Once you have configured a collection task, you can hand it over to Octopus's cloud to run. Octopus has 5,000 servers in total, and collecting concurrently across multiple cloud nodes is far faster than collecting locally. It can also switch automatically between multiple IP addresses, so that a blocked IP does not interrupt collection.

In many cases, automatic IP switching and cloud collection are the real keys to automated collection.
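The two ideas above, concurrent collection and automatic IP switching, can be sketched in a few lines of Python. This is not how Octopus implements its cloud. The proxy addresses, the URLs, and the `fetch` stand-in (which performs no real network request) are all hypothetical, and threads on one machine stand in for separate cloud nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

# Hypothetical proxy pool and task list, for illustration only.
PROXIES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
URLS = [f"https://example.com/item/{i}" for i in range(6)]


def fetch(url, proxy):
    """Stand-in for a real HTTP request routed through the given proxy."""
    return {"url": url, "proxy": proxy}


def collect(urls, proxies, workers=3):
    """Fetch all URLs concurrently, rotating through the proxies round-robin."""
    # Pair each URL with the next proxy; cycle() starts over when the pool runs out.
    jobs = list(zip(urls, cycle(proxies)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input jobs.
        return list(pool.map(lambda job: fetch(*job), jobs))


results = collect(URLS, PROXIES)
for row in results:
    print(row["url"], "via", row["proxy"])
```

Round-robin rotation spreads requests evenly across the pool, so no single address sends enough traffic to stand out. A real crawler would also drop proxies that start failing.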

GooSeeker

This tool is fully visual and requires no programming. The whole collection process is WYSIWYG: crawl results, error messages, and so on are all shown in the software. Unlike Octopus, GooSeeker has no concept of a workflow. Users only need to decide what data to capture, and GooSeeker handles the workflow details itself.

Its drawback is that it has no cloud collection feature: all crawlers run on the user's own computer.

Updated: 2019-12-31

Origin blog.csdn.net/Enjolras_fuu/article/details/103778746