Which frameworks do Python crawlers usually use? An introduction to five common ones!

  Which framework should a Python crawler use, and which ones are any good? Knowing the answer helps you get more out of Python development, and for large enterprises the choice of framework matters a great deal. So what Python crawler frameworks are there? Here is an introduction to five common ones.

  1. Scrapy: Scrapy is an application framework written for crawling website data and extracting structured data. It can be used in a range of programs for data mining, data processing, or archiving historical data. It is a very powerful crawler framework and easily covers simple page crawling, for example when the URL pattern of the target pages is clearly known. With this framework you can comfortably scrape data such as product information on Amazon. For somewhat more complex pages, however, such as pages on Weibo, it will not meet the need on its own. Its characteristics: built-in support for selecting and extracting data from HTML and XML sources; a set of reusable filters (Item Loaders) that can be shared between spiders; and built-in support for intelligently processing the crawled data.
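The core idea behind Scrapy, turning raw HTML into structured items, can be sketched with nothing but the standard library. The sketch below is illustrative only and is not Scrapy's API (in real Scrapy you would subclass `scrapy.Spider` and use response selectors instead of a hand-written parser):

```python
from html.parser import HTMLParser

# Illustrative sketch of structured extraction, the kind of work
# Scrapy's selectors do for you. Not Scrapy's actual API.
class ProductTitleParser(HTMLParser):
    """Collect the text of every <h2 class="title"> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())

html = '<div><h2 class="title">Widget</h2><h2 class="title">Gadget</h2></div>'
parser = ProductTitleParser()
parser.feed(html)
print(parser.titles)
```

A framework like Scrapy wraps this pattern in declarative CSS/XPath selectors and handles scheduling, retries, and pipelines around it.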

  2. PySpider: pyspider is a powerful web crawler system implemented in Python. From a browser-based interface you can write scripts, schedule them, and watch crawl progress and results in real time; crawl results are stored in any of several common back-end databases, and you can also set up scheduled tasks and task priorities.
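The task-priority scheduling that pyspider offers can be illustrated with a minimal priority queue built on the standard library's `heapq`. This is an illustrative sketch with made-up names (`CrawlScheduler`, `add_task`), not pyspider's API; pyspider implements the same idea with persistent queues and a web UI on top:

```python
import heapq
import itertools

class CrawlScheduler:
    """Minimal crawl scheduler: lower number = higher priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def add_task(self, url, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def next_task(self):
        priority, _, url = heapq.heappop(self._heap)
        return url

sched = CrawlScheduler()
sched.add_task("https://example.com/page", priority=2)
sched.add_task("https://example.com/", priority=0)      # crawl the homepage first
sched.add_task("https://example.com/about", priority=1)

order = [sched.next_task() for _ in range(3)]
print(order)
```

The counter is there so that two tasks with the same priority come out in the order they were added, rather than comparing URLs.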

  3. Crawley: Crawley crawls the content of a given website at high speed, supports both relational and non-relational databases, and can export data as JSON, XML, and other formats.
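Exporting crawled records to JSON or XML, as Crawley can, is easy to show with the standard library alone. Again an illustrative sketch, not Crawley's API; the record fields are invented for the example:

```python
import json
import xml.etree.ElementTree as ET

# Records a crawler might have extracted.
records = [
    {"url": "https://example.com/a", "title": "Page A"},
    {"url": "https://example.com/b", "title": "Page B"},
]

# JSON export: a single json.dumps call.
json_out = json.dumps(records, indent=2)

# XML export: build an element tree, then serialize it.
root = ET.Element("pages")
for rec in records:
    page = ET.SubElement(root, "page", url=rec["url"])
    page.text = rec["title"]
xml_out = ET.tostring(root, encoding="unicode")

print(json_out)
print(xml_out)
```

A framework adds value on top of this by mapping the same records to database tables or collections without extra code.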

  4. Portia: an open-source visual crawler tool that lets users crawl a website without any programming knowledge. Simply annotate the pages you are interested in, and Portia will create a spider to extract data from similar pages. In short: it is built on the Scrapy engine; content is selected visually, requiring no development expertise; and pages with dynamic content are matched against the same template.

  5. Grab: Grab is a Python framework for building web scrapers. With Grab you can build web crawlers of any complexity, from a simple five-line script to complex asynchronous crawlers that handle millions of pages. Grab provides an API for performing network requests and processing the received content, for example interacting with the DOM tree of an HTML document.
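"Interacting with the DOM tree" means querying the parsed document rather than the raw text. As a rough stdlib-only illustration (not Grab's API, and assuming a well-formed document so `xml.etree` can parse it; Grab itself exposes the document through its own selector methods):

```python
import xml.etree.ElementTree as ET

# Document body as it might arrive after a network request.
body = """
<html>
  <body>
    <div class="price">19.99</div>
    <div class="price">4.50</div>
  </body>
</html>
"""

# Parse once into a tree, then query it with an XPath-style
# expression instead of string matching on raw HTML.
tree = ET.fromstring(body)
prices = [float(div.text) for div in tree.findall(".//div[@class='price']")]
print(prices)
```

Real pages are rarely well-formed XML, which is why frameworks ship tolerant HTML parsers behind the same tree-query interface.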

  These are the five common mainstream Python crawler frameworks. Each of the five is different, and you can decide which fits based on your own needs and the practical scenario at hand.

Origin blog.51cto.com/14596632/2456086