Python Distributed Crawler Framework Scrapy, Part 1-1: Introduction

In the era of artificial intelligence, data comes first. With the arrival of big data, more and more services are built on data, and almost all of that data is acquired by web crawlers and then extracted into a normalized form.

This series of blog posts explains how to use Scrapy to build a distributed crawler and how to set up a site search engine with Elasticsearch and Django. The goal is to give readers the ability to acquire data for themselves while also deepening their networking and programming knowledge.

The series is organized as follows:

  1. Environment setup and foundational knowledge
  2. Crawling real data
  3. Breaking through anti-crawler measures with Scrapy
  4. Advanced Scrapy
  5. Distributed crawling with scrapy-redis
  6. Building a search engine with Elasticsearch and Django
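The key idea behind step 5 is that all spider instances share one request queue and one deduplication set, typically stored in Redis, so no two machines crawl the same URL. A minimal sketch of request fingerprinting and deduplication, using a plain Python set in place of Redis (this is an illustrative stand-in, not scrapy-redis itself):

```python
import hashlib


def request_fingerprint(url: str, method: str = "GET") -> str:
    """Produce a stable fingerprint for a request, similar in spirit
    to the dupefilter fingerprints scrapy-redis keeps in a Redis set."""
    return hashlib.sha1(f"{method} {url}".encode()).hexdigest()


class DupeFilter:
    """In-memory stand-in for the shared Redis set: every spider
    instance would consult the same set before scheduling a request."""

    def __init__(self):
        self.seen = set()

    def request_seen(self, url: str) -> bool:
        """Return True if the request was already scheduled; record it otherwise."""
        fp = request_fingerprint(url)
        if fp in self.seen:
            return True
        self.seen.add(fp)
        return False
```

Because the real set lives in Redis rather than in process memory, any number of crawler processes on any number of machines can share it, which is what makes the crawl distributed.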


What this series offers you:

  • The technologies and web-analysis skills needed for crawler development
  • An understanding of how Scrapy and all of its components work and how to use them, along with the scrapy-redis distributed crawler
  • An understanding of how the open-source distributed search engine Elasticsearch works and how to use it
  • Experience using Django to quickly build a website that achieves an effect similar to Baidu's search
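The search-engine half of the series rests on the inverted index, the core data structure Elasticsearch maintains for you: a mapping from each term to the documents that contain it. A toy illustration in plain Python, for intuition only (Elasticsearch adds text analysis, relevance scoring, and distribution on top of this):

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Build an inverted index: map each term to the set of
    document ids whose text contains that term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents containing every query term (AND search)
    by intersecting the posting sets of the terms."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Looking up a term is a single dictionary access rather than a scan over all documents, which is why full-text search over millions of pages can answer in milliseconds.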

Origin blog.csdn.net/liujh_990807/article/details/100026570