Python crawler | A roundup of resources from across the web

Follow the public account "Swordsman Algorithm Jianghu" to get more content one step ahead.

With the rapid development of artificial intelligence and big data, every industry is changing by the day. The internet carries a vast amount of information, and crawler technology plays a key role in extracting and using it effectively. This article collects and curates crawler tutorials from across the web, working step by step from the basics to the Scrapy framework.

Getting started: a detailed tutorial on Python crawler basics

A detailed tutorial on Python crawler basics https://blog.csdn.net/m0_53602804/article/details/124204500

Crawler introduction, classification, and uses

A brief introduction to crawlers https://blog.csdn.net/qq_46601384/article/details/126411941

robots protocol

Robots protocol for web crawlers https://blog.csdn.net/sk_berry/article/details/110498687
Introduction and detailed explanation of the web crawler exclusion protocol robots.txt
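
As a small illustration of the robots protocol covered by the links above, the sketch below checks whether a crawler may fetch a URL using `urllib.robotparser` from the standard library. The robots.txt content, the `MyCrawler` user agent, and the example.com URLs are all made up for illustration; a real crawler would load the file from the target site instead of an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "MyCrawler" is an assumed user-agent name; it matches the "*" rule group.
allowed_public = parser.can_fetch("MyCrawler", "https://example.com/index.html")
allowed_private = parser.can_fetch("MyCrawler", "https://example.com/private/data.html")

print(allowed_public, allowed_private)
```

Checking `can_fetch` before every request is one simple way to keep a crawler within the rules a site publishes.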

Basic use of urllib

Learning the basics of urllib for Python crawlers https://blog.csdn.net/weixin_51624761/article/details/125793217
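
To give a feel for what the urllib tutorial above covers, here is a minimal sketch that builds a query string and a request object with a custom User-Agent header. The URL, query parameters, and User-Agent string are assumptions chosen for illustration, and the request is only constructed, not sent, so the example runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameters, for illustration only.
BASE_URL = "https://example.com/search"
params = {"q": "python crawler", "page": 1}

# urlencode escapes the values and joins them into a query string.
query = urlencode(params)

# An assumed User-Agent value; many sites reject the default Python one.
headers = {"User-Agent": "Mozilla/5.0 (compatible; DemoCrawler/0.1)"}
req = Request(f"{BASE_URL}?{query}", headers=headers)

print(req.full_url)
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would actually send the request; that step is left out here on purpose.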

re module

The re module in the Python standard library https://blog.csdn.net/m0_54510474/article/details/119392699

Regular expressions

Regular expressions - detailed version + common expressions
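
As a quick example of how the re module and regular expressions are typically used in a crawler, the sketch below pulls `href` values out of an HTML snippet. The HTML string and the pattern are illustrative assumptions; for real pages, a parsing library such as lxml or BeautifulSoup is usually more robust than a regex.

```python
import re

# A made-up HTML fragment standing in for a downloaded page.
HTML = '<a href="https://example.com/a">A</a> <a href="/b?page=2">B</a>'

# Capture everything between the double quotes of an href attribute.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def extract_links(html):
    """Return all href values found in the given HTML text."""
    return HREF_PATTERN.findall(html)


links = extract_links(HTML)
print(links)
```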

Persistent storage of crawler data

Persistent storage for crawlers https://blog.csdn.net/liaojsgtcg/article/details/120979546

requests module

The requests module for crawlers https://www.cnblogs.com/12345huangchun/p/10461211.html

Advanced requests module

Advanced usage of the requests module for crawlers https://www.cnblogs.com/supery007/p/8303472.html

Unstructured Data Crawling

Crawling unstructured data with Python and downloading it locally https://www.cnblogs.com/foolangirl/p/14164631.html

User-Agent and proxy IP

User-Agent and IP proxies in crawlers https://www.codenong.com/cs106834522/

Using lxml, BeautifulSoup, and pyquery

Using crawler parsing libraries (lxml, BeautifulSoup, pyquery) https://blog.csdn.net/weixin_46287157/article/details/116432393

Simulated login with cookies

Simulating login with cookies https://www.cnblogs.com/maplethefox/p/11360356.html

Dealing with JS-based anti-crawling

A hands-on guide to JS reverse engineering: CSS offsets https://blog.51cto.com/xingag/5342685

Data loaded dynamically via Ajax

Crawling dynamically loaded content: an Ajax crawling example https://blog.csdn.net/m0_61791601/article/details/125889849

JSON module

Python crawler basics: data persistence, an introduction to the json and csv modules https://blog.csdn.net/weixin_62853513/article/details/123362153
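
To illustrate the kind of data persistence the tutorial above introduces, the following sketch writes a few scraped records to both JSON and CSV with the standard library and reads them back. The records, field names, and file names are invented for the example, and a temporary directory is used so nothing is left behind in the working directory.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical scraped records; real crawlers would build these from pages.
records = [
    {"title": "Example page", "url": "https://example.com/"},
    {"title": "Another page", "url": "https://example.com/other"},
]


def save_json(rows, path):
    """Write the rows as a JSON array, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)


def save_csv(rows, path):
    """Write the rows as CSV with a header line."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url"])
        writer.writeheader()
        writer.writerows(rows)


out_dir = Path(tempfile.mkdtemp())
json_path = out_dir / "pages.json"
csv_path = out_dir / "pages.csv"
save_json(records, json_path)
save_csv(records, csv_path)

# Read both files back to confirm the round trip.
with open(json_path, encoding="utf-8") as f:
    loaded_json = json.load(f)
with open(csv_path, encoding="utf-8", newline="") as f:
    loaded_csv = [dict(row) for row in csv.DictReader(f)]

print(loaded_json == records, loaded_csv == records)
```

JSON keeps nested structures and types, while CSV is flat and stores every value as text, so the choice usually depends on how the data will be consumed later.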

Selenium with PhantomJS and chromedriver

Selenium for Python crawlers (getting started with Selenium, chromedriver, PhantomJS) https://blog.csdn.net/hwwaizs/article/details/119929286

Multi-threaded and multi-process crawlers

Multi-threaded crawlers in Python https://www.cnblogs.com/chenyangqit/p/16594946.html
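
As a rough sketch of the multi-threaded approach covered above, the example below runs several "downloads" concurrently with `concurrent.futures.ThreadPoolExecutor`. The `fake_fetch` function is a hypothetical stand-in that sleeps instead of making a network request, and the URLs are invented, so the example runs offline; a real crawler would replace it with an actual HTTP call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Made-up URLs for illustration.
URLS = [f"https://example.com/page/{i}" for i in range(5)]


def fake_fetch(url):
    """Stand-in for a network request: wait briefly, return (url, size)."""
    time.sleep(0.05)  # simulates I/O latency
    return url, len(url)


def crawl_all(urls, max_workers=4):
    """Fetch all URLs on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, urls))


results = crawl_all(URLS)
for url, size in results:
    print(url, size)
```

Threads suit crawling because most of the time is spent waiting on I/O; CPU-heavy parsing is where a process pool tends to help more.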

Scrapy framework

A detailed explanation of the Scrapy crawler framework https://blog.csdn.net/m0_67403076/article/details/126081516
Using the Scrapy framework for Python web crawlers https://zhuanlan.zhihu.com/p/98507774