A collection of Python web crawler tools and libraries

If you spend much time on GitHub, you have probably heard of the famous awesome repository: https://github.com/sindresorhus/awesome.

This repository is a real treasure trove. It gathers resources, tools, and libraries for almost every area of technology, including platforms, programming languages, front-end development, back-end development, big data, data science, databases, security, hardware, DevOps, and pretty much anything else that comes to mind.

Take the Platforms category as an example. It is subdivided further into iOS, Android, Linux, macOS, JVM, and so on, and each of these is its own repository whose name starts with awesome, such as:

  • awesome-linux: https://github.com/inputsh/awesome-linux

  • awesome-android: https://github.com/JStumpp/awesome-android

  • awesome-macOS: https://github.com/iCHAIT/awesome-macOS

Each sub-repository, in turn, collects nearly all the resources, tools, and libraries for its field.

In other words, the awesome repository (https://github.com/sindresorhus/awesome) is the root, and awesome sub-repositories for individual fields and topics branch off from it, each collecting the resources and tool libraries for its field. They are maintained and contributed to jointly by programmers around the world.

Really full of treasures!

You may be wondering by now: is there an awesome repository for web crawlers? There is!

awesome-web-scraping

This is it: https://github.com/lorien/awesome-web-scraping

It collects resources on web crawling and lists of tool libraries, not only for Python but also for Go, Ruby, JavaScript, PHP, and other languages.

It also lists commercial crawling services, console tools, headless browsers, CAPTCHA-solving sites, and more.

Take the Python section as an example: it collects request libraries, parsing libraries, data processing libraries, and more.

I won't list them all here, but it is large and comprehensive, isn't it?
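To give a rough feel for what a parsing library does with a page once a request library has downloaded it, here is a minimal sketch that uses only Python's standard library (html.parser and urllib.parse), not any particular library from the list. The sample HTML, the LinkCollector class, and the extract_links helper are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical sample page; in a real crawler a request library
# would download this text from the network.
SAMPLE_HTML = """
<html><body>
  <a href="/lorien/awesome-web-scraping">awesome-web-scraping</a>
  <a href="https://github.com/sindresorhus/awesome">awesome</a>
  <a>no href here</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors that have no href attribute
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all link targets found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML, "https://github.com/")
for link in links:
    print(link)
```

The dedicated parsing libraries in the list typically offer more convenient selectors and better tolerance of malformed HTML, but the basic job is the same: turn raw page text into structured data.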

awesome-web-scraping Chinese version

The awesome series repositories also have smaller branches divided by language, such as Chinese, Japanese, and Russian versions. For example, awesome-windows has a Chinese version: https://github.com/Awesome-Windows/Awesome.

Many other awesome repositories also have Chinese versions, such as:

  • awesome-android Chinese: https://github.com/jobbole/awesome-android-cn

  • awesome-ios Chinese: https://github.com/jobbole/awesome-ios-cn

awesome-web-scraping has a Chinese version too: https://github.com/Germey/AwesomeWebScraping.

It is a translation of the original repository and is likewise organized by programming language, so the tool libraries for Python, JavaScript, and other languages are all there. Within each language there are many categories, such as request libraries, crawling frameworks, parsing libraries, natural language processing, and message queues.

GitHub link:

https://github.com/Germey/AwesomeWebScraping
