Python3 crawler video tutorial

The following is a repost of a blog post; it is very useful.

Hello everyone, it's been more than two years since I last blogged. You probably came here to read more posts about web crawlers. First of all, thank you very much for your support; I hope my posts have been helpful to you!

Previously, I wrote a series of articles on Python crawlers, the Python crawler learning series, which covered both basic and advanced topics. At the time, I mostly used urllib and regular expressions. More articles were added over time, and what I accumulated during my own learning gradually grew into an informal set of tutorials. More and more readers have been learning from them and supporting me, which makes me very happy. Thank you again!

However, these tutorials have some general problems:

  1. They were written in Python 2. When I started, the Scrapy framework did not yet support Python 3 and some Python 3 crawler libraries were not very mature, so Python 2 was the natural choice at the time. Since then, Python 3 has developed rapidly, its crawler libraries have matured, and Python 2 will stop being maintained in the near future, so my focus has gradually shifted to Python 3, which I believe will become the mainstream. The old series is therefore somewhat outdated, and I suspect many of you are looking for Python 3 tutorials.

  2. Because I was mainly using urllib and regular expressions when I wrote those articles, most of their space was devoted to urllib and regex. Some more advanced libraries were added later, and the advanced framework usage was never explained in depth, so the series as a whole feels top-heavy and poorly organized. Moreover, distributed crawling is becoming more and more popular and will only see wider application, and the old series never covered it systematically.

  3. Some operations were not introduced comprehensively, and the environment setup did not take the various platforms into account, so some readers may have been confused or gotten stuck at a certain step without knowing what to do next.

To address these problems, I recently spent nearly a month recording a new set of Python3 crawler video tutorials, reorganizing and integrating my previous crawler experience. Everything is written in Python3, and the course is explained more systematically: from environment setup and the basic libraries through real-world cases and framework usage, all the way to distributed crawlers.

The course content is as follows:

1. Environment

  • Python3+Pip environment configuration
  • MongoDB environment configuration
  • Redis environment configuration
  • MySQL environment configuration
  • Python multi-version coexistence configuration
  • Installation of common libraries for Python crawlers (see the sanity-check sketch after this list)
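
Once the environment is set up, a quick sanity check is worth running. The following is only a minimal sketch, assuming the common libraries were installed with pip (requests, pymongo, redis, pymysql) and that MongoDB, Redis, and MySQL are running locally on their default ports; adjust hosts and credentials to your own setup.

```python
# Minimal environment sanity check -- a sketch, not part of the course.
# Assumes requests, pymongo, redis and pymysql were installed with pip,
# and that MongoDB/Redis/MySQL run locally on their default ports.
import sys

import requests
import pymongo
import redis
import pymysql

print("Python version:", sys.version)

# Network / Requests check.
print("HTTP status:", requests.get("https://httpbin.org/get", timeout=5).status_code)

# MongoDB check (default port 27017).
mongo = pymongo.MongoClient("mongodb://localhost:27017/", serverSelectionTimeoutMS=2000)
print("MongoDB version:", mongo.server_info()["version"])

# Redis check (default port 6379).
print("Redis ping:", redis.StrictRedis(host="localhost", port=6379).ping())

# MySQL check -- user/password are placeholders for your own credentials.
conn = pymysql.connect(host="localhost", user="root", password="password")
print("MySQL version:", conn.get_server_info())
conn.close()
```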

2. Basics

  • Basic principles of crawlers
  • Basic use of the urllib library
  • Basic use of the Requests library (a minimal sketch follows this list)
  • Regular expression basics
  • BeautifulSoup in detail
  • PyQuery in detail
  • Selenium in detail
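
As a taste of what this basics module covers, here is a minimal sketch combining the Requests library with BeautifulSoup; the target URL is just a placeholder, not one of the course's examples.

```python
# A minimal Requests + BeautifulSoup sketch. The URL is a placeholder;
# substitute any page you are allowed to scrape.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/", timeout=10)
response.raise_for_status()  # fail loudly on HTTP errors

soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.string)        # the page title
for a in soup.find_all("a"):    # every link target on the page
    print(a.get("href"))
```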

3. Hands-on Practice

  • Use Requests + regular expressions to scrape Maoyan movies (a sketch of the pattern follows this list)
  • Analyze Ajax requests to scrape Toutiao
  • Use Selenium to drive the browser and scrape Taobao food product listings
  • Use Redis + Flask to maintain a dynamic proxy pool
  • Use proxies to handle anti-crawling measures and scrape WeChat articles
  • Use Redis + Flask to maintain a dynamic cookie pool
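
To illustrate the first case, the Requests + regular expression approach generally looks like the sketch below. The URL, headers, and pattern are illustrative placeholders, not the actual Maoyan markup.

```python
# A sketch of the Requests + regular-expression scraping pattern.
# URL and regex are placeholders, not the real Maoyan page structure.
import re
import requests

headers = {"User-Agent": "Mozilla/5.0"}  # many sites reject the default UA
html = requests.get("https://example.com/board", headers=headers, timeout=10).text

# Suppose each item is rendered as: <p class="name">Some Title</p>
pattern = re.compile(r'<p class="name">(.*?)</p>', re.S)
for title in pattern.findall(html):
    print(title.strip())
```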

4. Frameworks

  • Basic use of the PySpider framework, with a hands-on TripAdvisor scraping project
  • PySpider architecture overview and usage details
  • Installing the Scrapy framework
  • Basic use of the Scrapy framework (a minimal spider sketch follows this list)
  • The Scrapy command line in detail
  • Usage of selectors in Scrapy
  • Usage of Spiders in Scrapy
  • Usage of Item Pipeline in Scrapy
  • Usage of Downloader Middleware in Scrapy
  • Hands-on: crawling Zhihu user information with Scrapy
  • Scrapy + cookie pool to crawl Sina Weibo
  • Scrapy + Tushare to crawl Weibo stock data
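
For readers new to Scrapy, a minimal spider looks roughly like this. It is only a sketch against the public practice site quotes.toscrape.com, far simpler than the course's Zhihu and Weibo cases.

```python
# quotes_spider.py -- a minimal Scrapy spider sketch.
# Run with: scrapy runspider quotes_spider.py -o quotes.json
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # CSS selectors, as covered in the selectors lesson above.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow pagination; Scrapy schedules the next request for us.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)
```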

5. Distributed Crawling

  • Scrapy distributed crawling principles and Scrapy-Redis source code analysis (a settings sketch follows this list)
  • Building a distributed Scrapy architecture to crawl Zhihu
  • Scrapy distributed deployment in detail
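
The heart of the Scrapy-Redis approach is that a few settings swap Scrapy's in-process scheduler and duplicate filter for Redis-backed ones, so several crawler processes on different machines can share one request queue. A minimal configuration sketch follows; these are the standard scrapy_redis settings, and the Redis URL is a placeholder for your own instance.

```python
# settings.py -- minimal scrapy-redis configuration sketch.

# Redis-backed scheduler: all crawler processes pull from one shared queue.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate request fingerprints in Redis instead of local memory.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and fingerprints across runs, so crawls can pause/resume.
SCHEDULER_PERSIST = True

# Location of the shared Redis instance -- a placeholder, point it at yours.
REDIS_URL = "redis://localhost:6379"
```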

The whole course starts from zero, beginning with environment setup and the basics. The environment installation part covers all three major platforms. In the hands-on part I explain as I write the code, and the process of building a distributed crawler is also walked through.

However, this course is paid. It distills the experience and hard work I have accumulated since I started learning crawlers, and in the lectures I also share the ideas and approaches behind my crawler work, so you can avoid some detours. I hope everyone can support it!

That said, there are some free videos, which are part of the full course and can be watched directly:

Hands-on walkthrough of three major Python3 crawler cases

http://www.meimei689.cn/

The full video course is hosted on Tianshan Intelligence. If you are interested, you can buy it there for 499 yuan.

The course link is below:

Do it yourself and want for nothing! Python3 Web Crawler Hands-on Cases (自己动手,丰衣足食!Python3网络爬虫实战案例)

http://www.gg4493.cn/

 
