Earn 6,000 yuan a month after learning web crawling for one month? Don't be fooled: a veteran tells you the real state of crawling!

Here is something that really happened the other day, and the reason I wrote this article:

A few days ago, a follower told me that someone from a training institution had said he could start taking freelance orders within a month of learning crawlers, and had urged him to sign up for the institution's crawler course, promising that the tuition would be earned back quickly. Perhaps because I talk with my followers a lot, he came to me and asked whether this was true. I was speechless...

Can you really earn more than 6,000 yuan a month taking orders after learning crawlers for one month? Countless people can write crawlers these days, and a novice is supposed to earn 6,000 a month after a single month of study?

Trying to stay objective, I did not jump to a conclusion even though I didn't believe it, and looked at their curriculum first. The result was just as I expected: most of the course covered introductory Python (functions and so on), requests, and XPath. Wait, isn't that all junior crawler material? You can earn 6,000 a month with that? Why not just teach young people to grab money on the street?

I have been doing side jobs for extra income for many years, and crawlers are naturally part of that. So today I will go through 5 in-depth questions about crawlers, so that you can understand what the field really looks like:

1. Can crawlers earn you an extra 6,000 yuan a month from freelance orders these days?

2. Junior crawler developers can only take small orders. What does the junior level look like?

3. Intermediate crawler developers are professional crawler engineers. What do they need to know?

4. Advanced crawler developers are the masters of the field. What technologies do they need to master?

5. What is there to learn beyond that? What does the pinnacle of crawling look like?


1. Can crawlers earn 6,000 extra a month?

The answer is definitely yes, but it depends on your crawler skill level.

If you are only a junior crawler developer, you can only get orders by luck. The crawlers you can produce may not impress the clients who place big orders. Most junior developers will not land orders above 200 yuan, and most of their orders are worth a few dozen yuan. How many orders would you need to earn 6,000 a month? Even at an average price of 100 yuan per order, you would still need 60 orders!

Anyone who has freelanced part-time knows that 60 private jobs a month is almost impossible unless you have special channels.
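To make the arithmetic above concrete, here is a tiny sketch. The function name `orders_needed` is my own, and the prices are the ones quoted in this section (a few dozen yuan, 100 yuan, and the 200 yuan ceiling).

```python
import math


def orders_needed(monthly_target: float, average_price: float) -> int:
    """Number of orders required to reach the target, rounded up."""
    return math.ceil(monthly_target / average_price)


# Prices a junior crawler developer typically sees, per the text above.
for price in (50, 100, 200):
    print(f"{price} yuan per order -> {orders_needed(6000, price)} orders a month")
```

Even at the 200 yuan ceiling, that is an order every single day of the month.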

Furthermore, leaving junior crawler developers aside, there are now many third-party websites that offer powerful crawling features. Even people who cannot write a crawler at all, product managers included, can solve the problem for a little money with tools such as a certain "claw fish" and a certain "descendant" collector. In both time and cost, that beats finding and paying a novice crawler developer.

As for the claim that a novice can earn 6,000 yuan a month after one month of learning crawlers, I dare to guarantee that it is just a way to get you to sign up for classes. This tactic is common in the online education industry, where the good and the bad are mixed together. I will give my conclusion directly: it is not worth the money. You will not make 6,000 a month from crawlers after you finish the course, and at that level you will not make much in a whole year either.

But if your skills are at the intermediate level or higher, it comes down to ability and luck. Technically, taking larger orders is no problem. A single order ranges from 300 to several thousand yuan. At an average of 600 yuan per order, four or five orders a month bring in a few thousand yuan without trouble. Those who push harder or have better skills may earn more. The precondition is that you really have the skill: puffing yourself up to look capable will end in a wreck.

Earning 6,000 yuan is possible, and I have done single orders worth several thousand yuan myself.

As for where to find orders, that is well-worn ground and I won't cover it here. Search Baidu, it has everything. Let's move on and see what the junior, intermediate, advanced, and peak levels of crawling look like!


2. Junior crawlers

From my understanding of crawlers over the years, the junior level looks roughly like this:


What can this level do? It can crawl some basic websites, but as soon as even a little anti-crawling is involved, it is game over.

For example, suppose we want to crawl an article from a website that has no anti-crawling mechanism. A library such as requests is enough to fetch the page. Use XPath, BeautifulSoup, PyQuery, or regular expressions to parse the page source, write the text out to a file, and you're done.

It is not very difficult: nothing more than a few method calls plus a loop that stores the results. To go a little further on storage, you can connect to MySQL, MongoDB, Elasticsearch, Kafka, and so on to persist the data, which makes later querying and processing more convenient.

This is the junior level. It can crawl, but it is still a long way from "if you can see it, you can crawl it", so you can imagine how hard it is to get orders. Basic as it is, it is a road you must travel when learning crawlers.
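Here is a minimal sketch of that fetch, parse, and save flow. To keep it runnable without installing anything, it uses the standard library (`urllib` and `html.parser`) as stand-ins for requests and XPath, which is my substitution, not what the tools named above look like. It also assumes the article title sits in an `<h1>` and the body in `<p>` tags, which is a hypothetical page layout.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ArticleParser(HTMLParser):
    """Collects the text of the first <h1> and of every non-empty <p>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p") and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # closing tag of a nested element such as <b> or <a>
        text = "".join(self._buffer).strip()
        if tag == "h1" and not self.title:
            self.title = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._current = None


def fetch_html(url: str) -> str:
    """Download a page; the User-Agent header mimics a browser."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_article(html: str):
    """Return (title, list of paragraph texts) for one page."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.paragraphs


def save_article(path: str, title: str, paragraphs) -> None:
    """Write the title, a blank line, then one paragraph per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(title + "\n\n")
        f.write("\n".join(paragraphs) + "\n")
```

A typical run would chain the three: `html = fetch_html(url)`, then `title, paragraphs = parse_article(html)`, then `save_article("article.txt", title, paragraphs)`, where `url` is whatever article page you are allowed to crawl.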
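As a rough illustration of persistent storage, the sketch below uses SQLite from the standard library as a stand-in for MySQL, since it needs no server. The table name `articles` and its three columns are assumptions of mine for illustration only.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               url   TEXT PRIMARY KEY,
               title TEXT NOT NULL,
               body  TEXT NOT NULL
           )"""
    )
    return conn


def save_articles(conn: sqlite3.Connection, articles) -> None:
    """Store (url, title, body) tuples; re-crawling a URL overwrites its row."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO articles (url, title, body) VALUES (?, ?, ?)",
            articles,
        )


def find_by_title(conn: sqlite3.Connection, keyword: str):
    """Return (url, title) pairs whose title contains the keyword."""
    rows = conn.execute(
        "SELECT url, title FROM articles WHERE title LIKE ? ORDER BY url",
        (f"%{keyword}%",),
    )
    return rows.fetchall()
```

Using the URL as the primary key means a repeated crawl updates the stored copy instead of piling up duplicates, which is usually what you want for later queries.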

So let's go back to the young man from earlier. Can a beginner learn all of the above in one month? I think it is quite hard. To say nothing of the rest, an introduction to Python alone covers a lot.

Studying 4 hours a day with no prior background, it may take you 2 weeks just to get a stable footing in introductory Python. In the remaining two weeks, can you finish and master the rest of the junior crawler material?

Chasing quick results is a serious mistake on the technical path. I know it only takes a few days to read a book from cover to cover, but can you use it afterwards? After reading, you cannot even remember what you read. You need repeated practice. Likewise, you can follow along with a course for a month without trouble, but whether the knowledge holds firm is another matter.

What's more, the courses at some institutions pick and choose what they cover.


3. Intermediate crawlers

The intermediate level can be regarded as the baseline for a professional crawler engineer. In addition to the junior-level knowledge, you should also master the following:

1. Crawling method

When requests stops being enough (what you fetch differs from what the web page displays), you should suspect that the data is loaded via Ajax, and you need to understand JavaScript to analyze the site. If you want to skip analyzing the Ajax calls and the JavaScript logic and still get the data, you have to use Puppeteer, Pyppeteer, Selenium, Splash, and the like to simulate a browser.
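When the data does come from Ajax, the usual move is to call the JSON interface behind the page directly. The sketch below shows the idea with the standard library only. The endpoint, the `page` and `size` parameters, and the `data.items` response shape are all hypothetical. On a real site you would read them from the browser's network panel.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_api_url(base_url: str, page: int, page_size: int = 20) -> str:
    """Build the paginated URL of a (hypothetical) JSON list interface."""
    return f"{base_url}?{urlencode({'page': page, 'size': page_size})}"


def fetch_json(url: str) -> dict:
    """Request the interface the way the page's own JavaScript would."""
    request = Request(
        url,
        headers={
            "User-Agent": "Mozilla/5.0",
            "X-Requested-With": "XMLHttpRequest",  # header many Ajax calls send
        },
    )
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_items(payload: dict):
    """Pull (id, title) pairs out of the assumed response shape."""
    items = payload.get("data", {}).get("items", [])
    return [(item["id"], item["title"]) for item in items]
```

For example, `extract_items(fetch_json(build_api_url(api_base, 1)))` would return the first page of results, with `api_base` being the interface address you found for the site in question.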

2. Crawling speed

Besides the crawling method, there is crawling speed. Here you need working knowledge of multiprocessing, multithreading, and coroutines.

3. Crawling apps

If you can only crawl web pages, you are not yet at the intermediate level. You also have to be able to crawl apps, which account for half of the landscape.

At this point you capture packets with Charles or Fiddler and then replay the requests in code. If the interface is encrypted, you can use mitmproxy to intercept the interface data directly, or use hooking, for example with Xposed, to get it.

Another important thing when crawling apps is automation. If you have to tap by hand to make the crawler work, no amount of money makes it worthwhile. That is no job for a human... Better solutions are the adb tool and Appium. Do you think you should learn them?
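To show why this matters, here is a small sketch that compares crawling pages one by one with crawling them in a thread pool. The function `fake_fetch` is a placeholder of mine that just sleeps to imitate network latency. In a real crawler it would be your download function. Multiprocessing and coroutines follow the same idea with different tools.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url: str, delay: float = 0.05) -> str:
    """Stand-in for a download: waits, then returns dummy content."""
    time.sleep(delay)
    return f"content of {url}"


def crawl_sequential(urls):
    """Fetch the pages one after another."""
    return [fake_fetch(url) for url in urls]


def crawl_threaded(urls, max_workers: int = 8):
    """Fetch the pages concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(16)]

    start = time.perf_counter()
    crawl_sequential(urls)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    crawl_threaded(urls)
    print(f"threaded:   {time.perf_counter() - start:.2f}s")
```

Threads help here because a crawler spends most of its time waiting on the network, so the waits overlap instead of adding up.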


4. Advanced crawlers

Senior crawler engineers have a big advantage both in their day jobs and in side work. At the advanced level you should master the following technologies:

1. Enterprise level crawler

Anyone who has worked with large-scale crawlers realizes that although multithreading, multiprocessing, and coroutines speed up crawling, the result is still a single-machine crawler, far inferior to a distributed one. Distributed crawlers can be regarded as enterprise-level crawlers.

The core of a distributed crawler is resource sharing, so you need to master RabbitMQ, Celery, and Kafka, and use these queues and components to distribute the work. Next comes the famous Scrapy framework, currently the most widely used crawler framework. Understanding and mastering Scrapy-Redis, Scrapy-Redis-BloomFilter, and Scrapy-Cluster is essential.

Once you have mastered these, your crawler can reach enterprise-level efficiency.
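One piece of that resource sharing is URL deduplication: every worker has to know which pages have already been crawled. A Bloom filter is a common way to do this in little memory. Below is a minimal pure-Python sketch of the data structure itself, not the implementation used by any of the libraries above. In a distributed setup the bit array would live in shared storage such as Redis rather than in local memory, and the sizes chosen here are arbitrary assumptions.

```python
import hashlib


class BloomFilter:
    """Compact set that may report false positives but never false negatives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def should_crawl(seen: BloomFilter, url: str) -> bool:
    """True the first time a URL is offered, False afterwards.

    Because of false positives, a never-seen URL is occasionally skipped.
    """
    if url in seen:
        return False
    seen.add(url)
    return True
```

The trade-off is deliberate: skipping a tiny fraction of new pages is usually acceptable in exchange for tracking a very large number of URLs in a fixed amount of memory.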

2. Techniques for dealing with anti-crawling

Another focus that should be considered at the advanced crawler level is anti-crawling.

The most common anti-crawling measure on web pages is the CAPTCHA: slider verification, picking out objects in images, addition and subtraction, and so on, with endless variations. You have to know how to handle these common CAPTCHAs.

IP detection is also common in anti-crawling. If you handle it badly, you get blocked, so countermeasures are necessary. Switching proxy IPs works, whether you use free or paid proxies.

Then there is traffic splitting, which keeps accounts from being banned. It requires building pools: a Cookies pool, a Token pool, a Sign pool, any of these will do. With a pool, the probability of being blocked drops. You don't want your WX account banned because you crawled official accounts, do you?
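The pool idea can be sketched in a few lines. The class below hands out items (proxies, cookies, or tokens) in round-robin order and retires any item that keeps failing. The class name `CredentialPool`, the `max_failures` threshold, and the string items are assumptions of mine. A production pool would also refill itself and usually sit in Redis so that all workers share it.

```python
class CredentialPool:
    """Round-robin pool of proxies, cookies, or tokens with failure tracking."""

    def __init__(self, items, max_failures: int = 3):
        self._order = list(items)
        self._failures = {item: 0 for item in self._order}
        self._cursor = 0
        self.max_failures = max_failures

    def get(self):
        """Return the next item that has not been retired."""
        for _ in range(len(self._order)):
            item = self._order[self._cursor]
            self._cursor = (self._cursor + 1) % len(self._order)
            if self._failures[item] < self.max_failures:
                return item
        raise RuntimeError("pool exhausted: every item has been retired")

    def report_failure(self, item) -> None:
        """Call when a request using this item was blocked or failed."""
        self._failures[item] += 1

    def report_success(self, item) -> None:
        """A successful request resets the item's failure count."""
        self._failures[item] = 0
```

Each request takes an item with `get()`, and the crawler reports back whether it worked, so blocked proxies or expired cookies drop out of rotation instead of dragging every request down.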


5. Higher-level crawlers (the peak of crawling)

For a higher level of crawler, the following 4 points are a must:

1. JS reverse engineering

Why learn JS reverse engineering? In the battle between anti-crawling and anti-anti-crawling, you can still crawl with Selenium and similar tools, but the efficiency is low. They simulate the entire page-rendering process, while the real data may sit behind one small interface. JS reverse engineering is therefore a higher-level crawling technique, especially for large websites such as Duoduo and Baobao. If you can pull their data down through JS reverse engineering, that is undoubtedly proof of excellent skill. But it is not something everyone can master, and it really does cost you hair.

Then there is app reverse engineering. If you can reverse both web pages and apps, you have earned the word "brilliant".

2. Intelligent crawler

What is an intelligent crawler? Normally, to crawl novel websites you have to write different extraction rules for each site to extract the content you want. With intelligent parsing, whatever the website, you only pass in the page URL and the algorithm identifies the title, content, update time, and so on, with no need to keep writing extraction rules.

In short, an intelligent crawler combines crawling with machine learning to make the crawler smarter. Otherwise, to crawl 10,000 websites, would we have to write 10,000 crawler scripts?

3. Crawlers and operations

Since when do crawlers have anything to do with operations? The two have always been inseparable. It is just that your crawler requirements or skill level have not yet reached the point where you need to think about it.

The link between crawlers and operations shows up mainly in deployment and distribution, data storage, and monitoring.

For example, how do you quickly deploy one crawler to 100 hosts? How do you monitor the memory and CPU usage of your crawlers? How do you set up alerting to keep crawler projects running safely?

Kubernetes, Prometheus, and Grafana are the technologies crawlers use most on the operations side, and I often rely on them to safeguard larger crawler projects.

4. The pinnacle of crawling

What is the pinnacle? There may never be one... As long as I have not achieved the "strong" hairstyle (complete baldness), I cannot claim to have seen it...

My vague feeling is that someone who has taken crawling to the extreme can not only work full stack but also do data analysis, may be strong in algorithms, and may even achieve something in artificial intelligence. Wouldn't that count as the pinnacle of crawling?

That's all for today. May you and I both become people at the top of the pyramid!

Origin blog.csdn.net/zhiguigu/article/details/119183756