Is it hard to teach yourself web crawling?

Preface

The barrier to entry for crawler technology is not high. If you want to teach yourself crawling from scratch, this article is a must-read!

As the saying goes, "To do a good job, you must first sharpen your tools." Python, with its powerful libraries and simple, approachable syntax, is a sharp tool for web crawling, and it is the recommended language to start with.

Wei Shidong is a senior crawler engineer and the author of *Python 3 Anti-crawler Principles and Bypass Practice*. Starting as a complete beginner in an Internet operations role, he taught himself Python and changed careers to become a senior crawler engineer.

How should you walk the crawler road? He said it is essential to plan ahead. Based on his own experience of learning crawling from scratch, he offers five suggestions for beginners. Hopefully they help.

Save this crawler "martial arts manual"!

1. From beginner to master, how many pitfalls are there in the middle?

Getting started with crawling is not hard, but crawling is a comprehensive technology, and it requires crawler engineers to have strong all-around abilities.

Not only do you need to understand data extraction and network requests, but you also need to understand the front end, the back end, mobile apps, and even desktop applications. Along the way, you need to overcome three difficulties.

JavaScript is the first. It complicates hands-on work through code obfuscation, parameter encryption, and event-driven operations that can only be triggered by mouse clicks, all of which require you to understand JavaScript.
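For instance, a site's obfuscated JavaScript often just computes a request signature. Once you have read the JS and worked out the algorithm, you can reproduce it in Python. The following is only a sketch of a made-up scheme (sorted parameters plus a secret, hashed with MD5); real sites each differ, so the function name, parameters, and secret here are all hypothetical:

```python
import hashlib

def sign_params(params: dict, secret: str) -> dict:
    # Sort keys, join them as key=value pairs, append the secret, then MD5:
    # a common shape for the signing routines found in obfuscated JS.
    base = "&".join(f"{k}={params[k]}" for k in sorted(params)) + secret
    signed = dict(params)
    signed["sign"] = hashlib.md5(base.encode("utf-8")).hexdigest()
    return signed

# The server would verify this "sign" field against the same computation.
signed = sign_params({"q": "python", "page": "1"}, "s3cret")
```

The real work is in reading the obfuscated JS to recover the exact recipe; the Python side is usually this simple once the recipe is known.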

Apps are the second difficulty. Besides code obfuscation and parameter encryption, an app may also be packed and hardened on the outside, so that even after reverse engineering it is hard to see its code.

Deep learning is the third difficulty. It is a technology currently being integrated into every major industry, and in crawling it is used for tasks such as CAPTCHA recognition and defeating font-based anti-crawling.

I have found that many people have an inexplicable fear of CAPTCHA recognition, JavaScript obfuscation, WebSocket, and font-based anti-crawling, feeling that these are problems too hard to solve.

In fact, once we understand how something works, we can find a way through. Crawling and anti-crawling are both applications of comprehensive knowledge. Merely knowing one anti-crawler implementation or one bypass technique is not enough; we should understand the underlying principles in depth, so that we can go further in a crawler engineer's career.

2. My enthusiasm only lasts three minutes; how do I persist?

Persistence is hard. Growing from a junior crawler engineer into a senior one means passing through many difficulties along the way. Guard against short-lived enthusiasm, and learn to set small, staged goals for yourself.

The first stage: build up basic knowledge, find a crawler-related job, and start practicing. At this stage, you can also try helping others in the community solve problems; the recognition and sense of accomplishment will give you motivation to keep going.

The second stage: as your workload keeps growing, you need to build up more knowledge and start engaging with crawling at a deeper level.

The third stage: Any crawler engineer will be exposed to anti-crawlers. While crawling other people's data, you must also prevent your own data from being crawled.

The fourth stage: Pursue the refinement and accuracy of data.

In the process of learning, you will certainly run into all kinds of practical problems. When that happens, read the documentation diligently and read more source code, or write up your own problem-solving process as technical articles; looking at a problem from a different perspective often makes the solution appear.

Let knowledge move from absorption to transformation: from ignorance, to understanding, to mastery. In addition, through your own technical output, crawling can also generate value and be converted into income.

You can write your technical journey into a book, or a blog, or make it into a live class. These can not only help other developers who are getting started, but also provide motivation for you to continue learning.

3. Why does it feel harder and harder?

Whether in the process of learning or at work, I always encounter all kinds of strange needs and anti-crawlers.

As a crawler engineer, you are destined to encounter strange needs and anti-crawlers. This is just like back-end R&D facing product managers and concurrency challenges, and more like martial arts students often competing against different opponents.

Encountering these will only make you stronger!

Although I kept learning and made some progress, I always felt that the challenges I encountered were getting more and more difficult.

If you encounter the above problems, it means that you are in a technical bottleneck period. The earlier the bottleneck period comes, the faster you make progress.

How to break through the bottleneck period?

Persist and keep learning: the best way is to hold on until you break through the bottleneck, even though it may feel uncomfortable. The best ways to solve technical problems are reading books and running experiments. If a problem you hit can be solved through study, quickly buy a book or a tutorial; if it cannot, then run more experiments.

A trick: sometimes, when a problem has stumped you for days, go out for a walk; you will come back with new ideas.

I am very busy at work and spend most of my time writing path-selection syntax (XPath, CSS selectors). I have very little time left to study and research, and progress seems difficult.

You must already be proficient in path-selection syntax. Try talking with your manager about reducing that workload; this kind of work is usually assigned to newly hired engineers or interns, which both helps them get familiar with the business quickly and frees the technical mainstays from repetitive work. Then spend more time researching hard technical problems.
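For reference, path-selection syntax in practice looks like this. The sketch below uses only the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset; in real crawler work, libraries such as lxml or parsel provide full XPath 1.0 plus CSS selectors (e.g. `ul.items li a`) with the same idea. The markup is a made-up example:

```python
import xml.etree.ElementTree as ET

# A product list as it might appear in a crawled page (well-formed XML here,
# since the stdlib parser is strict; lxml tolerates real-world HTML).
doc = ET.fromstring(
    '<ul class="items">'
    '<li><a href="/a">First</a></li>'
    '<li><a href="/b">Second</a></li>'
    '</ul>'
)

# ElementTree's findall() accepts a limited XPath subset.
hrefs = [a.get("href") for a in doc.findall(".//li/a")]
texts = [a.text for a in doc.findall(".//li/a")]
```

Writing these selectors is mechanical once learned, which is exactly why it tends to be handed to newcomers.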

4. Career path of a crawler engineer

If you are a crawler engineer, you are most likely a Python developer. From getting started with Python to becoming a crawler engineer, the general route is: Python developer - crawler beginner - junior crawler engineer...


Crawler jobs are mostly in first-tier cities. In data-driven companies, crawler engineers are valued more. From junior to senior crawler engineer, salaries range from 10k to 30k depending on responsibilities.


Crawler engineers face ever-changing web pages every day, which keeps the work fresh and challenging. Sometimes the job feels pretty good, and sometimes it does not, so whether or not to change careers can become a constant dilemma.

Rather than agonizing over it, it is better to take root in your current field and go deep vertically, without wavering. After all, changing careers midway means starting from scratch, possibly seeing your salary cut in half, and having to learn a new field's knowledge. Weigh the gains and losses carefully.

5. Are crawlers legal?

In 2010, software engineer Pete Warden received a cease-and-desist letter from Facebook for building a web crawler that collected Facebook data, and he stopped immediately. When asked why he complied with Facebook's demand, he said: "Big data is cheap, but legal fees are not."

Therefore, a principled crawler engineer must obey the robots protocol.

In the era of big data, many companies use web crawlers to collect public information. Although there is currently no law aimed squarely at crawlers, crawler engineers must still keep a red line in mind and never cross it. Otherwise, one careless step can take you from "getting started" straight to jail.

In daily crawling work, keep some precautions in mind: information involving personal privacy or corporate secrets must not be crawled, and neither may commercially exploited data, copyrighted data, or confidential information.

When doing crawling work, also control your crawler's request frequency: if the traffic your crawler generates exceeds one third of the site's traffic and causes problems, you will be held responsible.
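Both rules, honoring robots.txt and throttling request frequency, can be checked programmatically with the standard library's `urllib.robotparser`. This is a minimal offline sketch; the user-agent name and the policy lines are invented for illustration:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Normally: rp.set_url("https://example.com/robots.txt"); rp.read().
# Here we parse a sample policy directly so the sketch stays offline.
rp.parse([
    "User-agent: *",
    "Crawl-delay: 2",
    "Disallow: /private/",
])

allowed = rp.can_fetch("study-crawler", "https://example.com/articles/1")
blocked = rp.can_fetch("study-crawler", "https://example.com/private/x")
delay = rp.crawl_delay("study-crawler")  # sleep at least this long between requests
```

Checking `can_fetch` before every request and sleeping for the crawl delay keeps the crawler on the right side of both the robots protocol and the traffic rule of thumb.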

Also pay attention to where the data ultimately flows and whether it is used for illegal purposes. Illegally cracking someone else's product and then publishing the specific method is likewise not allowed.

In addition, not all data can be shared. While you are proficient in business, you should also pay attention to these legal issues to avoid causing trouble to yourself or the company.

If you are a crawler engineer, you may encounter all of the above problems at work. Although it is not difficult to get started with crawlers quickly, the most important thing is persistence, reading more source code and reading more documents. I hope that everyone who is getting started and learning crawlers can calm down, study seriously, and achieve breakthroughs.

Wang Guowei wrote in *Renjian Cihua* ("Poetic Remarks in the Human World"):

Throughout the ages, all who have accomplished great deeds and great learning must pass through three realms. "Last night the west wind withered the green trees; alone I climbed the high tower, gazing down the road to the end of the earth" is the first realm. "My clothes grow loose, yet I regret nothing; for her I have willingly grown haggard" is the second realm. "I searched for him in the crowd a hundred and a thousand times; then, turning back suddenly, I found him there where the lantern light was dim" is the third realm.

In other words, only after the first stage of climbing high to see far and learning from predecessors' experience, and the second stage of focused, persistent study toward a goal, can one reach the third stage of sudden insight and real achievement. Let us encourage one another!

-END-


1. Introduction to Python

The following content is the foundational knowledge required by every Python application direction. Whether you want to do crawling, data analysis, or artificial intelligence, you must learn it first. Everything advanced is built on a basic foundation; with a solid foundation, the road ahead will be steadier. All the materials can be obtained for free at the end of the article!

It includes:

Computer Basics


Python basics


600 episodes of introductory Python videos:

Watching zero-based learning videos is the fastest and most effective way to learn. Following the teacher's train of thought in the videos, from basic to in-depth, makes it easy to get started.

2. Python crawler

As a popular direction, crawling is a good choice, whether as a side job or as an auxiliary skill to improve your efficiency at work.

Through crawler technology we can collect, analyze, and filter relevant content to get the information we really need.

This work of collecting, analyzing, and integrating information can be applied in a very wide range of areas: whether for daily life services, travel, financial investment, or the product market demand of various manufacturing industries, crawler technology can be used to obtain more accurate and effective information.
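As a minimal illustration of a crawler's first step, the sketch below prepares a polite GET request with Python's standard urllib. The URL, query parameters, and User-Agent string are placeholders, and the actual network fetch is left out:

```python
from urllib import parse, request

def build_request(url: str, params: dict) -> request.Request:
    # Encode query parameters and identify the crawler via User-Agent
    # (the agent name here is made up for the example).
    full_url = url + "?" + parse.urlencode(params)
    return request.Request(full_url, headers={"User-Agent": "study-crawler/0.1"})

req = build_request("https://example.com/search", {"q": "python", "page": 1})
# Actually sending it would be: html = request.urlopen(req).read()
```

In practice most crawlers use higher-level libraries such as requests or scrapy, but the underlying shape (URL, parameters, headers) is the same.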


Python crawler video materials:


3. Data analysis

The "Digital Transformation of China's Economy: Talent and Employment" report released by Tsinghua University School of Economics and Management shows that the data analysis talent gap is expected to reach 2.3 million in 2025.

With such a huge talent gap, data analysis is like a vast blue ocean! A starting salary of 10K is genuinely commonplace.


4. Database and ETL data warehouse

Enterprises need to regularly move cold data out of the business database into a warehouse dedicated to storing historical data, from which each department can provide unified data services based on its own business characteristics. That warehouse is the data warehouse.

The traditional integrated processing architecture for data warehouses is ETL, built on the capabilities of an ETL platform: E = extract data from the source database; T = transform it, cleaning data that does not conform to the rules and computing statistics over tables at different dimensions and granularities according to business rules; L = load the processed tables into the data warehouse, incrementally or in full, on different schedules.
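The extract, transform, load steps above can be sketched in a few lines of Python. Everything here is a toy stand-in: the "source database" is a hard-coded list and the "warehouse" is a dict, but the shape of the pass is the same:

```python
def extract():
    # Stand-in for reading recent rows from the business database.
    return [
        {"city": "Beijing", "amount": "120"},
        {"city": "beijing", "amount": "80"},
        {"city": "Shanghai", "amount": "oops"},  # a record that fails validation
    ]

def transform(rows):
    # Clean non-conforming rows, normalize the dimension, aggregate per rules.
    totals = {}
    for row in rows:
        try:
            amount = int(row["amount"])
        except ValueError:
            continue  # drop data that does not conform to the rules
        city = row["city"].title()
        totals[city] = totals.get(city, 0) + amount
    return totals

def load(totals, warehouse):
    # Stand-in for writing the processed table into the data warehouse.
    warehouse.update(totals)

warehouse = {}
load(transform(extract()), warehouse)
```

Real ETL platforms add scheduling, incremental versus full loads, and monitoring on top of exactly this skeleton.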


5. Machine Learning

Machine learning means having a computer learn from part of the data and then predict and judge the rest.

The core of machine learning is "using algorithms to parse data, learn from it, and then make decisions or predictions about new data." In other words, the computer derives a model from the data it has and then uses that model to make predictions. The process is somewhat like human learning: once a person has gained some experience, they can make judgments about new problems.
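As a tiny concrete instance of "deriving a model from data and using it to predict," the sketch below fits a least-squares line to a handful of made-up points using only the standard library; real machine-learning work would use libraries such as scikit-learn:

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b: derive the model from the data.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

# "Learn" from known points, then predict an unseen one.
a, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])

def predict(x):
    return a * x + b
```

The fitted model here is just two numbers, but the loop is the same as in any machine-learning system: fit on known data, then apply the model to new inputs.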


Machine learning materials:


6. Advanced Python

From basic syntax to many deep, advanced topics and an appreciation of programming-language design: after studying this far, you will basically have covered all of Python's knowledge points from beginner to advanced.


At this point you can basically meet companies' hiring requirements. If you still do not know where to find interview materials and resume templates, I have compiled those here too. It really can be called a nanny-level, hand-holding, systematic learning route.

But learning programming does not happen overnight; it requires long-term persistence and practice. In putting together this learning route, I hope to progress together with everyone, and it also lets me review some technical points myself. Whether you are a programming newcomer or an experienced programmer looking to advance, I believe everyone can gain something from it.

Getting the materials

This complete set of Python learning materials has been uploaded to CSDN officially. If you need it, you can click the CSDN official certified WeChat card below to get it for free ↓↓↓ [100% free guaranteed]



Origin blog.csdn.net/weixin_49892805/article/details/134909686