What exactly is crawler technology? A brief introduction to the concept of crawlers

      This question puzzled me for a long time and left me confused about what a crawler actually is. Is it a physical tool? A 9.9 yuan item with free shipping from an online marketplace? Only after digging into the topic did the mystery finally lift.

      What is a crawler? Crawlers range from simple to complex. A simple crawler is just a script: a program that automatically collects information from the World Wide Web according to a set of rules.

     Scripts are crude but often very useful small programs (generally no more than a few thousand lines of code, sometimes just a few hundred). As a simple example, suppose you want to collect the listings for student apartments from a student rental website. Copying them by hand one at a time is clearly unrealistic, so you use a crawler, which can collect thousands of entries from a site in a single run. You can also think of search engines such as Baidu and Google as a kind of crawler, although that technology is far more complex than a simple script.
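     To make the rental example concrete, here is a minimal sketch of the "extract information according to rules" step, using only the Python standard library. The page layout (`<li class="listing">`), the sample HTML, and the names `ListingParser` and `extract_listings` are assumptions made up for illustration; a real site would need its own rules.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects the text of every <li class="listing"> element.

    The tag and class name are a hypothetical page layout.
    """

    def __init__(self):
        super().__init__()
        self.listings = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "listing") in attrs:
            self._inside = True
            self.listings.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.listings[-1] += data


def extract_listings(html):
    """Return the text of each listing found in the given HTML string."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.listings]


# Hypothetical page content. A real crawler would download it first, for
# example with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_PAGE = """
<ul>
  <li class="listing">Studio near campus, 1200/month</li>
  <li class="listing">Shared room, 600/month</li>
  <li class="ad">Sponsored</li>
</ul>
"""

print(extract_listings(SAMPLE_PAGE))
```

     The parser keeps only the elements that match the rule and ignores everything else on the page, which is essentially what a simple crawler script does for each page it visits.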

      How do search engines work? Put simply, they use web crawler technology to save tens of billions of web pages from the Internet to local storage, forming a mirror copy that provides the data behind the entire search engine.
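      The idea of following links and keeping a local copy of each page can be sketched in a few lines. This is only a toy illustration, not how any real search engine is built: the `crawl` function, the `fetch` callback, and the in-memory `FAKE_WEB` dictionary are all assumptions, chosen so that the example runs without touching the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that stores each page's HTML in a local 'mirror'.

    `fetch` is any function mapping a URL to its HTML, or None on failure.
    """
    mirror = {}
    queue = deque([start_url])
    while queue and len(mirror) < max_pages:
        url = queue.popleft()
        if url in mirror:
            continue  # already saved
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched
        mirror[url] = html
        parser = LinkParser()
        parser.feed(html)
        queue.extend(link for link in parser.links if link not in mirror)
    return mirror


# A tiny stand-in for the web: URL -> HTML (hypothetical data).
FAKE_WEB = {
    "/a": '<a href="/b">b</a> <a href="/c">c</a>',
    "/b": '<a href="/a">back</a>',
    "/c": "no links here",
}

mirror = crawl("/a", FAKE_WEB.get)
print(sorted(mirror))
```

      The `max_pages` limit keeps the crawl bounded. A production crawler would also need URL normalization, persistent storage, and the politeness rules discussed later in this article.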

      A technology like this immediately raises an important question that concerns everyone: is it illegal?

      After careful investigation, I concluded the following points:

        1. Comply with the Robots protocol (robots.txt). However, whether or not a site has a robots.txt file, that does not mean you can crawl it however you like.
        2. Limit your crawling behavior and never send requests at a frequency approaching that of a DDoS attack. If the server is brought down, it counts as a network attack.
        3. Do not force your way past obvious anti-crawling measures or into pages that cannot be reached under normal circumstances; doing so is hacking.
        4. Check carefully what content you have crawled, and never cross the legal red line.
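The first two points can be sketched with the standard library's `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, the URLs, and the helper names `build_rules` and `polite_fetch` are assumptions for illustration. The code only shows the technical side (checking the rules and pausing between requests); it says nothing about whether a particular crawl is legal.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would load it from the
# site with RobotFileParser.set_url(...) followed by read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_fetch(urls, rules, fetch, user_agent="example-bot",
                 default_delay=1.0, sleep=time.sleep):
    """Fetch only the URLs robots.txt allows, pausing after each request."""
    # Use the site's Crawl-delay if it declares one, else a default pause.
    delay = rules.crawl_delay(user_agent) or default_delay
    pages = {}
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt, so skip it
        pages[url] = fetch(url)
        sleep(delay)  # rate limit: never hammer the server
    return pages


rules = build_rules(ROBOTS_TXT)
waits = []  # record the pauses instead of really sleeping in this demo
pages = polite_fetch(
    ["https://example.com/index.html", "https://example.com/private/data.html"],
    rules,
    fetch=lambda url: "<html>placeholder</html>",  # stand-in for a real download
    sleep=waits.append,
)
print(list(pages), waits)
```

In real use you would drop the `sleep=waits.append` argument so that the crawler actually waits between requests.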

At this point you should understand that a crawler is not illegal in itself; whether it is legal depends on how and why you use it, including whether the crawled data is put to commercial use.

 

 


 

Origin blog.csdn.net/weixin_43730875/article/details/106919936