How to identify fake crawlers?

When we check website logs, we often encounter various crawlers. Some are legitimate, such as search engine crawlers (Baidu, Google, Bing, YandexBot, and so on), and others serve a variety of other purposes; you can view them here: list crawlers.

However, not all crawlers on the Internet are beneficial, and some imitate the characteristics of real crawlers in order to hide themselves as much as possible. These are fake crawlers: they forge the identity of a search engine crawler and grab the data of your website. Their User-agent looks the same as that of a search engine, but their IP address does not belong to the search engine. In this case, we need to accurately identify the IP addresses of these fake crawlers.

With the crawler IP query tool, we can easily identify fake crawlers. For example:

34.68.229.128 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

This is a simplified record from my log. The IP address comes first, followed by the User-agent of the visiting crawler. Judging by the User-agent, it is a Google search engine spider.
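The record layout described above (IP address first, User-agent after it) can be split mechanically. The following is a minimal sketch, assuming exactly that simplified "IP, space, User-agent" layout rather than a full web server log format; the names `parse_log_line`, `claimed_crawler`, and `CLAIMED_CRAWLERS` are hypothetical and exist only for this illustration.

```python
import ipaddress

# Hypothetical mapping from a User-agent token to the search engine it claims to be.
CLAIMED_CRAWLERS = {
    "googlebot": "Google",
    "bingbot": "Bing",
    "baiduspider": "Baidu",
    "yandexbot": "Yandex",
}


def parse_log_line(line):
    """Split a simplified record of the form '<ip> <user-agent>'."""
    ip_text, _, user_agent = line.strip().partition(" ")
    # Raises ValueError if the first field is not a valid IP address.
    ipaddress.ip_address(ip_text)
    return ip_text, user_agent


def claimed_crawler(user_agent):
    """Return the search engine the User-agent claims to be, or None."""
    lowered = user_agent.lower()
    for token, engine in CLAIMED_CRAWLERS.items():
        if token in lowered:
            return engine
    return None


record = (
    "34.68.229.128 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
ip, user_agent = parse_log_line(record)
print(ip, "claims to be:", claimed_crawler(user_agent))
```

This only tells us what the visitor claims to be. Whether the claim is true still has to be checked against the IP address.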

Querying this IP address in the crawler IP query tool shows that this is a fake Google spider.

We only need to enter the IP address of the suspected crawler to see information about it. In this way, whether a crawler is genuine or an impostor, it cannot escape our eyes.
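For readers who want to script a similar check, one common alternative to a lookup tool is a forward-confirmed reverse DNS check: resolve the IP address to a hostname, make sure the hostname belongs to the search engine, then resolve that hostname back and confirm it returns the same IP address. This is not how the query tool above works internally (the article does not say); it is a sketch of a different technique. The domain suffixes in `CRAWLER_DOMAINS` are assumptions that should be checked against each search engine's own documentation, and the function names are hypothetical.

```python
import socket

# Assumption: hostname suffixes used by each engine's crawlers.
# Verify these against the search engines' official documentation.
CRAWLER_DOMAINS = {
    "Google": (".googlebot.com", ".google.com"),
    "Bing": (".search.msn.com",),
    "Baidu": (".baidu.com", ".baidu.jp"),
    "Yandex": (".yandex.ru", ".yandex.net", ".yandex.com"),
}


def reverse_lookup(ip):
    """Return the PTR hostname for an IP address, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the IPv4 addresses a hostname resolves to (empty list on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_genuine_crawler(ip, engine, reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for a claimed search engine crawler."""
    suffixes = CRAWLER_DOMAINS.get(engine)
    if not suffixes:
        return False
    hostname = reverse(ip)
    if hostname is None:
        return False
    hostname = hostname.rstrip(".").lower()
    # The hostname must belong to the engine's own domain.
    if not hostname.endswith(suffixes):
        return False
    # A PTR record can be forged, so confirm it with a forward lookup.
    return ip in forward(hostname)
```

Calling `is_genuine_crawler("34.68.229.128", "Google")` performs real DNS queries, so the result depends on the network at the time of the call. The `reverse` and `forward` parameters exist so that the resolvers can be replaced, for example with cached lookups. Note that `socket.gethostbyname_ex` only returns IPv4 addresses, so this sketch does not handle IPv6 visitors.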

If we want to see more fake crawlers, we can go here: list crawlers fake bot, which catalogs common fake crawlers on the Internet.

Summary

This article introduced what a fake crawler is and how to use the crawler IP query tool to identify fake crawlers accurately.

 


Origin blog.csdn.net/oHuangBing/article/details/126073517