A First Look at the UA (Browser Identification) Issue in Python Crawlers Using requests.get

When I first started learning Python crawling from Cui Dashen's material, I ran into the UA (browser identification information) issue while using requests.get to fetch web pages. I didn't understand it at the time. The master only reminded us that a crawler must include it, without any further explanation, so in the end I just ignored it.

Today, while studying someone else's crawler code, I came across similar UA code and still didn't understand it. So I looked it up on Baidu and am sharing what I found below for the reference of fellow IT beginners.

1. What is a UA?

The User Agent, or UA for short, is a special string sent in the User-Agent request header. It lets the server identify the client's operating system and version, CPU type, browser and version, browser rendering engine, browser language, browser plug-ins, and so on.

For more details, see the User Agent entry in Baidu Encyclopedia (Baidu Baike).
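To make the definition concrete, here is a deliberately rough Python sketch of how a server might pull the platform, rendering engine and browser out of a UA string. The function name describe_user_agent and the regular expressions are my own illustration, not any library's API; real sites usually rely on a maintained UA-parsing library, because UA formats vary a lot.

```python
import re


def describe_user_agent(ua: str) -> dict:
    """Roughly split a UA string into platform, rendering engine and browser.

    Hypothetical, simplified illustration; it ignores many real-world formats.
    """
    # The first parenthesised group usually holds OS / CPU details.
    platform = re.search(r"\(([^)]*)\)", ua)
    # The engine token, e.g. AppleWebKit/537.36 or Gecko/20100101.
    engine = re.search(r"(AppleWebKit|Gecko|Trident)/([\d.]+)", ua)

    browser = "unknown"
    # Order matters: Edge UAs also contain "Chrome/", and Chrome UAs also
    # contain "Safari/", so the more specific tokens are checked first.
    for name, token in (("Edge", "Edg"), ("Firefox", "Firefox"),
                        ("Chrome", "Chrome"), ("Safari", "Version")):
        match = re.search(rf"{token}/([\d.]+)", ua)
        if match:
            browser = f"{name} {match.group(1)}"
            break

    return {
        "platform": platform.group(1) if platform else "unknown",
        "engine": f"{engine.group(1)} {engine.group(2)}" if engine else "unknown",
        "browser": browser,
    }


chrome_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
print(describe_user_agent(chrome_ua))
# {'platform': 'Windows NT 10.0; Win64; x64', 'engine': 'AppleWebKit 537.36', 'browser': 'Chrome 120.0.0.0'}
```

The sample UA is just a typical desktop Chrome string used for illustration; a UA with no parentheses or known tokens (for example one sent by a command-line tool) simply comes back as "unknown" fields.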

2. Why provide a UA when crawling (found via Baidu, reposted from others...)

  1. Using this identification, websites can show visitors different layouts to give them a better experience, or collect visitor statistics. For example, Baidu, Sina and other sites look different on a mobile phone than on a computer, because the site checks the visitor's UA and then handles the request differently.
  2. A client can forge the User-Agent to pose as a browser such as IE, Firefox, Opera, Maxthon, Chrome, Safari, or an iPhone/iPad browser, and so deceive the server about its identity. A server only knows a visitor is on an iPhone by reading the User-Agent, which of course can be faked.
  3. For SEO, there is a technique that checks the user-agent: if the visitor is a search-engine crawler, the full content is shown; otherwise the content is shown only to paying users. That is why some pages can be found through Google, but clicking the link shows "unregistered" or "not yet a member". Disguising the user-agent (for example, as the search engine's crawler) gets around this.
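The three uses above can be sketched from the server's side as follows. This is a hypothetical illustration: choose_layout, can_read_full_article and the keyword lists are made-up names, and real sites detect devices and crawlers far more carefully. The last lines show why point 2 works: the server only ever sees the string the client chooses to send.

```python
# Hypothetical keyword lists; real sites use far more thorough detection.
MOBILE_KEYWORDS = ("iphone", "ipad", "android", "mobile")
SEARCH_BOT_KEYWORDS = ("googlebot", "baiduspider", "bingbot")


def choose_layout(user_agent: str) -> str:
    """Point 1: pick a page layout based on the visitor's UA."""
    ua = user_agent.lower()
    return "mobile" if any(word in ua for word in MOBILE_KEYWORDS) else "desktop"


def can_read_full_article(user_agent: str, is_paying_member: bool) -> bool:
    """Point 3: show full content to search-engine crawlers (so it gets indexed)
    and to paying members; everyone else gets the 'not yet a member' page."""
    ua = user_agent.lower()
    is_search_bot = any(word in ua for word in SEARCH_BOT_KEYWORDS)
    return is_search_bot or is_paying_member


iphone_ua = ("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
             "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 "
             "Mobile/15E148 Safari/604.1")
desktop_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
# Point 2: the server only sees a string, so a crawler can simply claim to be Googlebot.
spoofed_bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print(choose_layout(iphone_ua))   # mobile
print(choose_layout(desktop_ua))  # desktop
print(can_read_full_article(desktop_ua, is_paying_member=False))      # False
print(can_read_full_article(spoofed_bot_ua, is_paying_member=False))  # True
```

Because a pure UA check like this trusts whatever the client sends, it is weak evidence on its own, which is exactly what makes the disguise in points 2 and 3 possible.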

3. How to find your own browser's UA

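  • 1. Enter about:version in the address bar of Chrome or another Chromium-based browser (recommended; I tested it myself and it works)
  • 2. Enter javascript:alert(navigator.userAgent) in the address bar (this didn't work for me. I suspected a network problem, but more likely the browser blocked it, since many browsers strip or block javascript: URLs typed or pasted into the address bar. Running navigator.userAgent in the browser's developer console works instead.)
  • 3. Use one of these two websites I found online (both tested and working):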

    http://www.useragentstring.com/     

    http://tools.jb51.net/table/useragent
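Once you have copied your browser's UA, you can pass it to requests.get through the headers argument. The sketch below builds a request without sending it, using a requests Session the same way requests.get does internally, to compare the default User-Agent (python-requests/<version>, which makes it easy for a server to recognize a script; presumably this is why the master insisted on adding a UA) with a browser UA. The UA string, the example.com URL and the helper name user_agent_that_would_be_sent are placeholders of my own.

```python
import requests

# Paste the UA you copied from about:version (or one of the sites above) here.
# The value below is only an example of a desktop Chrome UA.
MY_BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def user_agent_that_would_be_sent(url, headers=None):
    """Prepare (but do not send) a GET request through a Session, as
    requests.get does internally, and return its User-Agent header."""
    with requests.Session() as session:
        prepared = session.prepare_request(
            requests.Request("GET", url, headers=headers)
        )
    return prepared.headers["User-Agent"]


url = "https://www.example.com/"  # placeholder target
default_ua = user_agent_that_would_be_sent(url)
browser_ua = user_agent_that_would_be_sent(url, {"User-Agent": MY_BROWSER_UA})

print(default_ua)  # python-requests/<installed version>
print(browser_ua)  # the browser UA pasted above

# When actually crawling, pass the same headers to requests.get, e.g.:
# response = requests.get(url, headers={"User-Agent": MY_BROWSER_UA}, timeout=10)
```

Headers passed per request override the session defaults, so only the User-Agent changes; the other default headers (Accept, Accept-Encoding and so on) stay as requests sets them.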

