Crawler Request Basics

HTTP request process

Request

  • A request consists of four parts: the request method, the request URL, the request headers, and the request body
  • The most common request methods are GET and POST
  • The request URL is the address of the resource being requested
  • Common request header fields are as follows
    • Accept : specifies which content types (media types) the client can accept
    • Accept-Language : specifies the languages acceptable to the client
    • Accept-Encoding : specifies the content encodings acceptable to the client
    • Host : specifies the host (domain name or IP address) and port number of the requested resource
    • Cookie : carries the cookies the client has stored for the site, typically used to maintain a session
    • Referer : identifies the page from which the request was sent
    • User-Agent : UA for short, a special string that lets the server identify the client's operating system and version, browser, and other information. Adding this header to a crawler lets it pass as a browser; without it, the request is likely to be identified as coming from a crawler
    • Content-Type : the Internet media type (MIME type) of the request body
  • The request body generally carries the form data of a POST request. For a GET request, the request body is empty.
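
The four parts of a request can be sketched with Python's standard `urllib.request` module. This is only a minimal illustration, not a complete crawler: the `example.com` URLs, the form fields, and the header values are placeholder assumptions, and the requests are built and inspected but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values for illustration only (not from the original article).
login_url = "https://example.com/login"
form = {"username": "alice", "password": "secret"}

# Browser-like headers; the User-Agent string is just an example value.
base_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://example.com/",
}

# POST: the form data is URL-encoded and carried in the request body.
post_req = Request(
    login_url,
    data=urlencode(form).encode("utf-8"),
    headers={**base_headers, "Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# GET: no request body; parameters go in the URL query string instead.
get_req = Request(
    "https://example.com/search?" + urlencode({"q": "crawler"}),
    headers=base_headers,
    method="GET",
)

for req in (post_req, get_req):
    print(req.get_method(), req.full_url)
    # urllib stores header names capitalized as "User-agent".
    print("  User-Agent:", req.get_header("User-agent"))
    print("  body:", req.data)
```

Passing either object to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch runs without network access.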

Response

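  • The response is returned by the server to the client and consists of three parts: the response status code, the response headers, and the response body
  • The response status code indicates the result of the request, for example 200 for success and 404 for a resource that was not found; look up the other codes on your own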
  • Response headers
    • Date : identifies when the response was generated
    • Last-Modified : specifies when the resource was last modified
    • Content-Type : the document type, specifying the data type of the returned content
    • Set-Cookie : sets cookies, telling the client to store them and send them back in later requests
    • Expires : specifies the expiration time of the response
  • The response body holds the actual response data, such as the HTML source of a page
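
To make the three parts of a response concrete, the sketch below splits a raw HTTP/1.1 response into status code, headers, and body using only the standard library. The sample response text and the helper name `split_response` are assumptions made up for illustration; a real crawler would normally let an HTTP client library do this parsing.

```python
from email.parser import Parser

# A hand-written sample response (assumed for illustration,
# not captured from a real server).
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Last-Modified: Sun, 31 Dec 2023 12:00:00 GMT\r\n"
    "Set-Cookie: sessionid=abc123; Path=/\r\n"
    "Expires: Mon, 01 Jan 2024 01:00:00 GMT\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)


def split_response(text):
    """Split a raw HTTP/1.1 response into (status code, headers, body)."""
    # A blank line separates the head (status line + headers) from the body.
    head, _, body = text.partition("\r\n\r\n")
    status_line, _, header_block = head.partition("\r\n")
    # The status line looks like: "HTTP/1.1 200 OK"
    _version, code, _reason = status_line.split(" ", 2)
    # HTTP headers use the same "Name: value" layout as email headers.
    headers = Parser().parsestr(header_block, headersonly=True)
    return int(code), headers, body


status, headers, body = split_response(raw)
print("status:", status)
print("content type:", headers["Content-Type"])
print("cookie:", headers["Set-Cookie"])
print("body:", body)
```

Header lookup on the parsed message is case-insensitive, so `headers["content-type"]` and `headers["Content-Type"]` return the same value.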

