HTTP basic principles (web page request and response)

A request for a web page can be divided into the following parts:

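  1. Request URL: the address of the requested resource
  2. Request headers: additional information sent along with the request
  3. Request body: the data carried by the request
  4. Request method: the action to perform, such as GET or POST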
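
To see how these four parts fit together, here is a minimal sketch that assembles the text of an HTTP/1.1 request by hand. The host `example.com`, the path, the form field, and the helper name `build_raw_request` are all placeholders made up for illustration; a real crawler would normally let a library do this.

```python
from urllib.parse import urlsplit


def build_raw_request(method, url, headers, body=""):
    """Assemble the text of an HTTP/1.1 request from its four parts."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Request line: method + path taken from the URL + protocol version
    lines = [f"{method} {path} HTTP/1.1", f"Host: {parts.netloc}"]
    # Request headers, one "Name: value" pair per line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line separates the headers from the (optional) body
    return "\r\n".join(lines) + "\r\n\r\n" + body


raw = build_raw_request(
    "POST",
    "http://example.com/login?next=home",  # hypothetical URL
    {"Content-Type": "application/x-www-form-urlencoded", "Content-Length": "9"},
    body="user=demo",  # hypothetical form data, 9 bytes long
)
print(raw)
```

The method and the path from the URL end up on the first line, the headers follow one per line, and the body comes last after an empty line.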

The components of the request in more detail:

  1. Request URL: a URL (uniform resource locator) identifies a specific resource on the server. In other words, it tells the browser which resource you want it to fetch.
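  2. Request headers: headers give the server additional information to use when handling the request. Some of the more important request headers are:
    Cookie: maintains login state. When you open a site such as Youku and find that you are logged in without typing your account and password, that is the cookie at work.
    User-Agent: the user agent. Attaching this to your crawler lets the crawler pass itself off as a browser.
    Content-Type: the media type of the request; common values are text and JSON.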

    Request headers are an important part of the request. Most crawlers need to attach this information, though not all of them do: a simple crawler can work without setting any headers itself.

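  3. Request method: only the two most commonly used methods are introduced here:
    POST: mostly used to submit forms, which often contain sensitive information such as passwords; it can also handle file uploads. You could call it a rather low-key boss.
    GET: compared with POST, everything a GET request sends is visible in the URL.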
  4. Request body: generally speaking, the body goes with a POST request and carries the form data being submitted. Only that relatively low-key boss gets this kind of treatment, haha.
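
The difference between GET and POST, and the role of the headers, can be sketched with Python's standard `urllib`. This is only an illustration: the URL, the cookie value, the User-Agent string, and the form fields below are placeholders, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values used only for illustration
BASE_URL = "http://example.com/search"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",  # makes the crawler look like a browser
    "Cookie": "sessionid=abc123",  # placeholder session cookie
}

# GET: the parameters travel in the URL's query string
get_req = Request(BASE_URL + "?" + urlencode({"q": "http basics"}), headers=HEADERS)

# POST: the form data travels in the request body, encoded as bytes
form = urlencode({"username": "demo", "password": "secret"}).encode("utf-8")
post_req = Request(
    BASE_URL,
    data=form,
    headers={**HEADERS, "Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# Inspect both requests without sending them
for req in (get_req, post_req):
    print(req.get_method(), req.full_url, req.data)
```

To actually send one of these requests you would pass it to `urllib.request.urlopen`, which is left out here so the sketch runs offline.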

Server response

The server's response can be divided into three parts:

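  1. Response status code: some commonly used status codes are:
    100: Continue. The server has received the first part of the request and is waiting for the rest.
    200: OK. The server processed the request successfully.
    202: Accepted. The server has accepted the request but has not processed it yet.
    204: No Content. The server processed the request successfully but returned no content.
    301: Moved Permanently. The page has moved for good.
    400: Bad Request. The server could not parse the request.
    401: Unauthorized.
    403: Forbidden. Access is denied.
    404: Not Found. The page could not be found.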

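  2. Response headers: a few common ones are:
    Content-Type: the format of the returned content. application/json means the content is JSON; text/html means it is an HTML document.
    Content-Encoding: the encoding applied to the response content (for example, gzip compression).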

  3. Response body: this is the big brother. What a crawler analyzes is the response body, that is, the body data returned after we send a request to the URL.
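
The three parts of a response can be pulled apart in a few lines of Python. The sketch below works on a made-up raw response string rather than a live server, and `parse_response` is a hypothetical helper written for this illustration; in practice an HTTP library does this splitting for you.

```python
import json
from http import HTTPStatus

# A made-up raw response, as a server might send it
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "Content-Encoding: identity\r\n"
    "\r\n"
    '{"title": "hello"}'
)


def parse_response(raw):
    """Split a raw HTTP response into status code, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The status line looks like: "HTTP/1.1 200 OK"
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


status, headers, body = parse_response(RAW_RESPONSE)
print(status, HTTPStatus(status).phrase)

# Use Content-Type to decide how to interpret the body
data = json.loads(body) if headers["content-type"] == "application/json" else body
print(data)
```

A crawler typically checks the status code first, then uses Content-Type to decide how to parse the body.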

Origin blog.csdn.net/weixin_47249161/article/details/113959980