[Review the old to learn the new] HTTP messages

HTTP messages are chunks of data sent between HTTP applications. These data blocks begin with some textual meta-information that describes the content and meaning of the message.

message flow

The passage of messages between clients, servers, and proxies is called message flow. HTTP uses the terms inbound and outbound to describe the direction of a transaction: messages travel inbound to the origin server and outbound back to the client. It also uses upstream and downstream to describe the direction of a single message: like water, every message flows from upstream to downstream, never the reverse, and the sender is always upstream of the receiver. Suppose there are two proxy servers between the client and the server. The client sends a request and the server responds, so the message flow is: client -> proxy 1 -> proxy 2 -> server -> proxy 2 -> proxy 1 -> client.
[Figure: message flow]

Components of a message

Each message contains a request from the client, or a response from the server. They consist of three parts: a start line describing the message, a header block containing attributes, and an optional body containing data.

The start line and header are line-separated ASCII text, each line terminated by a carriage return and a newline, which can be written as CRLF. The body can contain textual or binary data, or it can be empty.

Aside: on Unix systems, each line ends with only a newline, "\n"; on Windows, each line ends with a carriage return plus a newline, "\r\n"; on classic Mac OS, each line ends with only a carriage return, "\r" (modern macOS uses "\n" like Unix). A direct consequence is that a Unix or Mac file opened on Windows may display as a single line, while a Windows file opened on Unix or Mac may show an extra ^M symbol at the end of each line.
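As a small illustration of the three conventions, here is a minimal sketch that normalizes any of them to Unix newlines. It assumes the text has already been decoded into a Python string; the function name normalize_newlines is made up for this example.

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows (CR LF) and classic Mac (CR) line endings to Unix (LF)."""
    # Replace CR LF first so that its CR is not converted a second time.
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "unix\nwindows\r\nclassic mac\rend"
print(normalize_newlines(sample).split("\n"))
```

HTTP itself does not need this step, because the start line and headers always use CRLF regardless of the platform.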

All HTTP messages can be divided into two categories: request message and response message. The basic message structure of the two is the same, but the specific content is slightly different.
[Figure: basic structure of an HTTP message]

The format of the request message is as follows:

<method> <request-URL> <version>    // start line
<headers>                           // headers

<entity-body>                       // body

The format of the response message is as follows:

<version> <status> <reason-phrase>    // start line
<headers>                             // headers

<entity-body>                         // body
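The two formats can be sketched in code as follows. This is a minimal, non-validating parser written for illustration only: the function name parse_message, the host example.com, and the sample messages are assumptions, and a real parser would also handle repeated headers, folded lines, and malformed input.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (start_line_fields, headers, body)."""
    # The first empty line (CRLF CRLF) separates the header block from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    # The start line has three fields; the reason phrase may contain spaces.
    start_line = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


request = b"GET /test/index.text HTTP/1.1\r\nHost: example.com\r\n\r\n"
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
print(parse_message(request))
print(parse_message(response))
```

For the request, the three start-line fields are the method, request URL, and version; for the response they are the version, status code, and reason phrase.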

An overview of each item in the message format is given below.

method

The start line of a request begins with a method, which tells the server what to do. For example, in GET /test/index.text HTTP/1.1 the method is GET, and the request asks the server for the content of the file /test/index.text. Commonly used HTTP methods are:

| HTTP method | Description | Includes a request body? |
| --- | --- | --- |
| GET | Safe method. The server sends the named resource to the client. | No |
| PUT | Stores data from the client in a named server resource. | Yes |
| POST | Sends client data to a server gateway application. | Yes |
| DELETE | Deletes a named resource from the server. The client cannot be sure the deletion is carried out, because the HTTP specification allows the server to override the request without notifying the client. | No |
| HEAD | Safe method. Sends only the HTTP headers of the response for the named resource. Used to inspect a resource's headers without fetching the resource, to check whether an object exists, and to test whether a resource has been modified. | No |
| TRACE | Traces a message through the proxy servers to the server, to see whether and how the original message was changed along the way. | No |
| OPTIONS | Determines which methods can be used on the server. | No |

Not all browsers support all of the above methods, nor do all servers. A safe method is one that causes no action on the server: it changes no content and produces no side effects.
Although there are so many methods, almost all browser requests today use GET or POST. Why? Below is an unofficial answer from a forum user that I think explains it well, so I have excerpted it:

By definition, assuming that there is a resource group A, which contains the resources 1, 2, and 3, then

Get Resource A1: Get Url A1
Add Resource A4: Post Url A + RAWDATA 4
Modify Resource A4: Put Url A + RAWDATA 4
Delete Resource A2: Delete Url A2

In practice, however, most people do

Get Resource A1: Get Url A1
Add Resource A4: Post Url A + RAWDATA 4
Modify Resource A4: Post Url A4 + RAWDATA 4
Delete Resource A2: Get Url delete A1

Even Sina Weibo deletes with POST Url + access_token and an empty RAWDATA. Even more people implement deletion with GET plus a delete parameter. Whether that is because they do not know, do not understand, or do not want to use the proper methods, I cannot say.

Furthermore, the network environment is bad. In many cases port 80 on our networks is hijacked, and with second-tier ISPs the hijacking rate is almost 100%. After hijacking, some of these devices proxy your HTTP traffic, and they do not recognize other request methods. A proxy that does not recognize a method cannot identify and forward the request properly, so the other methods cannot be used reliably.

The hijacking devices of some second-tier ISPs accept only GET, POST, and CONNECT, and return 403, 500, or 502 for any other request method. Many cross-origin requests, for example, need OPTIONS, and they fail completely on such networks. The behavior is abnormal, and the returned status codes are wrong too: an unsupported request method should normally produce 400, 405, or similar. But there is little point in criticizing a device that is abnormal to begin with.
(The above answer is taken from: https://segmentfault.com/q/1010000007736404)
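The point about status codes can be made concrete with a small sketch of how a well-behaved server might answer an unsupported method: 405 Method Not Allowed, with an Allow header listing what it does accept. The SUPPORTED set and the respond_to function are hypothetical and stand in for a real server's dispatch logic.

```python
SUPPORTED = {"GET", "HEAD", "POST"}  # hypothetical set of methods this server implements


def respond_to(method: str):
    """Return (status, reason, headers) for a request method."""
    if method.upper() in SUPPORTED:
        return 200, "OK", {}
    # 405 tells the client that the method is not allowed for this resource;
    # the Allow header lists the methods that are.
    return 405, "Method Not Allowed", {"Allow": ", ".join(sorted(SUPPORTED))}


print(respond_to("GET"))
print(respond_to("OPTIONS"))
```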

version

The HTTP version used by the message has the format HTTP/<major>.<minor>, where the major version number (major) and the minor version number (minor) are both integers. The version number indicates the highest HTTP version the application supports, not merely the version of this particular message. An HTTP/1.2 application communicating with an HTTP/1.1 application should know that it cannot use any new HTTP/1.2 features, only features up to HTTP/1.1.
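Because major and minor are separate integers, versions must be compared number by number rather than as decimal fractions. The sketch below illustrates this under simple assumptions; parse_version and common_version are names invented for the example, and taking the lower of two versions is a simplification of real protocol negotiation.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Parse 'HTTP/<major>.<minor>' into a (major, minor) tuple of integers."""
    prefix, _, numbers = text.partition("/")
    if prefix != "HTTP":
        raise ValueError(f"not an HTTP version: {text!r}")
    major, _, minor = numbers.partition(".")
    return int(major), int(minor)


def common_version(mine: str, theirs: str) -> str:
    """Return the highest version both sides support, that is, the lower of the two."""
    major, minor = min(parse_version(mine), parse_version(theirs))
    return f"HTTP/{major}.{minor}"


print(common_version("HTTP/1.2", "HTTP/1.1"))
# Tuples compare field by field, so minor version 22 is newer than minor version 3.
print(parse_version("HTTP/2.22") > parse_version("HTTP/2.3"))
```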

Status code (status) and reason-phrase (reason-phrase)

Status codes and reason phrases tell the client what happened. The reason phrase is the human-readable version of the status code, which helps people read and interpret it. For example, when the server returns 304 Not Modified, the reason phrase Not Modified is more readable and memorable than the number 304.
The status code appears in the start line of the response message. Status codes are three-digit numbers grouped into classes, as shown in the following table:

| Range | Class |
| --- | --- |
| 100 ~ 199 | Informational. For example, 101 Switching Protocols: the server is switching to the protocol listed in the Upgrade header, as the client requested. |
| 200 ~ 299 | Success. For example, 200 OK: the request succeeded and the entity body contains the requested resource. |
| 300 ~ 399 | Redirection. For example, 301 Moved Permanently: the requested URL has been moved permanently, and the Location header of the response should contain the resource's current URL; 304 Not Modified: the resource has not changed, so there is no need to download it from the server again. |
| 400 ~ 499 | Client error. For example, 404 Not Found: the server cannot find the requested URL. |
| 500 ~ 599 | Server error. For example, 504 Gateway Timeout: the response came from a gateway or proxy that timed out while waiting for another server. |
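The classification in the table depends only on the first digit of the code, which the following sketch makes explicit. It uses Python's standard http.HTTPStatus to look up reason phrases; the CLASSES mapping and the describe function are illustrative names, not part of any library.

```python
from http import HTTPStatus

# Class names keyed by the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code: int) -> str:
    """Return 'code reason-phrase: class' for a status code."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that the standard library does not define have no known phrase.
        phrase = "(no standard reason phrase)"
    return f"{code} {phrase}: {category}"


for code in (101, 200, 304, 404, 504):
    print(describe(code))
```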

Caching
200 OK / 200 from cache / 200 from disk cache / 304 Not Modified

Headers

General headers
Some headers provide the most basic information about a message and are called general headers. They can appear in both request and response messages.

| Header | Description |
| --- | --- |
| Connection | Lets clients and servers specify options for the request/response connection. For example, to keep the connection between client and server open under HTTP/1.0, set Connection: keep-alive. |
| Date | Provides a date and time stamp indicating when the message was created. |
| Trailer | If the message uses chunked transfer encoding, this header lists the headers that appear in the trailer part of the message. |
| Transfer-Encoding | Tells the receiver what encoding was applied to the message so that it could be transferred reliably. |
| Via | Shows the intermediate nodes (proxies and gateways) the message has passed through. |
| Cache-Control | Carries caching directives along with the message. The 304 status code returned for browser requests is closely tied to Cache-Control. |
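To show how Transfer-Encoding and Trailer relate, here is a minimal sketch of a chunked body: each chunk is preceded by its size in hexadecimal, a zero-size chunk ends the body, and any trailer headers follow it. The function name encode_chunked and the trailer header X-Note are assumptions made for illustration; a sender would also set Transfer-Encoding: chunked and Trailer: X-Note in the header block.

```python
def encode_chunked(chunks, trailers=None) -> bytes:
    """Encode byte chunks as an HTTP/1.1 chunked body with optional trailers."""
    out = bytearray()
    for chunk in chunks:
        if chunk:  # skip empty chunks: a zero-size chunk would end the body early
            out += f"{len(chunk):x}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    out += b"0\r\n"  # the last chunk has size zero
    for name, value in (trailers or {}).items():
        out += f"{name}: {value}\r\n".encode("ascii")
    out += b"\r\n"  # an empty line closes the trailer part
    return bytes(out)


body = encode_chunked([b"Hello, ", b"world"], {"X-Note": "done"})
print(body)
```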

Request headers
Request headers are meaningful only in request messages. They describe who or what is sending the request, where the request originated, or the client's preferences and capabilities.

| Header | Description |
| --- | --- |
| User-Agent | Tells the server the name of the application making the request. For example, when the request is sent from Chrome on an iMac, the User-Agent may be Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36 |
| Accept | Tells the server which media types it may send. |
| Accept-Encoding | Tells the server which encodings it may send. |
| Accept-Language | Tells the server which natural languages it may send. |
| Cookie | Used by the client to pass a token to the server. User login is almost always implemented with cookies. Once set, cookies are sent automatically by the browser with each request. |
| Proxy-Connection | Same as the Connection header, but used when talking to dumb proxies (proxies that do not understand Connection: keep-alive). |

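The Referer request header
Referer is mainly used for analytics; Baidu Tongji, for example, uses Referer to track traffic sources and search keywords. The browser adds Referer automatically, and it survives a page refresh, but in the following cases the Referer header is not sent:
● The URL is typed in directly or opened from a browser bookmark
● The referring page uses the "file" scheme (local files) or a "data" URI
● The requested page uses an insecure protocol while the referring page uses the secure HTTPS protocol (taken from https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Headers/Referer; I am skeptical of this one because my test did not confirm it, see the results below)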
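The three cases listed above can be written down as a small decision function. This is only a sketch of the rules as listed, not of actual browser behavior: real browsers also apply a referrer policy, which can change the outcome. The function name sends_referer and the example URLs are assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit


def sends_referer(source_url: Optional[str], target_url: str) -> bool:
    """Apply the three listed rules; source_url is None for typed URLs and bookmarks."""
    if source_url is None:
        return False  # rule 1: no referring page exists
    source = urlsplit(source_url).scheme.lower()
    target = urlsplit(target_url).scheme.lower()
    if source in ("file", "data"):
        return False  # rule 2: local file or data URI
    if source == "https" and target == "http":
        return False  # rule 3: downgrade from HTTPS to HTTP
    return True


print(sends_referer("https://www.google.com.hk/", "https://example.com/"))
print(sends_referer(None, "https://example.com/"))
print(sends_referer("file:///tmp/index.html", "https://example.com/"))
print(sends_referer("https://www.google.com.hk/", "http://example.com/"))
```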

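Search Google for "风铃" (Fengling, whose domain is zhan.qq.com) and then visit zhan.qq.com: the request headers include a referer header with the value https://www.google.com.hk/, as shown below.

[Figure: visiting Fengling from a Google search]

Enter the URL zhan.qq.com directly in the browser: the request headers contain no referer header, as shown below.
[Figure: visiting by entering the URL directly]

Jump to the target site zhan.qq.com from a file protocol address: again the request headers contain no referer header, as shown below.
[Figure: file protocol address]

[Figure: jumping from the file protocol to Fengling]

Jumping from Google (secure HTTPS) to 智慧校园 (Smart Campus, insecure HTTP) still sends a referer. This does not match the statement at https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Headers/Referer that no referer is sent when "the requested page uses an insecure protocol while the referring page uses the secure HTTPS protocol". A likely explanation is that the referring page sets a referrer policy such as origin, which makes the browser send the origin even on an HTTPS to HTTP downgrade.
[Figure: jumping from the HTTPS protocol to the HTTP protocol]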

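Response headers
Response messages have their own set of headers, which give the client extra information, such as who is sending the response, the responder's capabilities, or even special instructions concerning the response.

| Header | Description |
| --- | --- |
| Age | How old the response is, measured from when it was first created |
| Server | Name and version of the server application software |
| Warning | A warning message more detailed than the reason phrase |
| Proxy-Authenticate | A list of challenges from the proxy to the client, specifying the authentication schemes used to gain access to resources on the proxy server |
| WWW-Authenticate | A list of challenges from the server to the client, specifying the authentication schemes used to gain access to the resource |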

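Entity headers

Entity headers provide a wide range of information about the entity and its content, from the type of the object to the request methods that are valid for the resource.

| Header | Description |
| --- | --- |
| Allow | Lists the request methods that can be performed on this entity |
| Location | Tells the client where the entity is actually located; used to direct the receiver to the location of the resource |
| Content-Length | The length or size of the body |
| Content-MD5 | An MD5 checksum of the body |
| Content-Type | The type of object the body contains |
| ETag | A version identifier for a particular resource; closely tied to the server returning a 304 status code |
| Expires | The date and time at which the entity is no longer valid and must be fetched again from the original source |
| Last-Modified | The date and time at which this entity was last modified |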

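Cache-Control serves the same purpose as Expires, and Last-Modified serves a similar purpose to ETag. Cache-Control and ETag were introduced in the newer HTTP/1.1 and take precedence over Expires and Last-Modified. In general, using one of the two pairs is enough. Cache-Control and ETag determine whether the server returns a 304 status code; for details see http://mp.weixin.qq.com/s/k-ZtFUG674V0WAdQs3li8Q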
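How ETag leads to a 304 can be sketched as follows. The client repeats the ETag it received earlier in an If-None-Match request header (a standard header not covered above), and the server answers 304 with an empty body when the tag still matches. The helper names make_etag and conditional_get, the SHA-256 based tag, and the max-age value are assumptions chosen for the example; servers are free to generate ETags differently.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a version identifier from the content (one of many possible schemes)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match=None):
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        # The client's copy is still current: send headers only, no body.
        return 304, headers, b""
    return 200, headers, body


status, headers, _ = conditional_get(b"hello")
print(status, headers["ETag"])
status, _, payload = conditional_get(b"hello", if_none_match=headers["ETag"])
print(status, payload)
```

If the resource changes on the server, the tags no longer match and the second request receives 200 with the new body.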

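For the difference between the request results 200 from cache and 304, see http://mp.weixin.qq.com/s/O6Ko7Sl3zsyzqGz9K_ORRw. The biggest difference is that with 200 from cache no request is sent to the server, whereas with 304 a request is sent but the resource file is not downloaded again.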

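The figure below is a partial screenshot of the requests made by the JD.com (www.jd.com) home page. It shows 200 from disk cache and 200 from memory cache, both of which evolved from the earlier from cache. For the difference between the two, see https://www.quora.com/What-is-the-difference-between-memory-cache-and-disk-cache-in-Chrome. In short, memory is much faster to read than disk, but the memory cache is cleared when the process exits (when the browser closes), whereas the disk cache persists. The browser therefore chooses between the two caches as appropriate: it first looks for the resource in memory, then on disk, and only then sends a request. At that point the server may return 304 or 200 OK, but never 200 from disk cache or 200 from memory cache.
[Figure: disk cache and memory cache]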
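The lookup order described above (memory, then disk, then network) can be sketched like this. It is a deliberately simplified model that ignores freshness checks and revalidation, and it is not how Chrome is actually implemented; the load function, the dictionary caches, and the fetch callback are all assumptions for illustration.

```python
def load(url, memory_cache: dict, disk_cache: dict, fetch):
    """Return (body, source), trying memory first, then disk, then the network."""
    if url in memory_cache:
        return memory_cache[url], "200 from memory cache"
    if url in disk_cache:
        memory_cache[url] = disk_cache[url]  # promote to the faster cache
        return disk_cache[url], "200 from disk cache"
    body = fetch(url)  # only now is a request actually sent
    disk_cache[url] = body
    memory_cache[url] = body
    return body, "200 OK"


memory, disk = {}, {}
fake_fetch = lambda url: b"<html>home</html>"  # stands in for a real network request

print(load("https://www.jd.com/", memory, disk, fake_fetch)[1])
print(load("https://www.jd.com/", memory, disk, fake_fetch)[1])
memory.clear()  # simulate closing the browser: memory is lost, disk survives
print(load("https://www.jd.com/", memory, disk, fake_fetch)[1])
```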


[Reference content]
1. David Gourley et al., "HTTP: The Definitive Guide" (Chinese edition), translated by Chen Juan and Zhao Zhenping. Posts & Telecom Press.
