A Quick Guide to the HTTP Protocol

Copyright: This is an original article; please do not reprint it without the author's permission. Given my limited ability and time, omissions and mistakes are inevitable, and corrections are welcome. Thank you. https://blog.csdn.net/lijing742180/article/details/88738419

Reference book: "Illustrated HTTP" (图解HTTP)
Reference blog: http://www.ruanyifeng.com/blog/2016/08/http.html

This article is mainly based on HTTP/1.1.

One, TCP/IP protocol suite

For computers to communicate with each other over a network, a common set of communication rules is needed across different hardware, software, and operating systems. Such a rule is called a protocol.

To enable global Internet communication, the IETF (Internet Engineering Task Force) developed a set of standard protocols for all the hardware and software involved. The set is collectively called the "TCP/IP protocol suite", or "TCP/IP" for short.

1, Layered model

[Figure: comparison of the TCP/IP four-layer model and the OSI model]

2, Data processing flow

As data passes down through the layers on the sending side, each layer adds its own header. On the receiving side, each corresponding layer strips its header again, until the original data is recovered.

[Figure: data flow through the TCP/IP four-layer protocol stack]

[Figure: data flow through the OSI seven-layer protocol stack]
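
The header handling described above can be sketched in a few lines of Python. This is only a toy model: the layer names follow the TCP/IP four-layer stack, but the bracketed tags such as `[TCP]` are made-up placeholders, not real header formats, and the function names `encapsulate` and `decapsulate` are assumptions for illustration.

```python
# Toy model of layered encapsulation. The bracketed "headers" are
# placeholders invented for this sketch, not real protocol headers.
LAYERS = ["HTTP", "TCP", "IP", "Ethernet"]  # top (application) to bottom (link)


def encapsulate(payload: bytes) -> bytes:
    """Sender side: each layer, from top to bottom, prepends its own header."""
    for layer in LAYERS:
        payload = f"[{layer}]".encode() + payload
    return payload


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: each layer, from bottom to top, strips its own header."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


frame = encapsulate(b"hello")
print(frame)               # b'[Ethernet][IP][TCP][HTTP]hello'
print(decapsulate(frame))  # b'hello'
```

The outermost header belongs to the lowest layer, which is why the receiver removes the headers in the reverse order of the sender.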

3, HTTP and IP, TCP, DNS

In the TCP/IP protocol suite, the three protocols most closely related to HTTP are IP, TCP, and DNS.

IP protocol:

  • Internet Protocol, located at the network layer.
  • Its role is to deliver data packets to the other party.
  • To make sure the data reaches the other party, two important conditions must be met: an IP address and a MAC address.
  • An IP address is the address assigned to a node; a MAC address is the fixed address that belongs to the network card. The two can be paired with each other.
  • An IP address can change, but the MAC address basically stays the same.

TCP protocol:

  • Located at the transport layer; provides a reliable byte stream service
  • Uses a three-way handshake (SYN, SYN/ACK, ACK) to establish the connection, so that data can reliably reach the target

[Figure: three-way handshake]

DNS service:

  • Domain Name System: an application-layer service responsible for domain name resolution, that is, translating domain names into IP addresses

[Figure: DNS name resolution]
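
As a rough illustration of the lookup step, the sketch below asks the operating system's resolver for the IPv4 addresses of a host name using `socket.getaddrinfo` from the Python standard library. The function name `resolve` is an assumption for illustration, and the addresses returned for a real domain depend on the network the code runs on.

```python
import socket


def resolve(hostname: str, port: int = 80) -> list:
    """Return the distinct IPv4 addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# A literal IP address needs no lookup and is returned unchanged.
print(resolve("127.0.0.1"))
# resolve("www.sina.com") would return that site's current IPv4 addresses.
```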

Two, HTTP protocol structure

  • Client: the party that requests access to resources such as text or images
  • Server: the party that provides resources in response
  • HTTP (HyperText Transfer Protocol): the protocol used to transfer data between the client and the server.

Every HTTP request and response follows the same format, consisting of two parts, a Header and a Body. The Body is optional.

1, HTTP request format

HTTP GET request format:

    GET /path HTTP/1.1
    Header1: Value1
    Header2: Value2
    Header3: Value3

HTTP POST request format:

    POST /path HTTP/1.1
    Header1: Value1
    Header2: Value2
    Header3: Value3

    body data goes here...

[Figure: HTTP request message]

  • Each header occupies one line.
  • An empty line, that is, two consecutive line breaks \r\n\r\n, marks the end of the Header section. Everything after it is the Body.
  • When the home page is requested, the URI is /
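
The rules above are enough to take a raw request apart. The following is a minimal sketch, assuming a well-formed request; the function name `parse_request`, the `Host` value, and the sample body are made up for illustration, and a real server would need far more validation.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request into (method, uri, version, headers, body)."""
    # The first empty line (\r\n\r\n) separates the headers from the body.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, uri, version = lines[0].split(" ", 2)  # request line
    headers = {}
    for line in lines[1:]:  # one header per line
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body


raw = (
    b"POST /path HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"key=value"
)
method, uri, version, headers, body = parse_request(raw)
print(method, uri, version)  # POST /path HTTP/1.1
print(headers["host"])       # www.example.com
print(body)                  # b'key=value'
```

Header names are lowercased here because HTTP header names are case-insensitive.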

URI and URL

  • URI: Uniform Resource Identifier, a string that identifies a particular Internet resource
  • URL: Uniform Resource Locator, which gives the location of a resource on the Internet; URLs are a subset of URIs

URI format:

    scheme://user:password@host:port/path?query#fragment

  • Login information (user name and password): optional
  • Server port number: optional when the default port 80 is used
  • Query string: passes parameters to the resource at the given file path, optional
  • Fragment identifier: marks a sub-resource (a location within the document) inside the retrieved resource, optional
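
To see these components in practice, the sketch below splits a URL with `urllib.parse.urlsplit` from the Python standard library. The URL itself, including the user name and password, is a made-up example.

```python
from urllib.parse import urlsplit

# Made-up example URL containing every optional component.
parts = urlsplit("http://user:pass@www.example.com:80/dir/index.htm?uid=1#ch1")

print(parts.scheme)    # http
print(parts.username)  # user
print(parts.password)  # pass
print(parts.hostname)  # www.example.com
print(parts.port)      # 80
print(parts.path)      # /dir/index.htm
print(parts.query)     # uid=1
print(parts.fragment)  # ch1
```

When a component is missing from the URL, `urlsplit` returns `None` for `username`, `password` and `port`, and an empty string for `query` and `fragment`.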

2, HTTP response format

    HTTP/1.1 200 OK
    Header1: Value1
    Header2: Value2
    Header3: Value3

    body data goes here...

[Figure: HTTP response message]

If the HTTP response contains a body, it is likewise separated from the headers by \r\n\r\n.

2.1 Response codes

Response codes fall into five classes. Each class is listed below with its common codes:

  • 1xx: Informational, the request has been received and processing continues
    • 100 Continue
    • 101 Switching Protocols
  • 2xx: Success, the request has been successfully received and understood
    • 200 OK: the client request succeeded
    • 201 Created: a resource has been created
  • 3xx: Redirection, further action is needed to complete the request
    • 301 Moved Permanently: the resource (web page, etc.) has been permanently moved to another URL
  • 4xx: Client error, the request has a syntax error or cannot be fulfilled
    • 400 Bad Request: the request has a syntax error and cannot be understood by the server
    • 401 Unauthorized: the request is not authorized
    • 403 Forbidden: the server received the request but refuses to serve it
    • 404 Not Found: the requested resource does not exist
  • 5xx: Server error, the server failed to fulfill a valid request
    • 500 Internal Server Error: an unexpected error occurred on the server
    • 503 Service Unavailable: the server cannot process the request right now, but may recover after a while
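
Because the first digit determines the class, a client can classify any code with integer division. The sketch below uses `http.HTTPStatus` from the Python standard library for the reason phrases; the `CLASSES` table and the `describe` helper are assumptions made for illustration.

```python
from http import HTTPStatus

# Class names keyed by the first digit of the response code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code: int) -> str:
    """Return 'code phrase: class' for an HTTP response code."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code not known to the standard library
        phrase = "(unregistered code)"
    return f"{code} {phrase}: {category}"


print(describe(200))  # 200 OK: Success
print(describe(404))  # 404 Not Found: Client error
print(describe(503))  # 503 Service Unavailable: Server error
```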

2.2 Content-Type and Content-Encoding

  • The type of the Body data is determined by the Content-Type header, not by the URL
    • Even if the URL is http://www.baidu.com/meimei.jpg, the content is not necessarily a picture.

    • Content-Type: text/html;charset=utf-8: the response content is a web page, encoded in UTF-8

    • Content-Type: image/png: the response content is a picture

  • Content-Encoding: gzip: the Body data is compressed, and the compression method is gzip
    • The data must be decompressed first to get the real content
    • Compression reduces the size of the Body and so speeds up network transfer
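
A client applies these two headers in order: first undo the Content-Encoding, then interpret the bytes according to the Content-Type. The sketch below is a simplified illustration; the function name `decode_body` and the UTF-8 fallback for text without a charset are assumptions, and only gzip is handled.

```python
import gzip


def decode_body(headers: dict, body: bytes):
    """Return (media_type, content); text is decoded to str, other types stay bytes."""
    # Step 1: undo the compression announced by Content-Encoding.
    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)

    # Step 2: use Content-Type (not the URL) to decide what the bytes are.
    media_type, _sep, params = headers.get("Content-Type", "").partition(";")
    media_type = media_type.strip().lower()
    if media_type.startswith("text/"):
        charset = "utf-8"  # assumed fallback when no charset is given
        for param in params.split(";"):
            key, _eq, value = param.partition("=")
            if key.strip().lower() == "charset" and value:
                charset = value.strip().strip('"')
        return media_type, body.decode(charset)
    return media_type, body


headers = {"Content-Type": "text/html;charset=utf-8", "Content-Encoding": "gzip"}
compressed = gzip.compress(b"<h1>hello</h1>")
print(decode_body(headers, compressed))  # ('text/html', '<h1>hello</h1>')
```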

3, Viewing HTTP packets

We can use various tools to watch the packets exchanged between the client and the server, which is commonly called "packet capture".

Here is a brief walkthrough of packet capture, using the Chrome browser as an example:

  • Enter www.sina.com in the Chrome address bar; the browser displays the Sina home page

  • Open Chrome - F12 - Network to view the communication between the client and the server

  • Click the record whose Name is www.sina.com; the right side shows Request and Response, and view source displays the raw content.

Chrome F12 panels:

  • Elements: page structure
  • Console: console output
  • Sources: resources
  • Network: records all communication between the browser and servers
    • Headers: displays the request headers, request body, and response headers
      • view source: shows the raw data actually exchanged between the browser and the server
    • Response: displays the content of the response body

4, HTTP request process

Taking a visit to the Sina home page as the example again, the HTTP request process can be summarized as follows:

  • 1, The client sends an HTTP request to the server
    • The request includes: method, URI, domain name, and other headers; if the POST method is used, it also includes a Body
  • 2, The server returns an HTTP response to the client
    • The response includes: response code, response type, other headers, and the response Body
  • 3, If the client needs other resources from the server (such as pictures), it sends another HTTP request, repeating steps 1 and 2.

How the browser processes the page:

  • When the browser reads the HTML source of the Sina home page, it parses the HTML and displays the page,
  • Then, following the links inside the HTML, it sends further HTTP requests to the Sina servers to fetch the corresponding pictures, videos, Flash, JavaScript, CSS, and other resources,
  • Finally it displays the complete page, which is why many additional HTTP requests appear under Network.
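
The second step, finding the extra resources a page refers to, can be sketched with `html.parser` from the Python standard library. This is only an approximation of what a browser does: the class name `ResourceCollector`, the function `find_resources`, and the choice of three tag/attribute pairs are assumptions, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs that would trigger additional HTTP requests."""

    # (tag, attribute) pairs considered in this sketch; browsers know many more.
    RESOURCE_ATTRS = {("img", "src"), ("script", "src"), ("link", "href")}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.RESOURCE_ATTRS and value:
                self.urls.append(value)


def find_resources(html: str) -> list:
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


page = (
    '<html><head><link rel="stylesheet" href="/style.css">'
    '<script src="/app.js"></script></head>'
    '<body><img src="https://n.sinaimg.cn/index/mid_article/images/ask.png"></body></html>'
)
for url in find_resources(page):
    print(url)  # each of these would become one more request in the Network panel
```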

HTTP is designed to be highly scalable. Although the browser requests the Sina home page http://www.sina.com, the resources linked from the returned HTML may live on other servers, such as <img src="https://n.sinaimg.cn/index/mid_article/images/ask.png">, which spreads the request load across multiple servers.

One site links to other sites, and many sites linking to each other form the WWW (World Wide Web).

Three, HTTP version history

Each entry gives the publication year, the HTTP version, a short explanation, and below it the main features:

  • 1990, HTTP/0.9: an informal standard
    • Only a simple GET method; the response can only be HTML
  • 1996, HTTP/1.0: first published as a formal specification, RFC 1945, and widely used
    • Supports the GET, POST, and HEAD methods
    • Richer responses, with header information and more added
    • Short connections
  • 1997, HTTP/1.1: the current mainstream version; revised in RFC 2616, which has since been superseded by RFC 7230-7235 and later RFC 9110-9112
    • Persistent (long) connections by default
    • Introduces pipelining
    • Uses chunked transfer encoding
  • 2015, HTTP/2: evolved from the SPDY protocol developed by Google
    • Low-latency transmission, binary framing, multiplexing, header compression, and so on
  • 2018, HTTP/3: evolved from the QUIC protocol developed by Google
    • Based on UDP, faster response

1, HTTP 1.0

The main drawback:

  • Short connections: one connection can carry only one request, so every request has to establish a new TCP connection
    • Establishing a TCP connection requires a three-way handshake and is subject to slow start, so this wastes resources and hurts performance

Solution:

  • Some HTTP 1.0 implementations add a Connection: keep-alive header to the request
    • This asks the server not to close the TCP connection so that other requests can reuse it. The server responds with the same field.
    • The connection stays open until the client or the server closes it.

However, this is not a standard field, and different implementations may behave inconsistently, so it is not a fundamental solution and was not widely supported.

2, HTTP 1.1

HTTP 1.1 introduced many optimizations and is by far the most widely used version, though most servers also support version 1.0.

2.1 Persistent connections

The biggest change is the introduction of persistent connections (HTTP persistent connection, also called HTTP connection reuse).

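Features:

  • TCP connections are not closed by default and can be reused by multiple requests
  • As long as neither end explicitly asks to disconnect, the connection stays open.
  • Once a TCP connection is established, multiple request and response exchanges can take place over it
  • This reduces the overhead of repeatedly establishing and tearing down TCP connections, lightens the server load, and makes web pages display faster.

Currently, most browsers allow up to 6 persistent connections to the same domain at the same time.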
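
Connection reuse can be observed with a small local experiment. The sketch below, an illustration rather than a benchmark, starts a throwaway HTTP/1.1 server on 127.0.0.1 with `http.server`, sends several requests through one `http.client.HTTPConnection`, and has the server report the client's TCP source port each time. The names `EchoPortHandler` and `client_ports` are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        # Reply with the client's TCP source port so that reuse is observable.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def client_ports(request_count: int = 3) -> list:
    """Send request_count GETs over ONE connection; return the source ports seen."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        ports = []
        for _ in range(request_count):
            conn.request("GET", "/")
            response = conn.getresponse()
            ports.append(int(response.read()))  # read fully before reusing the connection
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()
```

Calling `client_ports(3)` should return the same port number three times, because all three requests travel over a single TCP connection. With short connections, each request would arrive from a new source port.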

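2.2 Pipelining

On top of persistent connections, pipelining was also introduced.

Features:

  • Within the same TCP connection, the client can send multiple requests back to back without waiting for the responses to the earlier ones, which further improves efficiency.
    • In earlier versions, after sending a request the client had to wait for and receive the response before it could send the next one.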

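2.3 Content-Length field

With pipelining, one TCP connection carries multiple requests, and the server processes them and returns the responses in order.

  • To tell exactly which request each piece of response data belongs to, a Content-Length field is added to the response headers, declaring the length of the data in this response.
    • In version 1.0, when the browser found that the server had closed the TCP connection, it knew that all the data had been received.

Content-Length: 3495 means that the body of this response is 3495 bytes long; the bytes after that belong to the next response.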
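
The sketch below shows how a client could use Content-Length to cut a byte stream containing several back-to-back responses into individual responses. It is a simplified illustration: the function name `split_responses` is made up, the sample responses are invented, and a response without Content-Length is assumed to have an empty body.

```python
def split_responses(stream: bytes) -> list:
    """Split concatenated HTTP responses into (status_line, body) pairs."""
    responses = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")
        if not sep:
            raise ValueError("incomplete header section")
        lines = head.split(b"\r\n")
        length = 0  # assumption: a missing Content-Length means an empty body
        for line in lines[1:]:
            name, _colon, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if len(rest) < length:
            raise ValueError("incomplete body")
        responses.append((lines[0].decode("ascii"), rest[:length]))
        stream = rest[length:]  # the following bytes belong to the next response
    return responses


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 404 Not Found\r\nContent-Length: 4\r\n\r\ngone"
)
for status_line, body in split_responses(stream):
    print(status_line, body)
```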

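2.4 Chunked transfer encoding

Using the Content-Length field requires that the server know the length of the response data before it sends the response.

For time-consuming dynamic operations, or when transferring large amounts of data, this means the server has to wait until all the work is finished before it can send anything, which is clearly inefficient.

A better approach is to send each block of data as soon as it is produced, replacing "buffer mode" with "stream mode", so that the browser can render the page progressively.

Therefore, version 1.1 also introduced "chunked transfer encoding". When the headers of a request or response contain the following Transfer-Encoding field, the body consists of an undetermined number of chunks.

    Transfer-Encoding: chunked

Each non-empty chunk is preceded by a hexadecimal number giving the length of the chunk. Finally, a chunk of size 0 indicates that all the data for this response has been sent. For example: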

    HTTP/1.1 200 OK
    Content-Type: text/plain
    Transfer-Encoding: chunked

    25
    This is the data in the first chunk

    1C
    and this is the second one

    3
    con

    8
    sequence

    0
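
A decoder for this format can be sketched as follows. It assumes the exact wire format, in which every size line and every chunk ends with \r\n; in the example above, the first two chunk sizes (0x25 = 37 and 0x1C = 28) count a \r\n that is part of the chunk data itself. The function name `decode_chunked` is made up, and trailers after the last chunk are ignored.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode a chunked message body into the original payload."""
    payload = b""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]  # ignore chunk extensions
        size = int(size_field, 16)  # chunk sizes are hexadecimal
        pos = line_end + 2
        if size == 0:
            return payload  # a zero-size chunk ends the body
        payload += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and the \r\n that follows it


wire = (
    b"25\r\nThis is the data in the first chunk\r\n\r\n"
    b"1C\r\nand this is the second one\r\n\r\n"
    b"3\r\ncon\r\n"
    b"8\r\nsequence\r\n"
    b"0\r\n\r\n"
)
print(decode_chunked(wire).decode("ascii"))
```

The last two chunks, "con" and "sequence", are joined into "consequence", which shows that chunk boundaries carry no meaning for the content.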

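2.5 Head-of-line blocking

Although version 1.1 allows a TCP connection to be reused, all communication within one TCP connection happens in order. The server only moves on to the next response after it has finished the current one.

If an earlier response is particularly slow, many requests queue up behind it. This is called "head-of-line blocking".

There are only two ways to avoid this problem:

  • Reduce the number of requests
  • Open several persistent connections at the same time.

Accordingly, a lot of web page optimization work is needed, such as merging scripts and style sheets, embedding images in CSS code, domain sharding, and so on.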
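
The effect of head-of-line blocking, and of opening more connections, can be estimated with a toy model. The sketch below assumes that all requests are issued at time 0, that they are dealt round-robin onto the available connections, and that each connection answers strictly in order; the function name `completion_times` and the sample processing times are made up.

```python
def completion_times(service_times, connections=1):
    """Return the finish time of each response under in-order processing."""
    free_at = [0.0] * connections  # when each connection finishes its current queue
    finished = []
    for index, cost in enumerate(service_times):
        lane = index % connections  # round-robin assignment (an assumption)
        free_at[lane] += cost       # a response must wait for everything ahead of it
        finished.append(free_at[lane])
    return finished


# One slow response (5 time units) followed by two fast ones (1 unit each).
print(completion_times([5, 1, 1]))                 # [5.0, 6.0, 7.0]
print(completion_times([5, 1, 1], connections=2))  # [5.0, 1.0, 6.0]
```

With one connection, both fast responses wait behind the slow one. With two connections, the second response no longer waits, but the third still does, which is why the number of requests matters as well as the number of connections.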

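2.6 Cookie mechanism

Cookies were first proposed in 1994 by an employee of Netscape. The IETF's current cookie specification, RFC 6265, was not published until 2011 (earlier attempts were RFC 2109 and RFC 2965).

HTTP is stateless:

  • Stateless

    • The HTTP protocol itself does not keep any communication state between requests and responses
    • This is simple and convenient; not keeping client state reduces CPU and memory consumption on the server
  • Drawback

    • For sites that require login, some information must be attached to every request message so that the server can recognize the login, which increases bandwidth pressure when there are many connections.
  • To keep state in specific scenarios, Cookie technology was introduced to manage state.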

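Cookie mechanism:

  • After the client sends a request, the server returns a response that includes a Set-Cookie field, telling the client to save the Cookie.
  • After receiving the response, the client saves the content of the Set-Cookie field locally as text
  • When the client sends another request to the server, it adds a Cookie field to the request message.
  • The server receives the request with the cookie, compares it against its own records, and recovers the earlier state information.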
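
The four steps above can be walked through with `http.cookies.SimpleCookie` from the Python standard library. This is a toy sketch: the cookie name `session_id`, the in-memory `SESSIONS` store, the three function names, and the predictable session ids are all assumptions for illustration (real session ids must be unguessable).

```python
from http.cookies import SimpleCookie

SESSIONS = {}  # hypothetical server-side store: session id -> state


def server_login(user: str) -> str:
    """Server: remember the user and return a Set-Cookie header line."""
    session_id = f"sid-{len(SESSIONS) + 1}"  # toy id, for illustration only
    SESSIONS[session_id] = {"user": user}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    return cookie.output()  # e.g. "Set-Cookie: session_id=sid-1; Path=/"


def client_cookie_header(set_cookie_line: str) -> str:
    """Client: save the cookie and build the Cookie header for the next request."""
    stored = SimpleCookie()
    stored.load(set_cookie_line.partition(":")[2].strip())
    pairs = [f"{name}={morsel.value}" for name, morsel in stored.items()]
    return "Cookie: " + "; ".join(pairs)  # attributes such as Path are not sent back


def server_lookup(cookie_line: str):
    """Server: recover the earlier state from the Cookie header."""
    received = SimpleCookie()
    received.load(cookie_line.partition(":")[2].strip())
    return SESSIONS.get(received["session_id"].value)
```

For example, passing the result of `server_login("alice")` to `client_cookie_header` yields a `Cookie:` line, and handing that line to `server_lookup` returns the state stored for alice.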

To further optimize the HTTP protocol, HTTP/2 and HTTP/3 have been introduced in recent years, but HTTP/1.1 is still in very wide use.
