PHP03 Introduction to the HTTP Protocol (repost)

HTTP is a foundational protocol of the Internet and essential knowledge for web development. The release of its latest version, HTTP/2, has made it a hot topic again.

 

This article introduces the historical evolution and design ideas of the HTTP protocol.

 

 

 

1. HTTP/0.9

 

HTTP is an application-layer protocol built on top of TCP/IP. It is not concerned with how data packets are transmitted; it mainly specifies the format of communication between the client and the server, and it uses port 80 by default.

 

The earliest version, 0.9, was released in 1991. It is extremely simple and has only one command, GET.

 

GET /index.html

 

The above command indicates that after the TCP connection is established, the client requests the web page index.html from the server.

 

The protocol stipulates that the server can only respond with a string in HTML format; no other format is allowed.

 

<html>

  <body>Hello World</body>

</html>

 

The server closes the TCP connection after sending.
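
As a rough illustration of the exchange above, here is a minimal Python sketch of how a server might answer an HTTP/0.9 request. The function name handle_http09 and the pages dictionary are assumptions made for this example; they are not part of the protocol.

```python
def handle_http09(request_line: str, pages: dict[str, str]) -> str:
    """Answer an HTTP/0.9-style request: one GET line in, raw HTML out."""
    parts = request_line.strip().split()
    # HTTP/0.9 has a single command, GET, followed by the path and nothing else.
    if len(parts) != 2 or parts[0] != "GET":
        return "<html><body>Bad request</body></html>"
    # No status line and no headers: the response is just the HTML text.
    return pages.get(parts[1], "<html><body>Not found</body></html>")


# Hypothetical site content used only for this example.
pages = {"/index.html": "<html><body>Hello World</body></html>"}
print(handle_http09("GET /index.html\r\n", pages))
```

Note that there is nothing in the reply to say whether the request succeeded; that gap is one of the things version 1.0 fills.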

 

2. HTTP/1.0

 

2.1 Introduction

 

HTTP/1.0 was released in May 1996, with greatly expanded content.

 

First, content in any format can be sent. This allows the Internet to transmit not only text, but also images, videos, and binary files. This laid the foundation for the great development of the Internet.

 

Second, in addition to the GET command, the POST and HEAD commands were introduced, enriching the interaction between the browser and the server.

 

Third, the format of HTTP requests and responses changed. In addition to the data, every message must include header information (HTTP headers) that describes some metadata.

 

Other new features include status codes, support for multiple character sets, multi-part types, authorization, caching, content encoding, and more.

 

2.2 Request Format

 

Below is an example of a version 1.0 HTTP request.

 

GET / HTTP/1.0

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)

Accept: */*

 

As you can see, this format has changed a lot from version 0.9.

 

The first line is the request command, and the protocol version (HTTP/1.0) must be appended to it. It is followed by several lines of header information that describe the client.
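
To make the request layout concrete, the following is a minimal Python sketch that splits a raw request into its request line and header fields. The function name parse_request is an assumption for illustration, and the sketch handles only well-formed input.

```python
def parse_request(raw: str):
    """Split a raw HTTP/1.0 request head into (method, path, version, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]          # everything before the blank line
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return method, path, version, headers


raw = (
    "GET / HTTP/1.0\r\n"
    "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)\r\n"
    "Accept: */*\r\n"
    "\r\n"
)
method, path, version, headers = parse_request(raw)
print(method, path, version)
print(headers["accept"])
```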

 

2.3 Response format

 

The server's response is as follows.

 

HTTP/1.0 200 OK

Content-Type: text/plain

Content-Length: 137582

Expires: Thu, 05 Dec 1997 16:00:00 GMT

Last-Modified: Wed, 5 August 1996 15:55:28 GMT

Server: Apache 0.84

 

<html>

  <body>Hello World</body>

</html>

 

The response format is "header information + a blank line (\r\n) + data". The first line of the header information is "protocol version + status code + status description".
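
The "header + blank line + data" layout can be sketched in a few lines of Python. The helper name build_response is hypothetical, and the header values are sample data chosen for this example.

```python
def build_response(code: int, reason: str, headers: dict[str, str], body: bytes) -> bytes:
    """Assemble status line + header fields + blank line + body."""
    lines = [f"HTTP/1.0 {code} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"   # the empty line ends the header block
    return head.encode("ascii") + body       # headers are ASCII, the body is raw bytes


body = b"<html><body>Hello World</body></html>"
raw = build_response(
    200, "OK",
    {"Content-Type": "text/html", "Content-Length": str(len(body))},
    body,
)
# A receiver finds the boundary by looking for the first blank line.
head, _, payload = raw.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(payload)
```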

 

2.4 Content-Type field

 

Regarding character encoding, version 1.0 stipulates that header information must be ASCII, while the data that follows can be in any format. The server's response must therefore tell the client what format the data is in, which is the job of the Content-Type field.

 

Below are some common Content-Type field values.

 

  • text/plain

  • text/html

  • text/css

  • image/jpeg

  • image/png

  • image/svg+xml

  • audio/mp4

  • video/mp4

  • application/javascript

  • application/pdf

  • application/zip

  • application/atom+xml

 

These data types are collectively referred to as MIME types, and each value consists of a primary type and a secondary type, separated by a slash.

 

In addition to the predefined types, vendors can define their own types.

 

application/vnd.debian.binary-package

 

The type above indicates that the data being sent is a Debian binary package.

 

MIME types can also carry parameters, appended after a semicolon.

 

Content-Type: text/html; charset=utf-8

 

The above type indicates that a web page is being sent and the encoding is UTF-8.
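
One way to pull a Content-Type value apart in Python is to lean on the standard library's email.message.Message class, which understands MIME headers. This is only a sketch; the wrapper name parse_content_type is an assumption for illustration.

```python
from email.message import Message


def parse_content_type(value: str):
    """Return (primary type, secondary type, charset or None) for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    return (
        msg.get_content_maintype(),   # the part before the slash
        msg.get_content_subtype(),    # the part after the slash
        msg.get_param("charset"),     # the parameter after the semicolon, if any
    )


print(parse_content_type("text/html; charset=utf-8"))
print(parse_content_type("image/svg+xml"))
```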

 

When making a request, the client can use the Accept field to declare which data formats it can accept.

 

Accept: */*

 

In the above code, the client declares that it can accept data in any format.

 

MIME types are not only used in the HTTP protocol, but also in other places, such as HTML pages.

 

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

<!-- is equivalent to -->

<meta charset="utf-8" />

 

2.5 Content-Encoding field

 

Since the data sent can be in any format, the data can be compressed before sending. The Content-Encoding field specifies the compression method of the data.

 

Content-Encoding: gzip

Content-Encoding: compress

Content-Encoding: deflate

 

When making a request, the client uses the Accept-Encoding field to declare which compression methods it can accept.

 

Accept-Encoding: gzip, deflate
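
The negotiation between Accept-Encoding and Content-Encoding might look like the following Python sketch. The function name encode_body is hypothetical, only gzip is handled, and quality values such as q=0 are ignored for simplicity.

```python
import gzip


def encode_body(body: bytes, accept_encoding: str):
    """Compress with gzip only if the client listed it in Accept-Encoding."""
    offered = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    if "gzip" in offered:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}   # send the data as-is, with no Content-Encoding field


body = b"Hello World " * 100
payload, headers = encode_body(body, "gzip, deflate")
print(headers, len(body), len(payload))

# The client reverses the encoding named in Content-Encoding.
restored = gzip.decompress(payload)
print(restored == body)
```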

 

2.6 Disadvantages

 

The main disadvantage of HTTP/1.0 is that only one request can be sent per TCP connection. After sending the data, the connection is closed. If you want to request other resources, you must create a new connection.

 

Establishing a TCP connection is expensive: it requires a three-way handshake between the client and the server, and the sending rate is low at the beginning (slow start). As a result, HTTP/1.0 performs relatively poorly. The problem becomes more prominent as web pages load more and more external resources.

 

To solve this problem, some browsers use a non-standard Connection field when requesting.

 

Connection: keep-alive

 

This field asks the server not to close the TCP connection so that it can be reused by other requests. The server responds with the same field.

 

Connection: keep-alive

 

The TCP connection then stays open and reusable until the client or the server actively closes it. However, this is not a standard field, and different implementations may behave inconsistently, so it is not a fundamental fix.

 

3. HTTP/1.1

 

HTTP/1.1 was released in January 1997, only half a year after version 1.0. It further refined the HTTP protocol, has been in use for 20 years, and is still the most popular version.

 

3.1 Persistent connections

 

The biggest change in version 1.1 is the introduction of persistent connections, that is, TCP connections are not closed by default and can be reused by multiple requests without declaring Connection: keep-alive.

 

When the client and server find that the other party has not been active for a period of time, they can actively close the connection. However, the standard practice is that the client sends Connection: close in the last request, explicitly asking the server to close the TCP connection.

 

Connection: close
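
The difference between the 1.0 and 1.1 defaults can be summarized as a small decision function. This is a sketch only: the name is_persistent is an assumption, and the headers dictionary is assumed to use the exact key "Connection".

```python
def is_persistent(version: str, headers: dict[str, str]) -> bool:
    """Decide whether the TCP connection should stay open after this message."""
    tokens = {t.strip().lower() for t in headers.get("Connection", "").split(",")}
    if "close" in tokens:
        return False                  # an explicit request to close always wins
    if version == "HTTP/1.1":
        return True                   # persistent by default in 1.1
    return "keep-alive" in tokens     # in 1.0, only with the non-standard opt-in


print(is_persistent("HTTP/1.1", {}))
print(is_persistent("HTTP/1.1", {"Connection": "close"}))
print(is_persistent("HTTP/1.0", {"Connection": "keep-alive"}))
print(is_persistent("HTTP/1.0", {}))
```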

 

Currently, most browsers allow 6 simultaneous persistent connections for the same domain name.

 

3.2 Pipeline Mechanism

 

Version 1.1 also introduced the pipeline mechanism (pipelining), that is, in the same TCP connection, the client can send multiple requests at the same time. This further improves the efficiency of the HTTP protocol.

 

For example, suppose the client needs to request two resources. Previously, on the same TCP connection, it would send request A, wait for the server's response, and only then send request B. Pipelining lets the browser send requests A and B together, but the server still responds in order: it answers request A first, and answers request B only after that is complete.

 

3.3 Content-Length field

 

A single TCP connection can now carry multiple responses, so there must be a way to tell which response a given piece of data belongs to. That is the role of the Content-Length field, which declares the data length of the response.

 

Content-Length: 3495

 

The above code tells the browser that the length of this response is 3495 bytes, and the following bytes belong to the next response.

 

In version 1.0, the Content-Length field is not required: when the browser sees that the server has closed the TCP connection, it knows that all the data has been received.
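
To show how Content-Length marks the boundary between responses on one connection, here is a minimal Python sketch. The function name split_responses is hypothetical, and the sketch assumes every response carries a Content-Length field and that the whole stream is already in memory.

```python
def split_responses(stream: bytes) -> list[bytes]:
    """Cut a byte stream holding several responses into their bodies."""
    bodies = []
    while stream:
        head, _, rest = stream.partition(b"\r\n\r\n")
        length = 0
        for line in head.split(b"\r\n")[1:]:          # skip the status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        bodies.append(rest[:length])   # exactly `length` bytes belong to this response
        stream = rest[length:]         # whatever follows is the next response
    return bodies


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nworld!"
)
print(split_responses(stream))
```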

 

3.4 Chunked Transfer Coding

 

The prerequisite for using the Content-Length field is that the server must know the data length of the response before sending the response.

 

For some time-consuming dynamic operations, this means that the server cannot send data until all operations are completed, which is obviously inefficient. A better way to deal with it is to generate a block of data, send a block, and use "stream mode" (stream) instead of "buffer mode" (buffer).

 

Therefore, version 1.1 stipulates that "chunked transfer encoding" can be used instead of the Content-Length field. Whenever the header of a request or response contains the Transfer-Encoding field, the body consists of an unspecified number of data chunks.

 

Transfer-Encoding: chunked

 

Before each non-empty data block, there will be a hexadecimal value indicating the length of the block. Finally, there is a block of size 0, which means that the data for this response has been sent. Below is an example.

 

HTTP/1.1 200 OK

Content-Type: text/plain

Transfer-Encoding: chunked

 

25

This is the data in the first chunk

 

1C

and this is the second one

 

3

con

 

8

sequence

 

0
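
Decoding a chunked body can be sketched as a short loop in Python. The function name decode_chunked is an assumption for illustration; the sketch ignores chunk extensions and trailer fields, and uses its own small sample data.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble the data from a chunked body (size line, data, CRLF, repeated)."""
    data = b""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end], 16)   # the chunk size is written in hexadecimal
        if size == 0:
            break                            # a zero-size chunk ends the body
        start = line_end + 2
        data += body[start:start + size]
        pos = start + size + 2               # skip the CRLF that terminates the chunk
    return data


encoded = b"5\r\nHello\r\n7\r\n, World\r\n0\r\n\r\n"
print(decode_chunked(encoded))
```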

 

3.5 Other functions

 

Version 1.1 also added several new methods (verbs): PUT, PATCH, OPTIONS, and DELETE.

 

In addition, a new Host field is added to the header information of the client request, which is used to specify the domain name of the server.

 

Host: www.example.com

 

With the Host field, requests can be sent to different websites on the same server, laying the foundation for the rise of virtual hosting.
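
Virtual hosting amounts to a lookup keyed on the Host field. The following Python sketch shows the idea; the function name pick_site, the domain names, and the directory paths are all made up for this example.

```python
def pick_site(headers: dict[str, str], sites: dict[str, str], default: str) -> str:
    """Choose a site's document root from the Host header of a request."""
    host = headers.get("Host", "").split(":")[0].lower()   # drop an optional port
    return sites.get(host, default)


# Hypothetical mapping of domain names to directories on one server.
sites = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
print(pick_site({"Host": "www.example.com"}, sites, "/var/www/default"))
print(pick_site({"Host": "blog.example.com:8080"}, sites, "/var/www/default"))
```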

 

3.6 Disadvantages

 

Although version 1.1 allows TCP connections to be reused, all data communication within a single TCP connection happens in order. The server moves on to the next response only after finishing the current one. If an earlier response is particularly slow, many requests queue up behind it. This is called "head-of-line blocking".
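
The effect of in-order responses is easy to see with a little arithmetic. The Python sketch below uses made-up processing times, and the function name finish_times_in_order is an assumption for illustration.

```python
def finish_times_in_order(costs_ms: list[int]) -> list[int]:
    """Each response can start only after the previous one has finished."""
    finished, clock = [], 0
    for cost in costs_ms:
        clock += cost
        finished.append(clock)
    return finished


# Made-up server processing times for requests A, B and C, in milliseconds.
costs_ms = [5000, 100, 100]
print(finish_times_in_order(costs_ms))
```

Requests B and C each need only 100 ms of work, yet neither can complete before the slow request A does.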

 

In order to avoid this problem, there are only two ways: one is to reduce the number of requests, and the other is to open more persistent connections at the same time. This leads to a lot of web optimization tricks like merging scripts and style sheets, embedding images into CSS code, domain sharding, and more. This extra work could have been avoided if the HTTP protocol had been better designed.

 

4. SPDY Protocol

 

In 2009, Google announced SPDY, a protocol it had developed in-house, mainly to address the inefficiency of HTTP/1.1.

 

After the protocol proved feasible in the Chrome browser, it was used as the basis of HTTP/2, which inherited its main features.

 

5. HTTP/2

 

In 2015, HTTP/2 was released. It's not called HTTP/2.0 because the standards committee doesn't plan to release any more sub-versions, and the next new version will be HTTP/3.

 

5.1 Binary Protocol

 

In HTTP/1.1, header information must be text (ASCII encoded), while the data body can be text or binary. HTTP/2 is a fully binary protocol: both the header information and the data body are binary, and both are referred to as "frames": header frames and data frames.

 

One benefit of a binary protocol is that additional frame types can be defined. HTTP/2 defines nearly ten frame types, laying the foundation for future advanced applications. Implementing this with text would make parsing cumbersome, whereas binary parsing is much more convenient.
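
To give a feel for what "binary" means here, the Python sketch below packs and unpacks a frame header. The layout used (a 24-bit length, an 8-bit type, 8 bits of flags, then one reserved bit and a 31-bit stream identifier) follows RFC 7540 section 4.1 and is not spelled out in this article; the function names are assumptions for illustration.

```python
import struct


def pack_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Build the 9-byte header that precedes every frame payload."""
    return (
        struct.pack(">I", length)[1:]                              # low 3 bytes: 24-bit length
        + struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    )


def unpack_frame_header(header: bytes):
    """Return (length, type, flags, stream_id) from a 9-byte frame header."""
    length = int.from_bytes(header[0:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", header[3:9])
    return length, frame_type, flags, stream_id & 0x7FFFFFFF      # mask the reserved bit


# A DATA frame (type 0x0) of 11 bytes on stream 1, with flag bit 0x1 set.
header = pack_frame_header(11, 0x0, 0x1, 1)
print(header.hex())
print(unpack_frame_header(header))
```

Because every field sits at a fixed offset, a parser never has to scan for delimiters as it does with text headers.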

 

5.2 Multiplexing

 

HTTP/2 reuses TCP connections. Within one connection, both the client and the server can send multiple requests or responses at the same time, and they do not have to correspond one by one in sequence, which avoids "head-of-line blocking".

 

For example, on one TCP connection the server receives request A and request B, and starts responding to request A first. Processing A turns out to be very time-consuming, so the server sends the part of A's response that is ready and then responds to request B. Once that is done, it sends the rest of A's response.

 

Such two-way, real-time communication is called multiplexing.

 

5.3 Data flow

 

Because HTTP/2 packets are not sent in order, consecutive packets on the same connection may belong to different responses. Each packet must therefore be marked to indicate which response it belongs to.

 

HTTP/2 refers to all the packets of each request or response as a stream. Each data stream has a unique number. When a data packet is sent, the data stream ID must be marked to distinguish which data stream it belongs to. In addition, it is also stipulated that the ID of the data stream sent by the client is always an odd number, and the ID of the data stream sent by the server is an even number.
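
The role of the stream ID can be sketched in Python as a simple regrouping step. Frames are modeled here as (stream ID, payload) pairs, which is a simplification; the function names and sample payloads are assumptions for illustration.

```python
from collections import defaultdict


def reassemble(frames):
    """Group interleaved (stream_id, payload) frames back into per-stream data."""
    streams = defaultdict(bytes)
    for stream_id, payload in frames:
        streams[stream_id] += payload   # frames of one stream keep their relative order
    return dict(streams)


def opened_by_client(stream_id: int) -> bool:
    """Streams sent by the client carry odd IDs, those sent by the server even IDs."""
    return stream_id % 2 == 1


# Two responses whose frames arrive interleaved on one connection.
frames = [(1, b"<html>"), (3, b"body {"), (1, b"</html>"), (3, b"}")]
print(reassemble(frames))
print(opened_by_client(1), opened_by_client(2))
```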

 

Partway through a data stream, either the client or the server can send a signal (an RST_STREAM frame) to cancel it. In version 1.1, the only way to cancel a data stream is to close the TCP connection. In other words, HTTP/2 can cancel a request while keeping the TCP connection open and available for other requests.

 

The client can also specify the priority of the data stream. The higher the priority, the sooner the server will respond.

 

5.4 Header information compression

 

The HTTP protocol is stateless, so all information must be attached to every request. As a result, many request fields are repeated: fields such as Cookie and User-Agent carry identical content on every request, which wastes a lot of bandwidth and slows things down.

 

HTTP/2 optimizes this by introducing header compression. On the one hand, header information is compressed before it is sent (the final standard uses the HPACK format with Huffman coding rather than gzip or compress); on the other hand, the client and the server each maintain a header table in which every field is stored and assigned an index number, so that a field already sent need not be sent again: only its index number is sent, which increases speed.
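
The index-table idea can be shown with a toy Python class. This is a deliberately simplified sketch and not the real HPACK encoding: the class name HeaderTable, the list-based table, and the mixed list of tuples and integers are all assumptions made for illustration.

```python
class HeaderTable:
    """Toy index table: both peers apply the same rules, so their indexes stay in sync."""

    def __init__(self):
        self.entries: list[tuple[str, str]] = []

    def encode(self, headers):
        out = []
        for field in headers:
            if field in self.entries:
                out.append(self.entries.index(field))   # already known: send the index only
            else:
                self.entries.append(field)
                out.append(field)                       # first time: send the full field
        return out

    def decode(self, items):
        headers = []
        for item in items:
            if isinstance(item, int):
                headers.append(self.entries[item])      # look the field up by index
            else:
                self.entries.append(item)
                headers.append(item)
        return headers


sender, receiver = HeaderTable(), HeaderTable()
request = [("user-agent", "demo-client"), ("cookie", "id=42")]
first = sender.encode(request)    # full fields on the first request
second = sender.encode(request)   # index numbers only on the repeat
print(first, second)
print(receiver.decode(first) == request, receiver.decode(second) == request)
```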

 

5.5 Server Push

 

HTTP/2 allows the server to actively send resources to the client without request, which is called server push.

 

A common scenario is that the client requests a web page that contains many static resources. Under normal circumstances, the client must parse the HTML source code after receiving the web page, find that there are static resources, and then send a static resource request. In fact, the server can expect that after the client requests the web page, it is likely to request static resources again, so it actively sends these static resources to the client along with the web page.

 

Information cited in this article:

*******************************************************

Author: Ruan Yifeng (@ruanyf) 

www.ruanyifeng.com/blog/2016/08/http.html

*******************************************************
