Reference book: "Illustrated HTTP"
Reference blog: http://www.ruanyifeng.com/blog/2016/08/http.html
This article is mainly based on HTTP/1.1.
One, TCP/IP protocol suite
For computers to communicate with each other over a network, different hardware, software, and operating systems need to agree on a common set of communication rules. Such a rule is called a protocol.
To make global Internet communication possible, the IETF (Internet Engineering Task Force) has developed a set of standard protocols covering the related hardware and software. This set is collectively called the TCP/IP protocol suite, or TCP/IP for short.
1, Layered model
Comparison of the TCP/IP four-layer model and the OSI model:
2, Data processing flow
When data is passed between layers, the sending side adds a header at each layer the data passes through, and the receiving side strips the corresponding header at each layer, finally recovering the original data.
Data processing flow in the TCP/IP four-layer model:
Data processing flow in the OSI seven-layer model:
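The add-header-then-strip-header flow can be sketched in a few lines. This is only a toy model: the layer names and the bracketed header labels are assumptions chosen for illustration, not real protocol headers.

```python
# Hypothetical layer names and header labels, for illustration only.
LAYERS = ["HTTP", "TCP", "IP", "Ethernet"]  # application layer down to link layer

def encapsulate(data: bytes) -> bytes:
    """Sender side: each layer prepends its own header on the way down."""
    for layer in LAYERS:
        data = f"[{layer}]".encode() + data
    return data

def decapsulate(frame: bytes) -> bytes:
    """Receiver side: each layer strips its own header on the way up."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame

frame = encapsulate(b"hello")
print(frame)
print(decapsulate(frame))
```

The outermost header belongs to the lowest layer, which is why the receiver removes headers in the reverse order.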
3, HTTP and IP, TCP, DNS
In the TCP/IP protocol suite, the three protocols most closely related to HTTP are IP, TCP, and DNS.
IP protocol:
Internet Protocol
- The Internet Protocol sits at the network layer.
- Its role is to deliver data packets to the other party.
- To make sure data reaches the other party, two important conditions must be met: the IP address and the MAC address.
- The IP address is the address assigned to a node, while the MAC address is the fixed address of the network card; the two can be paired with each other.
- An IP address can change, but the MAC address basically stays the same.
TCP protocol:
- Located at the transport layer, it provides a reliable byte stream service
- It uses a three-way handshake to establish the connection, which helps make sure data can reach the target
DNS service:
Domain Name System
An application-layer service responsible for domain name resolution, that is, translating domain names into IP addresses
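As a rough illustration of name resolution, the sketch below asks the operating system's resolver for the addresses behind a host name using the standard `socket.getaddrinfo` call. The helper name `resolve` is made up for this example, and looking up a real domain requires network access.

```python
import socket

def resolve(hostname: str) -> list[str]:
    """Return the distinct IP addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

# A literal IP address is returned as-is, without any DNS query.
print(resolve("127.0.0.1"))
# For a real domain (needs network access): resolve("www.sina.com")
```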
Two, HTTP protocol structure
- Client: the party that requests access to resources such as text or images
- Server: the party that provides resources in response
HTTP (HyperText Transfer Protocol):
The hypertext transfer protocol is used to transfer data between the client and the server.
Every HTTP request and response follows the same format: an HTTP message consists of two parts, Header and Body, of which the Body is optional.
1, HTTP request message
HTTP GET request format:
GET /path HTTP/1.1
Header1: Value1
Header2: Value2
Header3: Value3
HTTP POST request format:
POST /path HTTP/1.1
Header1: Value1
Header2: Value2
Header3: Value3
body data goes here...
- Each header occupies one line, terminated by \r\n.
- A blank line, i.e. two consecutive line breaks \r\n\r\n, marks the end of the header section; all data after it is the Body.
- When requesting the home page, the URI is /
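To make the framing rules concrete, here is a minimal sketch that serializes a request by hand. The function name `build_request`, the header values, and the body are illustrative assumptions; real clients normally let a library such as `http.client` do this.

```python
def build_request(method: str, path: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines are joined with CRLF; the final CRLF CRLF closes the last header
    # line and adds the blank line that ends the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

get_req = build_request("GET", "/", {"Host": "www.sina.com"})
post_req = build_request(
    "POST", "/path",
    {"Host": "www.sina.com", "Content-Length": "9"},
    b"name=test",
)
print(get_req)
print(post_req)
```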
URI and URL
- URI: Uniform Resource Identifier, a string that identifies a particular Internet resource
- URL: Uniform Resource Locator, which gives the location of a resource on the Internet; URLs are a subset of URIs
URI format:
- Login information: optional
- Server port number: optional when the default port 80 is used
- Query string: passes query parameters for the resource at the given file path, optional
- Fragment identifier: marks a sub-resource within the retrieved resource (a location inside the document), optional
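The components listed above can be pulled apart with the standard library's `urllib.parse`. The URI below is a made-up example that includes every optional part.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URI that exercises every optional component.
uri = "http://user:pass@www.example.com:8080/dir/index.html?uid=1&lang=en#ch1"
parts = urlsplit(uri)

print(parts.scheme)                    # protocol scheme
print(parts.username, parts.password)  # login information
print(parts.hostname)                  # server address
print(parts.port)                      # server port number
print(parts.path)                      # file path
print(parse_qs(parts.query))           # query string as a dict of lists
print(parts.fragment)                  # fragment identifier

# When the port is omitted, urlsplit reports None; HTTP then falls back to port 80.
print(urlsplit("http://www.example.com/").port)
```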
2, HTTP response format
HTTP/1.1 200 OK
Header1: Value1
Header2: Value2
Header3: Value3
body data goes here...
If the HTTP response contains a Body, it is likewise separated from the headers by \r\n\r\n.
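A minimal parsing sketch follows, assuming the whole response is already in memory and is well formed. The function name `parse_response` and the sample bytes are illustrative only.

```python
def parse_response(raw: bytes) -> tuple[int, str, dict[str, str], bytes]:
    """Split a raw HTTP response into status code, reason, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the first blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), reason, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
print(parse_response(raw))
```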
2.1 Response Code
Response Code | Description | Common response codes | Response code description |
---|---|---|---|
1xx | Informational: the request has been received and processing continues | 100; 101 | Continue; Switching Protocols |
2xx | Success: the request has been successfully received and understood | 200 OK; 201 | The client request succeeded; The resource has been created |
3xx | Redirection: further action is needed to complete the request | 301 | The resource (web page, etc.) has been permanently moved to another URL |
4xx | Client error: the request has a syntax error or cannot be fulfilled | 400 Bad Request; 401 Unauthorized; 403 Forbidden; 404 Not Found | The request has a syntax error and cannot be understood by the server; The request is unauthorized; The server received the request but refuses to provide the service; The requested resource does not exist |
5xx | Server error: the server failed to fulfill a valid request | 500 Internal Server Error; 503 Service Unavailable | An unexpected error occurred on the server; The server is currently unable to process the request and may recover after a while |
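The first digit of a response code gives its class, which is easy to check in code. The sketch below uses the standard library's `http.HTTPStatus` for the reason phrases; the `CLASSES` mapping and the `describe` helper are names invented for this example.

```python
from http import HTTPStatus

# Short labels for the five status classes.
CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client error", 5: "Server error"}

def describe(code: int) -> str:
    """Return 'code phrase (class)' for a status code known to the standard library."""
    status = HTTPStatus(code)          # raises ValueError for unknown codes
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"

for code in (100, 200, 301, 404, 503):
    print(describe(code))
```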
2.2 Content-Type and Content-Encoding
- The type of the Body data is determined by the Content-Type header, not by the URL
  - Even if the URL is http://www.baidu.com/meimei.jpg, the content is not necessarily a picture.
- Content-Type:text/html;charset=utf-8: the response content is a web page, encoded in UTF-8
- Content-Type:image/png: the response content is a picture
- Content-Encoding: gzip: the Body data is compressed, and the compression method is gzip
  - The data must be decompressed first to get the real data
  - Compression reduces the size of the Body and speeds up network transfer
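The two headers are applied in sequence on the receiving side: first undo the Content-Encoding, then interpret the bytes according to Content-Type. The sketch below is a simplified illustration; the helper name `decode_body`, the sample page, and the UTF-8 fallback are assumptions.

```python
import gzip

def decode_body(headers: dict[str, str], body: bytes) -> str:
    """Undo Content-Encoding, then decode text using the charset from Content-Type."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)           # recover the real data first
    content_type = headers.get("Content-Type", "")
    charset = "utf-8"                          # assumed fallback for this sketch
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset":
            charset = value.strip('"')
    return body.decode(charset)

html = "<h1>你好, HTTP</h1>"
compressed = gzip.compress(html.encode("utf-8"))
headers = {"Content-Type": "text/html;charset=utf-8", "Content-Encoding": "gzip"}
print(decode_body(headers, compressed))
```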
3, Viewing HTTP messages
We can use various tools to inspect the data exchanged between the client and the server, which is commonly called "packet capture".
Here is a brief walkthrough using the Chrome browser as an example:
- Enter www.sina.com in the Chrome address bar, and the browser displays the Sina home page
- Use Chrome - F12 - Network to view the communication between the client and the server
- Under Name, click the www.sina.com record; on the right, view the view source content of Request and Response.
Chrome F12 description:
- Elements: page structure
- Console: console output
- Sources: resources
- Network: records all communication between the browser and servers
  - Headers: shows the request headers and response headers
    - view source: shows the raw data actually exchanged between the browser and the server
  - Response: shows the response content
4, HTTP request process
Taking access to the Sina home page as an example again, the HTTP request process can be summarized as follows:
- 1, The client sends an HTTP request to the server
  - The request includes: method, URI, domain name, and other headers; if the POST method is used, it also includes a Body
- 2, The server returns an HTTP response to the client
  - The response includes: response code, response type, other headers, and the response Body
- 3, If the client needs other resources from the server (such as images), it sends another HTTP request, repeating steps 1 and 2.
Browser parsing process:
- When the browser has read the HTML source of the Sina home page, it parses the HTML and displays the page,
- Then, following the links inside the HTML, it sends further HTTP requests to the Sina servers to fetch the corresponding images, videos, Flash, JavaScript scripts, CSS, and other resources,
- Finally it shows the complete page, which is why many additional HTTP requests appear under Network.
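The "follow the links inside the HTML" step can be approximated with the standard library's `html.parser`. This is a much simplified sketch of what a browser does: the class name `ResourceCollector`, the sample page, and the small tag-to-attribute table are assumptions, and real browsers handle many more cases.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ResourceCollector(HTMLParser):
    """Collect URLs of sub-resources that a browser would request next."""
    # Tag -> attribute holding the resource URL (a small, non-exhaustive selection).
    SOURCES = {"img": "src", "script": "src", "link": "href"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = self.SOURCES.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))

# Made-up page; only the img URL is taken from the article.
page = """
<html><head><link rel="stylesheet" href="/css/main.css"><script src="js/app.js"></script></head>
<body><img src="https://n.sinaimg.cn/index/mid_article/images/ask.png"></body></html>
"""
collector = ResourceCollector("http://www.sina.com/")
collector.feed(page)
print(collector.urls)
```

Each collected URL corresponds to one more request that would show up in the Network panel.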
HTTP is highly scalable: although the browser requests the Sina home page http://www.sina.com, the resources linked in the returned HTML may live on other Sina servers, such as <img src="https://n.sinaimg.cn/index/mid_article/images/ask.png">, which spreads the request load across servers.
A site also links to other sites, and many sites linked to one another form the WWW (World Wide Web).
Three, HTTP version development
Publication Year | HTTP version | Explanation | Features |
---|---|---|---|
1990 | HTTP/0.9 | Not a formal standard | Only a simple GET method; the response can only be HTML |
1996 | HTTP/1.0 | First formally documented version, described in RFC 1945; widely used | Supports the GET, POST, and HEAD methods; richer responses; adds header information; short connections |
1997 | HTTP/1.1 | The current mainstream version; RFC 2616 is the widely cited revision, since superseded by RFC 7230 to 7235 and later by RFC 9110 and 9112 | Persistent connections by default; introduces pipelining; uses chunked transfer encoding |
2015 | HTTP/2 | Evolved from SPDY, a protocol developed by Google | Low-latency transmission, binary protocol, multiplexing, header compression, and so on |
2018 | HTTP/3 | Evolved from QUIC, a protocol developed by Google | Based on UDP, faster response |
1, HTTP 1.0
The main drawback:
- Short connections: one connection can carry only one request, so every request has to establish a new TCP connection
- Establishing a TCP connection requires a three-way handshake and is subject to slow start, which wastes resources and hurts performance
Solution:
- Some HTTP 1.0 implementations add a Connection:keep-alive header to the request
  - This header asks the server not to close the TCP connection so that other requests can reuse it. The server answers with the same field.
  - The connection stays open until the client or the server closes it.
However, this is not a standard field, and different implementations may behave inconsistently, so it is not a fundamental solution and is not universally supported.
2, HTTP 1.1
HTTP 1.1 introduced many optimizations and is by far the most widely used version, although most servers still support version 1.0 as well.
2.1 Persistent connections
The biggest change is the introduction of persistent connections (HTTP persistent connection, also called HTTP connection reuse).
Features:
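- The TCP connection is not closed by default and can be reused by multiple requests
- As long as neither end explicitly asks to disconnect, the connection stays open.
- After a TCP connection is established once, multiple request and response exchanges can take place over it
- This reduces the overhead of repeatedly establishing and tearing down TCP connections, lightens the server load, and makes web pages display faster.
At present, most browsers allow up to 6 persistent connections at the same time for the same domain name.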
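Connection reuse can be observed with nothing but the standard library. The sketch below starts a throwaway server on the loopback interface and sends three requests over one `http.client.HTTPConnection`; the handler, the paths, and the response text are invented for the demonstration, and it assumes loopback sockets are available.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"      # lets the stdlib server keep connections open

    def do_GET(self):
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        # Content-Length tells the client where the body ends, so the connection can stay open.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
results = []
for path in ("/", "/a.css", "/b.js"):
    conn.request("GET", path)
    resp = conn.getresponse()
    body = resp.read()                 # read fully before reusing the connection
    # The client's local port identifies the TCP connection in use.
    results.append((resp.status, body.decode(), conn.sock.getsockname()[1]))
conn.close()
server.shutdown()
server.server_close()
print(results)
```

If the local port is the same in every tuple, all three exchanges travelled over a single TCP connection.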
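2.2 Pipelining
On top of persistent connections, pipelining was also introduced.
Features:
- Within the same TCP connection, the client can send multiple requests back to back without waiting for the responses to earlier requests, further improving efficiency.
- In earlier versions, the client had to send one request, wait for and receive its response, and only then send the next.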
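2.3 Content-Length field
With pipelining, one TCP connection carries multiple requests at once, and the server processes them and returns the responses in order.
- To tell exactly which response each piece of data belongs to, a Content-Length field is added to the response headers, declaring the length of the data in this response.
  - In version 1.0, the browser treated the server closing the TCP connection as the sign that all the data had been received.
For example, Content-Length: 3495 means that the body of this response is 3495 bytes long, and the bytes after that belong to the next response.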
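The role of Content-Length as a boundary marker can be shown by cutting a byte stream that contains two responses back to back. This is a simplified sketch: the function name `split_responses` and the sample stream are made up, and it assumes the complete stream is already in memory.

```python
def split_responses(stream: bytes) -> list[tuple[bytes, bytes]]:
    """Split back-to-back responses into (head, body) pairs using Content-Length."""
    responses = []
    while stream:
        head, _, rest = stream.partition(b"\r\n\r\n")
        length = 0                                   # assume no body if the field is absent
        for line in head.split(b"\r\n")[1:]:         # skip the status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        responses.append((head, rest[:length]))
        stream = rest[length:]                       # remaining bytes belong to the next response
    return responses

stream = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nfirst"
          b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nsecond")
for head, body in split_responses(stream):
    print(head.split(b"\r\n")[0], body)
```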
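2.4 Chunked transfer encoding
The Content-Length field can only be used if the server knows the length of the response data before it starts sending the response.
For time-consuming dynamic operations, or when transferring large amounts of data, the server would have to wait until all the work is finished before sending anything, which is clearly inefficient.
A better approach is to send each block of data as soon as it is produced, using "stream mode" instead of "buffer mode", so that the browser can display the page progressively.
For this reason, version 1.1 introduced chunked transfer encoding. Whenever the headers of a request or response contain the following Transfer-Encoding field, the body is made up of an undetermined number of data chunks.
Transfer-Encoding: chunked
Each non-empty chunk is preceded by a hexadecimal number giving the length of the chunk. A final chunk of size 0 indicates that all the data of this response has been sent, for example: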
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
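A minimal decoder for the body of the example above might look like the following. It assumes the whole body is already in memory, and that the first two chunks end with a CRLF that is part of their data, which is what makes the sizes 0x25 (37) and 0x1C (28) add up; the function name `decode_chunked` is illustrative.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked body: hex size line, chunk data, CRLF, until a size of 0."""
    data, pos = b"", 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size is hexadecimal; anything after ';' is a chunk extension and is ignored.
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return data
        start = line_end + 2
        data += body[start:start + size]
        pos = start + size + 2        # skip the CRLF that follows each chunk

wire = (b"25\r\nThis is the data in the first chunk\r\n\r\n"
        b"1C\r\nand this is the second one\r\n\r\n"
        b"3\r\ncon\r\n"
        b"8\r\nsequence\r\n"
        b"0\r\n\r\n")
print(decode_chunked(wire))
```

Note how the last two chunks, "con" and "sequence", are joined into one word: chunk boundaries carry no meaning for the content.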
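2.5 Head-of-line blocking
Although version 1.1 allows TCP connections to be reused, all data communication within one TCP connection happens in order. The server only moves on to the next response after it has finished the current one.
If an earlier response is particularly slow, many requests queue up behind it. This is called head-of-line blocking.
There are only two ways to avoid this problem:
- One is to reduce the number of requests
- The other is to open several persistent connections at the same time.
Accordingly, a lot of web page optimization work is needed, such as merging scripts and style sheets, embedding images in CSS code, domain sharding, and so on.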
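A small back-of-the-envelope model shows why a slow response at the head of the queue hurts. The processing times below are hypothetical, and the second function is an idealized best case that ignores connection setup cost.

```python
from itertools import accumulate

def finish_times_one_connection(service_ms: list[int]) -> list[int]:
    """Responses are sent strictly in order, so each waits for all earlier ones."""
    return list(accumulate(service_ms))

def finish_times_separate_connections(service_ms: list[int]) -> list[int]:
    """Idealized case: one connection per request, no queueing between them."""
    return list(service_ms)

# Hypothetical server processing times in milliseconds: one slow request, then two fast ones.
times = [5000, 100, 100]
print(finish_times_one_connection(times))
print(finish_times_separate_connections(times))
```

On a single connection the two fast responses are delayed by the slow one, which is the effect that opening several persistent connections or reducing the number of requests tries to limit.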
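2.6 Cookie mechanism
Cookie technology was first proposed in 1994 by an employee of Netscape, and it was not until 2011 that the IETF published the current specification (RFC 6265).
Stateless nature of HTTP:
- Stateless:
  - The HTTP protocol itself does not keep any communication state between requests and responses
  - This is simple and convenient; not storing client state reduces server CPU and memory consumption
- Drawback:
  - For sites that require login, the server can only recognize the login if extra information is attached to every request message, which increases bandwidth pressure when there are many connections.
- To keep state in specific scenarios, Cookie technology was introduced to manage state.
Cookie mechanism:
- After the client sends a request, the server returns a response and adds a Set-Cookie field to the response message, telling the client to save the cookie.
- After receiving the response, the client saves the content of the Set-Cookie field locally as text
- When the client sends another request to the server, it adds a Cookie field to the request message.
- The server receives the request carrying the cookie, compares it with its own records, and recovers the previous state information.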
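The four steps can be walked through without any network, using the standard library's `http.cookies`. Everything here is a toy: the `sessions` dictionary, the cookie name `session_id`, and the three helper functions are assumptions made for illustration, and a real server would generate unguessable session ids.

```python
from http.cookies import SimpleCookie

sessions: dict[str, dict] = {}          # server-side records, keyed by session id

def server_login(user: str) -> str:
    """Server: remember the state and return a Set-Cookie header line."""
    session_id = f"sid-{len(sessions) + 1}"   # toy id; real ids must be unguessable
    sessions[session_id] = {"user": user}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    return cookie.output()              # a "Set-Cookie: name=value" header line

def client_store(set_cookie_line: str) -> str:
    """Client: keep the cookie and build the Cookie header for later requests."""
    jar = SimpleCookie()
    jar.load(set_cookie_line.split(":", 1)[1].strip())
    return "Cookie: " + "; ".join(f"{k}={m.value}" for k, m in jar.items())

def server_identify(cookie_line: str) -> dict:
    """Server: look up the state that belongs to the cookie in the request."""
    jar = SimpleCookie()
    jar.load(cookie_line.split(":", 1)[1].strip())
    return sessions[jar["session_id"].value]

set_cookie = server_login("alice")
cookie_header = client_store(set_cookie)
print(set_cookie)
print(cookie_header)
print(server_identify(cookie_header))
```

Only the session id travels in the headers; the state itself stays in the server's records.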
To further optimize HTTP, HTTP/2 and HTTP/3 have been released in recent years, but so far most browsers still use HTTP/1.1.