I. Introduction
Some time ago, in order to study computer networking, I started reading the book "Computer Networking: A Top-Down Approach". I have to say it is a really good book: detailed yet easy to understand, explaining things through a large number of analogies rather than dry theory. The end of each chapter also has plenty of exercises and some very interesting programming assignments, so I recommend it to beginners. I have only reached the second chapter so far and have just finished the part on HTTP, so I am writing this HTTP-related blog post as a way of taking notes.
II. Details
2.1 HTTP Overview
HTTP is an application-layer protocol. Its name stands for Hypertext Transfer Protocol, and it is the core of the Web. HTTP is implemented by two programs, a client program and a server program, and what it does is simple: the client sends a request to the server, and the server responds to that request. HTTP defines how a Web client requests resources from a server and how the Web server sends resources back to the client; this is HTTP's request + response model. The client sends a request message asking for a resource; the server receives the request and sends back a response message containing that resource.
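To make the request + response model concrete, here is a minimal sketch using Python's standard library. The handler class, the page content and the helper name `fetch_once` are my own assumptions for illustration; the server runs locally on a port chosen by the operating system.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical server: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_once(path="/index.html"):
    """Start a local server, send one request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request("GET", path)      # the client sends a request message
        response = conn.getresponse()  # the server answers with a response message
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


print(fetch_once())
```

The client only ever sees the response message; how the server produced the resource is invisible to it.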
HTTP is based on TCP. TCP provides reliable data transfer, which means HTTP runs over a reliable, connection-oriented protocol. When a client requests a resource from a server, the client first establishes a TCP connection with the server. Once the connection is established, the client and server processes access TCP through their socket interfaces: the client sends the request message over the TCP connection, and the server returns the response message and the resource over the same connection. Because TCP guarantees reliable transfer, the HTTP request message arrives at the server intact, and the server's response is returned to the client intact as well.
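The role of the socket interface can be sketched by writing an HTTP request onto a TCP connection by hand. This is only an illustration under my own assumptions: the handler, the body text and the helper names `raw_http_get` and `demo` are hypothetical, and the server is a local one from Python's standard library.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical server that returns a short plain-text body."""

    def do_GET(self):
        body = b"hello over TCP"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def raw_http_get(host, port, path):
    """Open a TCP connection, write a request message, read the whole reply."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    with socket.create_connection((host, port)) as sock:  # TCP connection setup
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo():
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return raw_http_get("127.0.0.1", server.server_address[1], "/index.html")
    finally:
        server.shutdown()
        server.server_close()


reply = demo()
print(reply.decode("ascii"))
```

Everything HTTP-specific here is just text written to and read from the socket; delivery, ordering and retransmission are left to TCP.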
The resource requested over HTTP is usually a Web page. A Web page consists of one or more objects; an object may be an html file, a picture, a video, or even a small program. To HTTP, the objects that make up a Web page are not one resource: each object is a separate resource and is requested individually. Suppose we request a Web page from a server, and the page consists of one html file and 5 pictures (the html references the pictures by path). The page then has 6 objects. When the server receives the client's request for the page, it returns the html file in a response message. When the client receives the html file and finds that it references 5 pictures, it sends 5 more HTTP requests, one for each picture.
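The "one page, six objects" example can be sketched as follows. The sample html and the image file names are made up for illustration, and a real browser looks at far more than `img` tags; this only shows how a client could work out which objects it still has to request.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src path of every <img> tag in an html document."""

    def __init__(self):
        super().__init__()
        self.image_paths = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_paths.append(src)


# Hypothetical page: one html file that references 5 pictures by path.
SAMPLE_PAGE = """<html><body>
<img src="/img/prop1.png">
<img src="/img/prop2.png">
<img src="/img/prop3.png">
<img src="/img/prop4.png">
<img src="/img/prop5.png">
</body></html>"""


def objects_for_page(page_path, html_text):
    """Return every object the client must request: the page, then its images."""
    collector = ImageCollector()
    collector.feed(html_text)
    return [page_path] + collector.image_paths


objects = objects_for_page("/index.html", SAMPLE_PAGE)
print(len(objects), objects)
```

Each entry in the resulting list corresponds to one separate HTTP request.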
The server sends the requested file to the client but does not record any information about the client. So if you request the same resource twice in a row, the server responds twice; it will not skip the second response because it already served you a moment ago. Because an HTTP server keeps no information about its clients, HTTP is a stateless protocol.
2.2 Non-persistent and persistent HTTP connections
In most cases, when we request something from a Web server we send more than one request, as with the page containing 5 pictures mentioned above. This raises a question: for multiple request/response pairs to the same destination, does HTTP use the same TCP connection for every request/response, or does it create a separate TCP connection each time? These are the two cases. When multiple request/response pairs share the same TCP connection, it is called a persistent connection; when each request/response uses its own TCP connection, it is called a non-persistent connection. HTTP uses persistent connections by default, but it can be configured to use non-persistent connections. Below is a brief look at the difference between the two.
(1) Non-persistent connections
A non-persistent connection means that a separate TCP connection is established for each request/response. Let's continue with the earlier example: we request a page containing 5 images from the server, and assume the path of the page is HTTP://www.tewuyiang.cn/index.html (this is my personal server, which currently hosts a simple little game). When we request this path, the following happens:
1. The HTTP client process initiates a TCP connection to the server www.tewuyiang.cn on port 80, which is the default HTTP port;
2. The HTTP client process sends a request message to the server through its socket, requesting the resource at path /index.html;
3. The HTTP server process receives the request through its socket, looks up the resource HTTP://www.tewuyiang.cn/index.html in its storage (e.g. RAM), generates a response message with the html page encapsulated in it, and sends this message back to the client through the socket;
4. The HTTP server tells TCP to close the connection (TCP does not actually close it until it has confirmed that the client received the complete message);
5. The HTTP client process receives the complete response message, and the TCP connection is closed. Parsing the response message, the client finds that the encapsulated object is an html file and that this html file references 5 images;
6. Steps 1-4 are repeated for each of the 5 images referenced by the page.
The disadvantage of non-persistent connections is clear: a TCP connection must be established for every request/response, which greatly increases the number of connections the server has to maintain. For example, if a page contains 10 pictures, a total of 11 connections must be established, which puts enormous pressure on the server. On the upside, several connections can be open at the same time (a browser can typically open 5-10 connections in parallel). Each connection is a separate channel, data travels over the channels in parallel, and several request/response pairs can proceed simultaneously, so requests do not queue up and efficiency is higher.
(2) Persistent connections
A persistent connection means that after a client establishes a connection with a server, the requests the client sends and the responses the server returns over a period of time can all travel over that one connection. This should be easy to understand. Persistent connections come in two kinds:
- Persistent connection without pipelining: only one request/response can be in progress at a time, and the next request has to wait until the previous one completes;
- Persistent connection with pipelining: requests for objects can be sent one after another without waiting for outstanding requests to complete (though this is not fully parallel);
If a connection goes unused for a long time, the HTTP server closes it; the timeout period is configurable.
The benefit of persistent connections is also clear: they save resources, because multiple requests share one connection. The drawback is that efficiency may be somewhat lower. As the book describes it, HTTP's default mode is persistent connections with pipelining.
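The difference between the two modes can be sketched by counting how many TCP connections a local server accepts while a client fetches a page made of one html file and 5 images. All class names, helper names and image paths below are my own assumptions. Python's `http.client` does not pipeline, so the persistent case here is the "without pipelining" kind.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingServer(HTTPServer):
    """Hypothetical server that counts every TCP connection it accepts."""

    connections = 0

    def get_request(self):
        request = super().get_request()  # accepts one TCP connection
        self.connections += 1
        return request


class ObjectHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows the connection to stay open

    def do_GET(self):
        body = f"object at {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def fetch_non_persistent(port, paths):
    """One new TCP connection per request/response."""
    for path in paths:
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", path, headers={"Connection": "close"})
        conn.getresponse().read()
        conn.close()


def fetch_persistent(port, paths):
    """All requests share a single TCP connection, one after another."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    for path in paths:
        conn.request("GET", path)
        conn.getresponse().read()  # drain the response before the next request
    conn.close()


def count_connections(fetch, paths):
    server = CountingServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        fetch(server.server_address[1], paths)
        return server.connections
    finally:
        server.shutdown()
        server.server_close()


PAGE_OBJECTS = ["/index.html"] + [f"/img/prop{i}.png" for i in range(1, 6)]
print("non-persistent:", count_connections(fetch_non_persistent, PAGE_OBJECTS))
print("persistent:", count_connections(fetch_persistent, PAGE_OBJECTS))
```

With 6 objects, the first helper opens 6 connections and the second opens 1, which is exactly the resource saving described above.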
2.3 HTTP message format
Next, let's talk about the message format of the HTTP protocol. HTTP messages are divided into request messages and response messages.
(1) Request message
Here is an HTTP request message that I captured from my browser; it requests an image resource:
The first line of an HTTP request message is called the request line, and the remaining lines are called header lines. Let me go through the message above line by line.
First comes the request line, which contains three parts: the request method, the resource path, and the HTTP protocol version, separated by spaces. The request method indicates the kind of request the client is sending to the server. The methods we use most often are GET and POST:
- GET: requests a resource from the server, and the server returns the requested resource;
- POST: submits data to the server for processing (for example, submitting a form); the data is carried in the request body. A POST request may create new resources and/or modify existing ones;
Both of the methods above are defined in HTTP/1.0. Besides these two, HTTP/1.0 also defines the HEAD method:
- HEAD: similar to a GET request, but the response carries no body; it is used to obtain the headers;
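The difference between GET and HEAD can be sketched with a small local server. The handler, the fake resource bytes and the helper name `request_with` are assumptions for illustration only.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ImageHandler(BaseHTTPRequestHandler):
    body = b"fake image bytes"  # hypothetical stand-in for a real picture

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.body)  # GET: headers plus the resource itself

    def do_HEAD(self):
        self._send_headers()         # HEAD: the same headers, no body

    def log_message(self, format, *args):
        pass


def request_with(method, path="/img/prop3.png"):
    """Send one request with the given method; return (Content-Length, body)."""
    server = HTTPServer(("127.0.0.1", 0), ImageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request(method, path)
        response = conn.getresponse()
        result = (response.getheader("Content-Length"), response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


print("GET :", request_with("GET"))
print("HEAD:", request_with("HEAD"))
```

Both responses report the same Content-Length, but only the GET response actually carries those bytes.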
HTTP/1.1 added six more methods: OPTIONS, PUT, PATCH, DELETE, TRACE and CONNECT (strictly speaking, PATCH was standardized later, in RFC 5789). I will not describe them one by one here; see a reference on HTTP request methods for details.
Right after the request method comes the path of the resource on the HTTP server. In this message the path is /img/prop3.png, meaning that we are requesting the picture prop3.png in the img folder under the HTTP server's root path. After that, HTTP/1.1 indicates the HTTP version used by the request.
Below the request line are the header lines, each in name: value format, where name is the name of the header and value is its value. The first header line is Host, which gives the address of the HTTP server; here it is www.tewuyiang.cn. The second header line is Connection, which is the connection type discussed above. Its value here is keep-alive, which tells the server to use a persistent connection; a value of close would mean a non-persistent connection. The third header line, User-Agent, specifies the user agent, that is, it tells the server which kind of browser sent the HTTP request. The fourth header line, Accept, tells the server which types of resource the client wishes to receive; if the server cannot provide a matching type, it may reject the request. As the message above shows, this request wants a picture. Referer gives the address of the page that issued the request; the server can use it to reject malicious requests (such as hotlinking) and so make access to its resources more secure. Accept-Encoding tells the server which content encodings the browser supports. Accept-Language tells the HTTP server which language version of the resource the client would like; if the server has no version in that language, it sends back the default version.
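The structure described above (a request line with three parts, followed by name: value header lines) can be sketched as a small parser. The request line, Host and Connection values come from the text; the User-Agent and Accept values are placeholders I made up, and `parse_request` is a hypothetical helper rather than anything a browser or server exposes.

```python
def parse_request(message):
    """Split a request message into request line parts, headers and body."""
    head, _, body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ")  # the request line
    headers = {}
    for line in lines[1:]:                       # the header lines
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, path, version, headers, body


SAMPLE_REQUEST = (
    "GET /img/prop3.png HTTP/1.1\r\n"
    "Host: www.tewuyiang.cn\r\n"
    "Connection: keep-alive\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"  # placeholder value
    "Accept: image/png,image/*\r\n"       # placeholder value
    "\r\n"
)

method, path, version, headers, body = parse_request(SAMPLE_REQUEST)
print(method, path, version)
for name, value in headers.items():
    print(f"{name}: {value}")
```

A GET request like this one has an empty body; for a POST, the submitted data would follow the blank line.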
The figure below shows the standard format of an HTTP request message:
(2) Response message
Similarly, let's look at a response message:
The first line of the response message (the status line) starts with two parts: the HTTP version and the status code. In the message above the HTTP version is 1.1, the same as in the request, followed by the status code 200, the most common status code, which indicates that the request succeeded. To learn about the other status codes, see the reference on the Runoob (菜鸟教程) tutorial site; here are four common ones:
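- 200 - The request succeeded;
- 301 - The resource (a web page, etc.) has been permanently moved to another URL;
- 404 - The requested resource (a web page, etc.) does not exist;
- 500 - Internal server error;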
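For reference, Python's standard library ships a table of status codes, so the short reason phrase that normally accompanies each code can be looked up directly. The helper name `reason_phrase` is my own.

```python
from http import HTTPStatus


def reason_phrase(code):
    """Return the standard reason phrase for a numeric status code."""
    return HTTPStatus(code).phrase


for code in (200, 301, 404, 500):
    print(code, reason_phrase(code))
```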
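The lines after the first line are called header lines. Like the header lines in a request message, they are in name: value format. The first header line is Accept-Ranges, which tells the client whether this resource supports range requests; range requests make resumable downloads and multi-threaded chunked downloads possible. bytes means they are supported, and none means they are not. Last-Modified is covered separately later, in the discussion of caching. Content-type identifies the type of the resource; here image/png indicates that the resource is a picture. Content-Length is the size of the resource in bytes; the value in the screenshot is 8729, meaning the picture is 8729 bytes long. The last one, Date, is the date and time at which the server sent this response message.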
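The figure below shows the standard format of an HTTP response message. As you can see, at the very end there is a part called the entity body, which holds the resource sent back by the server, such as the requested picture.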
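The response layout (first line, header lines, blank line, entity body) can be sketched as a parser over raw bytes. The header values are the ones quoted in the text; the body is 8729 zero bytes standing in for the real picture, the Last-Modified and Date headers are left out because their values are not given, and `parse_response` is a hypothetical helper.

```python
def parse_response(message):
    """Split a response message into status line parts, headers and entity body."""
    head, _, body = message.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    version, status_code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(status_code), phrase, headers, body


FAKE_IMAGE = b"\x00" * 8729  # placeholder for the picture's bytes

SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Accept-Ranges: bytes\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Length: 8729\r\n"
    b"\r\n"
) + FAKE_IMAGE

version, status_code, phrase, headers, body = parse_response(SAMPLE_RESPONSE)
print(version, status_code, phrase)
print(headers)
print(len(body) == int(headers["Content-Length"]))
```

The last line checks the relationship described above: Content-Length is the number of bytes in the entity body.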
2.4 Web caches
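A Web cache, also called a proxy server, can satisfy client requests on behalf of the HTTP server in certain situations. A Web cache has its own storage and keeps copies of recently requested resources. As the name suggests, its job is to provide a caching mechanism. If a Web cache is deployed, the browser can be configured so that its HTTP requests go to the Web cache first. Let's walk through an example to explain how a Web cache works.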
Suppose I now request the picture prop3.png on the server www.tewuyiang.cn. The following happens:
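1. The HTTP client creates a TCP connection to the Web cache and sends it a request message;
2. The Web cache receives the request message and checks whether it holds a local copy of the requested resource. If it does, the Web cache creates a response message and returns the copy to the HTTP client in it;
3. If the Web cache does not hold a copy of the resource, it opens a TCP connection to the HTTP server (here, www.tewuyiang.cn) and requests the resource the client needs;
4. The server creates a response message and returns the requested resource to the cache. The cache receives the response message, extracts the resource it carries, stores a copy locally, then creates a new response message, encapsulates the copy in it, and sends it to the client that originally requested the resource;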
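The steps above can be sketched in a few lines, with the network replaced by plain function calls. The `WebCache` class, the `origin_fetch` function and the resource bytes are all hypothetical; a real proxy would also deal with connections, headers and expiry.

```python
class WebCache:
    """Toy model of a Web cache sitting between clients and an origin server."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self.copies = {}                            # local copies, keyed by path

    def handle_request(self, path):
        if path in self.copies:
            # A local copy exists: answer without contacting the origin server.
            return self.copies[path]
        # No copy: act as a client toward the origin server.
        resource = self.fetch_from_origin(path)
        # Keep a copy, then answer the client that asked for the resource.
        self.copies[path] = resource
        return resource


origin_requests = []  # records every request that reaches the origin server


def origin_fetch(path):
    """Stand-in for the HTTP server at www.tewuyiang.cn."""
    origin_requests.append(path)
    return b"bytes of " + path.encode("ascii")


cache = WebCache(origin_fetch)
first = cache.handle_request("/img/prop3.png")   # miss: forwarded to the origin
second = cache.handle_request("/img/prop3.png")  # hit: served from the copy
print(first == second, origin_requests)
```

Only the first request reaches the origin server, which is where the saving in response time comes from.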
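From the steps above we can see that in this process the Web cache acts as both a server and a client. Once a Web cache is deployed, the time it takes for clients to get a response is greatly reduced whenever the cache can answer from its own copies.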
2.5 Conditional GET
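After the introduction to Web caches above, many people may wonder: how can we be sure that the resource in the Web cache is up to date? What if the resource on the server has been updated, but what we get is the old, unchanged copy in the cache? HTTP naturally has a way to solve this problem. This is where the Last-Modified header line, which we skipped when discussing the response message, comes in; the mechanism is called conditional GET.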
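The Last-Modified header line records when the requested resource was last modified on the server. When we request a resource that the Web cache does not have, the Web cache forwards the request to the HTTP server, and the server's response to the cache includes a Last-Modified header line. When the Web cache stores its copy of the resource, it also stores the Last-Modified value. The next time a client requests this resource, the Web cache sends a conditional GET request to the server. The request carries this time value, now under the header name If-Modified-Since. The server compares the value with its own record of the resource's last modification time. If the two are equal, the resource has not been updated between the previous request and this one, and the server tells the Web cache (with a 304 Not Modified response) that it can use its stored copy. If they differ, the server sends the latest resource and the new Last-Modified value to the Web cache, which updates its local copy and responds to the client.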
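The conditional GET exchange can be sketched with two small classes, again with function calls in place of the network. The class names, resource contents and date strings are assumptions for illustration. The sketch uses the standard header name If-Modified-Since and follows the simple equality comparison described above; real servers compare the timestamps rather than the strings.

```python
class OriginServer:
    """Toy origin server holding one resource and its Last-Modified value."""

    def __init__(self, body, last_modified):
        self.body = body
        self.last_modified = last_modified  # HTTP-date string

    def get(self, if_modified_since=None):
        """Return (status, body, Last-Modified) for a possibly conditional GET."""
        if if_modified_since == self.last_modified:
            return 304, None, self.last_modified   # Not Modified: no body sent
        return 200, self.body, self.last_modified  # full resource


class ValidatingCache:
    """Toy Web cache that revalidates its copy on every client request."""

    def __init__(self, origin):
        self.origin = origin
        self.copy = None         # (body, last_modified) once something is stored
        self.last_status = None  # status of the most recent origin response

    def handle_request(self):
        since = self.copy[1] if self.copy else None
        status, body, last_modified = self.origin.get(if_modified_since=since)
        self.last_status = status
        if status == 200:
            self.copy = (body, last_modified)  # store or refresh the copy
        return self.copy[0]                    # on 304 the stored copy is reused


origin = OriginServer(b"version 1", "Wed, 01 Jan 2020 00:00:00 GMT")
cache = ValidatingCache(origin)

print(cache.handle_request(), cache.last_status)  # first fetch: full response
print(cache.handle_request(), cache.last_status)  # unchanged: revalidated copy

# The resource changes on the server, so the next conditional GET gets new data.
origin.body = b"version 2"
origin.last_modified = "Thu, 02 Jan 2020 00:00:00 GMT"
print(cache.handle_request(), cache.last_status)
```

The middle request still costs one round trip to the server, but the response is tiny because the resource itself is not sent again.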
III. Summary
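The content above gives a rough introduction to the HTTP protocol and some of its mechanisms; I hope that after reading it you have a general understanding of HTTP. Of course there is much more to HTTP than this, but given the limits of space and of my own knowledge, this is what this blog post covers for now. When I have time I will write about other parts of HTTP, such as cookie and session.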
IV. References
"Computer Networking: A Top-Down Approach" (7th edition)