A walkthrough of the key points of the HTTP protocol

What is HTTP?

HTTP stands for Hypertext Transfer Protocol. It is the protocol used to transfer hypertext from World Wide Web (WWW) servers to the local browser.

It runs on top of TCP, a transport-layer protocol.

 
Request message
 

The figures below show the structure of a request message and a simple example.
[Figure: structure of a request message]
[Figure: a simple example request message]
Let's first look at the request message as a whole; the more important parts are described in detail later.

A request message can be divided into four parts:

  • Request line
  • Header field
  • Blank line
  • Request body

The third part, the blank line, separates the header fields from the request body. The blank line is always present, whether or not there is a request body.

The fourth part, the request body, carries the data of the request. For example, the parameters of the commonly used POST method are stored here.
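
As a rough illustration of the four-part layout, the following Python sketch splits a raw request into its request line, header fields, and body, using the blank line as the separator. The function name `split_request` and the sample request are made up for this example; a real server would also deal with byte encodings, repeated headers, and Content-Length.

```python
def split_request(raw: str):
    """Split a raw HTTP request into request line, header fields, and body."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header field has the form "Name: value".
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return request_line, headers, body


# Hypothetical sample request, used only for illustration.
sample = (
    "POST /test/index.html HTTP/1.1\r\n"
    "Host: test.com\r\n"
    "Content-Length: 23\r\n"
    "\r\n"
    "user_id=1&user_name=sam"
)

request_line, headers, body = split_request(sample)
print(request_line)
print(headers)
print(body)
```

Note that a request without a body still contains the blank line; in that case the sketch simply returns an empty string for the body.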

Now let's look closely at the first and second parts.

 
Part I: Request Line
 

The first part, i.e. the first line, is the request line. It has three main components:

  • Request method
  • Requested resource (URL)
  • HTTP protocol version number
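
Since the three components are separated by single spaces, a request line can be taken apart with one split. This is only a sketch: `parse_request_line` is a hypothetical helper name, and the sample line is invented for illustration.

```python
def parse_request_line(line: str):
    """Split a request line into method, request target, and version."""
    # maxsplit=2 guarantees at most three fields.
    method, target, version = line.split(" ", 2)
    if not version.startswith("HTTP/"):
        raise ValueError(f"unexpected protocol version: {version!r}")
    return method, target, version


method, target, version = parse_request_line(
    "GET /news/index.asp?boardID=5 HTTP/1.1"
)
print(method, target, version)
```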

Before introducing request methods, we first need to clarify the concept of a URL. So we will go through the request line in this order: URL, request method, HTTP protocol version.

 
A. URL
 

URL stands for Uniform Resource Locator. It is the address used to identify a resource on the Internet.

This part draws on the article referenced below, which I think summarizes the topic clearly and in detail, so its content is carried over directly. If this infringes on anyone's rights, please contact me and I will delete it. The original link is below.

Detailed composition of the URL

Taking the following URL as an example, here are the parts that make up an ordinary URL:

http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name

As the URL above shows, a complete URL consists of the following parts:
1. Protocol part: the protocol part of this URL is "http:", which means the page is accessed over HTTP. Several protocols can be used on the Internet, such as HTTP and FTP; this example uses HTTP. The "//" after "http:" is a delimiter.

2. Domain name part: the domain name part of this URL is "www.aspxfans.com". An IP address can also be used in place of the domain name.

3. Port part: the port follows the domain name, with ":" as the delimiter between them. The port is not a required part of a URL; if it is omitted, the default port is used (80 for HTTP).

4. Virtual directory part: from the first "/" after the domain name to the last "/". The virtual directory is not a required part of a URL. In this example it is "/news/".

5. File name part: from the last "/" after the domain name up to the "?". If there is no "?", it runs from the last "/" up to the "#"; if there is neither "?" nor "#", it runs from the last "/" to the end. In this example the file name is "index.asp". The file name is not a required part of a URL; if it is omitted, the server's default file name is used.

6. Anchor part: from the "#" to the end. In this example the anchor is "name". The anchor is not a required part of a URL.

7. Parameter part: the part between the "?" and the "#", also known as the search part or query part. In this example the parameters are "boardID=5&ID=24618&page=1". Multiple parameters are allowed, with "&" as the delimiter between them.
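
The breakdown above can be reproduced with Python's standard `urllib.parse` module. This is a minimal sketch: the dictionary keys follow this article's terminology ("virtual directory", "file name"), which the standard library does not use itself; it exposes them together as the path.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
parts = urlsplit(url)

# The standard library returns one path; split it at the last "/" to get
# what this article calls the virtual directory and the file name.
directory, _, filename = parts.path.rpartition("/")

components = {
    "protocol": parts.scheme,
    "domain name": parts.hostname,
    "port": parts.port,
    "virtual directory": directory + "/",
    "file name": filename,
    "parameters": parse_qs(parts.query),
    "anchor": parts.fragment,
}

for name, value in components.items():
    print(f"{name}: {value}")
```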

Two more concepts to add: URI and URN

  • URI (Uniform Resource Identifier): used to uniquely identify a resource. HTTP uses URIs to establish connections and transfer data. A URL is a specific type of URI that contains enough information to locate a resource.
  • URN (Uniform Resource Name): one of the two forms of URI, the other being URL. It is an identifier that uniquely names an entity but does not give the entity's location. Both URNs and URLs are URIs.

In short, URLs are a subset of URIs; a URL tells us the network location to access. A URL looks like this:

http://bitpoetry.io/posts/hello.html

URNs are also a subset of URIs. A URN carries a name (within a given namespace) but no access method, as in the following simplified illustration:

bitpoetry.io/posts/hello.html#intro

Strictly speaking, a formal URN uses the "urn:" scheme, for example urn:isbn:0451450523.

Understanding URLs is enough in most cases. If you want a clearer picture of the relationships and differences among URI, URL, and URN, refer to the following article.

Do you know the difference between URL, URI, and URN?

 
B. Request methods

HTTP/1.0 defines three request methods: GET, POST, and HEAD.
HTTP/1.1 added five request methods: OPTIONS, PUT, DELETE, TRACE, and CONNECT.

The following table describes these eight methods.

Method  | Description
GET     | In one phrase: retrieve a resource. The most commonly used request method.
POST    | Submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is contained in the request body. A POST request may create a new resource and/or modify an existing one.
HEAD    | Like GET, it retrieves the message headers, but the response contains no message body. Mainly used to check whether a URL is valid and when the resource was last updated.
OPTIONS | Queries which methods the specified URL supports.
PUT     | Uploads a file. The method itself has no authentication mechanism, so it poses security risks and is generally not used without additional access control.
DELETE  | Deletes a file; the opposite of PUT.
TRACE   | Asks the server to return the communication path to the client.
CONNECT | Establishes a tunnel through which content encrypted with SSL (Secure Sockets Layer) or TLS (Transport Layer Security) can be transmitted.

Anyone involved in web development knows that the two most commonly used methods are GET and POST, one for retrieving data and one for submitting it. Interviewers also often ask about the difference between GET and POST, so here is an answer to that question in passing.

We can answer this question from five angles.

1. Role

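As mentioned above, GET is mainly used to retrieve resources, while POST is mainly used to transmit an entity body.

2. Request parameters

Both GET and POST requests can carry extra parameters. With GET, the parameters appear in the URL as a query string, while POST parameters are stored in the entity body.

Do not assume that POST is more secure just because its parameters are stored in the entity body: they can still be viewed with packet-capture tools (such as Wireshark or Fiddler).

Below are examples of carrying extra parameters with GET and POST: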

GET /test/index.html?user_id=1&user_name=sam HTTP/1.1

POST /test/index.html HTTP/1.1
Host: test.com
 
user_id=1&user_name=sam

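Another point to note: because URLs only support ASCII, any non-ASCII characters (such as Chinese) in GET parameters must be encoded first. For example, 中文 is converted to %E4%B8%AD%E6%96%87, and a space is converted to %20. POST parameters support standard character sets.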
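
The following Python sketch rebuilds the two requests above and reproduces the percent-encoding results, using the standard `urllib.parse` module. The Content-Type and Content-Length headers in the POST request are additions for this example (they are what a form submission would typically send) and are not part of the original example.

```python
from urllib.parse import quote, urlencode

params = {"user_id": 1, "user_name": "sam"}
query = urlencode(params)  # "user_id=1&user_name=sam"

# GET: the parameters travel in the request target as a query string.
get_request = (
    f"GET /test/index.html?{query} HTTP/1.1\r\n"
    "Host: test.com\r\n"
    "\r\n"
)

# POST: the same parameters travel in the request body instead.
body = query
post_request = (
    "POST /test/index.html HTTP/1.1\r\n"
    "Host: test.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    f"{body}"
)

# Non-ASCII characters and spaces are percent-encoded (UTF-8 by default).
encoded_chinese = quote("中文")
encoded_space = quote(" ")

print(get_request)
print(post_request)
print(encoded_chinese, encoded_space)
```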

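3. Safety

A safe HTTP method does not change the server's state; in other words, it is read-only.

GET is safe and POST is not, because the purpose of POST is to transmit entity body content. That content may be form data uploaded by the user; after a successful upload the server may store the data in a database, so its state changes.

4. Idempotence

First, what is idempotence?

An idempotent HTTP method is one where executing the same request once has the same effect as executing it many times in a row, and the server ends up in the same state. In other words, repeating an idempotent request should produce no additional side effects (apart from things like statistics).

When implemented correctly, GET is idempotent, while POST is not.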
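
To make safety and idempotence concrete, here is a toy in-memory model. `ToyServer` and its data are entirely hypothetical and stand in for a real server only to show how state behaves under repeated calls.

```python
class ToyServer:
    """In-memory stand-in for a server; not a real HTTP implementation."""

    def __init__(self):
        self.users = {1: "sam"}
        self.next_id = 2

    def get(self, user_id):
        # Safe and idempotent: reads state and never changes it.
        return self.users.get(user_id)

    def post(self, name):
        # Neither safe nor idempotent: every call creates a new record.
        user_id = self.next_id
        self.users[user_id] = name
        self.next_id += 1
        return user_id


server = ToyServer()

# Repeating GET leaves the server state exactly as it was.
before = dict(server.users)
for _ in range(3):
    server.get(1)
state_unchanged_by_get = server.users == before

# Repeating the same POST three times creates three separate records.
for _ in range(3):
    server.post("alex")
users_after_posts = len(server.users)

print(state_unchanged_by_get, users_after_posts)
```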

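5. Cacheability

Cacheable here refers to caching the response, so that the same data is not requested repeatedly.
For a response to be cached, the following conditions must be met:

  • The HTTP method of the request is itself cacheable. GET is an example; POST is not cacheable in most cases.
  • The status code of the response is cacheable, including: 200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501.
  • The Cache-Control header field of the response does not forbid caching.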
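
The three conditions can be written as a simple check. This is a deliberately simplified sketch, not how any particular cache is implemented: `may_cache` is a made-up name, only the `no-store` directive is treated as forbidding caching, and HEAD is included alongside GET as a cacheable method.

```python
# Assumption: only GET and HEAD are treated as cacheable methods here.
CACHEABLE_METHODS = {"GET", "HEAD"}
# The status codes listed in the conditions above.
CACHEABLE_STATUS = {200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501}


def may_cache(method: str, status: int, cache_control: str = "") -> bool:
    """Return True if a response passes the three conditions (simplified)."""
    directives = {d.strip().lower() for d in cache_control.split(",") if d.strip()}
    return (
        method.upper() in CACHEABLE_METHODS
        and status in CACHEABLE_STATUS
        and "no-store" not in directives
    )


print(may_cache("GET", 200, "max-age=60"))
print(may_cache("POST", 200))
print(may_cache("GET", 200, "no-store"))
```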

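In addition:
Regarding the difference between GET and POST, some articles online say that the length of data submitted with GET is limited (some also say that the size of POST entity data is limited). That is inaccurate. Because GET submits data through the URL, the amount of data GET can submit is directly tied to URL length. In fact, the HTTP specification places no limit on URL length or on the number of parameters; the limits are imposed by specific browsers and servers. IE limits URL length to 2083 characters (2K+35). For other browsers, such as Netscape and Firefox, there is in theory no length limit; the limit depends on operating system support.

If you're interested, take a look at this article:

Analysis of two length-limit problems (from a project)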
 
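C. HTTP protocol version


By protocol specification, there are three main versions: HTTP/1.0, HTTP/1.1, and HTTP/2.0.
A comparison of these specifications is given at the end of the article, because it requires some knowledge of header fields first.
They are only mentioned briefly here.

That roughly wraps up the request line. Next comes the last part of the request message to introduce: header fields.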

 
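Part II: Header Fields


There are four types of header fields: general header fields, request header fields, response header fields, and entity header fields.

The various header fields (not exhaustive, for reference only) and their meanings are as follows:


General header fields
[Figure: table of general header fields]
Request header fields
[Figure: table of request header fields]
Response header fields
[Figure: table of response header fields]
Entity header fields
[Figure: table of entity header fields]

Just get familiar with them; there is no need to memorize them by rote.

That covers the key points of the request message. Next, let's look at the response message.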

 
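Response message


The figures below show the structure of a response message and a simple example.
[Figure: structure of a response message]
[Figure: a simple example response message]
Let's first look at the response message as a whole; the more important parts are described in detail later.

Like a request message, a response message can be divided into four parts:

  • Response line
  • Header fields
  • Blank line
  • Response body

The second part (header fields) and the third part (blank line) work the same way as in a request message, so they are not repeated here.

The fourth part, the response body, is what its name suggests: it is the resource the server returns after the client requests it, or, after an operation such as POST, the additional information the server sends back (for example, success or failure).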
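
A response can be split into its four parts in much the same way as a request. The sketch below is illustrative only: `split_response` is a hypothetical helper, and the sample response is invented rather than captured from a real server.

```python
def split_response(raw: str):
    """Split a raw HTTP response into response line, header fields, and body."""
    # The blank line separates the head (response line + headers) from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The response line is "version status-code description".
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return (version, int(code), reason), headers, body


# Hypothetical sample response, used only for illustration.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 14\r\n"
    "\r\n"
    "<h1>Hello</h1>"
)

status, headers, body = split_response(sample)
print(status)
print(headers)
print(body)
```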

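For example, when you type www.baidu.com into your browser's address bar and press Enter, then based on the HTTP request message introduced above, you now know that your browser submits an HTTP request message to Baidu's server, and that the request method is GET. Baidu's server returns the Baidu home page to you as the response body.

What exactly happens after you enter a URL (here, Baidu's address) is also a frequent interview topic, and is not covered here. If needed, see my other article, which gives a brief explanation of this question.

What happens after you type a URL into the browser's address bar and press Enter?

Let's take a quick look using Chrome's developer tools.
[Figure: Chrome developer tools showing the response for the Baidu home page]
The large area circled in red is the response body, i.e. the HTML document of the Baidu home page.

So for the response message, every part except the first, the response line, has now been covered. Let's look at the response line.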

 
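Response line


Like the request line, the response line consists of three parts:

  • HTTP protocol version
  • Status code
  • Status code description

The HTTP protocol version was already covered under the request message, so here we only look at the status code and its description.

There are five classes of status codes. The classes and their details are described below.
[Figure: table of the five status code classes]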

 
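1xx Informational

  • 100 Continue: indicates that everything is fine so far; the client can continue sending the request, or ignore this response.


2xx Success

  • 200 OK
  • 204 No Content: the request was processed successfully, but the response contains no entity body. Typically used when the client only needs to send information to the server and no data needs to be returned.
  • 206 Partial Content: indicates that the client made a range request; the response contains the part of the entity specified by Content-Range.


3xx Redirection

  • 301 Moved Permanently: permanent redirect.
  • 302 Found: temporary redirect.
  • 303 See Other: serves the same purpose as 302, but 303 explicitly requires the client to fetch the resource with GET.
  • 304 Not Modified: if the request headers contain a condition such as If-None-Match or If-Modified-Since and the resource has not changed since the version the client holds, the server returns 304 without a body and the client uses its cached copy. (A failed If-Match or If-Unmodified-Since condition produces 412 Precondition Failed instead.)
  • 307 Temporary Redirect: a temporary redirect similar in meaning to 302, but 307 requires that the browser not change a redirected POST into a GET.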
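
The difference between 302, 303, and 307 comes down to which method the client uses for the follow-up request. The sketch below models that choice; `method_after_redirect` is a made-up name, and the handling of 301 and 302 reflects common browser behavior (rewriting POST to GET) rather than something the status codes themselves demand.

```python
def method_after_redirect(status: int, original_method: str) -> str:
    """Return the method a typical browser uses for the redirected request."""
    if status == 303 and original_method != "HEAD":
        # 303 explicitly tells the client to fetch the new location with GET.
        return "GET"
    if status in (301, 302) and original_method == "POST":
        # Common browser behavior: a redirected POST is rewritten to GET.
        return "GET"
    # 307 (and every other case in this sketch) keeps the original method.
    return original_method


for status in (301, 302, 303, 307):
    print(status, method_after_redirect(status, "POST"))
```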

 
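4xx Client Error

  • 400 Bad Request: the request message contains a syntax error.
  • 401 Unauthorized: the request requires authentication information (BASIC or DIGEST authentication). If a request was already made once before, it means user authentication failed.
  • 403 Forbidden: the request was refused.
  • 404 Not Found: the resource was not found.


5xx Server Error

  • 500 Internal Server Error: an error occurred while the server was executing the request.
  • 503 Service Unavailable: the server is temporarily overloaded or down for maintenance and cannot handle the request right now.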
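
Because the class of a status code is determined by its first digit, classification takes one integer division. The sketch below combines that with the standard library's `http.HTTPStatus` enum to look up the description; the `describe` helper and the class labels are chosen for this example.

```python
from http import HTTPStatus

# Class labels keyed by the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return 'code description (class)' for a known status code."""
    category = CLASSES[code // 100]
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    return f"{code} {phrase} ({category})"


for code in (100, 200, 204, 302, 404, 503):
    print(describe(code))
```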

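With request and response messages covered, let's look at a few other important HTTP topics.


Cookie

For details, see my other article, linked below.

Concepts of and differences between Cookie and Session


Connection management


[Figure: short-lived connections, persistent connections, and pipelining]
From left to right: short-lived connections, persistent connections, and pipelining. Persistent connections are also called long connections.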

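1. Short-lived and persistent connections

When a browser visits an HTML page that contains several images, it requests not only the HTML page itself but also those other resources. If a new TCP connection had to be created for every HTTP exchange, the overhead would be high.

With a persistent connection, a TCP connection is established once and then used for multiple HTTP exchanges.

Starting with HTTP/1.1, connections are persistent by default. To close the connection, either the client or the server must request it using Connection: close.

Before HTTP/1.1, connections were short-lived by default. To use a persistent connection, use Connection: Keep-Alive.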
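
The two defaults can be summarized in a small decision function. This is a sketch under simplifying assumptions: `is_persistent` is a hypothetical helper, and any version other than HTTP/1.1 is treated like HTTP/1.0.

```python
def is_persistent(version: str, connection_header: str = "") -> bool:
    """Decide whether the connection stays open after this exchange."""
    tokens = {t.strip().lower() for t in connection_header.split(",") if t.strip()}
    if version == "HTTP/1.1":
        # Persistent by default; closed only when explicitly requested.
        return "close" not in tokens
    # Assumption: anything else behaves like HTTP/1.0, short-lived by default.
    return "keep-alive" in tokens


print(is_persistent("HTTP/1.1"))
print(is_persistent("HTTP/1.1", "close"))
print(is_persistent("HTTP/1.0"))
print(is_persistent("HTTP/1.0", "Keep-Alive"))
```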

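2. Pipelining

By default, HTTP requests are sent sequentially: the next request is sent only after the response to the current one has been received. Because of network latency and bandwidth limits, there can be a long wait before the next request reaches the server.

Pipelining sends requests back to back on the same persistent connection without waiting for the responses to return, which reduces latency.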
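
On the wire, pipelined requests are simply written one after another on the same connection. The sketch below builds such a byte stream and shows how the receiving side could separate the requests again; no network is involved. The host name `example.com`, the paths, and both helper names are assumptions made for this example, and the splitting logic only works for requests without a body.

```python
def build_get(path: str, host: str) -> bytes:
    """Build a minimal body-less GET request."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def split_pipelined(buffer: bytes) -> list:
    """Split a buffer of body-less requests at each blank line."""
    return [chunk + b"\r\n\r\n" for chunk in buffer.split(b"\r\n\r\n") if chunk]


# The client writes all three requests without waiting for any response.
paths = ["/index.html", "/a.png", "/b.png"]
pipelined = b"".join(build_get(p, "example.com") for p in paths)

# The server reads them back in the order they were sent and must
# answer in that same order.
requests_in_order = split_pipelined(pipelined)
targets = [r.split(b" ", 2)[1].decode("ascii") for r in requests_in_order]
print(targets)
```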

 
HTTP/1.1 compared with HTTP/1.0


  • Persistent connections by default, which is the most important difference
  • Supports pipelining
  • Supports opening multiple TCP connections at the same time
  • Adds five request methods: OPTIONS, PUT, DELETE, TRACE, and CONNECT
  • Adds the max-age cache directive

HTTP/1.1 is currently the most widely used version.
For a comparison of HTTP/1.1 and HTTP/2.0, refer to the following article.

Differences among HTTP/1.0, HTTP/1.1, and HTTP/2.0


Origin blog.csdn.net/u013568373/article/details/92561689