How does the HTTP protocol work?

1. HTTP overview

(1) The concept of HTTP

Goal: Understand the concept of HTTP

1. The concept of HTTP

HTTP is the abbreviation of HyperText Transfer Protocol. It is a request/response protocol: after the client establishes a connection with the server, it can send a request to the server, which is called an HTTP request. The server responds after receiving the request, and that reply is called an HTTP response.

2. Features of the HTTP protocol

(1) C/S mode

HTTP supports the client/server mode (a browser is one kind of Web client).

(2) Simple and fast

When the client requests a service from the server, it only needs to transmit the request method and path. Commonly used request methods include GET and POST, and each method specifies a different type of interaction between the client and the server. Because HTTP is relatively simple, HTTP server programs can be small, which keeps communication fast.

(3) Flexible

HTTP allows the transmission of any type of data, and the type of data being transmitted is marked by Content-Type.

(4) Stateless

HTTP is a stateless protocol. Stateless means that the protocol does not remember anything about previous transactions. If later processing requires earlier information, that information must be retransmitted, which may increase the amount of data transmitted per connection.

(2) HTTP 1.0 and HTTP 1.1

Goal: Understand the characteristics and differences of HTTP 1.0 and HTTP 1.1

1. Development of HTTP

Since HTTP was born, it has gone through several versions. The earliest version, HTTP 0.9, dates from 1991. To improve on it, HTTP 1.0 was released in 1996, and HTTP 1.1 followed in 1997. Since HTTP 0.9 is obsolete, I won't go into it here.

2. Introduction of HTTP1.0

(1) Definition of HTTP1.0

The client and server based on the HTTP 1.0 protocol need to go through four steps in the interaction process: establishing a connection, sending a request message, sending back a response message, and closing the connection.

(2) Disadvantages of HTTP1.0

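Under HTTP 1.0, each connection between the client and the server can carry only one HTTP request; the connection is closed once the response has been sent. For content-rich web pages, this communication method is obviously inefficient.
For example, to display the following HTML page, which contains three img tags, a browser using HTTP 1.0 needs four connections: one for the page itself and one for each image.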

<html>
	<body>
		<img src="/image01.jpg">
		<img src="/image02.jpg">
		<img src="/image03.jpg">
	</body>	
</html>
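
As a rough illustration of the cost described above, the following Python sketch counts how many connections an HTTP 1.0 client without connection reuse would need for this page. The names PAGE, ImageCounter, and connections_needed_http10 are made up for this example, and the count assumes that only img tags trigger extra requests.

```python
from html.parser import HTMLParser

# The sample page from the text above.
PAGE = """<html>
    <body>
        <img src="/image01.jpg">
        <img src="/image02.jpg">
        <img src="/image03.jpg">
    </body>
</html>"""


class ImageCounter(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def connections_needed_http10(html):
    """One connection for the page itself plus one per image (no reuse)."""
    counter = ImageCounter()
    counter.feed(html)
    return 1 + len(counter.sources)


print(connections_needed_http10(PAGE))  # 4
```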

3. Introduction of HTTP1.1

To overcome this time-consuming defect of HTTP 1.0, HTTP 1.1 was introduced. It supports persistent connections, that is, multiple HTTP requests and responses can be transmitted over one TCP connection, which reduces the overhead and delay of establishing and closing connections.

4. HTTP message

Goal: Familiarize yourself with the composition of HTTP messages

(1) The concept of HTTP message

When a user accesses a URL address in a browser, clicks a hyperlink on a web page, or submits a form on a web page, the browser will send request data to the server, that is, an HTTP request message. After the server receives the request data, it will send the processed data to the client, that is, the HTTP response message. HTTP request messages and HTTP response messages are collectively referred to as HTTP messages.

(2) Use a browser to view HTTP messages

Enter www.baidu.com in the address bar of Google Chrome to visit the Baidu homepage, then press F12 to open the developer tools. The requested URL addresses are listed in the Network panel.

2. HTTP request information

(1) HTTP request line

Goal: Familiarize yourself with the HTTP request line

1. HTTP request line

The HTTP request line is the first line of the request message. It has three parts: the request method, the resource path, and the HTTP version used. A specific example:
GET /index.html HTTP/1.1
GET is the request method, /index.html is the path of the requested resource, and HTTP/1.1 is the protocol version used for communication. Note that the parts of the request line are separated by spaces, and the line must end with a carriage return and a line feed.
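
The three-part structure of the request line can be sketched in Python as follows. This is a minimal illustration rather than a full HTTP parser; parse_request_line is a hypothetical helper name.

```python
def parse_request_line(line):
    """Split an HTTP request line into (method, path, version)."""
    # The request line must be terminated by carriage return + line feed.
    if not line.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    # The three parts are separated by single spaces.
    method, path, version = line[:-2].split(" ")
    return method, path, version


print(parse_request_line("GET /index.html HTTP/1.1\r\n"))
# ('GET', '/index.html', 'HTTP/1.1')
```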

2. HTTP request method

Request method  Meaning
GET             Request the resource identified by the URI in the request line
POST            Submit data to the specified resource and ask the server to process it (such as submitting a form or uploading a file)
HEAD            Request only the response headers of the resource identified by the URI
PUT             Store the enclosed content at the specified URI (upload)
DELETE          Request the server to delete the resource identified by the URI
TRACE           Request the server to send back the request it received, mainly for testing or diagnosis
OPTIONS         Query the options and requirements related to a resource

(1) GET method

When the user directly enters a URL address in the browser address bar or clicks a hyperlink on the web page, the browser will use the GET method to send the request. If the method attribute of the form on the web page is set to "GET" or the method attribute is not set (the default value is GET), when the user submits the form, the browser will also use the GET method to send the request.
If there is a parameter part in the URL requested by the browser, the parameter part will be appended to the resource path in the request line in the request message generated by the browser.
Take the URL address http://www.lzy.cn/javaForum?name=howard&pwd=123456 as an example: the content after "?" is the parameter information. A parameter consists of a parameter name and a parameter value joined by an equal sign (=). If the URL contains multiple parameters, they are separated by "&".
When the browser sends the request message to the server, the parameter part is appended to the URI of the resource to be accessed: GET /javaForum?name=howard&pwd=123456. Note that the amount of data GET can carry is limited in practice. HTTP itself sets no maximum, but browsers and servers cap the URL length (older versions of Internet Explorer, for example, allowed only about 2KB), so GET is suitable only for small amounts of data.
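
To make the split between resource path and parameters concrete, here is a small sketch using Python's standard urllib.parse module. The function name split_get_url is an assumption made for this example.

```python
from urllib.parse import urlsplit, parse_qs


def split_get_url(url):
    """Return the request target sent in the request line and the parsed parameters."""
    parts = urlsplit(url)
    # The request line carries the path plus "?" and the query string.
    target = parts.path + ("?" + parts.query if parts.query else "")
    # parse_qs maps each parameter name to a list of values.
    params = parse_qs(parts.query)
    return target, params


target, params = split_get_url("http://www.lzy.cn/javaForum?name=howard&pwd=123456")
print(target)  # /javaForum?name=howard&pwd=123456
print(params)  # {'name': ['howard'], 'pwd': ['123456']}
```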

(2) POST method

If the method attribute of the form on the web page is set to "POST", then when the user submits the form, the browser uses the POST method and sends the form's elements and data to the server as the entity content of the HTTP message rather than as parameters of the URL address. In addition, when a form is submitted with POST, the browser by default sets the Content-Type header to "application/x-www-form-urlencoded" and sets the Content-Length header to the length of the entity content.

POST /javaForum HTTP/1.1
Host: www.lzy.cn
Content-Type: application/x-www-form-urlencoded
Content-Length: 22

name=howard&pwd=123456
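
The relationship between the form data, the entity content, and the Content-Length header can be sketched in Python as below. build_post_request is a hypothetical helper that only assembles the message text; it does not send anything over the network.

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Assemble a form-encoded POST request as text (illustration only)."""
    # urlencode joins name=value pairs with "&" and percent-encodes them.
    body = urlencode(fields)
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        # Content-Length is the length of the entity content in bytes.
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # A blank line separates the headers from the entity content.
    return "\r\n".join(lines) + "\r\n\r\n" + body


print(build_post_request("www.lzy.cn", "/javaForum", {"name": "howard", "pwd": "123456"}))
```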

(2) HTTP request header

Goal: Familiarize yourself with HTTP request headers

1. HTTP request header

In an HTTP request message, the request line is followed by several request headers. The request header is mainly used to transmit additional information to the server, such as the data type that the client can receive, the compression method, the language, and the URL address of the page to which the hyperlink that sends the request belongs.

Host: localhost:8080
Accept: image/gif, image/x-xbitmap, */*
Referer: http://localhost:8080/lzy/
Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; GTB6.5; CIBA)
Connection: Keep-Alive
Cache-Control: no-cache
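
Each request header is a "name: value" line. As a simplified sketch (it ignores repeated headers and other edge cases), the lines can be read into a dictionary like this; parse_headers is a made-up name for this example.

```python
def parse_headers(raw):
    """Turn "Name: value" lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only: values such as "localhost:8080"
        # contain colons of their own.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them.
        headers[name.strip().lower()] = value.strip()
    return headers


raw = "Host: localhost:8080\r\nAccept-Encoding: gzip, deflate\r\nConnection: Keep-Alive\r\n"
print(parse_headers(raw)["host"])  # localhost:8080
```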

2. Fields of the HTTP request header

Header field         Description
Accept               Indicates the MIME (Multipurpose Internet Mail Extensions) types that the client program (usually a browser) can handle
Accept-Charset       Tells the server which character sets the client uses
Accept-Encoding      Specifies the data encodings that the client can decode; the encoding here usually means some kind of compression
Accept-Language      Specifies the language in which the client would like the server to return the document
Authorization        When the client accesses a password-protected web page, the Web server sends a 401 status code and a WWW-Authenticate response header, asking the client to answer with the Authorization request header
Proxy-Authorization  Works in basically the same way as Authorization, except that it carries the authentication information the client sends to a proxy server
Host                 Specifies the host name and port number where the resource is located
If-Match             When the client requests the web page file from the server again, it can use If-Match to attach the previously cached entity tag; such a request is treated as a conditional request
If-Modified-Since    Similar to If-Match, except that its value is a time in GMT format
Range                Specifies that the server only needs to return part of the document, and which range; this is very useful for resuming interrupted downloads of large documents
If-Range             Can only be used together with the Range header field; its value can be an entity tag or a time in GMT format
Max-Forwards         Specifies the number of proxy servers that the current request can pass through; the value is reduced by 1 each time the request passes a proxy server
Referer              Often used by website administrators to track how visitors navigate to the website; it can also be used for hotlink protection
User-Agent           User-Agent, or UA for short, identifies the operating system and version, browser and version, browser rendering engine, browser language, and so on used by the browser or other client program, so that the server can return different content for different types of browsers

(1) Accept request header field

The Accept header field indicates the MIME (Multipurpose Internet Mail Extensions) types that the client program (usually a browser) can handle. For example, if both the browser and the server support png images, the browser can send an Accept header field containing image/png; the server sees that the Accept header contains the MIME type image/png and may then use png files in the img elements of the web page. There are many MIME types. For example, the following can be used as values of the Accept header field.
Accept: text/html, indicating that the client wants to accept HTML text.
Accept: image/gif, indicating that the client wants to accept resources in GIF image format.
Accept: image/*, indicating that the client can accept all image format subtypes.
Accept: */*, indicating that the client can accept content in all formats.
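
A server-side check of a MIME type against these Accept values might look like the following sketch. It is deliberately simplified: the helper name accepts is hypothetical, and quality values such as q=0.8 are ignored rather than used for ranking.

```python
from fnmatch import fnmatchcase


def accepts(accept_header, mime_type):
    """Return True if mime_type matches any pattern listed in the Accept header."""
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.8" and surrounding whitespace.
        pattern = item.split(";")[0].strip()
        # "*" in a pattern (image/*, */*) matches any type or subtype.
        if pattern and fnmatchcase(mime_type, pattern):
            return True
    return False


print(accepts("image/gif, image/*", "image/png"))  # True
print(accepts("text/html", "image/png"))           # False
print(accepts("*/*", "application/json"))          # True
```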

(2) Accept-Encoding request header field

The Accept-Encoding header field specifies the data encodings that the client can decode. The encoding here usually refers to some kind of compression. Multiple encodings can be listed in the Accept-Encoding header field, separated by commas. Specific example: Accept-Encoding: gzip,compress
gzip and compress are two common data encodings. Compressing large entity content before transmitting it saves network bandwidth and transmission time. After receiving the request header, the server compresses the original document content with one of the listed formats, sends the result to the client as the entity content of the response message, and names the encoding it used in the Content-Encoding response header. When the browser receives such entity content, it has to decompress it.
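
The negotiation described above can be sketched with Python's standard gzip module. This is only an illustration of the idea: encode_body and decode_body are hypothetical helper names, and only the gzip encoding is handled.

```python
import gzip


def encode_body(body, accept_encoding):
    """Server side: compress the body if the client lists gzip in Accept-Encoding."""
    codings = [c.split(";")[0].strip().lower() for c in accept_encoding.split(",")]
    if "gzip" in codings:
        # Tell the client which encoding was applied.
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


def decode_body(body, headers):
    """Client side: undo the encoding named in Content-Encoding."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


original = b"<html>" + b"hello " * 200 + b"</html>"
sent, headers = encode_body(original, "gzip,compress")
print(len(original), len(sent), headers)
print(decode_body(sent, headers) == original)  # True
```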

(3) Host request header field

The Host header field specifies the host name and port number where the resource is located. Its format is the same as the host name and port number part of the resource's full URL. Specific example: Host: www.lzy.cn:80
Since the default port number a browser uses for HTTP connections is 80, the ":80" after "www.lzy.cn" can be omitted. Note that in HTTP 1.1, every request message sent by a browser or other client must contain the Host request header field, so that the Web server can tell which virtual Web site the client wants to visit from the host name in the Host header field. When a browser accesses a Web site, it automatically generates the Host request header from the URL address in the address bar.

(4) If-Modified-Since request header field

The function of the If-Modified-Since request header is similar to that of If-Match, except that its value is a time in GMT format. The If-Modified-Since request header is treated as a request condition: the server returns the document content only if the document on the server was modified after the time given in If-Modified-Since. Otherwise, the server returns a 304 (Not Modified) status code to indicate that the document cached by the browser is up to date, without sending the document content again, and the browser continues to use the previously cached document. This reduces the amount of data exchanged between the browser and the server and so improves communication efficiency.
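
The server-side decision can be sketched as follows. This is a minimal illustration assuming the server knows the document's last modification time as a timezone-aware datetime; conditional_status is a hypothetical function name, and the date string is parsed with the standard email.utils module.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 200 if the document must be sent again, 304 if the cached copy is current."""
    if if_modified_since is None:
        return 200  # no condition supplied: always send the document
    since = parsedate_to_datetime(if_modified_since)
    # Only a document modified after the given time is sent again.
    return 200 if last_modified > since else 304


doc_time = datetime(2020, 7, 6, 7, 47, 47, tzinfo=timezone.utc)
print(conditional_status(doc_time, "Mon, 06 Jul 2020 07:47:47 GMT"))  # 304
print(conditional_status(doc_time, "Sun, 05 Jul 2020 00:00:00 GMT"))  # 200
```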

(5) Referer request header field

A browser may send a request to the server either because the user typed the URL address directly into the address bar or because the user clicked a hyperlink on a web page. In the first case, the browser does not send the Referer request header. In the second case, for example when a page contains a hyperlink pointing to a remote server and the user clicks it to send a GET request, the browser includes a Referer header field in the request message whose value is the URL of the page containing the hyperlink, for example: Referer: http://localhost:8080/lzy/
The Referer header field is very useful and is often used by website managers to track how visitors navigate to the website. The Referer header field can also be used for hotlink protection.
What is hotlinking? Suppose a website wants to display some images on its homepage but does not host those images on its own server. Instead, it uses tags in its HTML files to link to image resources on other websites and displays them to its visitors. This is hotlinking. The hotlinking website gains visits for itself, but it increases the load on the servers of the linked websites and damages their legitimate interests. To protect its own resources, a website can therefore use the Referer header to detect which page a request for the current page or resource came from. If the request did not come through a link on its own site, it can block the access or redirect to a specified page.
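
A Referer-based hotlink check could be sketched like this. It is a simplified illustration, not a complete protection scheme: is_hotlink is a hypothetical name, requests without a Referer are let through, and the Referer header can be forged by a client.

```python
from urllib.parse import urlsplit


def is_hotlink(referer, own_host):
    """Return True if the request was referred by a page on a different host."""
    if not referer:
        # Direct visits send no Referer; this sketch allows them.
        return False
    return urlsplit(referer).hostname != own_host


print(is_hotlink("http://www.lzy.cn/index.html", "www.lzy.cn"))  # False
print(is_hotlink("http://other.example/page", "www.lzy.cn"))     # True
```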


(6) User-Agent request header field
User-Agent, or UA for short, identifies the operating system and version, browser and version, browser rendering engine, browser language, and so on used by the browser or other client program, so that the server can return different content for different types of browsers.
Example of the User-Agent request header generated by Google Chrome: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36
In this header, the User-Agent field lists the Mozilla version first, then the operating system version (Windows NT 10.0 means Windows 10), the name of the browser's rendering engine (AppleWebKit/537.36), and the browser version (Chrome/110.0.0.0 Safari/537.36).
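
The name/version pairs in such a string can be pulled out with a regular expression, as in the rough sketch below. Real User-Agent strings vary widely, so this is only an illustration for the Chrome example; product_tokens is a made-up helper name.

```python
import re

UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36")


def product_tokens(user_agent):
    """Return a dict of the name/version tokens found in a User-Agent string."""
    return dict(re.findall(r"(\w+)/([\d.]+)", user_agent))


tokens = product_tokens(UA)
print(tokens["Chrome"])       # 110.0.0.0
print(tokens["AppleWebKit"])  # 537.36
```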

3. HTTP response information

(1) HTTP response status line

Goal: Familiarize yourself with the 3 parts of the HTTP response status line

1. HTTP response status line

The HTTP response status line is located in the first line of the response message, which includes 3 parts, namely the HTTP version, an integer code (status code) indicating success or error, and text information describing the status code.
Specific example of the HTTP response status line: HTTP/1.1 200 OK
HTTP/1.1 is the protocol version used for communication, 200 is the status code, and OK is the status description, indicating that the client's request succeeded. Note that the parts of the status line are separated by spaces, and the line must end with a carriage return and a line feed.
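
The three parts of the status line can be separated as in this small Python sketch (parse_status_line is a hypothetical helper name). Because the status description may itself contain spaces, the line is split at most twice.

```python
def parse_status_line(line):
    """Split an HTTP status line into (version, status code, description)."""
    # Split on the first two spaces only: "Not Found" must stay in one piece.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


print(parse_status_line("HTTP/1.1 200 OK\r\n"))         # ('HTTP/1.1', 200, 'OK')
print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))  # ('HTTP/1.1', 404, 'Not Found')
```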

2. HTTP status code

The status code consists of 3 digits indicating whether the request was understood and fulfilled. The first digit defines the category of the response, and the last two digits play no part in the classification. The first digit has 5 possible values: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), and 5xx (server error).
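
Classifying a status code by its first digit is a one-line integer division, as this sketch shows. The standard library's http.HTTPStatus enum supplies the usual description text; the names CATEGORIES and describe are assumptions made for this example.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return (category, standard description) for a known status code."""
    # Integer division by 100 yields the first digit of a 3-digit code.
    return CATEGORIES[code // 100], HTTPStatus(code).phrase


for code in (200, 302, 304, 404, 500):
    print(code, describe(code))
```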

3. Common status codes in web development

Status code  Description
200          The server has successfully processed the client's request, and the response message returns the normal result of the request
302          The requested resource temporarily responds to the request from a different URI, but the requester should keep using the original location for future requests. For example, in a request redirection, the temporary URI is the resource pointed to by the Location header field of the response
304          If the client has a cached document, it adds an If-Modified-Since request header to its request, meaning that the server needs to return a new document only if the requested document has changed after the time given in If-Modified-Since. Status code 304 indicates that the client's cached version is up to date and the client should continue to use it. Otherwise, the server returns the requested document with a status code of 200
404          The server could not find the requested resource. For example, requesting a web page that does not exist on the server usually returns this status code
500          The server has encountered an error and cannot process the client's request. In most cases the error occurs in server-side programs such as CGI, ASP, or JSP, and the server generally provides specific error information in the response message

(2) HTTP response header

Goal: Familiarize yourself with HTTP response headers

1. HTTP response header

In the HTTP response message, the first line is the response status line, followed by several response headers. The server transmits additional information to the client through the response headers, including the service program name, the authentication method required by the requested resource, the last modification time of the resource requested by the client, and the redirection address.
Concrete examples of HTTP response headers

Server: Apache-Coyote/1.1 
Content-Encoding: gzip 
Content-Length: 80  
Content-Language: zh-cn 	 
Content-Type: text/html; charset=GB2312 
Last-Modified: Mon, 06 Jul 2020 07:47:47 GMT 
Expires: -1	
Cache-Control: no-cache 
Pragma: no-cache

2. Fields of the HTTP response header

Header field         Description
Accept-Ranges        Indicates whether the server accepts requests for part of a resource made with the Range request header field
Age                  Indicates how long, in seconds, the response has been held in a proxy cache since the origin server generated or validated it
ETag                 Transmits tag information representing the characteristics of the entity content to the client
Location             Notifies the client of the new address to obtain the requested document; its value is a URL address using an absolute path
Retry-After          Can be used together with the 503 status code to tell the client when to resend the request. It can also be used together with any 3xx status code to tell the client the minimum delay before following the redirect
Content-Disposition  Used when the server wants the browser not to process the entity content of the response directly, but to let the user choose to save it in a file

(1) Location response header field

The Location header field notifies the client of the new address at which to obtain the requested document; its value is a URL address using an absolute path. Example of the Location response header field: Location: http://www.lzy.edu.cn
The Location header field is used together with most 3xx status codes to tell the client to automatically request the document from the new address. Since the current response does not itself return the content to the client, an HTTP message that uses the Location header normally carries no meaningful entity content, so Location and Content-Type usually do not appear together in the same message header.

(2) Server response header field

The Server header field is used to specify the name of the server software product, a specific example: Server: Apache-Coyote/1.1

(3) Refresh response header field

The Refresh header field tells the browser when to automatically refresh the page. Its value is a time in seconds. For example: Refresh: 3
This Refresh header field tells the browser to automatically refresh the page after 3 seconds.
A URL parameter can also be added after the time value of the Refresh header field, separated from it by a semicolon (;), to tell the browser to jump to another web page after the specified time. For example, to tell the browser to jump to the www.lzy.edu.cn website after 3 seconds: Refresh: 3;url=http://www.lzy.edu.cn
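
The two forms of the Refresh value can be told apart as in the sketch below. parse_refresh is a hypothetical helper, and the parsing covers only the two forms shown above.

```python
def parse_refresh(value):
    """Return (delay in seconds, target URL or None) from a Refresh header value."""
    # The time value and the optional URL are separated by a semicolon.
    delay, _, rest = value.partition(";")
    rest = rest.strip()
    url = rest[4:] if rest.lower().startswith("url=") else None
    return int(delay.strip()), url


print(parse_refresh("3"))                            # (3, None)
print(parse_refresh("3;url=http://www.lzy.edu.cn"))  # (3, 'http://www.lzy.edu.cn')
```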

(4) Content-Disposition response header field

The Content-Disposition header field is not defined in the HTTP standard specification; it is borrowed from RFC 2183. In RFC 2183, Content-Disposition specifies how the receiving program should handle the data content, and there are two standard ways: inline and attachment. Inline means the content is processed directly, while attachment means the user must step in and decide how the receiving program handles the content. In HTTP applications, only attachment is the standard Content-Disposition value. A filename parameter can be given after attachment; its value is the file name the server suggests the browser use when saving the entity content. The browser should ignore any directory part in the filename value and use only the last component as the file name. Before setting Content-Disposition, be sure to set the Content-Type header field.

Content-Type: application/octet-stream
Content-Disposition: attachment; filename=lee.zip
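
On the receiving side, taking only the last component of the filename parameter could be sketched like this. It is a simplified illustration: suggested_filename is a made-up name, and quoted values containing semicolons are not handled.

```python
def suggested_filename(content_disposition):
    """Return the file name suggested by an attachment disposition, or None."""
    kind, _, params = content_disposition.partition(";")
    if kind.strip().lower() != "attachment":
        return None
    for param in params.split(";"):
        name, _, value = param.strip().partition("=")
        if name.lower() == "filename":
            value = value.strip('"')
            # Ignore any directory part and keep only the last component.
            return value.replace("\\", "/").rsplit("/", 1)[-1]
    return None


print(suggested_filename("attachment; filename=lee.zip"))               # lee.zip
print(suggested_filename('attachment; filename="../../etc/lee.zip"'))   # lee.zip
print(suggested_filename("inline"))                                     # None
```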

Origin blog.csdn.net/qq_64505257/article/details/130982808