Dynamic website development study notes 03: HTTP protocol

1. Overview of HTTP

(1) Concept of HTTP

Goal: Understand the concepts of HTTP

1. The concept of HTTP

HTTP stands for HyperText Transfer Protocol. It is a request/response protocol: after the client establishes a connection with the server, it can send a request to the server, which is called an HTTP request. After receiving the request, the server sends back a reply, which is called an HTTP response.

2. Characteristics of HTTP protocol

(1) C/S mode

The HTTP protocol supports client (the browser is a Web client)/server mode.

(2) Simple and fast

When the client requests a service from the server, it only needs to transmit the request method and path. Commonly used request methods include GET and POST; each method specifies a different kind of interaction between the client and the server. Because the protocol is simple, HTTP server programs can be small and communication is fast.

(3) Flexible

HTTP allows the transmission of any type of data, and the type of data being transmitted is marked by Content-Type.

(4) Stateless

HTTP is a stateless protocol: the protocol itself keeps no memory of earlier transactions. If later processing needs information from an earlier request, that information must be sent again, which may increase the amount of data transmitted per connection.

(2) HTTP 1.0 and HTTP 1.1

Objective: Understand the characteristics of HTTP 1.0 and HTTP 1.1 and the differences between them

1. Development of HTTP

HTTP has gone through several versions since its birth. The earliest version, HTTP 0.9, was released in 1990. To improve on it, HTTP 1.0 was released in 1996 and HTTP 1.1 in 1997. Because HTTP 0.9 is obsolete, it is not covered further here.

2. Introduction to HTTP1.0

(1) Definition of HTTP1.0

During the interaction process between the client and the server based on the HTTP 1.0 protocol, they need to go through four steps: establishing a connection, sending request information, sending back response information, and closing the connection.
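The four steps can be traced with a raw socket. The following is a minimal sketch, not a production client: HelloHandler, fetch_http10 and demo are hypothetical names, and a throwaway local server from Python's standard library stands in for a real website so that the example runs without Internet access.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in server for the demo; http.server answers with HTTP/1.0 by default."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_http10(host, port, path):
    # Step 1: establish a connection.
    sock = socket.create_connection((host, port))
    try:
        # Step 2: send the request information.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # Step 3: read the response until the server closes its side.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        # Step 4: close the connection.
        sock.close()
    return b"".join(chunks)


def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_http10("127.0.0.1", server.server_port, "/index.html")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the raw bytes of the response, starting with the status line and ending with the body.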

(2) Disadvantages of HTTP1.0

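Under HTTP 1.0, each connection between the client and the server handles only one HTTP request; once the response has been sent back, the connection is closed. For content-rich web pages this is clearly inefficient.
For example, consider the following HTML snippet fetched over HTTP 1.0. The page references three images, so after downloading the HTML document the browser must open three more connections, one for each img element, and each connection pays the setup and teardown cost again.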

<html>
	<body>
		<img src="/image01.jpg">
		<img src="/image02.jpg">
		<img src="/image03.jpg">
	</body>	
</html>

3. Introduction to HTTP1.1

HTTP 1.1 was introduced to overcome this time-consuming interaction between HTTP 1.0 clients and servers. It supports persistent connections: multiple HTTP requests and responses can be transmitted over a single TCP connection, which reduces the overhead and latency of repeatedly establishing and closing connections.
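Connection reuse can be observed with Python's http.client. This is a sketch under stated assumptions: KeepAliveHandler and fetch_all are hypothetical names, and a local stand-in server configured for HTTP/1.1 replaces a real website. The local port of the client socket is recorded for each request; if it stays the same, all requests travelled over one TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where this response ends,
        # which is what makes reusing the connection possible.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_all(paths):
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            body = response.read()  # drain the body before the next request
            local_port = conn.sock.getsockname()[1]
            results.append((response.status, body.decode("ascii"), local_port))
    finally:
        conn.close()       # close the client side first so the handler loop ends
        server.shutdown()
        server.server_close()
    return results
```

For example, fetch_all(["/image01.jpg", "/image02.jpg", "/image03.jpg"]) returns three results that share one local port.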

(3) HTTP messages

Goal: Be familiar with the composition of HTTP messages

(1) Concept of HTTP messages

When a user accesses a URL address in the browser, clicks a hyperlink on the web page, or submits a form on the web page, the browser will send request data to the server, that is, an HTTP request message. After the server receives the request data, it sends the processed data to the client, which is an HTTP response message. HTTP request messages and HTTP response messages are collectively referred to as HTTP messages.

(2) Use the browser to view HTTP messages

Enter www.baidu.com in the address bar of Google Chrome to visit the Baidu homepage, then press F12 to open the developer tools. The requested URL and the corresponding request information are shown under the [Network] tab.

2. HTTP request information

(1) HTTP request line

Goal: Be familiar with the request line of HTTP

1. HTTP request line

The HTTP request line is the first line of the request message. It has three parts: the request method, the resource path, and the HTTP version used. Specific example: GET /index.html HTTP/1.1
Here GET is the request method, /index.html is the requested resource path, and HTTP/1.1 is the protocol version used for communication. Note that the parts of the request line are separated by spaces, and the line must end with a carriage return and line feed (CRLF).
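As a small illustration of that structure, a request line can be split into its three parts as follows. parse_request_line is a hypothetical helper written for these notes, and it is deliberately strict about the trailing carriage return and line feed.

```python
def parse_request_line(line):
    """Split a request line such as 'GET /index.html HTTP/1.1\\r\\n' into its parts."""
    if not line.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    parts = line[:-2].split(" ")
    if len(parts) != 3:
        raise ValueError("expected method, path and version separated by spaces")
    method, path, version = parts
    return method, path, version
```

Real servers are usually more tolerant of malformed input; the strict checks here only mirror the rules stated above.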

2. HTTP request method

Request method: Meaning
GET: Requests the resource identified by the URI in the request line
POST: Submits data to the specified resource for the server to process (for example, submitting a form or uploading a file)
HEAD: Requests only the response headers of the resource identified by the URI
PUT: Uploads a resource to the specified URI location
DELETE: Requests that the server delete the resource identified by the URI
TRACE: Requests that the server send back the request it received; mainly used for testing or diagnosis
CONNECT: Reserved for use with proxies that can switch to acting as a tunnel
OPTIONS: Queries the capabilities of the server, or the options and requirements associated with a resource

(1) GET method

When the user enters a URL directly in the browser address bar or clicks a hyperlink on a web page, the browser sends the request with the GET method. If the method attribute of a form is set to "GET" or is not set at all (GET is the default), the browser also uses GET when the user submits the form.
If the requested URL contains a parameter part, the browser appends that part to the resource path in the request line of the generated request message.
Take the URL http://www.lzy.cn/javaForum?name=howard&pwd=123456: everything after "?" is parameter information. Each parameter consists of a name and a value joined by an equal sign (=), and multiple parameters are separated by "&".
When the browser sends the request message, the parameter part is appended to the URI of the resource being accessed: GET /javaForum?name=howard&pwd=123456 HTTP/1.1. Note that the amount of data that can be sent with GET is limited in practice. The HTTP specification itself sets no maximum, but browsers and servers cap the URL length, and some older browsers allowed only about 2 KB.
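The relationship between the URL, its parameters and the request line can be checked with Python's urllib.parse. This is only a sketch; the variable names are illustrative and the URL is the one used in the text.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "http://www.lzy.cn/javaForum?name=howard&pwd=123456"
parts = urlsplit(url)

# Everything after "?" is the query string; parse_qs splits it on "&" and "=".
params = parse_qs(parts.query)

# The request line carries the path plus the parameter part.
request_line = f"GET {parts.path}?{parts.query} HTTP/1.1"

# urlencode goes the other way: from names and values to a query string.
rebuilt_query = urlencode({"name": "howard", "pwd": "123456"})
```

parse_qs returns a list for each name because a parameter name may legally appear more than once in a query string.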

(2) POST method

If the method attribute of a form is set to "POST", the browser uses the POST method when the user submits the form. The form fields and their values are sent to the server as the entity content (body) of the HTTP message instead of being appended to the URL as parameters. In addition, when a form is submitted with POST, the Content-Type header is set to "application/x-www-form-urlencoded" by default, and the Content-Length header is set to the length of the entity content.

POST /javaForum HTTP/1.1
Host: www.lzy.cn
Content-Type: application/x-www-form-urlencoded
Content-Length: 22

name=howard&pwd=123456
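The same message can be assembled programmatically to see where the Content-Length value comes from. build_post is a hypothetical helper, not a browser API; it encodes the form fields, measures the body in bytes, and separates the headers from the body with an empty line.

```python
from urllib.parse import urlencode


def build_post(host, path, fields):
    """Build a raw form-submission POST message as bytes."""
    body = urlencode(fields).encode("ascii")  # urlencode output is plain ASCII
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # the empty line marks the end of the headers
    )
    return head.encode("ascii") + body


message = build_post("www.lzy.cn", "/javaForum", {"name": "howard", "pwd": "123456"})
```

The body name=howard&pwd=123456 is 22 bytes long, which matches the Content-Length header in the example.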

(2) HTTP request header

Goal: Be familiar with HTTP request headers

1. HTTP request header

In the HTTP request message, the request line is followed by several request headers. Request headers are mainly used to deliver additional information to the server, such as the data types that the client can receive, compression methods, languages, and the URL address of the page to which the requested hyperlink belongs.

Host: localhost:8080
Accept: image/gif, image/x-xbitmap, */*
Referer: http://localhost:8080/lzy/
Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; GTB6.5; CIBA)
Connection: Keep-Alive
Cache-Control: no-cache
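The q values in the Accept-Language line above are relative preference weights between 0 and 1, and an item without a q value counts as 1. A minimal sketch of how a server might rank them follows; parse_quality_list is a hypothetical helper, and it assumes well-formed input (a malformed q value would raise ValueError).

```python
def parse_quality_list(value):
    """Return (token, q) pairs from a header value, highest preference first."""
    items = []
    for part in value.split(","):
        token, *params = [piece.strip() for piece in part.split(";")]
        if not token:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                q = float(raw)
        items.append((token, q))
    # sorted() is stable, so items with equal weight keep their original order.
    return sorted(items, key=lambda item: item[1], reverse=True)


ranked = parse_quality_list("zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3")
```

For the sample header, ranked starts with zh-cn (weight 1.0) and ends with en (weight 0.3).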

2. HTTP request header fields

Header field: Description
Accept: Indicates the MIME (Multipurpose Internet Mail Extensions) types that the client program (usually a browser) can handle.
Accept-Charset: Informs the server of the character sets used by the client.
Accept-Encoding: Specifies the data encodings that the client can decode. An encoding here usually means a compression method.
Accept-Language: Specifies the language in which the client expects the server to return documents.
Authorization: When a client accesses a password-protected web page, the web server sends a 401 response status code and a WWW-Authenticate response header, requiring the client to respond with the Authorization request header.
Proxy-Authorization: Its role and usage are basically the same as those of the Authorization header field, except that it carries the verification information that the client sends to a proxy server.
Host: Specifies the host name and port number where the resource is located.
If-Match: When the client requests the same web page file from the server again, it can use the If-Match header field to attach the previously cached entity tag. Such a request is treated as a conditional request.
If-Modified-Since: Similar to If-Match, except that its value is a time in GMT format.
Range: Specifies that the server needs to return only part of the document, and which range. This is very useful for resuming interrupted transfers of large documents.
If-Range: Can only be used together with the Range header field. Its value can be an entity tag or a time in GMT format.
Max-Forwards: Specifies the number of proxy servers that the current request may pass through. The value is reduced by 1 each time the request passes through a proxy server.
Referer: Often used by website administrators to track how visitors navigated to the website. It can also be used to prevent hotlinking.
User-Agent: The user agent, or UA for short. Identifies the operating system and version, browser and version, browser rendering engine, browser language, and similar details of the browser or other client program, so that the server can return different content to different types of browsers.

(1) Accept request header field

The Accept header field indicates the MIME (Multipurpose Internet Mail Extensions) types that the client program (usually a browser) can handle. For example, if both the browser and the server support PNG images, the browser can send an Accept header containing image/png; when the server sees image/png in the Accept header, it may use PNG files in the img elements of the page. There are many MIME types. For example, the following can all be used as values of the Accept header field.
Accept: text/html, indicating that the client wishes to accept HTML text.
Accept: image/gif, indicating that the client wishes to accept resources in GIF image format.
Accept: image/*, indicating that the client can accept all subtypes of image format.
Accept: */*, indicating that the client can accept content in all formats.

(2) Accept-Encoding request header field

The Accept-Encoding header field specifies the data encodings that the client can decode; an encoding here usually means a compression method. Multiple encodings can be listed, separated by commas. Specific example: Accept-Encoding: gzip, compress
gzip and compress are common data encodings. Compressing larger entity content before transmitting it saves network bandwidth and transmission time. After receiving this request header, the server compresses the original document content using one of the listed formats, sends the result to the client as the entity content of the response message, and names the encoding it used in the Content-Encoding response header. When the browser receives such entity content, it must decompress it before use.
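The negotiation described above can be sketched with Python's gzip module. encode_body and decode_body are hypothetical helpers that stand in for the server side and the browser side; the sketch supports gzip only and ignores q values for brevity.

```python
import gzip


def encode_body(body, accept_encoding):
    """Server side: compress with gzip only if the client listed it."""
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}  # no supported encoding offered: send the body unchanged


def decode_body(payload, headers):
    """Client side: undo the encoding named in Content-Encoding."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(payload)
    return payload


original = b"<html><body>Hello, HTTP!</body></html>" * 50
payload, headers = encode_body(original, "gzip, deflate")
restored = decode_body(payload, headers)
```

Here payload is much smaller than original because the sample body is highly repetitive, and restored equals original again.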

(3)Host request header field

The Host header field specifies the host name and port number where the resource is located, in the same format as the host name and port number in the full URL of the resource. Specific example: Host: www.lzy.cn:80
Because 80 is the default port that a browser uses when connecting to a server, the ":80" after "www.lzy.cn" can be omitted. Note that in HTTP 1.1, every request message sent by a browser or other client must include the Host request header field, so that the web server can tell from the host name which virtual host (website) the client wants to access. When the browser accesses a website, it automatically generates the Host request header from the URL in the address bar.

(4) If-Modified-Since request header field

The If-Modified-Since request header is similar to If-Match, except that its value is a time in GMT format. It makes the request conditional: the server returns the document content only if the document on the server was modified after the time given in If-Modified-Since. Otherwise, the server returns a 304 (Not Modified) status code, without the document content, to indicate that the copy cached by the browser is up to date, and the browser keeps using the cached document. This reduces the amount of data exchanged between the browser and the server and so improves communication efficiency.
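The server-side decision can be sketched as follows. conditional_status is a hypothetical helper; it assumes last_modified is a timezone-aware datetime and treats a missing or unparseable header as an unconditional request.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 200 if the document must be sent again, 304 if the cache is current."""
    if if_modified_since is None:
        return 200  # no condition attached to the request
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat a bare time as GMT
    # HTTP dates have one-second resolution, so drop the microseconds.
    if last_modified.replace(microsecond=0) > since:
        return 200
    return 304


doc_time = datetime(2020, 7, 6, 7, 47, 47, tzinfo=timezone.utc)
```

With this doc_time, a request carrying If-Modified-Since: Mon, 06 Jul 2020 07:47:47 GMT yields 304, while a request with an earlier date yields 200.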

(5) Referer request header field

A request may reach the server because the user typed the URL directly into the browser address bar, or because the user clicked a hyperlink on a web page. In the first case, the browser does not send a Referer request header. In the second case, for example when a page contains a hyperlink to a remote server and the user clicks it, the browser adds a Referer header field containing the URL of the page that held the link to the GET request it sends, in the same form as the Referer: http://localhost:8080/lzy/ line in the sample request headers earlier.
The Referer header field is very useful and is often used by website administrators to track how website visitors navigate to the website. At the same time, the Referer header field can also be used to prevent hotlinking on the website.
What is hotlinking? Suppose a website wants to display some images on its homepage but does not host those images on its own server. Instead, it uses tags in its HTML files to link to image resources on other websites and displays them to its visitors. This is hotlinking. The hotlinking site gains traffic, while the linked site bears the extra server load and has its legitimate interests damaged. To protect its resources, a website can therefore use the Referer header to check which page a request for a page or resource came from. If the request did not come from a link on the site itself, the server can block it or redirect it to a specified page.
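A Referer check of this kind might look like the sketch below. is_hotlink and the allowed host set are hypothetical; this version lets requests without a Referer through, which is a policy choice rather than a rule. Keep in mind that clients can omit or forge the Referer header, so the check deters casual hotlinking but is not a security boundary.

```python
from urllib.parse import urlsplit


def is_hotlink(referer, allowed_hosts):
    """Return True if the request appears to come from a page on another site."""
    if not referer:
        return False  # direct visits send no Referer; this sketch allows them
    host = urlsplit(referer).hostname  # None if the value is not a full URL
    return host not in allowed_hosts


own_hosts = {"www.lzy.cn"}
```

A server could call is_hotlink before serving an image and answer with an error page or a redirect when it returns True.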


(6) User-Agent request header field

The User-Agent header field (user agent, UA for short) identifies the operating system and version, the browser and version, the browser rendering engine, the browser language, and similar details of the browser or other client program, so that the server can return different content to different types of browsers.
Example of the User-Agent value generated by Google Chrome: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36
In this value, the User-Agent header field first lists the Mozilla version, then the operating system version (Windows NT 10.0 means Windows 10), the name of the browser engine (AppleWebKit/537.36), and the browser version (Chrome/110.0.0.0 Safari/537.36).
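The name/version pairs in such a value can be pulled out with a regular expression. This is a rough sketch: product_tokens is a hypothetical helper, and real User-Agent strings vary too much for a simple pattern to be reliable in production.

```python
import re

# Matches tokens of the form Name/1.2.3 and skips the parenthesised comments.
PRODUCT = re.compile(r"([A-Za-z]+)/([\d.]+)")


def product_tokens(user_agent):
    """Return a dict mapping product names to version strings."""
    return dict(PRODUCT.findall(user_agent))


ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
)
tokens = product_tokens(ua)
```

For the Chrome example, tokens contains the entries Mozilla, AppleWebKit, Chrome and Safari with their versions.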

3. HTTP response information

(1) HTTP response status line

Goal: Become familiar with the 3 parts of the HTTP response status line

1. HTTP response status line

The HTTP response status line is located in the first line of the response message. It includes three parts, namely the HTTP version, an integer code indicating success or error (status code), and text information describing the status code.
Specific example of HTTP response status line: HTTP/1.1 200 OK
HTTP/1.1 is the protocol version used for communication, 200 is the status code, and OK is the status description, indicating that the client request succeeded. Note that the parts of the status line are separated by spaces, and the line must end with a carriage return and line feed.

2. HTTP status code

The status code consists of 3 digits and indicates whether the request was understood and fulfilled. The first digit defines the category of the response, and the last two digits have no specific classification role. The first digit has 5 possible values: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), and 5xx (server error).
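A status line can be taken apart, and its code mapped to a category by the first digit, as in the sketch below. parse_status_line and STATUS_CLASSES are names invented for illustration, and the function assumes a well-formed line that includes a status description.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'HTTP/1.1 200 OK' into version, code, description and category."""
    # Split on the first two spaces only: the description may contain spaces.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    status = int(code)
    return version, status, reason, STATUS_CLASSES[status // 100]
```

Splitting with a limit of two keeps multi-word descriptions such as Not Found in one piece.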

3. Common status codes in web development

Status code: Description
200: The server successfully processed the client's request, and the response message carries the normal result of the request.
302: The requested resource temporarily resides under a different URI, but the client should continue to use the original URI for future requests. In a redirect, for example, the temporary URI is given by the Location header field of the response.
304: If the client has a cached copy of the document, it adds an If-Modified-Since request header to the request, meaning that the server needs to return the document only if it has changed after the specified time. Status code 304 indicates that the client's cached version is up to date and the client should continue to use it. Otherwise, the server returns the requested document with status code 200.
404: The server cannot find the requested resource. Requesting a web page that does not exist on the server typically returns this status code.
500: The server encountered an error and cannot handle the client's request. In most cases the error occurs in a server-side program such as CGI, ASP, or JSP. The server generally includes specific error information in the response message.

(2) HTTP response header

Goal: Familiarize yourself with HTTP response headers

1. HTTP response header

In the HTTP response message, the first line is the response status line, followed by several response headers. Through the response headers the server passes additional information to the client, such as the name of the server program, the authentication method required for the requested resource, the last modification time of the requested resource, and the redirect address.
Specific examples of HTTP response headers

Server: Apache-Coyote/1.1 
Content-Encoding: gzip 
Content-Length: 80  
Content-Language: zh-cn 	 
Content-Type: text/html; charset=GB2312 
Last-Modified: Mon, 06 Jul 2020 07:47:47 GMT 
Expires: -1	
Cache-Control: no-cache 
Pragma: no-cache

2. HTTP response header fields

Header field: Description
Accept-Ranges: Indicates whether the server accepts requests that use the Range request header field to fetch part of a resource.
Age: Indicates how long, in seconds, the response has been held in a proxy cache since it was generated or last validated by the origin server.
ETag: Passes the client a tag that characterizes the entity content, called an entity tag. Each version of a resource has a different entity tag, so entity tags can be used to determine whether entity content obtained from the same resource path at different times is the same.
Location: Tells the client the new address from which to obtain the requested document. Its value is usually a URL written as an absolute address.
Retry-After: Can be used with the 503 status code to tell the client when the request may be sent again. It can also be used with any 3xx status code to tell the client the minimum delay before following the redirect. Its value can be a time in GMT format or a number of seconds.
Server: Specifies the name of the server software product.
Vary: Lists the names of the request header fields that affect the response content generated by the server.
WWW-Authenticate: When the client accesses a password-protected web page file, the server sends back a 401 (Unauthorized) response status code and a WWW-Authenticate response header, indicating that the client should supply user name and password information in the Authorization request header, using the authentication method named in the WWW-Authenticate header.
Proxy-Authenticate: Used by a proxy server to verify user information. Its usage is similar to that of the WWW-Authenticate header field.
Refresh: Tells the browser when to automatically refresh the page. Its value is a time in seconds.
Content-Disposition: Used when the server wants the browser not to process the entity content of the response directly, but to let the user choose to save it to a file.

(1) Location response header field

The Location header field tells the client the new address from which to obtain the requested document. Its value is usually a URL written as an absolute address. Location response header field example: Location: http://www.lzy.edu.cn
The Location header field is used with most 3xx status codes to tell the client to automatically request the document from the new address. Because such a response does not deliver the requested content itself, a message that carries a Location header normally has little or no entity content, so a Content-Type header is usually unnecessary in it.

(2) Server response header field

The Server header field is used to specify the name of the server software product. Specific example: Server: Apache-Coyote/1.1

(3) Refresh response header field

The Refresh header field tells the browser when to automatically refresh the page. Its value is a time in seconds. Specific example: Refresh: 3
This Refresh header field tells the browser to refresh the page automatically after 3 seconds.
A URL parameter can also be added after the time value, separated from it by a semicolon (;), to tell the browser to jump to another page after the specified time. For example, to tell the browser to jump to the www.lzy.edu.cn website after 3 seconds: Refresh: 3;url=http://www.lzy.edu.cn
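Both forms of the value can be read with a few lines of Python. parse_refresh is a hypothetical helper; it assumes the delay is a whole number of seconds and that the optional target is introduced by url=.

```python
def parse_refresh(value):
    """Return (delay_in_seconds, url_or_None) for a Refresh header value."""
    delay, _, rest = value.partition(";")
    rest = rest.strip()
    url = None
    if rest.lower().startswith("url="):
        url = rest[len("url="):].strip()
    return int(delay.strip()), url
```

A value without a URL means the current page is reloaded after the delay; with a URL, the browser goes to that address instead.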

(4) Content-Disposition response header field

The Content-Disposition header field is not defined in the core HTTP specification; it is borrowed from RFC 2183. In RFC 2183, Content-Disposition specifies how the receiving program should handle the data content, and there are two standard dispositions, inline and attachment. inline means the content is processed directly, while attachment means the user decides how the content is handled. In HTTP applications, only attachment is used as the standard disposition. A filename parameter can be given after attachment; its value is the file name that the server suggests the browser use when saving the entity content. The browser should ignore any directory part of the filename value and use only the last component as the file name. Be sure to set the Content-Type header field before setting Content-Disposition.

Content-Type: application/octet-stream
Content-Disposition: attachment; filename=lee.zip
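The rule about ignoring the directory part of filename can be sketched with the standard library's header parser. suggested_filename is a hypothetical helper; it reuses email.message.Message because that class already understands the parameter syntax of this header.

```python
import posixpath
from email.message import Message


def suggested_filename(content_disposition):
    """Return the bare file name of an attachment, or None if there is none."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    if msg.get_content_disposition() != "attachment":
        return None
    name = msg.get_filename()
    if not name:
        return None
    # Keep only the last path component, whichever slash style was used.
    return posixpath.basename(name.replace("\\", "/")) or None
```

For the header shown above the function returns lee.zip, and a value such as filename="../../etc/lee.zip" is reduced to lee.zip as well.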


Origin blog.csdn.net/qq_41301333/article/details/131202248