Detailed explanation of HTTP protocol (1)

1. HTTP overview

1. What is HTTP

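Hypertext Transfer Protocol (HTTP)

HTTP is a stateless application-layer protocol based on requests and responses. It transfers hypertext data between two endpoints, usually over TCP/IP.

HTTP was designed to provide a way to publish and receive HTML pages.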

2. Characteristics of HTTP

  • Based on TCP/IP: delivery is reliable, but the content is not guaranteed to be secure or private.
  • Based on the request-response model: one request corresponds to one response.
  • Stateless: each request is independent of the others.

3. Advantages and Disadvantages of HTTP

Advantages:

  • Simple: The HTTP message format is header+body, and the header information is also in a simple key-value format, which is easy to parse.
  • Flexible and easy to extend: HTTP works at the application layer, so the layers beneath it can change freely. HTTP request methods, status codes, header fields, and other components can also be customized and extended.
  • Versatile and cross-platform

The disadvantages are double-edged swords:

  • Stateless:
    • Benefits: The server does not need to save the state of each HTTP connection, which reduces server load and leaves more resources for serving requests.
    • Disadvantages: Operations that span several requests are more cumbersome. Cookies and sessions can be used to maintain state.
  • Clear text transmission
    • Benefits: Convenient for analysis and debugging
    • Disadvantages: It is easy for others to obtain the content of the request or response, which is unsafe

The biggest disadvantage of HTTP is that it is insecure, which is reflected in:

  • Communication uses clear text and is not encrypted, so the content may be stolen.
  • The identity of the communicating parties will not be verified, and disguised sites may be accessed.
  • The integrity and correctness of the message cannot be verified, and the communication content may be tampered with.

HTTPS addresses these security issues by introducing an SSL/TLS layer between HTTP and TCP.

4. Why is HTTP needed on top of TCP?

  • Layered structure, TCP is responsible for end-to-end transmission, and HTTP focuses on the logic of the application layer. Upper-layer services should not be strongly coupled to underlying protocols, as protocols will be upgraded and eliminated.
  • TCP is already responsible for many things. Adding fields and functions to it will make TCP too large and difficult to upgrade and maintain.
  • Developing network applications directly for TCP is not conducive to business-focused development and may be mired in the details of the protocol.

2. HTTP request

1. HTTP request message composition

The HTTP request message consists of 4 parts (request line + request headers + blank line + request body).

The following is a request message for the POST method:
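As a minimal sketch of such a message, the Python snippet below assembles a POST request from its four parts. The path /login, the host www.example.com, and the form fields are made-up values used only for illustration.

```python
# Assemble a POST request message from its four parts.
# The path, host, and form fields are illustrative assumptions.
body = "username=alice&password=secret"

request_line = "POST /login HTTP/1.1"
headers = [
    "Host: www.example.com",
    "Content-Type: application/x-www-form-urlencoded",
    # Content-Length counts the bytes of the body, not the characters.
    f"Content-Length: {len(body.encode('utf-8'))}",
]

# Lines are separated by CRLF; the empty string produces the blank line
# that marks the end of the headers.
message = "\r\n".join([request_line, *headers, "", body])
print(message)
```

Printing the message shows the request line first, then one header per line, an empty line, and finally the body.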

1. Request line

The request line has three parts: the request method, the request URL, and the HTTP protocol version, separated by spaces. For example, GET /index.html HTTP/1.1.

1. Request method

There are 8 request methods defined by HTTP/1.1: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, and TRACE.

The two most commonly used are GET and POST.

2. URL

URI: Uniform Resource Identifier, uniquely identifies a resource on the network

URL: Uniform Resource Locator, identifies a resource by describing where it is located.

Composition: <protocol>://<host>:<port>/<path>

The port and path can sometimes be omitted (the default port number for HTTP is 80)

URLs are a subset of URIs: a URL not only uniquely identifies a resource, but also provides a way to access it.
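To make the composition concrete, here is a small sketch that splits a URL into its parts with Python's standard urllib.parse module. The example URLs are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit

# Example URL (illustrative): <protocol>://<host>:<port>/<path>
parts = urlsplit("http://www.example.com:8080/docs/index.html")
print(parts.scheme)    # protocol
print(parts.hostname)  # host
print(parts.port)      # port, as an integer
print(parts.path)      # path

# When the port is omitted, urlsplit reports None,
# and an HTTP client falls back to the default port 80.
no_port = urlsplit("http://www.example.com/index.html")
port = no_port.port or 80
print(port)
```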

3. Protocol version

The format of the protocol version is: HTTP/major version number.minor version number. Commonly used ones are HTTP/1.0, HTTP/1.1, HTTP/2, and HTTP/3.

2. Request header

The request header adds some additional information to the request message. It consists of key-value pairs, one pair per line, and the name and value are separated by colons.

3. Request blank line

There will be a blank line at the end of the request header, indicating the end of the request header, followed by the request data.

4. Request body

Request data is not used with the GET method, but it is used with the POST method. POST suits cases where the client needs to fill in a form.

The most commonly used request headers related to request data are Content-Type and Content-Length.
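The four parts described above can be recovered from a raw message with a few string operations. This is a simplified sketch, not a complete HTTP parser: the function name parse_request and the sample message are assumptions, and real servers handle many more edge cases.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into its four parts."""
    # The blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: method, URL, and protocol version separated by spaces.
    method, url, version = lines[0].split(" ")

    # Request headers: one "Name: value" pair per line.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()

    return method, url, version, headers, body


raw = (
    "POST /login HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 10\r\n"
    "\r\n"
    "user=alice"
)
method, url, version, headers, body = parse_request(raw)
print(method, url, version)
print(headers["Content-Type"], headers["Content-Length"])
print(body)
```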

2. GET and POST

1. The difference between the two

GET

  • The meaning of GET is to obtain the specified resource from the server, requiring the server to put the target resource in the data part of the response message and send it to the client.

  • A GET request normally carries no request body. Instead, the request parameters and their values are appended to the URL after a question mark. The disadvantages of this approach are:

    • URL length is limited in practice by browsers and servers, so large amounts of data cannot be transferred.
    • Not suitable for transmitting private data, because the parameters are visible in the URL.
  • GET can technically carry a request body, but the RFC defines GET as a request for a resource, so under these semantics GET does not need one.

POST

  • POST asks the server to process the specified resource according to the content of the request body.
  • The benefits of encapsulating request parameters in the request body in the form of key-value pairs are:
    • The browser does not limit the size of the request body and can transmit large amounts of data.
    • The data is not displayed directly in the URL, which is relatively safer. But HTTP is transmitted in clear text, so the data is still visible to anyone who captures the packets.
  • POST URLs can also carry parameters, but this is uncommon.
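The sketch below shows the same parameters carried both ways, using Python's urllib.parse. The URL and the parameter names are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

params = {"q": "http", "page": "2"}
encoded = urlencode(params)  # key-value pairs joined with "&"

# GET: the parameters are appended to the URL after a question mark.
get_url = "http://www.example.com/search?" + encoded

# POST: the same key-value pairs travel in the request body instead.
post_body = encoded.encode("utf-8")

# The receiving side can decode both forms the same way.
from_url = parse_qs(urlsplit(get_url).query)
from_body = parse_qs(post_body.decode("utf-8"))
print(get_url)
print(from_url == from_body)
```

Either way the parameters are plain text on the wire; only their position in the message differs.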

2. Are they safe and idempotent?

Safe: the request does not modify resources on the server.

Idempotent: making the same request multiple times has the same effect as making it once, and the returned results are the same.

In conclusion:

  • The method itself guarantees nothing; what matters is what the request actually does.

  • The GET method is generally used to read data and is read-only, so it is safe and idempotent.

    • Therefore, the response to the GET request can be cached. The response content can be cached in the browser or the proxy server.
  • POST is generally used to add or modify data, so it is unsafe and not idempotent.

    • Therefore, browsers generally do not cache responses to POST requests.
  • However, in actual development, GET requests can be used to modify and delete data, and POST can be used to query data.

    In that case GET is neither safe nor idempotent, while POST is both, so the answer still depends on actual use.
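A toy illustration of the two properties, with no real HTTP involved: the NoteStore class and its method names are hypothetical, and they only mimic a read-only GET handler and a data-adding POST handler.

```python
class NoteStore:
    """Hypothetical in-memory resource used to contrast reading and adding."""

    def __init__(self):
        self.notes = []

    def get_notes(self):
        # GET-like: only reads, so repeating it changes nothing.
        return list(self.notes)

    def post_note(self, text):
        # POST-like: appends a new note on every call.
        self.notes.append(text)
        return len(self.notes)


store = NoteStore()
store.post_note("hello")

first = store.get_notes()
second = store.get_notes()
print(first == second)    # repeated reads return the same result

store.post_note("hello")  # the same POST sent a second time
print(store.get_notes())  # the store now holds two notes
```

If get_notes were written to delete or change notes, it would lose both properties even though it is still "the GET handler", which is the point made above.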

3. HTTP response

1. What is a response?

The response is the content that the server returns to the browser.

When the server receives the browser's request, it will send a response message to the browser, and the browser will display it based on the response content.

2. Response message format

The HTTP response is similar to the HTTP request and also consists of 4 parts:

  1. Response line
  2. Response headers
  3. Response blank line
  4. Response body

Look at a simple response message:

Response line:

HTTP/1.1 200 OK

Response header:

Server: Apache-Coyote/1.1

Content-Type: text/html;charset=UTF-8

Content-Length: 624

Date: Mon, 03 Nov 2014 06:37:28 GMT

Response blank line

Response body

Analysis:

Response line:

  • HTTP/1.1 200 OK: The response protocol is HTTP/1.1 and the status code is 200, indicating that the request succeeded. OK is the reason phrase that explains the status code;

Response header:

  • Server: Apache-Coyote/1.1: Server version information;
  • Content-Type: text/html;charset=UTF-8: The response body is HTML, encoded as UTF-8;
  • Content-Length: 624: The response body is 624 bytes;
  • Set-Cookie: JSESSIONID=C97E2B4C55553EAB46079A4F263435A4; Path=/hello: The cookie that the server sends to the client;
  • Date: Mon, 03 Nov 2014 06:37:28 GMT: The response time in GMT, which may differ from local time (for example, by 8 hours in the UTC+8 time zone);

Response blank line:

  • Like the request blank line, its purpose is to separate the response header and response body.

Response body:

  • The response body is sometimes an HTML file that the browser can display directly.
  • If a JSP page is requested, the response is also HTML: the server renders the JSP into HTML and then responds to the browser.
  • The type of the response body is indicated by the Content-Type response header.
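The analysis above can be mirrored in code. The sketch below parses a response shaped like the sample and uses Content-Type to decide how to decode the body. The function names parse_response and charset_of are hypothetical, and the body is shortened, so Content-Length is computed rather than copied from the sample.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status line fields, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    # The reason phrase may contain spaces, so split at most twice.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


def charset_of(content_type: str, default: str = "iso-8859-1") -> str:
    """Extract the charset parameter from a Content-Type value."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset":
            return value.strip()
    return default


html = "<html>OK</html>".encode("utf-8")  # shortened body for illustration
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache-Coyote/1.1\r\n"
    "Content-Type: text/html;charset=UTF-8\r\n"
    f"Content-Length: {len(html)}\r\n"
    "Date: Mon, 03 Nov 2014 06:37:28 GMT\r\n"
    "\r\n"
).encode("ascii") + html

version, code, reason, headers, body = parse_response(raw)
charset = charset_of(headers["content-type"])
print(version, code, reason)
print(charset, body.decode(charset))
```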

3. Common fields of HTTP

1. Request fields

  • Host: When the client sends a request, it is used to specify the domain name of the server.
  • Accept: The client tells the server which data formats it can accept.
    • Accept: */* , which means any format is acceptable
  • Accept-Encoding: The client tells the server the compression methods it supports.
    • Accept-Encoding: gzip, deflate
  • User-Agent: The client tells the server its browser information
  • Referer: The client tells the server the source address of the current request, which can be used to prevent hotlinking or make statistics.
  • Connection: Asks to keep the TCP connection open for reuse (a persistent, or "long", connection)

2. Response fields

  • Content-Type: The server tells the client the format of this response
    • Content-Type: text/html; charset=utf-8
  • Content-Length: The length of the response body in bytes
    • Content-Length: 1000
  • Content-Encoding: Specifies the compression method of data returned by the server
    • Content-Encoding: gzip
  • Connection: Keeps the TCP connection open for reuse (a persistent, or "long", connection)
    • Connection: keep-alive
  • Set-Cookie: Cookie sent by the server to the client
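Accept-Encoding and Content-Encoding work as a pair. The sketch below shows one way a server could choose whether to compress a body. The function name encode_body is hypothetical, and the negotiation is simplified: it ignores quality values such as "gzip;q=0".

```python
import gzip


def encode_body(body: bytes, accept_encoding: str):
    """Return (headers, payload), using gzip only if the client accepts it."""
    # Simplification: quality values (";q=...") are stripped and not evaluated.
    accepted = [item.split(";")[0].strip().lower()
                for item in accept_encoding.split(",")]
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if "gzip" in accepted:
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    else:
        payload = body
    # Content-Length describes the bytes actually sent.
    headers["Content-Length"] = str(len(payload))
    return headers, payload


page = b"<html>" + b"hello " * 200 + b"</html>"
gz_headers, gz_payload = encode_body(page, "gzip, deflate")
plain_headers, plain_payload = encode_body(page, "identity")

print(gz_headers.get("Content-Encoding"), len(gz_payload), len(page))
print(gzip.decompress(gz_payload) == page)
```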

3. Status code of response line

The response status code consists of three digits and indicates the result of the server's handling of the request.

It works like an agreed code between the server and the browser. When the browser receives these three digits, it understands what the server means.

The first digit of an HTTP response status code defines the class of the response; the last two digits have no classification role. The first digit has five possible values, described below:

Class  Description
1xx    Informational. The server has received the request, and the client should continue the operation.
2xx    Success. The request was successfully received and processed.
3xx    Redirection. The resource location has changed, and the client needs to send the request again.
4xx    Client error. The request message was incorrect, and the server could not process it.
5xx    Server error. An error occurred while the server was processing the request.
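Since only the first digit determines the class, the classification can be sketched as an integer division. The function name status_class and the label strings are illustrative choices.

```python
def status_class(code: int) -> str:
    """Map a three-digit status code to its class using the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for code in (100, 200, 302, 404, 500):
    print(code, status_class(code))
```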

Commonly used specific status codes are as follows:

Status code  Meaning
200          Request succeeded. The browser displays the response content.
302          Redirect. For example, when a browser accesses a resource, the server responds with a 302 status code and sends a new URL in the Location response header, telling the browser to request that URL. This is redirection.
304          Not modified. After accessing a resource for the first time, the browser caches it locally. When the resource is requested again and has not changed, the server responds with a 304 status code, telling the browser to use the locally cached copy.
404          A client error: the browser requested a resource that does not exist.
405          A client error: the request method is not supported. For example, if the server only handles GET requests and the client sends a POST, a 405 status code is returned.
500          Server error. For example, if an exception such as a null pointer occurs in the server-side code, the browser receives a 500 status code from the server.
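For reference, Python's standard library exposes the standard reason phrase of each code through http.HTTPStatus. The short loop below looks up the codes listed above.

```python
from http import HTTPStatus

# Look up the standard reason phrase for each status code from the table.
for code in (200, 302, 304, 404, 405, 500):
    status = HTTPStatus(code)
    print(code, status.phrase)
```

Note that the standard phrase for 302 is "Found"; it is commonly described as a redirect because of how browsers react to it.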

Detailed explanation of the 3xx series

  • 301: Moved permanently. The resource has a new permanent address, which the server returns in the Location header; the browser requests the new address and may remember it for later requests.
  • 302: Temporary redirect. The server returns a new address to the browser, and the browser visits this address to obtain the data.
  • 304: Not modified. The server tells the browser that the resource has not changed and that it should use its local cache.

4. Response header

Format:

  • Response header name: value
  • For example, Server: Apache-Coyote/1.1

Commonly used response headers:

Response header      Meaning
Content-Type         The server tells the client the data format and encoding (character set) of this response body.
Content-Disposition  The server tells the client how to open the response body data.

Detailed explanation:

  • Content-Disposition
    • The default value is inline, which displays the content in the current page.
    • It can also be set to attachment to open the response body as an attachment for file downloading.
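A minimal sketch of the headers a server might send to trigger a download. The function name download_headers and the file name are assumptions; the sketch only covers plain ASCII file names, since other names need additional encoding.

```python
def download_headers(filename: str, size: int) -> dict:
    """Build response headers that ask the browser to save the body as a file."""
    return {
        # Generic binary type; a more specific type can be used if known.
        "Content-Type": "application/octet-stream",
        # "attachment" requests a download instead of inline display.
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(size),
    }


headers = download_headers("report.pdf", 2048)
for name, value in headers.items():
    print(f"{name}: {value}")
```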

5. Response body

The response body is the transmitted data.

4. Different ways of resource jump

Request forwarding (forward), request redirection (redirect), and timed refresh can all perform a resource jump, but they differ in several ways.

1. Differences in resource jump methods

  • Request forwarding :

    • One request, one response

    • The address bar remains unchanged

    • Forwarding only works within the same application on the same server; a request cannot be forwarded to other servers.

  • Request redirection :

    • Two requests, two responses, different request objects (the server will create a new request object each time it is requested)

    • The address bar has changed

    • You can jump within the server or to different applications on different servers.

  • Timed refresh :

    • Two requests, two responses, different request objects
    • The address bar has changed
    • It can be used for resource jump within the server, and can also be used for resource jump between different applications and different servers.
    • The difference between timed refresh and request redirection is that a timed refresh can set a time interval, for example to implement "login successful, return to the homepage after 5 seconds".
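At the HTTP level, the last two methods differ mainly in the status code and header that the server sends. The sketch below builds both kinds of response. The function names are hypothetical, and the Refresh header is assumed here as the mechanism for timed refresh: it is widely supported by browsers and mirrors the HTML meta refresh tag, but the source does not name the mechanism.

```python
def redirect_response(location: str):
    """Redirect: status 302 plus the new address in the Location header."""
    return 302, {"Location": location}


def timed_refresh_response(seconds: int, url: str):
    """Timed refresh: a normal 200 response that asks the browser to load
    another URL after a delay (Refresh header, assumed mechanism)."""
    return 200, {"Refresh": f"{seconds}; url={url}"}


print(redirect_response("/home"))
print(timed_refresh_response(5, "/home"))
```

With the timed refresh, the browser first displays the current page (for example a "login successful" message) and only then requests the second URL.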

2. Implementation process of resource jump method

One-sentence summary: forwarding is a server-side behavior, and redirection is a client-side behavior. Why? Look at the workflow of each:

Forwarding process:

  1. The client browser sends an http request
  2. The web server accepts this request
  3. The server calls an internal method to handle the request and perform the forward inside the container
  4. The server sends the target resource to the client

The address bar of the client's browser still shows the path of the first request, so the user cannot tell that the server forwarded it.

With forwarding, the browser makes only one request.

Redirection process:

  1. The client browser sends an http request
  2. After the web server accepts it, it sends a 302 status code and the corresponding new location to the client browser.
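  3. The client browser sees the 302 response and automatically sends a new HTTP request, whose URL is the new Location address
  4. The server finds the resource for this request and sends it to the client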

Because the browser issues a new request, nothing is passed along from the first request to the second.

The redirected path is displayed in the address bar of the client's browser, so the user can see the address change.

With redirection, the browser makes at least two requests.

During a redirect, the data carried by the original request is lost.

3. Functional differences between forwarding and redirection

Forwarding: The client submits a request, and the server jumps to multiple addresses based on internal logic and returns the final result to the client.

Redirect: The client submits a request, and the server returns a 302 status code and a new address, requiring the client to make another request. After the client requests a new address, the server returns the final result.

To put it simply, forwarding is something internal to the server. For example, A is halfway through a task, sends it to B to continue, and finally completes it. For the client, it only knows the A that it requested earliest, but does not know the B in the middle, or even C and D.

Redirection means that the request is sent to A, A decides it cannot complete the task and asks the client to access B instead, and B completes it. The browser can see that the address has changed, and its back button becomes active. Redirects can also reach resources outside your own web application.
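The difference can be simulated without a real server. In this toy sketch, every name (the handlers, the ROUTES table, and the browse function) is hypothetical: handlers return a status, headers, and a body, and a toy client follows 302 responses while counting its requests.

```python
# Toy in-process "server": each handler returns (status, headers, body).
def page_b(path):
    return 200, {}, "content of B"


def forward_a(path):
    # Forward: the server calls another handler internally and
    # returns its result; the client never learns about B.
    return page_b("/b")


def redirect_a(path):
    # Redirect: the server only tells the client where to go next.
    return 302, {"Location": "/b"}, ""


ROUTES = {"/a-forward": forward_a, "/a-redirect": redirect_a, "/b": page_b}


def browse(path, max_hops=5):
    """Toy client: follow 302 responses.

    Returns (final address, number of requests made, body)."""
    requests_made = 0
    for _ in range(max_hops):
        status, headers, body = ROUTES[path](path)
        requests_made += 1
        if status != 302:
            return path, requests_made, body
        path = headers["Location"]  # the "address bar" changes here
    raise RuntimeError("too many redirects")


print(browse("/a-forward"))   # ('/a-forward', 1, 'content of B')
print(browse("/a-redirect"))  # ('/b', 2, 'content of B')
```

Both paths end with the same content, but the forward uses one request and keeps the original address, while the redirect uses two requests and ends at the new address.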

Examples from real life:

Suppose you apply for a certain license,

Forward: You went to bureau A first. The people in bureau A realized that this matter should actually be handled by bureau B. However, they did not send you away. Instead, they asked you to sit for a while, contacted bureau B from the back office, and handed you the result once B had finished.

Redirect: You went to bureau A first, and the people in bureau A said, "This matter is not our responsibility, go to bureau B." Then you backed out of bureau A and took a bus to bureau B yourself.

Origin blog.csdn.net/m0_62609939/article/details/131852714