A thorough understanding of the HTTP protocol

HTTP protocol

1. A first look at HTTP

HTTP is one of the most widely used application-layer protocols. It is the bridge between the browser and the server. HTTP/1.1, the version examined here, runs on top of TCP.


Usually we type a website address (URL) into the address bar, and the browser constructs an HTTP request from the URL and sends it to the server. The server returns an HTTP response (containing HTML, CSS, JS, and so on), and the browser displays (renders) the HTML and other data it receives. This is why HTTP is called the Hypertext Transfer Protocol: what it transfers is not just plain text.
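To make the "URL in, request out" step concrete, here is a minimal sketch of how a client might turn a URL into the text of a GET request. The helper name build_get_request and the example.com URL are made up for illustration; a real browser adds many more headers.

```python
from urllib.parse import urlsplit

def build_get_request(url: str) -> bytes:
    """Build the bytes of a minimal HTTP/1.1 GET request for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # The request target is the path plus the optional query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",  # first line
        f"Host: {parts.netloc}",   # headers
        "Connection: close",
        "",                        # blank line that ends the headers
        "",                        # empty body
    ]
    return "\r\n".join(lines).encode("ascii")

# example.com is a placeholder host, not a site from the article.
request = build_get_request("http://example.com/index.html?lang=en")
print(request.decode("ascii"))
```

These bytes are what would be written to the TCP connection. Nothing is sent over the network in this sketch.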


2. Capturing packets with Fiddler

To see the detailed interaction of the HTTP protocol, you can capture packets with the third-party tool Fiddler.

Fiddler is essentially a proxy program. Precautions when using it:

  1. It may conflict with other proxy programs, so close them (including some browser plug-ins) while using Fiddler.
  2. To capture packets correctly, you need to enable the HTTPS feature. Most servers on the Internet now use HTTPS, and Fiddler does not capture HTTPS traffic by default. You have to enable HTTPS manually and install the certificate.

Open CSDN and Fiddler captures a large number of requests. Usually the blue entries are HTML pages, the green ones are JS, and the black ones are plain returned data. While the browser parses and executes the HTML and JS, it sends a new request each time it encounters a reference to another resource.


HTTP requests follow a fixed format. Fiddler parses each request according to that format and can present it in several views. Click Raw to see the original, unprocessed request, and click View in Notepad to see the full text.


Looking at the captured packets, we can see that an HTTP request is actually line-oriented, formatted text.

The response data is also text, but some servers compress the response to save bandwidth.


After decompressing it manually, we can see the text of the CSDN homepage, that is, the HTML content.
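The compress-then-decompress round trip can be sketched with Python's gzip module. This assumes the server used gzip (announced with a Content-Encoding: gzip response header); the article does not say which compression the captured response used, and the HTML string here is a stand-in.

```python
import gzip

# Stand-in for a page's HTML; repeated so there is something worth compressing.
html = "<html><body><h1>Hello</h1></body></html>" * 20
original = html.encode("utf-8")

# What a server might put in the body when it compresses the response.
compressed = gzip.compress(original)

# What a client (or a manual decode in a capture tool) does to get the text back.
restored = gzip.decompress(compressed).decode("utf-8")

print(len(original), "bytes before,", len(compressed), "bytes after compression")
print(restored == html)
```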



3. HTTP message format

Learning a protocol is essentially a matter of understanding its message format.

1. HTTP request

An HTTP request can be divided into four parts:

  1. first line
  2. request header
  3. blank line
  4. body
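The four parts can be separated with a few string operations. The sketch below uses a made-up request (the /login path, host, and form fields are placeholders) and a hypothetical helper parse_request; it skips edge cases a real server would handle, such as repeated header names.

```python
raw_request = (
    "POST /login HTTP/1.1\r\n"                              # 1. first line
    "Host: example.com\r\n"                                 # 2. request headers
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 27\r\n"
    "\r\n"                                                  # 3. blank line
    "username=alice&password=123"                           # 4. body
)

def parse_request(raw: str) -> dict:
    """Split a raw HTTP request into first line, headers, and body."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    first_line, *header_lines = head.split("\r\n")
    method, target, version = first_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}

parsed = parse_request(raw_request)
print(parsed["method"], parsed["target"], parsed["version"])
print(parsed["headers"])
print(parsed["body"])
```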

first line


The first line contains three parts, separated by spaces.

  • GET: the HTTP method

  • URL: the Uniform Resource Locator. It identifies the location of a resource on the Internet (which file, in which directory, on which server). A URI is a Uniform Resource Identifier, which distinguishes a resource from all others. A URL is a kind of URI, and the two terms are often used interchangeably in development. URLs are not exclusive to HTTP; many protocols use them.

  • Version number: HTTP/1.1


Understanding the URL


For example, suppose I rent a stall in the school cafeteria to sell Chongqing noodles.


Some parts of a URL can be omitted:

For example:

  • The port number can be omitted, in which case the browser uses the default port: 80 for http and 443 for https.

  • / represents the root directory of the HTTP server. The HTTP server is a process running on the system, and it is put in charge of a specific directory; the resources in that directory can be accessed from outside. (The root directory managed by the server can be anywhere on the system, depending on the server configuration.)

  • The query string is also optional.
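How the omitted parts get filled in can be sketched with urllib.parse.urlsplit. The helper describe_url and the example.com URLs are illustrative assumptions, not part of any library.

```python
from urllib.parse import urlsplit

# Default ports the browser falls back to when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe_url(url: str) -> dict:
    """Break a URL into its parts, filling in the defaults for omitted ones."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",   # an omitted path means the server root
        "query": parts.query,        # empty string when there is no query string
    }

print(describe_url("https://example.com"))
print(describe_url("http://example.com:8080/a/b.html?x=1"))
```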

The query string starts with ? and is organized as key-value pairs. Pairs are separated by &, and the key and value within a pair are separated by =. Some characters have a special meaning in a URL, so content that contains them must be re-encoded, usually with urlencode (escaping). If you write Chinese characters without encoding them, the browser may not recognize them.
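A short sketch of building, escaping, and decoding a query string with Python's urllib.parse. The keys q and page are made-up examples.

```python
from urllib.parse import parse_qs, quote, urlencode

# Build a query string from key-value pairs; urlencode escapes the values.
query = urlencode({"q": "重庆小面", "page": 1})
print(query)

# Decode it again into keys and (lists of) values.
decoded = parse_qs(query)
print(decoded)

# Characters with a special meaning in a URL are escaped as %XX.
escaped = quote("a b&c")
print(escaped)
```

Note that parse_qs returns a list for each key, because a key may appear more than once in a query string.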


Understanding the method


In actual development, most HTTP methods are rarely used. The most common ones are GET and POST.

Scenarios that trigger a GET request:

  1. Entering the URL directly in the browser address bar
  2. link, script, img, a, and other tags in HTML
  3. Constructing a GET request through JS

The difference between GET and POST:

  1. A GET request usually has no body, while a POST request usually has one. The body of a POST is content defined by the programmer.
  2. GET generally sends data to the server in the query string, while POST sends it in the body.
  3. GET is generally used to obtain data from the server, and POST is generally used to submit data to the server.
  4. GET is usually idempotent, while POST is not required to be. (Idempotent: the same input always produces the same, deterministic result.)
  5. GET can be cached, but POST generally cannot. (Caching presupposes idempotence.)

In fact, these differences between GET and POST are only conventions, and in many scenarios each can stand in for the other.
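The "query string versus body" convention can be sketched with urllib.request.Request, which only builds request objects here and sends nothing. The /search path and the parameter names are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"user": "alice", "page": "2"}

# GET: the data travels in the query string and there is no body.
get_req = Request("http://example.com/search?" + urlencode(params))

# POST: the same data travels in the body instead.
post_req = Request(
    "http://example.com/search",
    data=urlencode(params).encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

urllib picks the method from whether a body is present, which mirrors the convention described above; it is a library default, not a rule of the protocol.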


Understanding the header


The key-value pairs in the header are predefined by HTTP and have specific meanings.

  • Host: describes the address and port of the server, that is, the target you want to visit. Its content is usually the same as the host part of the URL.

  • Content-Length: the length of the data in the body, in bytes.


  • Content-Type: the data format of the request body. Common formats include JSON and form data.

  • User-Agent (UA for short)


Describes the browser and operating system versions. Early browsers supported only text; later ones added images, audio, JS, and more. For website developers, whether a page could rely on these newer features was a problem. The solution was to publish different versions of a page for different browsers and choose between them based on User-Agent. Nowadays the differences between browsers are small, and User-Agent is mainly used to tell mobile clients from PC clients.

  • Referer: indicates the "source" of the current page, that is, the page from which it was reached. If you open a page directly from the address bar, favorites, and so on, there is no Referer.


Take an advertising billing service as an example. An advertiser's ad is placed on many other websites and links back to the advertiser's page. To settle accounts and bill correctly, the advertiser only needs to log the Referer of incoming requests. However, HTTP itself is transmitted in plain text, so a network operator can hijack the traffic and tamper with the Referer.

  • Cookie: essentially a mechanism the browser provides for web pages to store data locally. For security, web pages are not allowed to access the computer's local hard disk by default. Cookies are a strictly limited form of disk storage that the browser permits, and the data is organized as key-value pairs.


The specific content stored in a cookie is defined by the programmer, and only the programmer who developed the site knows what the data means.

Where does the cookie come from? The data in a cookie comes from the server, which decides what to save in the browser's cookies through the Set-Cookie field of the HTTP response header.

Where is the cookie stored? It can be thought of as living on the hard disk. Cookies are stored per browser and per domain name: different browsers keep their own cookies, and within one browser different domain names have different cookies. Cookies also have an expiration time. For example, many websites remember your login state for some time after you log in.

Where does the cookie go? The client uses cookies to save intermediate state for the user. Once the browser has saved a cookie, it automatically includes the cookie contents in later requests to the server, so the server knows the client's state. What a cookie stores is often this kind of "context". A cookie is like a storage locker that the server sets up on the browser side.
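The round trip described above (server sets the cookie, browser stores it and sends it back) can be sketched with Python's http.cookies module. The cookie name session_id, its value, and the one-hour lifetime are made-up examples, and the "browser" here is just a second SimpleCookie object standing in for real browser storage.

```python
from http.cookies import SimpleCookie

# 1. The server decides what to store and sends it in a Set-Cookie response header.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
server_cookie["session_id"]["max-age"] = 3600   # expiration, in seconds
server_cookie["session_id"]["path"] = "/"
set_cookie_header = server_cookie["session_id"].OutputString()
print("Set-Cookie:", set_cookie_header)

# 2. The browser saves the key-value pair for this domain...
browser_jar = SimpleCookie()
browser_jar.load(set_cookie_header)

# 3. ...and sends it back automatically in the Cookie header of later requests.
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in browser_jar.items()
)
print("Cookie:", cookie_header)
```

Only the name and value go back to the server; attributes such as the lifetime and path stay on the browser side.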


Understanding the request body

The content of the body is closely related to the Content-Type in the header. The following three types are common:

  1. application/x-www-form-urlencoded
  2. multipart/form-data
  3. application/json
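As a rough illustration, the same data can be encoded as a form body or as a JSON body, each paired with its own Content-Type. The field names are made up and headers_for is a hypothetical helper. multipart/form-data is left out of this sketch because its boundary-delimited layout is considerably longer.

```python
import json
from urllib.parse import urlencode

data = {"username": "alice", "city": "重庆"}

# application/x-www-form-urlencoded: the same key=value&key=value syntax as a query string.
form_body = urlencode(data).encode("ascii")

# application/json: the body is a JSON document.
json_text = json.dumps(data, ensure_ascii=False)
json_body = json_text.encode("utf-8")

def headers_for(content_type: str, body: bytes) -> dict:
    """Headers describing a body; Content-Length counts bytes, not characters."""
    return {"Content-Type": content_type, "Content-Length": str(len(body))}

print(headers_for("application/x-www-form-urlencoded", form_body), form_body)
print(headers_for("application/json", json_body), json_body)
```

The JSON body is longer in bytes than in characters because each Chinese character takes several bytes in UTF-8, which is why Content-Length has to be computed on the encoded bytes.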

2. HTTP response

The response consists of four parts:

  1. first line
  2. header
  3. blank line, which marks the end of the header
  4. body
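A response can be taken apart the same way as a request. The sketch below parses a made-up response; parse_response is a hypothetical helper that ignores details such as repeated headers and chunked bodies.

```python
raw_response = (
    b"HTTP/1.1 200 OK\r\n"                            # 1. first line
    b"Content-Type: text/html; charset=utf-8\r\n"     # 2. header
    b"Content-Length: 21\r\n"
    b"\r\n"                                           # 3. blank line
    b"<h1>Hello, HTTP!</h1>"                          # 4. body
)

def parse_response(raw: bytes):
    """Split a raw HTTP response into version, status code, reason, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return version, int(code), reason, headers, body

version, code, reason, headers, body = parse_response(raw_response)
print(version, code, reason)
print(headers)
print(body)
```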

HTTP status code: describes the result of this response (success or failure, and the reason for a failure).

Common status codes:

  • 200 OK: the request succeeded.

  • 404 Not Found: the accessed resource does not exist and cannot be found on the server.

  • 403 Forbidden: access denied (no permission).

  • 302 Found (Moved Temporarily): a temporary redirect, for example when an old domain name jumps to a new one. A 302 response carries a Location field in its header, which gives the address to jump to.

Redirection: a mechanism provided by HTTP.

Request forwarding: a mechanism provided by Spring and Servlet.


The difference between redirection and request forwarding:

Redirection can send the browser to external resources (other websites), while request forwarding can only forward between resources inside the same service. Forwarding takes one less browser-server interaction, so it is more efficient.

  • 500 Internal Server Error: the server code threw an exception.
  • 504 Gateway Timeout: the response took too long, so the gateway stopped waiting for the server behind it.

Gateway: a gateway is the entrance/exit of a network. To access content on a server, you first have to go through the gateway. The term usually refers to the entry server of a machine room.


To sum up: 2xx means success, 3xx redirection, 4xx client error, and 5xx server error.
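The first-digit rule can be sketched with Python's http.HTTPStatus, which knows the standard reason phrases. The describe helper and the category wording are illustrative only.

```python
from http import HTTPStatus

# The first digit of the status code gives its category.
CATEGORIES = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}

def describe(code: int) -> str:
    """Return 'code phrase: category' for a standard status code."""
    category = CATEGORIES.get(code // 100, "other")
    phrase = HTTPStatus(code).phrase   # raises ValueError for codes it does not know
    return f"{code} {phrase}: {category}"

for code in (200, 302, 403, 404, 500, 504):
    print(describe(code))
```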


Understanding the response header

The format of the response header is basically the same as that of the request header. Fields such as Content-Type and Content-Length have basically the same meaning as in the request.

Understanding the response body

The format of the body depends on the Content-Type.

Because a response can carry HTML, CSS, JS, images, and so on, there are several more data formats:

  • text/html: body data format is html
  • text/css: body data format is CSS
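How a server might pick the Content-Type for a file it returns can be sketched with Python's mimetypes module, which maps file extensions to types. The helper content_type_for and the fallback to application/octet-stream for unknown extensions are assumptions of this sketch; real servers use their own configuration.

```python
import mimetypes

def content_type_for(filename: str) -> str:
    """Guess the Content-Type of a response body from the file name."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"

for name in ("index.html", "style.css", "logo.png", "data.unknownext"):
    print(name, "->", content_type_for(name))
```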

Summary of the HTTP message format: both requests and responses consist of a first line, a header, a blank line, and a body.


Origin blog.csdn.net/liu_xuixui/article/details/130073906