Basic format of the HTTP protocol

HTTP stands for HyperText Transfer Protocol. It is an application-layer protocol that transmits data on top of TCP/IP.

Chrome capture

Note: These steps apply to Chrome and to Chromium-based browsers such as Edge. Firefox is not Chromium-based, but its developer tools provide a similar Network panel.

  1. Open the developer tools on the target page:
    [Image: opening the developer tools]
  2. Click Network:
    [Image: the Network panel]
    At this point, you can see all the requests for the target page.
  3. View requests and responses:
    [Image: request and response details]

Fiddler proxy packet capture

The browser's built-in capture tool has limited functionality, so in practice a dedicated tool is often used for packet capture.
Common packet capture tools include Charles, Wireshark, tcpdump, and Fiddler. For beginners, Fiddler is simple and free (the classic version of Fiddler is only available for Windows).

Fiddler does not capture HTTPS traffic by default, so it needs to be configured first. Click Tools > Options > HTTPS and adjust the settings:

[Images: Fiddler HTTPS settings]
After that, Fiddler can capture HTTPS packets.

For example, to capture the HTML page request of Bilibili, first open the Bilibili website. The captured packets are:
[Image: packets captured in Fiddler]
With so many packets, how do we determine which one we are looking for?

  1. First look at the URL: it should match the target domain name.
  2. Different types of packets are shown in different colors. Here we are looking for an HTML page, so look for the blue entries.
  3. Look at the body size: for an HTML page request, the body is generally large.

After finding the target, double-click it to see the details:
[Image: request details in Fiddler]

HTTP protocol format

HTTP is a request-response protocol: the client (for example, a browser) initiates a request and the server returns a response. You can analyze the details and characteristics of HTTP requests and responses by capturing packets.
Either the Chrome developer tools or Fiddler can be used as the packet capture tool.

HTTP request

[Image: format of an HTTP request]
GET requests generally have no body, whereas POST requests generally do. The part after the blank line is the body.
For example, a POST request:
[Image: a POST request with a body]
Although a GET request can technically carry a body, this is generally not recommended. Some servers or proxies ignore or strip the body of a GET request, and some clients do not support adding one. It is therefore best to use a POST request to carry a body.
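
The layout described above (first line, headers, blank line, body) can be sketched in Python. This is a minimal illustration, not a full HTTP parser: the function name parse_http_request and the sample request are made up for this example, and it assumes a well-formed request with CRLF line endings.

```python
def parse_http_request(raw: bytes):
    """Split a raw HTTP/1.1 request into first line, headers, and body."""
    # The blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # First line: method, URL, and version, separated by spaces.
    method, target, version = lines[0].split(" ", 2)

    # Every remaining line is a "Name: value" header pair.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


# Hypothetical POST request, written out by hand for illustration.
raw_request = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 23\r\n"
    b"\r\n"
    b"username=tom&password=1"
)

method, target, version, headers, body = parse_http_request(raw_request)
print(method, target, version)
print(headers)
print(body)
```

For a GET request without a body, the same function returns an empty byte string as the body.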

First line

The first line (also called the request line) includes the method, the URL, and the version number.


A URL (Uniform Resource Locator) is commonly known as a web address. It identifies the location of a resource on the Internet.
A URL usually includes: protocol, host, port, path, query parameters, and anchor.
[Image: components of a URL]
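
As a rough illustration of these components, Python's standard urllib.parse module can split a URL into its parts. The URL below is invented for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Made-up URL that contains every component listed above.
url = "https://example.com:8080/articles/list?page=2&tag=http#comments"
parts = urlsplit(url)

components = {
    "protocol": parts.scheme,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parse_qs(parts.query),  # each key maps to a list of values
    "anchor": parts.fragment,
}
for name, value in components.items():
    print(f"{name}: {value}")
```

When the port is omitted from a URL, parts.port is None and the client falls back to the default port of the protocol (80 for http, 443 for https).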


Common methods of the HTTP protocol are:
[Image: table of common HTTP methods]

GET method

GET is the most commonly used HTTP method. It is used to obtain a resource from the server. Most captured packets are GET requests.

Take the Baidu homepage as an example:
[Image: GET request for the Baidu homepage]
Features:

  • The query parameters in the URL may or may not be present
  • The header contains several attributes in the form of key-value pairs
  • The body of a GET request is generally empty
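
A short sketch of how a GET request carries its data in the URL rather than in the body. The helper build_get_url, the address, and the parameter names are assumptions made for this example.

```python
from urllib.parse import urlencode


def build_get_url(base_url: str, params: dict) -> str:
    """Append query parameters to a URL; the request body stays empty."""
    query = urlencode(params)  # escapes values, e.g. a space becomes "+"
    return f"{base_url}?{query}" if query else base_url


# Hypothetical search request.
search_url = build_get_url("https://example.com/s", {"wd": "http protocol", "pn": 10})
print(search_url)
```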

POST method

[Image: a captured POST request]
Features:

  • The query parameters in the URL are generally empty
  • The header contains several attributes in the form of key-value pairs
  • The body is generally not empty
  • The data format of the body is specified by Content-Type in the header
  • The length of the body is specified by Content-Length in the header
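
To make the last two points concrete, the sketch below encodes the same form data in two ways and derives Content-Type and Content-Length from the result. The function names, the boundary string, and the form fields are assumptions for illustration. The multipart encoder handles text fields only; a real file upload would also carry a filename and a Content-Type for each part.

```python
from urllib.parse import urlencode


def urlencoded_body(fields: dict):
    """Encode fields as application/x-www-form-urlencoded."""
    body = urlencode(fields).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),  # length in bytes
    }
    return headers, body


def multipart_body(fields: dict, boundary: str = "example-boundary"):
    """Encode text fields as multipart/form-data, one part per field."""
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",
            value,
        ]
    lines += [f"--{boundary}--", ""]  # closing boundary
    body = "\r\n".join(lines).encode("utf-8")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body


form = {"user": "tom", "note": "hello world"}
for encode in (urlencoded_body, multipart_body):
    headers, body = encode(form)
    print(headers)
    print(body)
```

For the same two fields, the multipart body is several times longer than the URL-encoded one, because every part repeats the boundary and its own headers.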

The difference between GET and POST

There is no essential difference in format between GET and POST, and in many cases one could technically stand in for the other. In practice, however, the two are used differently:

  • GET is generally used to retrieve data, and POST is generally used to submit data
  • A GET request passes data through query parameters, and its body is generally empty
  • A POST request submits data through the body, and its query parameters are generally empty
  • GET is generally idempotent and POST is generally not (a request is idempotent if sending it several times leaves the server in the same state as sending it once)
  • GET responses can be cached by default, while POST responses are generally not cached
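
The idempotency point can be illustrated with a toy in-memory resource. NoteStore is a hypothetical class written only for this comparison; it is not how a real server is structured.

```python
class NoteStore:
    """Hypothetical in-memory resource used only to contrast GET and POST."""

    def __init__(self):
        self.notes = []

    def handle(self, method: str, body: str = ""):
        if method == "GET":
            # Reading does not change server state, so repeats are harmless.
            return list(self.notes)
        if method == "POST":
            # Each submission appends a new note, so repeating the same
            # request changes the state again: not idempotent.
            self.notes.append(body)
            return list(self.notes)
        raise ValueError(f"unsupported method: {method}")


store = NoteStore()
store.handle("POST", "buy milk")
print(store.handle("GET"))  # same result however often it is repeated
store.handle("POST", "buy milk")  # the same POST again adds a duplicate
print(store.handle("GET"))
```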

Attributes in the request header

  • Host: Indicates the host name and port of the server

The URL already contains the address and port of the server, so why is Host needed?
One server can host multiple websites. These websites share the same IP address and port number but have different domain names, and this is where Host comes in. The Host attribute usually carries the domain name, which lets the server identify the target website.
For example, suppose website A and website B are hosted by the same server. DNS resolves both domain names to the same IP address, and both use the same port number. Whether a request is meant for website A or website B, it arrives at the same IP address and port, so the server needs the Host attribute to tell which website the request is for.

  • User-Agent: Indicates information about the browser and operating system, including type, version, language, etc.
    The role of User-Agent is to let the server identify the type and capabilities of the client and return an appropriate response.
    For example, 4399 mini-games need the Flash plug-in to run, but current browsers no longer support Flash, so after opening a mini-game the page displays:
    [Image: notice shown when Flash is unavailable]
    User-Agent is also used to indicate whether the client is a mobile device or a desktop browser, which explains why the same website can have different page layouts on a phone and on a computer.

  • Content-Length: Indicates the length of the body in bytes

  • Content-Type: Indicates the data format of the body. There are many content types, generally divided into the following categories:

    1. text/plain indicates the plain text type. The data is placed directly into the request body as text. This format is suitable for transferring simple text data.

    2. Types starting with application indicate application data. For example, application/json indicates the JSON data format, and application/x-www-form-urlencoded indicates the form data format.

    3. Types starting with multipart indicate a multi-part type. For example, multipart/form-data indicates the multi-part form data format.

      application/x-www-form-urlencoded is the default form data format. It is suitable for most scenarios but is not suited to uploading files or binary data.
      multipart/form-data divides the form data into multiple parts separated by a boundary string. Each part can have its own Content-Type and encoding, so files and binary data can be uploaded.
      multipart/form-data adds overhead for every part, so it is generally used only when uploading files or binary data, and application/x-www-form-urlencoded is used in other cases.

  • Referer: Indicates the page from which the current request was reached.
    For example, when jumping from Baidu to Bilibili, the captured packet is:
    [Image: request to Bilibili with a Referer attribute]
    The Referer here shows that the visit to Bilibili came from Baidu.
    Search results often include advertisements placed by advertisers. Advertisers gain exposure through searches and confirm transaction volume through users clicking on links:
    [Image: advertisements in search results]
    Since advertisers usually place ads on more than one search engine, Referer helps determine where each click came from.
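
The sketch below shows how a server might read three of the attributes described above: Host to pick the website, User-Agent to pick a layout, and Referer to find the traffic source. The site names, the marker list, and the function inspect_request_headers are all hypothetical. The User-Agent check is a rough heuristic, and the Host handling ignores IPv6 literals.

```python
from urllib.parse import urlsplit

# Hypothetical websites served from one IP address and port.
SITES = {"site-a.example": "website A", "site-b.example": "website B"}

# Substrings that often appear in mobile User-Agent strings (assumption).
MOBILE_MARKERS = ("Mobile", "Android", "iPhone")


def inspect_request_headers(headers: dict) -> dict:
    """Derive the target site, page layout, and traffic source from headers."""
    # Host: drop an optional ":port" suffix and ignore case.
    hostname = headers.get("Host", "").split(":", 1)[0].lower()
    site = SITES.get(hostname, "unknown site")

    # User-Agent: choose a layout based on the client type.
    user_agent = headers.get("User-Agent", "")
    is_mobile = any(marker in user_agent for marker in MOBILE_MARKERS)
    layout = "mobile" if is_mobile else "desktop"

    # Referer: host of the page that linked here, if the header is present.
    referer = headers.get("Referer", "")
    source = urlsplit(referer).hostname if referer else None

    return {"site": site, "layout": layout, "source": source or "(direct)"}


example_headers = {
    "Host": "site-b.example:80",
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148",
    "Referer": "https://www.baidu.com/s?wd=example",
}
print(inspect_request_headers(example_headers))
```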

Cookies and Sessions

HTTP is a stateless protocol. Stateless means that the HTTP server keeps no memory between requests: each time the client sends a request, the server cannot tell whether it is related to a previous one. For example, after a user purchases a product and moves on to a follow-up operation, the server cannot tell that the new request is related to the earlier one. This design simplifies the server's processing logic and reduces transmission overhead, but it is inconvenient for users, so Cookie and Session were introduced to make up for this shortcoming.

  • Cookie
    Cookie is a mechanism for storing user information on the client side. After the client sends a request, the server can return a cookie along with the response (in the Set-Cookie header). The cookie stores the current client state, such as login status. On the next request, the client sends the cookie back, and the server can determine the client's state from it.
  • Session
    Cookies often contain sensitive information. Because they are stored on the client and, without HTTPS, sent in plain text, they can be stolen or tampered with. Session is a more secure approach.
    Session is a mechanism for storing user information on the server. After the client sends a request, the server stores the user's state under a generated Session ID and returns only the Session ID to the client (usually in a cookie). On subsequent requests, the client sends the Session ID, and the server uses it to look up the stored state and determine the client's status.
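
A minimal sketch of the session flow, assuming an in-memory dictionary as the session store and a cookie named session_id. Both are choices made for this example; a real application would normally rely on its web framework's session support.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session ID -> user state, kept on the server


def login(username: str) -> str:
    """Create a session and return the Set-Cookie header value for the client."""
    session_id = secrets.token_hex(16)  # random, hard-to-guess identifier
    SESSIONS[session_id] = {"user": username, "logged_in": True}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True  # hide the cookie from page scripts
    return cookie["session_id"].OutputString()


def current_user(cookie_header: str):
    """Look up the session from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    state = SESSIONS.get(morsel.value) if morsel else None
    return state["user"] if state else None


set_cookie = login("tom")                # server -> client: Set-Cookie value
cookie_header = set_cookie.split(";")[0]  # client -> server: "session_id=..."
print(current_user(cookie_header))
```

Only the random identifier travels between client and server; the login state itself never leaves the SESSIONS dictionary.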

The difference between Cookie and Session

  • Storage location: a cookie is stored on the client, and a session is stored on the server.
  • Security: a cookie is less secure because its content is visible to the client, while a session is relatively secure because only the Session ID leaves the server.
  • Performance: cookies are sent with every request and take up client storage and bandwidth; sessions take up server resources and can affect server performance.
  • Expiration: a cookie can be given an expiration time, after which the browser deletes it. A cookie with no expiration time (such as one that carries a Session ID) is discarded when the browser is closed, while the session data on the server typically expires after a timeout.

HTTP response

HTTP status code

  • Common status codes are:

    Status code   Meaning
    200           OK (normal)
    301           Moved Permanently
    302           Found (temporarily redirected)
    403           Forbidden (access is denied)
    404           Not Found (no resource found)
    405           Method Not Allowed (method not supported)
    500           Internal Server Error
    502           Bad Gateway
    504           Gateway Timeout (response timeout)

  • Status code summary:
    [Image: summary of status code classes]
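
Status codes are grouped by their first digit, and Python's standard http.HTTPStatus enum knows the standard reason phrases. The sketch below combines the two; the helper describe_status and the wording of the class labels are choices made for this example.

```python
from http import HTTPStatus

# The first digit of a status code gives its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    return f"{code} {phrase} ({STATUS_CLASSES[code // 100]})"


for code in (200, 301, 302, 403, 404, 405, 500, 502, 504):
    print(describe_status(code))
```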

HTTP response headers and body

HTTP response headers

A common response header attribute is Content-Type, which describes the format of the response body. Common values are:

  1. text/html indicates that the response body is an HTML document. It is suitable for returning web pages, and the character encoding should also be specified.
    For example: Content-Type: text/html; charset=utf-8
  2. text/xml indicates that the response body is an XML document. It is suitable for returning structured data. The encoding should be set in the header or declared in the XML document itself.
  3. Types starting with image indicate an image, and the browser displays the response data as an image.

In addition, there are many other Content-Type values, such as text/css and text/javascript.
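
As an illustration of how a server might choose this value, the sketch below guesses the Content-Type from a file name with Python's standard mimetypes module. The helper content_type_for and the default charset are assumptions for this example; real servers often use their own mapping tables.

```python
import mimetypes


def content_type_for(filename: str, charset: str = "utf-8") -> str:
    """Guess a Content-Type header value from a file name."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None:
        # Unknown extension: fall back to the generic binary type.
        return "application/octet-stream"
    if media_type.startswith("text/"):
        # Text types should also state their character encoding.
        return f"{media_type}; charset={charset}"
    return media_type


for name in ("index.html", "style.css", "logo.png"):
    print(name, "->", content_type_for(name))
```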

HTTP response body

The exact format of the response body depends on the Content-Type attribute.
[Image: an HTTP response body]
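
A client can use Content-Type to decide how to interpret the bytes of the body. The sketch below is one possible way to do that with the standard library; the function decode_body is made up for this example, and it falls back to UTF-8 when no charset is given, which is an assumption rather than a rule.

```python
import json
from email.message import Message


def decode_body(content_type_header: str, body: bytes):
    """Interpret response bytes according to the Content-Type value."""
    # email.message.Message can parse "type/subtype; charset=..." values.
    msg = Message()
    msg["Content-Type"] = content_type_header
    media_type = msg.get_content_type()
    charset = msg.get_content_charset() or "utf-8"  # assumed default

    if media_type == "application/json":
        return json.loads(body.decode(charset))
    if media_type.startswith("text/"):
        return body.decode(charset)
    return body  # images and other binary types stay as bytes


print(decode_body("text/html; charset=utf-8", "<h1>你好</h1>".encode("utf-8")))
print(decode_body("application/json", b'{"ok": true}'))
print(decode_body("image/png", b"\x89PNG"))
```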
