[Computer Network] Application Layer Protocol HTTP

Preface

From previous lessons, we know that a protocol is essentially an agreement: both parties must be able to understand each other's messages. Application-layer protocols are not part of the operating system. Applications define them, and a protocol works as long as both sides understand it. Today, let's learn about the most widely used one: HTTP.

URL (web address)

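What is a URL:
A URL (Uniform Resource Locator), commonly called a web address, tells the client where a resource is and how to reach it. Its general form is:

    scheme://user:password@host:port/path?query#fragment

For example: http://www.example.com:80/dir/index.html?uid=1#ch1. The user:password part is rarely used, and the port is usually omitted because each scheme has a default (80 for http, 443 for https).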
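The following minimal C++ sketch splits an absolute URL into scheme, host, port, path and query. The names `UrlParts` and `splitUrl` are hypothetical. The sketch assumes the URL has no `user:password@` part and no IPv6 host, and it leaves %XY escapes undecoded.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type holding the main parts of a URL.
struct UrlParts {
    std::string scheme, host, port, path, query;
};

// Minimal sketch: splits "scheme://host[:port][/path][?query][#fragment]".
// Assumes no user:password@ part and no IPv6 host; %XY escapes are left as-is.
UrlParts splitUrl(const std::string& url) {
    UrlParts parts;
    std::string rest = url;

    // The #fragment is never sent to the server, so drop it.
    std::size_t hash = rest.find('#');
    if (hash != std::string::npos) rest.erase(hash);

    // Scheme: everything before "://".
    std::size_t sep = rest.find("://");
    if (sep != std::string::npos) {
        parts.scheme = rest.substr(0, sep);
        rest = rest.substr(sep + 3);
    }

    // host[:port] runs until the first '/' or '?'.
    std::size_t authEnd = rest.find_first_of("/?");
    std::string authority = rest.substr(0, authEnd);
    rest = (authEnd == std::string::npos) ? std::string() : rest.substr(authEnd);

    std::size_t colon = authority.find(':');
    parts.host = authority.substr(0, colon);
    if (colon != std::string::npos) parts.port = authority.substr(colon + 1);

    // The path runs until '?'; the query string follows it.
    std::size_t q = rest.find('?');
    parts.path = rest.substr(0, q);
    if (q != std::string::npos) parts.query = rest.substr(q + 1);
    if (parts.path.empty()) parts.path = "/";  // "http://host" means path "/"
    return parts;
}
```

For `http://www.example.com:8080/dir/index.html?uid=1#ch1`, it yields host `www.example.com`, port `8080`, path `/dir/index.html` and query `uid=1`.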

urlencode and urldecode

URLs are not designed only for HTTP. They are meant to be usable by all network protocols. Characters such as /, ? and : already have special meaning in a URL, because they separate its parts. If one of these characters must appear as ordinary data (for example, inside a query parameter), it has to be escaped first. Likewise, a URL may contain only ASCII characters, so any non-ASCII character (such as Chinese text) must also be escaped.

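Rules:
Convert the character to be encoded into its bytes (for example, its UTF-8 encoding) and write them in hexadecimal.
Take every two hex digits (one byte), prefix them with %, and write them as %XY.
urldecode reverses the process: each %XY is turned back into the byte 0xXY.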

For example:
The character 你 ("you") is 0xE4BDA0 in UTF-8,
so it is encoded as %E4%BD%A0
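The encoding and decoding rules above can be sketched in C++ as follows. `urlEncode` and `urlDecode` are hypothetical helper names, not a standard library API. The sketch assumes the input string already holds UTF-8 bytes. It leaves only the RFC 3986 unreserved characters (letters, digits, `-`, `_`, `.`, `~`) unescaped, and it does not treat `+` as a space the way HTML form encoding does.

```cpp
#include <cstddef>
#include <string>

// Unreserved characters (RFC 3986) may appear in a URL as-is.
bool isUnreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~';
}

// urlencode: every other byte (including each byte of a UTF-8 character)
// becomes %XY, where XY is the byte's value in hexadecimal.
std::string urlEncode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (isUnreserved(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high 4 bits -> first hex digit
            out += hex[c & 0x0F];  // low 4 bits -> second hex digit
        }
    }
    return out;
}

// Value of one hex digit, or -1 if c is not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// urldecode: turns each valid %XY back into the byte 0xXY; other characters are copied.
std::string urlDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

`urlEncode("\xE4\xBD\xA0")` (the UTF-8 bytes of 你) returns `%E4%BD%A0`, and `urlDecode` turns that back into the original three bytes.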

HTTP message format

  • Request

  • First line (request line): method + URL + version

  • Header: attributes of the request, written as colon-separated key-value pairs, one per line. Each line ends with \r\n, and a blank line marks the end of the header.

    • Connection: chooses between persistent (long) and non-persistent (short) connections, e.g. Connection: keep-alive or Connection: close. A persistent connection can carry multiple requests; a short connection is closed after the response is sent.
  • Body: everything after the blank line. The body may be empty. If a body is present, the header includes Content-Length, giving the body's length in bytes.

  • Response

  • First line (status line): version + status code + reason phrase (a short text description of the status code)
  • Header: attributes of the response, also key-value pairs that follow the same rules as above
  • Body: everything after the blank line. If the server returns a web page, the page's HTML is in the body.
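To show how the request layout above maps onto bytes, here is a minimal C++ sketch of a request parser. `HttpRequest` and `parseRequest` are hypothetical names. The sketch assumes the following:

* the whole request is already in one string;
* lines end with \r\n;
* the body length comes only from Content-Length (no chunked transfer encoding);
* header names are matched case-sensitively.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for a parsed request.
struct HttpRequest {
    std::string method, url, version;
    std::map<std::string, std::string> headers;
    std::string body;
};

// Returns false if the message is malformed or not fully received yet.
bool parseRequest(const std::string& raw, HttpRequest& req) {
    const std::string CRLF = "\r\n";
    std::size_t headerEnd = raw.find(CRLF + CRLF);     // the blank line
    if (headerEnd == std::string::npos) return false;  // header incomplete

    // Request line: method + URL + version, separated by spaces.
    std::size_t lineEnd = raw.find(CRLF);
    std::istringstream first(raw.substr(0, lineEnd));
    if (!(first >> req.method >> req.url >> req.version)) return false;

    // Header lines: "Key: Value", one per line, up to the blank line.
    std::size_t pos = lineEnd + CRLF.size();
    while (pos < headerEnd) {
        std::size_t end = raw.find(CRLF, pos);
        std::string line = raw.substr(pos, end - pos);
        std::size_t colon = line.find(':');
        if (colon != std::string::npos) {
            std::size_t v = line.find_first_not_of(' ', colon + 1);
            req.headers[line.substr(0, colon)] =
                (v == std::string::npos) ? std::string() : line.substr(v);
        }
        pos = end + CRLF.size();
    }

    // Body: exactly Content-Length bytes after the blank line (may be empty).
    std::size_t bodyStart = headerEnd + 2 * CRLF.size();
    auto it = req.headers.find("Content-Length");
    std::size_t len = (it == req.headers.end()) ? 0 : std::stoul(it->second);
    if (raw.size() < bodyStart + len) return false;  // body not fully received
    req.body = raw.substr(bodyStart, len);
    return true;
}
```

Returning false for an incomplete message matters on TCP. A single read may not contain the whole request, so a server would keep reading until the blank line and the full body have arrived.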

Methods

GET and POST are the most common methods. GET retrieves a resource, and POST sends an entity (data) to the server. For example, submitting the following form sends its two fields to a/c.exe:

<form action="a/c.exe" method="GET">
    Name:<input type="text" name="myname" placeholder="Enter your name"><br/>
    Password:<input type="password" name="mypass" value=""><br/>

    <input type="submit" value="submit"><br/>
</form>

The difference between GET and POST:

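  • Data: GET puts the parameters in the URL (as the query string after ?), while POST puts them in the request body.
  • Size: GET is limited by the maximum URL length that browsers and servers accept, so it cannot carry much data. POST can carry much larger bodies.
  • Security: GET parameters are exposed directly in the URL (and in browser history), so GET is unsuitable for sensitive data. POST is relatively safer because the data is not in the URL, but over plain HTTP the body is still transmitted as readable text.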
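The first two differences become visible when the same form submission is built both ways. This C++ sketch uses the field names from the form above. `formData`, `buildGet`, `buildPost` and the host `www.example.com` are hypothetical, and the field values are assumed to be already URL-encoded.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Form fields as (name, value) pairs, e.g. {{"myname", "alice"}, {"mypass", "123"}}.
using Form = std::vector<std::pair<std::string, std::string>>;

// Joins fields as name=value&name=value.
// Assumption: values are already URL-encoded (plain letters/digits here).
std::string formData(const Form& fields) {
    std::string out;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) out += '&';
        out += fields[i].first + "=" + fields[i].second;
    }
    return out;
}

// GET: the form data becomes the query string of the URL.
std::string buildGet(const std::string& path, const Form& fields) {
    return "GET " + path + "?" + formData(fields) + " HTTP/1.1\r\n"
           "Host: www.example.com\r\n"
           "\r\n";
}

// POST: the same data goes into the body; Content-Length gives its size.
std::string buildPost(const std::string& path, const Form& fields) {
    std::string body = formData(fields);
    return "POST " + path + " HTTP/1.1\r\n"
           "Host: www.example.com\r\n"
           "Content-Type: application/x-www-form-urlencoded\r\n"
           "Content-Length: " + std::to_string(body.size()) + "\r\n"
           "\r\n" + body;
}
```

With the fields myname=alice and mypass=123, the GET version puts `?myname=alice&mypass=123` in the request line. The POST version sends the same 23 bytes as the body, with `Content-Length: 23`.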

HTTP status codes

  • 1XX: Informational: the request was received and is still being processed
  • 2XX: Success: the request was processed normally (e.g., 200 OK)
  • 3XX: Redirection: the client must take further action to complete the request (e.g., 301, 302)
  • 4XX: Client error: the request is invalid or cannot be fulfilled, e.g. the resource does not exist (404 Not Found)
  • 5XX: Server error: the server failed while processing a valid request (e.g., 500 Internal Server Error)
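The first digit alone determines the class. This lets a client handle a status code it does not specifically recognize by treating it like the others in its class. Here is a minimal C++ sketch, with `statusClass` as a hypothetical helper name:

```cpp
#include <string>

// Maps a status code to its class using only its first digit (code / 100).
std::string statusClass(int code) {
    switch (code / 100) {
        case 1: return "Informational";
        case 2: return "Success";
        case 3: return "Redirection";
        case 4: return "Client Error";
        case 5: return "Server Error";
        default: return "Unknown";  // outside 100-599
    }
}
```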

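Redirection

Temporary redirection (e.g., 302 Found): the browser goes to the new address for this request only and does not update any stored address.
Permanent redirection (e.g., 301 Moved Permanently): clients are expected to replace stored references, such as local bookmarks, with the new address.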

The following response redirects the browser to QQ's homepage:

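    // SEP is the HTTP line terminator "\r\n"
    const std::string SEP = "\r\n";
    std::string response;
    response += "HTTP/1.0 302 Found" + SEP;            // status line: 302 = temporary redirect
    response += "Location: https://www.qq.com/" + SEP; // where the browser should go next
    response += SEP;                                   // blank line ends the header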

Typical uses of temporary redirection include redirecting visitors while an old website is under maintenance, and promotional or advertising jumps.
Ordinary users rarely notice permanent redirection, but it matters to search engines. They periodically crawl the entire web, and when a crawler meets a permanently redirected site, it updates its index to point directly at the new address.
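Both kinds of redirect share the same response shape and differ only in the status line. The C++ sketch below generalizes the QQ example. `makeRedirect` is a hypothetical helper. It also adds a `Content-Length: 0` header, which is an assumption not present in the original example, so that the client knows no body follows.

```cpp
#include <string>

// Hypothetical helper: builds a redirect response.
// permanent == true -> 301 Moved Permanently; false -> 302 Found (temporary).
std::string makeRedirect(const std::string& location, bool permanent) {
    const std::string SEP = "\r\n";
    std::string response;
    response += permanent ? "HTTP/1.0 301 Moved Permanently" : "HTTP/1.0 302 Found";
    response += SEP;
    response += "Location: " + location + SEP;  // target of the redirect
    response += "Content-Length: 0" + SEP;      // no body follows
    response += SEP;                            // blank line ends the header
    return response;
}
```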

Common HTTP headers

  • Content-Type: the media type of the body (e.g., text/html)
  • Content-Length: the length of the body in bytes
  • Host: tells the server which host and port the requested resource is on
  • User-Agent: identifies the client's operating system and browser version (crawlers often fake it to get past anti-crawler checks)
  • Referer: the page from which the current request originated (e.g., the page whose link was clicked)
  • Location: used with a 3XX status code to tell the client where to redirect
  • Cookie: sends back the small pieces of information stored on the client, used to implement sessions
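When a server returns a page, it usually sets Content-Type and Content-Length together. In the minimal C++ sketch below, `makeHtmlResponse` is a hypothetical helper. It computes the length from the body's byte count, because Content-Length must hold the number of bytes, not the number of displayed characters.

```cpp
#include <string>

// Hypothetical helper: wraps an HTML page in a 200 OK response.
std::string makeHtmlResponse(const std::string& html) {
    std::string response = "HTTP/1.1 200 OK\r\n";
    response += "Content-Type: text/html; charset=utf-8\r\n";                 // media type of the body
    response += "Content-Length: " + std::to_string(html.size()) + "\r\n";  // body size in bytes
    response += "\r\n";  // blank line separates header and body
    response += html;
    return response;
}
```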

Session persistence

HTTP itself is stateless: the server does not remember earlier requests. HTTP does not manage sessions directly. Users still expect to stay logged in across requests, though, so the website must keep track of whether a user is logged in.

Cookies are a technique for storing user information on the client. The browser automatically attaches the cookies it has saved for a website to every request sent to that website. However, storing the user information itself in a cookie is risky, because it exposes that information directly. If a middleman obtains the cookie, the attacker can steal the account on that website and is also very likely to obtain the user's personal information.
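On the wire, the browser sends its saved cookies back in a single Cookie request header of the form `name=value; name2=value2`. The minimal C++ sketch below splits that header on the server side. `parseCookies` is a hypothetical name, and the sketch ignores quoting and any encoding of values.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Splits a Cookie header value such as "sessionid=abc123; theme=dark"
// into name -> value pairs.
std::map<std::string, std::string> parseCookies(const std::string& header) {
    std::map<std::string, std::string> cookies;
    std::size_t pos = 0;
    while (pos < header.size()) {
        std::size_t end = header.find(';', pos);
        if (end == std::string::npos) end = header.size();
        std::string item = header.substr(pos, end - pos);
        // Skip the space that follows each "; " separator.
        std::size_t start = item.find_first_not_of(' ');
        std::size_t eq = item.find('=');
        if (start != std::string::npos && eq != std::string::npos && eq > start) {
            cookies[item.substr(start, eq - start)] = item.substr(eq + 1);
        }
        pos = end + 1;
    }
    return cookies;
}
```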

The common practice today is for the server to keep the user information. After the user logs in and is verified, the server creates a session object and returns its unique session ID to the client, typically through a Set-Cookie response header. The client's cookie stores only this session ID. This way, even if the cookie is intercepted, the user's personal information itself is not leaked.

The server also performs some checks, such as whether the client's IP address or request data look abnormal. When it detects such anomalies, it invalidates the session. This limits what an attacker can do with a stolen session ID.
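The session-ID approach and the IP check can be sketched together as a small in-memory store. Everything here is hypothetical: `SessionStore`, its methods, and the rule "a change of IP invalidates the session" are illustrative assumptions. Real sites use softer checks, since a user's IP can change legitimately. `std::random_device` also stands in for whatever cryptographically secure generator a real server would use.

```cpp
#include <map>
#include <random>
#include <string>

// Hypothetical server-side session store: the client's cookie holds only
// the session ID, while the user data stays on the server.
class SessionStore {
public:
    // Called after a successful login; returns the new session ID.
    std::string create(const std::string& user, const std::string& clientIp) {
        std::string id = randomId();
        while (sessions_.count(id)) id = randomId();  // keep IDs unique
        sessions_[id] = Session{user, clientIp};
        return id;
    }

    // Returns the user for a session ID, or nullptr if the ID is unknown.
    // A request from a different IP than the login is treated as abnormal,
    // and the session is invalidated (an illustrative, overly strict rule).
    const std::string* lookup(const std::string& id, const std::string& clientIp) {
        auto it = sessions_.find(id);
        if (it == sessions_.end()) return nullptr;
        if (it->second.ip != clientIp) {
            sessions_.erase(it);
            return nullptr;
        }
        return &it->second.user;
    }

private:
    struct Session {
        std::string user;
        std::string ip;
    };

    // 32 random hex characters; a real server should use a vetted CSPRNG.
    std::string randomId() {
        static const char hex[] = "0123456789abcdef";
        std::uniform_int_distribution<int> digit(0, 15);
        std::string id;
        for (int i = 0; i < 32; ++i) id += hex[digit(rng_)];
        return id;
    }

    std::map<std::string, Session> sessions_;
    std::random_device rng_;
};
```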

Conclusion

Even so, HTTP is still insecure. POST requests are not cached by the browser and cannot be directly shared or bookmarked. Cookies and sessions also do their best to protect user information. Yet many security problems remain: a POST request can be captured on the network, and login credentials can be intercepted by a man-in-the-middle. To solve these problems, we need a new protocol: HTTPS.


Origin blog.csdn.net/m0_73209194/article/details/132157688