[Application layer] HTTP protocol summary

Article directory

  • 1. Continued -> HTTP protocol learning
  •        1. The GET and POST methods in HTTP requests
  •        2. HTTP status codes
  •        3. HTTP headers
  •        4. Long connections
  •        5. Cookies (session persistence)
  • Summary


Continuing with the content of the previous article:

At the end of the last article we talked about the web root directory: the resources of a web page are stored under it. When we built the HTTP response in the last article, we hard-coded the response body as a string. That practice is not standard:

 Let's redesign the body part:

First, we write a function that reads a file. From now on the resources for the body will live in files, so we can simply read the data from the file.

The first parameter is the path of the file to read, and the second is an output parameter: the data that is read will be placed in out. Any reading method will do here. We check whether the file was opened successfully. If it was, we read each line of the file into a temporary string object, append it to out, and close the file before returning.
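The steps above can be sketched roughly as follows. This is only a minimal sketch: the bool return value and the pointer type of out are assumptions, since the original code is not shown here.

```cpp
#include <fstream>
#include <string>

// Reads the text file at `path` line by line and appends its content to `*out`.
// Returns false if the file cannot be opened, so the caller can serve a 404 page.
bool readFile(const std::string &path, std::string *out)
{
    std::ifstream in(path);
    if (!in.is_open())
        return false;

    std::string line;
    while (std::getline(in, line))
        *out += line; // std::getline drops the '\n' of every line

    in.close();
    return true;
}
```

Because std::getline discards the line breaks, a sketch like this is only suitable for text such as HTML, not for binary resources.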

Reading from a given path may fail, for example because the user entered a wrong path. In that case we should return 404. The 404 page also lives in the web root directory, so we create a 404 file:

 Then we define a string object to save this path:

Then we modify the code that builds the response body. The way readFile reads data actually still has a problem, which we will come back to later.
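The fallback to the 404 page could look something like the sketch below. The names Page, loadFile and loadPage, and the file name 404.html under the web root, are assumptions made for illustration. Here the file is read in one go instead of line by line.

```cpp
#include <fstream>
#include <sstream>
#include <string>

struct Page
{
    std::string statusLine; // e.g. "HTTP/1.1 200 OK\r\n"
    std::string body;
};

// Loads the whole file into `out`; returns false if it cannot be opened.
static bool loadFile(const std::string &path, std::string &out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    std::ostringstream buf;
    buf << in.rdbuf();
    out = buf.str();
    return true;
}

// Tries the requested resource first and falls back to the 404 page.
Page loadPage(const std::string &webRoot, const std::string &urlPath)
{
    Page page;
    if (loadFile(webRoot + urlPath, page.body))
    {
        page.statusLine = "HTTP/1.1 200 OK\r\n";
        return page;
    }
    page.statusLine = "HTTP/1.1 404 Not Found\r\n";
    if (!loadFile(webRoot + "/404.html", page.body))
        page.body = "<html><body><h1>404 Not Found</h1></body></html>"; // last resort
    return page;
}
```

A call such as loadPage("./wwwroot", "/no/such/page") would then yield the 404 status line together with the content of the 404 file (the directory name is again only an example).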

After running, when we access the server with a path that does not exist, it will show us that the resource does not exist:

Of course, we can also add jump buttons to the web page. First, create a folder in the web root directory, and then create two HTML files in it, page a and page b:

 Then we add a jump button to the homepage of the website:

 Let's run it and see:

It works without problems. When we click a button, we jump to the path of the news page or the path of the e-commerce page.

Of course, you can also add a back button to the a page:

This is front-end knowledge, so we will not go further into it. We continue with the back end:

Previously, for demonstration purposes, we hard-coded one value from the Content-Type table in the response header. In fact a web page contains not only HTML resources but also other formats such as JPG images, so we need a function that returns the matching Content-Type value for whatever type the resource is. To do that, we first need to extract the suffix of the resource.

First create a string object to hold the suffix of the resource in the path. Then search the path for the . character from back to front and cut the suffix out with substr. If no . is found, we fall back to a default html suffix, which is later looked up in the Content-Type table.

Looking up the table by suffix is very simple. We only give html and jpg as examples here, and you must remember to put \r\n at the end of the header line.
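The two steps above, extracting the suffix and looking it up, might be written like this. The function names getSuffix and suffixToContentType are hypothetical; only the html and jpg entries come from the text.

```cpp
#include <string>

// Returns the suffix of `path` including the dot, or ".html" if there is none.
std::string getSuffix(const std::string &path)
{
    std::size_t pos = path.rfind('.'); // search from back to front
    if (pos == std::string::npos)
        return ".html";
    return path.substr(pos);
}

// Maps a suffix to a complete Content-Type header line ending in \r\n.
std::string suffixToContentType(const std::string &suffix)
{
    if (suffix == ".jpg")
        return "Content-Type: image/jpeg\r\n";
    return "Content-Type: text/html\r\n"; // default, also used for ".html"
}
```

This simple version looks only at the last dot in the path, so a dot inside a directory name would be mistaken for a suffix. A fuller table would add more suffixes in the same way.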

Our response header should also carry the size of the resource the user requested. Finding the size is simple: we have already read the file at the requested path into body, so body.size() is the size of the resource. Do not add 1 for a trailing \0: size() of the string class does not count the terminator, and Content-Length must match the number of body bytes exactly.

That completes the HTTP message format. We only need to remember that a request has four parts: request line, request headers, blank line, and request body. The body may be absent. The request line contains the HTTP method, the URL, and the HTTP version. Note that the request line and every header line must end with \r\n. There can be many headers, such as the resource size and resource type we listed above for the response. Responses have the same structure as requests.
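Putting the pieces together, a response could be assembled as in the following sketch. The function name buildResponse and its parameters are assumptions. Content-Length is set to exactly body.size(), the number of bytes in the body.

```cpp
#include <string>

// Assembles status line, headers, blank line and body into one HTTP response.
// `statusLine` and `contentTypeLine` are expected to end with "\r\n" already.
std::string buildResponse(const std::string &statusLine,
                          const std::string &contentTypeLine,
                          const std::string &body)
{
    std::string response = statusLine; // e.g. "HTTP/1.1 200 OK\r\n"
    response += contentTypeLine;       // e.g. "Content-Type: text/html\r\n"
    response += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    response += "\r\n";                // blank line separates headers from body
    response += body;
    return response;
}
```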

The Referer header in the figure above tells the server which page we came from. Cookies are very important and we will explain them in detail later.

The GET and POST methods in HTTP requests

To understand the difference between the GET and POST methods, we must first understand forms:

For example, the login page where we usually enter a password is a form. When we submit an account and password, the front end submits them through the form, and the browser automatically converts the form content into a GET or POST request. Let's add a form ourselves:

An input with type text is the account field, type password is the password field, and type submit is the submit button. The value attribute sets the default content, which looks like this:

The preset content is what the input box shows by default. The form's action attribute is the resource path to go to after submission, and the method attribute is the HTTP request method:

Let's demonstrate with GET first. We fill in any path that does not exist as the action, so submitting jumps to the 404 page:

 The picture above is the URL bar before login

The picture above is after submitting. We can see that with the GET method, the user name and password are appended to the URL after the form's action path when we submit. This means that if a login page uses the GET method, the password is shown directly in the address bar, where anyone nearby can see it.

Of course, our server still receives the account and password submitted by the user. Let's try the POST method:

With the POST method, the URL contains only the form's action path, without the account name and password. The server nevertheless still receives the user name and password submitted by the browser.

Summary: The GET method passes parameters through url, while POST submits parameters through the body of the http request.
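To make the difference concrete, here is a small sketch with two made-up requests (the path /login and the field names user and pwd are purely illustrative) and a hypothetical helper requestBody that returns whatever follows the blank line.

```cpp
#include <string>

// GET: the parameters travel in the URL, and there is no body.
const std::string kGetExample =
    "GET /login?user=tom&pwd=123 HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "\r\n";

// POST: the URL is clean, and the parameters travel in the body.
const std::string kPostExample =
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Length: 16\r\n"
    "\r\n"
    "user=tom&pwd=123";

// Returns the request body, i.e. everything after the first blank line.
std::string requestBody(const std::string &request)
{
    const std::string sep = "\r\n\r\n";
    std::size_t pos = request.find(sep);
    if (pos == std::string::npos)
        return "";
    return request.substr(pos + sep.size());
}
```

Calling requestBody on the GET example gives an empty string, while the POST example gives the form data.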

Because the POST method submits parameters in the body, ordinary users cannot see them in the address bar, so privacy is better. But privacy is not the same as security: over plain HTTP, both POST and GET send the data unencrypted, so they are equally insecure.

Knowing how GET and POST submit data, we can see that our previous way of extracting the three fields from the request line is incomplete: with the GET method, the submitted data is appended to the URL and separated from the path by ?. So we can extract the part to the right of ?, which holds the account and password submitted by the user:
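One possible way to do this extraction is sketched below. The name parseQuery is hypothetical, and the sketch assumes the usual key=value&key=value layout without handling URL-encoded characters.

```cpp
#include <map>
#include <string>

// Splits a URL such as "/login?user=tom&pwd=123" into the path (stored in
// `*path`) and a map of the key/value pairs found to the right of '?'.
std::map<std::string, std::string> parseQuery(const std::string &url, std::string *path)
{
    std::map<std::string, std::string> params;
    std::size_t q = url.find('?');
    *path = url.substr(0, q); // if there is no '?', this copies the whole url
    if (q == std::string::npos)
        return params;

    std::string query = url.substr(q + 1);
    std::size_t start = 0;
    while (start <= query.size())
    {
        std::size_t amp = query.find('&', start);
        if (amp == std::string::npos)
            amp = query.size();
        std::string pair = query.substr(start, amp - start);
        if (!pair.empty())
        {
            std::size_t eq = pair.find('=');
            if (eq == std::string::npos)
                params[pair] = ""; // key without a value
            else
                params[pair.substr(0, eq)] = pair.substr(eq + 1);
        }
        start = amp + 1;
    }
    return params;
}
```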

HTTP status codes

Among the HTTP status codes, we focus on the redirection codes that start with 3. First, a common misunderstanding: many people think that a 404 page indicates a server problem. In fact it is a client problem. When the client requests a path that does not correspond to any resource on the server, a 404 error is returned. Because the client asked for a resource that does not exist on the server, 404 belongs to the status codes starting with 4, which are the client error codes.
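The first digit of a status code determines its class. A tiny helper (the name statusClass is made up for this sketch) makes the grouping explicit:

```cpp
#include <string>

// Returns the class of an HTTP status code based on its first digit.
std::string statusClass(int code)
{
    switch (code / 100)
    {
    case 1: return "informational";
    case 2: return "success";
    case 3: return "redirection";
    case 4: return "client error";
    case 5: return "server error";
    default: return "unknown";
    }
}
```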

Redirection is divided into permanent and temporary redirection. For example, suppose a website's server could originally handle only 1 million users, and as the number of users grew it was no longer enough. The company adds a new server that can handle 100 million users, whose IP and port differ from the old one. So as not to lose the previous 1 million users, the old server redirects: anyone who visits the old address is sent on to the new server. This is permanent redirection.

Temporary redirection is similar to the splash-screen advertisement shown when an app opens: for those 5 seconds we are actually sent to the advertiser's server.

Temporary redirection is easy to demonstrate: return a 3xx status code and add Location: followed by the redirect URL to the response header.
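A redirect response might be built as in this sketch. The function name buildRedirect is an assumption. 301 and 302 are the standard codes for permanent and temporary redirection.

```cpp
#include <string>

// Builds a redirect response that sends the client to `target`.
std::string buildRedirect(const std::string &target, bool permanent)
{
    std::string response = permanent ? "HTTP/1.1 301 Moved Permanently\r\n"
                                     : "HTTP/1.1 302 Found\r\n";
    response += "Location: " + target + "\r\n";
    response += "Content-Length: 0\r\n"; // no body is needed for the jump
    response += "\r\n";
    return response;
}
```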

Those who are interested can try it.

HTTP common headers

Content-Type: the data type of the body (text/html, etc.)
Content-Length: the length of the body in bytes
Host: the client tells the server which host and port the requested resource is on
User-Agent: declares the user's operating system and browser version
Referer: which page the current page was reached from
Location: used with 3xx status codes to tell the client where to go next
Cookie: stores a small amount of information on the client, usually used to implement sessions
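On the server side, headers like these can be collected into a map. The following is a minimal sketch with a hypothetical function name parseHeaders. It keeps header names exactly as sent, although HTTP header names are case-insensitive.

```cpp
#include <map>
#include <string>

// Parses the header lines between the request line and the blank line.
std::map<std::string, std::string> parseHeaders(const std::string &request)
{
    std::map<std::string, std::string> headers;
    std::size_t pos = request.find("\r\n"); // end of the request line
    if (pos == std::string::npos)
        return headers;
    pos += 2;
    while (pos < request.size())
    {
        std::size_t end = request.find("\r\n", pos);
        if (end == std::string::npos || end == pos) // malformed tail or blank line
            break;
        std::string line = request.substr(pos, end - pos);
        std::size_t colon = line.find(':');
        if (colon != std::string::npos)
        {
            // Skip the spaces after the colon before taking the value.
            std::size_t valueStart = line.find_first_not_of(" \t", colon + 1);
            headers[line.substr(0, colon)] =
                valueStart == std::string::npos ? "" : line.substr(valueStart);
        }
        pos = end + 2;
    }
    return headers;
}
```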

Long connections

As the web pages we demonstrated earlier show, a page actually contains many kinds of elements, such as text, pictures, and audio, and each of these resources has its own path under the web root directory. When we open the homepage, the browser requests not only the homepage itself but also the path of every picture, audio clip, or video that appears on it. In other words, a single web page may cause many client requests to be sent to the server. Current browsers are highly concurrent and multi-threaded, which is how they can show so much on one page at once.

If every request needs its own connection, efficiency inevitably drops: as we learned with the TCP server, each new connection between client and server costs a three-way handshake. So how do we reduce this overhead? By establishing one connection and fetching many resources over it. This is a long connection (also called a persistent connection). For example, our earlier page had pictures and videos. Without long connections, each picture and video needs an HTTP request over a separate connection. With long connections, they can all be fetched over the same connection.

Note: Long connections only work if both the client and the server support them.
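The text does not show how the two sides agree on this. In HTTP/1.x it is done with the Connection header: HTTP/1.1 keeps the connection open unless one side sends Connection: close, while HTTP/1.0 closes it unless Connection: keep-alive is sent. A server-side check could look like this sketch (the function names are made up, and a Connection header listing several tokens is not handled):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-cases an ASCII string so header values can be compared case-insensitively.
static std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// `version` is the version from the request line, `connection` is the value
// of the Connection header, or "" if the header is absent.
bool keepConnectionOpen(const std::string &version, const std::string &connection)
{
    const std::string value = toLower(connection);
    if (value == "close")
        return false;
    if (value == "keep-alive")
        return true;
    return version == "HTTP/1.1"; // persistent by default only in HTTP/1.1
}
```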

Cookies (session persistence)

What is a cookie? Let's start with an example. When we log in to QQ or WeChat, we normally only need to log in once. The next time, barring special circumstances, we get in directly without entering the account and password again. This is session persistence. Next, let's look at the phenomenon:

As shown in the picture above, the homepage loads very slowly the first time we open the video website, but after that first visit it is displayed almost immediately, because the browser has cached the data for us. The same goes for accounts and passwords: the browser records the login information and logs us in automatically the next time we open the site. Note that the HTTP protocol is stateless: the protocol itself does not record anything from one request to the next. Visiting new pages is a routine operation, yet after a page jump the new page would have no way of knowing which user it is dealing with. Session persistence exists so that, once logged in, users can move around the whole website under their own identity.

Cookies come in two kinds, memory-level and file-level. With a memory-level cookie, once we kill the app's process we have to log in again when we reopen it. With a file-level cookie, we are still logged in automatically even after shutting the machine down.

If we want to be asked to log in again, we can find the cookies and delete the login information of a particular website or app, as shown in the figure below:

 Here's a question:

Our browser saves the login information. After that, every time it sends an HTTP request to the website, the server checks whether the account and password supplied by the browser are correct, and if so we are logged in. But suppose one day we visit a malicious website and pick up a virus, as shown below:

Once the machine is infected with a Trojan, the attacker can use it to intercept the requests our browser sends to the server. For example, the browser sends a login request carrying our login information, the request is hijacked, and the attacker obtains our password. The attacker can then send requests with our account and password and log in to our account. How do we solve this? Storing private information such as passwords on the client is not safe, because we cannot stop users from visiting illegal or phishing websites and leaking it. A better approach is to store the user information on the server, as shown below:

When we log in with a browser for the first time, the server sees that this is a first-time login and creates a session file. Since there are many users, each session file needs a unique session id. The server sends the session id to the browser in its response, and the browser saves it. This saved session id is the cookie we talked about earlier. On later requests, the browser sends the saved session id to the server, and the server only needs to check whether the session file corresponding to that id exists. As long as it exists, the user is considered logged in.

With the user information stored on the server, even if a request from the client is hijacked, the attacker only gets the session id, not the user's account and password. The attacker can still log in to our account with the stolen session id, but that is already much better than leaking the password. Preventing a login with a stolen session id requires other strategies as well. For example, when the IP region of our account suddenly changes, the service asks us to confirm that it is really us, and freezes the account if it is not.
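The server-side bookkeeping can be sketched as follows. To keep the example short it uses an in-memory map instead of session files, and the class name SessionStore and its methods are assumptions. A real service would also want a cryptographically secure source for the ids and an expiry time for each session.

```cpp
#include <random>
#include <string>
#include <unordered_map>

class SessionStore
{
public:
    // Called after a successful login: remembers the user, returns a new session id.
    std::string create(const std::string &username)
    {
        std::string id;
        do
        {
            id = randomId();
        } while (sessions_.count(id) != 0); // every session id must be unique
        sessions_[id] = username;
        return id;
    }

    // Returns true and fills `*username` if the session id is known.
    bool lookup(const std::string &id, std::string *username) const
    {
        auto it = sessions_.find(id);
        if (it == sessions_.end())
            return false;
        *username = it->second;
        return true;
    }

private:
    // Produces 32 random hexadecimal characters.
    static std::string randomId()
    {
        static const char hex[] = "0123456789abcdef";
        std::random_device rd;
        std::uniform_int_distribution<int> dist(0, 15);
        std::string id;
        for (int i = 0; i < 32; ++i)
            id += hex[dist(rd)];
        return id;
    }

    std::unordered_map<std::string, std::string> sessions_;
};
```

The id returned by create is what the server sends to the browser, and lookup is the check it performs on every later request.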

Let's demonstrate how to write cookie information:
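A response that sets a cookie might be assembled like this. It is a sketch only: the cookie name sessionid and the function name buildResponseWithCookie are made up for the example.

```cpp
#include <string>

// Builds a 200 response that asks the browser to store the session id as a cookie.
std::string buildResponseWithCookie(const std::string &sessionId, const std::string &body)
{
    std::string response = "HTTP/1.1 200 OK\r\n";
    response += "Content-Type: text/html\r\n";
    response += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    response += "Set-Cookie: sessionid=" + sessionId + "\r\n";
    response += "\r\n";
    response += body;
    return response;
}
```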

We only need to add a Set-Cookie header to the response, followed by the session id. Let's run it and look in the browser:

You can see that our browser now holds the session id we set arbitrarily. Let's see whether the server receives it:

We find that every subsequent HTTP request automatically carries all the cookies that have been set, which helps the server perform authentication.

As we can see, HTTP still has unresolved security issues. In the next article, on HTTPS, we will explain in detail how HTTPS solves the problems that HTTP leaves open.
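On the server, the value of one cookie can be picked out of the Cookie request header as sketched here. The function name cookieValue is hypothetical. The browser sends the cookies as name=value pairs separated by "; ".

```cpp
#include <string>

// Returns the value of cookie `name` from a Cookie header value such as
// "sessionid=abc123; theme=dark", or "" if that cookie is not present.
std::string cookieValue(const std::string &cookieHeader, const std::string &name)
{
    std::size_t pos = 0;
    while (pos < cookieHeader.size())
    {
        std::size_t end = cookieHeader.find(';', pos);
        if (end == std::string::npos)
            end = cookieHeader.size();
        std::string pair = cookieHeader.substr(pos, end - pos);
        std::size_t first = pair.find_first_not_of(' '); // skip the space after ';'
        if (first != std::string::npos)
        {
            pair = pair.substr(first);
            std::size_t eq = pair.find('=');
            if (eq != std::string::npos && pair.substr(0, eq) == name)
                return pair.substr(eq + 1);
        }
        pos = end + 1;
    }
    return "";
}
```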


Summary

HTTP messages are transmitted as plain, unencrypted text.


Origin blog.csdn.net/Sxy_wspsby/article/details/131523029