[Network Programming] Application-Layer Protocols: Learning and Understanding the HTTP Protocol in Depth

Preface: This article explains HTTP, one of the application-layer protocols.

Column: Network Programming

Author: Mr. Maple Leaf (fy)

Mr. Maple Leaf's literary corner, a quote to share:
As the saying goes, once the bow is drawn there is no turning back. There are only three outcomes: the arrow breaks, the arrow falls, or the arrow hits the target.
(Jiang Xiaoying, "Su Dongpo: The Most True Love in the World")

HTTP

1. Introduction to the HTTP Protocol

HTTP (HyperText Transfer Protocol) is a request-response protocol that works at the application layer.

Although we said that an application-layer protocol can be designed by ourselves, experienced engineers have already defined a number of ready-made protocols that we can use directly. HTTP is one of them.

2. Understanding URLs

What we usually call a "web address" is in fact a URL (Uniform Resource Locator).

A URL roughly consists of the following parts:

(1) Protocol scheme name

  • http:// is the protocol scheme name. It tells which protocol to use when making the request. The two schemes seen most often on the web are http and https. This article covers http; https is the secure data transfer variant and will be discussed in the next chapter.

(2) Login information

  • usr:pass is the login authentication information: the user name and password of the user logging in. Most URLs today omit this field.

(3) Server address

  • www.example.jp is the server address, also called the domain name. A domain name stands for an IP address, which identifies a unique host. The domain name is resolved to an IP address by a domain name resolution (DNS) server.

On Linux, the ping command shows the IP address that a domain name resolves to.

(4) Server port number

  • 80 is the server port number. The default port of http is 80, and the default port of https is 443.
  • The port number is usually omitted from the URL, because the mapping between a well-known service and its port is fixed (it is built into the client code), so there is no need to spell out the port when using http.

(5) Hierarchical file paths

  • /dir/index.htm is the path of the resource to be accessed
  • The first / is the web root directory, not the Linux root directory. The web root can be any directory on the Linux machine
  • The purpose of accessing a server is to obtain some resource on it. The domain name and port already locate the server process; what remains is to say where the resource lives, which is the job of the path

http is a protocol for fetching resources from a remote server to the local machine.
Everything we see on the web is a resource: text, audio, pictures, web pages and so on. These resources (files) are stored on some server. HTTP can transfer many types of file resources, not only text, which is why it is called the hypertext transfer protocol rather than the text transfer protocol.

(6) Query string

  • uid=1 is a parameter supplied with the request. Multiple parameters are separated by the & symbol

(7) Fragment identifier

  • ch1 is the fragment identifier, which points to a part within the resource
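
To make the seven parts concrete, here is a minimal sketch that splits a URL string into them. The sample URL in the usage note is assembled from the pieces named above, and the `UrlParts` struct and `splitUrl` function are hypothetical names for illustration, not part of this article's server code. The sketch does no validation and does not cover every form a real URL may take.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical container for the URL parts described above.
struct UrlParts
{
    std::string scheme, userinfo, host, port, path, query, fragment;
};

// Splits "scheme://[userinfo@]host[:port][/path][?query][#fragment]".
// A sketch only: no validation, no IPv6 literals, no percent-decoding.
UrlParts splitUrl(const std::string &url)
{
    UrlParts p;
    std::string rest = url;

    // (7) Fragment: everything after the first '#'.
    std::size_t pos = rest.find('#');
    if (pos != std::string::npos)
    {
        p.fragment = rest.substr(pos + 1);
        rest.erase(pos);
    }
    // (6) Query string: everything after the first '?'.
    pos = rest.find('?');
    if (pos != std::string::npos)
    {
        p.query = rest.substr(pos + 1);
        rest.erase(pos);
    }
    // (1) Scheme: everything before "://".
    pos = rest.find("://");
    if (pos != std::string::npos)
    {
        p.scheme = rest.substr(0, pos);
        rest.erase(0, pos + 3);
    }
    // (5) Path: starts at the first '/' after the server part.
    pos = rest.find('/');
    if (pos != std::string::npos)
    {
        p.path = rest.substr(pos);
        rest.erase(pos);
    }
    // (2) Login information: everything before '@'.
    pos = rest.find('@');
    if (pos != std::string::npos)
    {
        p.userinfo = rest.substr(0, pos);
        rest.erase(0, pos + 1);
    }
    // (4) Port: everything after ':'; what remains is (3) the server address.
    pos = rest.find(':');
    if (pos != std::string::npos)
    {
        p.port = rest.substr(pos + 1);
        rest.erase(pos);
    }
    p.host = rest;
    return p;
}
```

For example, calling `splitUrl("http://usr:pass@www.example.jp:80/dir/index.htm?uid=1#ch1")` yields the scheme `http`, login information `usr:pass`, host `www.example.jp`, port `80`, path `/dir/index.htm`, query `uid=1` and fragment `ch1`.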

3. urlencode and urldecode

In a URL, characters such as / and ? already carry special meanings, so they cannot appear arbitrarily.
If a parameter needs to contain one of these special characters, the character must be escaped first.

The escaping rule is as follows:

Convert each byte of the character to hexadecimal (every 4 bits become one hex digit, so one byte gives two hex digits XY), then put % in front, which encodes the byte as %XY.

For example, when we search for C++ in the browser, everything after wd= is our search parameter (wd is the name of the parameter). The plus sign + is a special character in URLs, and its value in hexadecimal is 0x2B, so each + is encoded as %2B.
Note: Chinese characters and special characters must be converted. This process is called URL encode.
When the server receives the request, it decodes the %XY sequences back into the original characters. This process is called URL decode. When writing the server in C++, we have to do this work ourselves (source code for it is available online and can be used directly).
To verify the decoding, search for an online URL decoding tool and try it.
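
The %XY rule above can be sketched in a few lines of C++. This is a minimal illustration, not the code the article's server uses: `urlEncode` and `urlDecode` are hypothetical helper names, and the choice to leave only letters, digits and `- _ . ~` unescaped is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Percent-encodes every byte except letters, digits and - _ . ~
std::string urlEncode(const std::string &in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in)
    {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~')
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += '%';
            out += hex[c >> 4];   // high 4 bits -> first hex digit
            out += hex[c & 0x0F]; // low 4 bits  -> second hex digit
        }
    }
    return out;
}

// Value of one hex digit, or -1 if ch is not a hex digit.
static int hexValue(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Turns each %XY sequence back into one byte.
// Malformed sequences are copied through unchanged.
std::string urlDecode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '%' && i + 2 < in.size())
        {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out += static_cast<char>(hi * 16 + lo);
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

With these helpers, `urlEncode("C++")` gives `C%2B%2B`, matching the search example above, and `urlDecode` reverses it. Note that HTML form submissions additionally write a space as `+`, which this sketch does not handle.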

4. Format of HTTP Requests and Responses

HTTP is an application-layer service based on requests and responses. As the client, you send a request to the server. After receiving the request, the server analyzes it to find out which resource you want to access and then builds a response, which completes one HTTP request. This request & response way of working is called the cs or bs model, where c stands for client, s for server, and b for browser. The browser is a client of the http protocol, so we do not need to write our own client in order to use http.

4.1 HTTP request protocol format

The HTTP request protocol format is roughly as follows.
An HTTP request consists of four parts:

  1. Request line: [request method] + [url] + [http version] + [\r\n]
  2. Request headers: the attributes of the request, listed one per line in name: value form, each line ending with [\r\n]
  3. Blank line: a blank line (\r\n) marks the end of the request headers
  4. Request body: the body may be empty. If a body is present, the request headers contain a Content-Length attribute that gives the length of the body

Note: http uses the special sequence \r\n to delimit its content.

The first three parts are the header information that comes with the HTTP protocol itself; the last part, the request body, may be omitted (empty string). Once the request is assembled, it is handed to the next layer down, the transport layer, which processes it from there.
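
As a small illustration of the request line format, the sketch below pulls the three fields out of the first line of a raw request. `RequestLine` and `parseRequestLine` are hypothetical names chosen for this example; the article's own parsing code lives in its repository and may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical holder for the three fields of the request line.
struct RequestLine
{
    std::string method, url, version;
};

// Parses the first line of a raw request such as "GET / HTTP/1.1\r\n...".
// Returns false if the \r\n terminator or any of the three fields is missing.
bool parseRequestLine(const std::string &raw, RequestLine &out)
{
    std::size_t end = raw.find("\r\n");
    if (end == std::string::npos)
        return false;

    // The three fields are separated by spaces.
    std::istringstream iss(raw.substr(0, end));
    if (!(iss >> out.method >> out.url >> out.version))
        return false;
    return true;
}
```

The function only looks at the text before the first \r\n, so the headers and body that follow are left untouched for later steps.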

4.2 HTTP response protocol format

The HTTP response protocol format is roughly as follows.
An HTTP response consists of four parts:

  1. Status line: [http version] + [status code] + [status code description] + [\r\n]
  2. Response headers: the attributes of the response, listed one per line in name: value form, each line ending with [\r\n]
  3. Blank line: a blank line (\r\n) marks the end of the response headers
  4. Response body: the body may be empty. If a body is present, the response headers contain a Content-Length attribute that gives the length of the body

Note: http uses the special sequence \r\n to delimit its content.
The first three parts are the header information that comes with the HTTP protocol itself; the last part, the response body, may be omitted (empty string). Once the response is assembled, it is handed to the next layer down, the transport layer, which processes it from there.
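
The four parts of a response can be assembled with plain string concatenation. The following is a minimal sketch under the format described above; `buildResponse` is a hypothetical helper name, and it emits only two headers for brevity.

```cpp
#include <cassert>
#include <string>

// Assembles status line + headers + blank line + body into one string.
std::string buildResponse(int code, const std::string &reason,
                          const std::string &contentType, const std::string &body)
{
    // 1. Status line: version, status code, description, \r\n
    std::string resp = "HTTP/1.1 " + std::to_string(code) + " " + reason + "\r\n";
    // 2. Response headers, one "name: value" per line
    resp += "Content-Type: " + contentType + "\r\n";
    resp += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    // 3. Blank line ends the headers
    resp += "\r\n";
    // 4. Body (may be empty)
    resp += body;
    return resp;
}
```

Because Content-Length is computed from `body.size()`, the header always matches the number of bytes that actually follow the blank line.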

4.3 Questions

How do we make sure that a complete http request or response has been read at the application layer?

  • First, requests and responses can be read line by line (every line ends with \r\n)
  • Use a while loop to read one complete line at a time (splitting on \r\n) until all request or response headers have been read. Reading a blank line means the headers are complete
  • The next step is to read the body. How do we know the body has been read completely? The body contains no special delimiter
  • We have already read the headers in full, and when a body is present the headers contain a Content-Length field, which gives the length of the request or response body
  • Parse Content-Length to get the body length, then read exactly that many bytes. This guarantees that the body is read completely

This ensures that a complete http request or response is read at the application layer.
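
The steps above can be sketched as a single check on a receive buffer: find the blank line, read Content-Length from the headers, and see whether enough body bytes have arrived. `completeMessageLength` is a hypothetical name for this illustration, and the sketch simplifies by matching the header name case-sensitively and ignoring other ways of framing a body.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Returns the length in bytes of the first complete HTTP message in buf,
// or 0 if buf does not yet hold a complete message and more data must be read.
std::size_t completeMessageLength(const std::string &buf)
{
    // 1. The blank line ("\r\n\r\n") marks the end of the headers.
    const std::string blank = "\r\n\r\n";
    std::size_t headerEnd = buf.find(blank);
    if (headerEnd == std::string::npos)
        return 0;
    std::size_t bodyStart = headerEnd + blank.size();

    // 2. Look for Content-Length inside the header section only.
    //    Simplification: the header name is matched case-sensitively.
    const std::string key = "\r\nContent-Length:";
    std::size_t bodyLen = 0;
    std::size_t pos = buf.find(key);
    if (pos != std::string::npos && pos < headerEnd)
        bodyLen = std::strtoul(buf.c_str() + pos + key.size(), nullptr, 10);

    // 3. The message is complete once the whole body has arrived.
    if (buf.size() < bodyStart + bodyLen)
        return 0;
    return bodyStart + bodyLen;
}
```

A server loop could keep appending received bytes to a buffer and call this function after each read; a non-zero result says how many bytes to cut off the front of the buffer as one message.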

How are http requests and responses serialized and deserialized?

  • http implements serialization and deserialization itself, using the special sequence \r\n. For the first line plus the request/response headers, reading line by line up to the blank line yields the whole string
  • The body does not need to be serialized or deserialized by http. If that is needed, define your own format for it

The above is a high-level view of the http protocol. Next we write some code to understand it in practice.

5. HTTP Test Code

5.1 HTTP requests

Let's write a simple TCP server. All this server needs to do is print the HTTP request sent by the browser.

httpServer.hpp

#pragma once

#include <iostream>
#include <string>
#include <functional>
#include <cstdint> // uint16_t
#include <cstdlib> // exit
#include <strings.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include "protocol.hpp"

static const int gbacklog = 5;
using func_t = std::function<bool(const httpRequest &req, httpResponse &resp)>;

// Error codes, used as the process exit status
enum
{
    UAGE_ERR = 1,
    SOCKET_ERR,
    BIND_ERR,
    LISTEN_ERR
};

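// Request handling: read one request, run the callback, send back the response
void handlerHttp(int sockfd, func_t func)
{
    char buffer[4096];
    httpRequest req;
    httpResponse resp;
    // recv returns ssize_t; storing the result in size_t would turn the
    // error value -1 into a huge positive number and pass the n > 0 check.
    // A single recv is enough for this demo but does not guarantee a complete request.
    ssize_t n = recv(sockfd, buffer, sizeof(buffer) - 1, 0);
    if (n > 0)
    {
        buffer[n] = 0;
        req.inbuffer = buffer;
        func(req, resp);
        send(sockfd, resp.outbuffer.c_str(), resp.outbuffer.size(), 0);
    }
}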

class ThreadDate
{
public:
    ThreadDate(int sockfd, func_t func) : _sockfd(sockfd), _func(func)
    {
    }

public:
    int _sockfd;
    func_t _func;
};

class httpServer
{
public:
    httpServer(const uint16_t &port) : _listensock(-1), _port(port)
    {
    }

    // Initialize the server
    void initServer()
    {
        // 1. Create the socket
        _listensock = socket(AF_INET, SOCK_STREAM, 0);
        if (_listensock == -1)
        {
            std::cout << "create socket error" << std::endl;
            exit(SOCKET_ERR);
        }
        std::cout << "create socket success: " << _listensock << std::endl;

        // 2. Bind the port
        // 2.1 Fill in the sockaddr_in structure
        struct sockaddr_in local;
        bzero(&local, sizeof(local));       // zero the whole sockaddr_in structure
        local.sin_family = AF_INET;         // IPv4 network communication
        local.sin_port = htons(_port);      // host byte order -> network byte order
        local.sin_addr.s_addr = INADDR_ANY; // INADDR_ANY is 0x00000000

        // 2.2 Bind
        int n = bind(_listensock, (struct sockaddr *)&local, sizeof(local)); // the cast to (struct sockaddr *) is required
        if (n == -1)
        {
            std::cout << "bind socket error" << std::endl;
            exit(BIND_ERR);
        }
        std::cout << "bind socket success" << std::endl;

        // 3. Put the _listensock socket into the listening state
        if (listen(_listensock, gbacklog) == -1)
        {
            std::cout << "listen socket error" << std::endl;
            exit(LISTEN_ERR);
        }
        std::cout << "listen socket success" << std::endl;
    }

    // Start the server
    void start(func_t func)
    {
        for (;;)
        {
            // 4. Accept a new connection from the _listensock socket
            struct sockaddr_in peer;
            socklen_t len = sizeof(peer);
            // The sockfd returned here is the one that actually serves the client
            int sockfd = accept(_listensock, (struct sockaddr *)&peer, &len);
            if (sockfd < 0) // accept failed, but this does not stop the server
            {
                std::cout << "accept error, next!" << std::endl;
                continue;
            }
            std::cout << "accept a new link success, sockfd: " << sockfd << std::endl;

            // 5. Serve the client through sockfd
            // Multithreaded version
            pthread_t tid;
            ThreadDate *td = new ThreadDate(sockfd, func);
            pthread_create(&tid, nullptr, threadRoutine, td);
        }
    }

    static void *threadRoutine(void *args)
    {
        pthread_detach(pthread_self()); // detach the thread so nobody has to join it
        ThreadDate *td = static_cast<ThreadDate *>(args);
        handlerHttp(td->_sockfd, td->_func); // handle the request
        close(td->_sockfd);                  // must be closed, and by the new thread
        delete td;
        return nullptr;
    }

    ~httpServer()
    {
    }

private:
    int _listensock; // listening socket: not used for data transfer, only for accepting connections
    uint16_t _port;  // port number
};

httpServer.cc

#include "httpServer.hpp"
#include <memory>
#include <cstdlib> // atoi, exit

// Usage
// ./httpServer port
static void Uage(std::string proc)
{
    std::cout << "\nUsage:\n\t" << proc << " local_port\n\n";
}

bool get(const httpRequest &req, httpResponse &resp)
{
    std::cout << "----------------------http start----------------------" << std::endl;
    std::cout << req.inbuffer;
    std::cout << "----------------------http end  ----------------------" << std::endl;
    return true; // a function declared bool must return a value
}

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        Uage(argv[0]);
        exit(UAGE_ERR);
    }

    uint16_t port = atoi(argv[1]); // string to int
    std::unique_ptr<httpServer> tsvr(new httpServer(port));
    tsvr->initServer(); // initialize the server
    tsvr->start(get);   // start the server

    return 0;
}

protocol.hpp

#pragma once

#include <iostream>
#include <string>
#include <vector>

class httpRequest
{
public:
    std::string inbuffer;
};

class httpResponse
{
public:
    std::string outbuffer;
};

Run the server program and then access it with a browser. The server receives the HTTP request from the browser and prints it.
Since the code does not build any response yet, the browser has no page content to display.
The server receives the HTTP requests sent by the browser and prints them (although we visited only once, several HTTP requests arrive; this is the browser's own behavior).

GET / HTTP/1.1
Host: 119.3.185.15:8080
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.67
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6

Explanation:

  • Since the browser uses the HTTP protocol by default when it sends a request, we can type just the server's public IP address and port number into the address bar, without specifying the protocol
  • The first line is the request line: GET / HTTP/1.1. GET is the request method, which is the browser's default. The URL is /: because we did not ask for a specific resource, the browser requests / (the web root directory) by default. HTTP/1.1 is the HTTP version

The remaining lines are all request headers: request attributes written one per line in name: value form.
A blank line is printed as well. Since there is no request body (it defaults to an empty string), nothing is printed after it.
Client host and version information:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.67

User-Agent shows the operating system and browser version of the client that sent the request.
For example, when we search for some software to download, the site offers by default the build that matches our operating system. How does it know that we want the desktop version?
The reason is that the request we send already carries the version information of our operating system.
The remaining headers tell the server what the client currently supports, such as encodings and content types.

A few more questions

How are the HTTP headers separated from the payload?

  • In HTTP, the request/status line and the request/response headers make up the HTTP header information, and the request/response body is the HTTP payload
  • Reading a blank line means the headers have been read in full. The blank line is the key to separating the HTTP headers from the payload
  • In other words, http uses a special sequence to separate headers and payload

Why do HTTP peers exchange version information?

  • The request line of an HTTP request and the status line of an HTTP response both contain an http version. The request is sent by the client, so it states the client's http version; the response is sent by the server, so it states the server's http version
  • When client and server communicate, they exchange their http versions, mainly for compatibility. Client and server may use different http versions, and for clients of different versions to each get the matching service, the two sides need to negotiate the version
  • For example, an application is upgraded today from version 1.0 to 2.0 (with new features the old version lacks). Some users upgrade and some do not, so version differences appear. An old client must not be served by the new version of the service; it has to be routed to the old one. This requires exchanging version information, so that clients of different versions each get the matching service
  • Therefore, to ensure good compatibility, both sides need to exchange their version information

5.2 HTTP response

With a little more code, we can observe the HTTP response.

// Needs <sys/stat.h> for stat(); Util and html_404 are defined in the full source (see the link after this listing)
bool get(const httpRequest &req, httpResponse &resp)
{
    std::cout << "----------------------http request start----------------------" << std::endl;
    std::cout << req.inbuffer;
    std::cout << "+++++++++++++++++++++++++++++" << std::endl;
    std::cout << "request method: " << req.method << std::endl;
    std::cout << "request url: " << req.url << std::endl;
    std::cout << "request httpversion: " << req.httpversion << std::endl;
    std::cout << "request path: " << req.path << std::endl;
    std::cout << "request file suffix: " << req.suffix << std::endl;
    std::cout << "request body size: " << req.size << " bytes" << std::endl;
    std::cout << "----------------------http request end  ----------------------" << std::endl;

    std::cout << "----------------------http response start ----------------------" << std::endl;
    std::string respline = "HTTP/1.1 200 OK\r\n"; // response status line
    std::string respheader = Util::suffixToDesc(req.suffix);
    std::string respblank = "\r\n"; // blank line
    std::string respbody;           // response body
    respbody.resize(req.size);
    if (!Util::readFile(req.path, &respbody[0], req.size)) // requested resource does not exist: serve the 404 page
    {
        respline = "HTTP/1.1 404 Not Found\r\n"; // report the miss in the status line as well
        struct stat st;
        stat(html_404.c_str(), &st);
        respbody.resize(st.st_size);
        Util::readFile(html_404, &respbody[0], st.st_size); // assumed to always succeed
    }

    resp.outbuffer = respline;
    respheader += "Content-Length: ";
    respheader += std::to_string(respbody.size());
    respheader += respblank;
    resp.outbuffer += respheader;
    resp.outbuffer += respblank;

    std::cout << resp.outbuffer;

    resp.outbuffer += respbody;
    std::cout << "----------------------http response end   ----------------------" << std::endl;

    return true;
}

The rest of the code is too long to post here; gitee link: link

Running result: the server sends back a response (when the browser accesses our server, the server responds with the index.html file; index.html is by default the home page of the visited website).
The server also prints part of the response information.
Note: This is only an example. When building the HTTP response, only two attributes are added to the response headers; a real HTTP response carries many more.
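
The listing above calls `Util::suffixToDesc`, whose source is in the linked repository and is not shown here. Purely as an assumption about what such a helper might do, here is a hypothetical sketch that maps a file suffix to a Content-Type header line. The function name `suffixToContentType`, the table entries and the fallback type are choices made for this example, not the author's implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper: returns a complete "Content-Type: ...\r\n" header line
// for a file suffix such as ".html". Unknown suffixes fall back to a generic
// binary type.
std::string suffixToContentType(const std::string &suffix)
{
    static const std::unordered_map<std::string, std::string> table = {
        {".html", "text/html"},
        {".htm", "text/html"},
        {".css", "text/css"},
        {".js", "application/javascript"},
        {".jpg", "image/jpeg"},
        {".png", "image/png"},
    };
    auto it = table.find(suffix);
    const std::string type = (it != table.end()) ? it->second : "application/octet-stream";
    return "Content-Type: " + type + "\r\n";
}
```

Returning the whole header line, including the trailing \r\n, lets the caller append the next header directly, which is how `respheader` is extended with Content-Length in the listing above.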

6. HTTP Methods

HTTP defines a number of methods that can be used in a request.

The most commonly used are the GET method and the POST method

When the front end and back end exchange data, what essentially happens is that the front end submits a form, and the browser automatically converts the form content into a GET or POST request.
As an example, take a front-end page with a form.
action="/a/test.py" means the form is submitted to the given path, and method="GET" means the http request method is GET.
Start the server and visit it with the browser.
Submit some content, for example Zhang San and 123123.
Because the requested page /a/test.py does not exist, the 404 page (which we set up ourselves) is displayed.
Now look at the request information printed by the server.

When parameters are submitted with the GET method, they are appended to the end of the URL

In /a/test.py?xxxname=%E5%BC%A0%E4%B8%89&yyypwd=123123, the part before ? is the resource we request, and the part after it, xxxname=%E5%BC%A0%E4%B8%89&yyypwd=123123, is the information submitted by the form. The submitted content is also visible in the browser's address bar.
Next let's try the POST method by changing the method in the HTML form.
Visit the page with the browser and submit the form. This time the submitted content does not appear in the address bar, but the resource we access still does.
Look at the request information printed by the server.

With the POST method, the submitted form parameters are placed in the body of the http request

Summary: differences between the GET and POST request methods

  • GET passes parameters through the URL, for example: http://ip:port/xxx/yyy?name=value&name2=value2...
  • POST passes parameters through the http request body
  • Because POST parameters travel in the request body, they are generally not visible to the user, which gives better privacy
  • GET parameters are part of the URL and can be seen by anyone
  • Because GET passes parameters in the URL, the parameters cannot be very large, whereas POST passes them in the body, which can be very large
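
Whichever method is used, the server ends up with a string of the form name=value&name2=value2, taken either from the URL after ? (GET) or from the request body (POST, for an ordinary HTML form). A minimal sketch of splitting such a string is shown below; `queryOf` and `parseParams` are hypothetical names, and the values are left percent-encoded, so a real server would still URL-decode them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Returns the part of a URL after the first '?', or an empty string.
std::string queryOf(const std::string &url)
{
    std::size_t pos = url.find('?');
    return pos == std::string::npos ? std::string() : url.substr(pos + 1);
}

// Splits "name=value&name2=value2" into a map. Values stay percent-encoded.
std::map<std::string, std::string> parseParams(const std::string &s)
{
    std::map<std::string, std::string> params;
    std::size_t start = 0;
    while (start < s.size())
    {
        // Each pair ends at the next '&' or at the end of the string.
        std::size_t amp = s.find('&', start);
        if (amp == std::string::npos)
            amp = s.size();
        std::string pair = s.substr(start, amp - start);

        // Inside a pair, '=' separates the name from the value.
        std::size_t eq = pair.find('=');
        if (eq != std::string::npos)
            params[pair.substr(0, eq)] = pair.substr(eq + 1);
        else if (!pair.empty())
            params[pair] = "";
        start = amp + 1;
    }
    return params;
}
```

For the GET example above, `parseParams(queryOf(url))` yields the two entries xxxname and yyypwd; for a POST form, `parseParams` can be applied to the request body directly.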

Note: privacy != security. Plain HTTP is not secure with either method; the traffic can be captured directly by others.

7. HTTP Status Codes

HTTP status codes are grouped by their first digit: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error) and 5xx (server error).
Note: 1xx means the status codes that start with 1. A status code has three digits, for example 404.
The most common status codes are 200 (OK), 404 (Not Found), 403 (Forbidden), 302 (Found, a redirect) and 502 (Bad Gateway).

Let's look at redirection (the 3xx status codes)

Redirection means steering a network request to another location. The server here effectively provides a guiding service.
The redirection itself is carried out by the client; the server only tells the client where to go.
Redirection can be temporary or permanent. Status code 301 indicates a permanent redirect, while status codes 302 and 307 indicate temporary redirects.

Moved Permanently (permanent redirect)

  • Permanent means the originally accessed resource has been moved for good, and the client should from now on access it through the new URI

Temporary Redirect

  • Temporary means the resource should for now be accessed through the URI given in Location, but the old address remains valid, and the next visit may not be redirected at all
  • 302 redirects can be abused for URL hijacking: for example, search results still show URL A, but the page content actually served is the content of your URL B

For more on redirection, see the linked article: Redirection

Here's a demonstration of a temporary redirect

  • The Location field is an attribute in the HTTP headers that gives the target address to redirect to

Change the status code in the HTTP response to 307 and follow it with the matching status code description. In addition, add a Location field to the HTTP response headers, followed by the page to redirect to; here it is set to my CSDN home page.
Now when the browser accesses our server, it immediately jumps to the CSDN home page.
The server prints the response it sent.
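
The change described above amounts to a different status line plus one extra header. A minimal sketch follows; `buildRedirect` is a hypothetical helper name, and the target in the usage note is a placeholder, since the author's actual home page URL is not given in the text.

```cpp
#include <cassert>
#include <string>

// Builds a 307 response that tells the client to go to `location`.
std::string buildRedirect(const std::string &location)
{
    std::string resp = "HTTP/1.1 307 Temporary Redirect\r\n"; // status line
    resp += "Location: " + location + "\r\n";                 // where to go next
    resp += "Content-Length: 0\r\n";                          // no body is needed
    resp += "\r\n";                                           // blank line ends the headers
    return resp;
}
```

Sending `buildRedirect("https://www.example.com/")` as the response makes the browser issue a second request to that address on its own, which matches the point made earlier that the client carries out the redirection.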

8. Common HTTP Headers

Common HTTP headers are as follows:

  • Content-Type: the data type of the body (text/html, etc.)
  • Content-Length: the length of the body
  • Host: the client tells the server which host and port the requested resource is on
  • User-Agent: declares the user's operating system and browser version
  • Referer: which page the current page was reached from
  • Location: used together with a 3xx status code to tell the client where to go next
  • Cookie: stores a small amount of information on the client side, usually to implement sessions
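
To look up headers like these on the server, the name: value lines between the first line and the blank line can be collected into a map. The sketch below is one possible way to do it; `parseHeaders` is a hypothetical name, and storing the names in lower case is a design choice of this example, made because HTTP header names are case-insensitive.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Collects the "name: value" lines of a raw request or response into a map.
// Names are stored in lower case; parsing stops at the blank line.
std::map<std::string, std::string> parseHeaders(const std::string &raw)
{
    std::map<std::string, std::string> headers;

    // Skip the request line / status line.
    std::size_t pos = raw.find("\r\n");
    if (pos == std::string::npos)
        return headers;
    pos += 2;

    while (pos < raw.size())
    {
        std::size_t end = raw.find("\r\n", pos);
        if (end == std::string::npos || end == pos)
            break; // truncated input, or the blank line that ends the headers

        std::string line = raw.substr(pos, end - pos);
        std::size_t colon = line.find(':'); // the first ':' separates name and value
        if (colon != std::string::npos)
        {
            std::string name = line.substr(0, colon);
            std::transform(name.begin(), name.end(), name.begin(),
                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
            // Skip spaces and tabs between ':' and the value.
            std::size_t v = line.find_first_not_of(" \t", colon + 1);
            headers[name] = (v == std::string::npos) ? "" : line.substr(v);
        }
        pos = end + 2;
    }
    return headers;
}
```

Only the first colon on a line is treated as the separator, so a value such as `119.3.185.15:8080` in the Host header is kept intact.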

Host

The Host field gives the IP and port of the service that the client wants to access. For example, when the browser accesses our server, the Host field in the HTTP request it sends is filled with our IP and port.

User-Agent

As mentioned earlier, User-Agent carries the operating system and browser version of the client.

Referer

Referer records which page you arrived from. One benefit of recording the previous page is that it makes going back easy; another is that it shows how the current page relates to the previous one.

Keep-Alive (long connection)

  • Keep-Alive, also known as a long (persistent) connection, is a technique in the HTTP protocol for keeping the connection between client and server open, which reduces the latency and resource cost of each request

In traditional HTTP, each time the client sends a request, the server returns the response and then closes the connection. Such a connection is called a short connection. With a long connection, once the connection between client and server is established, multiple requests can be sent and multiple responses received over it.
The advantages of a long connection include:

  • Less connection setup and teardown overhead: with short connections every request has to establish and close a connection, while a long connection reuses the established one
  • Lower latency: with short connections every request waits for a new connection to be established, while a long connection avoids this delay and improves response speed
  • Lower resource consumption: with short connections every request re-establishes the connection, while a long connection reduces the load on server resources

Note: a long connection is not permanent. Either the server or the client may close it.
If the Connection field in the HTTP request or response headers has the value keep-alive, the sender supports long connections.
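
A server that wants to honor this field has to decide, after each response, whether to keep the socket open. The sketch below shows one way to make that decision. `wantsKeepAlive` is a hypothetical name, and the fallback used when the header is absent (keep the connection for HTTP/1.1, close it otherwise) is stated here as an assumption of the sketch rather than something taken from this article.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Decides whether the connection should stay open after the response.
// version:    the http version from the request line, e.g. "HTTP/1.1"
// connection: the value of the Connection header, or "" if it is absent
bool wantsKeepAlive(const std::string &version, std::string connection)
{
    // Compare the header value case-insensitively.
    std::transform(connection.begin(), connection.end(), connection.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (connection.find("close") != std::string::npos)
        return false; // the peer asked to close the connection
    if (connection.find("keep-alive") != std::string::npos)
        return true;  // the peer asked for a long connection
    // Header absent: assume HTTP/1.1 keeps the connection, older versions do not.
    return version == "HTTP/1.1";
}
```

In the demo server above, every connection is closed after one response, so supporting long connections would mean looping in the handler for as long as this function returns true.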

9. Cookie and Session

HTTP is a stateless protocol: each request/response pair is independent of the others. Yet that is not what you observe when using a browser.
For example, after logging in once to a website such as bilibili, the login state persists for a long time. Close the bilibili page and open it again, and the account is still logged in with no need to log in again. The same holds even after closing the browser.

This is achieved with cookie and session, and it is called session persistence.

Note: strictly speaking, session persistence is not a native feature of http. The need for it was discovered later through real-world use.
The http protocol is stateless, but users need state. While browsing, a user keeps moving to new pages. If every page jump left the new page unable to tell which user it is, the user would have to log in again each time, which is clearly unacceptable. So that a user who logs in once can browse the whole site under their own identity, session persistence is required.

Session persistence (the old way)

  • When the user visits the website, the site prompts the user to log in. After login, the client browser saves the user's account and password. From then on, whenever the user visits the same site, the browser automatically sends the saved information for authentication
  • This technique of the browser saving the account and password is called a cookie

cookie

  • A Cookie is a small text file stored in the user's browser. It holds the user's authentication information, personal settings and so on. When a user visits a website, the server stores some information in a Cookie, and the browser sends this Cookie back to the server in later requests
  • A cookie can be saved in two ways: in a file or in memory
  • If, after closing the browser and opening it again, a site you had logged in to asks for the account and password again, the cookie saved at login was stored at the memory level
  • If, after closing the browser or even restarting the computer, the site does not ask for the account and password again, the cookie saved at login was stored at the file level
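
On the wire, the server hands a cookie to the browser with a Set-Cookie response header. The sketch below builds such a header line; `setCookieHeader` is a hypothetical helper name. The link to the two storage levels above is drawn here as an assumption about typical browser behavior: a cookie without a lifetime usually lives only until the browser closes (memory level), while one with a Max-Age attribute is usually kept across restarts (file level).

```cpp
#include <cassert>
#include <string>

// Builds a Set-Cookie header line for a response.
// maxAgeSeconds <= 0: no lifetime attribute is written (memory-level cookie).
// maxAgeSeconds  > 0: a Max-Age attribute is added (file-level cookie).
std::string setCookieHeader(const std::string &name, const std::string &value,
                            long maxAgeSeconds)
{
    std::string header = "Set-Cookie: " + name + "=" + value;
    if (maxAgeSeconds > 0)
        header += "; Max-Age=" + std::to_string(maxAgeSeconds);
    header += "\r\n";
    return header;
}
```

The returned line can be appended to the response headers in the same way as Content-Length in the earlier listing; the browser then sends the pair back in a Cookie request header on later visits.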

Cookies can be managed in the browser. If all of them are deleted, every website requires logging in again.
After logging in to a website, we can also view the cookies stored for that site.
As a test, delete the site's cookies: after deletion the user is no longer logged in and has to log in again.

Problems with using cookies

Under normal circumstances there is no problem. However, if an unsafe operation by the user leaves the machine infected with a virus, worm, Trojan horse, etc., the user's cookies can be leaked.

  • Worms: aim to attack the user's host directly (mainly CPU, memory, etc.), exhausting system resources
  • Trojan horses: named after the Trojan horse of ancient legend, which hid enemy soldiers who came out at night to attack. A Trojan does not aim to damage the computer. It hides inside a seemingly normal program and works with the attacker from the inside, with the goal of stealing user information and remotely controlling the computer rather than maliciously attacking the user's host

Once a cookie is obtained by someone with malicious intent, the attacker can access the server directly from their own browser, and the server will mistake the attacker for the legitimate user (which is very harmful).
Solution: session

session

  • A session is a server-side storage technique for keeping user session information.
  • When the user visits the website for the first time, the server creates a unique Session ID for the user, stores this ID in a Cookie, and sends it to the browser. The browser automatically sends the Session ID to the server in subsequent requests, and the server uses the Session ID to find the corresponding session information
  • The session is stored on the server side. Each user has one session file, and the session ID (a string) is unique on the server
  • The client browser does not need to store the user's account and password. It only stores the session ID, that is, the session ID is put in the cookie

When we log in to a website for the first time and enter the account and password, the server generates a corresponding Session ID once authentication succeeds.
The generated Session ID is returned to the browser in the response. After receiving the response, the browser automatically extracts the Session ID and saves it in its cookie file. On later visits, the Session ID is carried automatically in each HTTP request.
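
The login flow just described can be sketched as follows. This is a minimal in-memory illustration, not the article's server code: the `ACCOUNTS` table, the `SESSIONS` dictionary, the cookie name `session_id`, and both function names are assumptions made for the example.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional

ACCOUNTS = {"alice": "secret"}   # hypothetical user database
SESSIONS: Dict[str, dict] = {}   # session ID -> session data, kept on the server


def login(username: str, password: str) -> Optional[str]:
    """Authenticate and return a Set-Cookie header value, or None on failure."""
    if ACCOUNTS.get(username) != password:
        return None
    session_id = secrets.token_hex(16)  # unique, hard-to-guess string
    SESSIONS[session_id] = {"user": username}
    return f"session_id={session_id}"


def current_user(cookie_header: str) -> Optional[str]:
    """Find the user from the Cookie header that the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    session = SESSIONS.get(morsel.value)
    return session["user"] if session else None


# The browser stores the Set-Cookie value and echoes it back as Cookie.
set_cookie = login("alice", "secret")
print(current_user(set_cookie))
print(current_user("session_id=forged"))
```

Note that the account and password never leave the server after login; only the random ID travels in the cookie.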

  • With this approach the risk of leaking user information is greatly reduced, but problems remain
  • If an attacker steals the cookie that holds the user's session ID, the attacker can access the server as that user, and the server will mistake the illegitimate user for the normal one. This cannot be fully solved
  • The server can apply some policy, for example based on the client's IP, to invalidate the session ID. Then only the person who knows the password can log in again and obtain a new session ID, which mitigates session ID theft to some extent (it cannot be eliminated)
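
One possible form of the IP-based policy mentioned above is sketched here. It assumes the server records the client IP when the session is created and drops the session when the same ID arrives from a different address; the function names and the sample addresses are hypothetical, and a real site would likely combine several signals.

```python
import secrets
from typing import Dict, Optional

SESSIONS: Dict[str, dict] = {}  # session ID -> {"user": ..., "ip": ...}


def create_session(user: str, client_ip: str) -> str:
    """Create a session after a successful password login."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"user": user, "ip": client_ip}
    return session_id


def check_session(session_id: str, client_ip: str) -> Optional[str]:
    """Return the user name, or None if the session is missing or invalidated."""
    session = SESSIONS.get(session_id)
    if session is None:
        return None
    if session["ip"] != client_ip:
        # Same ID from a different address: invalidate it, so whoever
        # holds the ID must log in again with the password.
        del SESSIONS[session_id]
        return None
    return session["user"]


sid = create_session("alice", "203.0.113.10")
legit = check_session(sid, "203.0.113.10")       # original client
stolen = check_session(sid, "198.51.100.7")      # ID replayed from elsewhere
after = check_session(sid, "203.0.113.10")       # ID no longer valid for anyone
print(legit, stolen, after)
```

After the mismatch, the legitimate user is also logged out, but can recover by entering the password, which the attacker does not have.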

security is relative

  • Although this approach does not truly solve the security problem, it is relatively safe. There is no such thing as absolute security on the Internet. All security is relative. Even if you encrypt the information you send over the network, it may still be cracked by others
  • There is a rule of thumb in the security field: if the cost of cracking a piece of information is far greater than the benefit gained from cracking it (so that doing so is a loss), the information can be considered safe

Below we verify that the client carries the cookie information

  • When the browser accesses our server, if the HTTP response from the server contains a Set-Cookie field, the browser will carry this cookie information when it accesses the server again

Make a simple modification to the code above. Since there is a lot of code, it is not pasted here; link: Code
Add a Set-Cookie field to the server's response header to see whether the browser carries this cookie when it initiates its second HTTP request.
After starting the server, access it with a browser. The cookie value we set is 1234567asdf, and at this point this cookie has been written into the browser.
The client's second request already carries the cookie information.
After that, every HTTP request automatically carries all the cookies that have been set, helping the server authenticate the user. This is how HTTP session persistence works.
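
The experiment can also be reproduced with a small self-contained sketch. It is not the article's server code: it uses Python's standard `http.server`, the cookie name `id` is an assumption (only the value 1234567asdf comes from the text), and a tiny `fetch` helper plays the browser's role by echoing the received `Set-Cookie` value back in a `Cookie` header.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        received = self.headers.get("Cookie")  # None on the first visit
        body = f"Cookie received: {received}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        # The name "id" is assumed; the value is the one used in the text.
        self.send_header("Set-Cookie", "id=1234567asdf")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(port, cookie=None):
    """Send one GET request, optionally carrying a Cookie header."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    headers = {"Cookie": cookie} if cookie else {}
    conn.request("GET", "/", headers=headers)
    resp = conn.getresponse()
    text = resp.read().decode()
    set_cookie = resp.getheader("Set-Cookie")
    conn.close()
    return text, set_cookie


server = HTTPServer(("127.0.0.1", 0), CookieHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

first_body, set_cookie = fetch(port)                # first visit: no cookie yet
second_body, _ = fetch(port, cookie=set_cookie)     # second visit: cookie echoed
server.shutdown()
server.server_close()

print(first_body.strip())
print(second_body.strip())
```

A real browser performs the echo step on its own, which is why the second request in the experiment already carries the cookie.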

Tool recommendation:

Postman: an HTTP debugging tool that simulates browser behavior
Fiddler: a packet capture tool for HTTP
--------------------- END ---------------------

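「 Author 」 Mr. Maple Leaf
「 Updated 」 2023.7.11
「 Statement 」 My knowledge is limited, so omissions in this article are hard to avoid.
          If anything is wrong or inaccurate, readers' criticism and corrections are welcome.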

Origin blog.csdn.net/m0_64280701/article/details/131620304