【Computer network】HTTP/HTTPS

HTTP network protocol

Although application-layer protocols can be defined by us programmers, experienced engineers have already defined some ready-made and very useful application-layer protocols that we can use directly. HTTP (HyperText Transfer Protocol) is one of them.

Understanding network protocols

A protocol is an "agreement". The socket API reads and writes data as a stream of bytes (a "string"). What if we want to transmit "structured data"?

One option is to store the structured data in a struct that both sides of the communication know. The sender writes the struct, and the receiver reads the bytes back into the same struct type, so structured data is transmitted. How the struct is defined and which members it contains is agreed on by the client and the server before they communicate. Communication only works if both sides know this agreement and are willing to follow it.

A simple online calculator

See Gitee (码云, "Code Cloud") for the complete code.

These two structures form a contract (a protocol) that we defined ourselves, and both the client and the server we write must follow it. This is called a custom protocol.

The online calculator we wrote in client-server (CS) mode is essentially an application-layer network service. We wrote the basic communication code ourselves. Serialization and deserialization (discussed later) are handled by a third-party component. We agreed on the request and result formats ourselves, and we also wrote the business logic ourselves.

This simple calculator gives us a first understanding of the upper three layers of the OSI seven-layer model (session, presentation and application).


// Request struct
typedef struct request{
    int x;      // operand 1
    int y;      // operand 2
    char op;    // operator
}request_t;

// Response struct (response format)
typedef struct response{
    int sign;   // status flag: whether the result is valid (0 = OK)
    int result; // result
}response_t;

Part of the server communication code

// Server
// 1. Read the request
  request_t req;                              // struct that receives the request
  memset(&req, 0, sizeof(req));
  ssize_t s = read(sock, &req, sizeof(req));  // copy the bytes from the network straight into the struct
  std::cout << "get a request, read " << s << " bytes (expected " << sizeof(req) << ")" << std::endl;
  if (s == (ssize_t)sizeof(req)){
    // The complete request was read
    std::cout << req.x << " " << req.op << " " << req.y << std::endl;
    // 2. Parse the request and build the response from it
    std::cout << "parse request" << std::endl;
    response_t resp{0, 0};
    switch (req.op){
      case '+':
        resp.result = req.x + req.y;
        break;
      case '/':
        if (req.y == 0) resp.sign = 1;        // division by zero
        else resp.result = req.x / req.y;
        break;
      default:
        resp.sign = 3;                        // invalid operator
        break;
    }
    // 3. Send the response
    std::cout << "return response" << std::endl;
    write(sock, &resp, sizeof(resp));         // send the response struct to the client
  }

Part of client communication code

  // Client
  request_t req;                        // fill the struct from standard input (the user)
  std::cout << "Please Enter Data One# ";
  std::cin >> req.x;
  std::cout << "Please Enter Data Two# ";
  std::cin >> req.y;
  std::cout << "Please Enter Operator# ";
  std::cin >> req.op;

  write(sock, &req, sizeof(req));       // send the struct to the server
  response_t resp;                      // struct that receives the server's response
  ssize_t s = read(sock, &resp, sizeof(resp));  // read the response and print the result
  if (s == (ssize_t)sizeof(resp)){
    if (resp.sign == 1){
      std::cout << "division by zero error" << std::endl;
    }
    else if (resp.sign == 3){
      std::cout << "invalid operator" << std::endl;
    }
    else {
      std::cout << "result = " << resp.result << std::endl;
    }
  }

With the code above we did transmit structured data over the network using a struct-based protocol of our own, but this approach has obvious drawbacks. First, the client and the server must use exactly the same memory layout (alignment and padding) for the struct. Second, once the server is updated and the struct is modified, all older clients stop working: the two struct layouts no longer match, so the data cannot be read back as it was sent. Third, in many scenarios the size of the data is not fixed. In a WeChat chat, for example, how do we know how long a message will be, and how much space should the message struct reserve? If it is too large, network bandwidth is wasted; if it is too small, longer messages get truncated or garbled.

To solve these problems, programmers came up with serialization and deserialization.

Serialization and deserialization

Before a host sends data onto the network, it serializes the data, converting the structured data into a string (byte stream). After a host reads data from the network, it deserializes that string back into structured data.
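To see what serialization means before a library is involved, here is a minimal hand-rolled sketch. The text format "x op y\n" and the helper names `SerializeRequestText` / `DeserializeRequestText` are hypothetical choices made for this illustration. They are not the format the calculator actually uses.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Same layout as the calculator's request struct.
typedef struct request {
    int x;
    int y;
    char op;
} request_t;

// Serialize: turn the struct into a self-describing text line, e.g. "10 + 20\n".
// Unlike raw struct bytes, this text does not depend on padding or alignment.
std::string SerializeRequestText(const request_t &req) {
    return std::to_string(req.x) + " " + req.op + " " + std::to_string(req.y) + "\n";
}

// Deserialize: parse the text back into the struct; returns false on malformed input.
bool DeserializeRequestText(const std::string &text, request_t &out) {
    std::istringstream in(text);
    return static_cast<bool>(in >> out.x >> out.op >> out.y);
}
```

Because both ends only agree on the text format, a client and a server compiled with different struct layouts can still talk to each other. JSON, introduced next, does the same job with a standard format.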

JSON is a data format commonly used for serialization and deserialization in everyday development. In C++ we can use the jsoncpp library; when compiling, link the program with -ljsoncpp.

sudo yum install -y jsoncpp-devel   # install the jsoncpp development package

Transferring data with JSON

JSON serialization

#include <iostream>
#include <string>
#include <jsoncpp/json/json.h>

typedef struct request{
    int x;
    int y;
    char op;
}request_t;

int main(){
    request_t req{10, 20, '*'};
    // Serialization
    Json::Value root;            // can hold any value; JSON is a key-value (kv) serialization scheme
    root["datax"] = req.x;
    root["datay"] = req.y;
    root["operator"] = req.op;   // a char is stored as its integer code ('*' is 42)

    // Json::StyledWriter writer;   // indented, multi-line output
    Json::FastWriter writer;        // compact, single-line output
    std::string json_string = writer.write(root);
    std::cout << json_string << std::endl;
    return 0;
}

json_string built by a Json::FastWriter object (compact, one line):

{"datax":10,"datay":20,"operator":42}

json_string built by a Json::StyledWriter object (indented, multi-line):

{
   "datax" : 10,
   "datay" : 20,
   "operator" : 42
}

[clx@VM-20-6-centos JsonTest]$ ldd a.out
        linux-vdso.so.1 =>  (0x00007fffddfee000)
        /$LIB/libonion.so => /lib64/libonion.so (0x00007f80236f2000)
        libjsoncpp.so.0 => /lib64/libjsoncpp.so.0 (0x00007f80233a2000) // this is the third-party component, i.e. a shared library
        libstdc++.so.6 => /home/clx/.VimForCpp/vim/bundle/YCM.so/el7.x86_64/libstdc++.so.6 (0x00007f8023021000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f8022d1f000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8022b09000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f802273b000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f8022537000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f80235d9000)

JSON deserialization

int main(){
    // Deserialization
    // R"(...)" is a raw string literal: the characters inside are not treated as escape sequences
    std::string json_string = R"({"datax":10, "datay":20, "operator":42})";
    Json::Reader reader;   // Json::Reader parses the serialized data into the all-purpose Json::Value root
    Json::Value root;
    if (!reader.parse(json_string, root)){
        std::cerr << "parse error" << std::endl;
        return 1;
    }
    request_t req;
    req.x = root["datax"].asInt();     // look up the value by key, then convert it with asInt()
    req.y = root["datay"].asInt();
    req.op = root["operator"].asUInt();
    std::cout << req.x << " " << req.op << " " << req.y << std::endl;   // prints: 10 * 20
    return 0;
}

Optimizing the calculator with JSON

1. Serialize and deserialize the request and response structures with JSON

std::string SerializeRequest(const request_t &req){
    Json::Value root;
    root["datax"] = req.x;
    root["datay"] = req.y;
    root["operator"] = req.op;

    Json::FastWriter writer;
    std::string json_string = writer.write(root);
    return json_string;
}

void DeserializeRequest(const std::string &json_string, request_t &out){
    Json::Reader reader;
    Json::Value root;
    reader.parse(json_string, root);
    out.x = root["datax"].asInt();
    out.y = root["datay"].asInt();
    out.op = root["operator"].asUInt();
}

std::string SerializeResponse(const response_t &resp){
    Json::Value root;
    root["sign"] = resp.sign;
    root["result"] = resp.result;

    Json::FastWriter writer;
    std::string json_string = writer.write(root);
    return json_string;
}

void DeserializeResponse(const std::string &json_string, response_t &out){
    Json::Reader reader;
    Json::Value root;
    reader.parse(json_string, root);
    out.sign = root["sign"].asInt();
    out.result = root["result"].asInt();
}

2. Transmit JSON strings over the network

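  // 1. Method 2: read the JSON string from the network
  char buffer[1024] = {0};
  ssize_t s = read(sock, buffer, sizeof(buffer) - 1);
  if (s > 0){
    buffer[s] = 0;                      // only index the buffer after a successful read
    request_t req;
    DeserializeRequest(buffer, req);

    // 2. Compute the result exactly as before (same switch on req.op)
    response_t resp{0, 0};
    // ... fill resp.sign / resp.result from req ...

    // 3. Send the response: serialize the structured data first, then send it
    std::string json_string = SerializeResponse(resp);
    write(sock, json_string.c_str(), json_string.size());
    std::cout << "return response success" << std::endl;
  }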

Understanding the HTTP protocol

As mentioned at the beginning, we do not have to define every application-layer protocol ourselves: **HTTP (HyperText Transfer Protocol)** is one of the ready-made protocols we can use directly.

HTTP is essentially the same kind of thing as the network calculator we just wrote: an application-layer protocol. It also covers the three parts of our calculator: 1. network communication 2. serialization and deserialization 3. protocol details.

Basics of URLs

What we usually call a web address is actually a URL.

The pictures, videos and other content we request are called resources, and these resources are stored on Linux machines somewhere on the network. IP + port uniquely identifies a process, but not a resource. Traditional operating systems store resources in files, and on a single Linux system a resource is uniquely identified by its path.


IP + port uniquely identifies a process (the IP is usually presented as a domain name).

IP + Linux path uniquely identifies a network resource (the path is made of directory names separated by /).

Common protocols such as HTTP have designated server port numbers, much like the police emergency number 110: everyone in China knows that 110 calls the police, and everyone who has studied network programming knows that HTTP uses port 80. That is why the port number can be omitted in many cases.
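To make the pieces concrete, here is a minimal sketch that splits a URL into scheme, host, port, path and query. `UrlParts` and `ParseUrl` are hypothetical names. The parser ignores user-info, fragments and IPv6 literals, and it fills in 80 (or 443 for https) when the port is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical container for the pieces of a URL such as
// http://www.example.com:8080/dir/index.html?wd=C%2B%2B
struct UrlParts {
    std::string scheme;  // "http"
    std::string host;    // domain name or IP address
    uint16_t port = 0;   // explicit port, or the scheme's well-known default
    std::string path;    // resource path on the server ("/" if absent)
    std::string query;   // text after '?', without the '?'
};

// Minimal parser: user-info, fragments (#...) and IPv6 literals are not handled.
bool ParseUrl(const std::string &url, UrlParts &out) {
    std::size_t scheme_end = url.find("://");
    if (scheme_end == std::string::npos) return false;
    out.scheme = url.substr(0, scheme_end);

    // host[:port] runs until the first '/' or '?' (or the end of the string).
    std::size_t host_begin = scheme_end + 3;
    std::size_t auth_end = url.find_first_of("/?", host_begin);
    std::string authority = (auth_end == std::string::npos)
                                ? url.substr(host_begin)
                                : url.substr(host_begin, auth_end - host_begin);

    std::size_t colon = authority.find(':');
    if (colon == std::string::npos) {
        out.host = authority;
        out.port = (out.scheme == "https") ? 443 : 80;  // port omitted: use the default
    } else {
        out.host = authority.substr(0, colon);
        std::string port_str = authority.substr(colon + 1);
        if (port_str.empty() || port_str.size() > 5 ||
            port_str.find_first_not_of("0123456789") != std::string::npos) return false;
        unsigned long p = std::stoul(port_str);
        if (p > 65535) return false;
        out.port = static_cast<uint16_t>(p);
    }

    // The path identifies the resource on that host; the query carries parameters.
    out.path = "/";
    out.query.clear();
    if (auth_end != std::string::npos) {
        std::size_t q = url.find('?', auth_end);
        if (q == std::string::npos) {
            out.path = url.substr(auth_end);
        } else {
            if (q > auth_end) out.path = url.substr(auth_end, q - auth_end);
            out.query = url.substr(q + 1);
        }
    }
    return !out.host.empty();
}
```

For example, `http://101.43.252.201:8888` yields host `101.43.252.201`, port 8888 and path `/`, i.e. the home page of that server.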

urlencode and urldecode

Characters such as /, ? and : already have special meanings in a URL, so they cannot appear freely. If a parameter needs to contain one of these special characters, the character must be escaped first.

The escaping rule is: take each byte of the character to be encoded (for non-ASCII text, each byte of its UTF-8 encoding), write it as two hexadecimal digits XY, and put % in front, giving the %XY form. For example, + is 0x2B, so it becomes %2B.
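A minimal sketch of urlencode/urldecode under the rule above. The names `UrlEncode` and `UrlDecode` are hypothetical, and the choice of which characters to leave alone follows the RFC 3986 "unreserved" set (letters, digits, `-`, `_`, `.`, `~`).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Percent-encode every byte that is not an RFC 3986 "unreserved" character.
// Each such byte becomes '%' followed by two uppercase hex digits.
std::string UrlEncode(const std::string &in) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += kHex[c >> 4];    // high 4 bits
            out += kHex[c & 0x0F];  // low 4 bits
        }
    }
    return out;
}

// Reverse of UrlEncode: each valid %XY becomes one byte; anything else is copied as-is.
std::string UrlDecode(const std::string &in) {
    auto hex_value = [](char h) -> int {
        if (h >= '0' && h <= '9') return h - '0';
        if (h >= 'A' && h <= 'F') return h - 'A' + 10;
        if (h >= 'a' && h <= 'f') return h - 'a' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

So "C++" encodes to "C%2B%2B", and the two UTF-8 characters of 翻译 become six %XY groups. HTML form submissions often encode a space as + instead of %20; this sketch always uses %20.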

For example, search Baidu for 翻译 ("translation") and then for C++.

Looking at the resulting URLs, we can see that a wd field carries our search keyword. The Chinese keyword is displayed unchanged in the address bar, while the two plus signs in C++ are escaped to %2B: + has a special meaning in URLs, so it is escaped before transmission.

We can use an online URL-encoding tool to escape our own strings and see how they appear in a URL.

HTTP protocol format

Whether it is a request or a response, HTTP builds it line by line (each line ends with \r\n; many implementations also accept a bare \n), and both are made up of 3 or 4 parts.

From an ordinary user's point of view, going online comes down to two things: 1. getting the resources you want from the target server 2. uploading your data to the target server.

HTTP request


Part 1, the request line (the first line): request method + URL (the part after the domain name) + HTTP version, separated by spaces and ending with a line break.

Part 2, the request header (Header): colon-separated key-value pairs, one attribute per line; a blank line marks the end of the Header.

Part 3: a blank line.

Part 4, the request body (Body), if any: the data submitted by the user. The body may be empty. If a body is present, the Header contains a Content-Length attribute giving the body's length.

The first three parts are called the HTTP request header, and the fourth part is called the HTTP payload.
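The four parts can be assembled as plain string concatenation. A minimal sketch, assuming a hypothetical `BuildHttpRequest` helper and CRLF line endings as the HTTP specification requires:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Assemble a request in the four-part layout: request line, header lines,
// blank line, optional body. Content-Length is added only when there is a body.
std::string BuildHttpRequest(const std::string &method, const std::string &url,
                             const std::vector<std::pair<std::string, std::string>> &headers,
                             const std::string &body) {
    std::string req = method + " " + url + " HTTP/1.1\r\n";     // part 1: request line
    for (const auto &kv : headers) {
        req += kv.first + ": " + kv.second + "\r\n";            // part 2: headers
    }
    if (!body.empty()) {
        req += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    }
    req += "\r\n";                                              // part 3: blank line
    req += body;                                                // part 4: body (payload)
    return req;
}
```

For instance, a POST of the form data `name=tom&passwd=123` gets `Content-Length: 19`, which is exactly the number of bytes the server must read after the blank line.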

HTTP response


Part 1, the status line (the first line): HTTP version + status code + status description.

Part 2, the response header (Header): colon-separated key-value pairs, one attribute per line; a blank line marks the end of the Header.

Part 3: a blank line.

Part 4, the response body (Body), if any: the data returned to the user (for example an HTML page). The body may be empty. If a body is present, the Header contains a Content-Length attribute giving the length of the response body.

The first three parts are called the HTTP response header, and the fourth part is called the HTTP payload.

Questions to think about:

1. How are HTTP requests and responses unpacked (header separated from payload), and how is the payload delivered to the upper layer (demultiplexing)?

2. How is an HTTP request or response sent? As a string.

3. How is it read? As a string.

4. How are HTTP requests and responses treated overall? As strings.

Unpacking: we treat the request or response as one big string. The blank line is special in HTTP: it separates the HTTP header from the payload. We read the data line by line, and when we reach a line that holds nothing but the line break, the HTTP header has been read completely and what follows is the payload.
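A minimal sketch of that line-by-line unpacking, assuming a hypothetical `HttpMessage` struct and `ParseHttpMessage` helper. Header names are compared case-sensitively here for brevity, although HTTP treats them as case-insensitive.

```cpp
#include <cassert>
#include <map>
#include <string>

// Result of splitting a raw HTTP message (hypothetical type for this sketch).
struct HttpMessage {
    std::string start_line;                      // request line or status line
    std::map<std::string, std::string> headers;  // "Key: Value" pairs
    std::string body;                            // everything after the blank line
};

// Read the message line by line; the first empty line ends the header part.
// Accepts both "\r\n" and a bare "\n" as line terminators.
bool ParseHttpMessage(const std::string &raw, HttpMessage &out) {
    std::size_t pos = 0;
    bool first = true;
    while (pos < raw.size()) {
        std::size_t nl = raw.find('\n', pos);
        if (nl == std::string::npos) return false;  // header not complete yet
        std::string line = raw.substr(pos, nl - pos);
        if (!line.empty() && line.back() == '\r') line.pop_back();
        pos = nl + 1;
        if (line.empty()) {                         // blank line: header finished
            out.body = raw.substr(pos);
            return !first;                          // a start line must have been seen
        }
        if (first) {
            out.start_line = line;
            first = false;
        } else {
            std::size_t colon = line.find(':');
            if (colon == std::string::npos) return false;
            std::size_t v = line.find_first_not_of(' ', colon + 1);
            out.headers[line.substr(0, colon)] =
                (v == std::string::npos) ? "" : line.substr(v);
        }
    }
    return false;  // never saw the blank line
}
```

Everything after the blank line is returned as the body; deciding how much of it really belongs to this message is the job of Content-Length.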

Demultiplexing: this is not handled by HTTP itself but by the specific application code. HTTP only needs to provide an interface that lets the upper layer obtain the parameters.

HTTP method


A method being supported by the protocol does not mean a given server supports it; each server decides which methods to support according to its own needs.

Common HTTP status codes


The most common status codes are 200 (OK), 404 (Not Found), 403 (Forbidden: permission too low to access the resource), 301 (Moved Permanently: permanent redirect), 302 or 307 (temporary redirect), 502 (Bad Gateway) and 504 (Gateway Timeout).

Permanent redirect: for example, when a website is updated and its URL changes, it sets a permanent redirect on the old URL so that anyone visiting the old URL is sent to the new page.

Temporary redirect: for example, when you place an order on Meituan, you are sent from Meituan to the order page, and after the order is placed you are automatically sent back to the original page.
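On the server side, a redirect is just a 3xx status line plus a Location header (see the header list below). A minimal sketch with a hypothetical `BuildRedirect` helper; the reason phrases are the standard ones, and the body is left empty:

```cpp
#include <cassert>
#include <string>

// Build a redirect response: a 3xx status line plus a Location header telling
// the client where to go next. The body is empty, so Content-Length is 0.
std::string BuildRedirect(int code, const std::string &location) {
    std::string reason;
    switch (code) {
        case 301: reason = "Moved Permanently"; break;   // permanent
        case 302: reason = "Found"; break;               // temporary
        case 303: reason = "See Other"; break;
        case 307: reason = "Temporary Redirect"; break;  // temporary, keeps the method
        case 308: reason = "Permanent Redirect"; break;
        default:  reason = "Redirect"; break;            // reason phrase is informational only
    }
    return "HTTP/1.1 " + std::to_string(code) + " " + reason + "\r\n"
           "Location: " + location + "\r\n"
           "Content-Length: 0\r\n"
           "\r\n";
}
```

The browser reads the status code, sees the Location header, and issues a new request to that address on its own.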

The application layer is where people are directly involved, and developers' skill levels vary. Many developers do not really know how to use HTTP status codes, and with so many browsers, support for status codes is inconsistent: sometimes a page returned with a wrong status code is still displayed normally. As a result, a status code such as 404 gives the browser little real guidance; the browser simply displays the page you return.

Common HTTP headers

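Content-Type: data type of the body (text/html, etc.)
Content-Length: length of the Body
Host: tells the server which host and port the requested resource is on
User-Agent: declares the user's operating system and browser version
Referer: the page from which the current page was reached
Location: used with 3xx status codes to tell the client where to go next
Cookie: stores a small amount of information on the client; usually used to implement sessions
Connection: HTTP/1.0 uses short connections by default; from HTTP/1.1 on, long (persistent) connections are the default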

Connection: keep-alive (long connection)

In all our earlier experiments, the connection was closed after each request and response. But a server holds many resources, a large web page is made up of many resources, and each resource needs its own HTTP request. HTTP/1.1 therefore introduced long connections: the connection between the two sides is established only once and closed after all resources have been exchanged. This improves efficiency by avoiding frequent TCP connection setup.

Build a simple HTTP server

Simple front-end page design

The w3cschool website has simple front-end and HTML tutorials. Following it, I made a simple home page in HTML:

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
    </head>
    <body>
        <h3>hello net</h3> 
        <h5>hello, I am a form</h5>
        <form action="/a" method="POST">
            Name: <input type="text" name="name"><br/>
            Password: <input type="password" name="passwd"><br/>
            <input type="submit" value="Log in"> <br/>
        </form>
    </body>
</html>

HTTP server construction

// Dedicated socket I/O interfaces (declared in <sys/socket.h>)
ssize_t recv(int sockfd, void *buf, size_t len, int flags);  
ssize_t send(int sockfd, const void *buf, size_t len, int flags);

These interfaces work just like read and write: setting the last parameter, flags, to 0 gives the same behavior. For other flags, see the documentation with the man recv command.
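On a stream socket, send() (like write()) may accept fewer bytes than requested, so servers often wrap it in a loop. A minimal sketch with a hypothetical `SendAll` helper:

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Keep calling send() until the whole buffer has been handed to the kernel.
// Returns true on success, false on a socket error.
bool SendAll(int sock, const std::string &data) {
    std::size_t sent = 0;
    while (sent < data.size()) {
        ssize_t n = send(sock, data.data() + sent, data.size() - sent, 0);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;
        }
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```

In the server below, the single `send(sock, http_response.c_str(), http_response.size(), 0)` call could be replaced with `SendAll(sock, http_response)` once responses grow beyond a small HTML page.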

#include "Sock.hpp"            // the author's socket wrapper (Socket/Bind/Listen/Accept)
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

#define WWWROOT "./wwwroot/"    // web root directory
#define HOME_PAGE "index.html"  // home page
#define SIZE (1024 * 10)        // receive buffer size

void Usage(std::string proc){
    std::cout << "Usage " << proc << " port " << std::endl;
}

void* HandlerHttpRequest(void* args){
    int sock = *(int*)args;
    delete (int*)args;
    pthread_detach(pthread_self());

    char buffer[SIZE];
    memset(buffer, 0, SIZE);
    ssize_t s = recv(sock, buffer, sizeof(buffer) - 1, 0);
    if (s > 0){
        buffer[s] = 0;
        std::cout << buffer << std::endl;                     // look at the HTTP request the browser sent

        // Simplest possible response (plain-text body):
        // std::string http_response = "HTTP/1.0 200 OK\r\n";   // status line
        // http_response += "Content-Type: text/plain\r\n";      // type of the body
        // http_response += "\r\n";                              // blank line
        // http_response += "hello net!";                        // body

        std::string html_file = WWWROOT;
        html_file += HOME_PAGE;
        // Read the whole file in binary mode so the body keeps its newlines
        // and Content-Length equals the number of bytes actually sent.
        std::ifstream in(html_file, std::ios::binary);
        std::string http_response;
        if (!in.is_open()){
            std::cerr << "open html error!" << std::endl;
            http_response = "HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\n";
        }
        else {
            std::cout << "open success" << std::endl;
            std::string content((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());
            // The response carries not only the page but also the status line and headers
            http_response = "HTTP/1.0 200 OK\r\n";
            http_response += "Content-Type: text/html; charset=utf-8\r\n";   // type of the body
            http_response += "Content-Length: " + std::to_string(content.size()) + "\r\n";
            http_response += "\r\n";      // blank line ends the header
            http_response += content;     // response body
        }
        send(sock, http_response.c_str(), http_response.size(), 0);
    }
    close(sock);
    return nullptr;
}

int main(int argc, char* argv[]){
    if (argc != 2){
        Usage(argv[0]);
        return 1;
    }
    uint16_t port = atoi(argv[1]);
    int listen_sock = Sock::Socket();
    Sock::Bind(listen_sock, port);
    Sock::Listen(listen_sock);

    for ( ; ; ){
        int sock = Sock::Accept(listen_sock);
        if (sock > 0){
            pthread_t tid;
            int *psock = new int(sock);   // heap copy so each thread owns its own fd
            pthread_create(&tid, nullptr, HandlerHttpRequest, (void*)psock);
        }
    }
}

Here we start the HTTP server on port 9090. HTTP servers usually listen on port 80, but that is only a convention; nothing prevents an HTTP server from using another port.

Run the program and it will receive HTTP requests from the network. We can send it requests from a browser: just enter public-IP:port (公网IP:端口号) in the address bar, and we can see the HTTP request the browser sends, as follows:

GET / HTTP/1.1                // request line
Host: 101.43.252.201:8888     // request headers
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
                              // blank line: the HTTP request header ends here; this request has no body

GET / HTTP/1.1                // request line

Right after the request method comes /, the web root directory. A request usually asks for a specific resource on a server, and IP + path uniquely identifies a resource. If the requested path is /, we are asking for the site's home page.
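The server above always returns the home page. A minimal sketch of mapping the requested path onto a file under the web root, reusing the `WWWROOT` and `HOME_PAGE` macros; `PathToFile` is a hypothetical helper, and its ".." check is only a crude guard against escaping the web root:

```cpp
#include <cassert>
#include <string>

// Same macros as the server above.
#define WWWROOT "./wwwroot/"
#define HOME_PAGE "index.html"

// Turn the path from the request line into a file path under the web root.
// "/" (the site root) maps to the home page; any query string is dropped.
// Returns an empty string for paths that try to climb out of the web root.
std::string PathToFile(std::string url_path) {
    std::size_t q = url_path.find('?');
    if (q != std::string::npos) url_path.erase(q);
    if (url_path.find("..") != std::string::npos) return "";
    std::string file = WWWROOT;
    if (url_path.empty() || url_path == "/") return file + HOME_PAGE;
    if (url_path[0] == '/') url_path.erase(0, 1);        // WWWROOT already ends in '/'
    if (url_path.back() == '/') url_path += HOME_PAGE;   // a directory: serve its index
    return file + url_path;
}
```

A request line of `GET / HTTP/1.1` then resolves to `./wwwroot/index.html`, the file the server above reads.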

A fairly complete list of common Content-Type values can be found in a separate blog post by another author.
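When a server serves more than one file, it usually picks the Content-Type from the file suffix. A minimal sketch with a hypothetical `ContentTypeFor` helper and only a handful of common MIME types; anything unknown falls back to a generic binary type:

```cpp
#include <cassert>
#include <map>
#include <string>

// Pick a Content-Type from the file suffix of a path such as "./wwwroot/index.html".
std::string ContentTypeFor(const std::string &file) {
    static const std::map<std::string, std::string> kTypes = {
        {".html", "text/html"},        {".htm", "text/html"},
        {".css", "text/css"},          {".js", "text/javascript"},
        {".json", "application/json"}, {".txt", "text/plain"},
        {".png", "image/png"},         {".jpg", "image/jpeg"},
        {".jpeg", "image/jpeg"},       {".gif", "image/gif"},
    };
    std::size_t dot = file.rfind('.');
    std::size_t slash = file.rfind('/');
    // Only a dot inside the last path component starts a suffix.
    if (dot != std::string::npos && (slash == std::string::npos || dot > slash)) {
        auto it = kTypes.find(file.substr(dot));
        if (it != kTypes.end()) return it->second;
    }
    return "application/octet-stream";
}
```

Combined with the path mapping, the server would send `"Content-Type: " + ContentTypeFor(file) + "\r\n"` instead of hard-coding text/html.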

How do we know the header part has been read completely? When we read the blank line.

Once the header part has been read, we can correctly extract its attributes, including Content-Length.

Whether a body follows the blank line depends on the request method.

If there is a body, how do we make sure we read all of it without reading into the next HTTP message?
When a body is present, the header part contains the attribute Content-Length: len, which says how many bytes the body consists of, so we read exactly len bytes after the blank line.
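The sketch below shows how Content-Length lets a server cut exactly one message off a buffer that may already contain the start of the next one. `ExtractOneMessage` is a hypothetical helper; it assumes CRLF line endings and a case-sensitive `Content-Length` header, and it ignores chunked transfer encoding.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Try to cut one complete HTTP message off the front of `inbox` (the bytes read
// from the socket so far). On success the message is moved into `msg`; any bytes
// after it belong to the next message and stay in `inbox`.
bool ExtractOneMessage(std::string &inbox, std::string &msg) {
    std::size_t header_end = inbox.find("\r\n\r\n");
    if (header_end == std::string::npos) return false;   // header incomplete: read more
    std::size_t body_begin = header_end + 4;

    std::size_t body_len = 0;                             // no Content-Length: no body
    const std::string key = "\r\nContent-Length:";
    std::string header = inbox.substr(0, body_begin);
    std::size_t cl = header.find(key);
    if (cl != std::string::npos) {
        body_len = std::strtoul(header.c_str() + cl + key.size(), nullptr, 10);
    }

    if (inbox.size() < body_begin + body_len) return false;  // body incomplete: read more
    msg = inbox.substr(0, body_begin + body_len);
    inbox.erase(0, body_begin + body_len);
    return true;
}
```

A server would append each recv() result to `inbox` and call this in a loop, handling every complete message it returns.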

GET and POST methods

GET method: also called the retrieval method, it is the most commonly used method; by default, all web pages are fetched with GET. If GET is used to submit parameters, the parameters are appended to the URL (after ?) and sent to the server that way.

POST method: also called the push method, it is the usual way to submit parameters. The parameters are normally submitted in the body, and Content-Length: XXX in the header gives the body's length.
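Either way, the form from our home page arrives as text like `name=tom&passwd=123`: after `?` in a GET URL, or as the body of a POST. A minimal sketch with a hypothetical `ParseParams` helper; percent-decoding is omitted here and would be applied to each key and value in practice.

```cpp
#include <cassert>
#include <map>
#include <string>

// Split "name=tom&passwd=123" into key/value pairs.
std::map<std::string, std::string> ParseParams(const std::string &text) {
    std::map<std::string, std::string> params;
    std::size_t pos = 0;
    while (pos <= text.size()) {
        std::size_t amp = text.find('&', pos);
        if (amp == std::string::npos) amp = text.size();
        std::string pair = text.substr(pos, amp - pos);
        if (!pair.empty()) {
            std::size_t eq = pair.find('=');
            if (eq == std::string::npos) params[pair] = "";   // key without a value
            else params[pair.substr(0, eq)] = pair.substr(eq + 1);
        }
        pos = amp + 1;
    }
    return params;
}
```

The same function works for both methods; only where the server finds the text differs (the URL query for GET, the Content-Length bytes of the body for POST).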

POST vs GET

The parameters are submitted in different places. POST is more private (private is not the same as secure; security requires encryption) because the parameters are not echoed in the browser's address bar. GET is not private: it echoes possibly sensitive information in the URL box, which increases the risk of it being stolen.

GET passes parameters through the URL, and URL length is limited (the exact limit depends on the browser). POST passes them in the body, which generally has no comparable size limit.

If the submitted parameters are not sensitive and there are only a few of them, GET is fine; otherwise, use POST.

Cookies and Sessions

In daily use, we notice that as we jump between pages, the website still knows who we are after many HTTP requests. For example, I am a Bilibili (B站) premium member, and no matter how many videos I browse on Bilibili, it never asks me to log in again. Yet HTTP is a stateless protocol: it only cares whether the current request succeeds and keeps no record of previous requests.

Getting the website to recognize the user is not a problem the HTTP protocol itself solves, but HTTP provides technical support that lets websites keep sessions. Cookies are one such session-management tool.

Browser side: a cookie is essentially a file that stores the user's private information.
HTTP side: once a website has set a cookie, the browser automatically carries the cookie information in every request it sends to that website.

// Set-Cookie: the server asks the browser to store a cookie
http_response += "Set-Cookie: id=1111111111\r\n";
http_response += "Set-Cookie: password=2222\r\n";

We add these two lines to the HTTP response header. When the browser sends a request, the server's response header carries the Set-Cookie attributes, which make the browser store the cookies.


Send the request from the browser again, and the request now carries the Cookie information:

GET / HTTP/1.1
Host: 101.43.252.201:8889
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Cookie: id=1111111111; password=2222 // cookie information
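On the server side, the Cookie header value has to be split back into name/value pairs. A minimal sketch with a hypothetical `ParseCookieHeader` helper; pairs are separated by `;` followed by optional spaces.

```cpp
#include <cassert>
#include <map>
#include <string>

// Parse a request's Cookie header value, e.g. "id=1111111111; password=2222".
std::map<std::string, std::string> ParseCookieHeader(const std::string &value) {
    std::map<std::string, std::string> cookies;
    std::size_t pos = 0;
    while (pos < value.size()) {
        std::size_t semi = value.find(';', pos);
        if (semi == std::string::npos) semi = value.size();
        std::string item = value.substr(pos, semi - pos);
        std::size_t start = item.find_first_not_of(' ');   // skip the space after ';'
        std::size_t eq = item.find('=');
        if (start != std::string::npos && eq != std::string::npos && eq > start) {
            cookies[item.substr(start, eq - start)] = item.substr(eq + 1);
        }
        pos = semi + 1;
    }
    return cookies;
}
```

Note that this example cookie carries the password in plain text, which is exactly the risk discussed next.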

Browsers store cookies in one of two ways: 1. file-based (saved on disk) 2. memory-based (kept in memory only).

The difference shows when you log in to a website, close the browser, and then open the website again: if it still recognizes you, the cookie was stored in a file; if it does not, the cookie was stored in memory.

So if someone steals our cookie file, they can access specific resources using our identity, and if our username and password are stored in it, the damage is far worse. Using cookies alone therefore carries security risks, which is why sessions were introduced. Using sessions does not mean we stop using cookies, however.

The core idea of a session: keep the user's private information on the server side.


The browser sends the private information to the server. The server authenticates it, creates a session file and saves it on the server, generates a unique session_id for that session file, and sets the session_id in the browser's cookie. On later visits the browser automatically carries the cookie (the session_id) with each request, so the server can still recognize the client.
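A minimal sketch of the server side of this idea: the private data stays in a server-side table, and only a random session_id leaves the server. `SessionStore` is a hypothetical in-memory class (real servers persist sessions and use a cryptographically secure random generator; `std::mt19937_64` is not one).

```cpp
#include <cassert>
#include <map>
#include <random>
#include <string>

class SessionStore {
public:
    // After authentication: create a session and return its id (sent via Set-Cookie).
    std::string Create(const std::string &user) {
        std::string id;
        do { id = RandomId(); } while (sessions_.count(id));  // avoid collisions
        sessions_[id] = user;
        return id;
    }
    // On later requests: look up the user behind the session_id from the Cookie header.
    bool Lookup(const std::string &id, std::string &user) const {
        auto it = sessions_.find(id);
        if (it == sessions_.end()) return false;
        user = it->second;
        return true;
    }
    // Discard a session, e.g. on logout or after a suspicious login.
    void Destroy(const std::string &id) { sessions_.erase(id); }

private:
    std::string RandomId() {  // 32 hex characters
        static const char kHex[] = "0123456789abcdef";
        std::uniform_int_distribution<int> dist(0, 15);
        std::string id;
        for (int i = 0; i < 32; ++i) id += kHex[dist(rng_)];
        return id;
    }
    std::mt19937_64 rng_{std::random_device{}()};
    std::map<std::string, std::string> sessions_;  // session_id -> private user data
};
```

The response would then carry something like `"Set-Cookie: session_id=" + id + "\r\n"`, so a stolen cookie reveals the id but not the username or password.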

The user's browser no longer stores private information, so even if the user's cookie leaks, others will not get the username and password. The risk of a cookie leak still exists, though: others can still use the stolen cookie to visit the websites we visit as us, because the cookie lives on the user's computer and users often lack security awareness.

**To counter these risks, Internet companies have added further defenses, such as requiring you to re-enter your account password when you log in from an unusual location.** For example, suppose a telecom-fraud group in Myanmar steals the QQ account of someone in Beijing and logs in from Myanmar. One minute earlier the account's login IP was in Beijing; now it is in Myanmar. The system detects the anomaly and activates its defenses: it concludes that the account may have been stolen, discards the original session file, and asks for the account password again. When you re-enter it, the system generates a new session file and session_id for you.

HTTPS

The encryption/decryption layer

Practically every website we can name uses HTTPS. What is the difference between HTTP and HTTPS? **Encryption.**

HTTPS = HTTP + TLS/SSL (data encryption and decryption layer)

The encryption/decryption layer sits below the HTTP layer. HTTP calls the interface of the TLS/SSL security layer, which in turn calls the system-call (socket) interface: outgoing data is encrypted before it is handed to the socket, and incoming data is decrypted after it is read from the socket. So HTTPS is HTTP plus an encryption/decryption layer, and the two layers together are called HTTPS. TLS encrypts the whole HTTP message (start line, headers and body); only the lower-layer TCP/IP headers travel unencrypted.

Two kinds of encryption

1. Symmetric encryption: there is only one key, X

Key X is used both to encrypt and to decrypt, for example:

data ^ X = result; // encrypt
result ^ X = data; // decrypt
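The XOR example above can be made runnable. `XorCipher` is a hypothetical name, and repeating-key XOR is only a toy: it illustrates "one key for both directions" but is trivially breakable.

```cpp
#include <cassert>
#include <string>

// XOR every byte with a repeating key. Because (data ^ X) ^ X == data,
// the same function both encrypts and decrypts.
std::string XorCipher(const std::string &data, const std::string &key) {
    std::string out = data;
    if (key.empty()) return out;  // no key: nothing to do
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}
```

Whoever holds the key X can both encrypt and decrypt, which is why distributing that key safely becomes the central problem.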

2. Asymmetric encryption: there is a pair of keys, a public key and a private key

Data encrypted with the public key can only be decrypted with the private key, and data encrypted with the private key can only be decrypted with the public key. The classic asymmetric algorithm is RSA. Generally, the public key is published to the world, while the private key must be kept secret.
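A toy RSA sketch makes the "encrypt with one key, decrypt with the other" property concrete. It uses the well-known textbook parameters p = 61, q = 53 (so n = 3233, e = 17, d = 2753); keys this small are completely insecure, and `ModPow`, `RsaEncrypt` and `RsaDecrypt` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Modular exponentiation by repeated squaring: computes (base^exp) % mod.
uint64_t ModPow(uint64_t base, uint64_t exp, uint64_t mod) {
    uint64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// n = 61 * 53 = 3233; e * d = 17 * 2753 = 46801 ≡ 1 (mod (61-1)(53-1) = 3120).
const uint64_t kN = 3233, kE = 17, kD = 2753;

uint64_t RsaEncrypt(uint64_t m) { return ModPow(m, kE, kN); }  // public key (n, e)
uint64_t RsaDecrypt(uint64_t c) { return ModPow(c, kD, kN); }  // private key (n, d)
```

Encrypting the message 65 with the public key gives 2790, and only the private exponent turns 2790 back into 65. Applying the keys in the opposite order (private first, public second) also round-trips, which is what signatures rely on.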

How can we confirm that text arrives intact after crossing the network, and detect whether it has been tampered with?

We can run the text through a hash algorithm to produce a fixed-length, practically unique string called the data digest or data fingerprint (even a one-punctuation-mark difference in the text produces a very different hash). The digest is then encrypted (in a digital signature, with the sender's private key) to produce the data signature, which is sent over the network together with the text. The receiver decrypts the signature to get fingerprint 1, hashes the received text itself to get fingerprint 2, and compares the two: if they are equal, the text has not been tampered with.
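The comparison step can be sketched as follows. As a stand-in "data fingerprint" this uses 64-bit FNV-1a, which is NOT a cryptographic hash (real signatures use something like SHA-256), and it assumes the receiver has already decrypted the signature into `fingerprint_from_signature`. `Fingerprint` and `Untampered` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a: a fixed-length fingerprint of arbitrary text (illustration only).
uint64_t Fingerprint(const std::string &text) {
    uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char c : text) {
        h ^= c;
        h *= 0x100000001b3ULL;           // FNV prime
    }
    return h;
}

// Receiver: hash the received text (fingerprint 2) and compare it with
// fingerprint 1 recovered from the signature. Equal means "not tampered".
bool Untampered(const std::string &received_text, uint64_t fingerprint_from_signature) {
    return Fingerprint(received_text) == fingerprint_from_signature;
}
```

Adding a single "!" to the text changes the fingerprint, so the check fails.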


Practical Applications of Encryption

So in practice, do we use symmetric or asymmetric encryption?

If we use symmetric encryption, how do we get the key onto both machines? We could pre-install all symmetric keys on every machine in advance, but that is very costly: installing a key by hand on every server is far too cumbersome. Could the key be downloaded over the Internet? Downloading is itself network communication, so wouldn't the download also need to be encrypted? And even if both sides already have keys, how do they agree on which key to use? The very first key negotiation cannot be encrypted. So using symmetric encryption on its own is not actually secure.

If asymmetric encryption is used, two asymmetric key pairs are usually needed to secure the communication. Both the client and the server have their own public key and private key. In the key negotiation phase, the server and the client send each other their public keys (unencrypted). In the communication phase, the client encrypts its data with the server's public key S and sends it to the server, which decrypts it with its own private key S` to obtain the data. Similarly, the server encrypts its response with the client's public key C and sends it to the client, which decrypts it with its own private key C` to obtain the response. Each key pair secures one direction of communication. However, because the public keys are still exchanged in the clear, any purely asymmetric scheme remains at risk of being intercepted, and asymmetric encryption algorithms are particularly time-consuming and inefficient.
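
Here is a minimal sketch of the two-key-pair scheme, using toy textbook RSA (insecure, illustration only). The key pairs and the helpers `to_int` and `to_text`, which turn short strings into integers, are hypothetical.

```python
# Toy textbook RSA: illustration only, NOT secure (well-known primes, no padding).
def make_keypair(k1: int, k2: int, e: int = 65537):
    p, q = 2**k1 - 1, 2**k2 - 1          # Mersenne primes 2**k1 - 1, 2**k2 - 1
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, p * q), (d, p * q)        # (public key, private key)


def rsa(key, m: int) -> int:
    exp, n = key
    assert 0 <= m < n, "message must be smaller than the modulus"
    return pow(m, exp, n)


def to_int(text: str) -> int:
    return int.from_bytes(text.encode("utf-8"), "big")


def to_text(value: int) -> str:
    return value.to_bytes((value.bit_length() + 7) // 8, "big").decode("utf-8")


# Each side owns one key pair (hypothetical keys)
server_public, server_private = make_keypair(127, 521)   # S and S`
client_public, client_private = make_keypair(89, 607)    # C and C`

# Negotiation phase: S and C are exchanged unencrypted.
# Client -> server: encrypt with S, the server decrypts with S`
request = rsa(server_public, to_int("GET /index.html"))
print(to_text(rsa(server_private, request)))    # GET /index.html

# Server -> client: encrypt with C, the client decrypts with C`
response = rsa(client_public, to_int("200 OK"))
print(to_text(rsa(client_private, response)))   # 200 OK
```

Each message must be smaller than the modulus, which is one reason real systems encrypt only a short key this way rather than the whole conversation.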

The asymmetric + symmetric scheme used in practice

Key negotiation phase: the Server sends its public key S (unencrypted) directly to the Client. The Client generates a symmetric key X, encrypts it with S, and sends it to the Server. The Server decrypts it with its private key S` and obtains the symmetric key X.
Communication phase: both Server and Client now know the symmetric key X, and the two parties communicate using symmetric encryption with X.
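
The two phases can be sketched as follows. Assumptions: toy textbook RSA for the negotiation (insecure, illustration only). A toy XOR pad derived from X with SHA-256 stands in for a real symmetric cipher such as AES. All names are hypothetical.

```python
import hashlib
import secrets


# Toy textbook RSA: illustration only, NOT secure (well-known primes, no padding).
def make_keypair(k1: int, k2: int, e: int = 65537):
    p, q = 2**k1 - 1, 2**k2 - 1          # Mersenne primes 2**k1 - 1, 2**k2 - 1
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, p * q), (d, p * q)        # (public key, private key)


def rsa(key, m: int) -> int:
    exp, n = key
    assert 0 <= m < n, "message must be smaller than the modulus"
    return pow(m, exp, n)


def xor_cipher(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher keyed by X (illustration only; messages up to 32 bytes)."""
    assert len(data) <= 32
    pad = hashlib.sha256(key.to_bytes(16, "big")).digest()
    return bytes(a ^ b for a, b in zip(data, pad))


# --- Key negotiation phase ---
server_public, server_private = make_keypair(127, 521)   # S and S`
s_at_client = server_public                              # S travels unencrypted

x = secrets.randbits(128)                                # client generates X
encrypted_x = rsa(s_at_client, x)                        # X encrypted with S
x_at_server = rsa(server_private, encrypted_x)           # server decrypts with S`

# --- Communication phase: both sides now use X ---
request = xor_cipher(x, b"GET /index.html")              # client encrypts with X
print(xor_cipher(x_at_server, request))                  # server decrypts: b'GET /index.html'
```

The slow asymmetric step runs once per connection, on a short key. All later traffic uses the fast symmetric cipher.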

Man-in-the-middle attack

Anywhere along the network link, a man in the middle may spy on and modify our data at any time. In particular, the server and client can come under a man-in-the-middle attack while they are negotiating keys.

The middleman intercepts the unencrypted public key S that the server sends to the client, replaces it with its own public key M, and forwards M to the client. The client generates a symmetric key X, encrypts it with M, and sends it toward the server. Even if the server received this encrypted X, it could not decrypt it, because it does not have M`. But when the middleman intercepts or peeks at the encrypted X, it can decrypt it with its own private key M` and obtain the symmetric key X. From then on, the middleman communicates with the client using X. The responses the user receives are built by the middleman rather than the server, and the user's data is completely exposed.
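
The attack can be simulated with toy textbook RSA (insecure, illustration only; all keys and names are hypothetical). Step 4 is a variant not described in the text: instead of leaving the server unable to decrypt, the middleman re-encrypts X with the real S. Both ends then share X and neither notices the interception.

```python
import secrets


# Toy textbook RSA: illustration only, NOT secure (well-known primes, no padding).
def make_keypair(k1: int, k2: int, e: int = 65537):
    p, q = 2**k1 - 1, 2**k2 - 1          # Mersenne primes 2**k1 - 1, 2**k2 - 1
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, p * q), (d, p * q)        # (public key, private key)


def rsa(key, m: int) -> int:
    exp, n = key
    assert 0 <= m < n, "message must be smaller than the modulus"
    return pow(m, exp, n)


server_public, server_private = make_keypair(127, 521)   # S and S`
mitm_public, mitm_private = make_keypair(107, 1279)      # M and M`

# 1. The server sends S, but the middleman swaps it for M on the way.
key_at_client = mitm_public        # the client has no way to tell M from S

# 2. The client generates X and encrypts it with the key it received (M).
x = secrets.randbits(128)
encrypted_x = rsa(key_at_client, x)

# 3. The middleman intercepts the ciphertext and decrypts it with M`.
x_stolen = rsa(mitm_private, encrypted_x)
print(x_stolen == x)               # True: the middleman now holds X

# 4. Variant: re-encrypt X with the real S and forward it, so the server
#    also obtains X and neither side notices the middleman.
forwarded = rsa(server_public, x_stolen)
print(rsa(server_private, forwarded) == x)   # True
```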

Essential problem: the Client cannot determine whether the key negotiation message was sent by the legitimate server.

CA certificate authority

If a service provider has been certified by an authoritative organization, the provider is considered legitimate. A CA (certificate authority) is such an authoritative certification body on the network, and it also has its own public key and private key.

To apply for a CA certificate, a service provider submits its basic information (such as its domain name and public key), and the CA creates a certificate from this information. The enterprise's basic information is a piece of text. The CA hashes it to produce a data fingerprint, then **encrypts the fingerprint with the CA's own private key (important! important! important!)** to produce the enterprise's digital signature. Finally, it builds the certificate and issues it to the legitimate service provider.
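
Here is a minimal sketch of certificate issuance. Assumptions: toy textbook RSA for the CA's key pair (insecure, illustration only), SHA-256 as the hash, and a JSON string as the enterprise's "basic information". Real certificates use the X.509 format. `issue_certificate` and the domain `www.example.com` are hypothetical.

```python
import hashlib
import json


# Toy textbook RSA: illustration only, NOT secure (well-known primes, no padding).
def make_keypair(k1: int, k2: int, e: int = 65537):
    p, q = 2**k1 - 1, 2**k2 - 1          # Mersenne primes 2**k1 - 1, 2**k2 - 1
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, p * q), (d, p * q)        # (public key, private key)


def fingerprint(data: str) -> int:
    return int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest(), "big")


def issue_certificate(ca_private, domain: str, public_key) -> dict:
    """Hypothetical CA: hash the enterprise info, sign the fingerprint with the CA private key."""
    info = json.dumps({"domain": domain, "public_key": list(public_key)}, sort_keys=True)
    d, n = ca_private
    signature = pow(fingerprint(info), d, n)   # CA PRIVATE key, not the enterprise's
    return {"info": info, "signature": signature}


ca_public, ca_private = make_keypair(61, 2203)            # the CA's own key pair
server_public, server_private = make_keypair(127, 521)    # enterprise key pair S and S`

certificate = issue_certificate(ca_private, "www.example.com", server_public)
print(json.loads(certificate["info"])["domain"])          # www.example.com
```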

Certificate = the digital signature the CA generates for the enterprise + the basic information the enterprise submitted when applying for the certificate

Therefore, during key negotiation, the server only needs to send its CA certificate to the client, because the enterprise information in the certificate contains the server's public key S. After receiving the certificate, the client decrypts the digital signature with the CA's public key, hashes the enterprise information, and compares the two data fingerprints. If they are the same, the data source is legitimate, and the client proceeds to the next step.
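
The client-side check can be sketched the same way. Assumptions: toy textbook RSA (insecure, illustration only), SHA-256, and JSON-encoded certificate info. All names are hypothetical. `ca_public` plays the role of the CA key built into the client.

```python
import hashlib
import json


# Toy textbook RSA: illustration only, NOT secure (well-known primes, no padding).
def make_keypair(k1: int, k2: int, e: int = 65537):
    p, q = 2**k1 - 1, 2**k2 - 1          # Mersenne primes 2**k1 - 1, 2**k2 - 1
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, p * q), (d, p * q)        # (public key, private key)


def fingerprint(data: str) -> int:
    return int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest(), "big")


def issue_certificate(ca_private, domain: str, public_key) -> dict:
    info = json.dumps({"domain": domain, "public_key": list(public_key)}, sort_keys=True)
    d, n = ca_private
    return {"info": info, "signature": pow(fingerprint(info), d, n)}


def verify_certificate(ca_public, certificate):
    """Client side: return the server public key S if the signature checks out, else None."""
    e, n = ca_public
    fingerprint_1 = pow(certificate["signature"], e, n)   # decrypt with the CA PUBLIC key
    fingerprint_2 = fingerprint(certificate["info"])      # hash the enterprise info
    if fingerprint_1 != fingerprint_2:
        return None
    return tuple(json.loads(certificate["info"])["public_key"])


ca_public, ca_private = make_keypair(61, 2203)            # ca_public is built into the client
server_public, server_private = make_keypair(127, 521)    # S and S`
certificate = issue_certificate(ca_private, "www.example.com", server_public)

s_at_client = verify_certificate(ca_public, certificate)
print(s_at_client == server_public)   # True: the client can now trust S
```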

Note: the client uses the CA's public key to decrypt the data signature. How does the client know the CA's public key?

Generally, it is built in: when we download a browser, the public keys of the CA organizations come bundled with it. Moreover, there is more than one CA. Suppose CA C issues a certificate to an enterprise's server, A trusts B, and B trusts C. Then the enterprise is trusted by A, B and C alike: a client that trusts A can follow the chain A → B → C down to the enterprise's certificate.

In a small number of cases, the browser may prompt the user to install a CA certificate when visiting a URL.
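
The chain of trust from A through B and C to the enterprise can be sketched as follows. Assumptions: the client has only root A's public key built in, and each certificate is signed by the key carried in the previous certificate. The sketch uses toy textbook RSA (insecure, illustration only), SHA-256 and JSON-encoded certificate info. All names and keys are hypothetical.

```python
import hashlib
import json


# Toy textbook RSA: illustration only, NOT secure (well-known primes, no padding).
def make_keypair(k1: int, k2: int, e: int = 65537):
    p, q = 2**k1 - 1, 2**k2 - 1          # Mersenne primes 2**k1 - 1, 2**k2 - 1
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, p * q), (d, p * q)        # (public key, private key)


def fingerprint(data: str) -> int:
    return int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest(), "big")


def issue_certificate(issuer_private, subject: str, public_key) -> dict:
    info = json.dumps({"subject": subject, "public_key": list(public_key)}, sort_keys=True)
    d, n = issuer_private
    return {"info": info, "signature": pow(fingerprint(info), d, n)}


def verify_chain(trusted_root_public, chain, expected_domain: str) -> bool:
    """chain[0] must be signed by the trusted root key; each later certificate must be
    signed by the key carried in the previous one; the last one belongs to the server."""
    if not chain:
        return False
    issuer_public = trusted_root_public
    for certificate in chain:
        e, n = issuer_public
        if pow(certificate["signature"], e, n) != fingerprint(certificate["info"]):
            return False                            # this link was not signed by its issuer
        info = json.loads(certificate["info"])
        issuer_public = tuple(info["public_key"])   # this key vouches for the next link
    return info["subject"] == expected_domain


a_public, a_private = make_keypair(61, 2203)    # root CA A, built into the browser
b_public, b_private = make_keypair(107, 1279)   # CA B
c_public, c_private = make_keypair(89, 607)     # CA C, which issues the server certificate
server_public, _ = make_keypair(127, 521)

chain = [
    issue_certificate(a_private, "B", b_public),                      # A vouches for B
    issue_certificate(b_private, "C", c_public),                      # B vouches for C
    issue_certificate(c_private, "www.example.com", server_public),   # C vouches for the server
]
print(verify_chain(a_public, chain, "www.example.com"))   # True
```

A real validator (RFC 5280 path validation) also checks that each intermediate is allowed to act as a CA, validity periods and revocation. The sketch omits all of that.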

So can a middleman intercept your certificate? Of course it can.

But can a middleman modify the public key in the certificate? Absolutely not. If the enterprise's basic information is modified, its data fingerprint will differ from the original one. When the client decrypts the data signature and hashes the modified information, it sees that the two fingerprints differ and knows the data has been modified.
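
The tampering check can be simulated. Here the middleman swaps the public key inside the certificate but keeps the CA's original signature. Assumptions: toy textbook RSA (insecure, illustration only), SHA-256 and JSON-encoded certificate info. All names and keys are hypothetical.

```python
import hashlib
import json


# Toy textbook RSA: illustration only, NOT secure (well-known primes, no padding).
def make_keypair(k1: int, k2: int, e: int = 65537):
    p, q = 2**k1 - 1, 2**k2 - 1          # Mersenne primes 2**k1 - 1, 2**k2 - 1
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, p * q), (d, p * q)        # (public key, private key)


def fingerprint(data: str) -> int:
    return int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest(), "big")


def issue_certificate(ca_private, domain: str, public_key) -> dict:
    info = json.dumps({"domain": domain, "public_key": list(public_key)}, sort_keys=True)
    d, n = ca_private
    return {"info": info, "signature": pow(fingerprint(info), d, n)}


def verify_certificate(ca_public, certificate):
    e, n = ca_public
    if pow(certificate["signature"], e, n) != fingerprint(certificate["info"]):
        return None                       # fingerprints differ: the info was modified
    return tuple(json.loads(certificate["info"])["public_key"])


ca_public, ca_private = make_keypair(61, 2203)
server_public, _ = make_keypair(127, 521)      # real server key S
mitm_public, _ = make_keypair(107, 1279)       # middleman key M

certificate = issue_certificate(ca_private, "www.example.com", server_public)

# The middleman replaces S with M in the info but keeps the CA signature.
info = json.loads(certificate["info"])
info["public_key"] = list(mitm_public)
tampered = {"info": json.dumps(info, sort_keys=True), "signature": certificate["signature"]}

print(verify_certificate(ca_public, certificate) == server_public)  # True
print(verify_certificate(ca_public, tampered))                      # None
```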

Can the middleman replace both the enterprise public key information and the data signature? No. The data signature is encrypted with the CA's private key, which the middleman will never know. If it signs with its own private key instead, the resulting signature cannot be decrypted correctly with the CA's public key.
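
This case can be simulated too. The middleman builds a certificate carrying its own key M and signs it with its own private key M`, because it does not have the CA's. Assumptions: toy textbook RSA (insecure, illustration only), SHA-256 and JSON-encoded certificate info. All names and keys are hypothetical. The forgery fails because decrypting with the CA public key does not undo a signature made with M`.

```python
import hashlib
import json


# Toy textbook RSA: illustration only, NOT secure (well-known primes, no padding).
def make_keypair(k1: int, k2: int, e: int = 65537):
    p, q = 2**k1 - 1, 2**k2 - 1          # Mersenne primes 2**k1 - 1, 2**k2 - 1
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, p * q), (d, p * q)        # (public key, private key)


def fingerprint(data: str) -> int:
    return int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest(), "big")


def issue_certificate(signer_private, domain: str, public_key) -> dict:
    info = json.dumps({"domain": domain, "public_key": list(public_key)}, sort_keys=True)
    d, n = signer_private
    return {"info": info, "signature": pow(fingerprint(info), d, n)}


def verify_certificate(ca_public, certificate):
    e, n = ca_public                     # always checked against the CA PUBLIC key
    if pow(certificate["signature"], e, n) != fingerprint(certificate["info"]):
        return None
    return tuple(json.loads(certificate["info"])["public_key"])


ca_public, ca_private = make_keypair(61, 2203)
server_public, _ = make_keypair(127, 521)
mitm_public, mitm_private = make_keypair(107, 1279)

genuine = issue_certificate(ca_private, "www.example.com", server_public)
forged = issue_certificate(mitm_private, "www.example.com", mitm_public)   # signed with M`

print(verify_certificate(ca_public, genuine) == server_public)   # True
print(verify_certificate(ca_public, forged))                     # None
```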

What if the middleman is itself a legitimate server with its own CA certificate? A CA certificate contains not only the enterprise's public key but also information such as its domain name. The client clearly sent its request to www.baidu.com, yet the certificate it receives is for www.qq.com, so the mismatch is caught immediately. In short, the certificate prevents a middleman from modifying the data the server sends to the client, and any modification will be detected.
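
Here is a sketch of the domain check added to verification. Both certificates below are legitimately issued by the same CA, but only one matches the domain the client asked for. Assumptions: toy textbook RSA (insecure, illustration only), SHA-256 and JSON-encoded certificate info. The keys are hypothetical and are not the real sites' keys.

```python
import hashlib
import json


# Toy textbook RSA: illustration only, NOT secure (well-known primes, no padding).
def make_keypair(k1: int, k2: int, e: int = 65537):
    p, q = 2**k1 - 1, 2**k2 - 1          # Mersenne primes 2**k1 - 1, 2**k2 - 1
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, p * q), (d, p * q)        # (public key, private key)


def fingerprint(data: str) -> int:
    return int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest(), "big")


def issue_certificate(ca_private, domain: str, public_key) -> dict:
    info = json.dumps({"domain": domain, "public_key": list(public_key)}, sort_keys=True)
    d, n = ca_private
    return {"info": info, "signature": pow(fingerprint(info), d, n)}


def verify_certificate(ca_public, certificate, expected_domain: str):
    e, n = ca_public
    if pow(certificate["signature"], e, n) != fingerprint(certificate["info"]):
        return None                       # signature check failed
    info = json.loads(certificate["info"])
    if info["domain"] != expected_domain:
        return None                       # valid certificate, but for another site
    return tuple(info["public_key"])


ca_public, ca_private = make_keypair(61, 2203)
baidu_public, _ = make_keypair(127, 521)
qq_public, _ = make_keypair(107, 1279)

baidu_cert = issue_certificate(ca_private, "www.baidu.com", baidu_public)
qq_cert = issue_certificate(ca_private, "www.qq.com", qq_public)   # legitimately issued

print(verify_certificate(ca_public, baidu_cert, "www.baidu.com") == baidu_public)  # True
print(verify_certificate(ca_public, qq_cert, "www.baidu.com"))                    # None
```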

By signing with its private key, the CA successfully prevents the middleman from modifying the server's information and from impersonating the server to communicate with the client. The middleman can still read the certificate, and it knows the CA's public key and the server's public key S, because both are public. That does not let it recover the symmetric key X, however: X is encrypted with S, and only the server's private key S` can decrypt it. So the middleman can neither modify the communication nor read its content.
