[Project] Lightweight HTTP server

1. Project introduction

This project implements an HTTP server. It reads and parses the HTTP request that a client sends over a plain network socket, then builds an HTTP response and returns it to the client.
The server follows the client-server (CS) model and is kept lightweight; its purpose is to understand how the HTTP protocol is processed.

Technologies involved:
C/C++, the HTTP protocol, socket programming, CGI, the singleton pattern, mutexes, condition variables, multithreading, and a thread pool.

2. Pre-knowledge

The HTTP protocol itself is covered in detail in an earlier article: [Network Programming] Application Layer Protocol - HTTP Protocol

2.1 URI, URL, URN

Some additional background:

  • Definition of URI, URL, URN

URI (Uniform Resource Identifier): identifies a unique resource.
URL (Uniform Resource Locator): locates a unique resource.
URN (Uniform Resource Name): identifies a resource by name.

A URI only has to guarantee that the resource is uniquely identified; a URL must identify the resource uniquely and also tell us where to find it.

2.2 CGI

A network request does one of two things:
1️⃣ Get resources from the server
2️⃣ Submit data to the server

Usually, resources are fetched with the GET method (parameters are passed in the URL), and data is uploaded with the POST method (parameters are passed in the request body).

Receiving the data is only the first step; the data still has to be processed. How?

The data is processed in CGI mode.
CGI (Common Gateway Interface) is an important Internet technology that lets a client, such as a web browser, request data from a program running on a web server. CGI describes a standard for transferring data between the server and the request handler.

2.2.1 The concept of CGI

The actual processing of the data has little to do with HTTP; it depends on the business scenario of the upper layer, so HTTP does not process the data itself. Instead, HTTP provides the CGI mechanism. The upper layer can deploy several CGI programs on the server, written in any programming language. When the HTTP server obtains data, it hands the data to the corresponding CGI program and then uses that program's output to build the HTTP response returned to the browser.

A CGI program is itself a running program. The logic diagram is as follows:
[Figure: CGI logic diagram]
Details to work out:

How to invoke the target CGI program, how to pass data to it, and how to get its processing result back.

  • When do I need to use CGI mode?

Whenever the user uploads data with a request, the server needs CGI mode to process that data.
If the user simply requests a resource file on the server, CGI mode is not needed; the file is returned directly.
If the user requests an executable program on the server, the user wants the server to run it, so CGI mode is needed in that case as well.

2.2.2 Realization of CGI mode

To have one program execute another, the obvious tool is program replacement (exec). But if the server replaces itself directly, its own code and data are replaced as well. So the first step is:
1️⃣ Create a child process and perform the program replacement in it

The next question: the CGI program processes data for us, so how does it get the data? The server process and the CGI process are parent and child, so anonymous pipes are the natural choice.
An anonymous pipe only carries data in one direction, and here data must flow both ways, so we use two anonymous pipes.
2️⃣ Create two anonymous pipes for the data transfer

The parent and child each record the file descriptors for the read and write ends of the pipes in variables. In the child, however, those variables are lost when exec performs the program replacement, because the replacement replaces the process's code and data.

After the replacement, the two anonymous pipes created in the kernel still exist, but the replaced CGI program no longer knows which file descriptors refer to them.

Solution:

Adopt a convention: in the replaced CGI program, reading from standard input means reading from the pipe, and writing to standard output means writing to the pipe.

This is achieved by redirecting the child's standard input and output to the pipes before the replacement.
3️⃣ Redirect operation
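
Steps 1️⃣ to 3️⃣ can be sketched in isolation as follows. This is only a minimal illustration, not the server's code: the helper name RoundTripThroughChild is made up for this example, and the child runs the standard cat utility as a stand-in for a real CGI program. It is meant for short inputs; a large payload would need the reads and writes interleaved so that neither pipe fills up.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: sends `data` to a child's stdin and returns what the
// child wrote to its stdout. The child runs `cat`, standing in for a CGI program.
std::string RoundTripThroughChild(const std::string& data)
{
    int to_child[2];    // parent writes, child reads
    int from_child[2];  // child writes, parent reads
    if (pipe(to_child) < 0) return "";
    if (pipe(from_child) < 0)
    {
        close(to_child[0]);
        close(to_child[1]);
        return "";
    }

    pid_t pid = fork();
    if (pid < 0)
    {
        close(to_child[0]);   close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return "";
    }
    if (pid == 0)
    {
        // Child: make the pipe ends its stdin/stdout, then drop the originals
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);   close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        // Program replacement: only this child is replaced, not the parent
        execlp("cat", "cat", (char*)nullptr);
        _exit(127);  // reached only if exec failed
    }

    // Parent: keep only the ends it uses
    close(to_child[0]);
    close(from_child[1]);
    size_t sent = 0;
    while (sent < data.size())
    {
        ssize_t n = write(to_child[1], data.data() + sent, data.size() - sent);
        if (n <= 0) break;
        sent += static_cast<size_t>(n);
    }
    close(to_child[1]);  // the child sees end-of-file on its stdin

    std::string result;
    char buf[4096];
    ssize_t n;
    while ((n = read(from_child[0], buf, sizeof buf)) > 0)
    {
        result.append(buf, static_cast<size_t>(n));
    }
    close(from_child[0]);
    waitpid(pid, nullptr, 0);
    return result;
}
```

The replaced program (cat here) never learns the pipe descriptors; it only uses file descriptors 0 and 1, which is exactly the convention described above.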

With the communication channel in place, the data can be delivered.
First, where is the parent process's data?

There are two request methods, GET and POST, so the parameters are either in the URL or in the body. The two cases are handled separately.
If the request method is GET, the user passes the parameters in the URL. Such parameters are usually short, and sending them through a pipe is comparatively inefficient. Instead, before the child process performs the program replacement, the parameters are put into an environment variable with putenv. Environment variables survive program replacement, so the replaced CGI program can read the parameters with getenv.
If the request method is POST, the user passes the parameters in the request body. The parent process writes the body into the pipe for the CGI program. For the CGI program to know how many bytes to read from the pipe, the length of the request body is also put into an environment variable with putenv.

But how does the child process know whether to read from the pipe or from an environment variable?

It needs to know the request method, which is also passed through an environment variable.

In summary:
In CGI mode, if the request method is POST, the CGI program reads the data passed by the parent from the pipe; if the request method is GET, the CGI program reads it from an environment variable.
Before the child performs the program replacement, the request method of this HTTP request must also be put into an environment variable with putenv.

This whole process is the fourth step:
4️⃣ The parent and child processes exchange data
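
Seen from the CGI program's side, step 4️⃣ might look like the sketch below. The function name ReadCgiParams is hypothetical; the environment variable names METHOD, QUERY_STRING and CONTENT_LENGTH are the ones the server code later in this article exports. The input stream is passed in as a parameter so the function can be tried without a real pipe; a real CGI program would pass std::cin.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical CGI-side helper: returns the raw parameter string of the request.
// GET  -> parameters come from the QUERY_STRING environment variable.
// POST -> CONTENT_LENGTH bytes are read from `in` (standard input in a real CGI).
std::string ReadCgiParams(std::istream& in)
{
    const char* method = std::getenv("METHOD");
    if (method == nullptr) return "";
    std::string m(method);

    if (m == "GET")
    {
        const char* query = std::getenv("QUERY_STRING");
        return query ? query : "";
    }
    if (m == "POST")
    {
        const char* len_str = std::getenv("CONTENT_LENGTH");
        long len = len_str ? std::strtol(len_str, nullptr, 10) : 0;
        if (len <= 0) return "";
        std::string body(static_cast<size_t>(len), '\0');
        in.read(&body[0], len);
        // Keep only what was actually read, in case the stream ended early
        body.resize(static_cast<size_t>(in.gcount()));
        return body;
    }
    return "";  // unsupported method
}
```

Reading exactly CONTENT_LENGTH bytes matters because the parent may keep its end of the pipe open, in which case the CGI program would never see end-of-file.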

2.2.3 Significance of CGI

The CGI mechanism lets the server hand the data it receives to the corresponding CGI program and then return that program's result to the client.
With CGI, the data entered in the browser ends up in the CGI program, and the output of the CGI program ends up in the browser. Developers of CGI programs can therefore ignore the processing logic of the server in between.
In effect, the CGI program reads the browser's input from standard input, and whatever it writes to standard output eventually reaches the browser. All the communication details in between are handled by the HTTP server; the CGI program does not need to care about them.

3. Project design

3.1 Log writing

While the server is running, we use a log to observe the events it generates.

The expected format of a log line is:

[log level][time][message][source file][line number]

Note:
There are four log levels:

#define NORMAL  0 // normal
#define WARNING 1 // warning
#define ERROR   2 // error
#define FATAL   3 // fatal error

Time: the current time when the entry is printed.
Message: the log message produced by the event.
Source file: the file in which the event occurred.
Line number: the line of that file at which the event occurred.

Source code:

#include <cstdio>
#include <ctime>
#include <iostream>
#include <string>

#define NORMAL  0 // normal
#define WARNING 1 // warning
#define ERROR   2 // error
#define FATAL   3 // fatal error

#define LOG_NOR "log.txt"
#define LOG_ERR "log.error"

const char* to_string_level(int level)
{
    switch(level)
    {
        case NORMAL: return "NORMAL";
        case WARNING: return "WARNING";
        case ERROR: return "ERROR";
        case FATAL: return "FATAL";
        default : return "UNKNOWN"; // never pass nullptr to a %s format
    }
}

#define LOG(level, format) logMessage(level, __FILE__, __LINE__, format)

void logMessage(int level, std::string file_name, int line,  std::string format)
{
    // [log level][time][message][source file][line number]
    char logprefix[1024];
    time_t now;
    time(&now);
    // localtime_r instead of localtime: the server logs from several threads
    struct tm tmbuf;
    struct tm *ptm = localtime_r(&now, &tmbuf);
    char timebuf[1024];
    snprintf(timebuf, sizeof timebuf, "%d-%02d-%02d %02d:%02d:%02d", ptm->tm_year + 1900, ptm->tm_mon + 1, ptm->tm_mday, ptm->tm_hour, ptm->tm_min, ptm->tm_sec);
    snprintf(logprefix, sizeof logprefix, "[%s][%s]", to_string_level(level), timebuf);
    std::cout << logprefix << "[" << format << "]" << "[" << file_name << "]" << "[" << line << "]" << std::endl;
}

3.2 Socket writing

Binding the server to a port is enough; there is no need to bind a specific IP explicitly. Set the IP address to INADDR_ANY, which binds the socket to all of the host's local network interfaces, so a connection arriving on any of the host's addresses is accepted. On a cloud server, do not bind the public IP directly: that address is virtualized by the cloud provider and is typically not configured on the host's own network interface, so binding it does not work.

Because there is only one server object in the whole process, it can be implemented as a singleton.

#include <cstdlib>
#include <strings.h>
#include <netinet/in.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

#define BACKLOG 5 // maximum length of the pending-connection queue

class TCPServer
{
public:
    // Get the singleton object
    static TCPServer* GetSingle(int port)
    {
        // Statically initialized mutex: no init/destroy calls needed
        static pthread_mutex_t Lock = PTHREAD_MUTEX_INITIALIZER;
        if(_TCPSingle == nullptr)
        {
            pthread_mutex_lock(&Lock);
            if(_TCPSingle == nullptr)
            {
                // Publish the pointer only after the server is fully initialized
                TCPServer* server = new TCPServer(port);
                server->InitServer(); // initialize the server
                _TCPSingle = server;
            }
            pthread_mutex_unlock(&Lock);
        }
        return _TCPSingle;
    }

    void InitServer()
    {
        // 1. Create the socket
        // 2. Bind
        // 3. Start listening
        Socket();
        Bind();
        Listen();
        LOG(NORMAL, "Init Server Success");
    }

    // Create the socket
    void Socket()
    {
        _listensock = socket(AF_INET/*IPv4 networking*/, SOCK_STREAM/*stream socket*/, 0/*protocol*/);
        if(_listensock < 0)
        {
            LOG(FATAL, "Socket Error!");
            exit(1);
        }
        // Allow the address/port to be reused
        int opt = 1;
        setsockopt(_listensock, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof opt);
        LOG(NORMAL, "Create Socket Success");
    }

    // Lets the upper layer obtain the listening socket
    int sock()
    {
        return _listensock;
    }

    // Bind
    void Bind()
    {
        struct sockaddr_in local;
        // Zero the structure
        bzero(&local, sizeof local);
        local.sin_family = AF_INET;
        local.sin_port = htons(_port);
        local.sin_addr.s_addr = INADDR_ANY; // bind to all local addresses
        if(bind(_listensock, (struct sockaddr*)&local, sizeof local) < 0)
        {
            // Bind failed
            LOG(FATAL, "Bind Socket Error!");
            exit(2);
        }
        LOG(NORMAL, "Bind Socket Success");
    }

    // Listen
    void Listen()
    {
        if(listen(_listensock, BACKLOG) < 0)
        {
            // Listen failed
            LOG(FATAL, "Listen Socket Error!");
            exit(3);
        }
        LOG(NORMAL, "Listen Socket Success");
    }

    ~TCPServer()
    {
        if(_listensock >= 0)
            close(_listensock);
    }
private:
    // Private constructor + no copying
    TCPServer(int port)
        : _port(port)
        , _listensock(-1)
    {
    }

    TCPServer(const TCPServer&)=delete;
    TCPServer* operator=(const TCPServer&)=delete;
private:
    int _port; // port number
    int _listensock; // listening socket
    static TCPServer* _TCPSingle; // singleton object
};

TCPServer* TCPServer::_TCPSingle = nullptr;
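
As an aside, since C++11 a function-local static gives the same "create once, thread-safely" guarantee without a hand-written mutex. The sketch below shows that alternative technique on a stripped-down, hypothetical Server class; it is not the TCPServer above, only an illustration of the pattern.

```cpp
#include <cassert>

// Hypothetical stand-in for a server class, reduced to what the pattern needs
class Server
{
public:
    // The local static is initialized exactly once, on the first call;
    // C++11 guarantees that this initialization is thread-safe.
    static Server& Instance(int port)
    {
        static Server instance(port);
        return instance;
    }

    int port() const { return _port; }

    // No copying
    Server(const Server&) = delete;
    Server& operator=(const Server&) = delete;

private:
    explicit Server(int port) : _port(port) {}
    int _port;
};
```

As with GetSingle, the port passed on later calls is ignored: only the first call constructs the object.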

3.3 HTTP server implementation

The socket code above creates the socket, binds the port, and puts the socket into the listening state. The next step is to accept new connections.

#define PORT 8080

class HTTPServer
{
public:
    HTTPServer(int port = PORT)
        : _port(port)
    {
    }

    void InitServer()
    {
        tcp_server = TCPServer::GetSingle(_port);
    }

    // Start the server
    void Start()
    {
        LOG(NORMAL, "HTTP Start");
        // Listening socket
        int listen_sock = tcp_server->sock();
        while(true)
        {
            struct sockaddr_in peer;
            socklen_t len = sizeof(peer);
            // Accept a new connection
            int sock = accept(listen_sock, (struct sockaddr*)&peer, &len);
            if(sock < 0)
            {
                continue; // accept failed, try again
            }
            LOG(NORMAL, "Accept Link Success");
            int* _sock = new int(sock);
            pthread_t tid;
            if(pthread_create(&tid, nullptr, Enter::Handler, _sock) != 0)
            {
                // No thread was started: release the connection ourselves
                LOG(ERROR, "Create Thread Error!");
                close(sock);
                delete _sock;
                continue;
            }
            pthread_detach(tid); // detach the thread
        }
    }

    ~HTTPServer()
    {
    }
private:
    int _port;
    TCPServer* tcp_server = nullptr;
};

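Why int* _sock = new int(sock)? If the address of the local variable sock were passed to the thread, the next loop iteration could overwrite sock with a new connection before the thread had read it. Each thread therefore gets its own heap-allocated copy, which the thread function frees.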
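
The ownership hand-off can be tried in isolation with the sketch below. The names Worker and RunWorkers are made up for this example, and the threads are joined rather than detached so that their results can be collected; the server itself detaches its threads.

```cpp
#include <cassert>
#include <pthread.h>
#include <vector>

// Thread entry point: takes ownership of a heap-allocated int, as the
// request handler does with its socket descriptor.
static void* Worker(void* arg)
{
    int* p = static_cast<int*>(arg);
    int value = *p;
    delete p;                   // the thread frees its private copy
    return new int(value * 2);  // result handed back through pthread_join
}

// Hypothetical driver: one thread per value, each with its own heap copy.
std::vector<int> RunWorkers(const std::vector<int>& values)
{
    std::vector<pthread_t> tids;
    for (int v : values)
    {
        int* arg = new int(v);  // never pass &v: it changes on the next iteration
        pthread_t tid;
        if (pthread_create(&tid, nullptr, Worker, arg) != 0)
        {
            delete arg;         // no thread took ownership
            continue;
        }
        tids.push_back(tid);
    }

    std::vector<int> results;
    for (pthread_t tid : tids)
    {
        void* ret = nullptr;
        pthread_join(tid, &ret);
        int* r = static_cast<int*>(ret);
        results.push_back(*r);
        delete r;
    }
    return results;
}
```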

  • Thread function
class Enter
{
public:
    static void *Handler(void* sock)
    {
        LOG(NORMAL, "Handler Request Begin");
        int _sock = *(int*)sock;
        delete (int*)sock;
        EndPoint *ep = new EndPoint(_sock);
        ep->RecvHTTPRequest();
        ep->BuildHTTPResponse();
        ep->SendHTTPResponse();
        delete ep;
        LOG(NORMAL, "Handler Request End");
        return nullptr;
    }
};

The EndPoint class used here is explained below.

3.4 HTTP request and response structure

  • HTTP request class
// HTTP request
class Request
{
public:
    // Raw content of the HTTP request
    std::string req_line; // request line
    std::vector<std::string> req_header; // request headers
    std::string req_blank; // blank line
    std::string req_body; // request body

    // Fields parsed from the request line
    std::string method; // request method
    std::string uri; // requested resource
    std::string version; // HTTP version

    std::unordered_map<std::string, std::string> header_kv;
    int content_length = 0;

    // uri: path?args
    std::string path; // path
    std::string args; // parameters
    std::string suffix; // file suffix
    int fd_size = 0; // size of the resource file

    bool cgi = false; // whether to use CGI mode
};

After a request is received, its data is stored in the first four members (request line, request headers, blank line, request body). The request line contains three fields, so three more members hold the request method, the requested resource, and the version. To make the request headers easy to look up later, they are stored in the header_kv map. The remaining members describe the requested resource and record whether CGI mode is used.

  • HTTP response class
// HTTP response
class Response
{
public:
    // Content of the HTTP response
    std::string status_line; // status line
    std::vector<std::string> resp_header; // response headers
    std::string resp_blank = LINE_SEP; // blank line
    std::string resp_body; // response body

    int status_code = OK; // status code
    int fd = -1; // fd of the file to send as the response
};

A note on fd: for a non-CGI response the body is the resource file referred to by fd, while for a CGI response the body is stored in _resp.resp_body.

3.5 Implementation of the EndPoint class

3.5.1 Basic logic of EndPoint

The term endpoint is often used for the two ends of a communication. For example, when a client communicates with a server, the client is one endpoint and the server is the other. The class that handles a request is therefore named EndPoint. Its main jobs: read and parse the request, build the response, and perform the IO.

Basic structure:

// Reads and parses the request, builds the response, performs the IO
class EndPoint
{
public:
    EndPoint(int sock)
        : _sock(sock)
    {
    }

    // Read the request
    void RecvHTTPRequest()
    {
    }

    // Build the response
    void BuildHTTPResponse()
    {
    }

    // Send the response
    void SendHTTPResponse()
    {
    }

    ~EndPoint()
    {
        close(_sock);
    }
private:
    int _sock;
    Request _req;
    Response _resp;
};

You can see that the processing flow is:
1️⃣ Read the request
2️⃣ Build the response
3️⃣ Send the response

3.5.2 Read request

The HTTP request can be parsed while it is being read. There are five steps: read the request line, read the request headers and blank line, parse the request line, parse the request headers, and read the request body.

// Read the request
void RecvHTTPRequest()
{
    // Read the request line + request headers
    RecvHTTpRequestLine();
    RecvHTTpRequestHeader();
    // Parse the request line + request headers
    PraseHTTPRequestLine();
    PraseHTTPRequestHeader();
    // Read the request body
    RecvHTTPRequsetBody();
}
  • Read request line

Different clients may end a line in one of three ways:
\r, \n, or \r\n.
So we cannot simply read up to one fixed terminator; a custom method is needed.
We write a ReadLine method in a Util tool class. It works as follows:

Read one character at a time.
If the character is \n, the line separator is \n: push \n into the caller's buffer and stop reading.
If the character is \r, peek at the next character to see whether it is \n. If it is, consume it. Either way, push \n into the caller's buffer and stop reading.

// Read one line
static int ReadLine(int sock, std::string& out)
{
    char ch;
    do{
       // Read one character at a time
       ssize_t n = recv(sock, &ch, 1, 0);
       if(n > 0)
       {
            // Read succeeded
            if(ch == '\r')
            {
                // '\r' || "\r\n"
                // Peek: look at the next character without consuming it
                recv(sock, &ch, 1, MSG_PEEK);
                if(ch == '\n')
                {
                    // Consume the '\n' of "\r\n"; ch now holds '\n' instead of '\r'
                    recv(sock, &ch, 1, 0);
                }
                else
                {
                    // '\r' -> '\n'
                    ch = '\n';
                }
            }
            out.push_back(ch);
       }
       else if(n == 0)
       {
            // Peer closed the connection
            return 0;
       }
       else
       {
            // Read error
            return -1;
       }
    }while(ch != '\n');
    return out.size();
}

Explanation: with MSG_PEEK as the last argument, recv returns the requested number of bytes from the head of the TCP receive buffer but does not remove them from the buffer. This is known as peeking at the data.
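
To see the normalization rules without a socket, the same logic can be applied to an in-memory buffer. This is just a sketch: SplitLines is a hypothetical helper, and looking at buf[i + 1] plays the role that MSG_PEEK plays in ReadLine.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits `buf` into lines, each ending in '\n'
// regardless of whether the original terminator was \r, \n or \r\n.
std::vector<std::string> SplitLines(const std::string& buf)
{
    std::vector<std::string> lines;
    std::string line;
    for (size_t i = 0; i < buf.size(); ++i)
    {
        char ch = buf[i];
        if (ch == '\r')
        {
            // "Peek" at the next character; skip it only if it is '\n'
            if (i + 1 < buf.size() && buf[i + 1] == '\n')
            {
                ++i;
            }
            ch = '\n';  // '\r' and "\r\n" both become '\n'
        }
        line.push_back(ch);
        if (ch == '\n')
        {
            lines.push_back(line);
            line.clear();
        }
    }
    if (!line.empty())
    {
        lines.push_back(line);  // trailing data without a terminator
    }
    return lines;
}
```

For the input "GET / HTTP/1.1\r\nHost: a\rX: b\n\r\n" this yields four lines, the last one being the bare "\n" that marks the end of the headers.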

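  • Read request headers and blank line

These are also read line by line, using the ReadLine function:

// Read the request headers and the blank line
bool RecvHTTpRequestHeader()
{
    std::string line;
    // Read the headers
    while(true)
    {
        line.clear();
        if(Util::ReadLine(_sock, line) <= 0)
        {
            // Peer closed or read error: stop instead of looping forever
            return false;
        }
        if(line == "\n")
        {
            // Blank line reached
            _req.req_blank = line;
            break;
        }
        // Drop the trailing '\n'
        line.resize(line.size() - 1);
        _req.req_header.push_back(line);
        LOG(NORMAL, line);
    }
    return true;
}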
  • Parse request line

This step splits the request line into the request method, the URI, and the HTTP version. To be tolerant of case differences, the request method is converted to upper case:

// Parse the request line
void PraseHTTPRequestLine()
{
    std::string& line = _req.req_line;
    std::stringstream ss(line);
    ss >> _req.method >> _req.uri >> _req.version;
    // Convert the request method to upper case
    std::transform(_req.method.begin(), _req.method.end(), _req.method.begin(), ::toupper);
}
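
As a worked example, the same parsing can be written as a free function and run on a sample request line. RequestLine and ParseRequestLine are hypothetical names for this sketch; unlike the member function above, it also reports a line with fewer than three fields.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical holder for the three fields of a request line
struct RequestLine
{
    std::string method;
    std::string uri;
    std::string version;
};

// Splits "METHOD URI VERSION" on whitespace and upper-cases the method.
// Returns false if any of the three fields is missing.
bool ParseRequestLine(const std::string& line, RequestLine& out)
{
    std::istringstream ss(line);
    if (!(ss >> out.method >> out.uri >> out.version))
    {
        return false;
    }
    std::transform(out.method.begin(), out.method.end(), out.method.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return true;
}
```

Given "get /index.html?x=1 HTTP/1.1\n", the fields come out as GET, /index.html?x=1 and HTTP/1.1; the trailing newline is skipped by the stream extraction.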
  • Parse request headers

Each request header has the form name: value, so the key-value pair of each line is stored in header_kv; the value can then be looked up by header name.

The splitting can also be wrapped in a method of the Util tool class:

// Split a string at the first occurrence of sep
static bool CutString(const std::string& body, std::string& sub1, std::string& sub2, const std::string& sep)
{
    size_t pos = body.find(sep);
    if(pos == std::string::npos)
    {
        return false;
    }
    sub1 = body.substr(0, pos);
    sub2 = body.substr(pos + sep.size());
    return true;
}
// Parse the request headers
void PraseHTTPRequestHeader()
{
    std::string key;
    std::string val;
    for(auto& e : _req.req_header)
    {
        if(Util::CutString(e, key, val, ": "))
        {
            _req.header_kv.insert({key, val});
        }
    }
}
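
One thing the code above does not handle: HTTP header names are case-insensitive, and the space after the colon is optional, so a client may legally send content-length:12. A possible variation, sketched here with hypothetical helper names (ToLower, ParseHeaders), stores the keys in lower case so that lookups no longer depend on how the client spelled the name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: lower-cases an ASCII string
static std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical variant of the header parser: keys are stored lower-cased and
// optional whitespace after the colon is skipped.
std::unordered_map<std::string, std::string>
ParseHeaders(const std::vector<std::string>& lines)
{
    std::unordered_map<std::string, std::string> kv;
    for (const auto& line : lines)
    {
        size_t pos = line.find(':');
        if (pos == std::string::npos)
        {
            continue;  // not a "name: value" line
        }
        std::string key = ToLower(line.substr(0, pos));
        size_t start = line.find_first_not_of(" \t", pos + 1);
        std::string val = (start == std::string::npos) ? "" : line.substr(start);
        kv[key] = val;
    }
    return kv;
}
```

With this variant, the lookup key would be "content-length" rather than "Content-Length".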
  • Read request body

First determine whether there is a body; here only the POST method may carry one. If the request method is POST, we also need the Content-Length header to know how long the body is before reading it.

// Whether a body needs to be read
bool IsNeedRecvHTTPRequsetBody()
{
    // Decide from the method whether there is a body
    if(_req.method == "POST")
    {
        // There is a body
        auto it = _req.header_kv.find("Content-Length");
        if(it != _req.header_kv.end())
        {
            LOG(NORMAL, "POST Method, Content-Length: " + it->second);
            _req.content_length = atoi(it->second.c_str());
            return true;
        }
    }
    return false;
}

// Read the body
void RecvHTTPRequsetBody()
{
    if(IsNeedRecvHTTPRequsetBody())
    {
        int len = _req.content_length;
        char ch;
        while(len--)
        {
            ssize_t n = recv(_sock, &ch, 1, 0);
            if(n > 0)
            {
                _req.req_body += ch;
            }
            else break;
        }
    }
    LOG(NORMAL, _req.req_body);
}
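
Reading the body one byte per recv call works, but it costs one system call per byte. A chunked variant might look like the sketch below; RecvExact is a hypothetical helper, not part of the project, and it does not retry on EINTR.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: reads exactly `len` bytes from `sock` into `out`.
// Returns false if the peer closes or an error occurs before `len` bytes
// arrive; `out` then holds whatever was received up to that point.
bool RecvExact(int sock, size_t len, std::string& out)
{
    out.clear();
    char buf[4096];
    while (out.size() < len)
    {
        // Never ask for more than is still missing, so that bytes belonging
        // to a following request stay in the socket buffer
        size_t want = std::min(sizeof buf, len - out.size());
        ssize_t n = recv(sock, buf, want, 0);
        if (n <= 0)
        {
            return false;
        }
        out.append(buf, static_cast<size_t>(n));
    }
    return true;
}
```

The loop is still needed because recv may return fewer bytes than requested, for example when the body arrives in several TCP segments.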

3.5.3 Building the response

Once the request has been received, it has to be processed, and processing can fail. Whatever the error, the client expects feedback, so the response is built according to a status code. The following status codes are defined:

// Status codes
#define OK 200
#define BAD_REQUEST 400 // incorrect request method
#define NOT_FOUND 404 // requested resource does not exist
#define SERVER_ERROR 500 // server error

The first thing to do is analyze the incoming request:

1️⃣ Check the method first. If the method is not supported, set the status code to BAD_REQUEST and build the response directly.
If it is GET, there are two cases: the URI carries parameters, or it does not. With parameters, extract the request path and the parameters, and process the request in CGI mode. Without parameters, just extract the request path and process the request without CGI.
If it is POST, check whether there is a body. No body means no parameters, and the request is processed without CGI. A body means there are parameters, and the request is processed in CGI mode.
2️⃣ Next, analyze the requested path. First prepend the web root directory (a directory we define ourselves) to the path, then check whether the path ends with /. If it does, the path is a directory; we cannot return a whole directory, so each directory gets a default resource, index.html.
The requested resource may also be an executable program. How can we tell?
Use the stat function to get the attributes of the requested file. If the resource is an executable program, the subsequent processing must use CGI mode.

We also need to know the type of the requested resource in order to build the response.
The type can be determined from the suffix of the requested resource.
So we write a function that maps a suffix to a resource type; it is used later when filling in Content-Type.
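
The path handling described above can be sketched as one small function that needs no file system. Everything named here is an assumption for illustration: Target and ResolveUri are hypothetical, and the value "wwwroot" for the web root is a guess (only the default page index.html is stated in the text). Unlike the project code, this sketch only accepts a dot that appears after the last slash as the start of a suffix.

```cpp
#include <cassert>
#include <string>

// Assumed values: the project defines WEB_ROOT and HOME_PAGE elsewhere
const std::string kWebRoot = "wwwroot";
const std::string kHomePage = "index.html";

// Hypothetical result type: where the resource lives and how it was requested
struct Target
{
    std::string path;    // file system path under the web root
    std::string args;    // text after '?', empty if there is none
    std::string suffix;  // file suffix including the dot, ".html" by default
};

Target ResolveUri(const std::string& uri)
{
    Target t;
    size_t q = uri.find('?');
    std::string raw = uri.substr(0, q);  // whole uri when there is no '?'
    if (q != std::string::npos)
    {
        t.args = uri.substr(q + 1);
    }
    t.path = kWebRoot + raw;
    if (!t.path.empty() && t.path.back() == '/')
    {
        t.path += kHomePage;  // a directory is answered with its default page
    }
    size_t dot = t.path.rfind('.');
    size_t slash = t.path.rfind('/');
    if (dot != std::string::npos && (slash == std::string::npos || dot > slash))
    {
        t.suffix = t.path.substr(dot);
    }
    else
    {
        t.suffix = ".html";  // no suffix: fall back to the default type
    }
    return t;
}
```

For example, "/" resolves to wwwroot/index.html, and "/cgi/calc?x=1&y=2" resolves to the path wwwroot/cgi/calc with the arguments x=1&y=2.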

// Map a file suffix to a resource type
static std::string SuffixToDesc(const std::string& suffix)
{
    static std::unordered_map<std::string, std::string> suffix_to_desc = {
        {".html", "text/html"},
        {".css", "text/css"},
        {".js", "text/javascript"},
        {".jpg", "image/jpeg"},
        {".xml", "text/xml"}
    };
    auto it = suffix_to_desc.find(suffix);
    if(it != suffix_to_desc.end())
    {
        return it->second;
    }
    return "text/html"; // unknown suffix: treat the resource as an HTML file
}

The flow for building a response:

// Build the response
void BuildHTTPResponse()
{
    // Validate the request
    struct stat st;
    size_t suf_pos = 0; // position of the suffix
    if(_req.method != "GET" && _req.method != "POST")
    {
        LOG(WARNING, "Method Error!");
        _resp.status_code = BAD_REQUEST;
        goto END;
    }
    if(_req.method == "GET")
    {
        auto pos = _req.uri.find("?");
        if(pos != std::string::npos)
        {
            // The uri carries parameters
            Util::CutString(_req.uri, _req.path, _req.args, "?");
            _req.cgi = true;
        }
        else
        {
            // The uri carries no parameters
            _req.path = _req.uri;
        }
    }
    else if(_req.method == "POST")
    {
        // Process the data with CGI
        _req.cgi = true;
        _req.path = _req.uri;
        // No parameters: do not use CGI
        if(_req.content_length == 0)
        {
            _req.cgi = false;
        }
    }
    else
    {
        // do nothing
    }
    // Prepend the web root directory
    _req.path = WEB_ROOT + _req.path;
    if(_req.path[_req.path.size() - 1] == '/')
    {
        // Append the home page
        _req.path += HOME_PAGE;
    }
    // Check whether the path exists
    if(stat(_req.path.c_str(), &st) == 0)
    {
        // The resource exists
        if(S_ISDIR(st.st_mode))
        {
            // A directory was requested
            _req.path += "/";
            _req.path += HOME_PAGE;
            // Get the attributes again, now for the home page
            if(stat(_req.path.c_str(), &st) != 0)
            {
                LOG(WARNING, _req.path + " Not Find");
                _resp.status_code = NOT_FOUND;
                goto END;
            }
        }
        // Does owner, group, or others have execute permission?
        if( (st.st_mode & S_IXUSR) || (st.st_mode & S_IXGRP) || (st.st_mode & S_IXOTH) )
        {
            // Executable program
            _req.cgi = true;
        }
        _req.fd_size = st.st_size;
    }
    else
    {
        std::string msg = _req.path;
        msg += " Not Find";
        LOG(WARNING, msg);
        _resp.status_code = NOT_FOUND;
        goto END;
    }

    // Extract the suffix to determine the resource type
    suf_pos = _req.path.rfind(".");
    if(suf_pos == std::string::npos)
    {
        // No suffix found: use the default
        _req.suffix = ".html";
    }
    else
    {
         _req.suffix = _req.path.substr(suf_pos);
    }

    if(_req.cgi == true)
    {
        // Process the request with CGI
         _resp.status_code = ProcessCGI();
    }
    else
    {
        // Process the request without CGI:
        // return a static page + HTTP response
        _resp.status_code = ProcessNoCGI();
    }
END:
    // Build the response according to the status code
    BuildHTTPResponseHelper();
}

Notes:

stat is a system call that retrieves the attributes of a file, including its inode number, permissions, size, and so on. If stat fails, the requested resource can be treated as nonexistent: set the status code to NOT_FOUND and stop processing.
Checking for an executable file: if any of the owner, group, or others has execute permission on the file, it is treated as executable.
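
The checks made with stat can be isolated into a small classifier. This is a sketch with hypothetical names (ResourceKind, Classify); the permission test is the same one used in BuildHTTPResponse.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical classification of a requested path
enum class ResourceKind { Missing, Directory, Executable, Regular };

ResourceKind Classify(const char* path)
{
    struct stat st;
    if (stat(path, &st) != 0)
    {
        return ResourceKind::Missing;     // would become NOT_FOUND
    }
    if (S_ISDIR(st.st_mode))
    {
        return ResourceKind::Directory;   // would get the home page appended
    }
    // Directories are checked first because they normally carry the x bit too
    if (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH))
    {
        return ResourceKind::Executable;  // would be run in CGI mode
    }
    return ResourceKind::Regular;         // would be returned as a static file
}
```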

3.5.3.1 CGI processing

The CGI flow was outlined above; here are the details.
Using CGI mode means there are parameters to pass to the CGI program and a result to pass back, so two pipes are needed.
The pipes are named from the parent process's point of view: the pipe the parent reads from is called input, and the pipe the parent writes to is called output.

[Figure: diagram of the input and output pipes]
However, once the child performs program replacement, the variables in which it saved the input and output file descriptors are lost, because program replacement replaces all code and data, even though the pipes themselves still exist. So that the replaced program can still use the pipes, we redirect the two file descriptors to standard input and standard output before the replacement.

After that, when the child writes to standard output it is writing to the input pipe, and when it reads from standard input it is reading from the output pipe.

In addition, several parameters must be passed before the child performs the program replacement:

First, the request method is put into an environment variable with putenv, so that the CGI program can decide how to read the parameters passed by the parent.
If the request method is GET, the parameters carried in the URL are passed to the CGI program through an environment variable.
If the request method is POST, the length of the request body is passed to the CGI program through an environment variable, so that the CGI program knows how many bytes to read from the pipe.

At this point, for GET the parameters have already been passed; for POST the CGI program only knows how many bytes to expect.
1️⃣ So the parent process writes the parameters of the POST request into the pipe.
2️⃣ The next step is to collect the result produced by the CGI program: the parent repeatedly calls read to fetch the output the CGI program wrote to the pipe.

int ProcessCGI()
{
    LOG(NORMAL, "Process cgi method");
    int code = OK;
    // The executable to run is in path
    auto& bin = _req.path;
    // putenv stores the pointer it is given instead of copying the string,
    // so these strings must stay alive until execl is called
    std::string method_env;
    std::string query_string_env;
    std::string content_length_env;

    // Create two pipes, named from the parent's point of view
    int input[2];  // parent reads, child writes
    int output[2]; // parent writes, child reads
    if(pipe(input) < 0)
    {
        LOG(ERROR, "Pipe Input Error!");
        code = SERVER_ERROR;
        return code;
    }
    if(pipe(output) < 0)
    {
        LOG(ERROR, "Pipe Output Error!");
        close(input[0]);
        close(input[1]);
        code = SERVER_ERROR;
        return code;
    }

    pid_t id = fork();
    if(id == 0)
    {
        // Child process
        close(input[0]);
        close(output[1]);
        // Pass the request method through an environment variable
        method_env = "METHOD=";
        method_env += _req.method;
        std::cout << "Method_env: " << method_env << std::endl;
        putenv((char*)method_env.c_str());

        if(_req.method == "GET")
        {
            // key=value model
            // Put the parameters into an environment variable
            query_string_env = "QUERY_STRING=";
            query_string_env += _req.args;
            putenv((char*)query_string_env.c_str());
            LOG(NORMAL, "Get Method, Add QUERY_STRING");
        }
        else if(_req.method == "POST")
        {
            // Put the body length into an environment variable
            content_length_env = "CONTENT_LENGTH=";
            content_length_env += std::to_string(_req.content_length);
            putenv((char*)content_length_env.c_str());
            LOG(NORMAL, "POST Method, Add CONTENT_LENGTH");
        }
        else
        {
            // Do Nothing
        }
        std::cout << "bin: " << bin << std::endl;
        // Redirect: stdin reads from output, stdout writes to input
        dup2(output[0], 0);
        dup2(input[1], 1);
        // Program replacement
        execl(bin.c_str(), bin.c_str(), nullptr);
        exit(1);
    }
    else if(id < 0)
    {
        LOG(ERROR, "Fork Error!");
        close(input[0]);
        close(input[1]);
        close(output[0]);
        close(output[1]);
        return SERVER_ERROR;
    }
    else
    {
        // Parent process
        close(input[1]);
        close(output[0]);
        if(_req.method == "POST")
        {
            // The pipe may be smaller than the body, so write in a loop
            const char* start = _req.req_body.c_str();
            int total = 0;
            int size = 0;
            while(total < _req.content_length && (size = write(output[1], start + total, _req.req_body.size() - total)) > 0)
            {
                total += size;
            }
        }
        // Nothing more to send: closing the write end lets the child see EOF
        close(output[1]);

        char ch;
        while(read(input[0], &ch, 1) > 0)
        {
            _resp.resp_body.push_back(ch);
        }
        int status; // exit status of the child
        pid_t ret = waitpid(id, &status, 0);
        if(ret == id)
        {
            if(WIFEXITED(status)) // the child exited normally
            {
                if(WEXITSTATUS(status) == 0)
                {
                    code = OK;
                }
                else
                {
                    code = BAD_REQUEST;
                }
            }
            else
            {
                // The child terminated abnormally
                code = SERVER_ERROR;
            }
        }
        // Close the remaining file descriptor
        close(input[0]);
    }
    return code;
}

Explain:

WIFEXITED tells us whether the child process exited normally. Depending on the value of status, the parent sets a different status code, which is used later to build the response.
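The two-pipe exchange and the status check can be tried in isolation. The following is a minimal sketch, not the project's code: RunChild, StatusToCode and the k... constants are hypothetical names, and the child is any program that reads standard input and writes standard output. It is only suitable for small inputs and outputs, because the parent writes everything before it starts reading.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-ins for the project's OK / BAD_REQUEST / SERVER_ERROR.
enum { kOk = 200, kBadRequest = 400, kServerError = 500 };

// Map a waitpid() status to a response code, following the same rule as ProcessCGI.
int StatusToCode(int status)
{
    if (WIFEXITED(status))                  // child returned from main or called exit
        return WEXITSTATUS(status) == 0 ? kOk : kBadRequest;
    return kServerError;                    // child was terminated by a signal
}

// Run `prog` with `input` on its stdin and collect its stdout in `out`.
int RunChild(const char* prog, const std::string& input, std::string& out)
{
    int to_child[2], from_child[2];
    if (pipe(to_child) < 0) return kServerError;
    if (pipe(from_child) < 0) { close(to_child[0]); close(to_child[1]); return kServerError; }

    pid_t id = fork();
    if (id < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return kServerError;
    }
    if (id == 0) {
        dup2(to_child[0], 0);               // child's stdin  <- parent
        dup2(from_child[1], 1);             // child's stdout -> parent
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execl(prog, prog, (char*)nullptr);
        _exit(1);                           // only reached if execl fails
    }

    close(to_child[0]);
    close(from_child[1]);
    size_t total = 0;
    while (total < input.size()) {          // a pipe may accept only part of the data
        ssize_t n = write(to_child[1], input.data() + total, input.size() - total);
        if (n <= 0) break;
        total += static_cast<size_t>(n);
    }
    close(to_child[1]);                     // lets a child that reads until EOF finish

    char buf[256];
    ssize_t n;
    while ((n = read(from_child[0], buf, sizeof(buf))) > 0) out.append(buf, n);
    close(from_child[0]);

    int status = 0;
    if (waitpid(id, &status, 0) != id) return kServerError;
    return StatusToCode(status);
}
```

For example, running /bin/cat with the input "x=10&y=20" should hand the same bytes back together with the success code, because cat exits with status 0.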

3.5.3.2 Non-CGI processing

In fact, non-CGI processing is very simple: there are no parameters, so the request must be for a static resource such as a web page, and we only need to return the resource and build the response. The response itself is built later from the status code, so here we only need to consider how to return the static web page.

The usual approach is to open the file, read its contents into _resp.resp_body, and then build the response and send it. However, there is a way to avoid copying the data into _resp.resp_body (that is, into a user-level buffer): the data is copied inside the kernel and sent directly to the peer by the kernel.

This is what the sendfile function does: it copies data from one file descriptor to another, and the copy is performed in the kernel.
However, calling sendfile sends the file immediately, so it should only be called after the response has been built. For now, our job is just to open the target file and save the resulting file descriptor in the fd member of the HTTP response.

// Return a static web page + response
int ProcessNoCGI()
{
    _resp.fd = open(_req.path.c_str(), O_RDONLY); // Read-only
    if(_resp.fd >= 0)
    {
        return OK;
    }
    return NOT_FOUND;
}
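To make the in-kernel copy described above concrete, here is a minimal, Linux-specific sketch of sending a whole file with sendfile. SendWholeFile is a hypothetical helper, not part of the project; it loops because a single sendfile call may transfer fewer bytes than requested.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Send the whole file at `path` to `out_fd` without copying it through a
// user-space buffer. Returns the number of bytes sent, or -1 on failure.
ssize_t SendWholeFile(int out_fd, const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    off_t offset = 0;                       // sendfile advances this for us
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, fd, &offset, st.st_size - offset);
        if (n <= 0) { close(fd); return -1; }
    }
    close(fd);
    return offset;
}
```

In the server, out_fd would be the connected socket and the call would come after the status line, headers and blank line have been sent.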

At this point the data has been processed and a result obtained; the next step is to build the response according to the status code.
Whatever the status code is, the status line must be filled in first. The response headers depend on the situation:
1️⃣ If the status code indicates an error, a static error page is always returned, so all error status codes can share one function that fills in the headers.
2️⃣ For a successful status code, we also need to consider where the body is stored, because the request may have been handled with or without CGI: the non-CGI body is referenced by fd, while the CGI body is stored in _resp.resp_body, so the headers are filled in accordingly.

// For errors, what we return is a static error page
void HandlerError(const std::string& page)
{
    _req.cgi = false; // Make sure a web page is what finally gets sent
    _resp.fd = open(page.c_str(), O_RDONLY);
    if(_resp.fd >= 0) // 0 is a valid descriptor, so test >= 0
    {
        // Fill in the headers
        // Get the file attributes
        struct stat st;
        stat(page.c_str(), &st);
        _req.fd_size = st.st_size;
        std::string line = "Content-Type: text/html";
        line += LINE_SEP;
        _resp.resp_header.push_back(line);
        line = "Content-Length: ";
        line += std::to_string(st.st_size);
        line += LINE_SEP;
        _resp.resp_header.push_back(line);
    }
}

// Build the headers of an OK response
void BuildOkResponse()
{
    std::string line = "Content-Type: ";
    line += SuffixToDesc(_req.suffix);
    line += LINE_SEP;
    _resp.resp_header.push_back(line);
    // Size of the body
    line = "Content-Length: ";
    if(_req.cgi)
    {
        // CGI: the body is stored in resp_body
        line += std::to_string(_resp.resp_body.size());
    }
    else
    {
        // Non-CGI: the body is the file referenced by fd
        line += std::to_string(_req.fd_size);
    }
    line += LINE_SEP;
    _resp.resp_header.push_back(line);
}

// Build the response according to the status code
void BuildHTTPResponseHelper()
{
    // Status line
    _resp.status_line += HTTP_VERSION;
    _resp.status_line += " ";
    _resp.status_line += std::to_string(_resp.status_code);
    _resp.status_line += " ";
    _resp.status_line += CodeToDesc(_resp.status_code);
    _resp.status_line += LINE_SEP;
    // Response headers
    std::string path = WEB_ROOT; // Path of the error page
    path += "/";
    switch(_resp.status_code)
    {
        case OK:
            BuildOkResponse();
            break;
        case NOT_FOUND:
            path += PAGE_404;
            HandlerError(path); // Return the 404 page
            break;
        case BAD_REQUEST:
            path += PAGE_400;
            HandlerError(path); // Return the 400 page
            break;
        case SERVER_ERROR:
            path += PAGE_500;
            HandlerError(path); // Return the 500 page
            break;
        default:
            break;
    }
}

The body itself is sent later, when the response is sent, because non-CGI responses use the sendfile function.
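To show what the status line and headers look like once assembled, here is a small standalone sketch. The names kHttpVersion, kLineSep, DescribeCode and BuildResponseHead are hypothetical stand-ins for the project's HTTP_VERSION, LINE_SEP and CodeToDesc, and the HTTP/1.0 version string is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed values; the project's own constants may differ.
static const std::string kHttpVersion = "HTTP/1.0";
static const std::string kLineSep = "\r\n";

// Hypothetical counterpart of CodeToDesc: status code -> reason phrase.
std::string DescribeCode(int code)
{
    switch (code) {
        case 200: return "OK";
        case 400: return "Bad Request";
        case 404: return "Not Found";
        case 500: return "Internal Server Error";
        default:  return "Unknown";
    }
}

// Assemble the status line, two headers and the terminating blank line.
std::string BuildResponseHead(int code, const std::string& type, std::size_t length)
{
    std::string head = kHttpVersion + " " + std::to_string(code) + " " + DescribeCode(code) + kLineSep;
    head += "Content-Type: " + type + kLineSep;
    head += "Content-Length: " + std::to_string(length) + kLineSep;
    head += kLineSep;                       // blank line separates headers from the body
    return head;
}
```

The body (from resp_body or from the file descriptor) follows immediately after the blank line, and its size must match the Content-Length header.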

3.5.4 Send Response

Sending process:
1️⃣ Send the status line, the response headers and the blank line.
2️⃣ How the body is sent depends on how the request was handled, because the non-CGI body is referenced by fd while the CGI body is stored in _resp.resp_body.
If the request was handled by CGI, just send the data in _resp.resp_body to the peer directly.
If the request was handled without CGI, or an error occurred during processing, a static web page is returned. The file descriptor of the resource file or error page to be sent is stored in the fd member of the HTTP response class, so we just call sendfile to send it.

// Send the response
void SendHTTPResponse()
{
    // Send the status line
    send(_sock, _resp.status_line.c_str(), _resp.status_line.size(), 0);
    // Send the response headers
    for(auto& it : _resp.resp_header)
    {
        send(_sock, it.c_str(), it.size(), 0);
    }
    // Send the blank line
    send(_sock, _resp.resp_blank.c_str(), _resp.resp_blank.size(), 0);
    // The non-CGI body is referenced by fd; the CGI body is stored in resp_body
    // Send the body
    if(_req.cgi)
    {
        // send returns ssize_t; storing -1 in a size_t would look like success
        ssize_t size = 0;
        size_t total = 0;
        const char* start = _resp.resp_body.c_str(); // Start of the body
        while( total < _resp.resp_body.size() && (size = send(_sock, start + total, _resp.resp_body.size() - total, 0)) > 0 )
        {
            total += size;
        }
    }
    else if(_resp.fd >= 0)
    {
        sendfile(_sock, _resp.fd, nullptr, _req.fd_size);
        close(_resp.fd);
    }
}
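The status line and headers above are each sent with a single send call, which may in principle write only part of the buffer. A minimal sketch of a helper that keeps sending until everything is written (SendAll is a hypothetical name, not part of the project) could look like this:

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Send every byte of `data` on `sock`, retrying after short writes and EINTR.
// Returns false if the peer is gone or another error occurs.
bool SendAll(int sock, const std::string& data)
{
    size_t total = 0;
    while (total < data.size()) {
        ssize_t n = send(sock, data.data() + total, data.size() - total, 0);
        if (n < 0) {
            if (errno == EINTR) continue;   // interrupted by a signal: try again
            return false;
        }
        if (n == 0) return false;           // no progress; avoid spinning forever
        total += static_cast<size_t>(n);
    }
    return true;
}
```

With such a helper, the status line, each header, the blank line and the CGI body could all go through the same code path.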

3.6 Error Handling

3.6.1 Handling logic errors

A logic error means that the request was read successfully but turned out to be invalid, for example because the request method is not supported. For this type of error we still want to return a response to the client.

3.6.2 Handling read errors

An error that occurs while reading the request is called a read error, for example when the call to recv fails or the peer closes the connection while the request is being read.
In that case the server never received a complete request, so there is no need to analyze the data or return a response; we simply stop processing.

Processing method:

Add a new member variable _stop to the EndPoint class to indicate whether processing of this request should stop.

Every function that reads part of the request checks whether the read succeeded:

// Whether processing of this request should stop
bool Stop()
{
    return _stop;
}

// Read the request
void RecvHTTPRequest()
{
    // Read the request line + request headers
    if(!RecvHTTpRequestLine() && !RecvHTTpRequestHeader()) // Neither read failed
    {
        // Parse the request line + request headers
        PraseHTTPRequestLine();
        PraseHTTPRequestHeader();
        // Read the request body
        RecvHTTPRequsetBody();
    }
}

// Read the request line
bool RecvHTTpRequestLine()
{
    if(Util::ReadLine(_sock, _req.req_line) > 0)
    {
        // Drop the trailing '\n'
        _req.req_line.resize(_req.req_line.size() - 1);
        LOG(NORMAL, _req.req_line);
    }
    else
    {
        // Read error or connection closed by the peer
        _stop = true;
    }
    return _stop;
}

// Read the request headers and the blank line
bool RecvHTTpRequestHeader()
{
    std::string line;
    // Read the headers
    while(true)
    {
        line.clear();
        if(Util::ReadLine(_sock, line) <= 0)
        {
            _stop = true;
            break;
        }
        if(line == "\n")
        {
            // The blank line marks the end of the headers
            _req.req_blank = line;
            break;
        }
        // Drop the trailing '\n'
        line.resize(line.size() - 1);
        _req.req_header.push_back(line);
        LOG(NORMAL, line);
    }
    return _stop;
}
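The same check applies when reading the body. The following is a minimal sketch, under the assumption that the body length is known from Content-Length; ReadExact is a hypothetical helper and not the project's RecvHTTPRequsetBody. It returns false when the peer closes the connection early or recv fails, which is the condition under which _stop would be set.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Append bytes from `sock` to `out` until `out` holds `n` bytes.
// Returns false on a read error or if the peer closes before `n` bytes arrive.
bool ReadExact(int sock, std::size_t n, std::string& out)
{
    char buf[1024];
    while (out.size() < n) {
        std::size_t want = std::min(sizeof(buf), n - out.size());
        ssize_t got = recv(sock, buf, want, 0);
        if (got < 0 && errno == EINTR) continue;  // interrupted: retry
        if (got <= 0) return false;               // error (<0) or peer closed (0)
        out.append(buf, static_cast<std::size_t>(got));
    }
    return true;
}
```

A caller would set its stop flag when ReadExact returns false and skip building a response.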

3.6.3 Handling write errors

A write error occurs when the response has been built and is being sent, but the client closes the connection in the middle of the transfer.
When the peer has closed its end of the connection and we keep writing, the process receives the SIGPIPE signal, whose default action terminates the server.

We can ignore this signal when initializing the server, so that a failed write only returns an error.

// HTTP server
class HTTPServer
{
public:
    // Initialize the server
    void InitServer()
    {
        signal(SIGPIPE, SIG_IGN); // Ignore SIGPIPE
    }
private:
    int _port; // Port number
};
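The effect of ignoring SIGPIPE can be observed with a pair of connected sockets. This is a small sketch with a hypothetical function name: once the signal is ignored, writing to a peer that has already closed fails with errno set to EPIPE instead of killing the process.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Write to a socket whose peer has closed and report the resulting errno
// (0 if the write unexpectedly succeeded, -1 if the socket pair could not be created).
int SendToClosedPeer()
{
    signal(SIGPIPE, SIG_IGN);               // without this, send() below would kill us

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
    close(sv[1]);                           // simulate the client disconnecting

    ssize_t n = send(sv[0], "x", 1, 0);
    int err = (n < 0) ? errno : 0;
    close(sv[0]);
    return err;
}
```

On Linux this is expected to report EPIPE; the server can then treat the failed send as an ordinary error and drop the connection.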

3.7 Introduce thread pool

At present, each time the server accepts a new connection it gets a socket and hands that socket to a newly created thread for processing. After processing, the connection is closed and the thread is destroyed. Overall, this is a short-connection approach.

To improve efficiency, we can introduce a thread pool.
The thread pool was introduced in the blogger's previous article: [linux] Realize the thread pool based on the singleton mode

The server creates a batch of threads and a task queue in advance. Each time a new connection is accepted, it is wrapped in a task object and placed in the task queue.
The threads in the pool keep taking tasks from the queue and processing them. If the queue is empty, the threads sleep, and a thread is woken up when a new task arrives.

The first step is to write the task class.
When the server accepts a new connection, it wraps it in a task object and puts it in the task queue. The task class needs a socket and a callback function; when a thread in the pool gets the task, it calls this callback to process it.

// Task class
class Task
{
public:
    Task()
        :_sock(-1)
    {
    }

    Task(int sock)
        :_sock(sock)
    {
    }

    // Process the task
    void ProcessOn()
    {
        _handler(_sock); // Invoke the callback
    }

    ~Task()
    {
    }

private:
    int _sock; // Socket
    CallBack _handler; // Callback
};

The next step is the callback function. In fact, we have already written it: it is the function that each thread executed before the thread pool was introduced. We can wrap it in a functor so that the task can call it.

class CallBack
{
public:
    CallBack()
    {
    }

    ~CallBack()
    {
    }

    void operator()(int sock)
    {
        Handler(sock);
    }

    void Handler(int sock)
    {
        LOG(NORMAL, "Handler Request Begin");
        EndPoint *ep = new EndPoint(sock);
        ep->RecvHTTPRequest();
        if(!ep->Stop())
        {
            LOG(NORMAL, "Recv Success");
            ep->BuildHTTPResponse();
            ep->SendHTTPResponse();
        }
        else
        {
            LOG(WARNING, "Recv Error");
        }
        delete ep;
    }
};
  • Writing the thread pool

#define NUM 6

class ThreadPool
{
public:
    // Get the singleton
    static ThreadPool* GetSingle()
    {
        static pthread_mutex_t _mtx = PTHREAD_MUTEX_INITIALIZER;
        if(_single == nullptr)
        {
            pthread_mutex_lock(&_mtx);
            if(_single == nullptr)
            {
                _single = new ThreadPool();
                _single->InitThreadPool();
            }
            pthread_mutex_unlock(&_mtx);
        }
        return _single;
    }

    ~ThreadPool()
    {
        pthread_mutex_destroy(&_lock);
        pthread_cond_destroy(&_cond);
    }

    // Make the calling thread wait on the condition variable
    void ThreadWait()
    {
        pthread_cond_wait(&_cond, &_lock);
    }

    // Wake up a thread waiting on the condition variable
    void ThreadWakeUp()
    {
        pthread_cond_signal(&_cond);
    }

    // Lock
    void Lock()
    {
        pthread_mutex_lock(&_lock);
    }

    // Unlock
    void unLock()
    {
        pthread_mutex_unlock(&_lock);
    }

    bool TaskQueueIsEmpty()
    {
        return _task_q.empty();
    }

    // Thread routine
    static void* ThreadRoutine(void* args)
    {
        ThreadPool* tp = (ThreadPool*)args;
        while(true)
        {
            Task t;
            tp->Lock();
            // Use while, not if: re-check the queue after every wake-up
            while(tp->TaskQueueIsEmpty())
            {
                // The task queue is empty, so the thread sleeps
                tp->ThreadWait();
            }
            tp->PopTask(t); // Get a task
            tp->unLock();
            t.ProcessOn(); // Process the task outside the lock
        }
    }

    // Initialize the thread pool
    bool InitThreadPool()
    {
        // Create a batch of threads
        for(int i = 0; i < _num; i++)
        {
            pthread_t id;
            if(0 != pthread_create(&id, nullptr, ThreadRoutine, this))
            {
                // Creation failed
                LOG(FATAL, "Create ThreadPool Error!");
                return false;
            }
        }
        LOG(NORMAL, "Create ThreadPool Success");
        return true;
    }

    // Push a task
    void PushTask(const Task& task)
    {
        Lock();
        _task_q.push(task);
        unLock();
        // Once there is a task, wake up a thread to process it
        ThreadWakeUp();
    }

    // Get a task (the caller must hold the lock)
    void PopTask(Task& task)
    {
        task = _task_q.front();
        _task_q.pop();
    }
private:
    // Private constructor + no copying
    ThreadPool(int num = NUM)
        : _num(num)
    {
        // Initialize the lock and the condition variable
        pthread_mutex_init(&_lock, nullptr);
        pthread_cond_init(&_cond, nullptr);
    }

    ThreadPool(const ThreadPool&)=delete;
    ThreadPool& operator=(const ThreadPool&)=delete;
private:
    std::queue<Task> _task_q; // Task queue
    int _num; // Number of threads
    pthread_mutex_t _lock; // Lock
    pthread_cond_t _cond; // Condition variable
    static ThreadPool* _single; // Singleton
};

ThreadPool* ThreadPool::_single = nullptr;

When the singleton object is obtained for the first time, the thread pool creates a batch of threads. The task queue is empty at that point, so the threads wait on the condition variable. Once the server pushes a task into the task queue, one of the waiting threads is woken up to process it.
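The wait-and-wake pattern can also be tried outside the server. The sketch below is not the project's ThreadPool: it swaps pthreads for the C++11 standard library (std::thread, std::mutex, std::condition_variable), drops the singleton, and adds a shutdown step so the program can finish. MiniPool and its members are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class MiniPool
{
public:
    explicit MiniPool(int num)
    {
        for (int i = 0; i < num; i++)
            _workers.emplace_back([this] { Routine(); });
    }

    // Ask the workers to quit once the queue is drained, then join them.
    ~MiniPool()
    {
        {
            std::lock_guard<std::mutex> guard(_lock);
            _quit = true;
        }
        _cond.notify_all();
        for (auto& t : _workers) t.join();
    }

    void Push(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> guard(_lock);
            _tasks.push(std::move(task));
        }
        _cond.notify_one();                 // wake one sleeping worker
    }

private:
    void Routine()
    {
        while (true) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> guard(_lock);
                // Sleeps while the queue is empty; the predicate is re-checked on every wake-up
                _cond.wait(guard, [this] { return _quit || !_tasks.empty(); });
                if (_tasks.empty()) return; // quit requested and nothing left to do
                task = std::move(_tasks.front());
                _tasks.pop();
            }
            task();                         // run the task outside the lock
        }
    }

    std::mutex _lock;
    std::condition_variable _cond;
    std::queue<std::function<void()>> _tasks;
    std::vector<std::thread> _workers;
    bool _quit = false;
};
```

In the server, the pushed task would wrap the accepted socket and call the request handler; here any callable works, which makes the pool easy to exercise on its own.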

4. Project testing

First, the CGI program needs to obtain the request parameters:

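// Get the request parameters
bool GetQueryString(std::string& query_string)
{
    // getenv returns nullptr for an unset variable, and constructing a
    // std::string from nullptr is undefined behavior, so check first
    const char* method_env = getenv("METHOD");
    if(method_env == nullptr)
    {
        return false;
    }
    std::string method = method_env;
    if(method == "GET")
    {
        const char* qs = getenv("QUERY_STRING");
        query_string = (qs != nullptr) ? qs : "";
        return true;
    }
    else if(method == "POST")
    {
        // The environment variable tells us how many bytes to read from standard input
        const char* cl = getenv("CONTENT_LENGTH");
        int content_length = (cl != nullptr) ? atoi(cl) : 0;
        std::cerr << "Content-Length: " << content_length << std::endl;
        char ch;
        while(content_length-- > 0 && read(0, &ch, 1) == 1)
        {
            query_string.push_back(ch);
        }
        return true;
    }
    else
    {
        return false;
    }
}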

First obtain the request method through the environment variable.
If the request method is the GET method, continue to obtain the data passed by the parent process through the environment variable.
If the request method is the POST method, first obtain the length of the data passed by the parent process through the environment variable, and then read the data of the specified length from the standard input.

After CGI gets the data, it can process the data. Here we can perform addition, subtraction, multiplication and division operations:

// Split a string at the first occurrence of sep
static void CutString(const std::string& body, std::string& sub1, std::string& sub2, const std::string& sep)
{
    size_t pos = body.find(sep);
    if(pos != std::string::npos)
    {
        sub1 = body.substr(0, pos);
        sub2 = body.substr(pos + sep.size());
    }
}

int main()
{
    std::string query_string;
    GetQueryString(query_string);
    std::cerr << "query_string: " << query_string << std::endl;
    // For example: data_x=10&data_y=20
    // Split into the two key=value pairs
    std::string left;
    std::string right;
    CutString(query_string, left, right, "&");
    std::cerr << "left: " << left << std::endl;
    std::cerr << "right: " << right << std::endl;
    std::string name1, val1;
    std::string name2, val2;
    CutString(left, name1, val1, "=");
    CutString(right, name2, val2, "=");

    // Process the data
    int x = atoi(val1.c_str());
    int y = atoi(val2.c_str());
    std::cout << "<html>";
    std::cout << "<head><meta charset=\"UTF-8\"></head>";
    std::cout << "<body>";
    std::cout << "<h3>" << x << "+" << y << "=" << x+y << "</h3>";
    std::cout << "<h3>" << x << "-" << y << "=" << x-y << "</h3>";
    std::cout << "<h3>" << x << "*" << y << "=" << x*y << "</h3>";
    std::cout << "<h3>" << x << "/" << y << "=" << x/y << "</h3>"; // Dividing by 0 makes the child exit abnormally
    std::cout << "</body>";
    std::cout << "</html>";
    return 0;
}
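CutString handles exactly two parameters. As a more general alternative, here is a minimal sketch that splits a query string with any number of key=value pairs into a map. ParseQuery is a hypothetical helper; it does no URL decoding, so it assumes the values contain no percent-encoded characters.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Split "k1=v1&k2=v2..." into a map. Pairs without '=' or with an empty key are skipped.
std::map<std::string, std::string> ParseQuery(const std::string& query)
{
    std::map<std::string, std::string> kv;
    std::size_t begin = 0;
    while (begin <= query.size()) {
        std::size_t end = query.find('&', begin);
        if (end == std::string::npos) end = query.size();

        std::string pair = query.substr(begin, end - begin);
        std::size_t eq = pair.find('=');
        if (eq != std::string::npos && eq > 0)
            kv[pair.substr(0, eq)] = pair.substr(eq + 1);

        begin = end + 1;                    // continue after the '&'
    }
    return kv;
}
```

With the test form's field names, ParseQuery("data_x=10&data_y=20") yields two entries that can then be converted with atoi as in main above.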

4.1 GET method upload data test

We can adapt a form from the w3School website:

Forms in HTML are used to collect user input. The method attribute of the form specifies how the form is submitted, and the action attribute specifies which CGI program on the server the form is submitted to.

<html>
<head>
    <meta charset="UTF-8" />
</head>
<body>

<form action="/test_cgi" method="GET">
x: <input type="text" name="data_x" value="0">
<br>
y: <input type="text" name="data_y" value="0">
<br><br>
<input type="submit" value="Submit">
</form>

<p>Click Submit to send the form data to the CGI program.</p>

</body>
</html>


This way, when we click Submit, the parameters are submitted to the CGI program we wrote ourselves.

  • Why is there a size limit for parameters submitted by the GET method?

Because in this server the parameters of a GET request are passed to the child process through an environment variable, they cannot be too long.

4.2 POST method upload data test

To test uploading data through the POST method, you only need to change the method attribute of the form to "post".

You can see that the parameters are placed in the body.

Of course, if a division by 0 occurs, the child process exits abnormally. The parent process examines the exit status, concludes that a server-side processing error occurred, and returns the static 500.html page.

5. Project source code

gitee:https://gitee.com/yyh1161/http-server

Origin blog.csdn.net/qq_66314292/article/details/131870925