Design a concurrent web service in Rust: using commonly used Rust libraries such as Tokio and Hyper, implement a simple concurrent web server on top of the TCP/IP protocol stack, and explain, with concrete code, how to program a concurrent web server.

Author: Zen and the Art of Computer Programming

1 Introduction

As the web grew, a generation of outstanding programmers and engineers moved into web development, and among the languages they adopted, Rust has attracted particular attention. It is a modern systems programming language that focuses on safety and concurrency. Rust has become one of the most popular programming languages today, and many frameworks have begun to be rebuilt in it, which keeps raising its profile.

Around 2017, Google promoted its serverless computing offering, which aims to scale automatically on demand and is implemented mainly as FaaS (Functions as a Service). Achieving this goal requires building high-performance, easily extensible, and scalable HTTP servers. In this context, the Rust language once again becomes worth learning.

This article will first walk readers through the concepts, characteristics, and application scenarios of concurrent web servers. Then, using commonly used Rust libraries such as Tokio and Hyper, it implements a simple concurrent web server on top of the TCP/IP protocol stack and explains, with concrete code, how to program one. The article will introduce the following knowledge points:

2. Concepts, characteristics and application scenarios of concurrent web servers

2.1 Concepts and features

A web server usually refers to software that acts as a network server: its main responsibility is to accept client requests and respond with the corresponding content. A traditional web server is a single-process, single-threaded application that processes requests serially. As server load grows, this single-process, single-threaded approach cannot keep up, so the multi-process, multi-threaded model emerged. However, this model suffers from resource contention and cannot use multi-core CPU resources effectively. Moreover, each client request requires creating and destroying a new process or thread, which significantly increases the server's system overhead.

The concurrent web server was proposed to solve the problems of low efficiency and low resource utilization of traditional web servers. A concurrent web server can handle multiple requests at the same time, with each request running in a different thread, thereby making full use of multi-core CPU resources and improving server throughput. In addition, the asynchronous IO model can also be used to optimize server performance and reduce request waiting time.
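The request-per-thread idea described above can be sketched with the standard library alone. This is a minimal illustration, not the article's final design: `handle_request` is a hypothetical stand-in for real socket I/O, and a shared atomic counter verifies that every request was served.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Each simulated request runs in its own thread, so a multi-core CPU can
// serve several clients at once. handle_request is a hypothetical stand-in
// for reading from and writing to a real socket.
fn handle_request(id: usize, served: &AtomicUsize) -> String {
    served.fetch_add(1, Ordering::SeqCst);
    format!("HTTP/1.1 200 OK\r\n\r\nhello client {id}")
}

fn serve_concurrently(n_requests: usize) -> usize {
    let served = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..n_requests)
        .map(|i| {
            let served = Arc::clone(&served);
            thread::spawn(move || handle_request(i, &served))
        })
        .collect();
    for h in handles {
        h.join().unwrap(); // wait for every in-flight request
    }
    served.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(serve_concurrently(8), 8);
    println!("all requests served");
}
```

In a production server the threads would be replaced by a pool or by asynchronous tasks, which is exactly the optimization the asynchronous IO model mentioned above provides.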

2.2 Application scenarios

Concurrent web servers have a wide range of application scenarios. Here are some typical application scenarios:

  • Large-scale concurrent access: Large websites, social media websites, e-commerce platforms, etc. are all typical application scenarios of concurrent web servers. Due to the huge number of visits, the server should be able to respond to a large number of user requests at the same time to ensure normal access to the website. For example, websites such as JD.com, Taobao, NetEase News, and Weibo all use concurrent web servers.

  • High real-time: In real-time application scenarios such as communications and finance, the server must be able to respond quickly to ensure the accuracy and integrity of business data. For example, in activities such as flash sales, it is necessary to respond to user requests in a timely manner to ensure the success rate of transactions.

  • Low latency: In real-time application scenarios such as search engines, live streaming, and instant messaging, server response time cannot exceed roughly 100 ms without hurting the user experience. For example, services such as YouTube and Facebook Messenger use concurrent web servers.

  • Massive data processing: In some cases, the server needs to process massive amounts of data, which requires the server to have strong processing capabilities. For example, search engines need to process massive amounts of index data, and e-commerce platforms need to process massive order data.

  • Game server: Game servers are very sensitive to real-time response speed, so concurrent web servers are more suitable. For example, the game server of a well-known mobile game company uses a concurrent web server.

  • Other fields: When traditional server architecture cannot meet the needs, concurrent web servers can be used, such as the Internet of Things, blockchain and other fields.

3. Technology selection

In order to implement a high-performance, easy-to-expand, and scalable HTTP server, this article chose the Rust language and Tokio asynchronous runtime. Tokio provides a highly abstract asynchronous I/O interface that can be used to build high-performance I/O-intensive applications.
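To see the shape of the server Tokio accelerates, here is the same accept loop written with blocking std APIs and one thread per connection; Tokio replaces the threads with lightweight tasks multiplexed over non-blocking I/O, but the overall structure is the same. This is a self-contained sketch, not the final server: it accepts a single connection and returns a fixed response.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// A blocking accept loop: one OS thread per connection. With Tokio, the
// thread::spawn becomes tokio::spawn and the reads/writes become .await
// points, so thousands of connections can share a few threads.
fn serve_one(listener: TcpListener) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        if let Ok((mut stream, _)) = listener.accept() {
            let mut buf = [0u8; 1024];
            let _ = stream.read(&mut buf); // read (and ignore) the request
            let _ = stream.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok");
        }
    })
}

// Start the server on an ephemeral port, send one GET, return the response.
fn roundtrip() -> String {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    let handle = serve_one(listener);

    let mut client = TcpStream::connect(addr).unwrap();
    client.write_all(b"GET / HTTP/1.1\r\n\r\n").unwrap();
    let mut response = String::new();
    client.read_to_string(&mut response).unwrap();
    handle.join().unwrap();
    response
}

fn main() {
    assert!(roundtrip().starts_with("HTTP/1.1 200 OK"));
}
```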

4. Project plan

4.1 HTTP request parser

An HTTP request has a fairly complex structure, and a single request may arrive split across multiple TCP segments. To make later processing easier, the request must first be parsed and the necessary information extracted, such as the request method, path, and request parameters. We will therefore implement an HTTP request parser.

Request parser principle

The request parser can parse HTTP requests according to HTTP protocol specifications, including request lines, request headers, and request bodies. The parsing process is as follows:

(1) Receive client request message;

(2) Parse the request line and obtain the request method, URL and other information;

(3) Parse the request header and obtain the request header fields and corresponding values;

(4) Determine whether there is a request body, and if so, read and save it to memory;

(5) Construct the request object and save relevant information;

(6) Return the request object.

Request parser implementation

use std::collections::HashMap;
use std::io;

use bytes::BytesMut;
use httparse::Status;
use tokio::io::{AsyncRead, AsyncReadExt};

#[derive(Debug, Default)]
pub struct HttpRequest {
    method: String,                   // request method: GET / POST ...
    path: String,                     // request path: /index.html?key=value&...
    headers: HashMap<String, String>, // request headers
    body: Option<Vec<u8>>,            // request body
}

impl HttpRequest {
    /// Read and parse an HTTP request from a TCP socket.
    pub async fn read_from_socket<R>(&mut self, reader: &mut R) -> Result<(), io::Error>
    where
        R: AsyncRead + Unpin,
    {
        let mut buf = BytesMut::with_capacity(2 * 1024);

        loop {
            let n = reader.read_buf(&mut buf).await?;
            if n == 0 {
                // The client closed the connection before a complete request arrived.
                return Err(io::ErrorKind::ConnectionReset.into());
            }

            let mut headers = [httparse::EMPTY_HEADER; 64];
            let mut req = httparse::Request::new(&mut headers);
            match req.parse(&buf) {
                Ok(Status::Complete(nparsed)) => {
                    self.parse(&req)?;
                    // Whatever follows the header section is the start of the body.
                    // (A full implementation would keep reading until Content-Length
                    // bytes have arrived.)
                    let rest = buf.split_off(nparsed);
                    if !rest.is_empty() {
                        self.body = Some(rest.to_vec());
                    }
                    return Ok(());
                }
                // The request is not complete yet: keep reading.
                Ok(Status::Partial) => continue,
                Err(_) => return Err(io::ErrorKind::InvalidData.into()),
            }
        }
    }

    fn parse(&mut self, req: &httparse::Request) -> Result<(), io::Error> {
        // Parse the request line: method and path.
        self.method = req.method.ok_or(io::ErrorKind::InvalidData)?.to_string();
        self.path = req.path.ok_or(io::ErrorKind::InvalidData)?.to_string();

        // Parse the request headers.
        for header in req.headers.iter() {
            let value = String::from_utf8_lossy(header.value).into_owned();
            self.headers.insert(header.name.to_string(), value);
        }

        Ok(())
    }
}

4.2 Browser caching mechanism

When a browser sends an HTTP request to the server, it can set HTTP header fields such as Cache-Control and If-Modified-Since to control caching behavior. The Cache-Control header specifies the caching rules the request/response follows, such as public, private, and max-age; the If-Modified-Since header tells the server that the client only wants content newer than the given date. When the server receives such a request, it checks whether the file has changed and returns a full new response only if it has.

Through the Cache-Control header field, the following caching strategies can be implemented on the server side:

(1) public: the response may be cached by any intermediary;

(2) private: the response must not be stored by shared caches (such as a CDN);

(3) no-cache: the cached copy must be revalidated with the origin server before each use;

(4) max-age: the maximum freshness lifetime of the cached copy, in seconds;

(5) no-store: nothing may be cached at all.

Currently, the cache management module is still in its infancy and only supports some functions.

#[derive(Debug, Clone)]
enum CacheType {
    Public,      // may be cached by any intermediary
    Private,     // must not be stored by shared caches (e.g. a CDN)
    NoCache,     // must be revalidated with the origin server before each use
    MaxAge(i32), // maximum freshness lifetime of the cache, in seconds
    NoStore,     // nothing may be cached at all
}

#[derive(Debug, Clone)]
pub struct CacheConfig {
    cache_type: CacheType,
    max_stale: i32,
    min_fresh: i32,
    no_transform: bool,
}

#[derive(Debug)]
pub struct CachedResponse {
    status_code: u16,
    version: String,                // "HTTP/1.1" or "HTTP/2.0"
    headers: Vec<(String, String)>, // response headers
    content: Vec<u8>,               // response body
}

#[derive(Debug, Default)]
pub struct HttpCacheManager {}

impl HttpCacheManager {
    pub fn new() -> Self {
        Self {}
    }

    pub fn is_cached(&self, config: &CacheConfig, response: &CachedResponse) -> bool {
        true // the lookup logic goes here
    }

    pub fn save_response(&self, config: &CacheConfig, response: &mut CachedResponse) -> Result<bool, ()> {
        Ok(false) // the storage logic goes here
    }
}
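One piece the stub above leaves out is mapping a raw Cache-Control header value onto the caching strategies listed earlier. The following sketch shows one way to do that; it redeclares a minimal copy of the CacheType enum so it compiles on its own, and for simplicity it returns only the first recognized directive (a real header can combine several, e.g. "public, max-age=60").

```rust
// A minimal mirror of the document's CacheType enum, so this sketch
// is self-contained.
#[derive(Debug, Clone, PartialEq)]
enum CacheType {
    Public,
    Private,
    NoCache,
    MaxAge(i32),
    NoStore,
}

// Map a Cache-Control header value to a caching strategy. Simplified:
// returns the first directive it recognizes.
fn parse_cache_control(value: &str) -> Option<CacheType> {
    for directive in value.split(',').map(str::trim) {
        match directive {
            "public" => return Some(CacheType::Public),
            "private" => return Some(CacheType::Private),
            "no-cache" => return Some(CacheType::NoCache),
            "no-store" => return Some(CacheType::NoStore),
            _ => {
                if let Some(secs) = directive.strip_prefix("max-age=") {
                    if let Ok(n) = secs.parse::<i32>() {
                        return Some(CacheType::MaxAge(n));
                    }
                }
            }
        }
    }
    None
}

fn main() {
    assert_eq!(parse_cache_control("max-age=3600"), Some(CacheType::MaxAge(3600)));
    assert_eq!(parse_cache_control("no-store"), Some(CacheType::NoStore));
    assert_eq!(parse_cache_control("weird"), None);
}
```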

4.3 File processing

An HTTP server generally has to deal with file uploads and downloads, cache lookups, static resource hosting, and so on. It therefore needs a file processing module responsible for handling the files referenced by HTTP requests.

The file processing module needs to have the following functions:

  1. Support Range requests, enabling resumable ("breakpoint") downloads;
  2. Implement basic file permission checks and directory listings;
  3. Support compressed transfer;
  4. Support virtual hosting, deciding which directories to serve based on the domain name.
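The Range support in point 1 hinges on parsing the `Range: bytes=start-end` header. Here is a simplified, self-contained sketch for the single-range case; multi-range requests and the suffix form `bytes=-500` are left out for brevity, and the function name is our own invention.

```rust
// Parse a single-range `bytes=start-end` header value against a file of
// file_len bytes, returning the inclusive (start, end) byte offsets, or
// None if the range is malformed or unsatisfiable.
fn parse_byte_range(header: &str, file_len: u64) -> Option<(u64, u64)> {
    let spec = header.strip_prefix("bytes=")?;
    let (start, end) = spec.split_once('-')?;
    let start: u64 = start.parse().ok()?;
    let end: u64 = if end.is_empty() {
        file_len.saturating_sub(1) // open-ended range: to the last byte
    } else {
        end.parse().ok()?
    };
    if start > end || end >= file_len {
        return None; // unsatisfiable: should answer 416 Range Not Satisfiable
    }
    Some((start, end))
}

fn main() {
    assert_eq!(parse_byte_range("bytes=0-499", 1000), Some((0, 499)));
    assert_eq!(parse_byte_range("bytes=500-", 1000), Some((500, 999)));
    assert_eq!(parse_byte_range("bytes=900-1100", 1000), None);
}
```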

File processing principles

The file processing module locates the corresponding file and reads the content according to the URI in the HTTP request. The file reading process is given below:

  1. Locate the file corresponding to the URI. A virtual directory can be used to map URI prefixes to specific locations on the local disk; its configuration is usually kept on the server, so it is easy to modify.
  2. Check file permissions, and return an error page if the file is unreadable or does not exist;
  3. For GET requests, read the file content and assemble the response message;
  4. For HEAD requests, only the response headers are needed; the file content does not have to be read;
  5. For POST requests, write the submitted content to the file, or perform a file upload;
  6. For PUT requests, create a new file and write the new content, or perform a file upload;
  7. For DELETE requests, delete the file.
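The seven steps above reduce to a dispatch on the request method. As a sketch, with typical status codes standing in for the real handlers (a full server would call into the file processing module and choose the status from the outcome):

```rust
// Map a request method to the status code a successful handler would
// typically return; unknown methods get 405 Method Not Allowed.
fn dispatch_status(method: &str) -> u16 {
    match method {
        "GET" | "HEAD" => 200, // read the file / headers only
        "POST" | "PUT" => 201, // write or create the file
        "DELETE" => 204,       // delete the file, no body to return
        _ => 405,              // method not allowed
    }
}

fn main() {
    assert_eq!(dispatch_status("GET"), 200);
    assert_eq!(dispatch_status("DELETE"), 204);
    assert_eq!(dispatch_status("PATCH"), 405);
}
```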

File processing implementation

use std::collections::HashMap;
use std::path::PathBuf;

use chrono::Utc;
use hyper::{Body, Response, StatusCode, Version};
use mime_guess::MimeGuess;

#[derive(Debug, Clone)]
pub struct FileContext {
    base_dir: PathBuf,                      // service root directory
    virtual_dirs: HashMap<String, PathBuf>, // virtual directory configuration
    index_files: Vec<String>,               // default index files
}

impl Default for FileContext {
    fn default() -> Self {
        Self {
            base_dir: "/var/www".into(),
            virtual_dirs: HashMap::new(),
            index_files: vec!["index.html".to_string(), "default.htm".to_string()],
        }
    }
}

impl FileContext {
    /// Set the service root directory.
    pub fn set_base_dir(&mut self, dir: &str) {
        self.base_dir = PathBuf::from(dir);
    }

    /// Add a virtual directory.
    pub fn add_virtual_dir(&mut self, name: &str, dir: &str) {
        self.virtual_dirs.insert(name.into(), PathBuf::from(dir));
    }

    /// Remove a virtual directory.
    pub fn remove_virtual_dir(&mut self, name: &str) -> bool {
        self.virtual_dirs.remove(name).is_some()
    }

    /// Read the contents of the file addressed by `uri`.
    pub async fn get_file_content(&self, uri: &str) -> Result<Option<Vec<u8>>, std::io::Error> {
        let file_path = self.get_file_path(uri);
        if !file_path.exists() || !file_path.is_file() {
            return Ok(None);
        }
        Ok(Some(tokio::fs::read(file_path).await?))
    }

    /// Map a URI onto a path on the local file system.
    fn get_file_path(&self, uri: &str) -> PathBuf {
        let (virtual_dir, real_path) = self.resolve_virtual_dir(uri);
        match virtual_dir {
            // The first path segment names a virtual directory: serve from there.
            Some(base) => base.join(real_path),
            // Otherwise serve relative to the service root.
            None => self.base_dir.join(real_path),
        }
    }

    /// Split a URI into an optional virtual-directory root and the remaining path.
    fn resolve_virtual_dir(&self, uri: &str) -> (Option<&PathBuf>, String) {
        let trimmed = uri.trim_start_matches('/');
        if let Some((first, rest)) = trimmed.split_once('/') {
            if let Some(vdp) = self.virtual_dirs.get(first) {
                return (Some(vdp), rest.to_string());
            }
        }
        (None, trimmed.to_string())
    }

    /// Return the contents of the first default index file that exists.
    pub async fn get_default_page(&self) -> Option<Vec<u8>> {
        for filename in &self.index_files {
            if let Ok(Some(data)) = self.get_file_content(filename).await {
                return Some(data);
            }
        }
        None
    }

    /// Build an HTTP response for the file addressed by `uri`.
    pub async fn create_response(&self, uri: &str) -> Option<Response<Body>> {
        match self.get_file_content(uri).await {
            Ok(Some(data)) => create_http_response(uri, data),
            _ => None,
        }
    }
}

fn create_http_response(uri: &str, data: Vec<u8>) -> Option<Response<Body>> {
    // Guess the MIME type from the file extension, falling back to
    // application/octet-stream; add a charset for text types.
    let mime = MimeGuess::from_path(uri).first_or_octet_stream();
    let ct = if mime.type_() == mime_guess::mime::TEXT {
        format!("{}; charset=UTF-8", mime)
    } else {
        mime.to_string()
    };

    // The file content was found, so the status code is 200 OK.
    Response::builder()
        .status(StatusCode::OK)
        .version(Version::HTTP_11)
        .header("Content-Type", ct)
        .header("Content-Length", data.len())
        .header("Last-Modified", Utc::now().to_rfc2822())
        .header("Accept-Ranges", "bytes")
        .body(data.into())
        .ok()
}

Origin blog.csdn.net/universsky2015/article/details/132033830