[Project] Boost Search Engine

 

Table of contents

1. Project background

2. The overall design of the project

3. Technology stack and project environment

4. Forward index and inverted index

5. De-tagging and data cleaning

6. Build the index module Index

6.1 Forward index

6.2 Building the inverted index

Using cppjieba

Segmentation and word-frequency statistics

7. Search engine module Searcher

Jsoncpp -- serialization and deserialization

Processing the content

8. Importing cpp-httplib

9. Writing the web front end

10. Writing the project log

11. Project testing


1. Project background

The boost official website has no on-site search, so we will build one ourselves. An on-site search is more vertical than a general web search: it covers a single site, so the amount of data involved is smaller.

 

2. The overall design of the project

3. Technology stack and project environment

Technology stack: C/C++ and C++11, STL, the quasi-standard Boost libraries, Jsoncpp, cppjieba, cpp-httplib; optional: html5, css, js, jQuery, Ajax.

Project environment: CentOS 7 cloud server, vim/gcc(g++)/Makefile, VS2019 or VS Code

4. Forward index and inverted index

Document example:

Document ID    Document content
1              Lei Jun bought four catties of millet
2              Lei Jun released the Xiaomi phone

Forward index: find the document content from the document ID.

Word segmentation: makes it easier to build the inverted index and to search.

Lei Jun bought four catties of millet: Lei Jun / bought / four catties / millet

Lei Jun released the Xiaomi phone: Lei Jun / released / Xiaomi / phone

Inverted index: find the document IDs from a keyword.

Keyword                   Document IDs
Lei Jun                   1, 2
bought                    1
four catties              1
millet                    1, 2
four catties of millet    1
released                  2
Xiaomi phone              2

Stop words: 了, 的, 你, a, the -- generally we can ignore them when segmenting words.

User input: Xiaomi -> look up the keyword in the inverted index -> obtain the document IDs (1, 2) -> look up each ID in the forward index -> retrieve the document content -> build the response.

5. De-tagging and data cleaning

We only need to index the files under doc/html/ in the boost installation.

Function: raw data -> de-tag -> write every result on its own line of a text document.

Parser.cc

There are three main functions in Parser.cc: EnumFile, ParseHtml, and SaveHtml.

5.1 EnumFile()

Function: recursively collect the path of every html file into files_list.

Steps:

  1. Check whether the path exists
  2. Check whether the file is a regular file (every .html file is a regular file)
  3. Check whether the suffix meets the requirement: the file must end with .html

We use the filesystem library from Boost.

Install the Boost development package: sudo yum install -y boost-devel

Recursive traversal: use recursive_directory_iterator from the Boost library.

Only files ending in .html are traversed and inserted:

iter->path().extension() == ".html"
bool EnumFile(const std::string& src_path,std::vector<std::string> * files_list)
{
    namespace fs = boost::filesystem;
    fs::path root_path(src_path);

    //check whether the path exists; if not, there is no point going further
    if(!fs::exists(root_path))
    {
        std::cerr<<src_path<<" not exists" <<std::endl;
        return false;
    }

    //recursively traverse the files
    //define an empty iterator to detect the end of the recursion
    fs::recursive_directory_iterator end;
    for(fs::recursive_directory_iterator iter(root_path);iter != end;iter++)
    {
        //check whether this is a regular file; html files are regular files
        if(!fs::is_regular_file(*iter)) 
        {
            continue;
        }
        //it is a regular file: check the suffix, we only want .html
        if(iter->path().extension() != ".html")
        {
            continue;
        }
        //std::cout<<"debug: "<<iter->path().string()<<std::endl;
        //the current path is a valid regular web-page file ending in .html
        files_list->push_back(iter->path().string());
    }
    return true;
}

5.2 ParseHtml()

Role: read the content of each file and parse it into a DocInfo_t.

Steps:

  1. Read the file
  2. Parse the file and extract the title -> <title> </title>
  3. Parse the file and extract the content
  4. Parse the file and extract the url

typedef struct DocInfo
{
    std::string title;  //document title
    std::string content;//document content
    std::string url;    //document url on the official site
}DocInfo_t;

bool ParseHtml(const std::vector<std::string>& files_list,std::vector<DocInfo_t>*results)
{
    //traverse and parse every file
    for(const std::string &file : files_list)
    {
        //1.read the file: Read()
        std::string result;
        if(!ns_util::FileUtil::ReadFile(file,&result))
        {
            continue;
        }
        //2.parse the file and extract the title
        DocInfo_t doc;
        if(!ParseTitle(result,&doc.title))
        {
            continue;
        }
        //3.parse the file and extract the content; essentially de-tagging
        if(!ParseContent(result,&doc.content))
        {
            continue;
        }
        //4.parse the file path and build the url
        if(!ParseUrl(file,&doc.url))
        {
            continue;
        }
        //parsing is complete here; all results for this document are stored in doc
        results->push_back(std::move(doc));//detail: std::move avoids copying doc's strings; pushing by value would copy and be slower
    }
    return true;
}

5.2.1 Reading files

static bool ReadFile(const std::string &file_path,std::string *out)
{
    std::ifstream in(file_path,std::ios::in);
    if(!in.is_open())
    {
        std::cerr<<"open file "<<file_path<<" error " << std::endl;
        return false;
    }
    //the file is open
    std::string line;
    //how does getline know it has reached the end of the file?
    //getline returns a reference to the stream, and while(bool) works
    //because the stream type overloads conversion to bool
    while(std::getline(in,line))
    {
        *out += line;
    }

    in.close();
    return true;
}

5.2.2 Parse the specified file and extract the title

Since the title sits between the <title> and </title> tags, we can extract it with string operations.

//find the positions of <title> and </title>, then take what lies between them
static bool ParseTitle(const std::string& file,std::string *title)
{
    std::size_t begin = file.find("<title>");
    if(begin == std::string::npos)
    {
        return false;
    }
    std::size_t end = file.find("</title>");
    if(end == std::string::npos){
        return false;
    }
    begin+=std::string("<title>").size();

    if(begin>end)
    {
        return false;
    }

    *title = file.substr(begin,end-begin);
    return true;
}

5.2.3 Removing tags

Tag removal is based on a simple state machine. While traversing, once we encounter '>' the current tag has been fully processed. We do not want to keep the '\n' characters from the original text, because '\n' will later be used as the separator between documents, so we replace each one with a space.

static bool ParseContent(const std::string& file,std::string *content)
{
    //de-tagging, written as a simple state machine
    enum status
    {
        LABEL,
        CONTENT
    };
    enum status s = LABEL;
    //while traversing, encountering '>' means the current tag is finished
    for(char c:file)
    {
        switch(s)
        {
            case LABEL:
                if(c == '>') s = CONTENT;
                break;
            case CONTENT:
                //a '<' means a new tag begins
                if(c == '<') 
                    s=LABEL;
                else{
                    //do not keep '\n' from the original file: '\n' will separate documents in the parsed output
                    if(c == '\n') c= ' ';
                    content->push_back(c);
                }
                break;
            default:
                break;
        }
    }
    return true;
}

5.2.4 Splicing the url

Looking at the urls of the boost official library, we can see that there is a path correspondence between the official documents and the documents we downloaded.

Example official website URL: https://www.boost.org/doc/libs/1_78_0/doc/html/accumulators.html

The url is formed by concatenating url_head and url_tail, where url_head is a fixed string:

"https://www.boost.org/doc/libs/1_81_0/doc/html"

and url_tail is the part of the html file's path after the download root, so only the relative file name is kept.

//build the url: the official boost documentation corresponds path-for-path to the documents we downloaded
static bool ParseUrl(const std::string & file_path,std::string *url)
{ 
    const std::string url_head = "https://www.boost.org/doc/libs/1_81_0/doc/html";
    //src_path is the download root directory, defined elsewhere in Parser.cc
    std::string url_tail = file_path.substr(src_path.size());

    *url = url_head + url_tail;
    return true;
}

5.3 SaveHtml()

Function: write the parsed content into a document, in a format that is easy to operate on when reading it back. The format we use is:

title\3content\3url \n title\3content\3url \n title\3content\3url \n ...

This lets us use getline(ifstream, line) to obtain one whole document at a time: title\3content\3url

//each document consists of 3 parts: title\3 content\3 url
//documents are separated from each other by '\n'
bool SaveHtml(const std::vector<DocInfo_t>& results,const std::string & output)
{
    #define SEP '\3'
    //write in binary mode
    std::ofstream out(output,std::ios::out | std::ios::binary);
    if(!out.is_open())
    {
        std::cerr <<"open "<<output<<" failed" <<std::endl;
        return false;
    }
    //begin
    for(auto &item : results)
    {
        std::string out_string;
        out_string = item.title;
        out_string += SEP;
        out_string += item.content;
        out_string += SEP;
        out_string += item.url;
        out_string += '\n';

        //write to file
        out.write(out_string.c_str(),out_string.size());
    }
    out.close();
    return true;
}

6. Build the index module Index

In this step we build the forward and inverted indexes.

    struct DocInfo
    {
        std::string title;  //document title
        std::string content;//document content after de-tagging
        std::string url;    //document url
        uint64_t doc_id;    //document id
    };

6.1 Forward index

//the forward index uses an array; the array subscript is naturally the document's ID

std::vector<DocInfo> forward_index; //forward index

The forward index finds the document content from the doc_id.

Steps:

  1. Split the line into strings (parse the line)
  2. Fill the strings into a DocInfo
  3. Insert the DocInfo into the forward-index vector

//member function of class StringUtil in namespace ns_util
class StringUtil{
    public:
        static void split(const std::string &target, std::vector<std::string> *out, const std::string &sep)
        {
            //boost split
            boost::split(*out, target, boost::is_any_of(sep), boost::token_compress_on);
        }
};

DocInfo *BuildForwardIndex(const std::string&line)
{
    //1.parse line: split the string
    std::vector<std::string> results;
    const std::string sep = "\3";//in-line separator
    ns_util::StringUtil::split(line,&results,sep);
    if(results.size() != 3)
    {
        return nullptr;
    }
    //2.fill the strings into a DocInfo
    DocInfo doc;
    doc.title = results[0];
    doc.content = results[1];
    doc.url = results[2];
    doc.doc_id = forward_index.size();//save the id before inserting: it equals the subscript this doc will occupy in the vector!
    //3.insert into the forward-index vector
    forward_index.push_back(std::move(doc));
    return &forward_index.back();
}

The split method from the Boost library is used here. Passing boost::token_compress_on as the third argument collapses consecutive "\3" separators, so no empty tokens are produced.

Note: save the id first and then insert; the id is exactly the subscript the current doc occupies in the vector!

6.2 Building the inverted index

    struct InvertedElem
    {
        uint64_t doc_id;
        std::string word;
        int weight;//weight
        InvertedElem():weight(0){}
    };

From the content of one document we produce one or more InvertedElems, which form an inverted zipper (posting list). We process one document at a time, and one document contains many "words", all of which map to the current doc_id.

  1. Segment both the title and the content into words
  2. Establish the correlation between words and documents -- here we use word-frequency statistics as a simple weight, which is why the inverted-zipper structure above is needed
  3. Define the correlation ourselves so that a keyword appearing in the title weighs more: weight = 10*title + content

Install cppjieba:

Get the source from the cppjieba repository.

After the download succeeds, upload it to our project path with rz -E. Then we create soft links in the current directory:

ln -s cppjieba/dict dict -- the dictionaries

ln -s cppjieba/include/cppjieba/ cppjieba -- the header files

Note: there is a pitfall when using cppjieba: the files under deps/limonp must be copied into include/cppjieba/ before it works:

cp deps/limonp include/cppjieba/ -rf

Using cppjieba

#pragma once

#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <mutex>
#include <unordered_map>
#include <boost/algorithm/string.hpp>
#include "cppjieba/Jieba.hpp"
#include "log.hpp"

namespace ns_util{
    const char* const DICT_PATH = "./dict/jieba.dict.utf8";
    const char* const HMM_PATH = "./dict/hmm_model.utf8";
    const char* const USER_DICT_PATH = "./dict/user.dict.utf8";
    const char* const IDF_PATH = "./dict/idf.utf8";
    const char* const STOP_WORD_PATH = "./dict/stop_words.utf8";

    class JiebaUtil{
        private:
            //static cppjieba::Jieba jieba;
            cppjieba::Jieba jieba;
            std::unordered_map<std::string, bool> stop_words;
        private:
            JiebaUtil():jieba(DICT_PATH, HMM_PATH, USER_DICT_PATH, IDF_PATH, STOP_WORD_PATH)
            {}
            JiebaUtil(const JiebaUtil&) = delete;

            static JiebaUtil *instance;
        public:

            static JiebaUtil* get_instance()
            {
                static std::mutex mtx;
                if(nullptr == instance){
                    mtx.lock();
                    if(nullptr == instance){
                        instance = new JiebaUtil();
                        instance->InitJiebaUtil();
                    }
                    mtx.unlock();
                }

                return instance;
            }

            void InitJiebaUtil()
            {
                std::ifstream in(STOP_WORD_PATH);
                if(!in.is_open()){
                    LOG(FATAL, "load stop words file error");
                    return;
                }

                std::string line;
                while(std::getline(in, line)){
                    stop_words.insert({line, true});
                }

                in.close();
            }

            void CutStringHelper(const std::string &src, std::vector<std::string> *out)
            {
                //core call
                jieba.CutForSearch(src, *out);
                for(auto iter = out->begin(); iter != out->end(); ){
                    auto it = stop_words.find(*iter);
                    if(it != stop_words.end()){
                        //the current string is a stop word and must be removed
                        iter = out->erase(iter);
                    }
                    else{
                        iter++;
                    }
                }
            }

        public:
            static void CutString(const std::string &src, std::vector<std::string> *out)
            {
                ns_util::JiebaUtil::get_instance()->CutStringHelper(src, out);
                //jieba.CutForSearch(src, *out);
            }
    };

    JiebaUtil *JiebaUtil::instance = nullptr;
    //cppjieba::Jieba JiebaUtil::jieba(DICT_PATH, HMM_PATH, USER_DICT_PATH, IDF_PATH, STOP_WORD_PATH);
}

Acquiring the singleton may have thread-safety issues, so we guard its creation with a mutex (double-checked locking).

With jieba segmentation in place, we can now write the inverted index.

Segmentation and word-frequency statistics

First the title and the content are segmented, and we then build a mapping table from each word to its frequency. We segment the title and content because we want to count the frequency of every word, and we collect the separated words in a vector.

A word that appears in the title is considered heavier than one that appears only in the content.

Note: search itself is case-insensitive, so after segmentation we convert every word to lowercase before counting.

 bool BuildInvertedIndex(const DocInfo& doc)
            {
                //DocInfo{title,content,url,doc_id}
                //we need to segment both title and content
                //example: 吃/葡萄/不吐/葡萄皮
                struct word_cnt{
                    int title_cnt;
                    int content_cnt;

                    word_cnt():title_cnt(0), content_cnt(0){}
                };
                std::unordered_map<std::string, word_cnt> word_map; //temporary word-frequency mapping table

                //segment the title
                std::vector<std::string> title_words;
                ns_util::JiebaUtil::CutString(doc.title, &title_words);
                for(std::string s : title_words){
                    boost::to_lower(s); //convert uniformly to lowercase
                    word_map[s].title_cnt++; //fetch if present, create if absent
                }
                //segment the document content
                std::vector<std::string> content_words;
                ns_util::JiebaUtil::CutString(doc.content, &content_words);
                //count word frequency in the content
                for(std::string s : content_words){
                    boost::to_lower(s);
                    word_map[s].content_cnt++;
                }
#define X 10
#define Y 1
                for(auto &word_pair : word_map){
                    InvertedElem item;
                    item.doc_id = doc.doc_id;
                    item.word = word_pair.first;
                    item.weight = X*word_pair.second.title_cnt + Y*word_pair.second.content_cnt; //correlation
                    InvertedList &inverted_list = inverted_index[word_pair.first];
                    inverted_list.push_back(std::move(item));
                }

                return true;
            }

7. Search engine module Searcher

When writing Searcher we first need to create an index object, and then build the index through it.

Once the forward and inverted indexes are built, the user can search. First we segment the keywords the user entered; then, for each segmented word, we search the index (the index was built ignoring case). We merge the hits, sort the aggregated results in descending order of relevance so that higher-weight documents come first, and finally construct the Json string from the search results.

void Search(const std::string &query, std::string *json_string)
        {
            //1.segment: split the query the same way the index does
            std::vector<std::string> words;
            ns_util::JiebaUtil::CutString(query, &words);
            //2.search: look up each segmented word in the index; the index was built ignoring case
            //ns_index::InvertedList inverted_list_all;//holds InvertedElem
            std::vector<InvertedElemPrint> inverted_list_all;

            std::unordered_map<uint64_t,InvertedElemPrint> tokens_map;
            for(std::string word : words){
                    boost::to_lower(word);

                    ns_index::InvertedList *inverted_list = index->GetInvertedList(word);
                    if(nullptr == inverted_list){
                        continue;
                    }
                    for(const auto&elem : *inverted_list)
                    {
                        auto &item = tokens_map[elem.doc_id];
                        //item is the aggregation node for this doc_id
                        item.doc_id = elem.doc_id;
                        item.weight += elem.weight;//accumulate the weight of every matched word for this document
                        item.words.push_back(elem.word);
                    }
            }
            for(auto &item : tokens_map)
            {
                inverted_list_all.push_back(std::move(item.second));
            }
            //3.sort: order by weight, descending
            std::sort(inverted_list_all.begin(),inverted_list_all.end(),
            [](const InvertedElemPrint &e1,const InvertedElemPrint& e2)
            {
                return e1.weight > e2.weight;
            });
            //4.build: construct the json from the results -- via a third-party library
            Json::Value root;
            for(auto &item : inverted_list_all)
            {
                ns_index::DocInfo *doc = index->GetForwardIndex(item.doc_id);
                if(nullptr == doc)
                {
                    continue;
                }
                Json::Value elem;
                elem["title"] = doc->title;
                elem["desc"] = GetDesc(doc->content,item.words[0]);
                /*content is the de-tagged document; we only want a part of it for the description*/
                elem["url"] = doc->url;
                
                root.append(elem);
            }
            //Json::StyledWriter writer;
            Json::FastWriter writer;
            *json_string = writer.write(root);
        }

When looking up, we first need to fetch the inverted zipper (posting list) for each word.

Jsoncpp -- serialization and deserialization

Install jsoncpp: sudo yum install -y jsoncpp-devel

How do we use jsoncpp? Let's do a quick demo:

#include <iostream>
#include <string>
#include <vector>
#include <jsoncpp/json/json.h>

int main()
{
    Json::Value root;
    Json::Value item1;
    item1["key1"] = "value1";
    item1["key2"] = "value2";

    Json::Value item2;
    item2["key1"] = "value1";
    item2["key2"] = "value2";

    root.append(item1);
    root.append(item2);

    //serialize
    Json::StyledWriter writer;
    std::string s = writer.write(root);
    std::cout<<s<<std::endl;
    return 0;
} 

Note that we need to link the json library when compiling, otherwise an error is reported at link time:

add -ljsoncpp

We can see that the printed result is the styled (pretty-printed) serialization. jsoncpp also provides another writer, FastWriter, whose output is more compact.

After these preparations, we can build the Json string.

One more thing to note: content is the de-tagged text of the whole document, but that is not what we want to show the user; we only need a part of it, so it has to be processed.

Processing the content

std::string GetDesc(const std::string &html_content, const std::string &word)
        {
            //find the first occurrence of word in html_content, then take 50 bytes before it
            //(or from the beginning if there are fewer) and 100 bytes after it (or up to the end),
            //and cut out that part
            const int prev_step = 50;
            const int next_step = 100;
            //1. find the first occurrence, case-insensitively
            auto iter = std::search(html_content.begin(), html_content.end(), word.begin(), word.end(), [](int x, int y){
                    return (std::tolower(x) == std::tolower(y));
                    });
            if(iter == html_content.end()){
                return "None1";
            }
            int pos = std::distance(html_content.begin(), iter);

            //2. compute start and end; note std::size_t is unsigned, so we use int here
            int start = 0; 
            int end = html_content.size() - 1;
            //if there are 50+ characters before pos, move the start position forward
            if(pos > start + prev_step) start = pos - prev_step;
            if(pos < end - next_step) end = pos + next_step;

            //3. take the substring and return it
            if(start >= end) return "None2";
            std::string desc = html_content.substr(start, end - start);
            desc += "...";
            return desc;
        }

8. Importing cpp-httplib

Import cpp-httplib: cpp-httplib - Gitee.com

Note: cpp-httplib requires a fairly new gcc; you can check the gcc version with gcc -v.

If your current version is lower, please upgrade to a newer one first (upgrade method: upgrade GCC on Linux CentOS 7).

When gcc has been updated, we create a soft link to httplib in the current directory:

ln -s /home/Lxy/cpp-httplib-v0.7.15 cpp-httplib

#include "searcher.hpp"
#include "cpp-httplib/httplib.h"
#include "log.hpp"
const std::string input = "data/raw_html/raw.txt";
const std::string root_path = "./wwwroot";

int main()
{
    ns_searcher::Searcher search;
    search.InitSearcher(input);
    httplib::Server svr;
    svr.set_base_dir(root_path.c_str());
    svr.Get("/s", [&search](const httplib::Request &req, httplib::Response &rsp)
    {
        if(!req.has_param("word"))
        {
            rsp.set_content("必须要有搜索关键字!", "text/plain; charset=utf-8");
            return;
        }
        std::string word = req.get_param_value("word");
        //std::cout << "用户在搜索:" << word << std::endl;
        LOG(NORMAL,"用户搜索的: "+word);
        std::string json_string;
        search.Search(word, &json_string);
        rsp.set_content(json_string, "application/json");
        //rsp.set_content("你好,世界!", "text/plain; charset=utf-8");
    });
    LOG(NORMAL,"服务器启动成功....");
    svr.listen("0.0.0.0", 8081);
    return 0;
}

9. Writing the web front end

A quick look at html, css, and js:

html: the skeleton of the web page -- responsible for its structure

css: the flesh and blood of the web page -- responsible for its appearance

js (javascript): the soul of the web page -- responsible for dynamic effects and for interaction between front end and back end

Tutorials: w3school online tutorials

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script src="http://code.jquery.com/jquery-2.1.1.min.js"></script>

    <title>boost 搜索引擎</title>
    <style>
        /* remove all default inner and outer margins of the page; the html box model */
        * {
            /* outer margin */
            margin: 0;
            /* inner padding */
            padding: 0;
        }
        /* make the content of body match the html element 100% */
        html,
        body {
            height: 100%;
        }
        /* class selector .container */
        .container {
            /* width of the div */
            width: 800px;
            /* center the element horizontally via auto outer margins */
            margin: 0px auto;
            /* top outer margin, keeping the element away from the top of the page */
            margin-top: 15px;
        }
        /* compound selector: select .search under .container */
        .container .search {
            /* same width as the parent tag */
            width: 100%;
            /* height set to 52px */
            height: 52px;
        }
        /* to style the input tag we first select it; input: tag selector */
        /* the height setting of input does not take the border into account */
        .container .search input {
            /* float left */
            float: left;
            width: 600px;
            height: 50px;
            /* border properties: width, style, color */
            border: 1px solid black;
            /* remove the right border of the input box */
            border-right: none;
            /* inner padding so the text does not touch the left border */
            padding-left: 10px;
            /* color and style of the text inside the input */
            color: #CCC;
            font-size: 14px;
        }
        /* to style the button tag we first select it; button: tag selector */
        .container .search button {
            /* float left */
            float: left;
            width: 150px;
            height: 52px;
            /* background color of the button, #4e6ef2 */
            background-color: #4e6ef2;
            /* color of the text on the button */
            color: #FFF;
            /* font size */
            font-size: 19px;
            font-family:Georgia, 'Times New Roman', Times, serif;
        }
        .container .result {
            width: 100%;
        }
        .container .result .item {
            margin-top: 15px;
        }

        .container .result .item a {
            /* block-level element, on its own line */
            display: block;
            /* remove the underline of the a tag */
            text-decoration: none;
            /* font size of the text in the a tag */
            font-size: 20px;
            /* font color */
            color: #4e6ef2;
        }
        .container .result .item a:hover {
            text-decoration: underline;
        }
        .container .result .item p {
            margin-top: 5px;
            font-size: 16px;
            font-family:'Lucida Sans', 'Lucida Sans Regular', 'Lucida Grande', 'Lucida Sans Unicode', Geneva, Verdana, sans-serif;
        }

        .container .result .item i{
            /* block-level element, on its own line */
            display: block;
            /* cancel the italic style */
            font-style: normal;
            color: green;
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="search">
            <input type="text" value="请输入搜索关键字">
            <button onclick="Search()">搜索一下</button>
        </div>
        <div class="result">
            <!-- dynamically generated page content -->
            <!-- <div class="item">
                <a href="#">这是标题</a>
                <p>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div>
            <div class="item">
                <a href="#">这是标题</a>
                <p>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div>
            <div class="item">
                <a href="#">这是标题</a>
                <p>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div>
            <div class="item">
                <a href="#">这是标题</a>
                <p>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div>
            <div class="item">
                <a href="#">这是标题</a>
                <p>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div> -->
        </div>
    </div>
    <script>
        function Search(){
            // alert is a browser popup box
            // alert("hello js!");
            // 1. extract the input; $ can be understood as an alias of jQuery
            let query = $(".container .search input").val();
            console.log("query = " + query); //console is the browser's console; it can be used to inspect js data

            //2. issue the http request; ajax: a jQuery function for exchanging data with the back end
            $.ajax({
                type: "GET",
                url: "/s?word=" + query,
                success: function(data){
                    console.log(data);
                    BuildHtml(data);
                }
            });
        }

        function BuildHtml(data){
            // select the result tag in the html
            let result_lable = $(".container .result");
            // clear the previous search results
            result_lable.empty();

            for( let elem of data){
                // console.log(elem.title);
                // console.log(elem.url);
                let a_lable = $("<a>", {
                    text: elem.title,
                    href: elem.url,
                    // open in a new page
                    target: "_blank"
                });
                let p_lable = $("<p>", {
                    text: elem.desc
                });
                let i_lable = $("<i>", {
                    text: elem.url
                });
                let div_lable = $("<div>", {
                    class: "item"
                });
                a_lable.appendTo(div_lable);
                p_lable.appendTo(div_lable);
                i_lable.appendTo(div_lable);
                div_lable.appendTo(result_lable);
            }
        }
    </script>
</body>
</html>

10. Writing the project log

#include <iostream>
#include <string>
#include <ctime>

#define NORMAL 1
#define WARNING 2
#define DEBUG 3
#define FATAL 4

#define LOG(LEVEL,MESSAGE) log(#LEVEL,MESSAGE,__FILE__,__LINE__)
void log(std::string level,std::string message,std::string file,int line)
{
    std::cout<<"[" <<level<<"]" <<"[" << time(nullptr)<<"]"<<"[" <<message<<"]"<<"[" <<file<<"]"<<"[" <<line<<"]"<<std::endl;
} 

We deploy the project on the Linux server and keep it running in the background:

nohup ./http_server > log/log.txt 2>&1 &

11. Project testing

Project Results: boost search engine

Project code address: project-boost-search-engine


Origin blog.csdn.net/qq_58325487/article/details/129380908