Table of contents
3. Knowledge points involved in the project
4. Basic theory of project realization
6. Add the system tool module
6.1. Scanning local files
7. Add the data management module
7.1. First, get to know the SQLite database
7.2. Encapsulate the SQLite database management class
7.3. Encapsulate the data management class
7.3.2. Using RAII to release table results automatically
8.1. Synchronization: keeping the database in sync with local files
8.2. New real-time scanning function
8.3. Singleton for the scan management class
9. Using SQLite as a static link library
1. Generate the static link library
2. Use the generated static link library
2. Add mutexes and condition variables in the scan management class
3. Upgrade the scanning thread and the monitoring thread
11. Implementing the intermediate logic layer
1. Full-pinyin and initial-letter search
2.1. The key to highlighted search is the segmentation function
Problems encountered in the project:
1. Project background
So I wondered if I could write a quick search tool myself?
2. Project demand analysis
1. Support ordinary (plain keyword) search of documents
2. Support full-pinyin (quanpin) search
3. Support pinyin initial-letter search
4. Support highlighting of the search keyword
5. Scanning and monitoring in the background (invisible to the user)
3. Knowledge points involved in the project
4. Basic theory of project realization
5. Project framework
6. Add the system tool module
sysutil.h and sysutil.cpp
6.1. Scanning local files
The ultimate purpose of this function is to collect the entries found while traversing a directory so that they can be saved in the database.
To scan local files, we first need to know a few functions from the Windows C runtime:
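// Searches for the first entry that matches the given file specification.
// Returns a search handle on success, or -1 on failure.
intptr_t _findfirst(const char *filespec, struct _finddata_t *fileinfo);
// Finds the next matching entry. Returns 0 on success, or -1 when nothing more is found.
int _findnext(intptr_t handle, struct _finddata_t *fileinfo);
// Releases the resources allocated by _findfirst and ends a _findfirst/_findnext sequence.
int _findclose(intptr_t handle);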
// System tools, implemented as free functions
void DirectionList(const string& path, vector<string>& sub_dir, vector<string>& sub_file)
{
    struct _finddata_t file;
    // e.g. path = "C:\\Users\\86188\\Desktop\\项目1\\项目—文档快速搜索工具\\TestDoc"
    string _path = path;
    // Append a wildcard so that every entry in the directory matches
    _path += "\\*.*";
    // intptr_t rather than long: the handle is pointer-sized on 64-bit Windows
    intptr_t handle = _findfirst(_path.c_str(), &file);
    if (handle == -1)
    {
        ERROR_LOG("Failed to scan directory");
        return;
    }
    do
    {
        // Skip ".", ".." and any other entry whose name starts with a dot
        if (file.name[0] == '.')
            continue;
        if (file.attrib & _A_SUBDIR)
        {
            // The entry is a directory (folder): record it, then traverse it recursively
            sub_dir.push_back(file.name);
            string tmp_path = path;
            tmp_path += "\\";
            tmp_path += file.name;
            DirectionList(tmp_path, sub_dir, sub_file);
        }
        else
        {
            sub_file.push_back(file.name);
        }
    } while (_findnext(handle, &file) == 0);
    _findclose(handle);
}
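For readers who are not on Windows, roughly the same traversal can be sketched with std::filesystem. This is only an illustrative alternative under the assumption that a C++17 compiler is available; it is not the code used in this project, and the name ListDirectory is made up for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical portable counterpart of DirectionList (requires C++17).
// Collects the names of all sub-directories and files below `path`.
void ListDirectory(const fs::path& path,
                   std::vector<std::string>& sub_dir,
                   std::vector<std::string>& sub_file)
{
    std::error_code ec;
    fs::directory_iterator it(path, ec);
    if (ec)
        return;  // the directory could not be opened
    for (const fs::directory_entry& entry : it)
    {
        const std::string name = entry.path().filename().string();
        if (!name.empty() && name[0] == '.')
            continue;  // same rule as DirectionList: skip dot entries
        if (entry.is_directory())
        {
            sub_dir.push_back(name);
            ListDirectory(entry.path(), sub_dir, sub_file);  // recurse
        }
        else
        {
            sub_file.push_back(name);
        }
    }
}
```

Like DirectionList, the sketch stores bare names rather than full paths, so entries from different sub-directories end up in the same two vectors.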
7. Add the data management module
7.1. First, get to know the SQLite database
Introduction to SQLite
- SQLite does not require a separate server process or system to operate (serverless).
- SQLite requires no configuration, which means no installation or administration. (Simple to use)
- A complete SQLite database is stored in a single cross-platform disk file.
- SQLite is very small and lightweight, less than 400KiB fully configured and less than 250KiB configured omitting optional features.
- SQLite is self-sufficient, which means it does not require any external dependencies.
- SQLite transactions are fully ACID compliant, allowing safe access from multiple processes or threads.
- SQLite supports most of the query language features of the SQL92 (SQL2) standard.
- SQLite is written in ANSI-C and provides a simple and easy-to-use API.
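// Open (or create) a database
int sqlite3_open(const char *filename, sqlite3 **ppDb);
// Close the database
int sqlite3_close(sqlite3*);
// Execute SQL. Creating a table, inserting data and so on differ only in the SQL text passed in
int sqlite3_exec(
    sqlite3*,                                     /* An open database */
    const char *sql,                              /* SQL to be evaluated */
    int (*callback)(void*, int, char**, char**),  /* Callback function */
    void *data,                                   /* 1st argument to callback */
    char **errmsg                                 /* Error msg written here */
);
// Run a query and return the whole result as a table of strings
int sqlite3_get_table(
    sqlite3 *db,          /* An open database */
    const char *zSql,     /* SQL to be evaluated */
    char ***pazResult,    /* Results of the query */
    int *pnRow,           /* Number of result rows written here */
    int *pnColumn,        /* Number of result columns written here */
    char **pzErrmsg       /* Error msg written here */
);
// Free a result table returned by sqlite3_get_table
void sqlite3_free_table(char **result);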
With these functions in hand, we can wrap them in a class, SqliteManager.
7.2. Encapsulate the SQLite database management class
// Wrapper class around the SQLite database
class SqliteManager
{
public:
    SqliteManager();
    ~SqliteManager();
public:
    void Open(const string& database);   // Open the database, creating it if it does not exist
    void Close();                        // Close the database
    void ExecuteSql(const string& sql);  // Execute SQL: creating tables, inserting and deleting all go through here
    void GetResultTable(const string& sql, char**& ppRet, int& row, int& col);
private:
    sqlite3* m_db;
};
SqliteManager::SqliteManager() :m_db(nullptr)
{}
SqliteManager::~SqliteManager()
{
    Close();  // Close the database
}
void SqliteManager::Open(const string& database)
{
    int rc = sqlite3_open(database.c_str(), &m_db);
    if (rc != SQLITE_OK)
    {
        ERROR_LOG("Can't open database: %s\n", sqlite3_errmsg(m_db));
        exit(1);
    }
    else
    {
        TRACE_LOG("Opened database successfully\n");
    }
}
void SqliteManager::Close()
{
    int rc = sqlite3_close(m_db);
    if (rc != SQLITE_OK)
    {
        ERROR_LOG("Can't close database: %s\n", sqlite3_errmsg(m_db));
        exit(1);
    }
    else
    {
        // Reset the handle so that a second Close() (e.g. from the destructor) does not touch a stale pointer
        m_db = nullptr;
        TRACE_LOG("Close database successfully\n");
    }
}
void SqliteManager::ExecuteSql(const string& sql)
{
    char* zErrMsg = 0;
    int rc = sqlite3_exec(m_db, sql.c_str(), 0, 0, &zErrMsg);
    if (rc != SQLITE_OK)
    {
        ERROR_LOG("SQL error: %s\n", zErrMsg);
        sqlite3_free(zErrMsg);  // The message is allocated by SQLite and must be freed by the caller
    }
    else
    {
        TRACE_LOG("Operation sql successfully\n");
    }
}
void SqliteManager::GetResultTable(const string& sql, char**& ppRet, int& row, int& col)
{
    char* zErrMsg = 0;
    int rc = sqlite3_get_table(m_db, sql.c_str(), &ppRet, &row, &col, &zErrMsg);
    if (rc != SQLITE_OK)
    {
        ERROR_LOG("SQL Error: %s\n", zErrMsg);
        sqlite3_free(zErrMsg);
    }
    else
    {
        TRACE_LOG("Get Result table successfully\n");
    }
}
7.3. Encapsulate the data management class
This class makes the database easier to work with. Our real goal is not to operate on the database itself but to keep comparing the local files against the records stored in the database.
That keeps the local files and the database in sync. Put simply, the rest of the program never touches the database directly.
// Data management class
class DataManager
{
public:
    DataManager();
    ~DataManager();
public:
    void InitSqlite();  // Initialize the database
    void InsertDoc(const string &path, const string &doc);
    void DeleteDoc(const string &path, const string &doc);
    void GetDoc(const string &path, multiset<string> &docs);
private:
    SqliteManager m_dbmgr;
};
DataManager::DataManager()
{
    m_dbmgr.Open(DOC_DB);
    InitSqlite();  // Create the table
}
DataManager::~DataManager()
{}
void DataManager::InitSqlite()
{
    char sql[SQL_BUFFER_SIZE] = {0};
    // snprintf never writes past the end of the buffer, unlike sprintf
    snprintf(sql, sizeof(sql),
        "CREATE TABLE if not exists %s("
        "id integer primary key autoincrement,"
        "doc_name text,"
        "doc_path text)", DOC_TB);
    m_dbmgr.ExecuteSql(sql);
}
void DataManager::InsertDoc(const string &path, const string &doc)
{
    char sql[SQL_BUFFER_SIZE] = {0};
    snprintf(sql, sizeof(sql), "INSERT INTO %s values(null, '%s', '%s')",
        DOC_TB, doc.c_str(), path.c_str());
    m_dbmgr.ExecuteSql(sql);
}
void DataManager::DeleteDoc(const string &path, const string &doc)
{
    char sql[SQL_BUFFER_SIZE] = {0};
    snprintf(sql, sizeof(sql), "DELETE FROM %s where doc_path='%s' and doc_name='%s'",
        DOC_TB, path.c_str(), doc.c_str());
    m_dbmgr.ExecuteSql(sql);
}
void DataManager::GetDoc(const string &path, multiset<string> &docs)
{
    char sql[SQL_BUFFER_SIZE] = {0};
    snprintf(sql, sizeof(sql), "SELECT doc_name from %s where doc_path='%s'",
        DOC_TB, path.c_str());
    char **ppRet = 0;
    int row = 0, col = 0;
    m_dbmgr.GetResultTable(sql, ppRet, row, col);
    // Only one column is selected, so entry 0 is the column name and entries 1..row are the data
    for(int i=1; i<=row; ++i)
        docs.insert(ppRet[i]);
    // Release the result table
    sqlite3_free_table(ppRet);
}
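One caveat with building SQL text through printf-style formatting: a file name that contains a single quote (for example it's.txt) ends the string literal early and breaks the statement. A minimal sketch of a stopgap, assuming a helper named EscapeSqlText that is not part of the original project: double every single quote before the name is formatted into the SQL text.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: make a string safe to place between single quotes
// in a SQL statement by doubling every single quote it contains.
std::string EscapeSqlText(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char c : text)
    {
        out += c;
        if (c == '\'')
            out += '\'';  // ' becomes ''
    }
    return out;
}
```

It would be applied to doc and path before they are passed to the format string. Bound parameters (sqlite3_prepare_v2 together with sqlite3_bind_text) are the more robust route, since they avoid quoting altogether.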
7.3.1. Add the search function
The search here uses LIKE for fuzzy matching.
void DataManager::Search(const string &key, vector<pair<string,string>> &doc_path)
{
    char sql[SQL_BUFFER_SIZE] = {0};
    // %% is a literal '%' in a printf format, so the pattern becomes '%key%'
    snprintf(sql, sizeof(sql), "SELECT doc_name, doc_path from %s where doc_name like '%%%s%%'",
        DOC_TB, key.c_str());
    char **ppRet = 0;
    int row = 0, col = 0;
    m_dbmgr.GetResultTable(sql, ppRet, row, col);
    // Row 0 holds the column names; data row i starts at index i*col
    for(int i=1; i<=row; ++i)
    {
        doc_path.push_back(make_pair(ppRet[i*col], ppRet[i*col+1]));
    }
    sqlite3_free_table(ppRet);
}
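The index arithmetic in Search is easy to get wrong, so here is a small sketch of how the flat array returned by sqlite3_get_table is laid out. It uses a hand-written array instead of a real query so that it has no dependency on SQLite; the helper name RowsAsPairs is an assumption made for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper showing the layout of the flat result array.
// `table` has (row + 1) * col entries: entries [0, col) are the column
// names, and data row i (1-based) starts at index i * col.
std::vector<std::pair<std::string, std::string>>
RowsAsPairs(const char* const* table, int row, int col)
{
    std::vector<std::pair<std::string, std::string>> out;
    if (table == nullptr || col < 2)
        return out;
    // SQL NULL values come back as null pointers, so map them to "".
    auto text = [](const char* p) { return std::string(p ? p : ""); };
    for (int i = 1; i <= row; ++i)
        out.emplace_back(text(table[i * col]), text(table[i * col + 1]));
    return out;
}
```

With two columns and two data rows the array has six entries, and the first data row starts at index 2.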
7.3.2. Using RAII to release table results automatically
Add an AutoGetResultTable class
When managing data, every time we fetch a result table we have to release it at the end.
If we forget, memory leaks: every search leaks one table, and after enough searches the process will run out of memory.
Releasing the table by hand is tedious, and we cannot guarantee that we will remember to do it every time.
So can it be released automatically?
This is where the idea behind smart pointers comes in.
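class AutoGetResultTable
{
public:
    AutoGetResultTable(SqliteManager& db, const string& sql, char**& ppRet, int& row, int& col);
    ~AutoGetResultTable();
    // Copying would free the same table twice, so it is disabled
    AutoGetResultTable(const AutoGetResultTable&) = delete;
    AutoGetResultTable& operator=(const AutoGetResultTable&) = delete;
private:
    SqliteManager& m_db;
    char** m_ppRet;
};
AutoGetResultTable::AutoGetResultTable(SqliteManager& db, const string& sql,
    char**& ppRet, int& row, int& col)
    :m_db(db), m_ppRet(nullptr)
{
    // GetResultTable is a member of the database class, so a database object is needed to call it
    m_db.GetResultTable(sql, ppRet, row, col);
    m_ppRet = ppRet;
}
AutoGetResultTable::~AutoGetResultTable()
{
    if (m_ppRet)  // A non-null pointer means there is a table to release
        sqlite3_free_table(m_ppRet);
}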
A small question: when writing this class, how do we know which parameters to pass and which members to keep?
The class exists to release memory, so it has to remember what to release. That is why it keeps ppRet as a member.
With this smart-pointer-style guard, fetching a table becomes much simpler.
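The same idea can also be expressed with std::unique_ptr and a custom deleter. The sketch below is an illustration only: FakeGetTable and FakeFreeTable are made-up stand-ins for sqlite3_get_table and sqlite3_free_table so that the example compiles without SQLite, and the counter exists only to make the automatic release observable.

```cpp
#include <cassert>
#include <memory>

// Stand-ins for sqlite3_get_table / sqlite3_free_table (hypothetical names).
static int g_live_tables = 0;

char** FakeGetTable()
{
    ++g_live_tables;
    return new char*[1]{nullptr};
}

void FakeFreeTable(char** table)
{
    --g_live_tables;
    delete[] table;
}

// The deleter is invoked automatically when the unique_ptr is destroyed.
struct TableDeleter
{
    void operator()(char** table) const { FakeFreeTable(table); }
};
using TablePtr = std::unique_ptr<char*, TableDeleter>;

// Returns the number of tables still alive after a "query" has finished.
int CountLiveTablesAfterQuery()
{
    {
        TablePtr table(FakeGetTable());
        assert(g_live_tables == 1);  // the table is alive inside the scope
    }                                // ~TablePtr runs FakeFreeTable here
    return g_live_tables;
}
```

In real code the deleter would call sqlite3_free_table instead. The hand-written AutoGetResultTable and this variant do the same job; the unique_ptr version simply gets move semantics and the ban on copying for free.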
8. New scanning module
ScanManager.h and ScanManager.cpp
8.1. Synchronization: keeping the database in sync with local files
// Synchronize the local data with the data in the database
void ScanManager::ScanDirectory(const string& path)
{
    // 1. Scan the local files
    vector<string> local_dir;
    vector<string> local_file;
    DirectionList(path, local_dir, local_file);
    multiset<string> local_set;
    local_set.insert(local_file.begin(), local_file.end());
    local_set.insert(local_dir.begin(), local_dir.end());
    // 2. Read the files recorded in the database
    multiset<string> db_set;
    DataManager& m_dbmgr = DataManager::GetInstance();  // Must be received by reference, never by copy
    m_dbmgr.GetDoc(path, db_set);
    // 3. Synchronize: both sets are sorted, so walk them in step
    auto local_it = local_set.begin();
    auto db_it = db_set.begin();
    while (local_it != local_set.end() && db_it != db_set.end())
    {
        if (*local_it < *db_it)
        {
            // Present locally, missing from the database: insert it
            m_dbmgr.InsertDoc(path, *local_it);
            ++local_it;
        }
        else if (*local_it > *db_it)
        {
            // Missing locally, present in the database: delete it
            m_dbmgr.DeleteDoc(path, *db_it);
            ++db_it;
        }
        else
        {
            // Present on both sides
            ++local_it;
            ++db_it;
        }
    }
    while (local_it != local_set.end())
    {
        // Present locally, missing from the database: insert it
        m_dbmgr.InsertDoc(path, *local_it);
        ++local_it;
    }
    while (db_it != db_set.end())
    {
        // Missing locally, present in the database: delete it
        m_dbmgr.DeleteDoc(path, *db_it);
        ++db_it;
    }
}
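The comparison step is a classic two-pointer walk over two sorted sequences. To make it easy to try out on its own, here is a sketch that only computes what would have to be inserted and deleted, without touching any database. The names SyncPlan and DiffSorted are assumptions made for this example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type for the comparison step.
struct SyncPlan
{
    std::vector<std::string> to_insert;  // on disk, missing from the database
    std::vector<std::string> to_delete;  // in the database, gone from disk
};

// Walks both sorted sets in step and reports the differences.
SyncPlan DiffSorted(const std::multiset<std::string>& local_set,
                    const std::multiset<std::string>& db_set)
{
    SyncPlan plan;
    auto local_it = local_set.begin();
    auto db_it = db_set.begin();
    while (local_it != local_set.end() && db_it != db_set.end())
    {
        if (*local_it < *db_it)
            plan.to_insert.push_back(*local_it++);
        else if (*db_it < *local_it)
            plan.to_delete.push_back(*db_it++);
        else
        {
            ++local_it;  // present on both sides, nothing to do
            ++db_it;
        }
    }
    // Whatever is left on one side has no counterpart on the other.
    plan.to_insert.insert(plan.to_insert.end(), local_it, local_set.end());
    plan.to_delete.insert(plan.to_delete.end(), db_it, db_set.end());
    return plan;
}
```

For local files {a, c, d} and database records {a, b, d, e}, the plan is to insert c and delete b and e. Each element is visited once, so the cost is linear in the total number of entries.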
8.2. New real-time scanning function
The scan written so far runs once, before the search. Once the program is running, the database is no longer kept in sync. For example, if we delete a file while the program is running, the database does not pick up the change. That is a problem: to resynchronize we would have to restart the program, which is clearly unacceptable.
So is there a way to synchronize in real time?
To scan continuously, we use multithreading and dedicate one thread to scanning.
Create the scanning thread in the constructor of ScanManager:
ScanManager::ScanManager(const string &path)
{
    // Scanning thread
    thread ScanObj(&ScanManager::ScanThread, this, path);
    ScanObj.detach();
}
This thread does nothing but scan. A bare while (1) loop is of course inefficient; a condition variable is introduced later so that the thread scans only when there is something to do.
void ScanManager::ScanThread(const string& path)
{
    // This thread scans over and over without pause
    while (1)
    {
        ScanDirectory(path);
    }
}
8.3. Singleton for the scan management class
Why a singleton?
Scanning starts as soon as a ScanManager object is instantiated. So what happens if some other code instantiates a second object?
For example, constructing a second ScanManager would create another thread that scans the same directory.
The whole program needs exactly one such object, and having several objects scanning at once is clearly a bad idea.
A class that can only ever have one instance is called a singleton.
Here we use the lazy variant, where the instance is created on first use:
class ScanManager
{
public:
    static ScanManager& GetInstance(const string &path);
protected:
    // Constructor and copy operations are hidden so that no second object can be created
    ScanManager(const string &path);
    ScanManager(const ScanManager&);
    ScanManager& operator=(const ScanManager&);
private:
    //DataManager m_dbmgr;
};
ScanManager& ScanManager::GetInstance(const string &path)
{
    // Constructed on the first call only; later calls return the same object
    static ScanManager _inst(path);
    return _inst;
}
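The lazy singleton pattern can be tried in isolation with a small stand-in class. This is a sketch, not the project's ScanManager: the class name Scanner and its Path accessor are invented for the example, and the copy operations are deleted (C++11) instead of being declared protected.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ScanManager that only records its path.
class Scanner
{
public:
    static Scanner& GetInstance(const std::string& path)
    {
        // Constructed once, on the first call. Since C++11 the
        // initialization of a local static is thread-safe.
        static Scanner inst(path);
        return inst;
    }
    const std::string& Path() const { return m_path; }

    Scanner(const Scanner&) = delete;
    Scanner& operator=(const Scanner&) = delete;
private:
    explicit Scanner(const std::string& path) : m_path(path) {}
    std::string m_path;
};
```

One consequence worth knowing: the path argument matters only on the first call. A later call such as Scanner::GetInstance("D:\\other") returns the object that was created with the first path.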
9. Using SQLite as a static link library
1. Generate the static link library
2. Use the generated static link library
10. New monitoring module
With scanning but no monitoring, the scanning thread loops endlessly. With few files this is not a big problem, but with many files it becomes a serious one.
We should add a monitoring thread that notifies the scanning thread to start working only when the local files change.
#include<windows.h>
HANDLE FindFirstChangeNotification(
LPCTSTR lpPathName, // pointer to name of directory to watch
BOOL bWatchSubtree, // flag for monitoring directory or
// directory tree
DWORD dwNotifyFilter // filter conditions to watch for
);
BOOL FindNextChangeNotification(
HANDLE hChangeHandle // handle to change notification to signal
);
DWORD WaitForSingleObject(
HANDLE hHandle, // handle to object to wait for
DWORD dwMilliseconds // time-out interval in milliseconds
);
2. Add mutexes and condition variables in the scan management class
#include<mutex>
#include<condition_variable>
class ScanManager
{
    //...............
    mutex m_mutex;
    condition_variable m_cond;
};
The lock object created by unique_lock<mutex> lock(m_mutex); locks the mutex in its constructor and unlocks it in its destructor.
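3. Upgrade the scanning thread and the monitoring thread
void ScanManager::ScanThread(const string& path)
{
    // Initial scan, so that the database is not empty before the first notification
    ScanDirectory(path);
    while (1)
    {
        unique_lock<mutex> lock(m_mutex);
        // Block until notified; a blocked thread uses no CPU.
        // Note: wait() without a predicate can wake up spuriously and misses
        // notifications that are sent while a scan is still running.
        m_cond.wait(lock);
        ScanDirectory(path);
    }
}
void ScanManager::WatchThread(const string& path)
{
    // true means sub-directories are monitored as well
    HANDLE hd = FindFirstChangeNotification(path.c_str(), true,
        FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_DIR_NAME);
    if (hd == INVALID_HANDLE_VALUE)
    {
        ERROR_LOG("Failed to monitor directory.");
        return;
    }
    while (1)  // Monitoring is set up; every detected change must be passed on
    {
        WaitForSingleObject(hd, INFINITE);  // Wait with no time-out
        m_cond.notify_one();                // Tell the scanning thread to get to work
        FindNextChangeNotification(hd);     // Keep monitoring
    }
}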
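A common refinement of this hand-off is to pair the condition variable with a flag and wait on a predicate, so that a notification sent while the scanner is busy is not lost and spurious wake-ups are ignored. The sketch below shows the idea in isolation; the class name ChangeSignal and its members are assumptions for this example and do not appear in the project.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: a "something changed" flag guarded by a mutex.
class ChangeSignal
{
public:
    // Called by the monitoring thread.
    void Notify()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_changed = true;
        }
        m_cond.notify_one();
    }
    // Called by the scanning thread. Blocks until a change has been
    // reported, then clears the flag.
    void Wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_changed; });
        m_changed = false;
    }
    // Non-blocking check of the flag.
    bool Pending()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_changed;
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_changed = false;
};
```

In the scanning loop, Wait() would replace m_cond.wait(lock) and ScanDirectory(path) would run after it returns; the monitoring loop would call Notify() instead of m_cond.notify_one(). Because the flag remembers the change, calling Notify() before Wait() makes Wait() return immediately instead of blocking.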
11. Implementing the intermediate logic layer
1. Full-pinyin and initial-letter search
// Convert Chinese characters to full pinyin
string ChineseConvertPinYinAllSpell(const string &dest_chinese);
// Convert Chinese characters to pinyin initials
string ChineseConvertPinYinInitials(const string &name);
void DataManager::InitSqlite()
{
    char sql[SQL_BUFFER_SIZE] = {0};
    snprintf(sql, sizeof(sql),
        "CREATE TABLE if not exists %s("
        "id integer primary key autoincrement,"
        "doc_name text,"
        "doc_name_py text,"
        "doc_name_initials text,"
        "doc_path text)", DOC_TB);
    m_dbmgr.ExecuteSql(sql);
}
void DataManager::InsertDoc(const string &path, const string &doc)
{
    // Chinese characters to full pinyin
    string doc_py = ChineseConvertPinYinAllSpell(doc);
    // Chinese characters to pinyin initials
    string doc_initials = ChineseConvertPinYinInitials(doc);
    char sql[SQL_BUFFER_SIZE] = {0};
    snprintf(sql, sizeof(sql), "INSERT INTO %s values(null, '%s', '%s','%s', '%s')",
        DOC_TB, doc.c_str(), doc_py.c_str(), doc_initials.c_str(),
        path.c_str());
    m_dbmgr.ExecuteSql(sql);
}
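With the two extra columns in place, a search can match the keyword against the name, its full pinyin, or its initials. The post does not show the updated query, so the following is only a guess at its shape: the helper name BuildSearchSql is hypothetical, and the keyword is assumed to be already safe to place inside a SQL string literal.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a query that matches the keyword against the
// file name, its full pinyin, or its pinyin initials.
// Assumption: `key` contains no unescaped single quotes.
std::string BuildSearchSql(const std::string& table, const std::string& key)
{
    const std::string pattern = "'%" + key + "%'";
    return "SELECT doc_name, doc_path FROM " + table +
           " WHERE doc_name LIKE " + pattern +
           " OR doc_name_py LIKE " + pattern +
           " OR doc_name_initials LIKE " + pattern;
}
```

The column names are the ones created in InitSqlite above. The resulting string could be passed to GetResultTable in the same way as the single-column query.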
2. Implement highlighted search
// Print a string in a highlight color
void ColourPrintf(const char* str)
{
    // 0-black 1-blue 2-green 3-aqua 4-red 5-purple 6-yellow 7-white 8-gray 9-light blue 10-light green
    // 11-light aqua 12-light red 13-light purple 14-light yellow 15-bright white
    // Color attribute: foreground + background*0x10
    // Example: red text on a bright white background is red + bright white = 4 + 15*0x10
    WORD color = 9 + 0 * 0x10;
    WORD colorOld;
    HANDLE handle = GetStdHandle(STD_OUTPUT_HANDLE);
    CONSOLE_SCREEN_BUFFER_INFO csbi;
    GetConsoleScreenBufferInfo(handle, &csbi);
    colorOld = csbi.wAttributes;  // Remember the current attributes so they can be restored
    SetConsoleTextAttribute(handle, color);
    printf("%s", str);
    SetConsoleTextAttribute(handle, colorOld);
}
2.1. The key to highlighted search is the segmentation function
The hard part is keeping positions in the original string aligned with positions in its pinyin string.
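For the simplest case, where the keyword appears literally in the file name, the segmentation boils down to cutting the name into three parts. The sketch below covers only that case and does not attempt the pinyin alignment; the names HighlightParts and SplitForHighlight are assumptions made for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type: text before the match, the match, text after it.
struct HighlightParts
{
    std::string prefix;
    std::string match;
    std::string suffix;
};

// Splits `text` around the first occurrence of `key`.
// If `key` is empty or does not occur, everything goes into `prefix`.
HighlightParts SplitForHighlight(const std::string& text, const std::string& key)
{
    HighlightParts parts;
    const std::string::size_type pos =
        key.empty() ? std::string::npos : text.find(key);
    if (pos == std::string::npos)
    {
        parts.prefix = text;
        return parts;
    }
    parts.prefix = text.substr(0, pos);
    parts.match = text.substr(pos, key.size());
    parts.suffix = text.substr(pos + key.size());
    return parts;
}
```

The caller would print prefix and suffix normally and pass match to ColourPrintf. Note that find works on bytes, so with a multi-byte encoding a match could in principle start in the middle of a character; handling that, and mapping a pinyin match back to the original characters, is exactly the alignment problem described above.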
12. Client implementation
Problems encountered in the project:
But I think the highlight should be
That is, only the matched part (for example "C++") should light up while the rest keeps its normal color. So I started thinking about how to split the string and set out to write a function for it. After writing it I searched online and found that there is a dedicated function for highlighting, so I learned that function and used it.
2. I didn't expect to need a monitoring thread at the beginning
At first, to get real-time scanning, I created a worker thread dedicated to scanning, but it scanned in an endless while (1) loop. In the beginning there were few files, so each scan was quick. I knew it would use a lot of CPU, but I did not care because the results were correct anyway. Once there were more files the problem showed up: each scan took a long time, and it really was very CPU intensive.
My first idea was: if one thread is slow, could several threads scan at the same time, with the work divided into parts and each thread responsible for one part? I thought about how to implement this for a while without success. Later I realized that the real fix is to stop the thread from scanning blindly all the time, which cuts the cost dramatically. I kept thinking, experimenting and looking things up online.
Only later did I think of the lazy approach I had learned before, which is similar in spirit to copy-on-write: the scanning thread works only when the local files change, and otherwise it waits in place and gives up the CPU. That led me to condition variables, because they provide notification, and so I created a dedicated monitoring thread.