[Linux: Thread Pool]


1 Thread pool concept

A thread pool is a thread-usage pattern. Creating too many threads adds scheduling overhead, hurts cache locality, and degrades overall performance. A thread pool keeps a set of worker threads waiting for a supervisor to hand them tasks that can run concurrently, which avoids paying thread creation and destruction costs for short-lived tasks. The pool keeps the cores fully utilized while preventing over-scheduling. The appropriate number of threads depends on the available resources: concurrent processors, processor cores, memory, network sockets, and so on.

Typical application scenarios for a thread pool:

  • A large number of short tasks. A web server handling page requests is a natural fit for a thread pool: each task is small but the task count is huge (think of the click volume on a popular site). For long-lived tasks such as a Telnet session, the pool's advantage fades, because the session lasts far longer than the time it takes to create a thread.
  • Applications with stringent performance requirements, such as a server that must respond to clients quickly.
  • Bursts of requests that must not spawn an unbounded number of threads. Without a pool, a sudden flood of client requests creates a thread per request; although most operating systems allow a very large thread count in theory, creating that many threads in a short time can exhaust memory and trigger errors.

Thread pool workflow:

    1. Create a fixed number of threads, each looping to fetch task objects from the task queue.
    2. After obtaining a task object, execute the task's run interface.

2 The first version of the thread pool

Before writing the thread pool, let us think about what its member variables should be. First, we need a container to hold the threads; a vector works well. We also need an integer recording how many threads the pool owns. To guarantee thread safety we need a mutex, and to maintain the synchronization relationship we need a condition variable (the synchronization relationship here means that a pool thread sleeps when there is no task and wakes to execute one when a task arrives). Finally, we need a task queue. To make later verification clearer, we first encapsulate a task class.

Task.hpp:

#pragma once
#include <iostream>
using namespace std;

class Task
{
public:
    Task(int x = 0, int y = 0, char op = '+')
        : _x(x), _y(y), _op(op)
    {
    }

    void run()
    {
        switch (_op)
        {
        case '+':
            _res = _x + _y;
            break;
        case '-':
            _res = _x - _y;
            break;
        case '*':
            _res = _x * _y;
            break;
        case '/':
            if (_y == 0)
            {
                _exitCode = 1; // record the divide-by-zero error
                return;
            }
            _res = _x / _y;
            break;
        case '%':
            if (_y == 0)
            {
                _exitCode = 1; // record the mod-by-zero error
                return;
            }
            _res = _x % _y;
            break;
        }
    }

    void formatMsk()
    {
        cout << "mask:" << _x << _op << _y << "==?" << endl;
    }

    void formatRes()
    {
        cout << "res:" << _x << _op << _y << "==" << _res << endl;
    }

private:
    int _x;
    int _y;
    char _op;
    int _res = 0;
    int _exitCode = 0;
};

Now let's implement the first version:

#pragma once
#include <iostream>
#include <vector>
#include <queue>
#include <pthread.h>
using namespace std;

const int N = 5;
template <class T>
class threadPool
{
public:
    threadPool(int sz = N)
        : _sz(sz)
        , _threads(sz)
    {
        pthread_mutex_init(&_mutex, nullptr);
        pthread_cond_init(&_cond, nullptr);
    }

    static void* Routine(void* args) // the pool's worker threads execute tasks here
    {
        pthread_detach(pthread_self()); // detach from the main thread first
        threadPool<T>* ptp = static_cast<threadPool<T>*>(args);

        while (true)
        {
            pthread_mutex_lock(&(ptp->_mutex));

            while ((ptp->_masks).empty())
            {
                pthread_cond_wait(&(ptp->_cond), &(ptp->_mutex));
            }

            T task = (ptp->_masks).front();
            (ptp->_masks).pop();
            pthread_mutex_unlock(&(ptp->_mutex));

            task.run(); // execute the task outside the critical section
            task.formatRes();
        }

        return nullptr;
    }

    void Start()
    {
        for (int i = 0; i < _sz; ++i)
        {
            pthread_create(&_threads[i], nullptr, Routine, this);
        }
    }

    void PushTask(const T& task)
    {
        pthread_mutex_lock(&_mutex);
        _masks.push(task);
        pthread_mutex_unlock(&_mutex);
        pthread_cond_signal(&_cond); // remember to wake a sleeping thread to run the task
    }

    ~threadPool()
    {
        pthread_mutex_destroy(&_mutex);
        pthread_cond_destroy(&_cond);
    }

    vector<pthread_t> _threads;
    queue<T> _masks;
    int _sz; // number of threads in the pool
    pthread_mutex_t _mutex;
    pthread_cond_t _cond;
};

There are several points that need special attention here:

  1. Routine is implemented as a static function because the function pointer pthread_create expects does not match a non-static member function: a member function carries an implicit this pointer.
  2. After a thread is created, it detaches itself from the main thread; that is, the main thread does not need to reclaim the new thread's resources.
  3. Tasks must be executed outside the critical section, so the threads actually run concurrently and efficiency improves.
  4. For convenience, all member variables of the class are public here. This is not recommended; write getters instead.

The rest is straightforward and should be easy to follow.
Test program:

#include "Task.hpp"
#include <cstring>
#include <ctime>
#include <unistd.h>
// the thread pool header above is also included

const char *ops = "+-*/%";

int main()
{
    threadPool<Task> *threads = new threadPool<Task>(30);
    threads->Start();
    srand((size_t)time(nullptr));
    while (true)
    {
        int x = rand() % 30 + 1;
        int y = rand() % 30 + 1;
        char op = ops[rand() % strlen(ops)];

        Task t(x, y, op);
        threads->PushTask(t);
        t.formatMsk();
        sleep(1);
    }

    return 0;
}

Run it, and each pushed task is printed together with its computed result.


3 The second version of the thread pool

The core idea of the second version is basically the same as the first. The difference is that it uses the thread-wrapper class we implemented ourselves earlier, Thread.hpp, which is essentially an encapsulation of the pthread library's thread interface:

#pragma once
#include <iostream>
#include <functional>
#include <string>
#include <cstdio>
#include <cstdlib>
#include <pthread.h>
using namespace std;

class threadProcess
{
public:
    enum stu
    {
        NEW,
        RUNNING,
        EXIT
    };

    template <class T>
    threadProcess(int num, T exe, void *args)
        : _tid(0)
        , _status(NEW)
        , _exe(exe)
        , _args(args)
    {
        char name[26];
        snprintf(name, 26, "thread%d", num);
        _name = name;
    }

    static void* runHelper(void *args)
    {
        threadProcess *ts = (threadProcess *)args;
        (*ts)();
        return nullptr;
    }

    void operator()() // functor: invoke the stored callback
    {
        if (_exe != nullptr)
            _exe(_args);
    }

    void Run()
    {
        int n = pthread_create(&_tid, nullptr, runHelper, this);
        if (n != 0)
            exit(-1);
        _status = RUNNING;
    }

    void Join()
    {
        int n = pthread_join(_tid, nullptr);
        if (n != 0)
            exit(-1);
        _status = EXIT;
    }

    string _name;
    pthread_t _tid;
    stu _status;
    function<void*(void*)> _exe;
    void *_args;
};

With this class in hand, we can build the thread pool on top of our own thread wrapper:

#pragma once
#include "Thread.hpp"
#include <iostream>
#include <vector>
#include <queue>
#include <pthread.h>
using namespace std;

const int N = 5;
template <class T>
class threadPool
{
public:
    threadPool(int sz = N)
        : _sz(sz)
    {
        pthread_mutex_init(&_mutex, nullptr);
        pthread_cond_init(&_cond, nullptr);
    }

    static void* Routine(void* args) // the pool's worker threads execute tasks here
    {
        // no pthread_detach here: with our own thread wrapper we simply Join in the destructor
        threadPool<T>* ptp = static_cast<threadPool<T>*>(args);

        while (true)
        {
            pthread_mutex_lock(&(ptp->_mutex));

            while ((ptp->_masks).empty())
            {
                pthread_cond_wait(&(ptp->_cond), &(ptp->_mutex));
            }

            T task = (ptp->_masks).front();
            (ptp->_masks).pop();
            pthread_mutex_unlock(&(ptp->_mutex));

            task.run(); // execute the task outside the critical section
            task.formatRes();
        }

        return nullptr;
    }

    void Init()
    {
        for (int i = 0; i < _sz; ++i)
        {
            _threads.push_back(threadProcess(i + 1, Routine, this));
        }
    }

    void Start()
    {
        for (auto& e : _threads)
        {
            e.Run();
        }
    }

    void PushTask(const T& task)
    {
        pthread_mutex_lock(&_mutex);
        _masks.push(task);
        pthread_mutex_unlock(&_mutex);
        pthread_cond_signal(&_cond); // remember to wake a sleeping thread to run the task
    }

    ~threadPool()
    {
        for (auto& e : _threads)
        {
            e.Join();
        }
        pthread_mutex_destroy(&_mutex);
        pthread_cond_destroy(&_cond);
    }

    void Check()
    {
        for (auto& e : _threads)
        {
            cout << "name:" << e._name << " id" << e._tid << endl;
        }
    }

    vector<threadProcess> _threads;
    queue<T> _masks;
    int _sz; // number of threads in the pool
    pthread_mutex_t _mutex;
    pthread_cond_t _cond;
};

Test code:

#include "Task.hpp"
#include <cstring>
#include <ctime>
#include <unistd.h>
// the thread pool header above is also included

const char *ops = "+-*/%";

int main()
{
    threadPool<Task> *threads = new threadPool<Task>(8);
    threads->Init();
    threads->Start();
    srand((size_t)time(nullptr));
    while (true)
    {
        int x = rand() % 30 + 1;
        int y = rand() % 30 + 1;
        char op = ops[rand() % strlen(ops)];

        Task t(x, y, op);
        threads->PushTask(t);
        t.formatMsk();
        sleep(1);
    }
    return 0;
}

The run result is the same as the first version's.


4 The third version of the thread pool

This version adds the singleton pattern. Since we found that only one thread pool is ever needed, we use lazy initialization (the "lazy man" style) to create the singleton.

Code:

#pragma once
#include "Thread.hpp"
#include <iostream>
#include <vector>
#include <queue>
#include <pthread.h>
using namespace std;

const int N = 5;
template <class T>
class threadPool
{
public:

    static threadPool<T>* GetInstance(int sz = N)
    {
        if (_sta_obj == nullptr) // first check without the lock, for efficiency
        {
            pthread_mutex_lock(&_sta_mutex);
            if (_sta_obj == nullptr) // second check with the lock held
            {
                _sta_obj = new threadPool<T>(sz);
            }
            pthread_mutex_unlock(&_sta_mutex);
        }
        return _sta_obj;
    }

    static void *Routine(void *args) // the pool's worker threads execute tasks here
    {
        // no pthread_detach here: with our own thread wrapper we simply Join in the destructor
        threadPool<T> *ptp = static_cast<threadPool<T> *>(args);

        while (true)
        {
            pthread_mutex_lock(&(ptp->_mutex));

            while ((ptp->_masks).empty())
            {
                pthread_cond_wait(&(ptp->_cond), &(ptp->_mutex));
            }

            T task = (ptp->_masks).front();
            (ptp->_masks).pop();
            pthread_mutex_unlock(&(ptp->_mutex));

            task.run(); // execute the task outside the critical section
            task.formatRes();
        }

        return nullptr;
    }

    void Init()
    {
        for (int i = 0; i < _sz; ++i)
        {
            _threads.push_back(threadProcess(i + 1, Routine, this));
        }
    }

    void Start()
    {
        for (auto &e : _threads)
        {
            e.Run();
        }
    }

    void PushTask(const T &task)
    {
        pthread_mutex_lock(&_mutex);
        _masks.push(task);
        pthread_mutex_unlock(&_mutex);
        pthread_cond_signal(&_cond); // remember to wake a sleeping thread to run the task
    }

    ~threadPool()
    {
        for (auto &e : _threads)
        {
            e.Join();
        }
        pthread_mutex_destroy(&_mutex);
        pthread_cond_destroy(&_cond);
    }

    void Check()
    {
        for (auto &e : _threads)
        {
            cout << "name:" << e._name << " id" << e._tid << endl;
        }
    }

    vector<threadProcess> _threads;
    queue<T> _masks;
    int _sz; // number of threads in the pool
    pthread_mutex_t _mutex;
    pthread_cond_t _cond;

private:
    threadPool(int sz = N)
        : _sz(sz)
    {
        pthread_mutex_init(&_mutex, nullptr);
        pthread_cond_init(&_cond, nullptr);
    }

    threadPool(const threadPool<T> &th) = delete;
    threadPool<T> &operator=(const threadPool<T> &th) = delete;

    static threadPool<T> *_sta_obj;
    static pthread_mutex_t _sta_mutex; // static lock guarding singleton creation
                                       // (a static function cannot touch the non-static _mutex)
};
template <class T>
threadPool<T> *threadPool<T>::_sta_obj = nullptr;
template <class T>
pthread_mutex_t threadPool<T>::_sta_mutex = PTHREAD_MUTEX_INITIALIZER;

Points of note:
For efficiency, singleton creation uses a double-checked if around the lock: the first, unlocked check skips locking entirely once the instance exists, and the second check under the lock prevents two threads from both creating it.

Also note that the constructor is private, and the copy constructor and copy assignment are deleted, so no second instance can be created.


5 Containers in STL and thread safety issues of smart pointers

Are containers in STL thread-safe?

No. STL's design goal is to squeeze out maximum performance, and adding locks to guarantee thread safety would cost a great deal of that performance. Moreover, the right locking granularity differs per container (for a hash table, for example, locking the whole table versus locking individual buckets). So STL is not thread-safe by default; in a multi-threaded environment, the caller must provide the synchronization.

Are smart pointers thread-safe?

For unique_ptr, ownership is exclusive and the pointer is normally confined to a single scope or thread, so no thread safety issue arises.
For shared_ptr, multiple objects share one reference-count variable, so thread safety does matter. The standard library anticipated this: the reference count is updated with atomic (CAS-style) operations, so shared_ptr manipulates the count efficiently and atomically. Note that this protects only the count; concurrent access to the managed object itself still needs external synchronization.


6 other common locks

  • Pessimistic lock: every time you fetch the data, you assume other threads may modify it, so you take a lock first (read lock, write lock, row lock, etc.); other threads that want the data block and are suspended until the lock is released.
  • Optimistic lock: every time you fetch the data, you optimistically assume no other thread will modify it, so you do not lock. Before committing an update, you check whether another thread has changed the data in the meantime, typically via a version-number mechanism or a CAS operation. CAS (compare-and-swap): when updating, compare the current memory value with the previously read value; if they are equal, write the new value; if not, the update fails and is retried, generally by spinning, that is, retrying in a loop.
  • Also worth knowing: spin locks, fair locks, unfair locks.

7 Reader-Writer Problems (Understanding)

Read-write lock:
In multi-threaded code, a common situation is that some shared data is modified rarely but read very often, and reads (typically lookups) can take a while. Guarding such a code section with an ordinary mutex badly hurts throughput. For this read-mostly, write-rarely pattern there is a dedicated tool: the read-write lock.

Read-write lock behavior:

    Current lock state   Read lock request   Write lock request
    no lock              allowed             allowed
    read lock held       allowed             blocks
    write lock held      blocks              blocks

  • Note: writes are exclusive, reads are shared, and the read lock has higher priority.


Origin blog.csdn.net/m0_68872612/article/details/131712304