C++ Concurrent Programming: Performance Comparison of Mutex vs. Mutex + Condition Variable

Introduction

This article uses the simplest producer-consumer model and observes the CPU usage of the running process to compare the performance of using a mutex alone against using a mutex plus a condition variable.

The producer-consumer model in this example has 1 producer and 5 consumers.
The producer thread pushes data into a queue, and the five consumer threads take data out of it; before taking data, a consumer must check whether the queue holds any. The queue is global and shared between the threads, so it has to be protected with a mutex: while the producer is pushing data into the queue, the consumers cannot take from it, and vice versa.


Mutex-only implementation

#include <iostream> // std::cout
#include <deque>    // std::deque
#include <thread>   // std::thread
#include <chrono>   // std::chrono
#include <mutex>    // std::mutex


// global queue
std::deque<int> g_deque;

// global mutex
std::mutex g_mutex;

// flag indicating the producer is still running
bool producer_is_running = true;

// producer thread function
void Producer()
{
    // number of items in stock
    int count = 8;
    
    do
    {
        // RAII lock: locks the mutex on construction and automatically unlocks it
        // when the enclosing scope ends; it can also be unlocked manually to
        // control the granularity of the critical section
        std::unique_lock<std::mutex> locker( g_mutex );
        // push one item into the queue
        g_deque.push_front( count );
        // unlock early to narrow the critical section: only the shared queue needs protection
        locker.unlock();

        std::cout << "Producer   : items in stock: " << count << std::endl;
            
        // slow the producer down: sleep for 1 second
        std::this_thread::sleep_for( std::chrono::seconds( 1 ) );

        // decrease the stock
        count--;
    } while( count > 0 );
    
    // mark that the producer has closed up shop
    producer_is_running = false;

    std::cout << "生产者    : 我的库存没有了,我要打样了!"  << std::endl;
}

// consumer thread function
void Consumer(int id)
{
    int data = 0;

    do
    {
        std::unique_lock<std::mutex> locker( g_mutex );
        if( !g_deque.empty() )
        {
            data = g_deque.back();
            g_deque.pop_back();
            locker.unlock();

            std::cout << "消费者[" << id << "] : 我抢到货的编号是 :" << data << std::endl;
        }
        else
        {
            locker.unlock();
        }
    } while( producer_is_running );
    
    std::cout << "消费者[" << id << "] :卖家没有货打样了,真可惜,下次再来抢!"  << std::endl;
}

int main(void)
{
    std::cout << "1 producer start ..." << std::endl;
    std::thread producer( Producer );

    std::cout << "5 consumer start ..." << std::endl;
    std::thread consumer[ 5 ];
    for(int i = 0; i < 5; i++)
    {
        consumer[i] = std::thread(Consumer, i + 1);
    }

    producer.join();

    for(int i = 0; i < 5; i++)
    {
        consumer[i].join();
    }

    std::cout << "All threads joined." << std::endl;

    return 0;
}

Results of the mutex-only implementation

Program output:

[root@lincoding condition]# g++ -std=c++0x -pthread -D_GLIBCXX_USE_NANOSLEEP main.cpp -o  main
[root@lincoding condition]# ./main
1 producer start ...
5 consumer start ...
Producer   : items in stock: 8
Consumer[1]: I grabbed item number: 8
Consumer[1]: I grabbed item number: 7
Producer   : items in stock: 7
Producer   : items in stock: 6
Consumer[3]: I grabbed item number: 6
Producer   : items in stock: 5
Consumer[1]: I grabbed item number: 5
Producer   : items in stock: 4
Consumer[2]: I grabbed item number: 4
Producer   : items in stock: 3
Consumer[5]: I grabbed item number: 3
Producer   : items in stock: 2
Consumer[2]: I grabbed item number: 2
Producer   : items in stock: 1
Consumer[1]: I grabbed item number: 1
Producer   : I'm out of stock, closing up!Consumer[
5]: the seller is out of stock and closed, what a pity, I'll try again next time!
Consumer[2]: the seller is out of stock and closed, what a pity, I'll try again next time!
Consumer[3]: the seller is out of stock and closed, what a pity, I'll try again next time!
Consumer[4]: the seller is out of stock and closed, what a pity, I'll try again next time!
Consumer[1]: the seller is out of stock and closed, what a pity, I'll try again next time!
All threads joined.

The output shows that a mutex alone can accomplish the task, but it has a performance problem.

  • Producer is the producer thread; after pushing each item it sleeps for 1 second, so production is very slow;

  • Consumer is the consumer thread; its while loop only exits once the producer has stopped running. On every iteration it locks the mutex, checks whether the queue is non-empty, pops an item if there is one, and then unlocks. During the producer's 1-second sleep, the consumer threads therefore spin doing useless work, which drives CPU usage very high!
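
As a small aside not discussed in the original article: producer_is_running is written by the producer thread and read by the consumer threads without any synchronization, which is formally a data race in C++. A minimal sketch of the usual fix is to declare the flag as std::atomic<bool>; the rest of the code can stay unchanged.

#include <atomic>   // std::atomic

// Sketch only, not part of the original listing: an atomic flag makes the
// cross-thread write in Producer() and the reads in Consumer() well-defined.
std::atomic<bool> producer_is_running{ true };

// Plain assignments and reads such as
//   producer_is_running = false;       // in Producer()
//   while( producer_is_running ) ...   // in Consumer()
// compile unchanged against the atomic flag.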

The test environment is a 4-core CPU:

[root@lincoding ~]# grep 'model name' /proc/cpuinfo | wc -l
4

Checking CPU usage with the top command shows that the mutex-only version is very expensive: the main process reaches 357.5% CPU, with 54.5%sy spent in the kernel and 18.2%us spent in user space.

[root@lincoding ~]# top
top - 19:13:41 up 36 min,  3 users,  load average: 0.06, 0.05, 0.01
Tasks: 179 total,   1 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s): 18.2%us, 54.5%sy,  0.0%ni, 27.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1004412k total,   313492k used,   690920k free,    41424k buffers
Swap:  2031608k total,        0k used,  2031608k free,    79968k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                       
 35346 root      20   0  137m 3288 1024 S 357.5  0.3   0:05.92 main                                                                                                                          
     1 root      20   0 19232 1492 1224 S  0.0  0.1   0:02.16 init                                                                                                                           
     2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd                                                                                                                       
     3 root      RT   0     0    0    0 S  0.0  0.0   0:00.68 migration/0  

One way to reduce this is to add a small delay on the consumer side: when a consumer fails to get data, it sleeps for 500 milliseconds, which lowers the CPU overhead caused by spinning on the mutex.

// consumer thread function
void Consumer(int id)
{
    int data = 0;

    do
    {
        std::unique_lock<std::mutex> locker( g_mutex );
        if( !g_deque.empty() )
        {
            data = g_deque.back();
            g_deque.pop_back();
            locker.unlock();

            std::cout << "消费者[" << id << "] : 我抢到货的编号是 :" << data << std::endl;
        }
        else
        {
            locker.unlock();
            // when the consumer gets no data, rest for 500 milliseconds
            std::this_thread::sleep_for( std::chrono::milliseconds( 500 ) );
        }
    } while( producer_is_running );
    
    std::cout << "消费者[" << id << "] :卖家没有货打样了,真可惜,下次再来抢!"  << std::endl;
}

The result shows that CPU usage drops dramatically:

[root@lincoding ~]# ps aux | grep -v grep  |grep main
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      61296  0.0  0.1 141068  1244 pts/1    Sl+  19:40   0:00 ./main

Mutex + condition variable implementation

This raises a question: how long should the consumer's delay (rest) be?

  • If the producer produces very quickly, a 500-millisecond consumer delay means the consumers fall behind, which is not good
  • If the producer produces slowly, the consumers still wake up every 500 milliseconds for nothing, wasting CPU

This is where the condition variable std::condition_variable comes in. In the producer-consumer model, after the producer has produced a piece of data it calls notify_one() to wake up a consumer thread blocked in wait(), and the woken consumer then takes the data from the queue.

#include <iostream> // std::cout
#include <deque>    // std::deque
#include <thread>   // std::thread
#include <chrono>   // std::chrono
#include <mutex>    // std::mutex

#include <condition_variable> // std::condition_variable


// global queue
std::deque<int> g_deque;

// global mutex
std::mutex g_mutex;

// global condition variable
std::condition_variable g_cond;

// flag indicating the producer is still running
bool producer_is_running = true;

// producer thread function
void Producer()
{
    // number of items in stock
    int count = 8;
    
    do
    {
        // RAII lock: locks the mutex on construction and automatically unlocks it
        // when the enclosing scope ends; it can also be unlocked manually to
        // control the granularity of the critical section
        std::unique_lock<std::mutex> locker( g_mutex );
        // push one item into the queue
        g_deque.push_front( count );
        // unlock early to narrow the critical section: only the shared queue needs protection
        locker.unlock();

        std::cout << "Producer   : items in stock: " << count << std::endl;
        
        // wake up one waiting consumer thread
        g_cond.notify_one();
        
        // sleep for 1 second
        std::this_thread::sleep_for( std::chrono::seconds( 1 ) );

        // decrease the stock
        count--;
    } while( count > 0 );
    
    // mark that the producer has closed up shop
    producer_is_running = false;
    
    // wake up all consumer threads
    g_cond.notify_all();
    
    std::cout << "生产者    : 我的库存没有了,我要打样了!"  << std::endl;
}

// consumer thread function
void Consumer(int id)
{
    // number of the purchased item
    int data = 0;

    do
    {
        // RAII lock: locks the mutex on construction and automatically unlocks it
        // when the enclosing scope ends; it can also be unlocked manually to
        // control the granularity of the critical section
        std::unique_lock<std::mutex> locker( g_mutex );
        
        // wait() first unlocks the mutex and then puts this thread to sleep; when the
        // thread is woken up it re-acquires the lock, protecting the queue operations that follow.
        // unique_lock must be used here instead of lock_guard, because lock_guard has no
        // lock()/unlock() interface while unique_lock provides both.
        g_cond.wait(locker);
        
        // queue is not empty
        if( !g_deque.empty() )
        {
            // read the last item in the queue
            data = g_deque.back();
            
            // remove the last item from the queue
            g_deque.pop_back();
            
            // unlock early to narrow the critical section: only the shared queue needs protection
            locker.unlock();

            std::cout << "Consumer[" << id << "]: I grabbed item number: " << data << std::endl;
        }
        // queue is empty
        else
        {
            locker.unlock();
        }
    
    } while( producer_is_running );
    
    std::cout << "消费者[" << id << "] :卖家没有货打样了,真可惜,下次再来抢!"  << std::endl;
}

int main(void)
{
    std::cout << "1 producer start ..." << std::endl;
    std::thread producer( Producer );

    std::cout << "5 consumer start ..." << std::endl;
    std::thread consumer[ 5 ];
    for(int i = 0; i < 5; i++)
    {
        consumer[i] = std::thread(Consumer, i + 1);
    }

    producer.join();

    for(int i = 0; i < 5; i++)
    {
        consumer[i].join();
    }

    std::cout << "All threads joined." << std::endl;

    return 0;
}
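
One note on the listing above: wait( locker ) is called without a predicate, so a woken consumer still has to check the queue itself, and the standard allows condition variables to wake threads spuriously. std::condition_variable also provides a predicate overload of wait() that re-checks a condition after every wakeup. A minimal sketch of a consumer using that overload, reusing the same globals as the listing above, could look like this (it additionally assumes the flag is safe to read concurrently, e.g. declared std::atomic<bool> as sketched earlier); it is not the code used for the measurements in this article:

// Sketch of an alternative consumer loop using the predicate overload of wait()
void ConsumerWithPredicate(int id)
{
    while( true )
    {
        std::unique_lock<std::mutex> locker( g_mutex );

        // Sleep until there is data or the producer has finished; the predicate is
        // re-evaluated after every wakeup, so spurious wakeups are harmless.
        g_cond.wait( locker, []{ return !g_deque.empty() || !producer_is_running; } );

        // Woken because the producer finished and the queue is drained.
        if( g_deque.empty() )
            break;

        int data = g_deque.back();
        g_deque.pop_back();
        locker.unlock();

        std::cout << "Consumer[" << id << "]: I grabbed item number: " << data << std::endl;
    }

    std::cout << "Consumer[" << id << "]: the seller is out of stock and closed, what a pity, I'll try again next time!" << std::endl;
}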

Results of the mutex + condition variable implementation

[root@lincoding condition]# g++ -std=c++0x -pthread -D_GLIBCXX_USE_NANOSLEEP main.cpp -o  main
[root@lincoding condition]# 
[root@lincoding condition]# ./main 
1 producer start ...
5 consumer start ...
Producer   : items in stock: 8
Consumer[4]: I grabbed item number: 8
Producer   : items in stock: 7
Consumer[2]: I grabbed item number: 7
Producer   : items in stock: 6
Consumer[3]: I grabbed item number: 6
Producer   : items in stock: 5
Consumer[5]: I grabbed item number: 5
Producer   : items in stock: 4
Consumer[1]: I grabbed item number: 4
Producer   : items in stock: 3
Consumer[4]: I grabbed item number: 3
Producer   : items in stock: 2
Consumer[2]: I grabbed item number: 2
Producer   : items in stock: 1
Consumer[3]: I grabbed item number: 1
Producer   : I'm out of stock, closing up!
Consumer[5]: the seller is out of stock and closed, what a pity, I'll try again next time!
Consumer[1]: the seller is out of stock and closed, what a pity, I'll try again next time!
Consumer[4]: the seller is out of stock and closed, what a pity, I'll try again next time!
Consumer[2]: the seller is out of stock and closed, what a pity, I'll try again next time!
Consumer[3]: the seller is out of stock and closed, what a pity, I'll try again next time!
All threads joined.

The CPU overhead is now very small:

[root@lincoding ~]# ps aux | grep -v grep  |grep main
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      73838  0.0  0.1 141068  1256 pts/1    Sl+  19:54   0:00 ./main

Summary

When the producer's production rate is uncertain, sometimes fast and sometimes slow, protecting the shared data with only a mutex forces the consumers to poll, which costs a great deal of CPU. With a mutex plus a condition variable, the producer wakes a consumer thread only after it has actually produced data, so the consumers never spin doing useless work and the performance overhead is avoided.
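
To make the pattern reusable, the mutex, the condition variable, and the queue can be bundled into one small class. The following is a minimal sketch, not taken from the article above; the name BlockingQueue and its Push/Pop interface are illustrative only:

#include <condition_variable> // std::condition_variable
#include <deque>              // std::deque
#include <mutex>              // std::mutex

// Minimal sketch of a blocking queue built from the mutex + condition
// variable pattern shown in this article.
template <typename T>
class BlockingQueue
{
public:
    void Push( T value )
    {
        {
            std::lock_guard<std::mutex> locker( mutex_ );
            queue_.push_front( std::move( value ) );
        }                       // unlock before notifying
        cond_.notify_one();     // wake one waiting consumer
    }

    T Pop()
    {
        std::unique_lock<std::mutex> locker( mutex_ );
        // Block until the queue is non-empty; the predicate guards against
        // spurious wakeups.
        cond_.wait( locker, [this]{ return !queue_.empty(); } );
        T value = std::move( queue_.back() );
        queue_.pop_back();
        return value;
    }

private:
    std::deque<T> queue_;
    std::mutex mutex_;
    std::condition_variable cond_;
};

A producer would call Push() once per item and the consumers would block inside Pop() instead of polling, which is the same behaviour the condition-variable version above obtains with global variables. Note that, unlike the article's consumers, Pop() here never returns while the queue stays empty, so a real shutdown path would still need a flag or a sentinel value.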

Source: www.cnblogs.com/xiaolincoding/p/11441568.html