C++: 07. Lock-free data structures

I have read many blog posts on this topic, most of them quite deep and heavy going; given my limited ability, I can only summarize them briefly here.

What is a lock-free data structure:

First, what is a lock for? In a multi-threaded environment, many operations are not atomic, so multiple threads may work on the same data at the same time. To prevent this, we lock before executing the critical code so that other threads cannot enter it, and unlock after execution so that other threads can then perform the same step.

Why do we need lock-free data structures? Locking and unlocking can be expensive: a contended lock may require a switch from user mode to kernel mode, and wrapping short, simple steps in heavyweight locks causes frequent switches in and out of the kernel. Lock-free data structures exist to resolve this tension.

An example:

Even an operation as simple as ++count (where count is an integer variable) needs protection, because the increment is actually performed in three steps: read, modify, write.

movl x, %eax
addl $1, %eax
movl %eax, x

If the thread is switched out before the write and another thread completes its own read-modify-write, then when the original thread resumes and writes back, one of the two increments is lost: two ++count operations together only add 1 to count instead of 2.

To solve this problem, we naturally think of adding a mutex so that only one thread can write to the queue at a time, but the locking operation consumes too many system resources. Since the critical section contains only a single step, updating the tail node of the queue, it suffices to make that one step atomic. The solution is the CAS operation.

Atomic operations (background):

Atomic operations are implemented at the hardware level. The CPU guarantees by default that basic memory operations are atomic: reading or writing a single byte is atomic, and while one processor reads such a unit, other processors cannot access that memory address. However, the processor cannot automatically guarantee atomicity for complex memory operations, such as accesses that cross the bus width, span multiple cache lines, or cross page-table boundaries. For those, the atomic instructions designed into the CPU instruction set must be used, and most instruction sets today support a range of them. The atomic operations most often used in lock-free programming are of the read-modify-write (RMW) kind, and the most common RMW operation is compare-and-swap (CAS), which almost every CPU instruction set supports.

CAS operation:

CAS: Compare and Swap, compare and exchange.

CAS is implemented by CPU hardware, typically by locking the bus: until the instruction finishes, other processors cannot use the bus for that memory location.

The pseudocode is as follows:
bool CAS( int * pAddr, int nOld, int nNew )
{
    if ( *pAddr == nOld )    // the value at pAddr still equals the old value nOld,
    {
        *pAddr = nNew ;      // so store nNew into it
        return true ;        // and report success;
    }
    else                     // otherwise *pAddr is no longer nOld, so do not swap
    {
        return false ;
    }
}

The entire execution of CAS is atomic and indivisible; it never produces any visible partial result.

The version above returns bool. If you instead want to know the value that was previously in the memory location, just change the return value:

int CAS( int * pAddr, int nOld, int nNew )
{
    if ( *pAddr == nOld ) 
    {
        *pAddr = nNew ;
        return nOld;
    }
    else
    {
        return *pAddr;
    }   
}

GCC provides two corresponding built-in functions:

bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)

A second example:

In a multi-threaded environment, when two threads enqueue into the same linked queue, thread A may have attached its new node to the next pointer of the tail node but not yet updated the queue's tail pointer. If thread B then also enqueues and hangs its new node off the stale tail, the node attached by thread A is lost.

EnQueue(x) // enqueue operation
{
    q = new record();
    q->value = x;    // value stored in the new node
    q->next = NULL;  // it will be the last node
    p = tail;        // save the tail pointer
    oldp = p;
    do {             // begin the CAS loop
        while (p->next != NULL)  // walk to the real tail, so a thread that dies before
            p = p->next;         // its CAS(&tail, oldp, q) cannot cause an endless loop
    } while( CAS(&p->next, NULL, q) != TRUE );
    CAS(&tail, oldp, q);  // try to swing tail to the new node
}

DeQueue() // dequeue operation
{
    do {
        p = head;
        if (p->next == NULL)
        {
            return ERR_EMPTY_QUEUE;
        }
    } while( CAS(&head, p, p->next) != TRUE );
    return p->next->value;
}

Implementation:

Since version 4.1.2, gcc has provided the __sync_* family of built-in functions, which supply atomic addition, subtraction, and logical operations.

They are declared as follows:

Atomic post-increment (fetch, then add):
type __sync_fetch_and_add (type *ptr, type value, ...)

Atomic pre-increment (add, then fetch):
type __sync_add_and_fetch (type *ptr, type value, ...)

The other variants follow the same pattern:
type __sync_fetch_and_sub (type *ptr, type value, ...)
type __sync_fetch_and_or (type *ptr, type value, ...)
type __sync_fetch_and_and (type *ptr, type value, ...)
type __sync_fetch_and_xor (type *ptr, type value, ...)
type __sync_fetch_and_nand (type *ptr, type value, ...)


type __sync_sub_and_fetch (type *ptr, type value, ...)
type __sync_or_and_fetch (type *ptr, type value, ...)
type __sync_and_and_fetch (type *ptr, type value, ...)
type __sync_xor_and_fetch (type *ptr, type value, ...)
type __sync_nand_and_fetch (type *ptr, type value, ...)

The difference between the two groups is that the first returns the value before the update, while the second returns the value after the update.

Disadvantages of CAS:

1. The ABA problem:

CAS checks whether the value has changed before updating; if it has not, the update goes through. But if a value was originally A, was changed to B, and then changed back to A, the CAS check sees no change even though the value actually did change in the meantime.

The solution to the ABA problem is a version number: attach a version counter to the variable and increment it on every update, so A-B-A becomes 1A - 2B - 3A.

2. Long spin times are costly:

If a spinning CAS loop keeps failing for a long time, it imposes a very large execution overhead on the CPU.

3. When multiple shared variables are operated, CAS cannot guarantee the atomicity of the operation.

A cyclic CAS guarantees an atomic operation only on a single shared variable; when several shared variables must change together, looping on CAS cannot make the combined operation atomic. In that case you can fall back to locks, or use the trick of merging several shared variables into one shared variable and operating on the merged value with CAS. For example, with two shared variables i = 2 and j = a, merge them into ij = 2a and then CAS on ij.


Origin blog.csdn.net/qq_41214278/article/details/83825815