A few things about atomic operations

An atomic operation is the most fine-grained synchronization primitive for data shared between threads. It guarantees that reading and writing a given value is atomic across threads.

Since no heavyweight mutex is needed for synchronization, an atomic operation is very lightweight: there is no switching back and forth into the kernel for scheduling, so it is very efficient.

So how do we use atomic operations? Each platform provides its own APIs, and compilers such as gcc and clang also provide compiler-level __builtin interfaces:

  1. The Interlockedxxx and Interlockedxxx64 series APIs on Windows
  2. The OSAtomicXXX series APIs on macosx
  3. gcc's __sync_val_compare_and_swap, __sync_val_compare_and_swap_8 and other __builtin interfaces
  4. The lock assembly instruction prefix on the x86 and x86_64 architectures
  5. tbox's cross-platform atomic interface
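
As a rough illustration of item 3, the sketch below calls the gcc/clang builtin `__sync_val_compare_and_swap` directly. The helper names `demo_cas` and `demo_cas_run` are made up for this example; only the builtin itself comes from the compiler.

```c
#include <stdio.h>

/* Hypothetical helper: try to change *value from `expected` to `desired`.
 * Returns 1 if the swap happened, 0 otherwise. */
static int demo_cas(long* value, long expected, long desired)
{
    /* the builtin returns the value that was in *value before the call */
    long old = __sync_val_compare_and_swap(value, expected, desired);
    return old == expected;
}

/* Hypothetical walkthrough: the first swap succeeds, the second one fails
 * because the value is no longer 0. Returns the final value. */
static long demo_cas_run(void)
{
    long state  = 0;
    int  first  = demo_cas(&state, 0, 1);   /* 0 -> 1, succeeds */
    int  second = demo_cas(&state, 0, 2);   /* state is 1, so this fails */
    printf("first=%d second=%d state=%ld\n", first, second, state);
    return state;
}
```

The same compare-and-swap pattern is what the spin lock and the once function later in this article are built on.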

Using the tbox interface

Take tbox's tb_atomic_fetch_and_add interface as an example. As the name suggests, this API first reads the original value and then adds a value to it:

// atomic equivalent of: b = a++;
tb_atomic_t a = 0;
tb_long_t   b = tb_atomic_fetch_and_add(&a, 1);

If you need to perform add calculation first, and then return the result, you can use:

// atomic equivalent of: b = ++a;
tb_atomic_t a = 0;
tb_long_t   b = tb_atomic_add_and_fetch(&a, 1);

Or, more simply, when the increment is 1:

tb_long_t b = tb_atomic_fetch_and_inc(&a);
tb_long_t c = tb_atomic_inc_and_fetch(&a);
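
For readers without tbox at hand, the difference between the two return conventions can be tried with the gcc/clang builtins instead. This is only a sketch using `__sync_fetch_and_add` and `__sync_add_and_fetch`; the function name `demo_fetch_vs_add` is hypothetical.

```c
/* Hypothetical demo: runs both variants on the same counter and reports
 * what each call returned. */
static void demo_fetch_vs_add(long* before, long* after)
{
    long a = 0;

    /* fetch first, then add: returns the old value 0, a becomes 1 */
    *before = __sync_fetch_and_add(&a, 1);

    /* add first, then fetch: a becomes 2, returns the new value 2 */
    *after = __sync_add_and_fetch(&a, 1);
}
```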

So how does tbox adapt to the various platforms internally? A quick look shows that it is basically just a thin wrapper around the native APIs.

Wrapping the Windows interface

static __tb_inline__ tb_long_t tb_atomic_fetch_and_add_windows(tb_atomic_t* a, tb_long_t v)
{
    return (tb_long_t)InterlockedExchangeAdd((LONG __tb_volatile__*)a, v);
}
static __tb_inline__ tb_long_t tb_atomic_inc_and_fetch_windows(tb_atomic_t* a)
{
    return (tb_long_t)InterlockedIncrement((LONG __tb_volatile__*)a);
}

Wrapping the gcc interface

static __tb_inline__ tb_long_t tb_atomic_fetch_and_add_sync(tb_atomic_t* a, tb_long_t v)
{
    return __sync_fetch_and_add(a, v);
}

Assembly implementation for the x86 and x86_64 architectures

static __tb_inline__ tb_long_t tb_atomic_fetch_and_add_x86(tb_atomic_t* a, tb_long_t v)
{
    /*
     * xaddl v, [a]:
     *
     * o = [a]
     * [a] += v;
     * v = o;
     *
     * cf, af, of, sf, zf, pf ... may be changed
     */
    __tb_asm__ __tb_volatile__ 
    (
#if TB_CPU_BITSIZE == 64
        "lock xaddq %0, %1 \n"          //!< xaddq v, [a]
#else
        "lock xaddl %0, %1 \n"          //!< xaddl v, [a]
#endif

        : "+r" (v) 
        : "m" (*a) 
        : "cc", "memory"
    );

    return v;
}

Besides arithmetic such as addition and subtraction on int32 and int64 values, atomic operations can also perform logical operations such as xor, or and and. The usage is similar, so I won't go into it here.
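
As a small sketch of the logical variants, the helpers below use the gcc/clang builtins `__sync_fetch_and_or`, `__sync_fetch_and_and` and `__sync_fetch_and_xor` to maintain a set of flag bits. The flag names and the `demo_flag_*` helpers are invented for this illustration and are not part of tbox.

```c
/* Hypothetical flag bits, used only for this illustration. */
#define DEMO_FLAG_READY   0x1u
#define DEMO_FLAG_CLOSED  0x2u

/* Each helper returns the flags as they were before the operation. */
static unsigned demo_flag_set(unsigned* flags, unsigned bit)
{
    return __sync_fetch_and_or(flags, bit);     /* *flags |= bit  */
}
static unsigned demo_flag_clear(unsigned* flags, unsigned bit)
{
    return __sync_fetch_and_and(flags, ~bit);   /* *flags &= ~bit */
}
static unsigned demo_flag_toggle(unsigned* flags, unsigned bit)
{
    return __sync_fetch_and_xor(flags, bit);    /* *flags ^= bit  */
}
```

Because each call returns the previous flags, a caller can tell whether it was the one that actually changed a bit.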

Let's put this into practice with a few simple examples. Atomic operations have many application scenarios, such as:

  • Implementing spin locks
  • Implementing lock-free queues
  • Synchronizing state between threads
  • Implementing singletons

and many more.
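
To give a feel for the lock-free case in the list above, here is a minimal sketch of a lock-free push onto a singly linked list, built on the gcc/clang builtin `__sync_bool_compare_and_swap`. Note that this is a stack-style push rather than a full queue, that removal (which has to deal with the ABA problem) is deliberately left out, and that `demo_node_t` and `demo_push` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical node type for this illustration. */
typedef struct demo_node
{
    struct demo_node* next;
    int               value;
} demo_node_t;

/* Push `node` in front of the list whose head pointer is *head,
 * without taking any lock. */
static void demo_push(demo_node_t* volatile* head, demo_node_t* node)
{
    demo_node_t* old;
    do
    {
        old        = *head;   /* snapshot the current head */
        node->next = old;     /* link the new node in front of it */

        /* publish the node only if the head is still `old`;
         * if another thread got in first, retry with the new head */
    } while (!__sync_bool_compare_and_swap(head, old, node));
}
```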

Implementing a spin lock

Let's first look at how to implement a simple spin lock. To keep the demonstration code consistent, the following code uses the atomic interfaces provided by tbox:

static __tb_inline_force__ tb_bool_t tb_spinlock_init(tb_spinlock_ref_t lock)
{
    // init 
    *lock = 0;

    // ok
    return tb_true;
}
static __tb_inline_force__ tb_void_t tb_spinlock_exit(tb_spinlock_ref_t lock)
{
    // exit 
    *lock = 0;
}
static __tb_inline_force__ tb_void_t tb_spinlock_enter(tb_spinlock_ref_t lock)
{
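    /* try to read the lock state: if nobody holds the lock (state 0), take it (set it to 1);
     * if another thread already holds it (state 1), loop and try again
     *
     * note: the read and the set together are one atomic step and cannot be interrupted
     */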
    tb_size_t tryn = 5;
    while (tb_atomic_fetch_and_pset((tb_atomic_t*)lock, 0, 1))
    {
        // lock not acquired: after 5 failed attempts, yield the cpu to other threads, then try again
        if (!tryn--)
        {
            // yield
            tb_sched_yield();

            // reset tryn
            tryn = 5;
        }
    }
}
static __tb_inline_force__ tb_void_t tb_spinlock_leave(tb_spinlock_ref_t lock)
{
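    // release the lock: no atomic operation is needed here, because even if the store is interrupted
    // halfway, the value is still non-zero and the other thread simply keeps waiting, unaffected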
    *((tb_atomic_t*)lock) = 0;
}

This implementation is very simple, but it is the spin lock that tbox uses by default almost everywhere, because most of the multi-threaded code in tbox is very fine-grained.

In most cases a spin lock is good enough, and there is no need to enter kernel mode to switch and wait.

The usage is as follows:

// acquire the lock
tb_spinlock_enter(&lock);

// some synchronized operations
// ..

// release the lock
tb_spinlock_leave(&lock);

The code above omits the init and exit operations. In actual use, just call them at the corresponding initialization and release points.
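
For comparison, roughly the same spin lock can be sketched without tbox, using the gcc/clang builtins `__sync_lock_test_and_set` and `__sync_lock_release` plus the POSIX `sched_yield` call. This is an approximation under those assumptions, not the tbox implementation; all `demo_` names are hypothetical, and unlike the tbox version above the release here goes through a builtin that stores 0 with release semantics.

```c
#include <sched.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef volatile int demo_spinlock_t;

/* Try once; returns 1 if the lock was acquired, 0 if it was already held. */
static int demo_spinlock_tryenter(demo_spinlock_t* lock)
{
    /* atomically store 1 and get the previous value back */
    return __sync_lock_test_and_set(lock, 1) == 0;
}

static void demo_spinlock_enter(demo_spinlock_t* lock)
{
    unsigned tryn = 5;
    while (!demo_spinlock_tryenter(lock))
    {
        /* after 5 failed attempts, let other threads run, then retry */
        if (!tryn--)
        {
            sched_yield();
            tryn = 5;
        }
    }
}

static void demo_spinlock_leave(demo_spinlock_t* lock)
{
    /* store 0 with release semantics */
    __sync_lock_release(lock);
}
```

A `demo_spinlock_t` initialized to 0 plays the role of the initialized lock. The try-once helper is split out mainly so the behavior can be checked from a single thread.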

Implementing something like pthread_once

pthread_once guarantees, in multi-threaded code, that the function passed in is called only once. It is typically used to initialize a global singleton or a TLS key.

Taking the tbox interface as an example, let's first look at how this function is used:


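// initialization function, called only once
static tb_bool_t tb_once_func(tb_cpointer_t priv)
{
    // initialize singleton objects and global variables,
    // or make other initialization calls

    // ok
    return tb_true;
}

// thread function
static tb_int_t tb_thread_func(tb_cpointer_t priv)
{
    // the lock lives in global (static) storage and is initialized to 0
    static tb_atomic_t lock = 0;
    if (tb_thread_once(&lock, tb_once_func, "user data"))
    {
        // ok
    }
    return 0;
}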

Using atomic operations, we can write a simple simulation of this function:

tb_bool_t tb_thread_once(tb_atomic_t* lock, tb_bool_t (*func)(tb_cpointer_t), tb_cpointer_t priv)
{
    // check
    tb_check_return_val(lock && func, tb_false);

    /* atomically fetch the state of the lock
     *
     * 0: func has not been called yet
     * 1: the lock is held and func is being called by another thread
     * 2: func has been called and returned ok
     * -1: func has been called and returned failure
     */
    tb_atomic_t called = tb_atomic_fetch_and_pset(lock, 0, 1);

    // func has already been called by another thread? return directly
    if (called && called != 1) 
    {
        return called == 2;
    }
    // func has not been called yet? call it
    else if (!called)
    {
        // call the function
        tb_bool_t ok = func(priv);

        // store the result state
        tb_atomic_set(lock, ok? 2 : -1);

        // ok?
        return ok;
    }
    // another thread holds the lock and func is still running? wait for it
    else
    {
        // simple sleep loop: wait until func finishes in the other thread
        tb_size_t tryn = 50;
        while ((1 == tb_atomic_get(lock)) && tryn--)
        {
            // wait some time
            tb_msleep(100);
        }
    }

    /* fetch the lock state again and check whether func succeeded
     * 
     * ok: 2
     * timeout: 1
     * failed: -1
     *
     * anything other than 2 counts as failure
     */
    return tb_atomic_get(lock) == 2;
}
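
The same state machine can be sketched with only the gcc/clang `__sync` builtins, which makes it easy to try without tbox. This version is an assumption-laden simplification: it waits by calling the POSIX `sched_yield` with no timeout, it uses `__sync_fetch_and_add(state, 0)` as an atomic read, and `demo_once`, `demo_once_func_t` and `demo_init` are hypothetical names.

```c
#include <sched.h>

/* Hypothetical callback type: returns non-zero on success. */
typedef int (*demo_once_func_t)(void* priv);

/* States: 0 = not called, 1 = running, 2 = done ok, -1 = done but failed. */
static int demo_once(volatile long* state, demo_once_func_t func, void* priv)
{
    /* move 0 -> 1 atomically; the previous value tells us who won */
    long called = __sync_val_compare_and_swap(state, 0, 1);
    if (called == 0)
    {
        /* we won: run func and publish its result */
        int ok = func(priv);
        __sync_val_compare_and_swap(state, 1, ok ? 2 : -1);
        return ok;
    }

    /* somebody else is running func: wait until it has finished */
    while (__sync_fetch_and_add(state, 0) == 1)
        sched_yield();

    /* anything other than 2 counts as failure */
    return __sync_fetch_and_add(state, 0) == 2;
}

/* Hypothetical init function: counts how many times it has been run. */
static int demo_init(void* priv)
{
    ++*(int*)priv;
    return 1;
}
```

Calling `demo_once` twice with the same state variable runs `demo_init` only on the first call; the second call just reports the stored result.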

64-bit atomic operations

The 64-bit interfaces are used in exactly the same way as the 32-bit ones; only the variable types differ:

  1. In tbox the type is tb_atomic64_t and the interfaces are named tb_atomic64_xxxx
  2. In gcc the type is volatile long long and the interfaces are the __sync_xxxx_8 series
  3. On Windows, the Interlockedxxx64 series

Refer to the 32-bit examples for specific usage; it is not described in detail here.
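
As a brief sketch of the gcc case, the generic `__sync_fetch_and_add` builtin can be applied to a `volatile long long`, and the compiler picks the 8-byte variant from the operand type. The wrapper name `demo_fetch_and_add64` is hypothetical, and on some 32-bit targets 64-bit atomics may need extra compiler flags or library support.

```c
/* Hypothetical 64-bit wrapper: returns the value before the addition. */
static long long demo_fetch_and_add64(volatile long long* a, long long v)
{
    return __sync_fetch_and_add(a, v);
}

/* Hypothetical walkthrough: start at the largest unsigned 32-bit value,
 * so that adding 1 needs more than 32 bits. Returns the new value. */
static long long demo_add64_run(void)
{
    volatile long long counter = 0xFFFFFFFFLL;
    demo_fetch_and_add64(&counter, 1);
    return counter;     /* 0x100000000 */
}
```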


Personal homepage: TBOOX open source project
Original source: http://tboox.org/cn/2016/09/30/atomic-operation/

Origin blog.csdn.net/waruqi/article/details/53201596