In-depth understanding of glibc's mutex locking process

Mutex locks are a common mechanism for synchronizing multiple threads: a mutex protects operations on a shared resource, and the region of code that accesses the shared resource is called a critical section. Once a thread has locked the critical section, other threads cannot enter it until the thread holding the lock releases it.

This article takes the mutex implementation in glibc as an example to explain the principles behind it.

glibc mutex types

The type of glibc's mutex lock is pthread_mutex_t, and its structure can be represented as follows:

typedef struct {
    int __lock;
    int __count;
    int __owner;
    int __nusers;
    int __kind;
    /* other fields omitted */
} pthread_mutex_t;

where:

  • __lock indicates the current state of the mutex: 0 means unlocked, 1 means locked, and 2 means the mutex is held by one thread while other threads are waiting for it to be released.
  • __count records how many times the mutex has been locked. For a non-reentrant lock the value is 0 or 1; for a reentrant lock the count can be greater than 1.
  • __owner records the thread id of the thread currently holding the mutex.
  • __nusers records how many threads hold the lock. For a mutex the value is normally 0 or 1, but for read-write locks several reader threads can hold the lock at once, so __nusers matters mostly in read-write lock scenarios.
  • __kind records the lock type.
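
These fields are glibc internals, not part of any public API, but on glibc the union member __data exposes them, which makes it possible to watch them change. The snippet below is a non-portable sketch (it assumes the glibc layout described above) that prints the fields while holding the lock:

#include <pthread.h>
#include <stdio.h>

int main (void)
{
  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

  pthread_mutex_lock (&m);
  /* glibc-specific: __data is the internal struct shown above.  */
  printf ("__lock=%d __count=%u __owner=%d __nusers=%u __kind=%d\n",
          m.__data.__lock, m.__data.__count, m.__data.__owner,
          m.__data.__nusers, m.__data.__kind);
  pthread_mutex_unlock (&m);
  return 0;
}

After pthread_mutex_lock, __lock is 1, __owner is the calling thread's tid, and __nusers is 1.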

The pthread_mutex_t lock can be of the following types:

  • PTHREAD_MUTEX_TIMED_NP: the normal (default) lock. While one thread holds the lock, other requesting threads form a wait queue; on unlock a waiting thread is woken up and acquires the lock. This locking strategy gives a basic fairness in resource allocation.
  • PTHREAD_MUTEX_RECURSIVE_NP: reentrant lock. If the contending thread does not already own the mutex, it behaves like PTHREAD_MUTEX_TIMED_NP. If a thread already holds the lock, it may acquire it again, and must unlock the same number of times.
  • PTHREAD_MUTEX_ERRORCHECK_NP: error-checking lock. If the same thread requests a lock it already holds, EDEADLK is returned instead of deadlocking; otherwise it behaves like PTHREAD_MUTEX_TIMED_NP.
  • PTHREAD_MUTEX_ADAPTIVE_NP: adaptive lock. On multi-core processors it first spins trying to acquire the lock; once the number of spins exceeds the configured maximum, it falls back to blocking in the kernel.
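
The type is normally chosen through the pthread_mutexattr_* API. A minimal sketch follows (PTHREAD_MUTEX_ADAPTIVE_NP is a glibc extension and needs _GNU_SOURCE; portable names such as PTHREAD_MUTEX_RECURSIVE map onto the corresponding _NP constants on glibc):

#define _GNU_SOURCE
#include <pthread.h>

static pthread_mutex_t m;

/* Initialize M as an adaptive mutex; returns 0 or an errno value.  */
static int init_adaptive_mutex (void)
{
  pthread_mutexattr_t attr;
  int err = pthread_mutexattr_init (&attr);
  if (err != 0)
    return err;
  err = pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
  if (err == 0)
    err = pthread_mutex_init (&m, &attr);
  pthread_mutexattr_destroy (&attr);
  return err;
}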

The locking process of mutex

The source code used in this article is version glibc-2.34, http://mirror.keystealth.org/gnu/libc/glibc-2.34.tar.gz.

This article focuses on the mutex locking process from user mode down to kernel mode; the implementation details of the different lock types are only touched on briefly here and will be discussed in later articles.

Below we use the simplest type, PTHREAD_MUTEX_TIMED_NP, to trace the locking process, starting from ___pthread_mutex_lock, which is defined in nptl/pthread_mutex_lock.c.

As shown below, locking a PTHREAD_MUTEX_TIMED_NP mutex calls the lll_mutex_lock_optimized method (through the LLL_MUTEX_LOCK_OPTIMIZED macro):

  if (__builtin_expect (type & ~(PTHREAD_MUTEX_KIND_MASK_NP
				 | PTHREAD_MUTEX_ELISION_FLAGS_NP), 0))
    return __pthread_mutex_lock_full (mutex);

  if (__glibc_likely (type == PTHREAD_MUTEX_TIMED_NP))
    {
      FORCE_ELISION (mutex, goto elision);
    simple:
      /* Normal mutex.  */
      LLL_MUTEX_LOCK_OPTIMIZED (mutex);
      assert (mutex->__data.__owner == 0);
    }

lll_mutex_lock_optimized is also defined in nptl/pthread_mutex_lock.c. As the comment explains, this is an optimization for the single-threaded case: if the process is single-threaded, the mutex's __lock is simply set to 1 (since there can be no competition); otherwise the lll_lock method is called.

#ifndef LLL_MUTEX_LOCK
/* lll_lock with single-thread optimization.  */
static inline void
lll_mutex_lock_optimized (pthread_mutex_t *mutex)
{
  /* The single-threaded optimization is only valid for private
     mutexes.  For process-shared mutexes, the mutex could be in a
     shared mapping, so synchronization with another process is needed
     even without any threads.  If the lock is already marked as
     acquired, POSIX requires that pthread_mutex_lock deadlocks for
     normal mutexes, so skip the optimization in that case as
     well.  */
  int private = PTHREAD_MUTEX_PSHARED (mutex);
  if (private == LLL_PRIVATE && SINGLE_THREAD_P && mutex->__data.__lock == 0)
    mutex->__data.__lock = 1;
  else
    lll_lock (mutex->__data.__lock, private);
}

# define LLL_MUTEX_LOCK(mutex)						\
  lll_lock ((mutex)->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex))

lll_lock is defined in the file sysdeps/nptl/lowlevellock.h and expands to the __lll_lock method. Because other threads may be competing for the lock, __lll_lock uses CAS to try to modify the mutex's __lock value.

CAS stands for compare-and-swap and is the basis of atomic variables. Its pseudocode is below: if the value at memory location mem equals old_value, replace it with new_value. The whole step is atomic; on x86 it is ultimately guaranteed by the CMPXCHG instruction.

bool CAS(T* mem, T new_value, T old_value) {
    if (*mem == old_value) {
        *mem = new_value;
        return true;
    } else {
        return false;
    }
}
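
For reference, here is the same operation in portable C11 atomics — a sketch of the semantics, not glibc's internal atomic_compare_and_exchange_bool_acq (which, note, returns 0 on success):

#include <stdatomic.h>
#include <stdbool.h>

/* Returns true iff *MEM was equal to OLD_VALUE and was replaced by
   NEW_VALUE, all in one atomic step.  */
static bool cas (atomic_int *mem, int new_value, int old_value)
{
  return atomic_compare_exchange_strong (mem, &old_value, new_value);
}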

In __lll_lock, atomic_compare_and_exchange_bool_acq is the CAS operation described above (note that it returns 0, i.e. false, when the exchange succeeds). If futex equals 0, it is changed to 1 and the lock has been acquired. If futex >= 1, the CAS fails and __lll_lock_wait_private or __lll_lock_wait is called. The futex here is actually the __lock field of the mutex structure.

#define __lll_lock(futex, private)                                      \
  ((void)                                                               \
   ({                                                                   \
     int *__futex = (futex);                                            \
     if (__glibc_unlikely                                               \
         (atomic_compare_and_exchange_bool_acq (__futex, 1, 0)))        \
       {                                                                \
         if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \
           __lll_lock_wait_private (__futex);                           \
         else                                                           \
           __lll_lock_wait (__futex, private);                          \
       }                                                                \
   }))
#define lll_lock(futex, private)	\
  __lll_lock (&(futex), private)

__lll_lock_wait_private and __lll_lock_wait are similar. Both first call atomic_exchange_acquire to atomically exchange the value of futex with 2; the return value is the old value of futex.

If the return value is not 0, the lock is still held and the thread may need to enter the kernel to wait (by calling futex_wait). If it returns 0, the lock has been released, the acquisition succeeded, and the loop exits.

Note that setting the futex value to 2 exists to make pthread_mutex_unlock more efficient. pthread_mutex_unlock calls atomic_exchange_rel to unconditionally set mutex->__lock to 0, and then inspects the old value of mutex->__lock: if the old value was 0 or 1, no contention occurred and there is no need to issue the futex system call, which would be a waste of time. Only when the old value of mutex->__lock is greater than 1 must the futex system call be made to wake up threads waiting for the lock.

void
__lll_lock_wait_private (int *futex)
{
  if (atomic_load_relaxed (futex) == 2)
    goto futex;

  while (atomic_exchange_acquire (futex, 2) != 0)
    {
    futex:
      LIBC_PROBE (lll_lock_wait_private, 1, futex);
      futex_wait ((unsigned int *) futex, 2, LLL_PRIVATE); /* Wait if *futex == 2.  */
    }
}
libc_hidden_def (__lll_lock_wait_private)

void
__lll_lock_wait (int *futex, int private)
{
  if (atomic_load_relaxed (futex) == 2)
    goto futex;

  while (atomic_exchange_acquire (futex, 2) != 0)
    {
    futex:
      LIBC_PROBE (lll_lock_wait, 1, futex);
      futex_wait ((unsigned int *) futex, 2, private); /* Wait if *futex == 2.  */
    }
}

Both __lll_lock_wait_private and __lll_lock_wait call futex_wait. This function is relatively simple and internally calls the lll_futex_timed_wait method.

static __always_inline int
futex_wait (unsigned int *futex_word, unsigned int expected, int private)
{
  int err = lll_futex_timed_wait (futex_word, expected, NULL, private);
  switch (err)
    {
    
    
    case 0:
    case -EAGAIN:
    case -EINTR:
      return -err;

    case -ETIMEDOUT: /* Cannot have happened as we provided no timeout.  */
    case -EFAULT: /* Must have been caused by a glibc or application bug.  */
    case -EINVAL: /* Either due to wrong alignment or due to the timeout not
		     being normalized.  Must have been caused by a glibc or
		     application bug.  */
    case -ENOSYS: /* Must have been caused by a glibc bug.  */
    /* No other errors are documented at this time.  */
    default:
      futex_fatal_error ();
    }
}

The lll_futex_timed_wait method is actually a wrapper around the sys_futex system call; it eventually issues sys_futex.

# define lll_futex_timed_wait(futexp, val, timeout, private)     \
  lll_futex_syscall (4, futexp,                                 \
		     __lll_private_flag (FUTEX_WAIT, private),  \
		     val, timeout)

# define lll_futex_syscall(nargs, futexp, op, ...)                      \
  ({                                                                    \
    long int __ret = INTERNAL_SYSCALL (futex, nargs, futexp, op, 	\
				       __VA_ARGS__);                    \
    (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (__ret))         	\
     ? -INTERNAL_SYSCALL_ERRNO (__ret) : 0);                     	\
  })

#define __NR_futex 202
#undef INTERNAL_SYSCALL
#define INTERNAL_SYSCALL(name, nr, args...)				\
	internal_syscall##nr (SYS_ify (name), args)

#undef SYS_ify
#define SYS_ify(syscall_name)	__NR_##syscall_name


#undef internal_syscall4
#define internal_syscall4(number, arg1, arg2, arg3, arg4)		\
({									\
    unsigned long int resultvar;					\
    TYPEFY (arg4, __arg4) = ARGIFY (arg4);			 	\
    TYPEFY (arg3, __arg3) = ARGIFY (arg3);			 	\
    TYPEFY (arg2, __arg2) = ARGIFY (arg2);			 	\
    TYPEFY (arg1, __arg1) = ARGIFY (arg1);			 	\
    register TYPEFY (arg4, _a4) asm ("r10") = __arg4;			\
    register TYPEFY (arg3, _a3) asm ("rdx") = __arg3;			\
    register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;			\
    register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;			\
    asm volatile (							\
    "syscall\n\t"							\
    : "=a" (resultvar)							\
    : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4)		\
    : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);			\
    (long int) resultvar;						\
})
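
Outside of these internal macros, the same FUTEX_WAIT call can be issued from ordinary user code through syscall(2). A sketch assuming Linux (real code would also handle EINTR/EAGAIN the way futex_wait above does):

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep only if *UADDR still equals EXPECTED when the kernel checks it.  */
static long futex_wait_raw (int *uaddr, int expected)
{
  return syscall (SYS_futex, uaddr, FUTEX_WAIT_PRIVATE, expected,
                  (void *) 0 /* no timeout */, (void *) 0, 0);
}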

The function prototype of sys_futex is as follows:

int sys_futex (int *uaddr, int op, int val, const struct timespec *timeout);

Its job is to check whether the value at uaddr equals val; if so, the calling thread is put to sleep until a FUTEX_WAKE or a timeout. That is, the thread is placed on the wait queue associated with uaddr.

Here this amounts to checking whether the mutex's __lock is equal to 2.

  • If it is not equal to 2, the lock may already have been released and there is no need to put the thread on the sleep queue; sys_futex returns immediately and the thread tries to lock again.
  • If it is equal to 2, the value of the lock did not change between user mode and kernel mode, so the thread is added to the sleep queue to wait for another thread to release the lock.

Locking a glibc mutex is thus the combined work of user-mode atomic operations and the kernel-mode sys_futex. The process can be summarized by the following flow chart:

[Figure: glibc mutex locking flow chart]
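
To tie the two halves together, here is a minimal futex-based lock/unlock following the 0/1/2 protocol described above (the unlock side is explained in the next section). It is a sketch, not glibc's code: it assumes Linux and the GCC/Clang __atomic builtins, and omits the elision, error handling, and bookkeeping the real implementation carries:

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked, no waiters, 2 = locked, possible waiters.  */
static int lock_word;

static void lock (void)
{
  int zero = 0;
  /* Fast path: CAS 0 -> 1 (cf. __lll_lock).  */
  if (__atomic_compare_exchange_n (&lock_word, &zero, 1, 0,
                                   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    return;
  /* Slow path: mark the lock contended (2) and sleep while it stays 2
     (cf. __lll_lock_wait).  */
  while (__atomic_exchange_n (&lock_word, 2, __ATOMIC_ACQUIRE) != 0)
    syscall (SYS_futex, &lock_word, FUTEX_WAIT_PRIVATE, 2,
             (void *) 0, (void *) 0, 0);
}

static void unlock (void)
{
  /* Unconditionally store 0; issue the wake syscall only if the old
     value says someone may be waiting (cf. __lll_unlock).  */
  if (__atomic_exchange_n (&lock_word, 0, __ATOMIC_RELEASE) > 1)
    syscall (SYS_futex, &lock_word, FUTEX_WAKE_PRIVATE, 1,
             (void *) 0, (void *) 0, 0);
}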

The unlocking process of mutex

Also starting from the simplest PTHREAD_MUTEX_TIMED_NP, it calls the lll_mutex_unlock_optimized method to unlock.

  if (__builtin_expect (type, PTHREAD_MUTEX_TIMED_NP)
      == PTHREAD_MUTEX_TIMED_NP)
    {
      /* Always reset the owner field.  */
    normal:
      mutex->__data.__owner = 0;
      if (decr)
	/* One less user.  */
	--mutex->__data.__nusers;

      /* Unlock.  */
      lll_mutex_unlock_optimized (mutex);

      LIBC_PROBE (mutex_release, 1, mutex);

      return 0;
    }

lll_mutex_unlock_optimized applies the same single-thread optimization to unlocking: if the process is single-threaded there is no competition, so __lock can be set directly to 0. If it is multi-threaded, the lll_unlock method is called.

static inline void
lll_mutex_unlock_optimized (pthread_mutex_t *mutex)
{
  /* The single-threaded optimization is only valid for private
     mutexes.  For process-shared mutexes, the mutex could be in a
     shared mapping, so synchronization with another process is needed
     even without any threads.  */
  int private = PTHREAD_MUTEX_PSHARED (mutex);
  if (private == LLL_PRIVATE && SINGLE_THREAD_P)
    mutex->__data.__lock = 0;
  else
    lll_unlock (mutex->__data.__lock, private);
}

lll_unlock is defined in sysdeps/nptl/lowlevellock.h. Here atomic_exchange_rel atomically exchanges __futex with 0 and stores the original value in __oldval.

If __oldval is less than or equal to 1, there was no contention and no thread needs to be woken. If __oldval is greater than 1, other threads are waiting, so __lll_lock_wake_private or __lll_lock_wake is called to wake one of them.

#define __lll_unlock(futex, private)					\
  ((void)								\
   ({									\
     int *__futex = (futex);						\
     int __private = (private);						\
     int __oldval = atomic_exchange_rel (__futex, 0);			\
     if (__glibc_unlikely (__oldval > 1))				\
       {								\
         if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \
           __lll_lock_wake_private (__futex);                           \
         else                                                           \
           __lll_lock_wake (__futex, __private);			\
       }								\
   }))
#define lll_unlock(futex, private)	\
  __lll_unlock (&(futex), private)

__lll_lock_wake_private and __lll_lock_wake are similar methods, both of which call lll_futex_wake.

void
__lll_lock_wake_private (int *futex)
{
  lll_futex_wake (futex, 1, LLL_PRIVATE);
}
libc_hidden_def (__lll_lock_wake_private)

void
__lll_lock_wake (int *futex, int private)
{
  lll_futex_wake (futex, 1, private);
}

lll_futex_wake is again a wrapper around the sys_futex system call (via the lll_futex_syscall macro shown earlier), passing the FUTEX_WAKE operation to wake up one waiting thread.

/* Wake up up to NR waiters on FUTEXP.  */
# define lll_futex_wake(futexp, nr, private)                             \
  lll_futex_syscall (4, futexp,                                         \
		     __lll_private_flag (FUTEX_WAKE, private), nr, 0)
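
As with FUTEX_WAIT earlier, the wake side can be reproduced from user code with syscall(2) — a sketch mirroring lll_futex_wake (futex, 1, private):

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Wake at most NWAITERS threads sleeping on UADDR.  */
static long futex_wake_raw (int *uaddr, int nwaiters)
{
  return syscall (SYS_futex, uaddr, FUTEX_WAKE_PRIVATE, nwaiters,
                  (void *) 0, (void *) 0, 0);
}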

Implementation of mutex with different attributes in glibc

The sections above focused on the implementation of PTHREAD_MUTEX_TIMED_NP. The implementations of mutexes with the other attributes are explained below.

PTHREAD_MUTEX_TIMED_NP

PTHREAD_MUTEX_TIMED_NP was covered at length above; to recap, its behavior is straightforward: it calls LLL_MUTEX_LOCK_OPTIMIZED directly to compete for the lock.

if (__glibc_likely (type == PTHREAD_MUTEX_TIMED_NP))
{
    FORCE_ELISION (mutex, goto elision);
simple:
    /* Normal mutex.  */
    LLL_MUTEX_LOCK_OPTIMIZED (mutex);
    assert (mutex->__data.__owner == 0);
}

PTHREAD_MUTEX_RECURSIVE_NP

A reentrant lock allows the same thread to lock the mutex multiple times. In the implementation below, the current thread id is fetched and compared with the thread id of the lock owner: if they are equal, the lock's count is incremented by 1; if not, LLL_MUTEX_LOCK_OPTIMIZED is used to acquire the lock, just as for a normal lock.

  else if (__builtin_expect (PTHREAD_MUTEX_TYPE (mutex)
			     == PTHREAD_MUTEX_RECURSIVE_NP, 1))
    {
      /* Recursive mutex.  */
      pid_t id = THREAD_GETMEM (THREAD_SELF, tid);

      /* Check whether we already hold the mutex.  */
      if (mutex->__data.__owner == id)
	{
	  /* Just bump the counter.  */
	  if (__glibc_unlikely (mutex->__data.__count + 1 == 0))
	    /* Overflow of the counter.  */
	    return EAGAIN;

	  ++mutex->__data.__count;

	  return 0;
	}
      LLL_MUTEX_LOCK_OPTIMIZED (mutex);

      assert (mutex->__data.__owner == 0);
      mutex->__data.__count = 1;
    }
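
Seen from the application side, this is what the counter buys you: the same thread can take the lock again, for instance when one locked function calls another. A minimal sketch (PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP is a glibc extension):

#define _GNU_SOURCE
#include <pthread.h>

static pthread_mutex_t rec_mutex = PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP;

static void inner (void)
{
  pthread_mutex_lock (&rec_mutex);    /* same owner: __count becomes 2 */
  /* ... critical section ... */
  pthread_mutex_unlock (&rec_mutex);  /* __count back to 1 */
}

static void outer (void)
{
  pthread_mutex_lock (&rec_mutex);    /* __count becomes 1 */
  inner ();
  pthread_mutex_unlock (&rec_mutex);  /* __count 0: lock released */
}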

PTHREAD_MUTEX_ADAPTIVE_NP

The adaptive lock spins when the initial trylock fails. Once the spin count exceeds a limit, LLL_MUTEX_LOCK is called to acquire the lock, sleeping in the kernel if necessary.

  else if (__builtin_expect (PTHREAD_MUTEX_TYPE (mutex)
			  == PTHREAD_MUTEX_ADAPTIVE_NP, 1))
    {
      if (LLL_MUTEX_TRYLOCK (mutex) != 0)
	{
	  int cnt = 0;
	  int max_cnt = MIN (max_adaptive_count (),
			     mutex->__data.__spins * 2 + 10);
	  do
	    {
	      if (cnt++ >= max_cnt)
		{
		  LLL_MUTEX_LOCK (mutex);
		  break;
		}
	      atomic_spin_nop ();
	    }
	  while (LLL_MUTEX_TRYLOCK (mutex) != 0);

	  mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
	}
      assert (mutex->__data.__owner == 0);
    }
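
The same spin-then-block idea can be sketched at user level with trylock. This is only an illustration of the logic above, not glibc's internals; MAX_SPINS is an arbitrary stand-in for max_adaptive_count ():

#include <pthread.h>

#define MAX_SPINS 100   /* arbitrary cap, cf. max_adaptive_count () */

static void adaptive_lock (pthread_mutex_t *m)
{
  /* Spin briefly hoping the holder releases the lock soon; past the
     cap, fall back to the blocking lock, which may sleep in futex.  */
  for (int cnt = 0; pthread_mutex_trylock (m) != 0; ++cnt)
    {
      if (cnt >= MAX_SPINS)
        {
          pthread_mutex_lock (m);   /* cf. LLL_MUTEX_LOCK */
          return;
        }
#if defined __x86_64__ || defined __i386__
      __builtin_ia32_pause ();      /* cf. atomic_spin_nop () */
#endif
    }
}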

PTHREAD_MUTEX_ERRORCHECK_NP

The error-checking lock prevents a thread from locking the same mutex twice and deadlocking itself. Here the thread id of the lock owner is compared with the current thread id, and if they are equal an EDEADLK error is returned.

  else
    {
      pid_t id = THREAD_GETMEM (THREAD_SELF, tid);
      assert (PTHREAD_MUTEX_TYPE (mutex) == PTHREAD_MUTEX_ERRORCHECK_NP);
      /* Check whether we already hold the mutex.  */
      if (__glibc_unlikely (mutex->__data.__owner == id))
	return EDEADLK;
      goto simple;
    }
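
In application code the effect is easy to observe: the second lock by the same thread fails with EDEADLK instead of hanging. A minimal sketch:

#include <assert.h>
#include <errno.h>
#include <pthread.h>

int main (void)
{
  pthread_mutexattr_t attr;
  pthread_mutex_t m;

  pthread_mutexattr_init (&attr);
  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  pthread_mutex_init (&m, &attr);
  pthread_mutexattr_destroy (&attr);

  assert (pthread_mutex_lock (&m) == 0);
  assert (pthread_mutex_lock (&m) == EDEADLK);  /* caught, no deadlock */

  pthread_mutex_unlock (&m);
  pthread_mutex_destroy (&m);
  return 0;
}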

Summary

This article analyzed the underlying implementation of the mutex lock from the glibc source code. The implementation spans user mode and kernel mode, combining CAS operations with the sys_futex system call. Mutexes also come with several attributes, yielding normal, reentrant, error-checking, and adaptive locks. Each type has its own characteristics, and the appropriate attribute should be chosen for the scenario at hand.

Source: blog.csdn.net/qq_31442743/article/details/131437260