Linux kernel: process management - mutual exclusion lock

1. Mutual exclusion lock (mutex)

1.1 What is a mutex

A mutex implements a simple form of "mutual exclusion" synchronization, hence its name. It prevents multiple processes from entering a protected "critical section" of code at the same time, so at any moment at most one process can be inside the protected region.

The semantics of a mutex are simpler and lighter than those of a semaphore. In tests with heavy lock contention, a mutex runs faster than a semaphore and scales better. In addition, the mutex data structure is smaller than that of a semaphore.

1.2 Characteristics of mutexes

  • A mutex is a synchronization primitive used in the Linux kernel for mutual exclusion;
  • A mutex is a sleeping lock: under contention, processes may be put to sleep and woken up again. Because context switches are expensive, a mutex suits scenarios where the lock is held for a relatively long time;
  • A mutex allows only one process into the critical section at a time, much like a binary semaphore;
  • Under contention, if the lock holder is currently running, a waiter spins instead of going to sleep immediately, which can greatly improve performance. This mechanism (optimistic spinning) is also used by read-write semaphores;
  • A disadvantage of the mutex is that its structure is relatively large, so it takes up more CPU cache and memory;
  • Compared with semaphores, mutexes offer better performance and scalability, so the kernel generally prefers mutexes;
  • To improve performance, mutex acquisition is handled along three paths: the fast path, the mid path, and the slow path;

1.3 Use of mutexes

Define a mutex:

struct mutex my_mutex;

Initialize the mutex:

mutex_init(&my_mutex);

Alternatively, define and initialize the mutex in one step with a macro:

DEFINE_MUTEX(my_mutex);

Acquire a mutex:

void mutex_lock(struct mutex *lock);

This function acquires the mutex. If the mutex is unavailable, the caller sleeps, so it must not be used in interrupt context.

int mutex_lock_interruptible(struct mutex *lock);

This function behaves like mutex_lock, except in how it sleeps: a process sleeping in mutex_lock cannot be interrupted by signals, while a process sleeping in mutex_lock_interruptible is put into the TASK_INTERRUPTIBLE state and can be woken up by a signal.

It returns 0 if the mutex was acquired, or -EINTR if the sleep was interrupted by a signal.

int mutex_trylock(struct mutex *lock);

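mutex_trylock tries to acquire the mutex without ever putting the process to sleep. It returns 1 if the mutex was acquired and 0 if it was contended; note that this is the opposite of the down_trylock() convention for semaphores.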

Release the mutex:

void mutex_unlock(struct mutex *lock);
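The calls above only exist inside the kernel. As a rough user-space analogue (POSIX threads, not the kernel API), the sketch below shows the same usage pattern: a statically initialized lock (comparable to DEFINE_MUTEX), lock/unlock around a critical section, and a trylock. The return conventions differ: pthread_mutex_trylock returns 0 on success and EBUSY on contention, so the hypothetical helper try_enter converts it to the kernel's 1/0 convention. The helper names count_with_threads, try_enter and leave are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Statically initialized lock, analogous to DEFINE_MUTEX(my_mutex). */
static pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;          /* data protected by my_mutex */
static long iterations_per_thread;   /* set before the threads start */

/* Each thread enters the critical section iterations_per_thread times. */
static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < iterations_per_thread; i++) {
        pthread_mutex_lock(&my_mutex);    /* like mutex_lock(): may block */
        shared_counter++;                 /* critical section */
        pthread_mutex_unlock(&my_mutex);  /* like mutex_unlock() */
    }
    return NULL;
}

/* Hypothetical helper: run nthreads workers and return the final counter. */
long count_with_threads(int nthreads, long iterations)
{
    pthread_t tids[16];

    assert(nthreads > 0 && nthreads <= 16);
    shared_counter = 0;
    iterations_per_thread = iterations;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return shared_counter;
}

/* Hypothetical helper: 1 if the lock was taken, 0 if it was busy,
 * i.e. the kernel mutex_trylock() convention, not the pthread one. */
int try_enter(void)
{
    int ret = pthread_mutex_trylock(&my_mutex);

    if (ret == EBUSY)
        return 0;
    assert(ret == 0);
    return 1;
}

/* Hypothetical helper: leave the critical section. */
void leave(void)
{
    pthread_mutex_unlock(&my_mutex);
}
```

Without the lock, concurrent increments could be lost; with it, the final count always equals threads times iterations.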

1.4 mutex and semaphore

Mutexes are more efficient than semaphores because:

  • The mutex was the first to implement optimistic spin-waiting;
  • A mutex tries to acquire the lock again before going to sleep;
  • The mutex uses an MCS lock to avoid the CPU cache-line thrashing that occurs when many CPUs compete for the same lock;

2. MCS lock mechanism

2.1 MCS lock

  • As mentioned above, the mutex implementation uses optimistic spinning, and the core of this mechanism is based on the MCS lock;
  • The MCS lock was proposed by John Mellor-Crummey and Michael Scott in the paper "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors" and is named after them;
  • The problem MCS solves: on a multi-CPU system, whenever the value of a spinlock changes, every CPU trying to acquire it must re-read memory and refresh its cache line, yet only one CPU can win the lock, so only that one refresh is useful. The more intense the contention (the more CPUs trying to acquire the lock), the greater the wasted overhead;
  • The core idea of MCS: each CPU gets its own lock node, and each waiter spins on its own local variable instead of the shared lock word. The nodes form a linked list, and each waiter spins until its predecessor passes the lock to it;
  • osq (optimistic spinning queue) is the kernel's implementation based on the MCS algorithm, refined over several iterations;
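To make the MCS idea concrete, here is a minimal user-space MCS lock using C11 atomics and POSIX threads. It is a textbook sketch under simplifying assumptions (sequentially consistent atomics, one node per thread), not the kernel's osq code: osq additionally lets a spinner unqueue itself when it must reschedule, and its tail stores an encoded CPU number rather than a pointer. The names mcs_node, mcs_lock_t, mcs_acquire, mcs_release and mcs_demo are hypothetical. The spin loops call sched_yield() so the demo still finishes on a machine with few CPUs; a kernel spinner would use cpu_relax() instead.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiter (per CPU in the kernel, per thread here). */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;          /* set by the predecessor when it hands over */
} mcs_node;

/* The lock is just a pointer to the tail of the waiter queue. */
typedef struct {
    _Atomic(mcs_node *) tail;
} mcs_lock_t;

void mcs_acquire(mcs_lock_t *lock, mcs_node *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, false);
    /* Atomically make ourselves the new tail of the queue. */
    mcs_node *prev = atomic_exchange(&lock->tail, me);
    if (prev == NULL)
        return;                          /* queue was empty: lock is ours */
    atomic_store(&prev->next, me);       /* link in behind the predecessor */
    while (!atomic_load(&me->locked))    /* spin on our OWN node only */
        sched_yield();
}

void mcs_release(mcs_lock_t *lock, mcs_node *me)
{
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No visible successor: try to reset the queue to empty. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        /* A new waiter became tail but has not linked to us yet: wait. */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();
    }
    atomic_store(&succ->locked, true);   /* hand the lock to the successor */
}

static mcs_lock_t demo_lock;     /* static storage: tail starts as NULL */
static long demo_counter;        /* protected by demo_lock */

static void *mcs_worker(void *arg)
{
    long iters = *(const long *)arg;
    mcs_node node;               /* this thread's queue node */

    for (long i = 0; i < iters; i++) {
        mcs_acquire(&demo_lock, &node);
        demo_counter++;
        mcs_release(&demo_lock, &node);
    }
    return NULL;
}

/* Hypothetical driver: nthreads threads each increment the counter iters times. */
long mcs_demo(int nthreads, long iters)
{
    pthread_t tids[8];

    assert(nthreads > 0 && nthreads <= 8);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, mcs_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return demo_counter;
}
```

Unlocking is a direct handoff: the releaser writes only its successor's locked flag, so each release touches one waiter's cache line instead of invalidating every spinner's copy of a shared lock word.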

2.2 osq process analysis

How optimistic is optimistic spinning? When it finds the lock held, it assumes the holder will release the lock soon, so it spins and waits instead of sleeping, which also avoids the overhead of a process switch.

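The osq data structures, struct optimistic_spin_queue and struct optimistic_spin_node, are defined in include/linux/osq_lock.h, and the locking logic is implemented in osq_lock() in kernel/locking/osq_lock.c.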

There are several cases when taking the osq lock:

  • If no one holds the lock, which is the ideal case, the lock is taken and the function returns immediately; atomic operations guarantee correctness;
  • If someone holds the lock, the current node is appended to the OSQ queue and, as long as the task does not need to reschedule, it spins waiting for its predecessor to release the lock;
  • If, while spinning, the task needs to reschedule (for example because a higher-priority task wants the CPU), the node that was added to the OSQ queue must be removed from it. Removal takes three steps: first undo the predecessor's next pointer that points to this node, then determine this node's successor, and finally link the predecessor and the successor to each other;


Unlocking is implemented in osq_unlock() in kernel/locking/osq_lock.c.

There are also several cases when unlocking:

  • If no one else is contending for the lock, it is released directly;
  • Otherwise, take the node the current node's next pointer points to; if it is not NULL, pass the lock to that next node;
  • If the current node's next pointer is NULL, call osq_wait_next to wait until the next node is known, then pass the lock to it;

As the unlock cases show, unlocking is really a transfer of the lock from one node to the next.

Because the osq queue may be modified concurrently during locking and unlocking, osq_wait_next is called to obtain a stable next node.


3. Mutex lock source code implementation

3.1 mutex

struct mutex is defined in include/linux/mutex.h:

/*
 * Simple, straightforward mutexes with strict semantics:
 *
 * - only one task can hold the mutex at a time
 * - only the owner can unlock the mutex
 * - multiple unlocks are not permitted
 * - recursive locking is not permitted
 * - a mutex object must be initialized via the API
 * - a mutex object must not be initialized via memset or copying
 * - task may not exit with mutex held
 * - memory areas where held locks reside must not be freed
 * - held mutexes must not be reinitialized
 * - mutexes may not be used in hardware or software interrupt
 *   contexts such as tasklets and timers
 *
 * These semantics are fully enforced when DEBUG_MUTEXES is
 * enabled. Furthermore, besides enforcing the above rules, the mutex
 * debugging code also implements a number of additional features
 * that make lock debugging easier and faster:
 *
 * - uses symbolic names of mutexes, whenever they are printed in debug output
 * - point-of-acquire tracking, symbolic lookup of function names
 * - list of all locks held in the system, printout of them
 * - owner tracking
 * - detects self-recursing locks and prints out all relevant info
 * - detects multi-task circular deadlocks and prints out all affected
 *   locks and tasks (and only those tasks)
 */
struct mutex {
        atomic_long_t           owner;
        spinlock_t              wait_lock;
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
        struct optimistic_spin_queue osq; /* Spinner MCS lock */
#endif
        struct list_head        wait_list;
#ifdef CONFIG_DEBUG_MUTEXES
        void                    *magic;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
        struct lockdep_map      dep_map;
#endif
};

The comment above lays down these rules:

  • Only one process can hold a mutex at a time;
  • Only the lock holder may unlock it;
  • Unlocking multiple times is not permitted;
  • Recursive locking is not permitted;
  • A mutex must be initialized through the API;
  • A mutex must not be initialized with memset or by copying;
  • A process must not exit while holding a mutex;
  • Memory containing a held lock must not be freed;
  • A held mutex must not be reinitialized;
  • Mutexes must not be used in hardware or software interrupt contexts, such as tasklets and timers;

Then let's introduce several important members of this structure:

  • owner: an atomic long holding a pointer to the task_struct of the lock holder (its low bits carry flags); 0 means the lock is not held;
  • wait_lock: a spinlock that protects wait_list;
  • osq: the MCS-based optimistic spin queue, present only with CONFIG_MUTEX_SPIN_ON_OWNER;
  • wait_list: a doubly linked list holding the processes that sleep because they could not get the mutex;

As these members suggest, the mutex implementation relies on atomic operations and spinlocks.

When several processes compete for a mutex, the mutex itself is shared data, so modifications to its members must themselves be mutually exclusive.

3.2 mutex initialization

There are two ways to initialize a mutex. The first is static, using the DEFINE_MUTEX macro:

#define __MUTEX_INITIALIZER(lockname) \
                { .owner = ATOMIC_LONG_INIT(0) \
                , .wait_lock = __SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
                , .wait_list = LIST_HEAD_INIT(lockname.wait_list) \
                __DEBUG_MUTEX_INITIALIZER(lockname) \
                __DEP_MAP_MUTEX_INITIALIZER(lockname) }

#define DEFINE_MUTEX(mutexname) \
        struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)

This initializes the atomic owner field, the wait_lock spinlock, and the wait_list list.

The other is dynamic, calling mutex_init in kernel code. The mutex_init macro is defined in include/linux/mutex.h, and __mutex_init in kernel/locking/mutex.c:

# define mutex_init(mutex) \
do {                            \
    static struct lock_class_key __key;        \
                            \
    __mutex_init((mutex), #mutex, &__key);        \
} while (0)

void
__mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
{
    atomic_long_set(&lock->owner, 0);   // no owner and no flags: unlocked
    spin_lock_init(&lock->wait_lock);
    INIT_LIST_HEAD(&lock->wait_list);
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
    osq_lock_init(&lock->osq);      // initialize the MCS (osq) lock
#endif

    debug_mutex_init(lock, name, key);
}

3.2 mutex_lock


mutex_lock is defined in the kernel/locking/mutex.c file:

/**
 * mutex_lock - acquire the mutex
 * @lock: the mutex to be acquired
 *
 * Lock the mutex exclusively for this task. If the mutex is not
 * available right now, it will sleep until it can get it.
 *
 * The mutex must later on be released by the same task that
 * acquired it. Recursive locking is not allowed. The task
 * may not exit without first unlocking the mutex. Also, kernel
 * memory where the mutex resides must not be freed with
 * the mutex still locked. The mutex must first be initialized
 * (or statically defined) before it can be locked. memset()-ing
 * the mutex to 0 is not allowed.
 *
 * (The CONFIG_DEBUG_MUTEXES .config option turns on debugging
 * checks that will enforce the restrictions and will also do
 * deadlock debugging)
 *
 * This function is similar to (but not equivalent to) down().
 */
void __sched mutex_lock(struct mutex *lock)
{
        might_sleep();

        if (!__mutex_trylock_fast(lock))
                __mutex_lock_slowpath(lock);
}

To improve performance, mutex_lock handles acquisition along three paths. It tries the fast path first; if that fails, __mutex_lock_slowpath tries the mid path (optimistic spinning) and then the slow path. The slow path is where the task sleeps and is rescheduled, so its overhead is the largest.

3.3 fast-path

The fast path is implemented in __mutex_trylock_fast:

/*
 * Lockdep annotations are contained to the slow paths for simplicity.
 * There is nothing that would stop spreading the lockdep annotations outwards
 * except more code.
 */

/*
 * Optimistic trylock that only works in the uncontended case. Make sure to
 * follow with a __mutex_trylock() before failing.
 */
static __always_inline bool __mutex_trylock_fast(struct mutex *lock)
{
        unsigned long curr = (unsigned long)current;
        unsigned long zero = 0UL;

        if (atomic_long_try_cmpxchg_acquire(&lock->owner, &zero, curr))
                return true;

        return false;
}

It simply calls the atomic function atomic_long_try_cmpxchg_acquire:

  • If lock->owner is 0, curr is stored into lock->owner, marking the current process as the holder, and the function returns true;
  • If lock->owner is not 0, the lock is already held and the next path must handle it;
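The fast path boils down to one compare-and-swap from 0 to the current task pointer. Here is a user-space sketch with C11 atomics, assuming a hypothetical struct fake_task in place of task_struct and a hypothetical struct toy_mutex in place of struct mutex; acquire ordering on success mirrors the _acquire suffix of atomic_long_try_cmpxchg_acquire.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for task_struct; alignment keeps low bits free. */
struct fake_task { _Alignas(64) int pid; };

/* Hypothetical stand-in for struct mutex: only the owner word. */
struct toy_mutex {
    _Atomic uintptr_t owner;      /* 0 = unlocked, else owner task pointer */
};

/* Modeled on __mutex_trylock_fast(): succeeds only if owner is exactly 0. */
bool toy_trylock_fast(struct toy_mutex *lock, struct fake_task *curr)
{
    uintptr_t zero = 0;

    /* On failure, 'zero' receives the current owner value, as try_cmpxchg
     * does in the kernel; the caller would then take the slower paths. */
    return atomic_compare_exchange_strong_explicit(
        &lock->owner, &zero, (uintptr_t)curr,
        memory_order_acquire, memory_order_relaxed);
}
```

A single CAS is enough because the uncontended case needs no queue and no flags: the owner word going from 0 to a task pointer is the whole state change.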

3.4 mid-path

Both the mid path and the slow path are implemented in __mutex_lock_common:

static noinline void __sched
__mutex_lock_slowpath(struct mutex *lock)
{
        __mutex_lock(lock, TASK_UNINTERRUPTIBLE, 0, NULL, _RET_IP_);
}
static int __sched
__mutex_lock(struct mutex *lock, long state, unsigned int subclass,
             struct lockdep_map *nest_lock, unsigned long ip)
{
        return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false);
}

So __mutex_lock_slowpath ultimately ends up in __mutex_lock_common. That function is long, so rather than reading it line by line, its mid-path logic is summarized here.

If the mutex holder is found to be running on another CPU, the waiter can spin instead of sleeping and being rescheduled: a running holder is very likely to release the lock soon. This is the rationale for optimistic spinning.

So the condition for spin-waiting is that the lock holder is running inside its critical section; only then is spinning worthwhile.

__mutex_trylock_or_owner tries to acquire the lock and, if that fails, returns the current lock holder. The owner field of struct mutex is split into two parts:

1) A pointer to the lock holder's task_struct (task_struct is aligned to at least L1_CACHE_BYTES, so the low bits of the pointer are always zero and free for other use);

2) The MUTEX_FLAGS part, which occupies the low three bits:

  • MUTEX_FLAG_WAITERS: bit 0, the wait list is non-empty, so unlocking must perform a wakeup;
  • MUTEX_FLAG_HANDOFF: bit 1, unlocking must hand the lock directly to the top waiter;
  • MUTEX_FLAG_PICKUP: bit 2, the handoff has been done and the lock is waiting to be picked up by the top waiter;
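The pointer-plus-flags packing can be sketched in user space as follows. The flag values match the bit positions listed above (MUTEX_FLAG_WAITERS = 0x01, MUTEX_FLAG_HANDOFF = 0x02, MUTEX_FLAG_PICKUP = 0x04, MUTEX_FLAGS = 0x07); owner_task and owner_flags are modeled on the kernel's __owner_task() and __owner_flags() helpers, while make_owner and struct fake_task are hypothetical, with a 64-byte alignment standing in for L1_CACHE_BYTES.

```c
#include <assert.h>
#include <stdint.h>

#define MUTEX_FLAG_WAITERS 0x01   /* bit 0 */
#define MUTEX_FLAG_HANDOFF 0x02   /* bit 1 */
#define MUTEX_FLAG_PICKUP  0x04   /* bit 2 */
#define MUTEX_FLAGS        0x07   /* mask of all flag bits */

/* Hypothetical task: 64-byte alignment guarantees the low 3 bits are 0. */
struct fake_task { _Alignas(64) int pid; };

/* Hypothetical helper: pack a task pointer and flag bits into one word. */
uintptr_t make_owner(struct fake_task *t, unsigned long flags)
{
    assert(((uintptr_t)t & MUTEX_FLAGS) == 0);   /* low bits must be free */
    return (uintptr_t)t | (flags & MUTEX_FLAGS);
}

/* Recover the task pointer by masking off the flag bits. */
struct fake_task *owner_task(uintptr_t owner)
{
    return (struct fake_task *)(owner & ~(uintptr_t)MUTEX_FLAGS);
}

/* Recover only the flag bits. */
unsigned long owner_flags(uintptr_t owner)
{
    return owner & MUTEX_FLAGS;
}
```

Keeping holder and flags in one word means a single atomic operation can check or update both together, which is what the cmpxchg-based fast and slow paths rely on.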

mutex_optimistic_spin performs the optimistic spin. Ideally, as soon as the lock holder releases the lock, the current process grabs it. In practice there is a catch: if the holder is scheduled out inside its critical section (task_struct->on_cpu == 0), spinning must stop, because waiting on a task that is not running is pointless.

  • mutex_can_spin_on_owner: checked before spinning starts; if the current process needs to reschedule, or the lock holder is not running, it returns immediately, skipping the osq_lock/osq_unlock work and its overhead;
  • osq_lock ensures that only one waiter at a time spins on the owner, preventing a crowd of waiters from stampeding the mutex;
  • Inside the for(;;) loop, __mutex_trylock_or_owner is called repeatedly to try to take the lock; once it succeeds, the function simply returns;
  • mutex_spin_on_owner returns when the conditions for spin-waiting no longer hold, and the task then falls back to the slow path, since spinning cannot be forced;
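The decision inside mutex_spin_on_owner can be sketched as a loop that keeps spinning only while the same owner still holds the lock and is on a CPU. This is a single-threaded user-space model under stated assumptions: struct fake_task and struct toy_mutex are hypothetical, on_cpu mirrors task_struct->on_cpu, flag bits in the owner word are ignored, and toy_need_resched is a stub standing in for need_resched().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for task_struct: only on_cpu matters here. */
struct fake_task {
    atomic_int on_cpu;          /* 1 while the task is running on a CPU */
};

/* Hypothetical stand-in for struct mutex (flag bits not modeled). */
struct toy_mutex {
    _Atomic uintptr_t owner;    /* owner task pointer, 0 when unlocked */
};

/* Stub for need_resched(); always false in this sketch. */
static bool toy_need_resched(void) { return false; }

/*
 * Keep spinning only while 'owner' still holds the lock AND is running.
 * Returns true if the owner changed (worth retrying the trylock),
 * false if spinning should stop and the caller should take the slow path.
 */
bool toy_spin_on_owner(struct toy_mutex *lock, struct fake_task *owner)
{
    while (atomic_load(&lock->owner) == (uintptr_t)owner) {
        if (!atomic_load(&owner->on_cpu) || toy_need_resched())
            return false;       /* owner sleeping or we must yield: give up */
    }
    return true;                /* lock released or passed to someone else */
}
```

Returning false as soon as the owner leaves the CPU is what keeps optimistic spinning from degenerating into a long busy-wait on a task that may not run again for a while.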

3.5 slow-path

In the slow path, the task adds itself to wait_list and loops in for(;;). Whenever it still cannot acquire the lock, it calls schedule_preempt_disabled to switch itself out and sleep until woken, which is why this path is the slowest.
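The sleep-instead-of-spin behavior of the slow path can be approximated in user space with a pthread mutex standing in for wait_lock and a condition variable standing in for wait_list. This is an analogy under those assumptions, not the kernel's implementation: pthread_cond_signal does not guarantee the FIFO order that wait_list provides, and there is no spinning or handoff. The sleep_lock names and sleep_lock_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct sleep_lock {
    pthread_mutex_t wait_lock;   /* protects 'held', like mutex->wait_lock */
    pthread_cond_t  waiters;     /* sleeping waiters, like mutex->wait_list */
    bool held;
};

#define SLEEP_LOCK_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

void sleep_lock_acquire(struct sleep_lock *l)
{
    pthread_mutex_lock(&l->wait_lock);
    while (l->held)                                     /* the for(;;) loop */
        pthread_cond_wait(&l->waiters, &l->wait_lock);  /* sleep until woken */
    l->held = true;
    pthread_mutex_unlock(&l->wait_lock);
}

void sleep_lock_release(struct sleep_lock *l)
{
    pthread_mutex_lock(&l->wait_lock);
    assert(l->held);                     /* unlocking a free lock is a bug */
    l->held = false;
    pthread_cond_signal(&l->waiters);    /* wake one sleeping waiter */
    pthread_mutex_unlock(&l->wait_lock);
}

static struct sleep_lock demo = SLEEP_LOCK_INIT;
static long demo_count;                  /* protected by demo */

static void *sleeper(void *arg)
{
    long iters = *(const long *)arg;

    for (long i = 0; i < iters; i++) {
        sleep_lock_acquire(&demo);
        demo_count++;
        sleep_lock_release(&demo);
    }
    return NULL;
}

/* Hypothetical driver: nthreads threads each increment demo_count iters times. */
long sleep_lock_demo(int nthreads, long iters)
{
    pthread_t tids[8];

    assert(nthreads > 0 && nthreads <= 8);
    demo_count = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, sleeper, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return demo_count;
}
```

A waiter here costs no CPU while blocked, but every acquisition under contention pays for a sleep and a wakeup, which is exactly the trade-off that makes the kernel try the fast and mid paths first.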

3.6 mutex_unlock


mutex_unlock is defined in the kernel/locking/mutex.c file:

/**
 * mutex_unlock - release the mutex
 * @lock: the mutex to be released
 *
 * Unlock a mutex that has been locked by this task previously.
 *
 * This function must not be used in interrupt context. Unlocking
 * of a not locked mutex is not allowed.
 *
 * This function is similar to (but not equivalent to) up().
 */
void __sched mutex_unlock(struct mutex *lock)
{
#ifndef CONFIG_DEBUG_LOCK_ALLOC
        if (__mutex_unlock_fast(lock))
                return;
#endif
        __mutex_unlock_slowpath(lock, _RET_IP_);
}

The process of releasing the lock is relatively simple, and it is also divided into fast path and slow path;

The fast path is implemented in __mutex_unlock_fast:

static __always_inline bool __mutex_unlock_fast(struct mutex *lock)
{
        unsigned long curr = (unsigned long)current;

        if (atomic_long_cmpxchg_release(&lock->owner, curr, 0UL) == curr)
                return true;

        return false;
}

It simply calls the atomic function atomic_long_cmpxchg_release:

  • If lock->owner equals curr (the current process holds the lock and no flag bits are set), lock->owner is set to 0 and the function returns true;
  • If lock->owner is not equal to curr (for example, flag bits such as MUTEX_FLAG_WAITERS are set), it returns false and the slow path takes over;
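The fast unlock is the mirror image of the fast lock: one compare-and-swap from curr back to 0. In this user-space sketch (hypothetical struct fake_task and struct toy_mutex; TOY_FLAG_WAITERS is a hypothetical stand-in for MUTEX_FLAG_WAITERS), the CAS returns a bool instead of the old value, which is equivalent to the kernel's "== curr" comparison. Release ordering on success mirrors the _release suffix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FLAG_WAITERS 0x01UL   /* stand-in for MUTEX_FLAG_WAITERS */

/* Hypothetical stand-in for task_struct; alignment keeps low bits free. */
struct fake_task { _Alignas(64) int pid; };

/* Hypothetical stand-in for struct mutex: only the owner word. */
struct toy_mutex {
    _Atomic uintptr_t owner;      /* task pointer | flag bits, 0 = unlocked */
};

/* Modeled on __mutex_unlock_fast(): succeeds only if owner == curr exactly,
 * i.e. we hold the lock AND no flag bits (such as waiters) are set. */
bool toy_unlock_fast(struct toy_mutex *lock, struct fake_task *curr)
{
    uintptr_t expected = (uintptr_t)curr;

    return atomic_compare_exchange_strong_explicit(
        &lock->owner, &expected, 0,
        memory_order_release, memory_order_relaxed);
}
```

Because any set flag makes the comparison fail, a waiting task can never be forgotten by the fast path: its presence forces the unlock into the slow path, which performs the wakeup.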

The slow path releases the lock while preserving the MUTEX_FLAGS bits, handles those flags, and finally wakes up a task waiting on the lock. A simplified excerpt:

void __sched __mutex_unlock_slowpath(struct mutex *lock, ...)
{
    // release the mutex: owner keeps only its low 3 state bits (flags)
    unsigned long old = atomic_long_cmpxchg_release(&lock->owner,
                        owner, __owner_flags(owner));
    ...
    spin_lock(&lock->wait_lock);
    if (!list_empty(&lock->wait_list)) {
        // get the first waiter in the wait list
        struct mutex_waiter *waiter = list_first_entry(&lock->wait_list,
                                      struct mutex_waiter, list);

        // add that waiter's task to wake_q
        struct task_struct *next = waiter->task;
        wake_q_add(&wake_q, next);
    }

    spin_unlock(&lock->wait_lock);

    // wake the task up
    wake_up_q(&wake_q);
}

Original Author: Proficient in Linux Kernel

Original address: Linux Kernel: Process Management - Mutual Exclusion Lock - Zhihu (Copyright belongs to the original author, contact to delete infringement message)


Origin blog.csdn.net/m0_74282605/article/details/130162245