Spin Locks in UMA and NUMA Architectures (CLH Locks and MCS Locks)

About Spinlocks

We know that a spin lock is a synchronization primitive, and a non-blocking one. Its main difference from a conventional lock lies in what happens when a thread fails to acquire the lock: with a conventional lock, the thread is blocked and woken up again at the appropriate time. The core mechanism of a spin lock is captured by the word "spin": it replaces the blocking operation with a spinning one. When a thread tries to acquire a lock that is already held by another thread, it keeps looping to check whether the lock has been released, instead of being suspended or put to sleep. As soon as the other thread releases the lock, this thread can acquire it. Spinning is a busy-wait state, so the thread consumes CPU time the whole while.

Spinlocks
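The busy-wait behavior described above can be sketched with a minimal Java spin lock built on a CAS loop. The class and method names here are illustrative, not from the article; `Thread.onSpinWait()` assumes Java 9+.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A minimal spin lock sketch: lock() busy-waits on a CAS instead of blocking.
public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Spin: keep retrying the CAS until the lock becomes free.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the runtime that we are busy-waiting
        }
    }

    public void unlock() {
        locked.set(false); // volatile write: releases the lock and publishes updates
    }
}

class SpinLockDemo {
    static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        SpinLock lock = new SpinLock();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    lock.lock();
                    try { counter++; } finally { lock.unlock(); }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(counter); // 4 threads x 1000 increments = 4000
    }
}
```

Note that every waiting thread spins on the same `locked` flag; as the article explains next, this shared polling is exactly the contention that CLH and MCS locks are designed to remove.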

UMA architecture

Because analyzing the shortcomings of CLH and MCS locks involves processor-architecture issues, before introducing each spin lock we need to understand two processor architectures: UMA and NUMA. In a multiprocessor system, memory sharing can be classified as UMA (Uniform Memory Access) or NUMA (Non-Uniform Memory Access).

The essence of the UMA architecture is that every CPU core takes the same time to access main memory. Consider the bus-based UMA architecture shown below: four CPUs are connected directly to a bus and communicate through it. From this structure you can see that the CPUs are indistinguishable; they access main memory on equal terms, and the time needed to access it is the same for all of them, hence "uniform memory access".

UMA architecture

When a CPU wants to read or write, it first checks whether the bus is idle; only when the bus is idle may it communicate with main memory, otherwise it waits until the bus becomes free. To mitigate this bottleneck, a cache is introduced inside each CPU, so that read operations can be served from the local cache. But then we must consider consistency between the CPU caches and main memory, otherwise stale (dirty) data can result.

UMA architecture 2

NUMA architecture

In contrast with UMA, in a NUMA architecture not every CPU's main-memory access takes the same time, although every CPU can still reach all of main memory. As the figure shows, when a CPU accesses its local main memory over the local bus, the access time is short; but accessing non-local (remote) main memory takes much longer. In other words, a CPU accesses local and remote main memory at different speeds. The advantage of NUMA is its excellent scalability: systems combining more than a hundred CPUs can be built this way.

NUMA architecture

CLH locks

The CLH lock was invented by Craig, Landin, and Hagersten (hence the name). Its core idea is to turn many threads polling one shared variable into a queue of threads, each of which polls only its own local variable.

This transformation raises two main questions. First, what kind of queue should be built, and how? To guarantee fairness, we build a FIFO queue. The queue is constructed mainly by moving the tail pointer `tail`: each thread that wants to acquire the lock creates a new node and installs it as `tail` with an atomic CAS operation, then polls the status flag of its predecessor node. The figure below shows the queue structure and the spin operation clearly; this is how the queue of waiting threads is built. Second, how is the lock released? When the current thread finishes, it simply sets the state of its node to "unlocked"; since the next thread has been polling that flag, it can then acquire the lock.

CLH lock

So the core idea of the CLH lock is to take threads that would compete on one resource for a long time, order them in a queue, and have each of them merely poll a single node's variable. The only place contention remains is among enqueueing threads competing for the tail node, and by then the number of competing threads is already much smaller. Compared with having every thread poll one shared resource directly, this also saves CPU cache-synchronization traffic, greatly improving system performance.

The CLH lock solves many of the problems caused by threads operating on one synchronization variable simultaneously, but it spins on the node of its predecessor. Under a NUMA architecture this can become a performance problem: if the predecessor's node does not live in the same local main memory as the current thread, every poll of it is a slow remote access, and performance suffers.

MCS locks

The MCS lock was invented by John Mellor-Crummey and Michael Scott, and it aims to solve the remaining problems of the CLH lock. It is also based on a FIFO queue; the difference from the CLH lock is the object being polled. An MCS lock thread spins only on its own local variable, and its predecessor node is responsible for ending that spin. This reduces unnecessary synchronization between CPU caches and main memory, lowering the performance cost of synchronization.

In the figure below, each thread corresponds to a node in the queue. Each node contains a spin variable indicating whether the thread still needs to spin. Once the predecessor node is finished with the lock, it modifies the spin variable of its successor node, informing it that it no longer needs to spin because it has successfully acquired the lock.

MCS locks
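The handoff described above can be sketched as a classic MCS lock in Java. Again this is a textbook-style sketch rather than code from the article: each thread spins on its own node's `locked` flag, and `unlock()` explicitly writes to the successor's node to end that spin.

```java
import java.util.concurrent.atomic.AtomicReference;

// MCS queue lock sketch: explicit next pointers, and each thread spins
// only on its OWN node; the predecessor clears the flag to hand off.
public class MCSLock {
    static class Node {
        volatile boolean locked = false;
        volatile Node next = null;
    }

    private final AtomicReference<Node> tail = new AtomicReference<>(null);
    private final ThreadLocal<Node> myNode = ThreadLocal.withInitial(Node::new);

    public void lock() {
        Node node = myNode.get();
        Node pred = tail.getAndSet(node);   // enqueue at the tail
        if (pred != null) {                 // queue was non-empty: must wait
            node.locked = true;
            pred.next = node;               // link so the predecessor can find us
            while (node.locked) {           // spin on our own (local) variable
                Thread.onSpinWait();
            }
        }
    }

    public void unlock() {
        Node node = myNode.get();
        if (node.next == null) {
            // No visible successor: try to swing tail back to empty.
            if (tail.compareAndSet(node, null)) {
                return;
            }
            // A successor is mid-enqueue; wait for it to link itself.
            while (node.next == null) {
                Thread.onSpinWait();
            }
        }
        node.next.locked = false;           // end the successor's spin
        node.next = null;
    }
}

class MCSLockDemo {
    static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        MCSLock lock = new MCSLock();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    lock.lock();
                    try { counter++; } finally { lock.unlock(); }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(counter);
    }
}
```

Because each thread waits on a flag in its own node, the polled memory can stay local even on NUMA machines, which is the advantage over CLH that the article points out.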


Origin juejin.im/post/5d8ab21b51882509593fd2a4