Disruptor

Anatomy of a Disruptor

(1) Concurrency in Disruptor

    Disruptor uses a RingBuffer as its lock-free queue implementation. A traditional concurrent queue must maintain at least two pointers, a head pointer and a tail pointer, and concurrent access and modification of those pointers inevitably requires locking. Because Disruptor's buffer is circular, the producer only needs a single head pointer (a sequence cursor), and that pointer is guarded optimistically rather than with a mutual-exclusion lock. In the standard Disruptor configuration there is only one producer, which eliminates contention on the head pointer entirely. In this sense Disruptor can be understood as a lock-free queue.
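To make the idea concrete, here is a minimal sketch (not Disruptor's actual implementation) of a single-producer, single-consumer ring buffer: the producer owns the publish cursor and the consumer owns the read cursor, so neither sequence is ever written by two threads and no mutual-exclusion lock is needed. The class and method names are illustrative only.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a 1-producer/1-consumer lock-free ring buffer.
class OneToOneRingBuffer {
    private final long[] buffer;
    private final int mask;                                  // size must be a power of two
    private final AtomicLong cursor = new AtomicLong(-1);    // last published slot (producer-owned)
    private final AtomicLong consumed = new AtomicLong(-1);  // last read slot (consumer-owned)

    OneToOneRingBuffer(int size) {
        buffer = new long[size];
        mask = size - 1;
    }

    boolean offer(long value) {
        long next = cursor.get() + 1;
        if (next - consumed.get() > buffer.length) return false; // ring is full
        buffer[(int) (next & mask)] = value;
        cursor.lazySet(next);   // ordered write: publish only after the slot is written
        return true;
    }

    long poll() {               // returns Long.MIN_VALUE as an "empty" sentinel in this sketch
        long next = consumed.get() + 1;
        if (next > cursor.get()) return Long.MIN_VALUE;
        long value = buffer[(int) (next & mask)];
        consumed.lazySet(next);
        return value;
    }
}
```

The power-of-two size lets `next & mask` replace an expensive modulo, the same trick the real RingBuffer uses for index wrapping.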

    Lock-free here means no mutual-exclusion locks: coordination is done with CAS (Compare And Swap/Set). Strictly speaking, a lock is still involved, because CAS is in essence an optimistic lock; but it is a single CPU-level instruction that never enters the operating system kernel, which is why it is so efficient.

CAS advantages

CAS depends on processor support, which virtually all modern processors provide;
CAS is very efficient compared with locks, because it requires no kernel arbitration and therefore no context switch;
but CAS is not free: it locks the CPU pipeline and uses memory barriers (which flush the memory state; simply put, they synchronize data held in caches and registers back to main memory).
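The retry loop below shows the canonical CAS usage pattern with the JDK's `AtomicLong.compareAndSet`: on contention the losing thread simply re-reads the current value and retries, with no kernel involvement. The `CasCounter` class name is made up for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

// A lock-free counter built on compareAndSet: the classic CAS retry loop.
class CasCounter {
    private final AtomicLong value = new AtomicLong(0);

    long increment() {
        long current;
        do {
            current = value.get();                            // optimistic read
        } while (!value.compareAndSet(current, current + 1)); // retry if another thread won
        return current + 1;
    }

    long get() { return value.get(); }
}
```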

Reference: http://ifeve.com/locks-are-bad/

(2) Cache line filling

    The CPU cache is made up of cache lines, usually 64 bytes each, and each line caches a contiguous block of addresses in main memory. A Java long is 8 bytes, so one cache line can hold 8 long variables.

    The problem this solves: it keeps the RingBuffer's head and tail pointers out of the same cache line, so that when multiple threads access them, reads and writes of the head and the tail do not interfere with each other, avoiding false-sharing write conflicts.

Why does padding to 64 bytes improve the efficiency of concurrent programs?

    Because on Intel Core i7, Core, Atom, NetBurst, Core Solo and Pentium M processors, the L1, L2 and L3 cache lines are 64 bytes wide, and the cache does not support partially filled lines. This means that if a queue's head node and tail node together occupy less than 64 bytes, the processor reads them both into the same cache line. On a multiprocessor machine, each processor then caches that same line. When one processor tries to modify the head node, the entire cache line is locked, so under the cache-coherence protocol the other processors cannot access the tail node in their own caches. Enqueue and dequeue operations constantly modify the head and tail nodes, so on multiprocessors this severely degrades queue throughput. Doug Lea avoids the problem by padding the nodes to 64 bytes, filling an entire cache line, so that the head and tail nodes are never loaded into the same line and modifying one never locks the other.
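The padding technique looks roughly like this: seven unused longs on either side of the hot field guarantee it sits alone on its 64-byte cache line. This is a simplified sketch of the pre-JDK 8 idiom; Disruptor's real `Sequence` class spreads the padding across a small class hierarchy so the JVM cannot reorder or eliminate the unused fields, and on newer JDKs the internal `@Contended` annotation serves the same purpose.

```java
// Sketch of manual cache-line padding (pre-JDK 8 idiom, simplified).
// p1..p7 and p9..p15 are 7 longs (56 bytes) on each side of `value`,
// keeping it alone on a 64-byte cache line and preventing false sharing
// with neighbouring hot fields.
class PaddedLong {
    protected long p1, p2, p3, p4, p5, p6, p7;        // left padding
    protected volatile long value;                    // the hot field
    protected long p9, p10, p11, p12, p13, p14, p15;  // right padding
}
```

Note that an aggressive JIT may strip fields it proves unused, which is exactly why production code layers the padding into superclasses or uses `@Contended` instead of relying on a flat class like this one.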

Should you always pad variables to 64 bytes?

    No. In two scenarios padding should not be used. First: on processors whose cache lines are not 64 bytes wide, such as the P6 family and Pentium processors, whose L1 and L2 cache lines are 32 bytes wide. Second: when the shared variable is not written frequently. Padding forces the processor to read extra bytes into the cache, which itself carries some performance cost; if the shared variable is rarely written, the probability of cache-line contention is very small, so there is no need to pad it to avoid mutual locking.
