A Meituan architect with 7 years at the company walks you through the Disruptor concurrency framework

Preface

The best way to understand the Disruptor is to compare it with something that is easy to understand and serves a similar purpose. The point of comparison here is Java's BlockingQueue.

Similarities and differences with BlockingQueue:
Similar: both serve the same purpose, passing data between threads of the same process.
Different: the Disruptor multicasts events to consumers, pre-allocates event memory, and can optionally run lock-free.

Key concepts

  • Ring Buffer: historically the core of the Disruptor. Since version 3.0, the ring buffer is only responsible for storing and updating the data (events) that move through the Disruptor. For some advanced use cases it can be replaced entirely by a user-supplied container.
  • Sequence: the Disruptor uses Sequences to track how far a particular component has progressed. Each consumer (EventProcessor) maintains a Sequence, as does the Disruptor itself. Most of the concurrent code relies on the movement of these Sequence values, so Sequence supports many of the features of AtomicLong. In fact, the only real difference between the two is that Sequence contains extra functionality to prevent false sharing between Sequences and other values.
  • Sequencer: the real core of the Disruptor. The two implementations of this interface (single producer and multi-producer) implement all the concurrent algorithms for passing data quickly and correctly between producers and consumers.
  • Sequence Barrier: produced by the Sequencer, it holds references to the Sequencer's main published sequence and to the Sequences of any dependent consumers. It contains the logic that determines whether there are events available for a consumer to process.
  • Wait Strategy: determines how a consumer waits for events that a producer places into the Disruptor.
  • Event: the unit of data passed from producer to consumer. Its type is entirely user-defined.
  • EventProcessor: the main loop for handling events from the Disruptor; it owns the consumer's Sequence. BatchEventProcessor provides an efficient implementation of this event loop and calls back into a user-supplied implementation of the EventHandler interface whenever events are available.
  • EventHandler: an interface implemented by the user that represents a consumer of the Disruptor.
  • Producer: the user code that calls the Disruptor to enqueue events. It has no representation in the framework's code.
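
The false-sharing protection mentioned for Sequence usually means surrounding the hot value with unused padding fields so that it sits on its own cache line. The sketch below is a minimal illustration under that assumption; the class names (LhsPadding, SequenceValue, RhsPadding, PaddedSequence) are hypothetical and this is not the library's source. It relies on HotSpot placing superclass fields before subclass fields, which other JVMs are not required to do.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

// Padding before the value: 7 longs = 56 bytes.
abstract class LhsPadding
{
    long p1, p2, p3, p4, p5, p6, p7;
}

// The hot field that producers and consumers actually read and write.
abstract class SequenceValue extends LhsPadding
{
    volatile long value;
}

// Padding after the value so neighbouring objects cannot share its cache line.
abstract class RhsPadding extends SequenceValue
{
    long p9, p10, p11, p12, p13, p14, p15;
}

// Hypothetical AtomicLong-like sequence with padding on both sides.
final class PaddedSequence extends RhsPadding
{
    private static final AtomicLongFieldUpdater<SequenceValue> VALUE =
        AtomicLongFieldUpdater.newUpdater(SequenceValue.class, "value");

    PaddedSequence(long initialValue)
    {
        value = initialValue;
    }

    long get()
    {
        return value;                          // volatile read
    }

    void set(long newValue)
    {
        value = newValue;                      // volatile write
    }

    void setOrdered(long newValue)
    {
        VALUE.lazySet(this, newValue);         // cheaper ordered (release-style) store
    }

    boolean compareAndSet(long expected, long newValue)
    {
        return VALUE.compareAndSet(this, expected, newValue);
    }

    long incrementAndGet()
    {
        return VALUE.incrementAndGet(this);
    }
}
```

Using inheritance for the padding, rather than declaring all fields in one class, is the point of the design: the JVM is free to reorder fields within a single class, so padding declared alongside the value might not end up next to it.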

Multicast events

This is the biggest behavioural difference between a queue and the Disruptor. An event in a queue is consumed by only one consumer, whereas an event published to the Disruptor is delivered to all consumers. This is because the Disruptor is designed for cases where several independent operations must be applied to the same data in parallel (note: similar to the topic model in JMS). For example, at LMAX the same data must be journalled (logged), replicated, and processed by the business logic. It is also possible to use a WorkerPool so that different events are processed in parallel by several workers, but because this is not the Disruptor's primary purpose, a WorkerPool may not be the most efficient approach.
In the figure above, the three consumers JournalConsumer, ReplicationConsumer and ApplicationConsumer all receive every available message from the Disruptor, in the same order. This allows the consumers to work in parallel.
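
Multicast means each consumer keeps its own position in the ring and reads every published slot; nothing is removed when one consumer reads it. The single-threaded sketch below illustrates only that idea, under simplifying assumptions: the names MulticastSketch and Consumer are hypothetical, there is no wrap-around handling, and no real threads or barriers are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MulticastSketch
{
    static final class Consumer
    {
        final String name;
        long sequence = -1;                        // last slot this consumer has processed
        final List<String> seen = new ArrayList<>();

        Consumer(String name)
        {
            this.name = name;
        }

        // Read every slot up to the producer's cursor, in order.
        void drain(String[] ring, long cursor)
        {
            while (sequence < cursor)
            {
                sequence++;
                seen.add(ring[(int) (sequence & (ring.length - 1))]);
            }
        }
    }

    static List<Consumer> run(String... events)
    {
        String[] ring = new String[8];             // power-of-2 size so '&' works as modulo
        if (events.length > ring.length)
            throw new IllegalArgumentException("sketch does not handle wrap-around");
        long cursor = -1;
        for (String event : events)
        {
            cursor++;
            ring[(int) (cursor & (ring.length - 1))] = event;
        }
        List<Consumer> consumers = Arrays.asList(
            new Consumer("journal"), new Consumer("replication"), new Consumer("application"));
        for (Consumer consumer : consumers)
            consumer.drain(ring, cursor);          // reading never removes an event
        return consumers;
    }
}
```

Calling MulticastSketch.run("a", "b", "c") leaves all three consumers with the same list [a, b, c] and the same sequence 2, which is the queue-versus-Disruptor difference in miniature.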

Consumer dependency graph

To support real-world applications of parallel processing, consumers need to be coordinated. Returning to the example above, the business logic consumer must be prevented from making progress until the journalling and replication consumers have finished their work. We call this concept gating (more precisely, gating is the name of the broader feature of which this behaviour is one case). Gating happens in two places. First, producers must not overrun consumers; this is handled by adding the relevant consumers' Sequences to the Disruptor with RingBuffer.addGatingSequences(). Second, the case described above is implemented by constructing a SequenceBarrier that contains the Sequences of the components that must finish their processing first.
Referring back to Figure 1, three consumers are listening for events from the RingBuffer, and in this example they form a dependency graph: the ApplicationConsumer depends on the JournalConsumer and the ReplicationConsumer, which means the JournalConsumer and ReplicationConsumer can run freely in parallel with each other. The dependency can be seen in the link from the ApplicationConsumer's SequenceBarrier to the Sequences of the JournalConsumer and ReplicationConsumer. The relationship between the Sequencer and the downstream consumers is also worth noting. One of the Sequencer's roles is to ensure that publishing does not wrap the RingBuffer. To achieve this, no downstream consumer may have a Sequence lower than the RingBuffer's Sequence minus the size of the RingBuffer. Using a dependency graph, however, enables an interesting optimization. Because the ApplicationConsumer's Sequence is guaranteed to be less than or equal to those of the JournalConsumer and ReplicationConsumer (the dependency relationship ensures this), the Sequencer only needs to observe the ApplicationConsumer's Sequence. More generally, the Sequencer only needs to track the Sequences of the consumers that are leaf nodes in the dependency tree.
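
Both gating rules reduce to small arithmetic on sequence numbers. The sketch below expresses them as hypothetical helpers (GatingSketch, canClaim, highestAvailable are illustrative names, not library API): a producer may claim a sequence only if the slot it would overwrite has already been processed by the slowest leaf consumer, and a dependent consumer may only process up to the lowest of the producer cursor and its upstream consumers' sequences.

```java
// Hypothetical helpers illustrating the two gating rules; not library code.
final class GatingSketch
{
    // Smallest value in 'sequences', or defaultValue if the array is empty.
    static long minimum(long[] sequences, long defaultValue)
    {
        long min = defaultValue;
        for (long s : sequences)
            min = Math.min(min, s);
        return min;
    }

    // Producer side: claiming nextSequence reuses the slot that held
    // (nextSequence - bufferSize). That is only safe once every leaf
    // consumer has processed at least that older sequence.
    static boolean canClaim(long nextSequence, int bufferSize, long[] leafConsumerSequences)
    {
        long wrapPoint = nextSequence - bufferSize;
        return wrapPoint <= minimum(leafConsumerSequences, nextSequence);
    }

    // Consumer side (the barrier): a dependent consumer may process up to the
    // lowest of the producer cursor and the sequences of the consumers it depends on.
    static long highestAvailable(long cursor, long[] upstreamSequences)
    {
        return minimum(upstreamSequences, cursor);
    }
}
```

For example, with a buffer of 8, the producer cursor at 9, JournalConsumer at 5 and ReplicationConsumer at 3, highestAvailable(9, {5, 3}) returns 3, so the ApplicationConsumer can process up to 3. If the ApplicationConsumer is at 2, canClaim(10, 8, {2}) is true but canClaim(11, 8, {2}) is false, and the producer never needs to look at the two upstream consumers.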

Event pre-allocation

One of the goals of the Disruptor is to be usable in low-latency environments. In low-latency systems, memory allocation needs to be reduced or eliminated. In Java-based systems, the aim is to reduce the number of pauses caused by garbage collection (in low-latency C/C++ systems, heavy memory allocation is also a problem because of contention on the memory allocator).
To support this, users can pre-allocate the storage for events in the Disruptor. When the Disruptor is constructed, a user-supplied EventFactory is called once for every entry in its RingBuffer. When publishing new data, the API lets the user obtain the already-constructed object so they can call its methods or update its fields. Provided this is done correctly, the Disruptor guarantees that these operations are concurrency-safe.
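
Pre-allocation amounts to filling every slot once at construction time and then mapping an ever-increasing sequence onto a slot index. The sketch below shows this with a hypothetical PreallocatedRing class (not the library's RingBuffer) and a java.util.function.Supplier standing in for the EventFactory. It also shows why the buffer size must be a power of 2: the slot index can then be computed with a cheap bit mask instead of a division.

```java
import java.util.function.Supplier;

// Hypothetical sketch of event pre-allocation; not the library's RingBuffer.
final class PreallocatedRing<E>
{
    private final Object[] entries;
    private final int mask;

    PreallocatedRing(Supplier<E> factory, int bufferSize)
    {
        if (bufferSize < 1 || Integer.bitCount(bufferSize) != 1)
            throw new IllegalArgumentException("bufferSize must be a power of 2");
        entries = new Object[bufferSize];
        mask = bufferSize - 1;
        for (int i = 0; i < bufferSize; i++)
            entries[i] = factory.get();            // the only allocations the ring makes
    }

    // Map an ever-increasing sequence to its slot: sequence & (size - 1)
    // equals sequence % size when size is a power of 2.
    @SuppressWarnings("unchecked")
    E get(long sequence)
    {
        return (E) entries[(int) (sequence & mask)];
    }
}
```

With a size of 4, get(1) and get(5) return the same pre-allocated object, so publishing sequence 5 means updating that object's fields rather than allocating a new one.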

Optional lock-free

Another implementation detail driven by the goal of low latency is the extensive use of lock-free algorithms in the Disruptor. All memory visibility and correctness guarantees are implemented with memory barriers and/or compare-and-swap (CAS) operations. There is only one case where a lock is actually used: the BlockingWaitStrategy. This is purely so that a Condition can be used to park consumer threads while they wait for new events to arrive. Many low-latency systems busy-wait to avoid the jitter that using a Condition can introduce, but in some environments busy-waiting can severely degrade performance, especially where CPU resources are heavily constrained, for example web servers running in virtualized environments.
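
As an illustration of replacing a lock with CAS, the sketch below lets several producer threads claim unique sequence numbers using AtomicLong.compareAndSet in a retry loop. CasClaimSketch is a hypothetical name, and the library's own multi-producer sequencer differs in detail; this only shows the lock-free claiming pattern.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: lock-free claiming of sequence numbers by several producers.
final class CasClaimSketch
{
    private final AtomicLong cursor = new AtomicLong(-1);

    long next()
    {
        while (true)
        {
            long current = cursor.get();
            long next = current + 1;
            if (cursor.compareAndSet(current, next))
                return next;                       // this thread now owns 'next'
            // Another producer won the race; retry with the fresh value.
        }
    }

    // Run 'threads' producers concurrently and collect every sequence they claimed.
    static long[] claimConcurrently(int threads, int claimsPerThread)
    {
        CasClaimSketch sketch = new CasClaimSketch();
        long[] claimed = new long[threads * claimsPerThread];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            final int base = t * claimsPerThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < claimsPerThread; i++)
                    claimed[base + i] = sketch.next();
            });
            workers[t].start();
        }
        for (Thread worker : workers)
        {
            try
            {
                worker.join();                     // join makes the workers' writes visible
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return claimed;
    }
}
```

After sorting, the claimed values are exactly 0 to n-1 with no duplicates or gaps, even though no thread ever blocked on a lock.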

Getting Started

Basic event production and consumption

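Start with a simple event:

public class LongEvent
{
    private long value;

    public void set(long value)
    {
        this.value = value;
    }

    @Override
    public String toString()
    {
        return "LongEvent{value=" + value + "}";  // used when the handler prints the event
    }
}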

In order for the Disruptor to pre-allocate events, we need to provide an EventFactory to complete the construction:

import com.lmax.disruptor.EventFactory;

public class LongEventFactory implements EventFactory<LongEvent>
{
    public LongEvent newInstance()
    {
        return new LongEvent();
    }
}

Once the event is defined, we need a consumer to handle it. In this example it simply prints each event:

import com.lmax.disruptor.EventHandler;

public class LongEventHandler implements EventHandler<LongEvent>
{
    public void onEvent(LongEvent event, long sequence, boolean endOfBatch)
    {
        System.out.println("Event: " + event);
    }
}

We also need a source of events. For this example, assume the data comes from some kind of I/O device, such as a network connection or a file, in the form of a ByteBuffer.

import com.lmax.disruptor.RingBuffer;
import java.nio.ByteBuffer;

public class LongEventProducer
{
    private final RingBuffer<LongEvent> ringBuffer;

    public LongEventProducer(RingBuffer<LongEvent> ringBuffer)
    {
        this.ringBuffer = ringBuffer;
    }

    public void onData(ByteBuffer bb)
    {
        long sequence = ringBuffer.next();  // Grab the next sequence
        try
        {
            LongEvent event = ringBuffer.get(sequence); // Get the entry in the Disruptor
                                                        // for the sequence
            event.set(bb.getLong(0));  // Fill with data
        }
        finally
        {
            ringBuffer.publish(sequence);
        }
    }
}

Clearly, publishing an event is more involved than using a simple queue. This is a consequence of event pre-allocation. Publishing requires (at minimum) a two-phase approach: first claim a slot in the ring buffer, then publish the available data. The publish must also be wrapped in a try/finally block: if a slot in the ring buffer has been claimed (by calling RingBuffer.next()), that sequence must be published. Failing to do so corrupts the state of the Disruptor. In particular, with multiple producers it causes consumers to stall, and the only way to recover is a restart.
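
Why does an unpublished sequence stall consumers when there are multiple producers? One common way to track publication with several producers is a per-slot marker, where consumers can only advance through a contiguous run of published slots. The single-threaded sketch below assumes that design; AvailabilitySketch and its methods are hypothetical names, and a real implementation would need ordered memory accesses rather than a plain array.

```java
import java.util.Arrays;

// Hypothetical, single-threaded sketch of per-slot publication tracking.
final class AvailabilitySketch
{
    private final long[] publishedSequence;   // which sequence each slot last published
    private final int mask;

    AvailabilitySketch(int bufferSize)        // bufferSize assumed to be a power of 2
    {
        publishedSequence = new long[bufferSize];
        Arrays.fill(publishedSequence, -1L);
        mask = bufferSize - 1;
    }

    void publish(long sequence)
    {
        publishedSequence[(int) (sequence & mask)] = sequence;
    }

    boolean isPublished(long sequence)
    {
        return publishedSequence[(int) (sequence & mask)] == sequence;
    }

    // Highest sequence a consumer may read, scanning the claimed range in order.
    long highestPublished(long from, long claimedUpTo)
    {
        for (long s = from; s <= claimedUpTo; s++)
        {
            if (!isPublished(s))
                return s - 1;                 // one unpublished slot blocks everything after it
        }
        return claimedUpTo;
    }
}
```

If sequences 0 to 3 are claimed but the producer holding 2 never publishes (for example because it threw an exception without a finally block), highestPublished(0, 3) stays at 1 even though 3 has been published, and consumers can never get past it.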

Using the 3.x translators

Disruptor 3.0 provides a rich lambda-style API designed to shield developers from the complexity of working with the RingBuffer directly, so from 3.0 onward the preferred way to publish messages is through the Event Publisher / Event Translator API, as follows:

import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.EventTranslatorOneArg;
import java.nio.ByteBuffer;

public class LongEventProducerWithTranslator
{
    private final RingBuffer<LongEvent> ringBuffer;
   
    public LongEventProducerWithTranslator(RingBuffer<LongEvent> ringBuffer)
    {
        this.ringBuffer = ringBuffer;
    }
   
    private static final EventTranslatorOneArg<LongEvent, ByteBuffer> TRANSLATOR =
        new EventTranslatorOneArg<LongEvent, ByteBuffer>()
        {
            public void translateTo(LongEvent event, long sequence, ByteBuffer bb)
            {
                event.set(bb.getLong(0));
            }
        };

    public void onData(ByteBuffer bb)
    {
        ringBuffer.publishEvent(TRANSLATOR, bb);
    }
}

Another advantage of this approach is that the translator code can be placed in its own class and unit-tested more easily. The Disruptor provides several translator interfaces (EventTranslator, EventTranslatorOneArg, EventTranslatorTwoArg, etc.). The reason is that translators can then be written as static classes or as non-capturing lambdas (with Java 8): the arguments to the translation method are passed through the call on the RingBuffer to the translator.
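
The value of a publishEvent-style call is that claiming, filling and publishing live in one place, with the finally block guaranteeing the publish. The sketch below illustrates that shape with hypothetical names (OneArgTranslator, TranslatingRing); it is not the library's EventTranslatorOneArg or RingBuffer, it is single-threaded, and it has no wrap-around protection.

```java
import java.util.function.Supplier;

// Hypothetical translator shape: fill a pre-allocated event from one argument.
@FunctionalInterface
interface OneArgTranslator<E, A>
{
    void translateTo(E event, long sequence, A arg);
}

// Hypothetical single-threaded ring showing claim / translate / publish in one call.
final class TranslatingRing<E>
{
    private final Object[] slots;
    private final int mask;
    private long claimed = -1;     // last sequence handed out by next()
    private long published = -1;  // last sequence made visible to consumers

    TranslatingRing(Supplier<E> factory, int bufferSize)   // power-of-2 size assumed
    {
        slots = new Object[bufferSize];
        mask = bufferSize - 1;
        for (int i = 0; i < bufferSize; i++)
            slots[i] = factory.get();
    }

    @SuppressWarnings("unchecked")
    E get(long sequence)
    {
        return (E) slots[(int) (sequence & mask)];
    }

    long publishedCursor()
    {
        return published;
    }

    // Claim, fill via the translator, and always publish, even if the translator throws.
    <A> void publishEvent(OneArgTranslator<E, A> translator, A arg)
    {
        long sequence = ++claimed;
        try
        {
            translator.translateTo(get(sequence), sequence, arg);
        }
        finally
        {
            published = sequence;
        }
    }
}
```

Because the argument travels through publishEvent, a translator such as (event, sequence, value) -> event[0] = value captures nothing, so the same instance can be reused for every call, and the try/finally rule from the previous section can no longer be forgotten by the caller.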
The final step is to wire everything together. The components can be assembled manually, but that is somewhat complicated, so a DSL is provided to simplify construction. Some of the more sophisticated options are not available through the DSL, but it is suitable for most situations.

import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.RingBuffer;
import java.nio.ByteBuffer;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class LongEventMain
{
    public static void main(String[] args) throws Exception
    {
        // Executor that will be used to construct new threads for consumers
        Executor executor = Executors.newCachedThreadPool();

        // The factory for the event
        LongEventFactory factory = new LongEventFactory();

        // Specify the size of the ring buffer, must be power of 2.
        int bufferSize = 1024;

        // Construct the Disruptor
        Disruptor<LongEvent> disruptor = new Disruptor<>(factory, bufferSize, executor);

        // Connect the handler
        disruptor.handleEventsWith(new LongEventHandler());

        // Start the Disruptor, starts all threads running
        disruptor.start();

        // Get the ring buffer from the Disruptor to be used for publishing.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();

        LongEventProducer producer = new LongEventProducer(ringBuffer);

        ByteBuffer bb = ByteBuffer.allocate(8);
        for (long l = 0; true; l++)
        {
            bb.putLong(0, l);
            producer.onData(bb);
            Thread.sleep(1000);
        }
    }
}

Using Java 8

One of the design influences on the Disruptor API was that Java 8 relies on functional interfaces as the type declarations for Java lambdas. Most of the interfaces in the Disruptor API qualify as functional interfaces, so lambdas can be used instead of custom classes, which reduces boilerplate.

import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.RingBuffer;
import java.nio.ByteBuffer;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class LongEventMain
{
    public static void main(String[] args) throws Exception
    {
        // Executor that will be used to construct new threads for consumers
        Executor executor = Executors.newCachedThreadPool();

        // Specify the size of the ring buffer, must be power of 2.
        int bufferSize = 1024;

        // Construct the Disruptor
        Disruptor<LongEvent> disruptor = new Disruptor<>(LongEvent::new, bufferSize, executor);

        // Connect the handler
        disruptor.handleEventsWith((event, sequence, endOfBatch) -> System.out.println("Event: " + event));

        // Start the Disruptor, starts all threads running
        disruptor.start();

        // Get the ring buffer from the Disruptor to be used for publishing.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();

        ByteBuffer bb = ByteBuffer.allocate(8);
        for (long l = 0; true; l++)
        {
            bb.putLong(0, l);
            ringBuffer.publishEvent((event, sequence, buffer) -> event.set(buffer.getLong(0)), bb);
            Thread.sleep(1000);
        }
    }
}

Note that several classes (such as the handler and the translator) are no longer needed. Also note how the lambda passed to publishEvent() refers only to the parameters that are passed in. If you instead wrote the following code:

ByteBuffer bb = ByteBuffer.allocate(8);
for (long l = 0; true; l++)
{
    bb.putLong(0, l);
    ringBuffer.publishEvent((event, sequence) -> event.set(bb.getLong(0)));
    Thread.sleep(1000);
}

This creates a capturing lambda: an object has to be instantiated to hold the captured ByteBuffer bb variable whenever the lambda is passed to publishEvent(). That creates extra, unnecessary garbage, so if low GC pressure is a requirement, pass the arguments through to the lambda instead.
Method references can be used in place of anonymous lambdas, so the example can also be rewritten like this:

import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.RingBuffer;
import java.nio.ByteBuffer;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class LongEventMain
{
    public static void handleEvent(LongEvent event, long sequence, boolean endOfBatch)
    {
        System.out.println(event);
    }

    public static void translate(LongEvent event, long sequence, ByteBuffer buffer)
    {
        event.set(buffer.getLong(0));
    }

    public static void main(String[] args) throws Exception
    {
        // Executor that will be used to construct new threads for consumers
        Executor executor = Executors.newCachedThreadPool();

        // Specify the size of the ring buffer, must be power of 2.
        int bufferSize = 1024;

        // Construct the Disruptor
        Disruptor<LongEvent> disruptor = new Disruptor<>(LongEvent::new, bufferSize, executor);

        // Connect the handler
        disruptor.handleEventsWith(LongEventMain::handleEvent);

        // Start the Disruptor, starts all threads running
        disruptor.start();

        // Get the ring buffer from the Disruptor to be used for publishing.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();

        ByteBuffer bb = ByteBuffer.allocate(8);
        for (long l = 0; true; l++)
        {
            bb.putLong(0, l);
            ringBuffer.publishEvent(LongEventMain::translate, bb);
            Thread.sleep(1000);
        }
    }
}

Basic tuning options

The approach above works well across the widest range of deployment scenarios. However, if you know the hardware and software environment in which the Disruptor will run, you can tune it for better performance. There are two main tuning options: single vs. multiple producers, and alternative wait strategies.

Single vs. multiple producers

One of the best ways to improve performance in concurrent systems is to follow the Single Writer Principle (https://mechanical-sympathy.blogspot.tw/2011/09/single-writer-principle.html), and this applies to the Disruptor. If only one thread will ever publish events into the Disruptor, you can take advantage of this to gain additional performance.
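
The gain comes from the fact that, with a single writer, claiming a sequence needs no atomic read-modify-write at all. The sketch below assumes that model: SingleWriterSequencer is a hypothetical class, not the library's SingleProducerSequencer, and it omits the wrap-around check against consumers. Claiming is a plain field increment, and only publishing needs an ordered store so that consumers see the event data before they see the new cursor.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-writer sequencer: must only ever be called from one producer thread.
final class SingleWriterSequencer
{
    private long nextValue = -1;                           // producer-private, plain field
    private final AtomicLong cursor = new AtomicLong(-1);  // read by consumer threads

    // Claim the next sequence: no CAS and no lock, because no other thread writes nextValue.
    long next()
    {
        return ++nextValue;
    }

    // Publish: the ordered store keeps earlier writes to the event from being
    // reordered after the cursor update.
    void publish(long sequence)
    {
        cursor.lazySet(sequence);
    }

    long cursor()
    {
        return cursor.get();
    }
}
```

Compared with a multi-producer claim, which must use an atomic operation on a shared counter on every call, this removes contention from the hot path entirely; the trade-off is that calling next() from two threads would silently hand out duplicate sequences.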

import com.lmax.disruptor.BlockingWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

public class LongEventMain
{
    public static void main(String[] args) throws Exception
    {
        //.....
        // Construct the Disruptor with a SingleProducerSequencer
        Disruptor<LongEvent> disruptor = new Disruptor<>(
            factory, bufferSize, executor, ProducerType.SINGLE, new BlockingWaitStrategy());
        //.....
    }
}

The OneToOne performance test (https://github.com/LMAX-Exchange/disruptor/blob/master/src/perftest/java/com/lmax/disruptor/sequenced/OneToOneSequencedThroughputTest.java) shows how much this technique can improve performance. The following results were measured on an i7 Sandy Bridge MacBook Air.

Multi-producer

Run 0, Disruptor=26,553,372 ops/sec
Run 1, Disruptor=28,727,377 ops/sec
Run 2, Disruptor=29,806,259 ops/sec
Run 3, Disruptor=29,717,682 ops/sec
Run 4, Disruptor=28,818,443 ops/sec
Run 5, Disruptor=29,103,608 ops/sec
Run 6, Disruptor=29,239,766 ops/sec

Single producer

Run 0, Disruptor=89,365,504 ops/sec
Run 1, Disruptor=77,579,519 ops/sec
Run 2, Disruptor=78,678,206 ops/sec
Run 3, Disruptor=80,840,743 ops/sec
Run 4, Disruptor=81,037,277 ops/sec
Run 5, Disruptor=81,168,831 ops/sec
Run 6, Disruptor=81,699,346 ops/sec

Alternative wait strategies

BlockingWaitStrategy

The default wait strategy used by the Disruptor is BlockingWaitStrategy. Internally, BlockingWaitStrategy uses a typical lock and condition variable to handle thread wake-up. It is the slowest of the available wait strategies, but it is the most conservative with respect to CPU usage and gives the most consistent behaviour across the widest range of deployment options. Again, knowledge of the deployed system can allow for additional performance gains.
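
The lock-and-condition pattern behind a blocking wait strategy can be sketched with java.util.concurrent.locks. BlockingWaitSketch is a hypothetical class, not the library's BlockingWaitStrategy: consumers park on a Condition until the cursor reaches the sequence they need, and the producer pays the cost of acquiring the lock and signalling after every publish.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of lock + condition waiting; not the library's implementation.
final class BlockingWaitSketch
{
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition published = lock.newCondition();
    private final AtomicLong cursor = new AtomicLong(-1);

    // Consumer side: block (without burning CPU) until the cursor reaches 'sequence'.
    long waitFor(long sequence) throws InterruptedException
    {
        if (cursor.get() < sequence)
        {
            lock.lock();
            try
            {
                while (cursor.get() < sequence)
                    published.await();             // releases the lock while parked
            }
            finally
            {
                lock.unlock();
            }
        }
        return cursor.get();
    }

    // Producer side: advance the cursor, then signal any parked consumers.
    void publish(long sequence)
    {
        cursor.set(sequence);
        lock.lock();
        try
        {
            published.signalAll();
        }
        finally
        {
            lock.unlock();
        }
    }
}
```

The cursor is checked again under the lock, and the producer signals only while holding the same lock, so a wake-up cannot be lost between a consumer's check and its call to await().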

SleepingWaitStrategy

Like BlockingWaitStrategy, SleepingWaitStrategy attempts to be conservative with CPU usage. It does so with a busy-wait loop that calls LockSupport.parkNanos(1) in the middle of the loop. On a typical Linux system this pauses the thread for around 60µs (note: 1µs = 1000ns). Its advantage is that the producing thread does not need to do anything other than increment the appropriate counter, and does not pay the cost of signalling a condition variable. However, the mean latency of moving an event between producer and consumer is higher. It works best in situations where low latency is not required but a low impact on the producing thread is desired. A common use case is asynchronous logging.
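
A sleeping wait loop can be sketched as a staged back-off: spin for a while, then yield, then park for a nanosecond per iteration. SleepingWaitSketch is a hypothetical class and the retry counts are illustrative assumptions rather than the library's values. Note that the producer only has to set the cursor; there is no lock or signal.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical staged back-off wait loop; retry counts are illustrative assumptions.
final class SleepingWaitSketch
{
    private static final int SPIN_TRIES = 100;
    private static final int YIELD_TRIES = 100;

    static long waitFor(long sequence, AtomicLong cursor)
    {
        int counter = SPIN_TRIES + YIELD_TRIES;
        long available;
        while ((available = cursor.get()) < sequence)
        {
            if (counter > YIELD_TRIES)
            {
                counter--;                         // phase 1: plain busy spin
            }
            else if (counter > 0)
            {
                counter--;
                Thread.yield();                    // phase 2: give up the time slice
            }
            else
            {
                LockSupport.parkNanos(1L);         // phase 3: sleep (the text cites ~60µs on Linux)
            }
        }
        return available;
    }
}
```

Once the loop reaches the parking phase, each check costs a short sleep, which is where the extra producer-to-consumer latency comes from.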

YieldingWaitStrategy

YieldingWaitStrategy is one of two wait strategies that can be used in low-latency systems. It trades CPU cycles for lower latency: it busy-spins waiting for the sequence to advance to the required value, and inside the loop it calls Thread.yield() to allow other queued threads to run. This is the recommended strategy when very high performance is needed and the number of EventHandler threads is lower than the number of logical CPU cores, for example when hyper-threading is enabled.

BusySpinWaitStrategy

BusySpinWaitStrategy offers the highest performance, but it also places the strictest constraints on the deployment environment. It should only be used when the number of event handler threads is lower than the number of physical cores on the machine.
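
The two low-latency strategies above differ only in what the consumer does on each unsuccessful check. SpinningWaitSketches is a hypothetical class illustrating both loops; the spin count is an assumption, and Thread.onSpinWait() requires Java 9 or later (it is only a hint to the CPU and does not give up the core).

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketches of the yielding and busy-spin wait loops.
final class SpinningWaitSketches
{
    // Yielding: spin briefly, then call Thread.yield() on every further check.
    static long yieldingWaitFor(long sequence, AtomicLong cursor)
    {
        int spinTries = 100;                       // illustrative assumption
        long available;
        while ((available = cursor.get()) < sequence)
        {
            if (spinTries > 0)
                spinTries--;
            else
                Thread.yield();
        }
        return available;
    }

    // Busy spin: never yields, so the consumer thread keeps its core.
    static long busySpinWaitFor(long sequence, AtomicLong cursor)
    {
        long available;
        while ((available = cursor.get()) < sequence)
            Thread.onSpinWait();                   // CPU hint only (Java 9+)
        return available;
    }
}
```

Because the busy-spin loop never releases its core, each such consumer effectively monopolises a CPU, which is why the strategy is only appropriate when there are fewer handler threads than physical cores.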

Clearing objects from the ring buffer

When passing data through the Disruptor, objects can live longer than intended. To avoid this, it may be necessary to clear out the event after processing it. With a single event handler, clearing the value in that handler is sufficient. With a chain of event handlers, you may need a dedicated handler at the end of the chain to clear the object.

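import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.dsl.Disruptor;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

class ObjectEvent<T>
{
    T val;

    void clear()
    {
        val = null;
    }
}

class ClearingEventHandler<T> implements EventHandler<ObjectEvent<T>>
{
    public void onEvent(ObjectEvent<T> event, long sequence, boolean endOfBatch)
    {
        // Failing to call clear here will result in the
        // object associated with the event to live until
        // it is overwritten once the ring buffer has wrapped
        // around to the beginning.
        event.clear();
    }
}

public class ClearingExample
{
    public static void main(String[] args)
    {
        Executor executor = Executors.newCachedThreadPool();
        int bufferSize = 1024;

        Disruptor<ObjectEvent<String>> disruptor = new Disruptor<>(
            () -> new ObjectEvent<String>(), bufferSize, executor);

        // ProcessingEventHandler stands for your own business-logic handler (not shown);
        // the clearing handler runs after it, at the end of the chain.
        disruptor
            .handleEventsWith(new ProcessingEventHandler())
            .then(new ClearingEventHandler<String>());

        disruptor.start();
    }
}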

 

This concludes the article!


Origin blog.csdn.net/QLCZ0809/article/details/111488855