Algorithms in Practice (III): The Data Structures and Algorithms Behind Disruptor, a High-Performance Queue

Disruptor is an in-memory message queue, used to pass messages between threads.

How does Disruptor achieve its high performance? What data structure does it rely on underneath?

"Producer-consumer model" based on a circular queue

In the producer-consumer model, a producer generates data and puts it into a central storage container, and a consumer then takes the data out of that container and consumes it.

Which data structure is used to implement the central storage container?

The most common choice is a queue. A queue is first-in, first-out (FIFO), so data produced earlier is consumed earlier. There are two ways to implement a queue: a linked queue built on a linked list, and a sequential queue built on an array. Which one to choose depends on the requirements.

To implement an unbounded queue, one whose size is not fixed in advance, a linked list is the suitable choice, since a linked list can grow dynamically at low cost. In a bounded queue the size is fixed in advance: when the queue is full, producers must wait until consumers have consumed some data and freed up space before they can put more data in. The memory used by an unbounded queue cannot be controlled, and if it keeps growing it may lead to an OOM (Out Of Memory) error.
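As a small illustration of the difference, the sketch below uses two standard JDK queues rather than the article's own code: ArrayBlockingQueue as an array-backed bounded queue and ConcurrentLinkedQueue as a linked unbounded one. The class BoundedVsUnbounded and its helper methods are made up for this example.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper class, for illustration only.
class BoundedVsUnbounded {
  // Array-backed queue whose capacity is fixed at construction time.
  static Queue<Long> bounded(int capacity) {
    return new ArrayBlockingQueue<>(capacity);
  }

  // Linked queue with no capacity limit; memory use grows with its content.
  static Queue<Long> unbounded() {
    return new ConcurrentLinkedQueue<>();
  }

  // Offers `attempts` elements and returns how many the queue accepted.
  // offer() returns false instead of blocking when a bounded queue is full.
  static int fill(Queue<Long> queue, int attempts) {
    int accepted = 0;
    for (long i = 0; i < attempts; i++) {
      if (queue.offer(i)) {
        accepted++;
      }
    }
    return accepted;
  }
}
```

With a capacity of 4, fill(bounded(4), 10) accepts 4 elements and rejects the rest, while fill(unbounded(), 10) accepts all 10.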

In a non-circular sequential queue, adding and removing elements involves moving data, which hurts performance. A circular queue solves this data-movement problem, so in most scenarios where a sequential queue is used, the circular variant is chosen. The circular queue is the prototype of the in-memory message queue. With it we can implement a simple producer-consumer model: when the queue is full, the producer polls and waits; when the queue is empty, the consumer polls and waits.

public class Queue {
  private Long[] data;
  private int size = 0, head = 0, tail = 0;
  public Queue(int size) {
    this.data = new Long[size];
    this.size = size;
  }

  // Returns false when the queue is full. One slot is always left empty
  // so that "full" can be told apart from "empty" (head == tail).
  public boolean add(Long element) {
    if ((tail + 1) % size == head) return false;
    data[tail] = element;
    tail = (tail + 1) % size;
    return true;
  }

  // Returns null when the queue is empty.
  public Long poll() {
    if (head == tail) return null;
    Long ret = data[head];
    head = (head + 1) % size;
    return ret;
  }
}

public class Producer {
  private Queue queue;
  public Producer(Queue queue) {
    this.queue = queue;
  }

  public void produce(Long data) throws InterruptedException {
    while (!queue.add(data)) {
      Thread.sleep(100);
    }
  }
}

public class Consumer {
  private Queue queue;
  public Consumer(Queue queue) {
    this.queue = queue;
  }

  public void consume() throws InterruptedException {
    while (true) {
      Long data = queue.poll();
      if (data == null) {
        Thread.sleep(100);
      } else {
        // TODO: ... business logic for consuming the data ...
      }
    }
  }
}

"Producer-consumer model" with lock-based concurrency

If only one producer writes data to the queue and only one consumer reads from it, the code above works fine. If multiple producers write to the queue concurrently, or multiple consumers consume from it concurrently, the code no longer works correctly:

  • Data written by multiple producers may overwrite each other
  • Multiple consumers may read the same data more than once

Suppose two threads add data to the queue at the same time, that is, both execute the add() function of the Queue class. Assume the queue size is 10, the tail index currently points to 7, and the head index points to 3, so the queue still has free space. Thread 1 calls add() to add the value 12, and thread 2 calls add() to add the value 15. In the end, only one value may be added successfully, while the other is overwritten.

The add() function of Queue:

public boolean add(Long element) {
  if ((tail + 1) % size == head) return false;
  data[tail] = element;
  tail = (tail + 1) % size;
  return true;
}

Line 3 assigns to data[tail], and only line 4 increments tail; the assignment and the increment together are not an atomic operation. An atomic operation is one that cannot be interrupted by the thread scheduler: once started, it runs to completion with no context switch (switch to another thread) in between. The consequence: when thread 1 and thread 2 execute add() at the same time, thread 1 may finish line 3, setting data[7] (tail = 7) to 12, and before thread 1 reaches line 4, that is, before tail is incremented, thread 2 executes line 3 and sets data[7] = 15. Thread 2's data overwrites thread 1's data.
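That interleaving can be replayed by hand in a single thread. The sketch below is only an illustration of one possible ordering: LostWriteReplay is a hypothetical class, the two threads are simulated by ordering the statements manually, and the full-queue check is left out because head = 3 leaves free space.

```java
// Hypothetical single-threaded replay of one bad interleaving of add().
class LostWriteReplay {
  final Long[] data = new Long[10];
  int tail = 7;

  void run() {
    data[tail] = 12L;        // thread 1, line 3: writes slot 7
    data[tail] = 15L;        // thread 2, line 3: tail is still 7, so 12 is overwritten
    tail = (tail + 1) % 10;  // thread 1, line 4: tail becomes 8
    tail = (tail + 1) % 10;  // thread 2, line 4: tail becomes 9
  }
}
```

After run(), data[7] holds 15, the value 12 is gone, and tail has moved to 9 even though data[8] was never written.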

The simplest fix is to lock the code so that only one thread at a time can execute add(). We can also reduce the granularity of the lock with a CAS (compare and swap) operation.
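A minimal sketch of the locking approach, assuming Java's built-in synchronized keyword serves as the lock. SynchronizedQueue is a hypothetical name, and the CAS variant is not shown here.

```java
// Hypothetical variant of the circular queue with add() and poll()
// guarded by the same lock (the object's monitor).
class SynchronizedQueue {
  private final Long[] data;
  private final int size;
  private int head = 0, tail = 0;

  SynchronizedQueue(int size) {
    this.data = new Long[size];
    this.size = size;
  }

  // Only one thread at a time can be inside add() or poll() on this object,
  // so writing data[tail] and advancing tail happen as one unit.
  synchronized boolean add(Long element) {
    if ((tail + 1) % size == head) return false; // full
    data[tail] = element;
    tail = (tail + 1) % size;
    return true;
  }

  synchronized Long poll() {
    if (head == tail) return null; // empty
    Long ret = data[head];
    head = (head + 1) % size;
    return ret;
  }
}
```

The price of this simplicity is that every add() and poll() call contends for the same lock, which is the cost the approach in the next section tries to reduce.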

"Producer-consumer model" with lock-free concurrency

Disruptor takes a different approach to implementing the queue and the producer-consumer model.

The queue above supports only two operations, adding data and reading-and-removing data, implemented by add() and poll() respectively. Disruptor uses a different idea. A producer, before adding elements to the queue, first claims free storage slots, and it claims a contiguous batch of n slots at once. Once it has claimed this group of contiguous slots, it can add elements to them without locking, because those slots belong exclusively to that thread. The claiming step itself, however, does require a lock.

A consumer works the same way: it first claims a batch of contiguous readable slots, and this claiming step requires a lock. Once the claim is done, the subsequent reads need no lock.

The drawback is this: if producer A claims a group of contiguous slots, say indexes 3 to 6, and producer B then claims slots 7 to 9, the data in slots 7 to 9 cannot be read until slots 3 to 6 have been completely written.
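The sketch below illustrates the claim-then-write idea and the drawback just described. It is a simplified model under several assumptions, not the Disruptor library's API: BatchRingBuffer and its methods are hypothetical, the claim step uses synchronized as its lock, and the consumer side is reduced to a locked single-element poll() rather than a batch claim.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical, simplified model of batch claiming on a ring buffer.
class BatchRingBuffer {
  private final long[] data;
  // For each slot, the sequence number last written there (-1 if none).
  private final AtomicLongArray published;
  private final int capacity;
  private long nextToClaim = 0; // guarded by the lock on this object
  private long nextToRead = 0;  // guarded by the lock on this object

  BatchRingBuffer(int capacity) {
    this.capacity = capacity;
    this.data = new long[capacity];
    this.published = new AtomicLongArray(capacity);
    for (int i = 0; i < capacity; i++) {
      published.set(i, -1L);
    }
  }

  // Locked step: reserve n contiguous sequences for the calling producer.
  // Returns the first sequence, or -1 if there is not enough free space.
  synchronized long claim(int n) {
    if (n <= 0 || nextToClaim + n - nextToRead > capacity) return -1L;
    long first = nextToClaim;
    nextToClaim += n;
    return first;
  }

  // Lock-free step: the slot belongs to the claiming producer alone.
  // The volatile write to `published` makes the data write visible to readers.
  void write(long sequence, long value) {
    int index = (int) (sequence % capacity);
    data[index] = value;
    published.set(index, sequence);
  }

  // Reads strictly in sequence order; returns null while the next
  // sequence has been claimed but not yet written.
  synchronized Long poll() {
    int index = (int) (nextToRead % capacity);
    if (published.get(index) != nextToRead) return null;
    long value = data[index];
    nextToRead++;
    return value;
  }
}
```

If producer A claims sequences 0 to 2 and producer B claims 3 to 4, B's data stays invisible to poll() until A has written all of its slots, which is the waiting effect described above.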

Extended summary

Suppose we have 8 tables for storing user information. The ID field of each table can no longer be generated by auto-increment, because that would produce duplicate user IDs across tables. We need an ID generator that produces unique IDs for all user tables. How can we design a high-performance, concurrency-safe ID generator that produces globally unique IDs?

Claim IDs in batches under a lock, then hand out the IDs within each batch without locking.
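One way to sketch this idea, under the assumption that a single in-process allocator hands out the ID ranges (in a real system this role might be played by a database or a shared service). IdAllocator and BatchIdGenerator are hypothetical names.

```java
// Hypothetical shared allocator: the only place where a lock is taken.
class IdAllocator {
  private long next = 1;

  // Reserves `count` consecutive IDs and returns the first one.
  synchronized long reserve(int count) {
    long first = next;
    next += count;
    return first;
  }
}

// Hypothetical per-thread generator: each instance is meant to be owned by
// one thread, so handing out IDs inside a reserved batch needs no lock.
class BatchIdGenerator {
  private final IdAllocator allocator;
  private final int batchSize;
  private long current = 0; // next ID to hand out
  private long limit = 0;   // end of the reserved batch (exclusive)

  BatchIdGenerator(IdAllocator allocator, int batchSize) {
    this.allocator = allocator;
    this.batchSize = batchSize;
  }

  long nextId() {
    if (current == limit) {
      // Batch used up: go back to the shared allocator (locked step).
      current = allocator.reserve(batchSize);
      limit = current + batchSize;
    }
    return current++;
  }
}
```

With a batch size of 1000, a thread takes the lock once per 1000 IDs instead of once per ID. The trade-off is that IDs are unique but not globally ordered by creation time, and unused IDs in a batch are lost if the owner stops early.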

Origin blog.csdn.net/ywangjiyl/article/details/104893392