Hard-core technical knowledge: the CopyOnWrite idea

Today we cover a piece of hard-core technical knowledge: what the CopyOnWrite idea is, how it is embodied in the Java concurrency package, and how the Kafka kernel source code uses this idea to optimize concurrent performance.

In an interview, CopyOnWrite can be the killer question an interviewer uses to sink a candidate, or the secret weapon that wins the candidate the offer. It is a relatively advanced piece of knowledge.

1. What problem arises in read-heavy, write-light scenarios?

Imagine that we have an ArrayList in memory. By default an ArrayList is definitely not thread-safe, so if multiple threads read and write it concurrently, problems can occur.

So the question is: how do we make this ArrayList thread-safe?

There is a very simple way: guard every access to the ArrayList with thread synchronization.

For example, you can require that the ArrayList is only accessed inside a synchronized block, so that only one thread can operate on it at a time. Alternatively, you can control access with a read-write lock, ReadWriteLock.

Let's assume we use a ReadWriteLock to control access to the ArrayList.

Multiple read requests can then read from the ArrayList at the same time, but read requests and write requests are mutually exclusive, and write requests are mutually exclusive with each other.

The code looks roughly like this:

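private final List<Object> list = new ArrayList<>();
private final ReadWriteLock lock = new ReentrantReadWriteLock();

public Object read(int index) {
   lock.readLock().lock();
   try {
      // read from the ArrayList
      return list.get(index);
   } finally {
      lock.readLock().unlock();
   }
}

public void write(Object element) {
   lock.writeLock().lock();
   try {
      // write to the ArrayList
      list.add(element);
   } finally {
      lock.writeLock().unlock();
   }
}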

Think about it: what is wrong with code like the above?

The biggest problem is that the write lock and the read lock are mutually exclusive. Assume that writes are very infrequent and reads are very frequent, that is, a read-heavy, write-light scenario.

When an occasional write takes the write lock, won't the large number of reads arriving at that moment be blocked and unable to proceed?

This is the biggest problem a read-write lock can run into.
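
The blocking described above can be observed directly. The following is a minimal sketch, not part of the original article: the class and method names are made up for illustration. One thread holds the write lock while a second thread tries to take the read lock with tryLock(), which returns immediately instead of waiting.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, for illustration only.
class ReadBlockedByWriteDemo {

    // Returns true if a reader thread could take the read lock while
    // another thread (the caller) is holding the write lock.
    static boolean readerGotLockDuringWrite() {
        ReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] acquired = new boolean[1];

        lock.writeLock().lock(); // the occasional write begins
        try {
            Thread reader = new Thread(() -> {
                // tryLock() returns at once instead of blocking
                if (lock.readLock().tryLock()) {
                    acquired[0] = true;
                    lock.readLock().unlock();
                }
            });
            reader.start();
            reader.join(); // join() also makes the reader's result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.writeLock().unlock();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("reader got lock during write: "
                + readerGotLockDuringWrite());
    }
}
```

The reader fails to get the lock for as long as the write lock is held. With a plain lock() call instead of tryLock(), every reader would simply wait until the writer finishes.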

2. Introducing the CopyOnWrite idea to solve the problem

This is where the CopyOnWrite idea comes in.

The idea is to drop the read-write lock entirely, so that reads never wait on a lock. Locks are the problem: mutual exclusion can hurt performance, because a held lock blocks my request and leaves it stuck.

So how does it keep concurrent access thread-safe?

Very simple. As the name suggests, it uses "copy on write": when writing data, make a copy and perform the write on that copy.

Reads do not need a lock at all. Readers only read, and they do not affect one another.

The main problem is writes. Since a write can no longer take a lock that shuts readers out, it needs a different strategy.

Suppose your ArrayList stores its data in an underlying array. When you want to modify the data in that array, you must first make a copy of the array.

You then apply the modification to the copy. Throughout this process you are only operating on the copy.

In that case, can't reads carry on normally at the same time? The write does not affect the reads at all!
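
The copy-then-modify step can be sketched with a plain array. This is an illustrative fragment with made-up names, not code from the JDK: a reader holds a reference to the current array while a writer appends to a separate copy.

```java
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class CopyThenModifyDemo {

    // Returns the array a reader already holds, after a writer has
    // appended an element to a separate copy.
    static int[] readerViewAfterWrite() {
        int[] current = {1, 2, 3};
        int[] readerView = current; // a reader grabs the current array

        // The writer copies first, then modifies only the copy.
        int[] copy = Arrays.copyOf(current, current.length + 1);
        copy[current.length] = 4;

        return readerView; // untouched by the write
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(readerViewAfterWrite()));
    }
}
```

The reader's array still contains only the three original elements, because the writer never touched it.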

The figure below illustrates this process:

The key question is: once the writer thread has finished modifying the copy of the array, how do the reader threads perceive the change?

This is the key point, so pay attention! Here we must bring in the volatile keyword.

An earlier article explained the volatile keyword. Its core purpose is that when one thread writes to a volatile variable, other threads can immediately read the latest value of that variable.

So once the writer thread has finished modifying the copy, it performs a volatile write: it assigns the modified copy to the volatile variable that references the array.

As soon as the volatile variable is assigned, the change is immediately visible to reader threads, and they see the latest array.

Here is the JDK source of CopyOnWriteArrayList.

Look at how it writes data: it copies the array, modifies the copy, and then writes the modified copy back by assigning it to the volatile variable, so that other threads see it immediately.

  // This array is the core, because it is declared volatile:
  // once the latest array is assigned to it, other threads immediately see that array
  private transient volatile Object[] array;

  public boolean add(E e) {
      final ReentrantLock lock = this.lock;
      lock.lock();
      try {
          Object[] elements = getArray();
          int len = elements.length;

          // Make a copy of the array
          Object[] newElements = Arrays.copyOf(elements, len + 1);

          // Modify the copy, here by appending an element
          newElements[len] = e;

          // Then assign the copy to the volatile variable
          setArray(newElements);
          return true;
      } finally {
          lock.unlock();
      }
  }

Now consider this: since every update goes through a copy, what happens if multiple threads update at the same time? Wouldn't producing multiple copies cause problems?

Of course multiple threads cannot update at the same time. As the source above shows, the update takes a lock, so only one thread can update at any moment.
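
A small check of this point, sketched with a hypothetical demo class: several writer threads add to the same CopyOnWriteArrayList at once. If two writers could copy the same old array and overwrite each other's result, elements would be lost and the final size would come up short.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, for illustration only.
class ConcurrentAddDemo {

    // Starts `threads` writers that each add `perThread` elements,
    // waits for all of them, and returns the final list size.
    static int sizeAfterConcurrentAdds(int threads, int perThread) {
        List<Integer> list = new CopyOnWriteArrayList<>();
        Thread[] writers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            writers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i); // each add copies the array under the list's lock
                }
            });
            writers[t].start();
        }
        try {
            for (Thread writer : writers) {
                writer.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterConcurrentAdds(4, 500));
    }
}
```

Because writers are serialized, no add is lost: four threads adding 500 elements each leave exactly 2000 elements in the list.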

Does the update then have any effect on reads?

Absolutely not, because a read simply reads the array and involves no lock. And as soon as the writer assigns to the volatile variable, reader threads can immediately see the modified array. This is what volatile guarantees.
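
One visible consequence is that an iterator over a CopyOnWriteArrayList works on the array that was current when the iterator was created. The sketch below uses a made-up class name and adds an element while an iteration is pending.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, for illustration only.
class SnapshotReadDemo {

    // Creates an iterator, adds an element, then drains the iterator
    // and returns the elements it saw.
    static List<String> iterateWhileWriting() {
        List<String> list = new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        List<String> seen = new ArrayList<>();

        Iterator<String> it = list.iterator(); // refers to the current array
        list.add("c");                         // installs a new, longer array

        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen; // the list itself now holds three elements
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileWriting());
    }
}
```

The iteration completes without a ConcurrentModificationException and sees only "a" and "b". A read that starts after the add sees all three elements.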

This neatly solves the read-heavy, write-light problem described earlier.

If reads and writes use mutually exclusive locks, a write lock can block a large number of reads and hurt concurrent performance.

CopyOnWriteArrayList instead trades space for time: each update is made on a copy so that reads need no lock, and the final assignment to a volatile variable guarantees visibility. The update has no effect on reader threads!

3. How the Kafka source code uses the CopyOnWrite idea

The Kafka kernel source contains the following scenario. When a client writes data to Kafka, it first writes each message into a local memory buffer. The buffered messages are grouped into batches, and each batch is then sent to the Kafka server in one go, which helps improve throughput.

Without further ado, let's take a look:

What data structure does Kafka use for this memory buffer? Let's look at the source code:

private final ConcurrentMap<TopicPartition, Deque> batches =
        new CopyOnWriteMap<TopicPartition, Deque>();

This is the core data structure that stores the messages written to the memory buffer. Understanding it fully requires explaining many concepts from the Kafka kernel source, so we will not go into it here.

What matters here is that Kafka implements its own CopyOnWriteMap, and this CopyOnWriteMap uses the CopyOnWrite idea.

Let's look at the source code of this CopyOnWriteMap:

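  // A typical volatile field holding an ordinary Map
  private volatile Map<K, V> map;

  @Override
  public synchronized V put(K k, V v) {
      // On update: create a copy, update the copy, then write it back
      // by assigning it to the volatile variable
      Map<K, V> copy = new HashMap<K, V>(this.map);
      V prev = copy.put(k, v);
      this.map = Collections.unmodifiableMap(copy);
      return prev;
  }

  @Override
  public V get(Object k) {
      // On read: read directly from the map referenced by the volatile
      // variable, with no lock
      return map.get(k);
  }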

Kafka implements this core data structure as a CopyOnWriteMap because the key-value pairs in the map are not updated very often.

That is, the TopicPartition-to-Deque pairs are updated at a very low frequency.

The get operation, however, is a high-frequency read: the Deque for a TopicPartition is looked up very often in order to enqueue to it, dequeue from it, and so on. So for this map, the high-frequency operation is get.

Kafka therefore implements this map with the CopyOnWrite idea. Updating a key-value pair does not block the high-frequency reads, reads are lock-free, and concurrent performance improves.
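
The access pattern described above can be sketched as follows. This is a simplified illustration, not Kafka's actual code: SimpleCopyOnWriteMap, BatchBufferSketch and their methods are hypothetical names, and String stands in for both TopicPartition and the queued batches.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, for illustration only.
class BatchBufferSketch {

    // Simplified copy-on-write map: lock-free get, synchronized writes.
    static final class SimpleCopyOnWriteMap<K, V> {
        private volatile Map<K, V> map = Collections.emptyMap();

        V get(K key) {
            return map.get(key); // volatile read, no lock
        }

        synchronized V putIfAbsent(K key, V value) {
            V existing = map.get(key);
            if (existing != null) {
                return existing;
            }
            Map<K, V> copy = new HashMap<>(map);     // copy
            copy.put(key, value);                    // modify the copy
            map = Collections.unmodifiableMap(copy); // volatile write publishes it
            return null;
        }
    }

    private final SimpleCopyOnWriteMap<String, Deque<String>> batches =
            new SimpleCopyOnWriteMap<>();

    // High-frequency path: look up the queue without locking the map.
    Deque<String> getOrCreateDeque(String partition) {
        Deque<String> dq = batches.get(partition);
        if (dq != null) {
            return dq;
        }
        // Low-frequency path: first message for this partition.
        Deque<String> created = new ArrayDeque<>();
        Deque<String> previous = batches.putIfAbsent(partition, created);
        return previous != null ? previous : created;
    }

    void append(String partition, String message) {
        Deque<String> dq = getOrCreateDeque(partition);
        synchronized (dq) { // ArrayDeque itself is not thread-safe
            dq.addLast(message);
        }
    }

    int queued(String partition) {
        Deque<String> dq = batches.get(partition);
        if (dq == null) {
            return 0;
        }
        synchronized (dq) {
            return dq.size();
        }
    }

    public static void main(String[] args) {
        BatchBufferSketch buffer = new BatchBufferSketch();
        buffer.append("topic-0", "m1");
        buffer.append("topic-0", "m2");
        System.out.println(buffer.queued("topic-0"));
    }
}
```

In this sketch the map is copied only when a partition is seen for the first time. Every later append goes through the lock-free get and contends only on the individual queue.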

I believe that after reading this article you have a solid grasp of the CopyOnWrite idea, its application scenarios, its implementation in the JDK, and its use in the Kafka source code.

If you can explain the idea clearly, show how it is embodied in the JDK, and go on to discuss the underlying source code of a well-known open source project such as Kafka, you will certainly leave a strong impression on the interviewer and score well in the interview.

Original source: the WeChat official account "Huperzine architecture notes"


Origin juejin.im/post/5cf62428f265da1b6d401199