Principle analysis of HikariCP database connection pool in SpringBoot 2.0

As back-end service developers, we work with databases every day, performing all kinds of CRUD operations, and that means using a database connection pool. Over the years the industry has produced several well-known connection pools, including c3p0, DBCP, Tomcat JDBC Connection Pool, and Druid, but the most popular one recently is HikariCP.

HikariCP is known as the fastest database connection pool in the industry. Since Spring Boot 2.0 adopted it as the default connection pool, its momentum has been unstoppable. Why is it so fast? This article focuses on the reasons.

1. What is a database connection pool

Before explaining HikariCP, let's briefly introduce what a database connection pool is and why it exists.

Fundamentally, a database connection pool, like the thread pools we use every day, is a pooled resource. When the program initializes, it creates a certain number of database connection objects and keeps them in memory so that the application can reuse existing connections. When SQL needs to be executed, we take a connection directly from the pool instead of establishing a new one. When the SQL has finished, the connection is not actually closed but returned to the pool. By configuring the pool's parameters, such as the initial number of connections, the minimum and maximum number of connections, and the maximum idle time, we can keep the number of connections to the database within a controllable range, which prevents system crashes and ensures a good user experience. The database connection pool diagram is as follows:

[Figure: database connection pool diagram]

Therefore, the core purpose of a database connection pool is to avoid frequently creating and destroying database connections and thus save system overhead. Database connections are limited and expensive, and creating and releasing them is time-consuming. Doing so frequently consumes a lot of resources, which slows down website responses and can even crash the server.
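The borrow-and-return cycle described above can be sketched in a few lines of Java. This is only a minimal illustration under simplifying assumptions: SimplePool and its methods are hypothetical names, the pooled objects are plain Java objects rather than real database connections, and there is no timeout, validation, or resizing logic.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical illustration class; not part of HikariCP or any other library.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private int created;

    // Create every pooled object up front, as a pool does at initialization.
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
            created++;
        }
    }

    // Hand out an existing object, or null if all of them are currently borrowed.
    T borrow() {
        return idle.poll();
    }

    // "Closing" a pooled object only puts it back for the next caller.
    void giveBack(T resource) {
        idle.offer(resource);
    }

    int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(1, StringBuilder::new);
        StringBuilder first = pool.borrow();
        pool.giveBack(first);
        StringBuilder again = pool.borrow();
        System.out.println("same object reused: " + (first == again));
        System.out.println("objects created: " + pool.createdCount());
    }
}
```

However many times the object is borrowed and given back, the factory runs only once per pooled object, which is the saving the pool is meant to provide.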

2. Comparative analysis of common database connection pools

Here is a detailed summary comparing the features of common database connection pools. We focus on the two current mainstream choices, Alibaba Druid and HikariCP. HikariCP outperforms Druid; Druid is slightly slower because of its different lock mechanism, but it offers richer features, including monitoring and SQL interception and parsing. The two have different priorities: HikariCP pursues maximum performance.

[Figure: feature comparison of common database connection pools]

The following is the performance comparison chart published on the HikariCP website. Ranked by performance, the five database connection pools are ordered as follows: HikariCP > Druid > tomcat-jdbc > DBCP > c3p0:

[Figure: performance comparison chart from the HikariCP website]

3. Introduction to HikariCP database connection pool

HikariCP claims to be the best database connection pool in history, and Spring Boot 2.0 sets it as the default data source connection pool. Compared with other connection pools, HikariCP performs much better. How is this achieved? According to the introduction on the HikariCP website, its optimizations can be summarized as follows:

1. Bytecode streamlining: the code is optimized so that the compiled bytecode is very small, which lets the CPU cache hold more of the program's code;

HikariCP has put great effort into optimizing and streamlining its bytecode. It uses Javassist, a third-party Java bytecode manipulation library, to generate delegating proxies. The proxy generation is implemented in the ProxyFactory class. Compared with JDK Proxy, this approach is faster and generates less bytecode, trimming a lot of unnecessary bytecode.

2. Optimized proxies and interceptors: less code; for example, HikariCP's Statement proxy has only 100 lines of code, one-tenth of BoneCP's;

3. A custom array type (FastStatementList, discussed below as FastList) instead of ArrayList: it avoids the range check that ArrayList performs on every get() and avoids scanning from head to tail on remove() (because the resource acquired last is usually released first);

4. A custom collection type (ConcurrentBag): improves the efficiency of concurrent reads and writes;

5. Other optimizations targeting BoneCP's shortcomings, such as studying method calls that take more than one CPU time slice.
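To make the term "delegating proxy" from item 1 concrete, here is a minimal sketch that uses java.lang.reflect.Proxy from the JDK. Note that this is the JDK Proxy mechanism the text compares against, not HikariCP's Javassist-based generation; it only shows what a proxy that forwards every call to a delegate does. The Resource interface and the wrap() helper are hypothetical stand-ins for a JDBC interface such as Statement.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class DelegatingProxyDemo {
    // Hypothetical stand-in for a JDBC interface such as Statement.
    interface Resource {
        String execute(String sql);
    }

    // Wraps a delegate so that every call is recorded and then forwarded.
    static Resource wrap(Resource delegate, List<String> calls) {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.add(method.getName());
            return method.invoke(delegate, args); // reflective forwarding to the delegate
        };
        return (Resource) Proxy.newProxyInstance(
                Resource.class.getClassLoader(),
                new Class<?>[] {Resource.class},
                handler);
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        Resource real = sql -> "ran: " + sql;
        Resource proxied = wrap(real, calls);
        System.out.println(proxied.execute("SELECT 1")); // prints "ran: SELECT 1"
        System.out.println(calls);                       // prints "[execute]"
    }
}
```

Every call here goes through a reflective method.invoke(). Generating the delegate classes ahead of time, as the text says HikariCP does with Javassist, replaces that with ordinary method calls.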

Of course, speed alone does not win users over for a database connection pool; HikariCP is also robust and stable. In the years since its launch, it has withstood the test of a broad range of applications and was adopted by Spring Boot 2.0 as the default database connection pool, so it can be trusted in terms of reliability. Second, with its small code base and low CPU and memory footprint, it executes very efficiently. Finally, configuring HikariCP in Spring is basically the same as configuring Druid, so migration is very convenient. These are the reasons why HikariCP is so popular at present.

In short: streamlined bytecode, optimized proxies and interceptors, and a custom array type.

4. HikariCP core source code analysis

4.1 How FastList improves performance

 First, let's look at the standard steps for performing a database operation:

  1. Obtain a database connection through the data source;

  2. Create Statement;

  3. Execute SQL;

  4. Get SQL execution results through ResultSet;

  5. Release the ResultSet;

  6. Release Statement;

  7. Release the database connection.
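In Java, the release steps 5 to 7 are usually left to try-with-resources, which closes resources in the reverse order of their declaration. The sketch below demonstrates that ordering. To stay runnable without a database, it uses a hypothetical Tracked class as a stand-in for Connection, Statement, and ResultSet instead of real JDBC objects.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOrderDemo {
    // Hypothetical stand-in for Connection/Statement/ResultSet; records when it is closed.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Acquires three resources in order and returns the sequence of recorded events.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Tracked conn = new Tracked("Connection", log);
             Tracked stmt = new Tracked("Statement", log);
             Tracked rs = new Tracked("ResultSet", log)) {
            log.add("read rows");
        }
        return log;
    }

    public static void main(String[] args) {
        // prints "[read rows, close ResultSet, close Statement, close Connection]"
        System.out.println(run());
    }
}
```

The resource opened last is closed first, which is the access pattern the FastList optimization discussed next relies on.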

All database connection pools currently perform database operations strictly in this order. To make sure the final release steps are not skipped, connection pools keep the Statements they create in an ArrayList so that, when the connection is closed, every Statement in the list can be closed in turn. HikariCP found room for optimization in some ArrayList operations here, so it provides FastList, a simplified implementation of the List interface that optimizes a few core methods; the rest is basically the same as ArrayList.

The first is the get() method. Every call to ArrayList.get() performs a rangeCheck to verify that the index is within bounds. FastList removes this check because the connection pool itself guarantees that the indexes it uses are valid, so rangeCheck is wasted computation. Eliminating such frequent, unnecessary operations reduces overhead on a hot path.

  • FastList get() operation
public T get(int index)
{
   // ArrayList performs an extra range check here: rangeCheck(index);
   return elementData[index];
}

The second is the remove() method. When a Statement is created through conn.createStatement(), it is added to the list with add(), which is not a problem. But when the Statement is closed through stmt.close(), it has to be removed from the list with remove(). ArrayList.remove(Object) searches the array from the beginning, whereas FastList searches from the end, which is more efficient here. Suppose a Connection creates six Statements in sequence: S1, S2, S3, S4, S5, and S6. Statements are generally closed in reverse order, from S6 to S1, so ArrayList's head-first search has to walk past every remaining element before it finds the one being removed. FastList therefore searches in reverse order. The following code is FastList's removal operation. Compared with ArrayList's remove(), it drops the range check and the head-to-tail scan, so it is faster.


  • FastList delete operation
public boolean remove(Object element)
{
   // Removal searches in reverse order
   for (int index = size - 1; index >= 0; index--) {
      if (element == elementData[index]) {
         final int numMoved = size - index - 1;
         // If the element is not the last one, shift the elements after it left by one
         if (numMoved > 0) {
            System.arraycopy(elementData, index + 1, elementData, index, numMoved);
         }
         // Clear the now-unused last slot
         elementData[--size] = null;
         return true;
      }
   }
   return false;
}

As the source code above shows, FastList's optimizations are quite simple. Compared with ArrayList, it only makes minor adjustments, such as removing the range check, simplifying capacity expansion, and searching from the tail when removing an element, all in pursuit of maximum performance. This does not mean that ArrayList is bad; the two simply have different goals. As a general-purpose container, ArrayList is safer and more stable: it performs rangeCheck before operating and throws an exception immediately on an illegal request, in line with the fail-fast principle, while FastList pursues maximum performance.
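The effect of the tail-first search can be checked with a small experiment. The class below is a simplified sketch written in the spirit of FastList, not HikariCP's actual class: MiniFastList and its comparisons counter are hypothetical and exist only to count how many identity checks remove() performs.

```java
import java.util.Arrays;

// Hypothetical, simplified list in the spirit of FastList; not HikariCP's actual class.
final class MiniFastList<T> {
    private Object[] elementData = new Object[4];
    private int size;
    long comparisons; // identity checks made by remove(), kept for illustration only

    void add(T element) {
        if (size == elementData.length) {
            elementData = Arrays.copyOf(elementData, size * 2); // double the capacity
        }
        elementData[size++] = element;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        return (T) elementData[index]; // no explicit range check against size
    }

    boolean remove(Object element) {
        // Search from the tail: the most recently added element is found first.
        for (int index = size - 1; index >= 0; index--) {
            comparisons++;
            if (element == elementData[index]) {
                int numMoved = size - index - 1;
                if (numMoved > 0) {
                    System.arraycopy(elementData, index + 1, elementData, index, numMoved);
                }
                elementData[--size] = null;
                return true;
            }
        }
        return false;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        MiniFastList<String> list = new MiniFastList<>();
        String[] statements = {"S1", "S2", "S3", "S4", "S5", "S6"};
        for (String s : statements) {
            list.add(s);
        }
        for (int i = statements.length - 1; i >= 0; i--) {
            list.remove(statements[i]); // close in reverse order, S6 first
        }
        System.out.println("comparisons: " + list.comparisons); // prints "comparisons: 6"
    }
}
```

Closing S6 through S1 costs one comparison per removal, six in total. A head-first search over the same sequence would need 6 + 5 + 4 + 3 + 2 + 1 = 21 comparisons.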

Next, let's look at ConcurrentBag, another data structure in HikariCP, and see how it improves performance.

4.2 How ConcurrentBag works

Most mainstream database connection pools are implemented with two blocking queues: one, idle, holds idle database connections, and the other, busy, holds connections that are in use. When a connection is acquired, it is moved from the idle queue to the busy queue; when it is closed, it is moved from busy back to idle. This scheme delegates concurrency control to the blocking queues, which is simple to implement, but performance is not very satisfactory, because the blocking queues in the Java SDK are implemented with locks, and lock contention hurts performance considerably under high concurrency.
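The two-queue scheme can be sketched as follows. This illustrates the conventional design described above, not any particular pool's code: TwoQueuePool and its method names are hypothetical, and acquire() returns null instead of blocking when no connection is idle.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the conventional idle/busy design; not taken from a real pool.
final class TwoQueuePool<T> {
    private final BlockingQueue<T> idle = new LinkedBlockingQueue<>();
    private final BlockingQueue<T> busy = new LinkedBlockingQueue<>();

    // Make a newly created connection available.
    void register(T connection) {
        idle.offer(connection);
    }

    // Move a connection from idle to busy; null if none is idle.
    T acquire() {
        T connection = idle.poll();
        if (connection != null) {
            busy.offer(connection);
        }
        return connection;
    }

    // Move a connection from busy back to idle; unknown connections are ignored.
    void release(T connection) {
        if (busy.remove(connection)) {
            idle.offer(connection);
        }
    }

    int idleCount() {
        return idle.size();
    }

    int busyCount() {
        return busy.size();
    }

    public static void main(String[] args) {
        TwoQueuePool<String> pool = new TwoQueuePool<>();
        pool.register("conn-1");
        pool.register("conn-2");
        String c = pool.acquire();
        System.out.println(c + " idle=" + pool.idleCount() + " busy=" + pool.busyCount());
        pool.release(c);
        System.out.println("idle=" + pool.idleCount() + " busy=" + pool.busyCount());
    }
}
```

Each acquire() and release() passes through the internal locks of LinkedBlockingQueue, which is the contention point the text refers to.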

HikariCP does not use the blocking queues from the Java SDK. Instead, it implements its own concurrent container, ConcurrentBag, which performs better than LinkedBlockingQueue and LinkedTransferQueue in a connection pool (where data is exchanged between threads).

ConcurrentBag has four key fields: sharedList, the shared list that stores all database connections; threadList, the thread-local storage; waiters, the number of threads waiting for a database connection; and handoffQueue, the tool used to hand out database connections. handoffQueue is a SynchronousQueue from the Java SDK, which is mainly used to pass data between threads.

  • Key attributes in ConcurrentBag
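// Shared elements: stores all database connections
private final CopyOnWriteArrayList<T> sharedList;
// Caches this thread's database connections in a ThreadLocal to avoid thread contention
private final ThreadLocal<List<Object>> threadList;
// Number of threads waiting for a database connection
private final AtomicInteger waiters;
// Hand-off queue used to pass database connections to waiting threads
private final SynchronousQueue<T> handoffQueue;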
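The role of handoffQueue is easier to see in a standalone experiment with SynchronousQueue. The queue has no capacity, so offer() succeeds only while another thread is waiting to receive the element. The sketch below uses only java.util.concurrent classes; HandoffDemo and its methods are hypothetical names for this illustration.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class HandoffDemo {
    // With nobody waiting in poll()/take(), offer() returns false immediately.
    static boolean offerWithoutWaiter() {
        return new SynchronousQueue<String>().offer("conn");
    }

    // Starts a consumer that waits for an item, hands the item over, and
    // returns what the consumer received (null if the hand-off timed out).
    static String handOff(String item) {
        SynchronousQueue<String> handoffQueue = new SynchronousQueue<>();
        AtomicReference<String> received = new AtomicReference<>();
        Thread consumer = new Thread(() -> {
            try {
                received.set(handoffQueue.poll(2, TimeUnit.SECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            // Retry until the consumer is actually blocked in poll(), or give up after 2 s.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
            while (!handoffQueue.offer(item) && System.nanoTime() < deadline) {
                Thread.yield();
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received.get();
    }

    public static void main(String[] args) {
        System.out.println("offer without waiter: " + offerWithoutWaiter());
        System.out.println("consumer received: " + handOff("conn-1"));
    }
}
```

This is why the add() and requite() methods shown below retry offer() in a loop while waiters is greater than zero: a single failed offer() only means that no thread was parked in poll() at that instant.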

ConcurrentBag guarantees that resources can only be added through the add() method. When the connection pool creates a database connection, it adds the connection to ConcurrentBag by calling add(), and removes it through remove(). The implementations of add() and remove() follow. add() puts the connection into the shared list sharedList; if a thread is waiting for a database connection at that moment, the connection is handed to the waiting thread through handoffQueue.

  • ConcurrentBag's add() and remove() methods
public void add(final T bagEntry)
{
   if (closed) {
      LOGGER.info("ConcurrentBag has been closed, ignoring add()");
      throw new IllegalStateException("ConcurrentBag has been closed, ignoring add()");
   }
   // Newly added resources go into sharedList first
   sharedList.add(bagEntry);

   // While threads are waiting for a resource, keep trying to hand it over via handoffQueue before returning
   while (waiters.get() > 0 && bagEntry.getState() == STATE_NOT_IN_USE && !handoffQueue.offer(bagEntry)) {
      Thread.yield();
   }
}
public boolean remove(final T bagEntry)
{
   // Only in-use or reserved entries can be removed; if neither state transition succeeds, return failure
   if (!bagEntry.compareAndSet(STATE_IN_USE, STATE_REMOVED) && !bagEntry.compareAndSet(STATE_RESERVED, STATE_REMOVED) && !closed) {
      LOGGER.warn("Attempt to remove an object from the bag that was not borrowed or reserved: {}", bagEntry);
      return false;
   }
   // Remove from sharedList
   final boolean removed = sharedList.remove(bagEntry);
   if (!removed && !closed) {
      LOGGER.warn("Attempt to remove an object from the bag that does not exist: {}", bagEntry);
   }
   return removed;
}

ConcurrentBag hands out an idle database connection through the borrow() method and takes it back through the requite() method. The main logic of borrow() is:

  1. Check whether there is an idle connection in the thread local storage threadList, if so, return an idle connection;
  2. If there is no idle connection in the thread local storage, get it from the shared queue sharedList;
  3. If there are no free connections in the shared queue, the requesting thread needs to wait.
  • borrow() and requite() methods of ConcurrentBag
// Borrows a connection from the pool; if none is available, waits up to the given timeout
public T borrow(long timeout, final TimeUnit timeUnit) throws InterruptedException
{
   // First check the thread-local list threadList for an idle connection
   final List<Object> list = threadList.get();
   // Iterating from the tail pays off: the most recently used connection is the most likely to be idle, while earlier ones may already have been taken by other threads
   for (int i = list.size() - 1; i >= 0; i--) {
      final Object entry = list.remove(i);
      @SuppressWarnings("unchecked")
      final T bagEntry = weakThreadLocals ? ((WeakReference<T>) entry).get() : (T) entry;
      // Connections in thread-local storage can also be stolen, so CAS is used to prevent double allocation
      if (bagEntry != null && bagEntry.compareAndSet(STATE_NOT_IN_USE, STATE_IN_USE)) {
         return bagEntry;
      }
   }
   // No usable thread-local resource: scan the shared list, again using CAS to prevent double allocation
   final int waiting = waiters.incrementAndGet();
   try {
      for (T bagEntry : sharedList) {
         if (bagEntry.compareAndSet(STATE_NOT_IN_USE, STATE_IN_USE)) {
            // We may have "stolen" another waiter's resource, so ask the listener to add more items to the bag
            if (waiting > 1) {
               listener.addBagItem(waiting - 1);
            }
            return bagEntry;
         }
      }

      listener.addBagItem(waiting);
      timeout = timeUnit.toNanos(timeout);
      do {
         final long start = currentTime();
         // All existing resources are in use: wait for one to be released or for a new one to be added
         final T bagEntry = handoffQueue.poll(timeout, NANOSECONDS);
         if (bagEntry == null || bagEntry.compareAndSet(STATE_NOT_IN_USE, STATE_IN_USE)) {
            return bagEntry;
         }
         timeout -= elapsedNanos(start);
      } while (timeout > 10_000);
      return null;
   }
   finally {
      waiters.decrementAndGet();
   }
}

public void requite(final T bagEntry)
{
   // Mark the resource as not in use
   bagEntry.setState(STATE_NOT_IN_USE);
   // If threads are waiting, hand the resource over directly
   for (int i = 0; waiters.get() > 0; i++) {
      if (bagEntry.getState() != STATE_NOT_IN_USE || handoffQueue.offer(bagEntry)) {
         return;
      }
      else if ((i & 0xff) == 0xff) {
         parkNanos(MICROSECONDS.toNanos(10));
      }
      else {
         Thread.yield();
      }
   }
   // Otherwise, cache the resource in this thread's local list
   final List<Object> threadLocalList = threadList.get();
   if (threadLocalList.size() < 50) {
      threadLocalList.add(weakThreadLocals ? new WeakReference<>(bagEntry) : bagEntry);
   }
}

The borrow() method is arguably the core method of HikariCP: it is what ultimately runs when we get a connection from the pool. Note that borrow() only hands out a reference and does not remove the object from the bag, so every borrowed object must be given back through requite(); otherwise it is easy to cause a memory leak. The requite() method first sets the connection's state to not in use and then checks whether any thread is waiting. If so, the connection is handed to the waiting thread; otherwise it is saved in the thread-local storage.

ConcurrentBag uses a queue-stealing mechanism to obtain elements: it first tries to take an element that belongs to the current thread from the ThreadLocal list, which avoids lock contention, and only if none is available does it fall back to the shared CopyOnWriteArrayList. Both are member fields of ConcurrentBag; the thread-local lists are private to each thread, which the design relies on to reduce contention and avoid false sharing. Because connections in thread-local storage can be stolen by other threads, and idle connections are also taken from the shared list, CAS is needed to prevent the same connection from being allocated twice.
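The interplay of thread-local lookup, shared-list scan, and CAS can be condensed into a small sketch. MiniBag is a hypothetical, heavily simplified model and not HikariCP's ConcurrentBag: it has no waiters, hand-off queue, timeouts, or weak references, and borrow() returns null instead of waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the borrow/requite idea; not HikariCP's ConcurrentBag.
final class MiniBag {
    static final int NOT_IN_USE = 0;
    static final int IN_USE = 1;

    static final class Entry {
        final String name;
        final AtomicInteger state = new AtomicInteger(NOT_IN_USE);

        Entry(String name) {
            this.name = name;
        }
    }

    private final CopyOnWriteArrayList<Entry> sharedList = new CopyOnWriteArrayList<>();
    private final ThreadLocal<List<Entry>> threadList = ThreadLocal.withInitial(ArrayList::new);

    void add(Entry entry) {
        sharedList.add(entry);
    }

    // Thread-local entries first, then the shared list; null if nothing is free.
    Entry borrow() {
        List<Entry> local = threadList.get();
        for (int i = local.size() - 1; i >= 0; i--) {
            Entry entry = local.remove(i);
            // Another thread may have taken it from sharedList, so claim it with CAS.
            if (entry.state.compareAndSet(NOT_IN_USE, IN_USE)) {
                return entry;
            }
        }
        for (Entry entry : sharedList) {
            if (entry.state.compareAndSet(NOT_IN_USE, IN_USE)) {
                return entry;
            }
        }
        return null;
    }

    // Mark the entry free and remember it in this thread's local list.
    void requite(Entry entry) {
        entry.state.set(NOT_IN_USE);
        threadList.get().add(entry);
    }

    public static void main(String[] args) {
        MiniBag bag = new MiniBag();
        bag.add(new Entry("conn-1"));
        bag.add(new Entry("conn-2"));
        Entry first = bag.borrow();
        bag.requite(first);
        System.out.println("reused from thread-local list: " + (bag.borrow() == first));
    }
}
```

The entry stays in sharedList the whole time; only its state changes. That is why a requited entry can be claimed by any thread scanning sharedList, and why the owning thread must still win the CAS before reusing it.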

5. Summary

As the default connection pool of Spring Boot 2.0, HikariCP is widely used in the industry. Most applications can adopt it quickly and get efficient connection handling.

Reference

  1. https://github.com/brettwooldridge/HikariCP

  2. https://github.com/alibaba/druid

Author: vivo Game Technology team

Origin blog.51cto.com/14291117/2606509