Analysis of frequent disconnections in an application using the Druid connection pool

Some time ago, an application that uses the Druid connection pool frequently reported broken-connection errors. The troubleshooting process turned out to be quite interesting. This article examines the configuration of the Druid connection pool, the database layer, and the load-balancing layer, records how the problem was analyzed, and summarizes Druid's configuration parameters and its connection keep-alive and recycling mechanisms.


1. Problem background

The application obtains a connection from the database connection pool, reaches the database proxy through a load balancer, and then accesses the database. This is a typical architecture, with the hops in this order: application, Druid connection pool, load balancer, database proxy, database.

However, after the system went live, the application reported sporadic disconnection errors, and the following message appeared frequently:

discard connection
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 72,557 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.

Judging from the error log, the connection to the database had already been dropped, because this error only occurs when the application tries to use a connection that has been disconnected. With Druid's connection checking in place, however, this should not happen. The next section therefore reviews the basic configuration of the Druid connection pool and its connection keep-alive and recycling mechanisms.

2. Druid connection pool

2.1 Druid connection overview

Druid is an open-source database connection pool. It combines the strengths of C3P0, DBCP, Proxool, and other pools, and adds logging and monitoring that give good visibility into pool connections and SQL execution.

  • DruidDataSource holds a reentrant lock and two conditions derived from it: one (empty) signals that the pool is empty, the other (notEmpty) signals that the pool is not empty.
  • DruidDataSource runs two threads: CreateConnectionThread creates connections and DestroyConnectionThread recycles them. The lock and conditions are used when creating, acquiring, and recycling connections.
  • init() is called every time a connection is obtained; the internal inited flag records whether the DataSource has already been initialized.
  • Every acquisition of a connection takes the lock to guarantee thread safety, and all operations run while the lock is held.
  • If the pool has no connection, empty.signal() is called to notify CreateConnectionThread to create one; the caller waits for the specified time and checks for an available connection after being woken up.
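The handshake between a borrower and the creator thread can be sketched with a plain ReentrantLock and two conditions. This is a toy model, not Druid's source: the class TinyPoolSketch, its fields, and the use of strings as stand-ins for connections are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Toy pool that only mimics the lock/condition handshake; this is not Druid's code. */
class TinyPoolSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition(); // borrowers wait here
    private final Condition empty = lock.newCondition();    // the creator thread waits here
    private final Deque<String> idle = new ArrayDeque<>();  // strings stand in for connections
    private int waiters = 0;
    private int created = 0;

    TinyPoolSketch() {
        Thread creator = new Thread(this::createLoop, "create-connection-sketch");
        creator.setDaemon(true); // do not keep the JVM alive
        creator.start();
    }

    private void createLoop() {
        lock.lock();
        try {
            while (true) {
                while (idle.size() >= waiters) {
                    empty.await(); // nothing to do until a borrower finds the pool empty
                }
                idle.addLast("conn-" + (++created));
                notEmpty.signal(); // wake one waiting borrower
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
    }

    /** Returns a connection, or null if none became available within maxWaitMillis. */
    String borrow(long maxWaitMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        lock.lock();
        try {
            while (idle.isEmpty()) {
                if (nanos <= 0) {
                    return null; // waited long enough, comparable to maxWait expiring
                }
                waiters++;
                try {
                    empty.signal(); // ask the creator thread for a connection
                    nanos = notEmpty.awaitNanos(nanos);
                } finally {
                    waiters--;
                }
            }
            return idle.pollLast();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            lock.unlock();
        }
    }

    /** Puts a connection back and wakes one waiting borrower. */
    void giveBack(String conn) {
        lock.lock();
        try {
            idle.addLast(conn);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        TinyPoolSketch pool = new TinyPoolSketch();
        String conn = pool.borrow(1000);
        System.out.println("borrowed " + conn);
        pool.giveBack(conn);
        System.out.println("borrowed again " + pool.borrow(1000));
    }
}
```

The creator waits on a predicate (idle count versus number of waiting borrowers) rather than on the signal alone, so a signal sent before the creator thread starts waiting is not lost.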

2.2 Druid parameter configuration

1) Basic properties

  • name: Distinguishes data sources in monitoring when there are several of them. If it is not configured, a name is generated in the format "DataSource-" + System.identityHashCode(this).
  • url: The URL used to connect to the database, which differs by database. For example, mysql: jdbc:mysql://10.20.153.104:3306/druid2; oracle: jdbc:oracle:thin:@10.20.149.85:1521:ocnauto
  • username: The username used to connect to the database
  • password: The password used to connect to the database
  • driverClassName: Optional. If it is not configured, Druid identifies the dbType from the url and selects the corresponding driverClassName

2) Connection pool size

  • initialSize: The number of physical connections established at initialization. Initialization happens when init() is called explicitly or when getConnection() is first called. The default value is 0
  • maxActive: The maximum number of connections in the pool. The default value is 8
  • minIdle: The minimum number of connections kept in the pool. The default value is 0
  • maxWait: The maximum time to wait when obtaining a connection, in milliseconds. Once maxWait is configured, a fair lock is used by default, which lowers concurrency efficiency. If necessary, set the useUnfairLock property to true to use an unfair lock. The default value is -1

3) Connection detection

  • testOnBorrow: Executes validationQuery to check whether the connection is valid when a connection is requested. Enabling it reduces performance. The default value is true
  • testOnReturn: Executes validationQuery to check whether the connection is valid when a connection is returned. Enabling it reduces performance. The default value is false
  • testWhileIdle: Recommended to be true; it does not affect performance and keeps connections safe to use. The check happens when a connection is requested: if the connection has been idle longer than timeBetweenEvictionRunsMillis, validationQuery is executed to check whether it is valid. The default value is false
  • timeBetweenEvictionRunsMillis: Has two meanings: 1) the interval at which the Destroy thread checks connections; during a check, a connection whose idle time is greater than or equal to minEvictableIdleTimeMillis is physically closed. 2) The threshold used by testWhileIdle. The default value is 60s
  • maxEvictableIdleTimeMillis: If a connection has been idle longer than this value, it is closed regardless of minIdle. The default is 7 hours
  • minEvictableIdleTimeMillis: If a connection has been idle longer than this value and the number of idle connections in the pool is greater than minIdle, the connection is closed. The default is 30 minutes
  • phyTimeoutMillis: If a physical connection has been open longer than this timeout, it is closed once it is no longer in use. Enabling this is generally not recommended
  • validationQuery: The SQL used to check whether a connection is valid. It must be a query statement, usually select 'x'. If validationQuery is null, testOnBorrow, testOnReturn, and testWhileIdle have no effect. The default value is null
  • validationQueryTimeout: The timeout for checking whether a connection is valid, in seconds. Internally it calls void setQueryTimeout(int seconds) on the JDBC Statement object. The default value is -1
  • keepAlive: For connections within the minIdle count whose idle time is greater than keepAliveBetweenTimeMillis but less than minEvictableIdleTimeMillis, validationQuery is executed to keep the connection valid. The default value is false
  • keepAliveBetweenTimeMillis: With keepAlive turned on, once a connection has been idle longer than this value, validationQuery is executed to check whether the connection is still usable. The default value is 120s
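How testOnBorrow and testWhileIdle decide whether a connection is validated when it is handed out can be sketched as a small decision function. This is a simplified model of the description above, not Druid's code; the class and method names are hypothetical, and the role of validationQuery is left out.

```java
/** Simplified model of the borrow-time checks; hypothetical names, not Druid's API. */
class BorrowCheckSketch {

    /**
     * Returns true when the connection would be validated before being handed out.
     * testOnBorrow validates every time; testWhileIdle validates only when the
     * connection has been idle for at least timeBetweenEvictionRunsMillis.
     */
    static boolean shouldValidateOnBorrow(boolean testOnBorrow,
                                          boolean testWhileIdle,
                                          long idleMillis,
                                          long timeBetweenEvictionRunsMillis) {
        if (testOnBorrow) {
            return true;
        }
        if (testWhileIdle) {
            return idleMillis >= timeBetweenEvictionRunsMillis;
        }
        return false;
    }

    public static void main(String[] args) {
        // testWhileIdle with a 600 s threshold: 400 s of idleness is not checked.
        System.out.println(shouldValidateOnBorrow(false, true, 400_000L, 600_000L));
        // The same connection after 700 s of idleness is checked.
        System.out.println(shouldValidateOnBorrow(false, true, 700_000L, 600_000L));
    }
}
```

Under this model, raising timeBetweenEvictionRunsMillis also widens the window in which an idle connection is handed out without any check.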

4) Cache statements

  • poolPreparedStatements: Whether to cache PreparedStatements, that is, PSCache. PSCache greatly improves performance on databases that support cursors, such as Oracle. Turning it off is recommended for MySQL. The default value is false
  • sharePreparedStatements
  • maxPoolPreparedStatementPerConnectionSize: Must be greater than 0 to enable PSCache. When it is greater than 0, poolPreparedStatements is automatically set to true. Druid does not have the problem of PSCache using too much memory under Oracle, so this value can be set larger, for example 100. The default value is -1

2.3 Druid connection pool usage

To use the Druid connection pool, create a DataSource object with DruidDataSourceFactory and call its getConnection method to obtain a database connection object. The difference from an unpooled connection is that calling close on the connection no longer closes and destroys the underlying connection; it puts the connection object back into the pool so that a new request can use it directly.

import com.alibaba.druid.pool.DruidDataSourceFactory;

import javax.sql.DataSource;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

public class DruidTest {
    public static void main(String[] args) throws Exception {
        // Load the configuration file from the classpath
        Properties prop = new Properties();
        try (InputStream is = DruidTest.class.getClassLoader().getResourceAsStream("druid.properties")) {
            if (is == null) {
                throw new IllegalStateException("druid.properties not found on the classpath");
            }
            prop.load(is);
        }

        // Create the data source object from the configuration
        DataSource dataSource = DruidDataSourceFactory.createDataSource(prop);

        String sql = "select e_id,e_name,e_age from employee";
        // Obtain a connection from the data source. If every pooled connection is in use,
        // the call waits for the configured time and throws an exception on timeout.
        // try-with-resources releases rs, pst and con in reverse order, even on errors.
        try (Connection con = dataSource.getConnection();
             PreparedStatement pst = con.prepareStatement(sql);
             ResultSet rs = pst.executeQuery()) {
            // Execute the SQL statement, then read and print the result set
            while (rs.next()) {
                System.out.println(
                        rs.getInt("e_id") + "\t" +
                        rs.getString("e_name") + "\t" +
                        rs.getInt("e_age"));
            }
        }
        // con.close() runs implicitly at the end of the try block. It does not close and
        // destroy the connection; it puts the connection object back into the pool for reuse.
    }
}

Note that con.close() does not close and destroy the physical connection here; it returns the connection object to the pool so that later requests can reuse it directly.
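The idea that close() recycles a connection instead of destroying it can be illustrated with a small stand-alone model. Everything here (ReturnOnCloseSketch, Physical, Pooled) is hypothetical and unrelated to Druid's real classes, and the sketch is deliberately not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of "close() returns to the pool"; hypothetical types, not thread-safe. */
class ReturnOnCloseSketch {

    /** Stand-in for a physical database connection. */
    static final class Physical {
        boolean open = true;
    }

    /** Handle given to the caller; close() recycles the physical connection. */
    static final class Pooled implements AutoCloseable {
        final Physical physical;
        private final Deque<Physical> pool;
        private boolean returned = false;

        Pooled(Physical physical, Deque<Physical> pool) {
            this.physical = physical;
            this.pool = pool;
        }

        @Override
        public void close() {
            if (!returned) {            // a second close() is a no-op
                returned = true;
                pool.addLast(physical); // physical.open stays true
            }
        }
    }

    private final Deque<Physical> idle = new ArrayDeque<>();

    /** Reuses an idle physical connection when one exists, otherwise creates one. */
    Pooled getConnection() {
        Physical physical = idle.pollLast();
        if (physical == null) {
            physical = new Physical();
        }
        return new Pooled(physical, idle);
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        ReturnOnCloseSketch ds = new ReturnOnCloseSketch();
        Pooled a = ds.getConnection();
        Physical first = a.physical;
        a.close();
        Pooled b = ds.getConnection();
        System.out.println("same physical connection reused: " + (b.physical == first));
        System.out.println("still open after close(): " + first.open);
    }
}
```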

2.4 Connection Keepalive and Recovery Mechanism
2.4.1 Connection Keep Alive

To keep a database connection from being closed by lower-level services when it has not been used for a long time, Druid defines a KeepAlive option, similar in mechanism to TCP keep-alive. Keep-alive ensures that the connections in the pool are real, valid connections; if a connection has become unusable for some reason, the keepAlive mechanism evicts it. Keep-alive is driven by the daemon thread DestroyConnectionThread. Once started, the thread enters an infinite loop and runs DestroyTask at the heartbeat interval timeBetweenEvictionRunsMillis, which defaults to 60s.
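The loop described above can be sketched as a daemon thread that sleeps for one interval and then runs a task. This is an illustrative model under assumed names (DestroyLoopSketch, start, runsAtLeast), not Druid's implementation; it only shows that checks happen at interval ticks and never in between.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Toy version of a sleep-then-run maintenance loop; hypothetical names. */
class DestroyLoopSketch {

    /** Starts a daemon thread that runs the task once per interval until interrupted. */
    static Thread start(long intervalMillis, Runnable destroyTask) {
        Thread thread = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(intervalMillis); // plays the role of timeBetweenEvictionRunsMillis
                } catch (InterruptedException e) {
                    break;
                }
                destroyTask.run();
            }
        }, "destroy-connection-sketch");
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    /** Returns true if the task ran at least the given number of times before the timeout. */
    static boolean runsAtLeast(int times, long intervalMillis, long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(times);
        Thread thread = start(intervalMillis, latch::countDown);
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            thread.interrupt(); // stop the loop
        }
    }

    public static void main(String[] args) {
        System.out.println("task ran 3 times: " + runsAtLeast(3, 20L, 2000L));
    }
}
```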

1) Turn on KeepAlive

// Minimum time a connection stays in the pool before it may be evicted, in milliseconds
dataSource.setMinEvictableIdleTimeMillis(60 * 1000);
// Turn on keepAlive
dataSource.setKeepAlive(true);

2) Two member variables in DruidDataSource

// Connections that the check has marked to be discarded
private DruidConnectionHolder[] evictConnections;
// Live connections that need a keep-alive check
private DruidConnectionHolder[] keepAliveConnections;

If KeepAlive is turned on, a connection whose idle time reaches keepAliveBetweenTimeMillis is placed into the keepAliveConnections array and then checked by executing validationQuery.

if (keepAlive && idleMillis >= keepAliveBetweenTimeMillis) {
    keepAliveConnections[keepAliveCount++] = connection;
}
if (keepAliveCount > 0) {
    // keep order
    for (int i = keepAliveCount - 1; i >= 0; --i) {
        DruidConnectionHolder holer = keepAliveConnections[i];
        Connection connection = holer.getConnection();
        holer.incrementKeepAliveCheckCount();

        boolean validate = false;
        try {
            this.validateConnection(connection);
            validate = true;
        } catch (Throwable error) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("keepAliveErr", error);
            }
            // skip
        }
        // ... (excerpt ends here)
    }
}

If validationQuery fails this time, the connection is closed and discarded.

2.4.2 Data source shrinkage

When the Druid data source is initialized, a scheduled DestroyTask is created. Its main purpose is to close connections that have been idle for a while and meet the closing conditions.

1) If a connection has been alive longer than the configured physical connection timeout (phyTimeoutMillis), it is placed in evictConnections

if (phyConnectTimeMillis > phyTimeoutMillis) {
    evictConnections[evictCount++] = connection;
    continue;
}

2) Idle time >= minimum eviction time (minEvictableIdleTimeMillis)

if (idleMillis >= minEvictableIdleTimeMillis) {
    if (checkTime && i < checkCount) {
        evictConnections[evictCount++] = connection;
        continue;
    } else if (idleMillis > maxEvictableIdleTimeMillis) {
        evictConnections[evictCount++] = connection;
        continue;
    }
}

if (evictCount > 0) {
    for (int i = 0; i < evictCount; ++i) {
        DruidConnectionHolder item = evictConnections[i];
        Connection connection = item.getConnection();
        JdbcUtils.close(connection);
        destroyCountUpdater.incrementAndGet(this);
    }
    Arrays.fill(evictConnections, null);
}

As the code shows, idle connections are selected for closing as follows:

  • Of the connections idle for at least minEvictableIdleTimeMillis, only the first poolingCount - minIdle are closed; the connections after them are left alone;
  • Idle connections whose idle time exceeds maxEvictableIdleTimeMillis are closed unconditionally;
  • timeBetweenEvictionRunsMillis is the interval at which the scheduled task runs;
  • minEvictableIdleTimeMillis is the minimum idle time after which a connection may be closed
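The selection rules above can be replayed on sample data with a small classifier. This is a sketch built from the excerpts, not Druid's code: it ignores checkTime and phyTimeoutMillis, assumes the connections are ordered longest-idle first, and the names ShrinkSketch, Action, and classify are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Replays the eviction and keep-alive rules on plain numbers; simplified model. */
class ShrinkSketch {

    enum Action { EVICT, KEEP_ALIVE, RETAIN }

    /**
     * Classifies each pooled connection by its idle time.
     * Assumption: idleMillis is ordered longest-idle first, like the walk in the excerpts.
     */
    static List<Action> classify(long[] idleMillis,
                                 int minIdle,
                                 long minEvictableIdleTimeMillis,
                                 long maxEvictableIdleTimeMillis,
                                 boolean keepAlive,
                                 long keepAliveBetweenTimeMillis) {
        int checkCount = idleMillis.length - minIdle; // poolingCount - minIdle
        List<Action> actions = new ArrayList<>();
        for (int i = 0; i < idleMillis.length; i++) {
            long idle = idleMillis[i];
            if (idle >= minEvictableIdleTimeMillis) {
                if (i < checkCount) {
                    actions.add(Action.EVICT);      // surplus beyond minIdle
                    continue;
                } else if (idle > maxEvictableIdleTimeMillis) {
                    actions.add(Action.EVICT);      // too old, minIdle does not protect it
                    continue;
                }
            }
            if (keepAlive && idle >= keepAliveBetweenTimeMillis) {
                actions.add(Action.KEEP_ALIVE);     // probe with the validation query
            } else {
                actions.add(Action.RETAIN);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        long[] idle = {700_000L, 650_000L, 400_000L, 100_000L};
        long sevenHours = 7L * 60 * 60 * 1000;
        // minIdle = 2, minEvictable = 300 s, keepAliveBetween = 300 s
        System.out.println(classify(idle, 2, 300_000L, sevenHours, true, 300_000L));
        System.out.println(classify(idle, 2, 300_000L, sevenHours, false, 300_000L));
    }
}
```

With four connections and minIdle of 2, the two longest-idle connections are evicted, and the remaining two are either probed or left alone depending on keepAlive and their idle time.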

2.5 Druid connection life cycle

The life cycle of a Druid connection can be viewed from two sides: the application, which requests, uses, and closes connections; and the connection pool managed by Druid itself, which creates and recycles connections and keeps them alive. In detail:

1) Client connection management

  • The client requests a connection from the Druid connection pool. If the pool does not have enough connections, CreateConnectionThread is notified to create one;
  • After obtaining the connection, the client accesses the database;
  • When the work is done, the client releases the database resources and closes the connection. This step is normally performed by the application itself. After being closed, the connection is recycled and returned to the Druid connection pool.

2) Druid connection pool management

  • The Druid connection pool sets a minimum number of connections (minIdle) and a maximum (maxActive). The minimum supports warm-up: connections do not have to be initialized each time the application requests one, which improves performance under high concurrency;
  • The pool keeps connections alive periodically. The keep-alive check runs every timeBetweenEvictionRunsMillis (default 60s). When a connection is found to have been idle longer than keepAliveBetweenTimeMillis (default 120s), the pool actively keeps it alive, usually by sending a SQL query to the database. The statement can be customized and is usually "select 1 from dual";
  • To prevent connection leaks, idle connections are reclaimed periodically. If a connection has been idle longer than minEvictableIdleTimeMillis (default 30 minutes) and the number of idle connections in the pool is greater than minIdle, the connection is closed; if it has been idle longer than maxEvictableIdleTimeMillis (default 7 hours), it is closed unconditionally;
  • It follows that without keep-alive, even when minIdle is set, connections within the minimum can still be closed once they exceed the idle timeout. With KeepAlive enabled, and with the keep-alive detection interval and keepAliveBetweenTimeMillis both smaller than minEvictableIdleTimeMillis, idle connections are not closed.

3. Problem Analysis

Back to the application's disconnection problem. Based on the Druid connection pool settings and the timeout settings along the whole path from the application to the database, and taking MySQL as the example, the configuration of each layer is examined in turn below.

3.1 Application JDBC URL connection configuration

Both connectTimeout and socketTimeout in the JDBC URL are TCP-level timeouts.

  • connectTimeout: The timeout for the database driver to establish a TCP connection with the database server. After a timeout, an exception like the following may appear:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    ...
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    ...
  • socketTimeout: The timeout for waiting for a response after data has been sent over the TCP connection. It usually fires when a SQL statement runs longer than the socket timeout. After a timeout, an error like the following appears:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 3,080 milliseconds ago.  The last packet sent successfully to the server was 3,005 milliseconds ago.
    ...
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
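Both timeouts are passed as parameters in the JDBC URL, in milliseconds. The helper below is a minimal sketch: the method name mysqlUrl and the values 3000 and 60000 are illustrative assumptions, while the host and database are taken from the url example earlier in this article.

```java
/** Builds a MySQL JDBC URL with explicit TCP-level timeouts; illustrative helper. */
class JdbcUrlSketch {

    static String mysqlUrl(String host, int port, String database,
                           int connectTimeoutMillis, int socketTimeoutMillis) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?connectTimeout=" + connectTimeoutMillis   // TCP connect timeout, ms
                + "&socketTimeout=" + socketTimeoutMillis;    // read timeout, ms
    }

    public static void main(String[] args) {
        System.out.println(mysqlUrl("10.20.153.104", 3306, "druid2", 3000, 60000));
    }
}
```

A socketTimeout shorter than the slowest legitimate query would produce the "Read timed out" error shown above, so the value has to be chosen with the workload in mind.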

3.2 Database layer timeout setting

Taking MySQL as an example, the database layer sets wait_timeout, the timeout for idle connections. The MySQL default is 28800 seconds (8 hours), and an idle connection is disconnected automatically once the timeout is exceeded. In practice the application reaches the database through a load balancer and Druid. If the load balancer has no session persistence enabled or Druid has no connection keep-alive, the database layer actively kills a client connection once it has been idle longer than wait_timeout. Since the Druid connection pool is in use, however, the connections on the application side should be well managed by Druid.

3.3 Druid connection configuration

Judging from the application's error messages, the connection was not dropped after hours of idleness but after about 300s, which rules out the database layer closing it on wait_timeout. Next, look at the Druid side; the connection-related parameters are as follows:

datasource.druid.validationQuery=SELECT 1 from dual
datasource.druid.validationQueryTimeout=2000
datasource.druid.testWhileIdle=true
datasource.druid.minIdle=50
datasource.druid.maxActive=100
datasource.druid.minEvictableIdleTimeMillis=300000
datasource.druid.timeBetweenEvictionRunsMillis=600000
datasource.druid.keepAlive=true
datasource.druid.keepAliveBetweenTimeMillis=300s

  1. Druid's KeepAlive switch is on, so the connection keep-alive mechanism is in effect. By default Druid checks MySQL connections with the ping protocol, and checking with the statement "SELECT 1 from dual" also refreshes the session's idle time. There is no problem here.
  2. Druid's keep-alive detection period, timeBetweenEvictionRunsMillis, is 600s (the default is 60s); the application lengthened it for performance reasons. A connection idle for more than 600s is kept alive again at the next detection run. A connection that is closed after less than 600s of idleness, however, was not closed by Druid, and in this incident the connection was dropped at about 300s.

3.4 Session persistence configuration for load balancing

One setting that is easy to overlook is session persistence on the load balancer. Session persistence is a load-balancer mechanism that, while balancing load, ensures that requests related to the same user are sent to the same server. Session persistence has a time limit. On F5, for example, the default is 5 minutes: a connection detected as idle for more than 5 minutes is actively disconnected. This appears to be the cause. Druid's keep-alive period is 10 minutes, while idle-connection detection on the load balancer is 5 minutes. A connection that is idle for more than 5 but less than 10 minutes is killed by the load balancer, and the application then reports a broken-connection error when it uses that connection. In the end, the session persistence timeout on the load balancer was adjusted to avoid similar problems.
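The mismatch can be checked with simple arithmetic. The sketch below uses a simplified, conservative model that is an assumption of this example rather than a statement about Druid internals: a connection is probed at the first detection run after its idle time reaches keepAliveBetweenTimeMillis, so it can sit untouched for up to keepAliveBetweenTimeMillis plus one detection interval. The class and method names are hypothetical.

```java
/** Compares the pool's worst-case idle window with an intermediary's idle timeout. */
class IdleWindowSketch {

    /** Upper bound on idle time before a keep-alive probe, under the simplified model. */
    static long worstCaseIdleBeforeProbeMillis(long timeBetweenEvictionRunsMillis,
                                               long keepAliveBetweenTimeMillis) {
        return keepAliveBetweenTimeMillis + timeBetweenEvictionRunsMillis;
    }

    /** True when every idle connection is probed before the intermediary drops it. */
    static boolean survivesIntermediary(long timeBetweenEvictionRunsMillis,
                                        long keepAliveBetweenTimeMillis,
                                        long intermediaryIdleTimeoutMillis) {
        return worstCaseIdleBeforeProbeMillis(timeBetweenEvictionRunsMillis,
                keepAliveBetweenTimeMillis) < intermediaryIdleTimeoutMillis;
    }

    public static void main(String[] args) {
        long loadBalancerTimeout = 5 * 60 * 1000L; // 5 minutes, as in the F5 example
        // Values from section 3.3: 600 s detection interval, 300 s keep-alive threshold.
        System.out.println(survivesIntermediary(600_000L, 300_000L, loadBalancerTimeout));
        // Druid defaults quoted earlier: 60 s detection interval, 120 s threshold.
        System.out.println(survivesIntermediary(60_000L, 120_000L, loadBalancerTimeout));
    }
}
```

Under this model the configuration from section 3.3 fails the check against a 5-minute timeout, while the default intervals pass it; shortening the pool's intervals or lengthening the load balancer timeout both close the gap.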

4. Summary

When an application uses the Druid connection pool to access a database, the configuration should be tuned to the business TPS and concurrency so that Druid can manage connection creation, keep-alive, and release. When a problem such as a broken connection occurs, every hop along the path needs to be investigated end to end to locate the root cause; the load balancer configuration in this case, for example, is easy to miss. Once the application has obtained a connection from Druid, that connection is outside Druid's management and must be handled by the application itself, which should close it promptly so that it returns to the pool. Otherwise connections accumulate on the database side and the number of idle connections grows past a certain limit. After some time they are also disconnected by the database layer or the load-balancing layer, causing broken-connection errors that the application has to handle separately.


Please indicate the original address for reprinting: https://blog.csdn.net/solihawk/article/details/125612396