Druid source code analysis and learning notes (with an Easter egg: the design ideas behind the detailed monitoring)

    Druid is Alibaba's database connection pool. Yesterday I suddenly felt like studying its source code, so I downloaded it and spent a bit more than two hours reading through it roughly. There are points in it I haven't seen before and don't yet understand, so I'm looking things up on Baidu as I go. This is a brief summary written while I keep reading the code; there are probably plenty of mistakes, and corrections are welcome!

    Before reading it myself I looked for existing Druid source code analyses, but Baidu turned up nothing beyond configuration guides, so I had to read it on my own. This introduction doesn't go into every detail; it focuses on the overall direction and the design ideas.

1. Born for monitoring: when and how?
    A simple database operation usually involves a DataSource, Connection, PreparedStatement, ResultSet, and so on. If I want to monitor these, I have to build proxy classes for them.
   All of our operations are then carried out by the proxy classes, and monitoring data is produced along the way.
   Druid is said to be born for monitoring, and monitoring is like a needle: it needs a gap to slip into. If there is no gap, we create one, and that is exactly what a proxy class does. The seam between the proxy and the object it proxies is that gap, so the proxy must hold the raw (proxied) object.
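To make the gap idea concrete, here is a minimal sketch of my own (hypothetical names, not Druid's actual classes): a proxy holds the raw PreparedStatement and does its counting in the gap before delegating.

    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical illustration only: Druid's real proxies route through
    // a filter chain instead of counting directly like this.
    public class CountingStatementProxy {
        private final PreparedStatement raw; // the proxied (raw) object
        private final AtomicLong executeCount = new AtomicLong();

        public CountingStatementProxy(PreparedStatement raw) {
            this.raw = raw;
        }

        public ResultSet executeQuery() throws SQLException {
            executeCount.incrementAndGet(); // monitoring lives in the gap
            return raw.executeQuery();      // the real work is delegated
        }
    }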
    public interface PreparedStatementProxy extends PreparedStatement, StatementProxy
    Its implementation class is PreparedStatementProxyImpl, which holds a java.sql.PreparedStatement.
    Just look at its query method:
    public ResultSet executeQuery() throws SQLException {
.....
        return createChain().preparedStatement_executeQuery(this); // create a filter chain and let it carry out the execution
    }
  FilterChainImpl has:
    public ResultSetProxy preparedStatement_executeQuery(PreparedStatementProxy statement) throws SQLException {
        if (this.pos < filterSize) {
            return nextFilter().preparedStatement_executeQuery(this, statement);
        }
        ResultSet resultSet = statement.getRawObject().executeQuery();
        return wrap(statement, resultSet);
    }

    What the method above says: before the query is executed it must pass through the filter chain; once the chain is exhausted, the raw statement executes the query; the resulting ResultSet is then wrapped in a proxy class, and that wrapper is what finally gets returned.

2. Now for the statistics filter. It is just one link in the filter chain.
   The most familiar example of this pattern is the filters configured in web.xml. You configure several filters, each implementing doFilter(). The filters are then organized into a container, such as the filter chain's List, and the chain executes doFilter() on them one after another. So the filter chain holds all the filters. After a filter finishes, it has to tell the chain to run the next one, which means the filter must also hold the chain. But the filter doesn't need to hold the chain permanently, only temporarily, so the chain is passed in as a method parameter. This also means a filter can be part of one chain and, at the same time, part of another; any chain can use it.
The flow is reminiscent of the observer pattern and of callbacks.
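Here is a stripped-down sketch of that structure (hypothetical names; same shape as servlet filters and Druid's chain). Note how the chain arrives in doFilter as a parameter rather than being held as a field.

    import java.util.List;

    class Request { String payload; }

    interface Filter {
        // the chain and the target arrive as parameters, not fields:
        // one filter can serve any chain and any target
        void doFilter(Request request, FilterChain chain);
    }

    class FilterChain {
        private final List<Filter> filters;
        private int pos = 0; // which filter runs next

        FilterChain(List<Filter> filters) { this.filters = filters; }

        void doFilter(Request request) {
            if (pos < filters.size()) {
                filters.get(pos++).doFilter(request, this); // hold the chain only briefly
                return;
            }
            // past the end of the chain: the real operation would run here
        }
    }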
   Let's take a quick look at one statistics method in StatFilter, the one for connection commits. (StatFilter has many filter methods covering all kinds of database operations; the same goes for LogFilter, and presumably for WallFilter, which defends against SQL injection attacks.)
    @Override
    public void connection_commit(FilterChain chain, ConnectionProxy connection) throws SQLException {
        chain.connection_commit(connection);
        JdbcDataSourceStat dataSourceStat = chain.getDataSource().getDataSourceStat();
        dataSourceStat.getConnectionStat().incrementConnectionCommitCount();
    }

  First it lets the chain take its step (that is, nextFilter() kicks in; once all the filters have run, the commit actually happens), and then it increments the data source's commit count.
    public void connection_commit(ConnectionProxy connection) throws SQLException {
        if (this.pos < filterSize) {
            nextFilter().connection_commit(this, connection); // hand over to the next filter
            return;
        }
        connection.getRawObject().commit(); // once every filter has run, the commit really happens: this connection is a proxy, and the raw java.sql.Connection inside it does the commit
    }
    private Filter nextFilter() {
        Filter filter = getFilters().get(pos++);
        return filter;
    }

DataSourceProxyConfig contains private final List<Filter> filters = new ArrayList<Filter>(); so the filters live in an ordinary ArrayList.

Putting the two points above together: what used to be a plain database operation is now performed on a proxy class; during execution the call passes through the filters one by one, collecting statistics, and only then is the database operation actually performed. All of this is transparent to the end user.

3. How are the statistics recorded?
Pick any object in the stat package and look at it, for example JdbcStatementStat:
................................
    private final AtomicLong createCount = new AtomicLong(0); // Count of execute createStatement
    private final AtomicLong prepareCount = new AtomicLong(0); // Count of execute prepareStatement
    private final AtomicLong prepareCallCount = new AtomicLong(0); // Count of executing preCall
    private final AtomicLong    closeCount       = new AtomicLong(0);

......................................
Wow, a whole pile of stat counters, all AtomicLong, which is thread-safe; incrementAndGet() is called whenever a count goes up. TableStat, though, uses plain ints, heh.
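Why AtomicLong rather than a plain long? Because ++ is a read-modify-write and loses updates under concurrency. A quick demo I wrote to convince myself (not Druid code):

    import java.util.concurrent.atomic.AtomicLong;

    public class CounterDemo {
        static long plainCount = 0;                            // lossy under contention
        static final AtomicLong safeCount = new AtomicLong();  // lock-free CAS

        public static void main(String[] args) throws InterruptedException {
            Runnable task = () -> {
                for (int i = 0; i < 100_000; i++) {
                    plainCount++;                // not atomic: load, add, store
                    safeCount.incrementAndGet(); // atomic increment
                }
            };
            Thread t1 = new Thread(task), t2 = new Thread(task);
            t1.start(); t2.start();
            t1.join(); t2.join();
            // plainCount usually comes out below 200000; safeCount is exactly 200000
            System.out.println(plainCount + " vs " + safeCount.get());
        }
    }

Which presumably also explains the plain int in TableStat: either those counts are not updated concurrently, or a slightly imprecise count is tolerable there.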

4. About the highlighted line from earlier:
ResultSet resultSet = statement.getRawObject().executeQuery();
     PreparedStatementProxy proxies an object that implements the java.sql.PreparedStatement interface, and it is that object that actually executes the query. So what does that object look like?
     It turns out to be DruidPooledPreparedStatement. Clearly this guy is itself proxying someone else, because it delegates the query to stmt; it just does some extra work before and after.
    public ResultSet executeQuery() throws SQLException {
        checkOpen();
        incrementExecuteCount();
        transactionRecord(sql);
        oracleSetRowPrefetch();
        conn.beforeExecute();
        try {
            ResultSet rs = stmt.executeQuery();
            if (rs == null) {
                return null;
            }
            DruidPooledResultSet poolableResultSet = new DruidPooledResultSet(this, rs);
            addResultSetTrace(poolableResultSet);
            return poolableResultSet;
        } catch (Throwable t) {
            throw checkException(t);
        } finally {
            conn.afterExecute();
        }
    }

    Note addResultSetTrace: it puts the query result into List<ResultSet> resultSetTrace. Why? Presumably so that every ResultSet opened by this statement can be tracked and closed when the statement itself is closed.

5. What do the various Holders do?
What is in DruidConnectionHolder? When a DruidPooledConnection is created with new, it is given this holder.
    private final DruidAbstractDataSource       dataSource;
    private final Connection                    conn;
    private final List<ConnectionEventListener> connectionEventListeners = new CopyOnWriteArrayList<ConnectionEventListener>();
    private final List<StatementEventListener>  statementEventListeners  = new CopyOnWriteArrayList<StatementEventListener>();
    private PreparedStatementPool statementPool; // a pool with LRU eviction, holding the PreparedStatementHolder described below!
    private final List<Statement>               statementTrace           = new ArrayList<Statement>(2);

And in PreparedStatementHolder? When a DruidPooledPreparedStatement is created with new, it is given this holder.
    private final PreparedStatementKey key;
    private final PreparedStatement    statement;


A holder is named after what it holds: DruidConnectionHolder must hold a Connection, PreparedStatementHolder must hold a PreparedStatement, and DruidConnectionHolder naturally also holds the PreparedStatements that belong to its connection. From a few call relationships we can almost guess the design idea:
Normally we take a Connection object, create a Statement from it, and then execute SQL and so on. When we want to do statistics or other work before and after an object executes, we use proxy objects, like the filter chain wired into the proxies earlier. But if we want to keep related state across calls between those objects, such as all the PreparedStatements under one connection, we need a holder object to help. You could probably hang a pile of other things on such an object too, though that would get messy.
Maybe "holder" deserves to be called a design pattern; proxy certainly is one. Recall that there is also the "handler" naming convention, which carries behavioral meaning!
The PreparedStatement prepareStatement(String sql) method in DruidPooledConnection first checks whether the pool already has one (stmtHolder = holder.getStatementPool().get(key);). If not, it does new PreparedStatementHolder(key, conn.prepareStatement(sql)); if it does, the statement is simply taken from the in-memory container.
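That lookup has the classic get-or-create shape. A generic, self-contained sketch of it (hypothetical types, not Druid's exact signatures):

    import java.util.HashMap;
    import java.util.Map;

    // get-or-create: check the in-memory container first, build on a miss
    class HolderPool<K, V> {
        interface Factory<K, V> { V create(K key) throws Exception; }

        private final Map<K, V> pool = new HashMap<>();

        V getOrCreate(K key, Factory<K, V> factory) throws Exception {
            V holder = pool.get(key);
            if (holder == null) {           // miss: prepare a new one
                holder = factory.create(key);
                pool.put(key, holder);      // cache it for next time
            }
            return holder;                  // hit: straight from memory
        }
    }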

The relationship is roughly: DruidPooledConnection --> DruidConnectionHolder --> ConnectionProxy --> filterChain --> Connection.

6. About the LRU cache used above, the pool that stores PreparedStatementHolders.
  It is a LinkedHashMap: you just implement removeEldestEntry yourself, and the oldest entry is thrown away when capacity is reached.
    public class LRUCache extends LinkedHashMap<PreparedStatementKey, PreparedStatementHolder> {
        private static final long serialVersionUID = 1L;
        public LRUCache(int maxSize){
            super(maxSize, 0.75f, true);
        }
        protected boolean removeEldestEntry(Entry<PreparedStatementKey, PreparedStatementHolder> eldest) {
            boolean remove = (size() > dataSource.getMaxPoolPreparedStatementPerConnectionSize());
            if (remove) {
                closeRemovedStatement(eldest.getValue());
            }
            return remove;
        }
    }
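For reference, here is the same trick as a self-contained generic class (my own sketch, not Druid's). The third constructor argument true puts the LinkedHashMap in access order, which is exactly what makes it "least recently used".

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class SimpleLRUCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        public SimpleLRUCache(int maxSize) {
            super(16, 0.75f, true); // true = access order, i.e. LRU order
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize; // evict the least recently used entry
        }
    }

Druid's version additionally closes the evicted statement inside removeEldestEntry, since a PreparedStatement is a real resource, not just a map entry.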


7. I don't know much about the mock package.
  Looking it up online: a mock is a fake object that makes testing easier. It implements the relevant interface, so it can stand in for whatever it fakes, which is especially useful when the real thing is inconvenient to use, slow, or otherwise not ideal.
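A tiny hedged example of the idea (a made-up interface, nothing to do with Druid's actual mock classes):

    // the real implementation would hit the network; the mock answers instantly
    interface Dialer {
        String dial(String number);
    }

    class MockDialer implements Dialer {
        @Override
        public String dial(String number) {
            return "OK"; // canned answer: fast, deterministic, no real resources
        }
    }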

8. sqlParser
  Apart from this one, the main functional packages under druid have now all been introduced. It is said to be a SQL parser, but I haven't had time to read it; I'll add it on the next pass.

9. connectPool: the connection pool
  The connection pool is of course the highlight. Briefly: it is built around a ReentrantLock and two conditions, notEmpty and empty; the threads that produce connections and the threads that consume them wait and wake on these two conditions. The pool is determined by the data source, so look at the DruidAbstractDataSource and DruidDataSource classes in the pool package.
Wow, these two classes are huge. Start with the fields: lots of counts, times, and default values. Pay attention to the collection fields, such as Map<DruidPooledConnection, Object> activeConnections
and private volatile DruidConnectionHolder[] connections;. There are also several threads in there.
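Before diving into the real code, here is a minimal sketch of the skeleton as I understand it (a hypothetical simplification; Druid's real loops also handle errors, timeouts, keep-alive and much more). The creator waits on empty, the takers wait on notEmpty.

    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    class TinyPool<T> {
        private final Object[] slots;                            // the pool array
        private int count = 0;                                   // like poolingCount
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition empty = lock.newCondition();     // creator waits here
        private final Condition notEmpty = lock.newCondition();  // takers wait here

        TinyPool(int maxActive) { slots = new Object[maxActive]; }

        void put(T conn) throws InterruptedException {
            lock.lock();
            try {
                while (count >= slots.length) {
                    empty.await();       // pool is full: don't rush to create
                }
                slots[count++] = conn;
                notEmpty.signal();       // wake one waiting taker
            } finally {
                lock.unlock();
            }
        }

        @SuppressWarnings("unchecked")
        T takeLast() throws InterruptedException {
            lock.lock();
            try {
                while (count == 0) {
                    notEmpty.await();    // nothing pooled: wait
                }
                T conn = (T) slots[--count];
                slots[count] = null;     // the vacated slot becomes null
                empty.signal();          // there is room again: wake the creator
                return conn;
            } finally {
                lock.unlock();
            }
        }
    }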
9.1 Creating connections
  Let's first look at what CreateConnectionThread in DruidDataSource does. There are some guard conditions first, like the code below (unimportant parts omitted): when there are already enough connections, it waits on the empty condition. In other words: don't rush to create a connection now, wait until there is room!
    // Prevent the creation of more than maxActive connections
    if (activeCount + poolingCount >= maxActive) {
        empty.await();
        continue;
    }
    try {
        connection = createPhysicalConnection();
        setFailContinuous(false);
    // ... (catch/retry handling omitted)
    boolean result = put(connection);

What follows creates a physical connection and then puts it. That put very likely places it into the pool, so look closer. The key lines are below, with explanations in the comments:
    holder = new DruidConnectionHolder(DruidDataSource.this, physicalConnectionInfo); // wrap the new connection in a holder
    connections[poolingCount] = holder; // so this is the pool: an array of DruidConnectionHolder
    incrementPoolingCount();            // pool count + 1
    notEmpty.signal();                  // not-empty signal: any thread waiting on notEmpty may now move
    notEmptySignalCount++;
9.2 Using Connections
    So who waits on the notEmpty condition? Searching around, we find it in DruidConnectionHolder takeLast(): it waits when poolingCount is 0. That is exactly the thread that uses connections: when there are none, it just waits. And when the pool does have connections? It executes the following:
        decrementPoolingCount(); // one fewer connection in the pool, since we are taking one
        DruidConnectionHolder last = connections[poolingCount]; // take the last one in the pool
        connections[poolingCount] = null; // the vacated slot becomes null
    Now, who calls takeLast()? It turns out to be getPooledConnection(), which by its name is the main method for obtaining a connection. Look closely at the getConnection() call inside getPooledConnection: a filter chain is inserted here yet again; truly born for monitoring, everything gets recorded. And look carefully at the argument to filterChain.dataSource_connect(): it is this, meaning the data source passes itself in. That tells us the filterChain does not belong to any particular data source; it can serve this data source or that one, and whichever is to be filtered is passed in on the spot.
    When we design a filter chain whose function serves many clients, the service target should be passed in as a parameter, rather than wired up through a setProperty-style relationship. How does one design complex code? By keeping a very abstract, very clear picture like this in mind.
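That design note in code form (a hypothetical contrast, not Druid classes):

    interface Target { void connect(); }

    // (a) wired by setter: this helper can only ever serve one target
    class BoundHelper {
        private Target target;
        void setTarget(Target t) { this.target = t; }
        void serve() { target.connect(); }
    }

    // (b) passed per call: one helper serves whoever invokes it,
    //     like filterChain.dataSource_connect(this)
    class SharedHelper {
        void serve(Target target) { target.connect(); }
    }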
9.3 Shrinking the pool
Near the connection-creating thread there is also a DestroyConnectionThread(); let's take a look. Tracing through it: destroyTask.run() calls shrink(true). The name says shrink: presumably, if too many connections sit idle, some get removed.
In the shrink() method the key statements are the following, with the analysis in the comments:
        final int checkCount = poolingCount - minIdle; // pool count minus minIdle: one of the conditions for shrinking
        for (DruidConnectionHolder item : evictList) { // the evictable holders are collected here
            Connection connection = item.getConnection();
            JdbcUtils.close(connection); // close these connections
            destroyCount.incrementAndGet();
        }


9.4 Who uses these two threads? The initialization method init()
Searching, we find that the creator thread and the shrinking thread are both started from void init(). The name alone tells you this is the main entry point when the system starts up.
void init() {
    initFromSPIServiceLoader(); // load filters from the SPI ServiceLoader. I won't introduce SPI here;
                                // my dubbo analysis post already covers it. In short, the configured
                                // filters end up in the filter container (List<Filter>).
    connections = new DruidConnectionHolder[maxActive]; // the new pool, sized by the maximum active count
    for (int i = 0, size = getInitialSize(); i < size; ++i) { // pre-fill the pool with connections
        PhysicalConnectionInfo pyConnectInfo = createPhysicalConnection();
        DruidConnectionHolder holder = new DruidConnectionHolder(this, pyConnectInfo);
        connections[poolingCount] = holder;
        incrementPoolingCount();
    }
    createAndLogThread();          // judging by the name it just logs, so I'll skip it
    createAndStartCreatorThread(); // the connection-creating thread: runs forever, waits while the pool is full
    createAndStartDestroyThread(); // the pool-shrinking thread: also runs forever

    initedLatch.await(); // the main thread waits here until the counter reaches 0
    init = true;
}

One knowledge point: CountDownLatch initedLatch = new CountDownLatch(2); is a countdown synchronizer. Its count starts at 2; once it reaches 0, the main thread may proceed, otherwise it keeps waiting.
There are initedLatch.countDown() calls in both the connection-creating thread and the pool-shrinking thread, exactly two in total, so the main thread waits for those two threads to get going before it continues, and only then sets the init flag to true. That reading seems to hold up.
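A self-contained demo of that latch behavior (my own sketch):

    import java.util.concurrent.CountDownLatch;

    public class LatchDemo {
        public static void main(String[] args) throws InterruptedException {
            CountDownLatch initedLatch = new CountDownLatch(2); // two workers expected

            Runnable worker = () -> {
                // ... thread start-up work would happen here ...
                initedLatch.countDown(); // each worker decrements the count once
            };
            new Thread(worker).start();
            new Thread(worker).start();

            initedLatch.await(); // main thread blocks until the count reaches 0
            System.out.println("init = true");
        }
    }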

Now look at the creator-thread setup; it involves countDown():
    protected void createAndStartCreatorThread() {
        if (createScheduler == null) {
            String threadName = "Druid-ConnectionPool-Create-" + System.identityHashCode(this);
            createConnectionThread = new CreateConnectionThread(threadName);
            createConnectionThread.start();
            return;
        }
        initedLatch.countDown(); // if a createScheduler exists, count down here directly
    }
    public class CreateConnectionThread extends Thread {
        public void run() {
            initedLatch.countDown(); // otherwise the countdown happens inside the thread's run()


10. Summary:

So, we've read the source code. How exactly do we improve? Could we build something this good ourselves?
1. First, you need a solid base of knowledge: multithreading, the java.util.concurrent package, even the rarely used containers. The author has clearly worked through some very thick reference books, or a great deal of Java source code.
2. Study other people's source code in depth. I keep finding similar treatments across products, though never exactly the same. For example, the asynchronous-to-synchronous conversion in hadoop resembles the one in dubbo, and fail-fast turns up in flash-sale systems.
3. After reading a couple of codebases, you should be able to abstract the way a problem was handled, so that when you meet a similar situation you can apply it immediately.
[Easter egg]
After basically finishing the main code, I have a deeper appreciation of the phrase "born for monitoring". Let's pick one piece and experience the filter chain pattern in depth; this is the Easter egg of this post. The pattern really does appear everywhere, so how should we use it? There are three objects involved: the object being processed, the filter, and the filter chain. In an earlier post I made the point that the essence of an object is the combination of its relationships, and the combination is what matters most. So how should the containment relationships, reference relationships, and main methods among these objects be designed, and why?
A filter is a basic unit with the following characteristics: it does not keep a reference to a filter chain, because it can belong to different chains; it does not keep a reference to the object it handles, because it can handle this object just as well as that one. So among the three, the filter's filtering method receives the other two as parameters; they never appear among its fields or other methods. The doFilter parameters of a web filter are exactly chain, req, and res.
The filter chain is generally configurable. The filter is the basic unit, so the chain should give the user the chance to configure which filters it contains. More filters may be added later, and that need not be a user-facing feature; they can arrive through the SPI mechanism.
The filter chain is the combiner. Its characteristics: when a chain is created, the basic filters must be passed in; that is a stable reference relationship, and since a chain has multiple filters, there must be a container for them. So the filter chain holds a container, and on init the filters are put in one by one. The filters execute one after another in sequence, so there also has to be positional information: which one is executing now. Thinking a step further, different chains will have progressed to different filters (one request is at the second filter, another at the fifth), and there lies a multithreading problem. Checking the code (protected int pos = 0; if (this.pos < filterSize) do next), the pos cursor is not a ThreadLocal. Then look at public FilterChainImpl(DataSourceProxy dataSource): the data source is passed in through the constructor, so the data source supplies only the filter container, never a position cursor. Looking back, I wanted to check the web filter source: what container does it use, where does it keep its position, and is that a ThreadLocal variable?
(To follow up: I later found it in tomcat. Among the fields of org.apache.catalina.core.ApplicationFilterChain are private ApplicationFilterConfig[] filters = new ApplicationFilterConfig[0]; and private int pos = 0;, and it also offers hooks like beforeFilter and afterFilter. The author must have studied these filtering codebases; I am simply too green and have seen too little.)
It is a bit involved. Look at FilterChainImpl: it does not hold the filter units itself; the important methods, getFilters and nextFilter, all go through the dataSource that constructed the chain. What is that like? It used to be that you went to a restaurant and they supplied the food; now you bring your own ingredients and the restaurant just cooks them. It makes sense: the filter-chain rules do not have to own the basic units, just as bubble sort does not have to own the elements it sorts; the algorithm can be abstracted separately. And since the chain contains a non-ThreadLocal pos, each chain is single-use, otherwise the pos cursor would get scrambled. Searching the project for every new FilterChainImpl(), we find 4 in the pool package and the rest in the proxy package, which confirms the "gap" idea from earlier: create a proxy, and between the proxy and the proxied object insert a filter chain, or whatever else you like. For example, ClobProxyImpl has a createChain() method, and createChain must be called in every clob operation, meaning each operation gets a fresh filter chain; the pos count mentioned above is then used only within that one call, and no sharing conflict can occur.

Let's think it over again. There are three main elements: the processed object, the filter, and the filter chain. The filter chain has been split: it keeps only the filtering rules and the cursor, while the filter container is handed to the data source, processing with customer-supplied materials. The chain is created on the fly by each proxy object; more precisely, each filtered method call of each concrete proxy object creates one on the fly. From another angle, look at a single filter, at how the interface and the object are shaped: there is a pile of methods for handling connect, a pile for resultset, a pile for statement, and so on.

Wow, so many. Everything is in there: every filter is huge, comprehensive, heavyweight, and there is only one copy of each; a data source holds a handful of filters. The filter chain, by contrast, is not long and slender but short and stout. And every simple operation on any of your objects (the proxy objects, of course) spins up a filter chain. So the chain should not hold the filters; it makes perfect sense for it to hold only the simple pos count. The filter chain is a lightweight object.
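That split, sketched in code (hypothetical names; Druid's FilterChainImpl fetches the shared filter list from its dataSource in just this spirit):

    import java.util.List;

    // heavyweight filters are shared and owned by the data source;
    // the chain is a throwaway cursor over them
    class LightweightChain {
        interface ChainFilter { void doFilter(String op, LightweightChain chain); }

        private final List<ChainFilter> filters; // shared container, not owned here
        private int pos = 0;                     // per-chain cursor, deliberately not thread-safe

        LightweightChain(List<ChainFilter> filters) { this.filters = filters; }

        void proceed(String op) {
            if (pos < filters.size()) {
                filters.get(pos++).doFilter(op, this); // hand to the next filter
                return;
            }
            // end of chain: the raw object would perform the real operation here
        }

        void recycle() { pos = 0; } // like recycleFilterChain(): reset the cursor, reuse the chain
    }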

Think further: if we started from a requirement, say this monitoring requirement, and we knew we would use the filter approach, would we have organized the relationships between these objects this way? And is there room to refactor this organization further?
We may have one batch of filters about connections and another about resultsets; viewed differently, we have filters for statistics and filters for logging. We also have the filter-chain rules, and plenty of objects to filter. These can first be broken apart, for example separating the chain's rules from the chain's container, and then recombined: by object (connection statistics and connection logging together) or by function (connection logging and resultset logging together). As it stands, we plug in either the whole statistics filter or the whole logging filter. If we wanted a different combination, say statistics for connections only plus logging for queries only, Druid currently doesn't support that. It is rather like dismantling something into pieces and rebuilding with the blocks.
Another example: the filter container is currently held by the data source, so the granularity of filtering is per data source. Some filter methods target connect, some target resultset, yet all the subordinate objects of a data source share the same filter container. Is there a need for finer-grained customization, say for clob I only want logging and no statistics? Could createChain() take a parameter for that?

Also, besides the large number of filterable methods, FilterChainImpl has several wrap methods. wrap is exactly what generates, from a raw object, the proxy object that creates the filter gap, while handing it everything needed to build a filterChain, so that every method the proxy executes can spin up a new filterChain. In each concrete method of the proxy, when the filterChain is invoked, the proxy passes itself to the chain; during the chain's execution, the raw object is fetched via the proxy's getRawObject() to do the final work. If the proxy executes three different methods in a row, a new filterChain is created only the first time: after a chain finishes, recycleFilterChain(chain) resets the pos count, and the next two methods reuse the filterChain the proxy already holds, with the count back at 0.

Think of it this way: I have a raw object; FilterChainImpl's wrap gives me a proxy for it; the proxy is then handed back into FilterChainImpl's concrete methods, which filter the call and finally pull out the raw object to do the work. An analogy: I have a stainless steel cup. Someone wraps it in clay to make a clay cup, and then others fire it, glaze it, and paint it, until it becomes a beautiful ceramic cup. The drinking function is still provided by the original steel cup, because it was a cup and not a kitchen knife. Of course a kitchen knife could also be wrapped in clay, fired, glazed, and painted, ending up a ceramic knife. Wrapping, firing, glazing, and painting are different businesses, but one shop can offer them all.

To summarize again:
For druid: you configure a data source, which holds a set of filter parts and a connection pool, and inside the pool two main threads keep running. Whenever I perform any operation on any object, wrap it for me first, then create (or reset) a filter chain and run the filters before carrying out the operation itself.

