Join query optimization based on deep reinforcement learning

Krishnan, S., et al. (2018). "Learning to optimize join queries with deep reinforcement learning."

How to order SQL joins is a problem the database community has studied for decades. A study from Berkeley's RISELab shows that deep reinforcement learning can be applied successfully to join-order optimization.

This paper shows how deep reinforcement learning can be used to attack this decades-old challenge. The authors formulate the join-ordering problem as a Markov Decision Process (MDP) and build an optimizer that uses a Deep Q-Network (DQN) to order joins effectively. The method is then evaluated on the Join Order Benchmark, a recently proposed workload designed specifically to stress-test join optimization.

 

Background

The database community has studied SQL query optimization for nearly 40 years, going back to System R's dynamic programming approach. The core of query optimization is the join-ordering problem. Although the problem is old, many research projects are still trying to better understand how join optimizers behave on multi-join queries, or to handle the very large join queries that are ubiquitous in enterprise "data lakes".

 

RISELab

 

UC Berkeley's AMPLab was once one of the world's top laboratories in the field of big data. Over roughly six years it produced a number of major technological innovations, such as Apache Spark, Apache Mesos, and Alluxio. It has since been wound down and succeeded by RISELab, which focuses on Secure Real-time Decision Stacks (SRDS). Its well-known projects include Ray and Drizzle.

The traditional dynamic programming approach

 

Let us first review the traditional dynamic programming (DP) method.

Suppose a database contains three tables: Employees, Salaries, and Taxes. The following query finds the "total tax paid by all Manager 1 employees":

```sql
SELECT SUM(S.salary * T.rate)
FROM Employees as E, Salaries as S, Taxes as T
WHERE E.position = S.position AND
      T.country = S.country AND
      E.position = 'Manager 1'
```

This query joins three relations. In the following example, we use J(R) to denote the cost of accessing a base relation R, and J(T1, T2) to denote the cost of joining T1 and T2. For simplicity, we assume the physical execution layer has only one access method, one join method, and a symmetric join cost (i.e., J(T1, T2) = J(T2, T1)).

A brief introduction to left-deep, right-deep, and bushy join trees:

Left-deep: every right child must be a base relation; there is no requirement on the left child.

Right-deep: every left child must be a base relation; there is no requirement on the right child.

Bushy: any tree that is neither left-deep nor right-deep.

Zig-zag (VLDB '93): at least one child of every join is a base relation, so this class contains both the left-deep and right-deep trees.

 

The classic "left-deep" DP method first computes the best cost of accessing each of the three base relations and records the results in a table:

Remaining Relations | Joined Relations | Best
--- | --- | ---
{E, S} | {T} | J(T), i.e., scan cost of T
{E, T} | {S} | J(S)
{T, S} | {E} | J(E)

Then, it enumerates all pairs of relations based on this information. For example, when computing the optimal cost of joining {E, S}, it looks up the previously computed results:

Best({E, S}) = Best({E}) + Best({S}) + J({E}, S)

This yields a new row:

Remaining Relations | Joined Relations | Best
--- | --- | ---
{E, S} | {T} | J(T), i.e., scan cost of T
{E, T} | {S} | J(S)
{T, S} | {E} | J(E)
{T} | {E, S} | Best({E}) + Best({S}) + J({E}, S)

The algorithm iterates over the remaining pairs of relations and finally finds the best cost of joining all three tables. This requires taking the minimum over all possible "left-deep" combinations of a pair of relations with a base relation:

Remaining Relations | Joined Relations | Best
--- | --- | ---
{E, S} | {T} | J(T), i.e., scan cost of T
{E, T} | {S} | J(S)
{T, S} | {E} | J(E)
{T} | {E, S} | Best({E}) + Best({S}) + J({E}, S)
{E} | {T, S} | Best({T}) + Best({S}) + J({T}, S)
{S} | {E, T} | Best({E}) + Best({T}) + J({E}, T)
{} | {E, S, T} | min { Best({E,T}) + J(S) + J({E,T}, S), Best({E,S}) + J(T) + J({E,S}, T), Best({T,S}) + J(E) + J({T,S}, E) }

This is the dynamic programming method. With N relations, the space and time complexity of this algorithm grows exponentially in N, which is why it is usually only applied to join queries with fewer than about 10 relations.
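To make the table-filling procedure above concrete, here is a minimal Python sketch of left-deep dynamic programming over the three relations. The scan costs and the join_cost function are placeholder values standing in for a real cost model, not figures from the paper.

```python
from itertools import combinations

# Placeholder costs standing in for a real cost model (J(R) and J(T1, T2) above).
SCAN_COST = {"E": 10, "S": 20, "T": 30}

def join_cost(left_plan, right_rel):
    # Stand-in for J(T1, T2); assumed symmetric and constant for illustration only.
    return 100

def left_deep_dp(relations):
    # best[set of relations] = (cheapest cost, left-deep plan) for joining that set
    best = {frozenset([r]): (SCAN_COST[r], r) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in combinations(relations, size):
            s = frozenset(subset)
            candidates = []
            for r in s:  # r is the base relation used as the right input of the last join
                rest = s - {r}
                rest_cost, rest_plan = best[rest]
                cost = rest_cost + SCAN_COST[r] + join_cost(rest_plan, r)
                candidates.append((cost, "(" + rest_plan + " JOIN " + r + ")"))
            best[s] = min(candidates)  # Best(s) = minimum over all choices of last join
    return best[frozenset(relations)]

print(left_deep_dp(["E", "S", "T"]))  # e.g. (cost, "((E JOIN S) JOIN T)")
```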

Shortcomings of existing methods

The join-order selection problem is NP-hard, and many heuristics exist to help find a good join order efficiently. These heuristics are easy to reason about under a linear cost model (join cost linear in the size of the inputs). In many real systems, however, cost models are highly nonlinear: intermediate results that exceed memory spill to disk, or a relation crossing a size threshold changes the underlying physical operator (e.g., switching to a sort-merge join or hash join). It is therefore easy to find cases where the classic heuristics fail. In the figure below, "Index-Mostly" on the left represents a setting where essentially all data fits in memory; there, left-deep search performs acceptably. In the slightly more complicated scenarios on the right, a hybrid-hash setting in the middle where data is partially spilled to disk, and a hash-table-reuse setting on the right, left-deep search performs poorly.

 

 

Other methods discussed in the paper:

QuickPick (British National Conference on Databases, 2000): randomly generates 1,000 feasible join plans and uses a cost estimator to pick the best of the 1,000 as the output. QuickPick-1000 can produce a join plan quickly, and because it does not restrict the search space it may also produce a poor plan (see the sketch after this list).

IK-KBZ (VLDB '86): treats the left-deep tree as a sequence and finds the optimal sequence in polynomial time. It assumes a relatively simple cost model (the cost function must satisfy the Adjacent Sequence Interchange (ASI) property), from which a rank value is computed; each time a relation is chosen to join, the ranks of all remaining relations are recomputed and the lowest rank is picked greedily. It is best suited to acyclic query graphs, and plan quality on cyclic query graphs is poor. The intuition is to keep the intermediate result of the previous join as small as possible.
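As a rough illustration of the QuickPick idea, the sketch below samples random join orders and keeps the cheapest according to a caller-supplied estimator. It is simplified to random left-deep orders rather than random walks over the query graph, and plan_cost is a hypothetical cost-model hook, not part of the paper.

```python
import random

def quickpick(relations, plan_cost, num_samples=1000):
    """Sample random join orders and keep the cheapest one found by the estimator."""
    best_order, best_cost = None, float("inf")
    for _ in range(num_samples):
        order = list(relations)
        random.shuffle(order)       # one randomly chosen feasible join order
        cost = plan_cost(order)     # hypothetical cost estimator supplied by the caller
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost
```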

 

Method

 

Join ordering via reinforcement learning

Formulating the join-ordering problem as an MDP

  1. Query graph G = (V, E): each vertex v_i is a relation and each edge e_i is a join condition between relations — the state.
  2. Join c = (v_i, v_j): joining two relations — the action.
  3. Next query graph G': the new query graph produced after applying join c (the join's intermediate result becomes a new vertex) — the next state.
  4. J_c: the cost of join c — the reward.

A join-ordering step can thus be expressed as a tuple (G, c, G', J). The whole ordering process repeatedly selects two relations from the query graph to join until only one relation remains.

First consider a greedy algorithm. Suppose there are three relations E, P, S and the following query:

 

 

The join costs are as shown above. The greedy algorithm makes the locally optimal choice at each step, so it picks the join order S-P-E with a total cost of 140, whereas the optimal join order is E-P-S with a cost of 110. The time complexity of this algorithm is O(N^3), clearly better than DP in running time, but the plans it produces are often poor.

Reinforcement learning resembles the greedy algorithm. The difference is that the greedy algorithm picks the choice with the smallest immediate cost, while reinforcement learning picks the choice with the smallest long-term cost, i.e., the choice that benefits the whole ordering process the most. In the example above, reinforcement learning would know from the start that the long-term cost of joining E and P is smallest, so it can reach the optimal solution. The difficulty lies in training a function that evaluates the long-term cost of each candidate choice. In Q-learning (a popular RL technique) this function is called the Q-function, Q(G, c); intuitively, it describes the long-term cost of a join: the cumulative cost if, after the current join decision, we take the best possible action for every subsequent join.

Q(G, c) = J(c) + min_{c'} Q(G', c')

 

If we had access to the true Q-function, we could perform greedy join ordering:

Algorithm 1

  1. Start with the initial query graph;
  2. Find the join with the lowest Q(G, c);
  3. Update the query graph and repeat.

By Bellman's principle of optimality, this procedure can be shown to be optimal. What is appealing about this algorithm is that its computational complexity is O(n^3): still high, but far below the exponential runtime of dynamic programming.
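A minimal sketch of Algorithm 1, assuming a Q-function q_fn(G, c) (true or learned) is available; join_candidates and apply_join are hypothetical helpers that enumerate the valid joins on the current query graph and apply one of them.

```python
def plan_with_q(query_graph, q_fn, join_candidates, apply_join):
    """Greedy join ordering guided by a Q-function (Algorithm 1)."""
    plan = []
    while len(join_candidates(query_graph)) > 0:
        # Step 2: score every candidate join and pick the lowest long-term cost.
        c = min(join_candidates(query_graph), key=lambda cand: q_fn(query_graph, cand))
        plan.append(c)
        # Step 3: replace the two joined vertices with their intermediate result.
        query_graph = apply_join(query_graph, c)
    return plan
```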

 

Figure 1: Using a neural network to approximate the Q-function. The output means: "If we perform join c on the current query graph G, what is the minimum long-term cost of the resulting join plan?"

Of course, we do not actually have access to the true Q-function, so we have to approximate it. To do so, we use a neural network (NN) to learn the Q-function, and call it a Deep Q-Network (DQN). This is the same technique used in AlphaGo to learn to play Go from expert games. In summary, our goal is to train a neural network that takes (G, c) as input and outputs an estimate of Q(G, c), as shown in Figure 1.

Deep reinforcement learning optimizer DQ

We now introduce our deep reinforcement learning optimizer, DQ.

 

Data collection

To learn the Q-function, we first need to observe past execution data. DQ can accept a list of (G, c, G', J) tuples from any underlying optimizer. For example, we can run classic left-deep dynamic programming (as in the Background section) and derive a set of "join trajectories" from the DP table. A tuple in a complete trajectory looks like (G, c, G', J) = ({E, S, T}, join(S, T), {E, ST}, 110), which represents the step of starting from the initial query graph (the state) and joining S and T together (the action).

We have used J to denote the estimated cost of a join, but if the data is collected from real database executions we can also use actual runtimes.

When a query joins no more than 10 relations, DQ uses bushy DP to collect training data; otherwise it falls back to a greedy algorithm.
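As an illustration of this data-collection step, the sketch below turns one optimizer trace, a sequence of (G, c, G', J) transitions for a single query, into training examples whose label is the long-term cost: the cost of this join plus all joins that follow it. This is a minimal sketch of the idea, not the paper's exact pipeline.

```python
def label_trajectory(transitions):
    """transitions: list of (G, c, G_next, cost) tuples in plan order for one query.
    Returns (state, action, long_term_cost) training examples for the Q-network."""
    examples = []
    cost_to_go = 0.0
    for G, c, G_next, cost in reversed(transitions):
        cost_to_go += cost                   # accumulate costs from the end of the plan backwards
        examples.append((G, c, cost_to_go))  # label = cost of c plus all subsequent joins
    examples.reverse()                       # restore the original plan order
    return examples
```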

 

 

Featurizing states and actions

Since a neural network represents Q(G, c), we need to feed the state G and the action c into the network as fixed-length feature vectors. DQ's featurization scheme is very simple: we use one-hot vectors to encode (1) the set of attributes present in the query graph, out of all attributes in the schema, (2) the attributes participating in the left side of the join, and (3) the attributes on the right side of the join, as shown in Figure 2.

 

Figure 2: A query and its featurization. We assume a database containing three tables: Employees, Positions, and Salaries. The figure shows a partial join and a full join. The final feature vector for (G, c) is the concatenation of A_G (attributes of the query graph), A_L (attributes of the left side), and A_R (attributes of the right side).

Although this scheme is very simple, it is expressive enough. Note that the featurization (and the learned network) assumes a fixed database, because it needs to know the exact sets of attributes and tables.

 

In the figure above, (a) adds predicate selectivities to the query-graph features, and (b) adds physical-operator indicators to the join features.
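A minimal sketch of the one-hot featurization described above, assuming a small, fixed list of schema attributes taken from the Background example; the selectivity and physical-operator features shown in the figure are omitted here.

```python
import numpy as np

# Fixed schema assumed known in advance (attributes from the Background example).
ALL_ATTRIBUTES = ["E.position", "S.position", "S.salary", "S.country", "T.country", "T.rate"]

def one_hot(attrs):
    vec = np.zeros(len(ALL_ATTRIBUTES), dtype=np.float32)
    for a in attrs:
        vec[ALL_ATTRIBUTES.index(a)] = 1.0
    return vec

def featurize(graph_attrs, left_attrs, right_attrs):
    """Concatenate A_G (attributes in the query graph), A_L and A_R (join sides)."""
    return np.concatenate([one_hot(graph_attrs), one_hot(left_attrs), one_hot(right_attrs)])

# Example: the join of E and S on position, within the full query graph.
x = featurize(graph_attrs=ALL_ATTRIBUTES,
              left_attrs=["E.position"],
              right_attrs=["S.position"])
print(x.shape)  # (18,): three concatenated one-hot vectors of length 6
```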

 

Neural network training and planning

By default, DQ uses a simple two-layer fully connected network trained with standard stochastic gradient descent. After training, DQ accepts a SQL query as plain text, parses it into an abstract syntax tree, featurizes the tree, and invokes the neural network whenever a candidate join needs to be scored (i.e., the call to Q(G, c) in step 2 of Algorithm 1). Finally, feedback from actual executions can be used to periodically fine-tune DQ.

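A minimal PyTorch sketch of such a two-layer fully connected Q-network trained with stochastic gradient descent; the hidden size, learning rate, and MSE loss are illustrative assumptions rather than the paper's exact hyperparameters.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a featurized (G, c) vector to an estimated long-term join cost Q(G, c)."""
    def __init__(self, feature_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def train_q_network(model, features, labels, epochs=100, lr=1e-3):
    """features: (N, feature_dim) tensor; labels: (N,) long-term costs from the trajectories."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        optimizer.step()
    return model
```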

 

Results

For the evaluation, the paper uses the Join Order Benchmark (JOB). The underlying database consists of 21 tables from IMDB, and the benchmark provides 33 query templates and 113 queries. The number of relations joined per query ranges from 5 to 15.

The experiments mainly answer three questions:

How good are the plans DQ produces, and under what conditions?

How efficient is DQ in terms of planning latency and the amount of training data it needs?

Does the DQ approach apply to real-world scenarios, systems, and workloads?

To answer the first two questions, experiments were run on DQ in a standalone setting. The last question was evaluated with end-to-end experiments integrating DQ into Postgres and SparkSQL.

 

Note: 1) The figure above compares the average plan suboptimality under different cost models. 2) CostModel1 (CM1) models an in-memory database and favors index joins (hash joins are also available); CostModel2 (CM2) considers only hash joins and nested-loop joins under a memory budget; CostModel3 (CM3) additionally accounts for reusing already-built hash tables. 3) The experiment used 4-fold cross-validation so that DQ was only evaluated on queries that did not appear in its training workload (in each fold, training on 80 queries and testing on 33). We report the mean suboptimality over queries, i.e., cost(algorithm's plan) / cost(optimal plan); lower is better.

Conclusion: 1) Under all cost models, DQ approaches the optimal solution without exponential-time prior knowledge. This is not the case for the fixed dynamic programming variants: for example, left-deep produces good plans under CM1 but does much worse under CM2 and CM3; conversely, right-deep plans are not competitive under CM1 but become much less bad under CM2 and CM3. 2) DQ adapts better to changes in the workload, data, or cost model.

 

Note: The figure above shows optimizer latency on all 113 JOB queries, grouped by the number of relations in the query. Error bars indicate standard deviations around the mean. A total of 5 trials were run.

Conclusion: 1) On large joins, DQ achieves large speedups: for the largest joins, DQ is 10,000x faster than exhaustive enumeration, 1,000x faster than zig-zag, and 10x faster than left-deep and right-deep enumeration. 2) With GPU or TPU acceleration, the advantage would be even larger.

 

Note: The figure above shows how the quality of DQ's execution plans changes as the number of training queries increases. The dotted line is the performance of the QuickPick algorithm.

Conclusion: Once the number of training queries exceeds 30 (which roughly covers all the relations), DQ outperforms QuickPick.

 

In fact, DQ can be trained on data from small join queries and then tested on large join queries.

 

Note: 1) The figure above shows the suboptimality of DQ's plans under different kinds of training sets (smaller mean relative cost is better). 2) The test set contains join queries with up to 15 relations.

Conclusion: DQ trained only on join queries with at most 9 relations still achieves near-optimal results, but training on even smaller joins degrades noticeably because the training data no longer covers the schema well.

 

Note: 1) The figure above shows the suboptimality of DQ's plans (smaller is better) as the relevance of the training data varies. 2) "Relevance" here means how closely the training set resembles the test set. 3) R80: the join relations and join predicates of 80 training queries are generated randomly; R80wp: the join relations of 80 training queries are random, but the join predicates come from JOB; WK80: 80 queries sampled from JOB; T80: the training set covers all 33 JOB query templates. From left to right the data becomes more and more relevant.

Conclusion: 1) Even randomly generated data gives good results, so DQ does not need to know the workload in advance. 2) As the relevance of the training data increases, DQ performs better, meaning that the more relevant the data, the less training data DQ needs to reach the same quality.

 

Note: 1) The upper-left figure compares DQ with the Postgres optimizer, in terms of both runtime and optimization latency. 2) In this experiment Postgres's optimizer was configured to use bushy DP when collecting training data, while the comparison baseline remained Postgres's default left-deep DP. 3) The figure on the right shows the improvement obtained when DQ's plan space is restricted to LD, compared with EX (the figure on the left).

Conclusion: 1) In terms of runtime, some DQ plans are significantly faster than Postgres's, with an average improvement of about 14%. 2) In terms of optimization latency, DQ is about 3x faster than Postgres on queries with many joins. 3) Even when DQ's plan space is restricted to LD, execution time still improves; the reason may be that DQ smooths out inconsistencies in the optimizer's cost-estimation errors.

 

Note: 1) The figure above compares DQ with Spark's optimizer on TPC-DS, in terms of both runtime and optimization latency. 2) The Spark optimizer in this experiment uses bushy DP.

Conclusion: 1) In terms of runtime, DQ is no worse than Spark's bushy DP. 2) In terms of optimization latency, the point in the lower-right corner is an 18-relation join, where DQ has a clear advantage (about 250x). The runtime gains are limited mainly because TPC-DS queries join relatively few relations.

 

Note: 1) The figure above compares the performance of Postgres and several DQ variants on query Q10c. 2) DQ refers to offline learning using only data generated by Postgres's cost estimator; DQ + Fine-tuned additionally performs online learning on real execution data; Online-DQ learns from real execution data only.

Conclusion: 1) Fine-tuning the offline-trained model with real runtimes makes it more accurate: after fine-tuning on roughly 100 real executions, the plan produced by DQ runs about 3x faster than before fine-tuning and about 3.5x faster than Postgres. This suggests the model keeps improving once deployed in real scenarios. 2) Learning from real data alone converges slowly because there is much less data, which is why its performance in the figure is poor.

 

Data sensitivity

 

Note: 1) The table above shows the sensitivity of QuickPick and DQ to the training data (i.e., how stable the model is when the training set changes); smaller is better. 2) In this experiment, 5 different training sets were used to train 5 DQ models, each tested on the same 20 fixed queries.

Conclusion: Compared with the fully random QuickPick algorithm, DQ is less sensitive to the training data.

 

Cardinality error sensitivity

 

Note: 1) The table above compares each algorithm's sensitivity to cardinality estimation errors (errors introduced by the database's cardinality estimator); smaller is better. 2) N means that N relations are chosen at random and their cardinalities are multiplied by a random factor. 3) DQ is trained on the noisy data and then tested with the true cardinalities; methods such as dynamic programming use the noisy cardinalities directly. 4) The values in the table are logarithms of the number of physical I/Os.

Conclusion: DQ is no more sensitive to cardinality errors than KBZ.

 

Ablation study

 

Note: The table above shows the effect on the query loss of removing some features from the model input (f_G + f_c), for example removing f_G entirely or removing the selectivity information from f_G.

Conclusion: Removing these features leads to a larger loss, so they all contribute to DQ's performance.

 

Conclusion

Using reinforcement learning for join-order selection has clear advantages on queries with many joins: it finds better query plans faster, and because the model keeps adapting as it is deployed in real applications, the benefits grow over time.

But the current work also has limitations: the paper only considers inner joins on foreign-key relationships, and the approach needs to be extended to more general settings.


Source: blog.csdn.net/Fei20140908/article/details/109668569