PieCloudDB Database's new-generation optimizer "Daqi": designed for cloud-native and distributed scenarios

Recently, the PostgreSQL China Technology Conference kicked off in Hangzhou. An annual event in the PostgreSQL community, the conference has been held for 12 consecutive years and provides an open platform for cooperation, sharing, and mutual help among database technology enthusiasts. This year's conference was organized around themes such as safety and reliability, breakthroughs, and evolution, and it brought together many industry and technical experts to discuss technologies and exchange ideas.

Guo Feng, a technical expert at Tuoshupai, a Day-1 quasi-unicorn in China's cloud data and data computing field, was invited to attend the conference and deliver a keynote speech.

In his speech, Guo Feng introduced "Daqi", a new optimizer built for PieCloudDB Database. The name "Daqi" comes from "Red Dead Redemption", a game popular among young people: the catchphrase of an NPC named "Daqi" in the game is "I have a plan", which happens to match the optimizer's main job of producing plans.

The optimizer is a key component of a database system: it parses and optimizes user queries and generates execution plans for them, so that query results are returned as quickly and efficiently as possible. It improves query performance by producing an optimal execution plan, and the quality of that plan can make a performance difference of hundreds of times. "Daqi", the optimizer built by PieCloudDB, implements a large number of optimization features and, as the "think tank" of the database system, helps PieCloudDB improve performance.

Similar to PostgreSQL, PieCloudDB's query optimization process is generally divided into four stages: the preprocessing stage, the scan/join optimization stage, the optimization stage for operations other than scans and joins, and the post-processing stage. "Daqi" implements many optimizations in each of these four stages.

In the preprocessing stage, "Daqi" rewrites the query tree into a simpler, more efficient form through logically equivalent transformations. Because the statistics needed to estimate costs are not yet available at this point, this stage generally applies proven heuristic rules, such as distributing constraints, simplifying expressions and the join tree, and eliminating useless joins.

  • Convert IN, EXISTS, and similar subqueries into semi-joins

PieCloudDB divides subqueries into sublinks (SubLink) and subqueries (SubQuery) according to their position and role. Sublinks appear in conditions such as WHERE/ON and are usually accompanied by predicates such as ANY/ALL/EXISTS. If the executor evaluated them as sublinks, query efficiency would suffer, and because they are planned as separate sub-plans (SubPlan), the room for optimization is limited. Therefore, in the preprocessing stage, PieCloudDB converts sublinks into semi-joins or anti-joins whenever possible, opening up more room for optimization.

Take the following SQL query as an example:

SELECT … FROM foo WHERE EXISTS (SELECT 1 FROM bar WHERE foo.a = bar.c);

Here, the EXISTS clause is a sublink, and PieCloudDB turns it into a semi-join in the preprocessing stage:

SELECT ... FROM foo SEMI JOIN bar ON foo.a = bar.c;
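To make the semantics concrete, here is a minimal Python sketch that models tables as lists of dicts (a hypothetical in-memory representation, not PieCloudDB's executor). It contrasts evaluating EXISTS once per outer row, as a SubPlan would, with a hash semi-join that builds a key set on bar.c once and emits each foo row at most once. As in SQL, NULL keys never match.

```python
def sql_eq(x, y):
    """SQL '=' for join keys: NULL (None) never matches anything."""
    return x is not None and y is not None and x == y


def exists_subplan(foo, bar):
    # SubPlan-style evaluation: re-scan bar for every foo row.
    return [f for f in foo if any(sql_eq(f["a"], b["c"]) for b in bar)]


def hash_semi_join(foo, bar):
    # Semi-join: hash bar.c once, then probe; duplicate keys in bar do not duplicate foo rows.
    keys = {b["c"] for b in bar if b["c"] is not None}
    return [f for f in foo if f["a"] in keys]


foo = [{"a": 1}, {"a": 2}, {"a": None}, {"a": 3}]
bar = [{"c": 1}, {"c": 1}, {"c": 3}, {"c": None}]
print(hash_semi_join(foo, bar))  # [{'a': 1}, {'a': 3}]
```

Both functions return the same rows; the semi-join form is what gives the planner freedom to choose a join algorithm and order instead of re-running a sub-plan.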
  • Pull up subqueries

Subqueries that appear in the FROM clause are SubQuery items. Without optimization, such a subquery is planned separately, producing a subquery scan that is then joined with the parent query; this often misses the optimal plan and leads to a higher query cost.

In the following example, without optimization, bar and baz are joined first. Since there is no join condition between them, they form a Cartesian product, which is then joined with foo outside the subquery:

SELECT * FROM foo JOIN (SELECT bar.c FROM bar JOIN baz ON TRUE) AS sub ON foo.a = sub.c;

After the subquery is pulled up, bar and baz are at the same level as foo, so the optimizer can join foo and bar first and then join the result with baz, which costs less:

SELECT * FROM foo JOIN (bar JOIN baz ON TRUE) ON foo.a = bar.c;
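The flattening step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `Rel`/`Subquery` classes, the `pullable` flag, and the naive text substitution of `sub.c` with `bar.c` are hypothetical stand-ins for a real planner's query-tree structures. `ON TRUE` contributes no join condition, so the subquery's own condition list is empty.

```python
from dataclasses import dataclass, field


@dataclass
class Rel:
    name: str


@dataclass
class Subquery:
    items: list                                  # FROM items inside the subquery
    quals: list                                  # join/WHERE conditions inside it
    outputs: dict = field(default_factory=dict)  # alias column -> real column
    pullable: bool = True                        # hypothetical: False if it has aggregates, LIMIT, ...


def substitute(qual, outputs):
    # Naive text rewrite of references to the subquery's output columns.
    for alias_col, real_col in outputs.items():
        qual = qual.replace(alias_col, real_col)
    return qual


def pull_up(items, quals):
    """Flatten pullable FROM-clause subqueries into a single list of join items."""
    flat_items, flat_quals = [], list(quals)
    for item in items:
        if isinstance(item, Subquery) and item.pullable:
            sub_items, sub_quals = pull_up(item.items, item.quals)
            flat_items.extend(sub_items)
            flat_quals = [substitute(q, item.outputs) for q in flat_quals] + sub_quals
        else:
            flat_items.append(item)
    return flat_items, flat_quals


# FROM foo JOIN (SELECT bar.c FROM bar JOIN baz ON TRUE) AS sub ON foo.a = sub.c
sub = Subquery(items=[Rel("bar"), Rel("baz")], quals=[], outputs={"sub.c": "bar.c"})
items, quals = pull_up([Rel("foo"), sub], ["foo.a = sub.c"])
print([r.name for r in items], quals)  # ['foo', 'bar', 'baz'] ['foo.a = bar.c']
```

Once foo, bar, and baz sit in one flat list, the join search is free to pick foo ⋈ bar as the first join.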
  • Convert outer joins into inner joins/anti-joins

Outer joins impose many restrictions on predicate pushdown and join order search, so "Daqi" tries to convert outer joins into inner joins (Inner Join) or anti-joins (Anti Join) in the preprocessing stage.

In the following SQL statement, the LEFT JOIN produces NULL-extended tuples for foo rows that have no match in bar, and the equality in the WHERE condition is a strict predicate: if its input is NULL, its result is NULL or FALSE. When the bar columns are NULL, bar.d = 42 evaluates to NULL, which WHERE treats as not true, so the row is filtered out. In other words, every NULL-extended tuple produced by the LEFT JOIN is removed by the WHERE condition, and the LEFT JOIN is semantically an INNER JOIN.

SELECT ... FROM foo LEFT JOIN bar ON (...) WHERE bar.d = 42;

The PieCloudDB "Daqi" optimizer automatically recognizes such queries and takes advantage of this optimization opportunity by converting the outer join into an inner join:

SELECT ... FROM foo INNER JOIN bar ON (...) WHERE bar.d = 42; 
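The equivalence can be checked with a small Python sketch that models SQL three-valued logic, using `None` as NULL. The table contents and the join condition `foo.a = bar.c` are hypothetical, since the original query elides its ON clause.

```python
BAR_COLS = ("c", "d")


def sql_eq(x, y):
    """SQL '=' under three-valued logic: returns None (NULL) if either side is NULL."""
    if x is None or y is None:
        return None
    return x == y


def on(f, b):
    return sql_eq(f["a"], b["c"])  # hypothetical join condition foo.a = bar.c


def bar_d_is_42(r):
    return sql_eq(r["d"], 42)      # strict: NULL in, NULL out


def left_join(foo, bar):
    out = []
    for f in foo:
        matches = [b for b in bar if on(f, b) is True]
        if matches:
            out.extend({**f, **b} for b in matches)
        else:
            out.append({**f, **{col: None for col in BAR_COLS}})  # NULL-extended tuple
    return out


def inner_join(foo, bar):
    return [{**f, **b} for f in foo for b in bar if on(f, b) is True]


def where(rows, pred):
    # WHERE keeps a row only when the predicate is TRUE; FALSE and NULL are both dropped.
    return [r for r in rows if pred(r) is True]


foo = [{"a": 1}, {"a": 2}, {"a": 3}]
bar = [{"c": 1, "d": 42}, {"c": 2, "d": 7}]
print(where(left_join(foo, bar), bar_d_is_42))  # [{'a': 1, 'c': 1, 'd': 42}]
```

The NULL-extended row for `a = 3` is discarded by the strict predicate, so the result matches the inner join exactly.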

In some cases, the PieCloudDB optimizer instead converts outer joins into anti-joins in the preprocessing stage. Take the following SQL statement as an example:

SELECT * FROM foo LEFT JOIN bar ON foo.a = bar.c WHERE bar.c IS NULL; 

As in the previous example, the LEFT JOIN produces NULL-extended tuples. Here, however, the WHERE condition keeps only the rows where bar.c is NULL. Because the join condition foo.a = bar.c can only match bar rows with a non-NULL bar.c, the surviving rows are exactly the NULL-extended tuples, that is, the foo rows with no match in bar. Semantically, this LEFT JOIN is therefore an anti-join. PieCloudDB automatically detects this optimization opportunity in the preprocessing stage and converts the outer join into an anti-join:

SELECT * FROM foo ANTI JOIN bar ON foo.a = bar.c;
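A minimal Python sketch of this rewrite, again with `None` standing for NULL and hypothetical sample rows, shows that "LEFT JOIN ... WHERE bar.c IS NULL" (projected to foo's columns) and an anti-join return the same rows, including a foo row whose key is NULL.

```python
def sql_eq(x, y):
    """SQL '=' under three-valued logic: None (NULL) if either side is NULL."""
    if x is None or y is None:
        return None
    return x == y


def on(f, b):
    return sql_eq(f["a"], b["c"])  # foo.a = bar.c, strict in bar.c


def left_join_is_null(foo, bar):
    # LEFT JOIN ... WHERE bar.c IS NULL, projected back to foo's columns.
    out = []
    for f in foo:
        matches = [b for b in bar if on(f, b) is True]
        rows = [{**f, **b} for b in matches] or [{**f, "c": None}]  # NULL-extended
        out.extend({"a": r["a"]} for r in rows if r["c"] is None)
    return out


def anti_join(foo, bar):
    # Emit each foo row that has no matching bar row.
    return [f for f in foo if not any(on(f, b) is True for b in bar)]


foo = [{"a": 1}, {"a": 2}, {"a": None}]
bar = [{"c": 1}, {"c": None}]
print(anti_join(foo, bar))  # [{'a': 2}, {'a': None}]
```

Matched rows always carry a non-NULL bar.c, so the IS NULL filter can only keep unmatched foo rows, which is the definition of an anti-join.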

Beyond these, "Daqi" implements several other optimizations in the preprocessing stage, including:

  • Distributing constraints

  • Building equivalence classes

  • Collecting outer join information

  • Eliminating useless joins

  • Simplifying expressions

      etc.
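Building equivalence classes can be illustrated with a union-find sketch in Python. PostgreSQL groups expressions known to be equal into EquivalenceClass structures; the class below is a simplified stand-in, not PostgreSQL's or PieCloudDB's actual data structure, and the predicates (including the hypothetical `bar.c = baz.d` and `baz.d = 42`) are invented for illustration. The payoff is implied equalities such as `foo.a = 42`, which can filter foo directly even though the query never states it.

```python
from collections import defaultdict
from itertools import combinations


class EquivalenceClasses:
    """Union-find over expressions (columns or constants) known to be equal."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_equality(self, left, right):
        self.parent[self.find(left)] = self.find(right)

    def classes(self):
        groups = defaultdict(set)
        for member in list(self.parent):
            groups[self.find(member)].add(member)
        return [frozenset(g) for g in groups.values()]

    def implied_equalities(self):
        """Every pairwise equality a class implies, including ones never written in the query."""
        return {frozenset(pair) for cls in self.classes()
                for pair in combinations(sorted(cls), 2)}


ec = EquivalenceClasses()
ec.add_equality("foo.a", "bar.c")  # ON foo.a = bar.c
ec.add_equality("bar.c", "baz.d")  # ON bar.c = baz.d
ec.add_equality("baz.d", "42")     # WHERE baz.d = 42
print(frozenset({"foo.a", "42"}) in ec.implied_equalities())  # True
```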

The scan/join optimization stage is arguably the most complex stage of optimizer processing. In this stage, "Daqi" is driven by cost estimates as it processes the FROM and WHERE parts of the query, and it also takes ORDER BY information into account.

In this stage, the work of "Daqi" can be divided into two main steps. First, it generates scan paths for the base tables and estimates the cost of each scan path and the size of its result set; these estimates are then used to cost the subsequent joins. Second, "Daqi" searches the join order space to generate the optimal join path. This step is very complex (on the order of n!), so PieCloudDB uses two algorithms, dynamic programming and a genetic algorithm, and chooses between them based on a GUC setting. If the query involves outer joins, their restrictions on join order mean that the joins cannot be reordered as freely as inner joins, which further increases the complexity of this step.
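The dynamic-programming side of the join search can be sketched as an exhaustive search over relation subsets. This is a toy under stated assumptions: the row counts and the selectivity of `foo.a = bar.c` are hypothetical, the cost model simply sums intermediate result sizes, and the genetic algorithm is not sketched. In PostgreSQL the switch to the genetic optimizer is governed by the `geqo` and `geqo_threshold` GUCs (default threshold 12); assuming PieCloudDB behaves the same way is an assumption of this sketch.

```python
from itertools import combinations

# Hypothetical statistics: base-table row counts and join-predicate selectivities.
ROWS = {"foo": 1000, "bar": 1000, "baz": 1000}
SELECTIVITY = {frozenset({"foo", "bar"}): 0.0001}  # foo.a = bar.c; baz has no join predicate

# Mirrors PostgreSQL's default geqo_threshold; PieCloudDB's actual setting is an assumption.
GEQO_THRESHOLD = 12


def join_selectivity(left, right):
    """Combined selectivity of predicates linking two relation sets (1.0 = Cartesian product)."""
    sel = 1.0
    for pair, s in SELECTIVITY.items():
        a, b = tuple(pair)
        if (a in left and b in right) or (a in right and b in left):
            sel *= s
    return sel


def dp_join_search(relations):
    """Exhaustive DP over relation subsets; cost = sum of intermediate result sizes."""
    if len(relations) >= GEQO_THRESHOLD:
        raise NotImplementedError("genetic search is not sketched here")
    best = {frozenset([r]): (0.0, float(ROWS[r]), r) for r in relations}  # (cost, rows, plan)
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            # sorted() gives a deterministic split order regardless of set hashing.
            for k in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    lcost, lrows, lplan = best[left]
                    rcost, rrows, rplan = best[right]
                    rows = lrows * rrows * join_selectivity(left, right)
                    cost = lcost + rcost + rows
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, rows, (lplan, rplan))
    return best[frozenset(relations)]


cost, rows, plan = dp_join_search(["foo", "bar", "baz"])
print(cost, plan)  # 100100.0 ('baz', ('bar', 'foo')): foo and bar are joined first
```

With these numbers, joining foo and bar first keeps the intermediate result at 100 rows, while any plan that starts with a Cartesian product materializes a million rows.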

The third stage handles many operations, but its complexity is much lower than that of the scan/join stage. Here, "Daqi" first processes GROUP BY, aggregation, window functions, and DISTINCT, then set operations, and finally ORDER BY. Each of these operations generates one or more paths; "Daqi" filters the paths by cost and then adds LockRows, Limit, and ModifyTable nodes on top of the surviving paths.
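Cost-based path filtering can be sketched as a dominance check loosely modeled on PostgreSQL's `add_path` idea, heavily simplified: only total cost and sort order are compared, and the node names, costs, and Sort cost are hypothetical. A more expensive path survives only if its output order is useful, and the final choice accounts for the Sort that an unsorted path would need before a Limit is placed on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    kind: str          # illustrative node name, e.g. "HashAgg"
    total_cost: float  # hypothetical cost units
    pathkeys: tuple    # output sort order, e.g. ("a",); () means unsorted


def satisfies(have, want):
    """A path sorted by `have` also delivers order `want` if `want` is a prefix of it."""
    return have[:len(want)] == want


def add_path(paths, new):
    """Keep `new` unless dominated; drop existing paths that `new` dominates."""
    for old in paths:
        if old.total_cost <= new.total_cost and satisfies(old.pathkeys, new.pathkeys):
            return paths  # an existing path is no more expensive and at least as well sorted
    survivors = [old for old in paths
                 if not (new.total_cost <= old.total_cost and satisfies(new.pathkeys, old.pathkeys))]
    return survivors + [new]


def finish(paths, order_by, sort_cost, limit=None):
    """Pick the cheapest path for ORDER BY (adding a Sort if needed), then wrap it in Limit."""
    candidates = []
    for p in paths:
        if satisfies(p.pathkeys, order_by):
            candidates.append(p)
        else:
            candidates.append(Path(f"Sort({p.kind})", p.total_cost + sort_cost, order_by))
    best = min(candidates, key=lambda p: p.total_cost)
    if limit is not None:
        best = Path(f"Limit({best.kind})", best.total_cost, best.pathkeys)  # Limit cost ignored
    return best


paths = []
for p in [Path("HashAgg", 100.0, ()), Path("GroupAgg", 120.0, ("a",)), Path("GroupAgg", 150.0, ("a",))]:
    paths = add_path(paths, p)
print([p.total_cost for p in paths])                          # [100.0, 120.0]
print(finish(paths, ("a",), sort_cost=30.0, limit=10).kind)  # Limit(GroupAgg)
```

The cheaper unsorted HashAgg survives pruning, but once ORDER BY a is accounted for, the pre-sorted path wins.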

After the first three stages, "Daqi" has generated a rough query plan. In the post-processing stage, "Daqi" will convert the selected optimal path into a query plan, and make some adjustments to the optimal plan. 

Beyond the optimization features above, "Daqi" includes many optimizations and improvements for complex query scenarios and implements a number of advanced distributed and cloud-native features.

Building on these features, "Daqi" adds many optimizations for distributed execution. First, it introduces the concept of Motion, which moves data between execution nodes (Executor). Using Motion, "Daqi" can generate distributed query plans that are split into smaller units and dispatched to different execution nodes for parallel execution. Parallel execution opens up further optimizations for many complex queries; for aggregation, for example, performance can be improved by aggregating in multiple stages across execution nodes.

Take the following query as an example of multi-stage aggregation. For this SQL query, "Daqi" generates the following query plan:

Because the query contains a deduplication operation, "Daqi" performs a three-stage aggregation. In the first stage, each execution node performs a local aggregation with a and b as the grouping key, which completes a partial deduplication. Next, a Motion reshuffles the data across nodes, and a second aggregation completes the global deduplication. Finally, the final aggregation is performed on the grouping key to produce the query result.
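Because the example query itself is not reproduced here, the following Python sketch assumes a query of the form `SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a`; the table `t`, the three-node cluster, and the choice of redistribution keys are all hypothetical. Each list in `partitions` stands for the rows stored on one execution node, and `motion` simulates a redistribute Motion by hashing rows to nodes.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def motion(partitions, key):
    """Simulated redistribute Motion: send each row to node hash(key(row)) % NUM_NODES."""
    out = [[] for _ in range(NUM_NODES)]
    for part in partitions:
        for row in part:
            out[hash(key(row)) % NUM_NODES].append(row)
    return out


def three_stage_count_distinct(partitions):
    # Stage 1: local aggregation on (a, b) per node = partial deduplication.
    stage1 = [set(part) for part in partitions]
    # Motion on (a, b), then stage 2: global deduplication of (a, b) pairs.
    stage2 = [set(part) for part in motion(stage1, key=lambda row: row)]
    # Motion on the group key a, then stage 3: final aggregation COUNT(b) per a.
    result = {}
    for part in motion(stage2, key=lambda row: row[0]):
        counts = defaultdict(int)
        for a, _b in part:
            counts[a] += 1
        result.update(counts)  # each a lives on exactly one node after the Motion
    return result


def single_node(rows):
    """Reference answer: COUNT(DISTINCT b) GROUP BY a computed on one node."""
    groups = defaultdict(set)
    for a, b in rows:
        groups[a].add(b)
    return {a: len(bs) for a, bs in groups.items()}


partitions = [[(1, 10), (1, 10), (2, 20)], [(1, 10), (1, 11)], [(2, 20), (2, 21), (3, 30)]]
all_rows = [row for part in partitions for row in part]
print(three_stage_count_distinct(partitions) == single_node(all_rows))  # True
```

The first stage shrinks the data before it crosses the network, which is where most of the benefit of multi-stage aggregation comes from.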

Because PieCloudDB's storage engine "Jianmo" is designed around object storage, "Daqi" builds on this design to implement more advanced optimizations, including aggregation pushdown, block skipping, and precomputation. This article introduces aggregation pushdown; other cloud-native features of "Daqi" will be covered in upcoming articles.

PieCloudDB implements aggregation pushdown (Aggregate Pushdown), which effectively reduces data transfer and processing in the query execution plan and improves query efficiency. In analytical workloads, aggregate functions such as SUM, AVG, MAX, and MIN are commonly used to summarize data in tables. Most data warehouses process aggregation by first completing table scans and joins and only then computing the aggregate functions; with very large data volumes, such queries perform relatively poorly.

The aggregation pushdown strategy implemented by "Daqi" pushes aggregation below the join so that it runs first, which can greatly reduce the amount of data the join has to process. In tests, performance improved by hundreds or even thousands of times in some cases.

Take the following SQL query as an example. The query joins tables t1 and t2, then groups by t1.a to compute the average of t2.c. Without aggregation pushdown, t1 and t2 are joined first and the result is then aggregated by t1.a; if both tables are large, the join is very expensive and hurts performance. With aggregation pushdown, the aggregation runs before the join. If t2.b has few distinct values, so that the aggregation collapses many rows, the data volume shrinks substantially and the subsequent join becomes much faster.
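Since the example query is not reproduced here, this Python sketch assumes `SELECT t1.a, AVG(t2.c) FROM t1 JOIN t2 ON t1.b = t2.b GROUP BY t1.a`; the join condition and the sample rows are hypothetical. AVG cannot be pushed below the join as-is, so the sketch uses the standard technique of pre-aggregating t2 into partial (SUM, COUNT) states per join key and finalizing the average after the join. `join_rows` counts how many rows the join produces in each plan.

```python
from collections import defaultdict


def join_then_aggregate(t1, t2):
    """No pushdown: join every (t1, t2) pair on b, then AVG(t2.c) GROUP BY t1.a."""
    by_b = defaultdict(list)
    for b, c in t2:
        by_b[b].append(c)
    sums, counts, join_rows = defaultdict(float), defaultdict(int), 0
    for a, b in t1:
        for c in by_b.get(b, []):
            join_rows += 1
            sums[a] += c
            counts[a] += 1
    return {a: sums[a] / counts[a] for a in sums}, join_rows


def aggregate_then_join(t1, t2):
    """Pushdown: pre-aggregate t2 by the join key into partial (SUM, COUNT) states."""
    partial = defaultdict(lambda: [0.0, 0])
    for b, c in t2:
        partial[b][0] += c
        partial[b][1] += 1
    sums, counts, join_rows = defaultdict(float), defaultdict(int), 0
    for a, b in t1:
        if b in partial:  # one pre-aggregated row per distinct t2.b
            s, n = partial[b]
            join_rows += 1
            sums[a] += s
            counts[a] += n
    # AVG is finalized as total SUM / total COUNT, not as an average of averages.
    return {a: sums[a] / counts[a] for a in sums}, join_rows


t1 = [(1, 1), (1, 2), (2, 2)]                                 # (a, b)
t2 = [(1, 10.0), (1, 20.0), (2, 30.0), (2, 40.0), (2, 50.0)]  # (b, c)
print(join_then_aggregate(t1, t2))  # ({1: 30.0, 2: 40.0}, 8)
print(aggregate_then_join(t1, t2))  # ({1: 30.0, 2: 40.0}, 3)
```

Both plans return the same averages, but the pushed-down plan feeds the join one row per distinct t2.b instead of every t2 row. The fewer distinct join keys t2 has, the larger that reduction becomes.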

The query optimizer is one of the most important and complex components of a database system. As a cloud-native virtual data warehouse, PieCloudDB will keep refining "Daqi" and continue to drive performance improvements while ensuring that the database system runs efficiently and stably.

On March 14, the PieCloudDB "Cloud on Cloud" edition, based on a new generation of cloud-native data warehouse virtualization, was officially released. You are welcome to visit www.openpie.com for a free trial.


Origin blog.csdn.net/OpenPie/article/details/129752091