How Alipay engineers built the relational database's "brain": the query optimizer

Abstract: This article walks through OceanBase's design ideas for the query optimizer and the engineering philosophy distilled from nearly a decade of practice.


Foreword

The query optimizer is a core module of a relational database system. It is one of the most important and most difficult parts of database kernel development, and it serves as a "litmus test" for the maturity of the entire database system.

Query optimization theory was born more than forty years ago. Academia and industry have long since formed relatively mature query optimization frameworks (System-R's bottom-up optimization framework and the Volcano/Cascades top-down optimization framework), but the core problem of query optimization has remained unchanged: how to use the system's limited resources to select, as far as possible, a "good" execution plan for a query.

In recent years, new storage structures (such as the LSM-tree) and the rise of distributed databases have further increased the complexity of query optimization. Drawing on nearly ten years of practical experience with the OceanBase database, this article discusses the challenges query optimization faces in real application scenarios and how they can be solved.

Introduction to the query optimizer

SQL is a structured query language. It only tells the database "what you want"; it does not tell the database "how to get it". That "how" is decided by the database's "brain", the query optimizer. In a database system there are usually many ways to compute the result of a query, and each of them is called an "execution plan". Given a SQL statement, the query optimizer first enumerates equivalent execution plans.

Next, the query optimizer uses statistics and a cost model to compute a "cost" for each execution plan. Here, cost usually refers to the plan's execution time or the system resources (CPU + IO + network) it consumes during execution. Finally, the optimizer chooses the plan with the minimum cost among the equivalent plans. The following figure shows the basic components and execution flow of the query optimizer.
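The enumerate-cost-choose loop above can be sketched in a few lines. The candidate plan names and their cost numbers below are hypothetical, invented only to illustrate the selection step; a real optimizer derives both from the query, its statistics, and its cost model.

```python
# A minimal sketch of cost-based plan selection: cost each equivalent
# plan and keep the cheapest. All plans and costs are hypothetical.

def choose_plan(candidates, cost_fn):
    """Return the minimum-cost plan among equivalent candidates."""
    return min(candidates, key=cost_fn)

# Hypothetical per-plan costs in (cpu, io, network) units:
COSTS = {
    "index_scan + nested_loop_join": (20.0, 5.0, 0.0),
    "full_scan + hash_join":         (90.0, 60.0, 0.0),
    "index_scan + merge_join":       (35.0, 5.0, 0.0),
}

def total_cost(plan):
    cpu, io, net = COSTS[plan]
    return cpu + io + net  # collapse resource vector to a scalar

print(choose_plan(COSTS, total_cost))  # index_scan + nested_loop_join
```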

[Figure: basic components and execution flow of the query optimizer]

Challenges faced by the query optimizer

The query optimizer has been a hard problem since the birth of databases. Its challenges come mainly from the following three areas:

Challenge 1: accurate statistics and cost models

Statistics and the cost model are the optimizer's foundation modules, responsible for computing the cost of an execution plan. Keeping them accurate has always been an unsolved problem in database systems, mainly for the following reasons:

1. Statistics: statistics gathering in a database system faces two main problems. First, statistics are collected by sampling, so sampling error is unavoidable. Second, statistics gathering lags behind the data: when a SQL statement is optimized, the statistics it uses reflect the state of the system at some earlier moment.

2. Selectivity and intermediate-result estimation: selectivity calculation has always been a hard problem. Academia and industry continue to research ways to make it more accurate, such as dynamic sampling and multi-column histograms, but the problem remains unsolved; for example, there is still no good way to compute the selectivity of join predicates.

3. Cost model: mainstream database systems basically use static cost models, with constants such as a static buffer hit ratio and static IO latency, but these values change as the system load changes. A truly accurate cost model would have to be dynamic.
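To see why selectivity estimation goes wrong, here is a minimal sketch of the textbook approach: multiply per-predicate selectivities, which silently assumes the predicates are independent. The table size and selectivities are hypothetical.

```python
# Cardinality estimation under the attribute-independence assumption.
# When columns are correlated, this estimate can be off by orders of
# magnitude, which is why multi-column histograms and dynamic sampling
# exist. All numbers below are hypothetical.

def estimate_rows(table_rows, selectivities):
    """Estimate result rows by multiplying per-predicate selectivities."""
    est = table_rows
    for s in selectivities:
        est *= s
    return est

# 1,000,000-row table, two predicates each keeping 10% of the rows.
# Independence predicts 10,000 rows; if the two columns are correlated
# (e.g. city and zip code), the true count may be closer to 100,000.
print(estimate_rows(1_000_000, [0.1, 0.1]))  # 10000.0
```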

Challenge 2: the massive plan space

The plan space of a complex query is enormous; in many scenarios the optimizer cannot even enumerate all equivalent execution plans. The following figure shows the number of equivalent logical plans for star queries (excluding plans that contain Cartesian products), and the space the optimizer actually faces is the product of this logical space with the orthogonal sub-spaces of physical implementation, cost-based rewriting, and distributed optimization. Enumerating execution plans efficiently in such a massive space has always been a hard problem for the query optimizer.

[Figure: number of equivalent logical plans for star queries]
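A back-of-the-envelope calculation shows how quickly the join-order space alone explodes. These are the standard combinatorial counts for join orderings, not OceanBase-specific figures.

```python
# How fast the join-order space grows with the number of tables n:
#   left-deep join orders:  n!
#   bushy join trees:       n! * Catalan(n-1) = (2n-2)! / (n-1)!
from math import factorial

def left_deep_orders(n):
    """Number of left-deep join orders of n tables."""
    return factorial(n)

def bushy_orders(n):
    """Number of bushy join trees of n tables."""
    return factorial(2 * n - 2) // factorial(n - 1)

for n in (5, 10, 15):
    print(n, left_deep_orders(n), bushy_orders(n))
# Already at 15 tables the bushy space exceeds 10^16 plans, before
# physical implementations and distributed choices are multiplied in.
```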

Challenge 3: efficient plan management mechanisms

Plan management mechanisms cover the plan caching mechanism and the plan evolution mechanism.

1. Plan caching: plan caching concerns whether plans are parameterized, whether a plan is optimized once or on every execution, and whether it is cached, and can be divided into the three caching strategies shown in the figure below. Each strategy has its own strengths and weaknesses, and different businesses choose different strategies. In the many high-concurrency, low-latency business scenarios at Ant/Alibaba, the strategy of parameterization + optimize-once + caching is chosen, which means the problem of different parameters needing different plans (parametric query optimization) must be solved; this is discussed later.

[Figure: three plan caching strategies]

2. Plan evolution: plan evolution means validating newly generated plans to ensure that a new plan does not cause a performance regression. In a database system, new plans are generated all the time for various reasons (statistics refreshed, schema version upgraded, and so on), and because of inaccurate statistics and cost models, the optimizer can never guarantee one hundred percent that a newly generated plan is optimal. An evolution mechanism is therefore needed to guarantee that newly generated plans never cause performance regressions.

OceanBase query optimizer engineering practice

Now let us look at how OceanBase addresses the challenges facing the query optimizer, based on its own business model and architectural characteristics.

From the statistics and cost model dimension, OceanBase invented table access path selection based on its LSM-tree storage structure. From the plan space dimension, because OceanBase is a natively distributed relational database system, distributed plan optimization is a problem it inevitably faces. From the plan management dimension, OceanBase has built a comprehensive set of plan management mechanisms.

1. Table access path selection based on the LSM-tree

Table access path selection is the method by which the optimizer chooses an index; in essence, it evaluates the cost of each index and picks the index with the minimum cost to access the table. For an index path, the cost consists mainly of two parts: the cost of scanning the index and the cost of the table lookups (if an index covers the query, no table lookups are needed and that cost is zero).

In general, the cost of an index path depends on many factors, such as the number of rows scanned and looked up, the number of projected columns, and the number of predicates. To simplify the discussion, the following analysis introduces the two cost components from the row-count dimension only.

  • Index scan cost

    The index scan cost is proportional to the number of rows scanned, and that number is determined by the query predicates that define the start and end positions of the index scan. In principle, the more rows scanned, the longer execution takes. Index scanning is sequential IO.

  • Table lookup cost

    The table lookup cost is positively correlated with the number of rows looked up, which is likewise determined by the query predicates; in principle, the more rows looked up, the longer execution takes. Table lookups are random IO, so the per-row cost of a table lookup is always higher than the per-row cost of a sequential index scan.

In a traditional relational database, the numbers of index-scanned rows and looked-up rows are obtained by computing predicate selectivities from the statistics maintained by the optimizer (or by more advanced methods such as dynamic sampling).

As a simple example, given a composite index (a, b) and the query predicates a > 1 and a < 5 and b < 5: the predicates a > 1 and a < 5 define the start and end positions of the index scan. If 10k rows satisfy these two conditions, the index scan cost is that of sequentially scanning 10k rows; and if the selectivity of the predicate b < 5 is 0.5, the table lookup cost is that of randomly accessing 5k rows.
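The two-part cost in this example can be sketched as follows. The per-row cost constants are hypothetical, chosen only to reflect that a random table lookup is more expensive than a sequential index-scan row; a real cost model calibrates them against hardware.

```python
# A minimal sketch of index path cost = scan cost + table lookup cost.
# SEQ_ROW_COST and LOOKUP_ROW_COST are hypothetical constants.

SEQ_ROW_COST = 1.0     # cost of scanning one index row (sequential IO)
LOOKUP_ROW_COST = 5.0  # cost of one base-table lookup (random IO)

def index_path_cost(range_rows, lookup_selectivity, covering=False):
    """Cost of one index path for a query."""
    scan_cost = range_rows * SEQ_ROW_COST
    if covering:                 # index covers the query: no lookups
        return scan_cost
    lookup_rows = range_rows * lookup_selectivity
    return scan_cost + lookup_rows * LOOKUP_ROW_COST

# Index (a, b), predicates a > 1 AND a < 5 AND b < 5:
# 10,000 rows in the scan range, b < 5 keeps half of them.
print(index_path_cost(10_000, 0.5))  # 10k scanned rows + 5k lookups
```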

This raises a question: is this traditional way of deriving row counts and costs suitable for an LSM-tree storage engine?

An LSM-tree storage engine divides data into two parts (as shown below): static data (baseline data) and dynamic data (incremental data). The static data is read-only and is stored on disk; all incremental modifications (inserts, deletes, updates) are recorded in the dynamic data, which is kept in memory. Static data and incremental data are periodically merged to form new baseline data. In an LSM-tree storage engine, a query must combine the static and dynamic data to produce its final result.

[Figure: baseline and incremental data in an LSM-tree storage engine]

Consider an LSM-tree storage engine in which all of the baseline data has been deleted. In the figure, the baseline holds 100k rows, and the incremental data records the deletion of all 100k rows. In this scenario the table contains 0 rows. In a traditional Buffer-Pool-based storage engine the scan would be very fast, i.e. row count and cost match. But in the LSM-tree storage engine the scan is very slow (100k baseline rows merged with 100k incremental rows), i.e. row count and cost do not match.

The root cause is that, given the nature of an LSM-tree storage engine, row counts computed in the conventional way from statistics, selectivities, and dynamic sampling no longer reflect the number of rows that actually have to be accessed, which is what cost calculation needs.

As a simple example: in a traditional relational database, if we insert 10k rows and then delete 1k of them, cost calculation uses 9k rows. In the LSM-tree scenario, if the 10k rows are in the baseline data, there will be an additional 1k delete-marker rows in memory, and cost calculation needs to use 11k rows.

[Figure: row counts with baseline and incremental data]

To resolve this mismatch between the conventional notion of table row count and the true execution cost on an LSM-tree storage engine, OceanBase introduced the concepts of "logical rows" and "physical rows", together with methods to compute them. Logical rows can be understood as the row count in the traditional sense; physical rows characterize the number of rows the LSM-tree storage engine actually has to access, which is what cost calculation really needs.

In the example above, the logical row count is 0 and the physical row count is 200k. Given the start/end positions of an index scan, OceanBase maintains block-level statistics on the baseline data, so the baseline row count can be computed quickly; for the incremental data, the numbers of inserted/deleted/updated rows are obtained by dynamic sampling, and combining the two yields the logical and physical row counts. The following figure shows OceanBase's method for computing logical and physical rows.

[Figure: computing logical and physical rows in OceanBase]
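The relationship between the two row counts can be sketched as below. The formulas are a deliberate simplification for illustration (one incremental record per modified row), not OceanBase's exact algorithm.

```python
# A minimal sketch of logical vs. physical rows under the
# baseline + incremental split of an LSM-tree storage engine.

def row_counts(baseline_rows, inserted, deleted, updated):
    """Return (logical_rows, physical_rows).

    logical  = rows the query semantically sees
    physical = rows the engine must actually visit: every baseline
               row plus every incremental record (including delete
               markers and update records)."""
    logical = baseline_rows + inserted - deleted
    physical = baseline_rows + inserted + deleted + updated
    return logical, physical

# 10k baseline rows, 1k of them deleted in memory:
print(row_counts(10_000, 0, 1_000, 0))     # (9000, 11000)
# 100k baseline rows, all 100k deleted:
print(row_counts(100_000, 0, 100_000, 0))  # (0, 200000)
```

These reproduce the 9k-vs-11k and 0-vs-200k examples from the text.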

Compared with the traditional approach to table access path selection, OceanBase's method based on logical and physical rows has the following two advantages:

Advantage 1: real-time statistics

Because both the incremental data and the baseline data are taken into account, the statistics are effectively real-time, whereas traditionally gathered statistics always lag to some degree (re-gathering is usually triggered only after a table has accumulated a certain amount of inserts/deletes/updates).

Advantage 2: handling correlated predicates on indexed columns

Consider an index (a, b) and the query a = 1 and b = 1. When computing the selectivity of this query, a traditional method must consider whether a and b are correlated and, if so, use a corresponding technique (multi-column histograms or dynamic sampling) to improve the accuracy of the selectivity calculation. OceanBase's current row estimation method resolves the correlation between a and b by default.

2. OceanBase distributed plan optimization

OceanBase is natively distributed, so distributed plan optimization is a problem it must solve. Many people find distributed plan optimization daunting and do not know where to start. How, then, does distributed plan optimization actually differ from local optimization? Does it require modifying the existing query optimization framework?

In my opinion, the existing query optimization framework is fully capable of handling distributed plan optimization, but distributed optimization greatly enlarges the plan search space, mainly for the following reasons:

1. In a distributed scenario, a distributed algorithm must be chosen for each operator, and the space of distributed algorithms is much larger than the space of local algorithms. The following figure shows the distributed algorithms available for a Hash Join in a distributed scenario.

[Figure: distributed algorithm choices for Hash Join]

2. In a distributed scenario, besides physical properties such as order, partition information is added as a physical property. Partition information includes how the data is partitioned and the physical location of each partition, and it determines which distributed algorithms an operator can use.

3. In a distributed scenario, factors such as partition pruning, degree-of-parallelism optimization, and intra-/inter-partition parallelism further increase the complexity of distributed plan optimization.
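One concrete instance of point 1 is the choice between two classic distributed hash-join strategies: broadcast the smaller side to every node, or repartition (shuffle) both sides by the join key. The toy cost model below counts only rows moved across the network and is purely illustrative, not OceanBase's cost model.

```python
# A minimal sketch of choosing a distributed hash-join algorithm by
# network cost alone. All constants and formulas are illustrative.

def broadcast_cost(left_rows, right_rows, nodes):
    """Ship the smaller input to all nodes; the larger side stays put."""
    return min(left_rows, right_rows) * nodes

def repartition_cost(left_rows, right_rows, nodes):
    """Hash-shuffle both inputs; on average (nodes-1)/nodes of each
    side's rows cross the network."""
    return (left_rows + right_rows) * (nodes - 1) / nodes

def pick_hash_join(left_rows, right_rows, nodes):
    b = broadcast_cost(left_rows, right_rows, nodes)
    r = repartition_cost(left_rows, right_rows, nodes)
    return ("BROADCAST", b) if b <= r else ("REPARTITION", r)

# Tiny dimension table joined to a big fact table on 16 nodes:
print(pick_hash_join(1_000, 10_000_000, 16))      # broadcast wins
# Two large tables:
print(pick_hash_join(5_000_000, 10_000_000, 16))  # repartition wins
```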

OceanBase currently performs distributed optimization in two phases. In the first phase, OceanBase generates an optimal local plan under the assumption that all tables are local. In the second phase, OceanBase performs parallel optimization, using heuristic rules to choose distributed algorithms for the operators of the locally optimal plan. The following figures show an example of OceanBase's two-phase distributed planning.

[Figures: example of OceanBase's two-phase distributed plan]

Two-phase distributed optimization reduces the plan space and the complexity of optimization, but because the first phase ignores the distributed information of child operators, it can produce suboptimal plans. OceanBase is currently implementing one-phase distributed plan optimization:

1. Within System-R's bottom-up dynamic programming algorithm, enumerate all distributed implementations of every operator and maintain the operators' physical properties.

2. Within System-R's bottom-up dynamic programming algorithm, for each enumerated subset, keep the plans that have minimal cost, an interesting order, or interesting partitioning.

One-phase distributed plan optimization can cause the plan space to grow rapidly, so pruning rules are needed to cut the space down; and when the plan space grows too large, just as in local optimization, genetic algorithms or heuristic rules are used to cope.
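The System-R-style bottom-up dynamic programming that underlies the one-phase approach can be sketched as follows. For simplicity this toy version keeps only the cheapest plan per table subset (a real optimizer, as described above, also keeps plans with interesting orders or partitioning), and its cost and cardinality formulas are invented for illustration.

```python
# A minimal sketch of System-R bottom-up dynamic programming over
# table subsets. Toy cost model: hash join cost = children's costs
# plus build + probe rows; join output = lr * rr * sel.
from itertools import combinations

def best_join_plan(card, sel=0.001):
    """card: {table: row_count}; sel: assumed uniform join selectivity.
    Returns (cost, est_rows, plan_tree) for joining all tables."""
    best = {frozenset([t]): (r, r, t) for t, r in card.items()}
    tables = sorted(card)
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            for k in range(1, size // 2 + 1):  # each split once
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    lc, lr, lp = best[left]
                    rc, rr, rp = best[right]
                    rows = lr * rr * sel      # estimated join output
                    cost = lc + rc + lr + rr  # build + probe
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, rows, (lp, rp))
    return best[frozenset(tables)]

print(best_join_plan({"orders": 1_000_000, "users": 10_000, "items": 500}))
```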

3. OceanBase plan management mechanisms

Based on real business scenarios at Ant/Alibaba, OceanBase has built sound plan caching and plan evolution mechanisms.

OceanBase's plan caching mechanism

As shown below, OceanBase caches plans in parameterized form. This raises two questions: why parameterize, and why cache?

1. Parameterization: in the many real business scenarios at Ant/Alibaba, caching one plan per parameter value is unrealistic. Consider a scenario that queries order information by order number: in Ant/Alibaba's high-concurrency scenarios, generating a plan for every order number is both impractical and unnecessary, because a single plan using the order-number index handles all parameter values.

2. Plan caching: plans are cached for performance. In many real Ant/Alibaba business scenarios, a query that hits the plan cache takes a few hundred microseconds, while a query that misses takes several milliseconds. For high-concurrency, low-latency scenarios this performance gap is critical.

[Figure: OceanBase's parameterized plan cache]
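The parameterize-then-cache idea can be sketched in a few lines. Real parameterizers work on the parse tree; the regex version below is only an illustration of the concept, and the cache-key scheme is hypothetical.

```python
# A minimal sketch of plan-cache parameterization: literals are replaced
# with placeholders so queries differing only in constants share one
# cached plan, which is optimized exactly once.
import re

# Matches single-quoted string literals or bare numeric literals.
_LITERAL = re.compile(r"('(?:[^']|'')*')|\b\d+(?:\.\d+)?\b")

def parameterize(sql):
    """Return (normalized_sql, extracted_literals)."""
    params = []
    def repl(m):
        params.append(m.group(0))
        return "?"
    return _LITERAL.sub(repl, sql), params

plan_cache = {}

def get_plan(sql):
    key, params = parameterize(sql)
    if key not in plan_cache:                          # optimize once
        plan_cache[key] = f"<optimized plan for: {key}>"
    return plan_cache[key], params

p1, _ = get_plan("SELECT * FROM orders WHERE order_no = 10001")
p2, _ = get_plan("SELECT * FROM orders WHERE order_no = 20002")
print(p1 is p2)  # True: both order numbers share one cached plan
```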

OceanBase caches parameterized plans, but in many real Ant business scenarios, using one plan for all parameter values is not the best choice. Consider a merchant business scenario at Ant in which billing information is recorded per merchant, and merchants run analyses and queries over it. This scenario inevitably suffers from data skew across accounts: as shown below, Taobao may contribute 50% of all orders, while LV contributes perhaps only 0.1%. For the query "compute a merchant's sales over the past year", a full table scan is the reasonable plan for a giant merchant like Taobao, while using the index is the reasonable plan for a small merchant like LV.

[Figure: data skew across merchant accounts]

To solve the problem of different parameters needing different plans, OceanBase uses the adaptive plan matching shown below. This method uses histograms and execution feedback to monitor each cached plan and detect whether different parameters require different plans. When they do, adaptive plan matching gradually splits and merges selectivity spaces until the whole selectivity space is divided into multiple spaces, each corresponding to one plan.

[Figure: adaptive plan matching in OceanBase]
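The end state of such a mechanism can be sketched as a lookup over selectivity spaces. The threshold, space boundaries, and plan names below are hypothetical stand-ins for what execution feedback would learn; they are not OceanBase's actual values.

```python
# A minimal sketch of selectivity-space plan matching: the selectivity
# range [0, 1] is divided into spaces, each bound to one plan.

# Hypothetical spaces learned from execution feedback:
#   selectivity < 0.05 -> index scan   (small merchants like LV)
#   otherwise          -> full scan    (giants like Taobao)
SPACES = [(0.05, "INDEX_SCAN"), (1.01, "FULL_TABLE_SCAN")]

def match_plan(param_selectivity):
    """Route a parameter value to the plan of its selectivity space."""
    for upper, plan in SPACES:
        if param_selectivity < upper:
            return plan
    raise ValueError("selectivity must lie within [0, 1]")

print(match_plan(0.001))  # LV-sized merchant     -> INDEX_SCAN
print(match_plan(0.5))    # Taobao-sized merchant -> FULL_TABLE_SCAN
```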

OceanBase's plan evolution mechanism

In the many high-concurrency, low-latency business scenarios at Ant/Alibaba, OceanBase must guarantee that newly generated plans do not cause performance regressions. The figure below shows how OceanBase evolves new plans. Unlike traditional database systems, which run evolution as scheduled background tasks, OceanBase evolves plans with real traffic; the benefit is that better plans take effect promptly. For example, when a business creates a better new index, a traditional database system cannot use that index immediately, because it must wait for the scheduled evolution task to validate it, whereas OceanBase puts the better plan to use in a timely manner.

[Figure: OceanBase's plan evolution flow]

Summary

OceanBase's query optimizer is implemented around its own business scenarios and architectural characteristics, such as the LSM-tree storage structure, the shared-nothing distributed architecture, and large-scale operational stability. OceanBase is building an integrated query optimizer for both OLTP and OLAP. On the OLTP side, we root ourselves in real Ant/Alibaba business scenarios and fully support their needs. On the OLAP side, we benchmark against commercial databases to further polish our HTAP optimization capabilities.



Author: his life chestnuts


This article is original content of the Yunqi community and may not be reproduced without permission.


Origin blog.csdn.net/weixin_34206899/article/details/90920467