How OceanBase tackles MySQL's analytics bottleneck

About the Author

Dragon Burn

OceanBase Solutions Architect. This article is an excerpt compiled from a talk given at the Shenzhen stop of the OB Cloud offline salon tour on July 15.

Beyond OLTP scenarios, day-to-day database use also brings data analysis requirements. MySQL's analytical capabilities are weak, however: a complex SQL statement often runs until it times out without returning a result. Analysis-heavy SQL easily turns into slow SQL and degrades the performance of the entire database. Building a separate analysis instance raises operations, migration, and labor costs, and makes stability hard to guarantee.


OceanBase has an enterprise-grade optimizer with accurate statistics and reliable cost-based optimization (CBO), and it supports both a vectorized engine and a parallel execution engine. In addition, OceanBase's HTAP capability uses the same data and the same engine to meet business needs on both the TP and AP sides.

HTAP: hybrid transactional and real-time analytical processing

OceanBase's HTAP capability lets one system carry both OLTP and OLAP workloads, covering scenarios such as complex join queries, aggregation queries, and sorting queries. Operations staff do not need to maintain a separate analytical database product. This real-time analysis capability rests mainly on OceanBase's enterprise-grade optimizer, underlying storage optimizations, parallel execution engine, and vectorized engine, delivered all in one: one system and one copy of the data.


OceanBase provides multiple resource isolation methods:

1. Isolation between tenants: TP and AP workloads are placed in two separate tenants.

2. Within a single tenant, follower replicas of the cluster can serve analysis requests. OceanBase provides a read-only address so that some analysis scenarios can obtain results through weak-consistency reads, relieving pressure on the leader replica.

3. Fine-grained resource isolation by SQL statement or by user within a single tenant. OceanBase has capabilities similar to Oracle Resource Manager and currently supports CPU and IO isolation.
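To make the second method concrete, here is a minimal sketch of how an application might mark analytical queries for weak-consistency reads. It assumes the `READ_CONSISTENCY(WEAK)` optimizer hint described in OceanBase's documentation; the helper names `with_weak_read` and `route` and the `orders` table are hypothetical, and the hint should be checked against the OceanBase version in use.

```python
import re

# Hint requesting a weak-consistency read. Assumption: this matches the
# READ_CONSISTENCY hint documented by OceanBase; verify for your version.
WEAK_READ_HINT = "/*+ READ_CONSISTENCY(WEAK) */"


def with_weak_read(sql: str) -> str:
    """Insert the weak-read hint right after the leading SELECT keyword."""
    match = re.match(r"\s*select\b", sql, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("only SELECT statements can use weak-consistency reads")
    return f"{sql[:match.end()]} {WEAK_READ_HINT}{sql[match.end():]}"


def route(sql: str, is_analytical: bool) -> str:
    """Leave TP statements untouched; tag analytical reads with the hint."""
    return with_weak_read(sql) if is_analytical else sql
```

For example, `route("SELECT COUNT(*) FROM orders", is_analytical=True)` returns `SELECT /*+ READ_CONSISTENCY(WEAK) */ COUNT(*) FROM orders`, while a transactional statement passes through unchanged.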

OceanBase HTAP architecture

OceanBase's HTAP architecture is shown in the figure below. For resource isolation, OceanBase can isolate AP and TP workloads by tenant, by user, or even by SQL statement. Its SQL engine relies on a cost model, reliable statistics, efficient query optimization, and query rewriting to transform complex SQL into a form that executes efficiently, making full use of the underlying single-node engine, vectorized engine, and parallel execution engine to deliver the overall HTAP capability. The storage engine supports row storage, column storage, and hybrid row-column storage.

picture

Parallel execution

When one horse cannot pull a carriage, several horses must pull it together. The optimizer uses a query coordinator (QC) to split and schedule a query: it breaks a large task into many small tasks, starts multiple threads to process them in parallel, and finally merges their output into the result returned to the user. OceanBase currently supports parallel query, parallel DDL, and parallel DML. There are several ways to enable parallel execution:

1. Specify it at the SQL level with a hint similar to Oracle's PARALLEL hint.

2. Specify the degree of parallelism for a table when creating the table.

3. Parallel mode is enabled automatically for partitioned tables.

4. OceanBase 4.0 and later versions support autoDOP, which detects the query type, turns on parallel mode, and adjusts the degree of parallelism automatically.
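The split, schedule, and merge pattern described above can be illustrated with a small sketch. This is not OceanBase's implementation: it only mimics the coordinator's role with a Python thread pool, and the function names and the `dop` parameter are hypothetical. Python threads will not speed up CPU-bound work because of the interpreter lock, so the sketch shows the structure rather than the performance benefit.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_ranges(n_rows, dop):
    """Split [0, n_rows) into at most `dop` contiguous ranges."""
    if n_rows == 0:
        return []
    step = -(-n_rows // dop)  # ceiling division
    return [(lo, min(lo + step, n_rows)) for lo in range(0, n_rows, step)]


def partial_sum(rows, lo, hi):
    """Work done by one worker: aggregate its own slice of the data."""
    return sum(rows[lo:hi])


def parallel_sum(rows, dop=4):
    """Play the coordinator: split the task, run workers, merge the results."""
    ranges = split_into_ranges(len(rows), dop)
    with ThreadPoolExecutor(max_workers=dop) as pool:
        partials = pool.map(lambda r: partial_sum(rows, *r), ranges)
        return sum(partials)
```

Calling `parallel_sum(list(range(1000)), dop=4)` gives the same answer as a serial `sum`, which is the property a parallel plan must preserve.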

Vectorization engine

At the storage level, data inside each microblock is stored by column, while the microblocks themselves are organized by row. After projection, filtering, and scanning, the data is returned row by row in the traditional model, or in batches in the vectorized engine. OceanBase implements vectorization through the following techniques:

1. Batch interface and efficient memory layout

2. SIMD acceleration and prefetch acceleration

3. Cache-aware algorithms

4. Filter pushdown evaluated directly on encoded data
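The difference between row-by-row and batch return can be sketched in a few lines. The example below is a simplified illustration in plain Python, not OceanBase's engine: the function names, the two-column layout, and the `batch_size` default are all assumptions. A real vectorized engine would evaluate the predicate with SIMD instructions over a compact memory layout; here a list comprehension over a column slice stands in for that step.

```python
def filter_row_at_a_time(col_a, col_b, threshold):
    """Traditional model: pull one row per step and test it individually."""
    out = []
    for a, b in zip(col_a, col_b):
        if a > threshold:
            out.append(b)
    return out


def scan_batches(col_a, col_b, batch_size):
    """Batch interface: hand back column slices instead of single rows."""
    for start in range(0, len(col_a), batch_size):
        end = start + batch_size
        yield col_a[start:end], col_b[start:end]


def filter_vectorized(col_a, col_b, threshold, batch_size=256):
    """Evaluate the predicate per batch, then project via a selection vector."""
    out = []
    for batch_a, batch_b in scan_batches(col_a, col_b, batch_size):
        # Selection vector: positions in this batch that pass the filter.
        selected = [i for i, a in enumerate(batch_a) if a > threshold]
        out.extend(batch_b[i] for i in selected)
    return out
```

Both functions return the same rows; the batch version amortizes per-call overhead across many values, which is the main idea behind the batch interface.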

HTAP helps run batch workloads

Taking the AP scenario in the figure below as an example, the task is divided into three steps.

Step 1: Insert the results of a multi-table join query into temporary table TMP1.

Step 2: Insert the results of a join-and-aggregation query over temporary table TMP1 and table E into temporary table TMP2.

Step 3: Insert the results of joining temporary table TMP2 with tables F and G into the final table.

Each step produces a large result set (INSERT INTO SELECT operations at the billion-row scale). This process poses two main challenges. First, the volume of reads and writes is very large. Second, if statistics on the temporary tables are not collected in time, the optimizer has great difficulty choosing the optimal execution plan.

OceanBase collects statistics during the batch write itself, so the optimizer can generate optimal plans from accurate statistics and the analysis results arrive faster.
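The three-step flow can be sketched end to end on a toy data set. The example below uses Python's built-in `sqlite3` module purely as a stand-in so that it runs anywhere; it is not OceanBase, and every table schema, column name, and value is hypothetical. The explicit `ANALYZE` calls mark the points where a database that does not gather statistics during the write would need a manual refresh before the next step is planned.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory data set (all names and values are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER, val INTEGER);
        CREATE TABLE b (a_id INTEGER, qty INTEGER);
        CREATE TABLE e (a_id INTEGER, region TEXT);
        CREATE TABLE f (region TEXT, name TEXT);
        CREATE TABLE g (region TEXT, factor INTEGER);
        CREATE TABLE tmp1 (a_id INTEGER, amount INTEGER);
        CREATE TABLE tmp2 (region TEXT, total INTEGER);
        CREATE TABLE final_result (name TEXT, total INTEGER);
        INSERT INTO a VALUES (1, 10), (2, 20);
        INSERT INTO b VALUES (1, 2), (1, 3), (2, 4);
        INSERT INTO e VALUES (1, 'north'), (2, 'south');
        INSERT INTO f VALUES ('north', 'North'), ('south', 'South');
        INSERT INTO g VALUES ('north', 2), ('south', 3);
    """)
    return conn


def run_batch(conn):
    """Run the three INSERT INTO ... SELECT steps and return the final rows."""
    # Step 1: multi-table join into TMP1.
    conn.execute(
        "INSERT INTO tmp1 "
        "SELECT a.id, a.val * b.qty FROM a JOIN b ON b.a_id = a.id"
    )
    conn.execute("ANALYZE tmp1")  # manual statistics refresh in this stand-in

    # Step 2: join TMP1 with E and aggregate into TMP2.
    conn.execute(
        "INSERT INTO tmp2 "
        "SELECT e.region, SUM(t.amount) FROM tmp1 AS t "
        "JOIN e ON e.a_id = t.a_id GROUP BY e.region"
    )
    conn.execute("ANALYZE tmp2")  # manual statistics refresh in this stand-in

    # Step 3: join TMP2 with F and G into the final table.
    conn.execute(
        "INSERT INTO final_result "
        "SELECT f.name, t.total * g.factor FROM tmp2 AS t "
        "JOIN f ON f.region = t.region JOIN g ON g.region = t.region"
    )
    conn.commit()
    return conn.execute(
        "SELECT name, total FROM final_result ORDER BY name"
    ).fetchall()
```

On this toy data, `run_batch(make_demo_db())` returns `[('North', 100), ('South', 240)]`. At production scale the intermediate tables would be far too large for the manual refresh to be cheap, which is why collecting statistics as part of the write matters.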

picture

Origin blog.csdn.net/OceanBaseGFBK/article/details/132672207