A fix for an official MySQL bug, headed for the MySQL source code

Author: Zhou Xinjing, a graduate of Zhejiang University, currently works on TXSQL cloud database kernel development in the CDB/CynosDB database kernel team, has taken part in a series of hot-update and performance-optimization efforts, and has fixed multiple official MySQL bugs.

1 Background

InnoDB's adaptive hash index (Adaptive Hash Index, hereafter AHI) is an index structure built on top of the B-tree index to further reduce the cost of B-tree lookups.

Searching for a record in a B-tree means descending from the root node to a leaf node, with a binary search inside every node along the way. AHI improves on this by building a hash index over the row records on frequently accessed leaf pages of the B-tree index. A B-tree query can then locate the record on the leaf node directly through AHI, skipping the descent from root to leaf and reducing CPU overhead.
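
To make the shortcut concrete, here is a minimal C++17 sketch. It is a toy model, not InnoDB code: `ToyIndex`, its members, and the single sorted vector standing in for the whole B-tree are all invented for illustration. In particular, the real AHI hashes records page by page based on access statistics, whereas this sketch simply remembers every successful lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Toy model: one sorted "leaf page" stands in for the whole B-tree.
struct ToyIndex {
    std::vector<int> leaf;                     // sorted keys on the leaf
    std::unordered_map<int, std::size_t> ahi;  // key -> slot on the leaf
    int descents = 0;                          // how often the slow path ran

    // Slow path: models the root-to-leaf descent with a binary search.
    std::optional<std::size_t> search_tree(int key) {
        ++descents;
        auto it = std::lower_bound(leaf.begin(), leaf.end(), key);
        if (it == leaf.end() || *it != key) return std::nullopt;
        return static_cast<std::size_t>(it - leaf.begin());
    }

    // Fast path first; on a miss, fall back to the tree and remember the slot.
    std::optional<std::size_t> search(int key) {
        auto hit = ahi.find(key);
        if (hit != ahi.end()) return hit->second;
        auto pos = search_tree(key);
        if (pos) ahi.emplace(key, *pos);  // simplification: cache every hit
        return pos;
    }
};
```

With `leaf = {10, 20, 30, 40}`, the first `search(30)` runs the slow path once; a second `search(30)` is answered from the hash map and leaves `descents` unchanged.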

AHI is built adaptively and dynamically: it has to be cleaned up or rebuilt as the access pattern of the query workload changes and as pages are swapped in and evicted. In essence, AHI is therefore a cache. Its construction logic is already explained in many articles online and is not the focus of this one.

What this article discusses is a little-known lock contention problem in AHI construction, and the corresponding optimization.

2 The problem

While running sysbench against TXSQL 5.7, we observed a very interesting phenomenon.

The test environment is as follows. Two 96-core machines serve as the sysbench client and the MySQL server respectively. The buffer pool size is set to 200GB, and we generate 120GB of sysbench table data.

As shown in the figure below, when we run the oltp_read_only workload with 128 concurrent threads, QPS first climbs steadily. During this period the system performs a large amount of read I/O as it fills the buffer pool, which is normal.

Then, after about 100s, QPS suddenly drops sharply. It begins to rise slowly after about 400s and reaches its peak only after about 800s.

We used the perf tool to capture the state of the system during the QPS drop, with the following result:

Analyzing the stacks shows that a large share of CPU time is spent on lock contention on the AHI hash table.

On closer analysis the cause is not hard to see: at this point most pages do not yet have AHI entries, so many threads need to build AHI indexes for pages at the same time. Each build requires the X lock on the same AHI hash table, which causes a great deal of waiting.

The changes in QPS can be analyzed as shown in the figure below:

3 Optimization

We noticed that, for a B-tree index, AHI construction happens after the B-tree leaf node has been located. The corresponding call chain is:

btr_cur_search_to_nth_level→ btr_search_info_update→ btr_search_info_update_slow→ btr_search_build_page_hash_index

btr_search_info_update_slow decides, based on statistics, whether to call btr_search_build_page_hash_index, which adds the records of the current page to the AHI hash table. This step requires the exclusive (X) lock on the hash table.

Since only one thread can modify the hash table at a time, it is unwise for the other threads that want to build AHI entries to wait for the X lock on this hash table: the wait blocks the critical path of their queries, while only one thread is actually doing construction work.

At the same time, AHI is only an auxiliary cache. Without it, the B-tree alone can still answer queries correctly.

This naturally suggests the following optimization:

1. When we analyze the B-tree query path and decide to build an AHI index for a page, we first check whether another thread holds the write lock on the hash table corresponding to that B-tree;

2. If the write lock is held, we abandon building the AHI index for the page this time and fall back to the normal B-tree query, then try to build again the next time the page is accessed.
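
The two steps above can be sketched in a few lines of C++17. This is a toy model under stated assumptions, not the TXSQL patch: `ToyLatch`, `ToyAhi`, and `try_build_page_hash` are hypothetical names, and the sketch folds "check whether the write lock is held" and "acquire it" into a single non-blocking try-lock, which is a simplification of the check described in the text.

```cpp
#include <atomic>
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy exclusive latch; stands in for the X mode of the AHI hash table lock.
class ToyLatch {
public:
    // Try to take the latch without waiting; returns false if it is held.
    bool try_x_lock() {
        bool expected = false;
        return held_.compare_exchange_strong(expected, true);
    }
    void x_unlock() { held_.store(false); }

private:
    std::atomic<bool> held_{false};
};

struct ToyAhi {
    ToyLatch latch;
    std::unordered_map<int, std::vector<int>> hashed_pages;  // page id -> keys
    std::atomic<int> skipped_builds{0};
};

// Returns true if the page was added to the hash table, false if the build
// was skipped because another thread currently holds the latch.
bool try_build_page_hash(ToyAhi& ahi, int page_id,
                         const std::vector<int>& keys) {
    if (!ahi.latch.try_x_lock()) {
        ++ahi.skipped_builds;  // give up now; the caller uses the B-tree
        return false;
    }
    ahi.hashed_pages[page_id] = keys;  // the actual "construction" work
    ahi.latch.x_unlock();
    return true;
}
```

A caller that gets `false` simply continues with the B-tree result it already has. Nothing is lost except the chance to speed up later lookups on that page, and that chance comes back on the next access.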

4 Implementation

The implementation is actually very simple. At the point where btr_search_info_update_slow decides, based on statistics, to build an AHI index for the records of a page, we add one condition: if a concurrent AHI construction thread currently holds the X lock on the hash table, we simply return.

The code is only a few lines, roughly as follows:

One might worry that skipping the build this way affects the correctness of the code.

It does not: we do not clear any of the AHI statistics for the page. We only postpone the build until contention on the hash table lock has eased.
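
The following single-threaded C++ sketch illustrates why postponing is harmless. It assumes a much simplified statistic: `ToyPage`, `useful_accesses`, `kBuildThreshold`, and `on_page_access` are hypothetical stand-ins, not InnoDB's actual counters or heuristics. The point is only that a skipped build leaves the counter untouched, so the very next access retries.

```cpp
#include <cassert>

// Hypothetical per-page state; InnoDB's real statistics are more involved.
struct ToyPage {
    int useful_accesses = 0;  // how often a hash entry would have helped
    bool hashed = false;      // whether the page is in the AHI hash table
};

constexpr int kBuildThreshold = 3;  // hypothetical trigger for a build

// Called after each B-tree lookup that lands on `page`.
// `latch_busy` models "another thread holds the X lock on the hash table".
void on_page_access(ToyPage& page, bool latch_busy) {
    if (page.hashed) return;
    ++page.useful_accesses;
    if (page.useful_accesses < kBuildThreshold) return;  // not hot enough yet
    if (latch_busy) return;  // skip the build; statistics are NOT reset
    page.hashed = true;      // build succeeds on an uncontended access
}
```

If the third access finds the latch busy, the page stays unhashed with its counter at 3. The fourth access, with the latch free, builds the hash entries immediately.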

5 Results

After applying the optimization, we re-ran the experiment above and obtained the following result chart:

The red line (AHI enabled, plus the Contention Avoidance optimization) shows the result with the optimization applied. After about 100s of warm-up, performance is stable and the lock bottleneck disappears.

6 Source of inspiration

In fact, the original AHI query path already contains a similar optimization:

Before btr_cur_search_to_nth_level performs an AHI lookup, if it finds that the AHI hash table is X-locked by another thread, it falls back directly to the B-tree query.

The reasoning there is similar: rather than wait for the X lock on the AHI hash table, it is better to go straight to the B-tree search, which is likely to cost less than waiting for the X lock and allows higher concurrency.
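
The query-side fallback can be sketched the same way. Again this is a toy C++17 model with invented names (`ToyStore`, `lookup`), not the InnoDB code path. The hash map is consulted only when no writer holds it, and the ordered map, standing in for the B-tree, always remains the authoritative source. A real implementation would also take a shared latch before reading the hash table; the plain flag check here is only safe because the example is single-threaded.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <optional>
#include <unordered_map>

struct ToyStore {
    std::atomic<bool> ahi_x_locked{false};  // "a writer holds the AHI latch"
    std::unordered_map<int, int> ahi;       // auxiliary cache: key -> value
    std::map<int, int> btree;               // authoritative data
    int ahi_hits = 0;
    int btree_lookups = 0;
};

std::optional<int> lookup(ToyStore& s, int key) {
    // Use the shortcut only if nobody is modifying the hash table.
    if (!s.ahi_x_locked.load()) {
        auto hit = s.ahi.find(key);
        if (hit != s.ahi.end()) {
            ++s.ahi_hits;
            return hit->second;
        }
    }
    // Otherwise do the ordinary tree search instead of waiting for the latch.
    ++s.btree_lookups;
    auto it = s.btree.find(key);
    if (it == s.btree.end()) return std::nullopt;
    return it->second;
}
```

Both paths return the same answer. The flag only decides which one pays for it, which is exactly why skipping the hash table never affects correctness.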

7 Summary

This optimization is now live in the latest version of TXSQL 5.7 and effectively alleviates the lock contention caused by AHI construction. Scenarios where the problem can occur include, but are not limited to, system startup, the AHI switch having just been turned on, and active/standby switchover. In all of these, no page has AHI records yet, so high concurrency can trigger a large amount of AHI construction work.

We also verified that this problem exists in the latest official MySQL 5.7 and 8.0 releases, so we have contributed the optimization upstream as well: https://bugs.mysql.com/bug.php?id=100512. It is currently being evaluated, and I believe it will be merged into the mainline soon.

Origin blog.csdn.net/Tencent_TEG/article/details/109733239