Has this MySQL bug bitten you too?

1. Problem description

Recently, an important online customer upgraded a MySQL table from 5.6 to 5.7 and then hit "Duplicate key" errors on inserts; the errors appeared on both the primary and the RO (read-only) instances.

Take one of the tables as an example. Before the migration, "show create table" showed an AUTO_INCREMENT value of 1,758,609; after the migration it became 1,758,598, while the actual maximum value of the auto-increment column in the migrated table, obtained via select max, was 1,758,609.

The customer uses the InnoDB engine. According to the operations team, similar problems had been seen before, and a restart restored normal behavior.

2. Kernel-side troubleshooting

Since the customer reported that access was normal on 5.6 and the errors began only after switching to 5.7, I first suspected a bug in the 5.7 kernel. So my first reaction was to search the official bug list for similar issues, to avoid re-fixing something already fixed. The search did turn up a similar official bug; here is a brief outline of it.

Background 1

Auto-increment related parameters and data structures in the InnoDB engine

The main parameters are: innodb_autoinc_lock_mode, which controls the locking mode used when acquiring auto-increment values; and auto_increment_increment / auto_increment_offset, which control the step and the starting offset of the auto-increment sequence.

The main structures are: the data dictionary structure, which stores the current auto-increment value for the table together with the mutex protecting it; the transaction structure, which tracks how many rows the transaction has processed; and the handler structure, which saves iteration state when a single statement processes multiple rows.
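The combined effect of auto_increment_increment and auto_increment_offset can be modeled as rounding up to the next value on the grid offset, offset+step, offset+2*step, and so on. Below is a rough Python sketch of that calculation; it is a simplification for illustration, not InnoDB's actual innobase_next_autoinc implementation, and the function name is invented:

```python
def next_on_grid(current, step, offset):
    """Smallest value >= current of the form offset + k*step (k >= 0).

    Simplified model of how auto_increment_increment (step) and
    auto_increment_offset (offset) shape the generated id sequence.
    """
    if current <= offset:
        return offset
    k = -(-(current - offset) // step)  # ceiling division
    return offset + k * step

# With step=5, offset=2 the sequence is 2, 7, 12, 17, ...
print(next_on_grid(1, 5, 2))   # -> 2
print(next_on_grid(8, 5, 2))   # -> 12
```

With the default settings (step=1, offset=1) the grid is simply 1, 2, 3, ..., which is why most installations never notice these parameters.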

Background 2

How the MySQL InnoDB engine reads and updates the auto-increment value:

(1) The data dictionary structure (dict_table_t) saves and restores the auto-increment value when it is swapped out of and back into memory. On swap-out, the autoincrement value is saved into a global mapping table, and then the dict_table_t is evicted from memory. On swap-in, the value is looked up in the global mapping table and restored into the dict_table_t structure. The relevant functions are dict_table_add_to_cache and dict_table_remove_from_cache_low.
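This save/restore dance can be sketched as a tiny cache that persists only the autoinc counter in a global map. The Python names below are invented for illustration; dict_table_t, dict_table_add_to_cache, and dict_table_remove_from_cache_low are the real InnoDB counterparts:

```python
# Toy model of dict_table_t eviction/reload preserving the autoinc counter.
autoinc_map = {}   # global mapping table: table name -> saved autoinc value
cache = {}         # in-memory "data dictionary": table name -> table state

def remove_from_cache(name):
    # mirrors dict_table_remove_from_cache_low: save the counter, then evict
    autoinc_map[name] = cache[name]["autoinc"]
    del cache[name]

def add_to_cache(name):
    # mirrors dict_table_add_to_cache: restore the counter from the global map
    cache[name] = {"autoinc": autoinc_map.get(name, 0)}

cache["t1"] = {"autoinc": 42}
remove_from_cache("t1")        # evicted, counter saved
add_to_cache("t1")             # reloaded, counter restored
print(cache["t1"]["autoinc"])  # -> 42
```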

(2) row_import and table truncate also update the autoincrement value as part of their processing.

(3) When the handler opens a table for the first time, it queries the current maximum value of the auto-increment column and initializes the autoinc field of the table's dict_table_t structure with that maximum value plus 1.
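Step (3) amounts to running a SELECT MAX over the auto-increment column on first open and seeding the in-memory counter with max + 1. A minimal sketch, with an invented function name:

```python
def init_autoinc(existing_ids):
    """Seed the in-memory counter on first table open: max(id) + 1.

    For an empty table we assume the sequence starts at 1, which matches
    the default MySQL behavior.
    """
    return (max(existing_ids) + 1) if existing_ids else 1

print(init_autoinc([3, 7, 5]))  # -> 8
print(init_autoinc([]))         # -> 1
```

Note that this seeding happens only in memory: a restart (or an eviction without a proper save) recomputes the counter from the data, which is exactly the window where the bugs discussed below live.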

(4) The insert process. The call stack that modifies autoinc is as follows:

ha_innobase::write_row: in step 3 of write_row, calls the handler's update_auto_increment function to update the auto-increment value
    handler::update_auto_increment: calls the InnoDB interface to obtain an auto-increment value, and adjusts it according to the current auto_increment-related variables; it also sets the value of the next auto-increment column this handler will process.
        ha_innobase::get_auto_increment: reads the current auto-increment value from dict_table_t and, based on the global parameters, writes the next auto-increment value back into the data dictionary
            ha_innobase::dict_table_autoinc_initialize: updates the auto-increment value; the update happens only if the given value is larger than the current one.
        handler::set_next_insert_id: sets the auto-increment column value for the next row to be processed in the current transaction.

(5) update_row. For an "INSERT INTO t (c1, c2) VALUES (x, y) ON DUPLICATE KEY UPDATE" statement, regardless of whether the row pointed to by the unique index column already exists, the auto-increment value needs to be advanced.

The relevant code is as follows:

    if (error == DB_SUCCESS
        && table->next_number_field
        && new_row == table->record[0]
        && thd_sql_command(m_user_thd) == SQLCOM_INSERT
        && trx->duplicates) {
        ulonglong auto_inc;
        ...
        auto_inc = table->next_number_field->val_int();
        auto_inc = innobase_next_autoinc(auto_inc, 1, increment, offset, col_max_value);
        error = innobase_set_max_autoinc(auto_inc);
        ...
    }

From the perspective of the actual business workload, only the insert and update paths could be involved in our error.

BUG 76872 / 88321: "InnoDB AUTO_INCREMENT produces same value twice"

(1) Bug overview: when innodb_autoinc_lock_mode is greater than 0 and auto_increment_increment is greater than 1, concurrent inserts into a table immediately after a system restart produce "duplicate key" errors.

(2) Analysis: after a restart, InnoDB sets the table's autoincrement value to max(id) + 1. On the first insert, the write_row path calls handler::update_auto_increment to set up the autoinc-related state.

It first fetches the current autoincrement value (i.e., max(id) + 1) through ha_innobase::get_auto_increment and, according to the auto_increment parameters, computes the next autoincrement value next_id. When auto_increment_increment is greater than 1, max(id) + 1 may be smaller than next_id.

After handler::update_auto_increment receives the value returned by the engine layer, it recomputes the auto-increment value for the current row according to the parameters, to guard against engines that compute the value without taking the current auto_increment parameters into account. Since InnoDB already accounts for these global parameters internally, the increment id the handler layer computes from InnoDB's return value is also next_id; that is, the row about to be inserted gets increment id next_id.

At the end of write_row, the handler layer sets the table's next autoincrement value based on the current row's value. But if, while write_row is in progress and before the table's next autoincrement has been set, another thread's insert reaches the same point, the value it obtains will also be next_id. This is what produces the duplicate.

(3) The fix: make the engine itself take the global auto_increment parameters into account when producing the value, so that on the first insert after a restart the value a thread obtains is not max(id) + 1 but next_id, and the table's next autoincrement value is then set based on next_id. Since this whole process is protected by a lock, other threads can no longer obtain a duplicate value.
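The race in bug 76872/88321 and its fix can be illustrated with a small simulation. Here "engine" state holds the in-memory counter, seeded with max(id)+1 right after restart; the buggy path aligns the value to the increment/offset grid only in the handler layer, after the engine has already advanced its counter by the raw amount, so two back-to-back reservations can land on the same grid value. The fix aligns inside the engine, under its lock. All names below are invented for illustration:

```python
def next_on_grid(v, step, offset):
    # smallest value >= v of the form offset + k*step
    if v <= offset:
        return offset
    return offset + -(-(v - offset) // step) * step

STEP, OFFSET = 2, 1  # auto_increment_increment=2, auto_increment_offset=1

def reserve_buggy(state):
    v = state["autoinc"]          # engine hands out its raw counter...
    state["autoinc"] = v + 1      # ...and bumps it without grid awareness
    return next_on_grid(v, STEP, OFFSET)  # handler aligns afterwards

def reserve_fixed(state):
    # fix: align to the grid inside the engine, under the engine lock
    v = next_on_grid(state["autoinc"], STEP, OFFSET)
    state["autoinc"] = v + 1
    return v

# Right after restart: max(id) = 5, so the counter is seeded with 6.
s = {"autoinc": 6}
print(reserve_buggy(s), reserve_buggy(s))  # -> 7 7  (duplicate key!)

s = {"autoinc": 6}
print(reserve_fixed(s), reserve_fixed(s))  # -> 7 9
```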

From the above analysis, this bug can only occur when autoinc_lock_mode > 0 and auto_increment_increment > 1. In the customer's production environment both parameters are set to 1, so this bug can be ruled out as the cause of the online problem.

3. Field analysis and reproduction

Since the official bug list did not explain our problem, we had to dig in ourselves, starting from the error symptoms.

(1) Looking for a pattern between autoincrement and max id. Because the user's tables have a column defined with ON UPDATE CURRENT_TIMESTAMP, we could pull out, for every table that reported the error, its max id, its autoincrement value, and its most recently updated rows, to see whether any pattern emerged. The captured information is as follows:

[Image: max id, autoincrement value, and recently updated rows of the affected tables]

At first glance there is a clear pattern: the update time column is the insert or last-modified time, and taken together with the autoincrement value and max id, the symptom looks very much like the last batch of transactions only updated rows carrying an increment id, without advancing the autoincrement value.

This brings to mind what the official documentation says about auto-increment usage: an update operation can modify an increment id column without advancing the auto-increment counter. Following this idea, I tried to reproduce the customer's situation. The reproduction method is as follows:

[Image: reproduction steps]

Meanwhile, in the binlog we could also see operations that update the auto-increment column. As shown below:

[Image: binlog events showing the auto-increment column being updated]

However, since the binlog format is ROW, we could not tell whether these updates came from a kernel problem or from the user's own update statements. So we contacted the customer, who confirmed that they never update the auto-increment column. Then where did these updates come from?

(2) Continuing to analyze the user's tables and SQL statements, we found the user has three kinds of tables (hz_notice_stat_sharding, hz_notice_group_stat_sharding, hz_freeze_balance_sharding), all three with an auto-increment primary key.

But while the first two tables both hit the autoinc error, hz_freeze_balance_sharding did not. Does the user access these tables in different ways?

We captured the user's SQL statements, and sure enough: the first two tables are written with replace into, while the last one uses update. Could the problem be caused by the replace into statement?

Searching the official bug list again, I found a suspicious bug.

bug #87861: “Replace into causes master/slave have different auto_increment offset values”

The cause:

(1) MySQL actually implements replace into as delete + insert, but in ROW binlog format it records an update-type binlog event when the replace hits an existing row. An insert advances the autoincrement value at the same time; an update does not.

(2) On the master, replace into executes as delete + insert, so autoincrement advances normally. After replication to the slave in ROW format, the slave replays an update operation, which only changes the increment-key value in the row and does not advance autoincrement. As a result, max(id) on the slave grows larger than its autoincrement value. In ROW format an insert event records all column values, and the slave does not reassign the increment id during replay, so no error occurs at this stage. But if the slave is promoted to master, the first insert it receives will hit a "Duplicate key" error.

(3) Since the user migrated from 5.6 to 5.7 and then inserted directly on 5.7, this is equivalent to promoting a slave to master, hence the error.
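The master/slave divergence described in bug #87861 can be sketched as a simulation: the master executes replace into as delete + insert (allocating a fresh id and advancing its counter), while the slave replays the ROW-format update event without touching its counter. After promotion, the first insert on the ex-slave collides. This is a toy model with invented names; real server behavior is considerably more involved:

```python
class Table:
    def __init__(self):
        self.rows = {}    # auto-increment id -> unique key value
        self.autoinc = 1  # next auto-increment id to hand out

    def insert_auto(self, key):
        if self.autoinc in self.rows:
            raise RuntimeError("Duplicate key %d" % self.autoinc)
        self.rows[self.autoinc] = key
        self.autoinc += 1

    def replace_auto(self, key):
        # master: replace into on a conflicting unique key runs as
        # delete + insert, so a fresh id is allocated and autoinc advances
        for rid, k in list(self.rows.items()):
            if k == key:
                del self.rows[rid]
        self.insert_auto(key)

    def apply_row_update(self, old_id, new_id, key):
        # slave: the ROW binlog logged the replace as an UPDATE event,
        # which moves the row but leaves autoinc untouched
        del self.rows[old_id]
        self.rows[new_id] = key

master, slave = Table(), Table()

master.insert_auto("x")                   # master: id 1, autoinc -> 2
slave.rows[1] = "x"; slave.autoinc = 2    # insert events do advance autoinc

master.replace_auto("x")                  # master: row moves to id 2, autoinc -> 3
slave.apply_row_update(1, 2, "x")         # slave: row id 1 -> 2, autoinc stays 2

# Promote the slave (or, as in this case, start writing to the upgraded
# instance): the very next insert tries id 2 and collides.
try:
    slave.insert_auto("y")
except RuntimeError as e:
    print(e)   # -> Duplicate key 2
```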

4. Solutions

Possible solutions on the business side:

(1) Change the binlog format to statement or mixed.

(2) Use INSERT ... ON DUPLICATE KEY UPDATE instead of replace into.

Possible solutions on the kernel side:

(1) In ROW format, when a replace into statement is executed, record it as a statement-format log event, i.e., write the original statement into the binlog.

(2) In ROW format, record the replace into statement as one delete event plus one insert event.

5. Lessons learned

(1) The two auto-increment parameters autoinc_lock_mode and auto_increment_increment can easily lead to duplicate keys; avoid changing them dynamically in production.

(2) When confronted with a production problem, do on-site analysis first: pin down the failure scenario, the user's SQL statements, the scope of the failure, the relevant instance configuration, the binlog, and even back up the instance data, to prevent losing evidence over time. Only then can you accurately match the symptoms against official bugs; and if no official bug is related, you can still work it out independently from the traces you have collected.

Original link: cloud.tencent.com/developer/a...


Origin: juejin.im/post/5e8a931d51882573a033879a