Changing fields and indexes online, without locking, on tables with hundreds of millions of rows

Background

In day-to-day work we often need to change the structure of database tables, typically through ALTER operations such as adding or dropping fields and modifying field attributes. On large tables, especially those with tens or hundreds of millions of rows, these operations can lock the table if they are not handled properly, which is a serious risk in production. If a structure change runs for a long time, the table stays locked for that whole time, user data cannot be written to it, service functions fail, and the consequences can be catastrophic.

To carry out this kind of ALTER change, we might consider the following approaches:

1. Stop the service and change the table structure during the outage, so that users are not affected by the change itself. However, many scenarios do not allow any downtime, and if the table holds hundreds of millions of rows the outage may last ten hours or more, which is unrealistic;

2. Run the change in the early morning, when there are fewer users, to minimize the impact. But if the table is locked, the service is still unavailable to anyone using it at that time;

3. Copy the data into a new table that has the changed structure. The drawback is that if a user updates or deletes rows that have already been copied while the copy is still running, the new table never sees those changes, so the user's data is lost. The risk is too high;

4. Use a stored procedure that applies the change in batches. The drawbacks are that it takes a long time to run and may interfere with users' DML operations. To avoid locking too many rows in each loop iteration, we have to limit the number of rows updated per batch; if the batch is too large, rows that users are working on are very likely to be locked.

So is there no good tool for these pain points? In fact, the industry has a relatively mature one. On large tables it can apply ALTER changes online without the risk of locking the table, and it has some other advantages as well. Let's explore it.

1. What is pt-osc

pt-online-schema-change (pt-osc for short) is part of Percona Toolkit. It works around the limitations of native DDL so that a table's structure can be modified online without locking the table. Percona's documentation describes the tool specifically as altering tables without locking them.

How does pt-osc avoid table locks and still capture the user updates and deletes mentioned above?

The main steps of pt-osc are as follows:

1. Create a new, empty table with exactly the same structure as the original table, named "_<original table name>_new";

2. Apply the ALTER statement to the new table, so the original table itself is never altered;

3. Create three triggers on the original table, for insert, update, and delete. While data is being copied from the original table to the new table, any DML operation by a user fires the matching trigger, which writes the same change into the new table. This keeps the new table up to date, so no user operation is lost;

4. Copy the data to the new table chunk by chunk. The copy holds a shared (S) lock on the rows being read;

5. Rename: the original table is renamed to "_<original table name>_old" and the new table takes the original name. A configuration option decides whether the old table is dropped after the change completes;

6. Drop the three triggers.
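The six steps above can be sketched as a short script that prints a simplified SQL outline for one table. This is an illustration under stated assumptions, not the statements pt-osc really generates: the function name osc_outline is made up, the table is assumed to have only the columns id and name with id as the primary key, the primary key value is assumed never to be updated, and a single chunk of ids 1 to 1000 stands in for the whole copy loop. The trigger and table names follow the naming visible in the tool's own log output.

```shell
#!/bin/sh
# Print a simplified SQL outline of the six steps for one table.
# Assumptions (illustrative only): columns are (id, name), id is the primary
# key and is never updated, and one chunk covers ids 1..1000.
# The body runs in a subshell "( ... )" so its variables stay local.
osc_outline() (
  db="$1"; tbl="$2"; change="$3"
  orig="${db}.${tbl}"
  new="${db}._${tbl}_new"
  old="${db}._${tbl}_old"
  trg="${db}.pt_osc_${db}_${tbl}"
  cat <<SQL
-- 1. create an empty copy of the original table
CREATE TABLE ${new} LIKE ${orig};
-- 2. apply the change to the copy only
ALTER TABLE ${new} ${change};
-- 3. triggers replay live inserts, updates and deletes on the copy
CREATE TRIGGER ${trg}_ins AFTER INSERT ON ${orig} FOR EACH ROW
  REPLACE INTO ${new} (id, name) VALUES (NEW.id, NEW.name);
CREATE TRIGGER ${trg}_upd AFTER UPDATE ON ${orig} FOR EACH ROW
  REPLACE INTO ${new} (id, name) VALUES (NEW.id, NEW.name);
CREATE TRIGGER ${trg}_del AFTER DELETE ON ${orig} FOR EACH ROW
  DELETE IGNORE FROM ${new} WHERE id <=> OLD.id;
-- 4. copy existing rows one primary-key range (chunk) at a time
INSERT LOW_PRIORITY IGNORE INTO ${new} (id, name)
  SELECT id, name FROM ${orig} WHERE id >= 1 AND id <= 1000 LOCK IN SHARE MODE;
-- 5. swap the tables in one atomic rename
RENAME TABLE ${orig} TO ${old}, ${new} TO ${orig};
-- 6. remove the triggers
DROP TRIGGER IF EXISTS ${trg}_del;
DROP TRIGGER IF EXISTS ${trg}_upd;
DROP TRIGGER IF EXISTS ${trg}_ins;
SQL
)

osc_outline my_test t_test "ADD COLUMN MARK TINYINT NULL DEFAULT 1"
```

The pairing of REPLACE in the triggers with INSERT IGNORE in the chunk copy is what lets the two run side by side: a row already written by a trigger is newer than the copy's snapshot, so the copy skips it rather than overwriting it.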

2. Installation of pt-osc

Installation steps on Linux:

# Download the package
wget  http://szxge1-sw.artifactory.cd-cloud-artifact.tools.huawei.com/artifactory/CommonComponent/common/tool/percona-toolkit-3.1.0.tar.gz
# Extract the package
tar -zxvf percona-toolkit-3.1.0.tar.gz
# Install the build dependencies
yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
yum -y install perl-Digest-MD5
cd percona-toolkit-3.1.0
perl Makefile.PL
# Build and install
make
make install
# Install the MySQL (MariaDB) client and the Perl MySQL driver
yum install mariadb
yum install perl-DBD-MySQL

3. Using pt-osc

pt-osc is simple to use: enter the pt-online-schema-change command directly on the Linux command line.

Take adding a field named MARK to a table in a MySQL database as an example:

pt-online-schema-change --user="root" --password="*****" --host="database IP" --port=3306 --alter "ADD COLUMN MARK TINYINT NULL DEFAULT 1 COMMENT 'mark source region is 1';" D=my_test,t=t_test --no-drop-old-table --execute --print --no-check-replication-filters --charset=utf8 --no-check-unique-key-change --max-load="Threads_running=100" --critical-load="Threads_running=300" --recursion-method=none;

In the above statement:

1. user and password are the user name and password of the database account that performs the change; the account needs high privileges;

2. host is the IP address of the database;

3. port is the port number of the database;

4. alter is followed by the change to apply, written without the ALTER TABLE keywords;

5. D is the database name;

6. t is the name of the table to change;

7. no-drop-old-table keeps the old table instead of dropping it after the change;

8. charset is the character set, here utf8;

9. max-load: while copying data, the tool monitors the number of running threads in the database. If it exceeds the configured Threads_running value, copying pauses until the number drops below that value again. This keeps the change from putting heavy pressure on the database and affecting live services;

10. critical-load: the default is Threads_running=50. Like max-load, it is checked against SHOW GLOBAL STATUS after each chunk. The difference is that when the load exceeds this threshold the tool aborts instead of pausing. Set the threshold to suit your own database.

Note: in the statement following --alter, do not wrap the column name in backticks (`), otherwise an error is reported. For example, in --alter "ADD COLUMN MARK TINYINT NULL DEFAULT 1 COMMENT 'mark source region is 1';", wrapping MARK in backticks causes an error, because the shell treats backticks inside a double-quoted argument as command substitution. The single quotes around the COMMENT text cause no problem.
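Before running a change with --execute, it can help to rehearse it with the tool's --dry-run option, which creates and alters the new table but does not create triggers, copy data, or swap tables. The wrapper below is a minimal sketch of that habit, reusing the options from the example above. The function name osc and the DB_HOST environment variable are assumptions made for illustration; --ask-pass is used so the password is prompted for rather than typed on the command line.

```shell
#!/bin/sh
# Wrap the options from the example so that only the mode flag differs
# between a rehearsal and the real change.
# Assumptions: DB_HOST is a hypothetical environment variable holding the
# database address; the schema, table and column names come from the example.
osc() {
  pt-online-schema-change \
    --user=root --ask-pass --host="${DB_HOST:-127.0.0.1}" --port=3306 \
    --alter "ADD COLUMN MARK TINYINT NULL DEFAULT 1 COMMENT 'mark source region is 1'" \
    --charset=utf8 --recursion-method=none --no-drop-old-table \
    --max-load="Threads_running=100" --critical-load="Threads_running=300" \
    --print "$@" \
    D=my_test,t=t_test
}

# Typical use (not run here):
#   osc --dry-run    # rehearsal: new table is created and altered, nothing else
#   osc --execute    # the real change
```

Keeping the option list in one place means the rehearsal and the real run cannot drift apart; passing neither flag makes the tool perform only its safety checks and exit.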

The following is the output printed when pt-osc actually executed a job. For security reasons, some of the log information has been omitted.

[root@ttt ~]#  pt-online-schema-change --user="root" --password="*****" --host="database IP" --port=3306 --alter "ADD COLUMN MARK TINYINT NULL DEFAULT 1 COMMENT 'mark source region is 1';" D=my_test,t=t_test --no-drop-old-table --execute --print --no-check-replication-filters --charset=utf8 --no-check-unique-key-change --max-load="Threads_running=100" --critical-load="Threads_running=300" --recursion-method=none;
No slaves found.  See --recursion-method if host EulerOS-BaseTemplate has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
analyze_table, 10, 1
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering `my_test`.`t_test`...
Creating new table...
CREATE TABLE `my_test`.`_t_test_new` (
      `ID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'auto-increment ID',
      ............. (rest of the CREATE TABLE statement omitted) ................
Created new table my_test._t_test_new OK.
Altering new table...
ALTER TABLE `my_test`.`_t_test_new` ADD COLUMN MARK TINYINT NULL DEFAULT 1 COMMENT 'mark source region is 1';
Altered `my_test`.`_t_test_new` OK.
2020-10-14T11:14:48 Creating triggers...
2020-10-14T11:14:48 Created triggers OK.
2020-10-14T11:14:48 Copying approximately 346697 rows...
INSERT LOW_PRIORITY IGNORE INTO `my_test`.`_t_test_new` (`id`, .. (column list omitted) .... FROM `my_test`.`t_test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 31340 copy nibble*/
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `my_test`.`t_test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) ORDER BY `id` LIMIT ?, 2 /*next chunk boundary*/
2020-10-14T11:14:53 Copied rows OK.
2020-10-14T11:14:53 Analyzing new table...
2020-10-14T11:14:53 Swapping tables...
RENAME TABLE `my_test`.`t_test` TO `my_test`.`_t_test_old`, `my_test`.`_t_test_new` TO `my_test`.`t_test`
2020-10-14T11:14:53 Swapped original and new tables OK.
Not dropping old table because --no-drop-old-table was specified.
2020-10-14T11:14:53 Dropping triggers...
DROP TRIGGER IF EXISTS `my_test`.`pt_osc_my_test_t_test_del`
DROP TRIGGER IF EXISTS `my_test`.`pt_osc_my_test_t_test_upd`
DROP TRIGGER IF EXISTS `my_test`.`pt_osc_my_test_t_test_ins`
2020-10-14T11:14:54 Dropped triggers OK.
Successfully altered `my_test`.`t_test`.

4. Performance comparison

The sections above introduced the advantages of pt-osc. How does it perform in practice? We ran a dedicated test in a test environment to give a more concrete picture.

In the test database we prepared a large table with 16 million rows. The goal was to add a field to this table, once with a stored procedure and once with pt-osc.

4.1 Using stored procedures

First we tested the stored procedure. To avoid locking the table, it updated only 200 rows per batch. The whole change took 90 minutes from start to finish. Moreover, if a user's DML statement touches a row that the stored procedure is changing at that moment, the user's row may be locked and the user's change fails.

4.2 Using the pt-osc tool

With pt-osc the change took about 7 minutes from start to finish, which is much faster. During the run, the services in the test environment stayed connected to the database and executed several tasks that operate on the table; all of them completed normally throughout.
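To put the two timings side by side, the figures quoted in this test can be turned into rough throughput numbers. The snippet below is only back-of-the-envelope integer arithmetic on those figures (16 million rows, batches of 200, 90 minutes, about 7 minutes); the function name throughput_summary is made up for illustration, and the 7-minute figure is approximate in the text.

```shell
#!/bin/sh
# Rough throughput derived from the figures quoted in the test above.
# Integer division only, so results are rounded down.
throughput_summary() {
  rows=16000000
  batch=200
  proc_secs=$((90 * 60))   # stored procedure: 90 minutes
  osc_secs=$((7 * 60))     # pt-osc: about 7 minutes

  echo "stored procedure batches: $((rows / batch))"
  echo "stored procedure rows/s: $((rows / proc_secs))"
  echo "pt-osc rows/s: $((rows / osc_secs))"
}

throughput_summary
# prints:
#   stored procedure batches: 80000
#   stored procedure rows/s: 2962
#   pt-osc rows/s: 38095
```

On these numbers pt-osc moved rows roughly 13 times faster (90 / 7 is about 12.9), and the stored procedure needed 80,000 separate batches to finish.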

5. Conclusion

With the advantages described above, pt-osc lets us apply schema changes gracefully in a live environment that must not stop serving, without exposing the database to table locks or overload during the change, so the business keeps running normally.


Origin blog.csdn.net/liuxingjiaoyu/article/details/112916561