pt-online-schema-change use exception handling and precautions

pt-online-schema-change recently used pt-online-schema-change to do online DDL for large online tables, and found several problems.
The statement I use is as follows:
pt-online-schema-change --user=root --password="xxxxx" --host=192.168.xx.xx D=M_xx,t=T_xx --alter "ADD Fxxxxx'" - -charset=utf8 --no-check-replication-filters --alter-foreign-keys-method=auto --recursion-method=none --print --execute
During execution, a library was interrupted, error The information is as follows:
Copying `table_01005`.`T_xxx`: 19% 16:30 remain
Copying `table_01005`.`T_xxx`: 21% 16:21 remain
Copying `table_01005`.`T_xxx`: 22% 16:58 remain
2014- 11-04T18:20:25 Dropping triggers...
DROP TRIGGER IF EXISTS `table_01005`.`pt_osc_table_01005_T_xxx_del`;
DROP TRIGGER IF EXISTS `table_01005`.`pt_osc_table_01005_T_xxx_upd`;
DROP TRIGGER IF EXISTS `table_01005`.`pt_osc_table_01005_T_xxx_ins`;
2014-11-04T18:20:28 Dropped triggers OK.
2014-11-04T18:20:28 Dropping new table...
DROP TABLE IF EXISTS `table_01005`.`_T_xxx_new`;
2014-11-04T18:20:30 Dropped new table OK.
`table_01005`.`T_xxx` was not altered.
2014-11-04T18:20:25 Error copying rows from `table_01005`.`T_xxx` to `table_01005`.`_T_xxx_new`: Threads_running=199 exceeds its critical threshold 50。

Then it is interrupted, this Threads_running is the number of active threads. According to this error message, I checked the documentation of percona,
simply checked the bug list according to the error message, and found a bug. According to the phenomenon, it is a bug of the tool itself. It also happens to be the same as the version we are using, so
I will upgrade the version, come back after finishing, and then reproduce it again, showing that it is not as simple as a bug. Or look at the error message again, suggesting that Threads_running exceeds the warning threshold. Since it is a threshold,
can it be set? Find the official website document and look carefully. There is a parameter to pay attention to, --critical-load. The documentation explains:
--critical-load
type: Array; default: Threads_running=50
Examine SHOW GLOBAL STATUS after every chunk, 
and abort if the load is too high. The option accepts a comma-separated list of MySQL status variables and thresholds. 
An optional =MAX_VALUE (or :MAX_VALUE) can follow each variable. If not given, 
the tool determines a threshold by examining the current value at startup and doubling it.
See --max-load for further details. These options work similarly, 
except that this option will abort the tool’s operation instead of pausing it,
 and the default value is computed differently if you specify no threshold. 
 The reason for this option is as a safety check in case the triggers on the
 original table add so much load to the server that it causes downtime. 
 There is probably no single value of Threads_running that is wrong for 
 every server, but a default of 50 seems likely to be unacceptably high
 for most servers, indicating that the operation should be canceled immediately.

The general meaning is as follows:
Before and after each chunk operation, the change of the specified state quantity will be counted according to show global status. The default is to count Thread_running.
The purpose is to be safe and prevent triggers on the original table from causing excessive load. This is also to prevent online DDL from affecting online.
If the set threshold is exceeded, the operation will be terminated and the online DDL will be interrupted. The exception that is prompted is the error message above.

和这个参数有的类似的还有一个--max-load :
--max-load
type: Array; default: Threads_running=25
Examine SHOW GLOBAL STATUS after every chunk, and pause if any status variables are higher than their thresholds. 
The option accepts a comma-separated list of MySQL status variables. An optional =MAX_VALUE (or :MAX_VALUE) can 
follow each variable. If not given, the tool determines a threshold by examining the current value and increasing it by 20%.

For example, if you want the tool to pause when Threads_connected gets too high, you can specify “Threads_connected”,
 and the tool will check the current value when it starts working and add 20% to that value. If the current value is 100, 
 then the tool will pause when Threads_connected exceeds 120, and resume working when it is below 120 again. If you want to
 specify an explicit threshold, such as 110, you can use either “Threads_connected:110” or “Threads_connected=110”.

The purpose of this option is to prevent the tool from adding too much load to the server. If the data-copy queries are 
intrusive, or if they cause lock waits, then other queries on the server will tend to block and queue. This will typically 
cause Threads_running to increase, and the tool can detect that by running SHOW GLOBAL STATUS immediately after each query finishes. 
If you specify a threshold for this variable, then you can instruct the tool to wait until queries are running normally again. This will 
not prevent queueing, however; it will only give the server a chance to recover from the queueing. If you notice queueing, it is best to decrease the chunk time.

--max-load 选项定义一个阀值,在每次chunk操作后,查看show global status状态值是否高于指定的阀值。该参数接受一个mysql status状态变量以及一个阀值,
如果没有给定阀值,则定义一个阀值为为高于当前值的20%。
注意这个参数不会像--critical-load终止操作,而只是暂停操作。当status值低于阀值时,则继续往下操作。
是暂停还是终止操作这是--max-load和--critical-load的差别。

参数值为列表形式,可以指定show global status出现的状态值。比如,Thread_connect 等等。
格式如下:--critical-load="Threads_running=200"  或者--critical-load="Threads_running:200"。

根据参数要求,修改了Threads_runnings的阀值,成功执行了。所以文档还是很重要的,在使用一个工具之前,文档是必须要看的,
而不只是简单的使用就完事,必须要对使用过程中可能出现的异常错误进行处理。

几个注意事项:

The tool refuses to operate if it detects replication filters. See --[no]check-replication-filters for details.
The tool pauses the data copy operation if it observes any replicas that are delayed in replication. See --max-lag for details.
The tool pauses or aborts its operation if it detects too much load on the server. See --max-load and --critical-load for details.
The tool sets its lock wait timeout to 1 second so that it is more likely to be the victim of any lock contention, and less likely to disrupt other transactions. See --lock-wait-timeout for details.
The tool refuses to alter the table if foreign key constraints reference it, unless you specify --alter-foreign-keys-method.
The tool cannot alter MyISAM tables on “Percona XtraDB Cluster” nodes.


pt-online-schema-change 在线DDL工具,虽然说不会锁表,但是对性能还是有一定的影响,执行过程中对全表做一次select。
这个过程会将buffer_cache中活跃数据全部交换一遍,这就导致活跃数据的请求都要从磁盘获取,导致慢SQL增多,file_reads增大。

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=324756670&siteId=291194637