Implementing MySQL OSC (Online Schema Change)

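Background: when product requirements change frequently, table structures in the production database have to be altered again and again. Running ALTER TABLE directly (including CREATE INDEX and similar statements) can lock the table, and subsequent reads and writes on that table then queue up in the "Waiting for table metadata lock" state, which seriously affects heavily loaded business systems.

Note: with the metadata lock introduced in 5.5, even a SELECT acquires a metadata lock (it protects the table structure from being changed while the query is running).

The sections below describe how to perform OSC (Online Schema Change) correctly on 5.6, and on 5.5 and earlier.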


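I. MySQL 5.6 official Online DDL

First, an introduction to the Online DDL feature that 5.6 officially introduced.

By setting ALGORITHM and LOCK you can choose how an online DDL operation trades performance against concurrency.

(1) Execution algorithm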

ALGORITHM [=] {DEFAULT|INPLACE|COPY}

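DEFAULT: specifying it explicitly has the same effect as omitting the clause. With the default setting old_alter_table = OFF, INPLACE is tried first, and COPY is used if INPLACE is not supported for the operation.

INPLACE: performs the change in place and avoids copying the table through the server layer. It uses less I/O and CPU than COPY and generally performs better. Adding and dropping secondary indexes are the typical cases; in 5.6 a number of other operations (for example adding or dropping a column) can also run in place, although some of them rebuild the table, while others such as changing a column's data type still require COPY. Whether concurrent DML is allowed depends on the operation and on the LOCK clause.

COPY: copies the original table. Copying a large table into a temporary table takes up the buffer pool (if the table is too large the data even spills to disk, which is slower), consumes a lot of memory, and hurts performance.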


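Taking index creation as an example, here is a brief look at what the INPLACE and COPY methods do internally.

(1) COPY method

Create a temporary table that has the same definition as the original table plus the new element.

Take an S lock on the original table, which blocks DML but still allows SELECT.

Copy the data from the original table into the temporary table.

Upgrade the S lock to an X lock and rename the temporary table to the original table name (the rename only changes the data dictionary, so it is fast).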


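(2) INPLACE method

In-place update. In the fast index creation path described here, only secondary indexes are built in place. (In 5.6, adding a primary key is also accepted with ALGORITHM=INPLACE in most cases, but it rebuilds the whole table, so its cost is close to that of a copy.)

Create the data dictionary entry for the secondary index.

Take an S lock on the original table.

...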



(2) Locking Options for Online DDL

LOCK [=] {DEFAULT|NONE|SHARED|EXCLUSIVE}

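DEFAULT: provides as much concurrency as the given ALGORITHM allows; the order of preference is NONE > SHARED > EXCLUSIVE.

NONE: no lock; concurrent reads and writes from other transactions are allowed.

SHARED: shared lock; concurrent reads from other transactions are allowed, but writes are blocked.

EXCLUSIVE: exclusive lock; reads and writes from other transactions are blocked (even if the ALGORITHM would support concurrent operations).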
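
To make the two clauses concrete, here is a minimal Python sketch that only assembles the statement text. The helper name build_online_alter and the index name idx_k are hypothetical and exist only for illustration; the ALGORITHM and LOCK keywords and their allowed values are the ones listed above.

```python
# Hypothetical helper (not part of MySQL or of any client library): it only
# builds the statement text so that ALGORITHM and LOCK are always explicit.
ALGORITHMS = ("DEFAULT", "INPLACE", "COPY")
LOCKS = ("DEFAULT", "NONE", "SHARED", "EXCLUSIVE")


def build_online_alter(table, change, algorithm="DEFAULT", lock="DEFAULT"):
    """Return an ALTER TABLE statement with ALGORITHM and LOCK spelled out."""
    algorithm = algorithm.upper()
    lock = lock.upper()
    if algorithm not in ALGORITHMS:
        raise ValueError("unknown ALGORITHM: " + algorithm)
    if lock not in LOCKS:
        raise ValueError("unknown LOCK: " + lock)
    return "ALTER TABLE `{}` {}, ALGORITHM={}, LOCK={}".format(
        table, change, algorithm, lock)


if __name__ == "__main__":
    # Ask for an in-place index build that does not block reads or writes.
    print(build_online_alter("sbtest1", "ADD INDEX idx_k (k)", "INPLACE", "NONE"))
```

Spelling out ALGORITHM=INPLACE, LOCK=NONE is a useful safety net: if the server cannot honor the requested combination for that operation, the statement fails with an error instead of quietly falling back to a more blocking method.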



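5.6 Online DDL execution process:

1. Prepare phase

0) Check syntax, validity, and conflicts.

1) Create a temporary frm file for the original table.

2) Take a table-level exclusive metadata lock (Exclusive-MDL) on the original table, blocking reads and writes. (This is why you should check for long-running queries before running an online DDL.)

3) Decide the execution method from the type of ALTER TABLE: inplace (Online-rebuild or Online-norebuild) or copy.

4) Update the in-memory data dictionary objects and create the index in the system tables.

5) Allocate a row_log object to hold the incremental log, which records the changes that DML makes to the data while the DDL is running. The log size is capped by innodb_online_alter_log_max_size; if the log grows beyond that value during the operation, the DDL fails with an error.

6) If the execution method is rebuild, create a temporary ibd file, commit the data dictionary transaction, and release the data dictionary lock.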


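2. DDL execution phase

1) Downgrade the Exclusive-MDL so that reads and writes are allowed.

2) Scan every record in the clustered index of the original table.

3) Iterate over the clustered index and the secondary indexes of the new table and process them one by one.

4) Build the corresponding index entry from each record.

5) Insert the index entries into sort_buffer blocks. Note that the sort may need space in tmpdir; if tmpdir is too small, an error is raised.

6) Build the new index from the sort_buffer.

7) If the execution method is rebuild, also process the changes made while the DDL was running: apply the row_log and add the new data to the ibd file.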


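3. Commit phase

1) Upgrade to Exclusive-MDL, blocking reads and writes.

2) New entries may have been written to the row_log between the previous log application and the lock upgrade in this phase; apply them as well.

3) Update the InnoDB data dictionary tables.

4) Commit the transaction (flush the transaction's redo log).

5) Update statistics (data dictionary, index information, and so on).

6) Rename the temporary ibd and frm files.

7) The change is complete.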
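
The interplay between the scan, the row_log, and the two log applications can be sketched as a toy model in Python. This is not InnoDB code: the class and function names are hypothetical, the table is a plain dict, the log cap counts entries rather than bytes, and concurrency is simulated by replaying two lists of DML in sequence.

```python
class OnlineLogTooBig(Exception):
    """Stands in for ER_INNODB_ONLINE_LOG_TOO_BIG in this toy model."""


class RowLog:
    """Toy row_log: buffers DML made while the DDL runs, up to a fixed cap."""

    def __init__(self, max_entries):
        # Plays the role of innodb_online_alter_log_max_size (entries, not bytes).
        self.max_entries = max_entries
        self.entries = []

    def record(self, op, key, value=None):
        if len(self.entries) >= self.max_entries:
            raise OnlineLogTooBig("row_log exceeded %d entries" % self.max_entries)
        self.entries.append((op, key, value))

    def drain(self):
        entries, self.entries = self.entries, []
        return entries


def apply_log(target, entries):
    """Replay logged DML onto a dict keyed by primary key."""
    for op, key, value in entries:
        if op == "delete":
            target.pop(key, None)
        else:  # "insert" and "update" both store the latest row
            target[key] = value


def online_rebuild(table, dml_during_scan, dml_before_commit, max_entries=8):
    """table is {pk: row}; both DML arguments are lists of (op, pk, row)."""
    log = RowLog(max_entries)
    # Prepare: under the exclusive lock, fix the rows the scan will read.
    snapshot = dict(table)
    # Execute: the lock is downgraded, so DML changes the table and is logged.
    for op, key, value in dml_during_scan:
        apply_log(table, [(op, key, value)])
        log.record(op, key, value)
    rebuilt = dict(sorted(snapshot.items()))  # scan, sort, build
    apply_log(rebuilt, log.drain())           # first application of the log
    # More DML can arrive before the lock is upgraded again.
    for op, key, value in dml_before_commit:
        apply_log(table, [(op, key, value)])
        log.record(op, key, value)
    # Commit: under the exclusive lock, apply whatever is left in the log.
    apply_log(rebuilt, log.drain())
    return rebuilt
```

In this model the rebuilt copy ends up identical to the live table, and a burst of DML larger than the cap aborts the whole operation, which mirrors why a write-heavy table may need a larger innodb_online_alter_log_max_size.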


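A few key parameters:

innodb_online_alter_log_max_size: the upper limit on the size of the temporary log that records the DML made while the DDL operation is running. The default is 128M. It is a dynamic variable, so it can be changed at runtime with SET GLOBAL without a restart.

If the log grows beyond this value, the following error is raised:

Error: 1799 SQLSTATE: HY000 (ER_INNODB_ONLINE_LOG_TOO_BIG)
Message: Creating index 'idx_aaa' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again.

tmpdir: the temporary space used during the DDL execution phase when the sort performed while building the index does not fit in memory.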



II. OSC on 5.5 and earlier

Before 5.6, online schema changes were generally done with third-party tools such as oak-online-alter-table from OAK (openark kit) or pt-online-schema-change.

oak-online-alter-table

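oak-online-alter-table runs the DDL using the copy approach; changes made by DML while it runs are synchronized to the temporary table by triggers.

Points to note when using oak-online-alter-table:

(1) The primary key must be a single-column index (a composite primary key does not work and triggers a MySQL bug).

(2) The table must not have foreign keys.

(3) The table must not have triggers (for existing triggers, back them up, drop them, and then run the OAK DDL). The reason is that OAK creates its own triggers, which pass new DML on the original table to the temporary table while the DDL is running.

Note: SQL for checking whether foreign keys exist (the trigger check against information_schema.TRIGGERS appears in step 2 of the case study):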

Select * from information_schema.key_column_usage where Table_schema=@dbname and table_name=@tablename and Referenced_table_name is not null ;

Select * from information_schema.key_column_usage where Referenced_table_schema=@dbname and Referenced_table_name=@tablename;

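(4) Before running, check for long-running queries, which can cause the online DDL to fail.

(5) Before running, estimate the execution time and pick a low-traffic period.

(6) After it finishes, verify the data: check that the original table and the copied temporary table are consistent (a DDL that changes a column type may change the data).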
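
The foreign key and trigger checks from points (2) and (3) can be wrapped into one pre-flight function. The sketch below is an assumption-laden illustration: find_blockers and FakeCursor are hypothetical names, the cursor is assumed to be a Python DB-API cursor that uses %s placeholders (as PyMySQL and mysqlclient do), and the trigger query is narrowed to a single table by also filtering on EVENT_OBJECT_TABLE.

```python
# Each query returns a single COUNT(*) row. The first two mirror the foreign
# key queries above; the third is a per-table trigger check.
CHECKS = (
    ("foreign keys defined on the table",
     "SELECT COUNT(*) FROM information_schema.key_column_usage "
     "WHERE table_schema = %s AND table_name = %s "
     "AND referenced_table_name IS NOT NULL"),
    ("foreign keys referencing the table",
     "SELECT COUNT(*) FROM information_schema.key_column_usage "
     "WHERE referenced_table_schema = %s AND referenced_table_name = %s"),
    ("triggers on the table",
     "SELECT COUNT(*) FROM information_schema.triggers "
     "WHERE event_object_schema = %s AND event_object_table = %s"),
)


def find_blockers(cursor, schema, table):
    """Return the labels of all checks that found at least one object."""
    blockers = []
    for label, sql in CHECKS:
        cursor.execute(sql, (schema, table))
        (count,) = cursor.fetchone()
        if count:
            blockers.append(label)
    return blockers


class FakeCursor:
    """Stand-in for a real cursor so the sketch runs without a MySQL server."""

    def __init__(self, counts):
        self._counts = list(counts)
        self._current = None

    def execute(self, sql, params):
        # Ignore the SQL and hand back the next canned count.
        self._current = (self._counts.pop(0),)

    def fetchone(self):
        return self._current


if __name__ == "__main__":
    # Pretend the table has no foreign keys but three existing triggers.
    print(find_blockers(FakeCursor([0, 0, 3]), "sysbench", "sbtest1"))
```

With a real connection, the cursor from the driver would replace FakeCursor; an empty result list means none of the three checks found anything that would block the tool.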



Case study: running an online DDL with OAK

1) Create a test table with sysbench; its structure is as follows

mysql> use sysbench;
mysql> show tables;
+--------------------+
| Tables_in_sysbench |
+--------------------+
| sbtest1            |
+--------------------+
1 row in set (0.00 sec)
mysql> show create table sbtest1 \G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=20001 DEFAULT CHARSET=utf8mb4 MAX_ROWS=1000000
The table contains 20,000 rows:
mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|    20000 |
+----------+
1 row in set (0.01 sec)

2) Check for foreign keys and triggers. There are none.
mysql> select TRIGGER_SCHEMA,TRIGGER_NAME,EVENT_OBJECT_SCHEMA,EVENT_OBJECT_TABLE from information_schema.TRIGGERS where EVENT_OBJECT_SCHEMA='sysbench';
Empty set (0.00 sec)

mysql> 
mysql> Select * from information_schema.key_column_usage where Table_schema="sysbench" and table_name="sbtest1" and Referenced_table_name is not null;
Empty set (0.01 sec)

mysql> 
mysql> Select * from information_schema.key_column_usage where Referenced_table_schema="sysbench" and Referenced_table_name="sbtest1";
Empty set (0.05 sec)

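3) Use oak-online-alter-table from the OAK toolkit to run the online DDL (as an example, add the column last_update_time and the index lut to table sbtest1)

Number of rows taken from the original table in each pass: -c CHUNK_SIZE, --chunk-size=CHUNK_SIZE  Number of rows to act on in chunks. Default: 1000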

[root@237_12 ~]# oak-online-alter-table -uroot --ask-pass -S /tmp/mysqld.sock -d sysbench -t sbtest1 -g new_sbtest1 -a "add last_update_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,add key lut(last_update_time)" --sleep=300 --skip-delete-pass
-- Connecting to MySQL
Password: 
-- Table sysbench.sbtest1 is of engine innodb
-- Checking for UNIQUE columns on sysbench.sbtest1, by which to chunk
-- Possible UNIQUE KEY column names in sysbench.sbtest1:
-- - id
-- Table sysbench.new_sbtest1 has been created
-- Table sysbench.new_sbtest1 has been altered
-- Checking for UNIQUE columns on sysbench.new_sbtest1, by which to chunk
-- Possible UNIQUE KEY column names in sysbench.new_sbtest1:
-- - id
-- Checking for UNIQUE columns on sysbench.sbtest1, by which to chunk
-- - Found following possible unique keys:
-- - id (int)
-- Chosen unique key is 'id'
-- Shared columns: c, pad, k, id
-- Created AD trigger
-- Created AU trigger
-- Created AI trigger
-- Attempting to lock tables

-- Tables locked WRITE
-- id (min, max) values: ([1L], [20000L])
-- Tables unlocked
-- - Reminder: altering sysbench.sbtest1: add last_update_time timestamp...
-- Copying range (1), (1000), progress: 0%
-- + Will sleep for 0.3 seconds
-- Copying range (1000), (2000), progress: 5%
-- + Will sleep for 0.3 seconds
-- Copying range (2000), (3000), progress: 10%
-- + Will sleep for 0.3 seconds
-- Copying range (3000), (4000), progress: 15%
-- + Will sleep for 0.3 seconds
-- Copying range (4000), (5000), progress: 20%
-- + Will sleep for 0.3 seconds
-- Copying range (5000), (6000), progress: 25%
-- + Will sleep for 0.3 seconds
-- Copying range (6000), (7000), progress: 30%
-- + Will sleep for 0.3 seconds
-- Copying range (7000), (8000), progress: 35%
-- + Will sleep for 0.3 seconds
-- Copying range (8000), (9000), progress: 40%
-- + Will sleep for 0.3 seconds
-- Copying range (9000), (10000), progress: 45%
-- + Will sleep for 0.3 seconds
-- Copying range (10000), (11000), progress: 50%
-- + Will sleep for 0.3 seconds
-- Copying range (11000), (12000), progress: 55%
-- + Will sleep for 0.3 seconds
-- Copying range (12000), (13000), progress: 60%
-- + Will sleep for 0.3 seconds
-- Copying range (13000), (14000), progress: 65%
-- + Will sleep for 0.3 seconds
-- Copying range (14000), (15000), progress: 70%
-- + Will sleep for 0.3 seconds
-- Copying range (15000), (16000), progress: 75%
-- + Will sleep for 0.3 seconds
-- Copying range (16000), (17000), progress: 80%
-- + Will sleep for 0.3 seconds
-- Copying range (17000), (18000), progress: 85%
-- + Will sleep for 0.3 seconds
-- Copying range (18000), (19000), progress: 90%
-- + Will sleep for 0.3 seconds
-- Copying range (19000), (20000), progress: 95%
-- + Will sleep for 0.3 seconds
-- Copying range 100% complete. Number of rows: 20000
-- Ghost table creation completed. Note that triggers on sysbench.sbtest1 were not removed
[root@237_12 ~]#

Now simulate new data being inserted into the original table while the DDL is running:
mysql> insert into sbtest1 values(99999,99999,"c99999","pad99999");
Query OK, 1 row affected (0.00 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|    20001 |
+----------+
1 row in set (0.01 sec)

After the online DDL operation finishes, check the row count of new_sbtest1:
mysql> select count(*) from new_sbtest1;
+----------+
| count(*) |
+----------+
|    20001 |
+----------+
1 row in set (0.01 sec)

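Note this line in the oak output log: Copying range 100% complete. Number of rows: 20000

It shows that the rows that were in the original table before the DDL started were copied to the new table by the COPY step.

Data changes made by new DML between the start of the DDL and the rename are synchronized to the new table by the triggers.

The OAK triggers are as follows: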

mysql> select TRIGGER_SCHEMA,TRIGGER_NAME,EVENT_OBJECT_SCHEMA,EVENT_OBJECT_TABLE from information_schema.TRIGGERS where EVENT_OBJECT_SCHEMA='sysbench';
+----------------+----------------+---------------------+--------------------+
| TRIGGER_SCHEMA | TRIGGER_NAME   | EVENT_OBJECT_SCHEMA | EVENT_OBJECT_TABLE |
+----------------+----------------+---------------------+--------------------+
| sysbench       | sbtest1_AI_oak | sysbench            | sbtest1            |
| sysbench       | sbtest1_AU_oak | sysbench            | sbtest1            |
| sysbench       | sbtest1_AD_oak | sysbench            | sbtest1            |
+----------------+----------------+---------------------+--------------------+
3 rows in set (0.00 sec)
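
The combination of a chunked copy and AI/AU/AD triggers can be tried out without a MySQL server. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, so the SQL dialect differs from MySQL, and the trigger bodies are illustrative assumptions rather than the definitions OAK actually generates. Concurrent DML is simulated by issuing it between two chunks.

```python
import sqlite3


def run_demo(chunk_size=3):
    """Copy sbtest1 into new_sbtest1 in chunks while triggers forward DML."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sbtest1 "
               "(id INTEGER PRIMARY KEY, k INTEGER NOT NULL DEFAULT 0)")
    db.executemany("INSERT INTO sbtest1 (id, k) VALUES (?, ?)",
                   [(i, i * 10) for i in range(1, 11)])

    # Ghost table: the original columns plus the new one.
    db.execute("CREATE TABLE new_sbtest1 (id INTEGER PRIMARY KEY, "
               "k INTEGER NOT NULL DEFAULT 0, "
               "last_update_time TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)")

    # After-insert, after-update and after-delete triggers on the original
    # table replay every change on the ghost table.
    db.execute("CREATE TRIGGER sbtest1_AI AFTER INSERT ON sbtest1 BEGIN "
               "INSERT OR REPLACE INTO new_sbtest1 (id, k) "
               "VALUES (NEW.id, NEW.k); END")
    db.execute("CREATE TRIGGER sbtest1_AU AFTER UPDATE ON sbtest1 BEGIN "
               "DELETE FROM new_sbtest1 WHERE id = OLD.id; "
               "INSERT OR REPLACE INTO new_sbtest1 (id, k) "
               "VALUES (NEW.id, NEW.k); END")
    db.execute("CREATE TRIGGER sbtest1_AD AFTER DELETE ON sbtest1 BEGIN "
               "DELETE FROM new_sbtest1 WHERE id = OLD.id; END")

    low, high = db.execute("SELECT MIN(id), MAX(id) FROM sbtest1").fetchone()
    start = low
    while start <= high:
        end = start + chunk_size - 1
        # OR IGNORE: a row already delivered by a trigger is newer, keep it.
        db.execute("INSERT OR IGNORE INTO new_sbtest1 (id, k) "
                   "SELECT id, k FROM sbtest1 WHERE id BETWEEN ? AND ?",
                   (start, end))
        if start == low:
            # Simulated application traffic arriving after the first chunk.
            db.execute("INSERT INTO sbtest1 (id, k) VALUES (99999, 99999)")
            db.execute("UPDATE sbtest1 SET k = -1 WHERE id = 8")  # not copied yet
            db.execute("DELETE FROM sbtest1 WHERE id = 2")        # already copied
        start = end + 1

    old_rows = db.execute("SELECT id, k FROM sbtest1 ORDER BY id").fetchall()
    new_rows = db.execute("SELECT id, k FROM new_sbtest1 ORDER BY id").fetchall()
    db.close()
    return old_rows, new_rows


if __name__ == "__main__":
    old_rows, new_rows = run_demo()
    print("tables match:", old_rows == new_rows)
```

The row with id 99999 lies outside the (min, max) range that was fixed before copying, so it can only reach the ghost table through the insert trigger, which is the same effect as the 20001st row in the walkthrough above.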


4) Data consistency check

1. Check the table structure and index information

mysql> desc sbtest1;
+-------+------------------+------+-----+---------+----------------+
| Field | Type             | Null | Key | Default | Extra          |
+-------+------------------+------+-----+---------+----------------+
| id    | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| k     | int(10) unsigned | NO   | MUL | 0       |                |
| c     | char(120)        | NO   |     |         |                |
| pad   | char(60)         | NO   |     |         |                |
+-------+------------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

mysql> desc new_sbtest1;
+------------------+------------------+------+-----+-------------------+-----------------------------+
| Field            | Type             | Null | Key | Default           | Extra                       |
+------------------+------------------+------+-----+-------------------+-----------------------------+
| id               | int(10) unsigned | NO   | PRI | NULL              | auto_increment              |
| k                | int(10) unsigned | NO   | MUL | 0                 |                             |
| c                | char(120)        | NO   |     |                   |                             |
| pad              | char(60)         | NO   |     |                   |                             |
| last_update_time | timestamp        | NO   | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+------------------+------+-----+-------------------+-----------------------------+
5 rows in set (0.00 sec)

2. Compare the total row counts of the new and old tables; this was already shown in 3) above

3. Compare the checksums of the two int columns id and k

mysql> select sum(crc32(concat(ifnull(id,'NULL'),ifnull(k,'NULL')))) as sum_old from sbtest1;
+----------------+
| sum_old        |
+----------------+
| 42815177029049 |
+----------------+
1 row in set (0.02 sec)

mysql> select sum(crc32(concat(ifnull(id,'NULL'),ifnull(k,'NULL')))) as sum_new from new_sbtest1;
+----------------+
| sum_new        |
+----------------+
| 42815177029049 |
+----------------+
1 row in set (0.02 sec)
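
The same checksum idea can be reproduced on the client side, for example when the two tables live on different servers. The Python sketch below mirrors the SQL expression sum(crc32(concat(ifnull(id,'NULL'),ifnull(k,'NULL')))); the function name table_checksum and the optional sep parameter are additions made for illustration, and it assumes that MySQL's CRC32() and zlib.crc32 compute the same value.

```python
import zlib


def table_checksum(rows, sep=""):
    """Sum the CRC32 of concat(id, k) over all rows; NULL becomes 'NULL'."""
    total = 0
    for row_id, k in rows:
        left = "NULL" if row_id is None else str(row_id)
        right = "NULL" if k is None else str(k)
        total += zlib.crc32((left + sep + right).encode("utf-8"))
    return total


if __name__ == "__main__":
    old_rows = [(1, 10), (2, 20), (3, 30)]
    new_rows = [(3, 30), (1, 10), (2, 20)]  # same rows, different order
    print("match:", table_checksum(old_rows) == table_checksum(new_rows))
```

Because the per-row values are summed, row order does not matter. One caveat of concatenating without a separator is that (1, 23) and (12, 3) both become the string "123" and produce the same checksum; passing a separator such as sep="#" (or using CONCAT_WS on the SQL side) avoids that ambiguity.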

5) Rename (the table is locked during this step, but only the data dictionary is modified, so it is very fast)
mysql> use sysbench;
Database changed
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> rename table sbtest1 to old_sbtest1,new_sbtest1 to sbtest1;
Query OK, 0 rows affected (0.02 sec)

mysql> show tables;
+--------------------+
| Tables_in_sysbench |
+--------------------+
| old_sbtest1        |
| sbtest1            |
+--------------------+
2 rows in set (0.00 sec)

Drop the three OAK triggers and the original table old_sbtest1:
mysql> drop trigger sbtest1_AI_oak;
Query OK, 0 rows affected (0.00 sec)

mysql> drop trigger sbtest1_AU_oak;
Query OK, 0 rows affected (0.01 sec)

mysql> drop trigger sbtest1_AD_oak;
Query OK, 0 rows affected (0.00 sec)
mysql> drop table old_sbtest1;
Query OK, 0 rows affected (0.01 sec)

mysql> show tables;
+--------------------+
| Tables_in_sysbench |
+--------------------+
| sbtest1            |
+--------------------+
1 row in set (0.00 sec)

This completes the online DDL operation with the OAK tool.




(1) Official manual: Online DDL Overview

https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-overview.html

Chinese translation: http://blog.csdn.net/paololiu/article/details/53765818

(2) Official documentation: pt-online-schema-change

https://www.percona.com/doc/percona-toolkit/2.1/pt-online-schema-change.html

(3) Official documentation: oak-online-alter-table

http://openarkkit.googlecode.com/svn/trunk/openarkkit/doc/html/oak-online-alter-table.html

(4) pt-online-schema-change vs. oak-online-alter-table:

http://www.cnblogs.com/gomysql/p/3777607.html

(5) gh-ost from GitHub:

http://www.oschina.net/news/76606/gh-ost-github-s-online-migration-tool-for-mysql

http://www.jianshu.com/p/70bc5c06b289




Reposted from blog.csdn.net/leonpenn/article/details/77506576