Ⅰ. Background
- mysqldump backups are single-threaded, so they are very slow
- Recovery is also slow: tables are restored one at a time
- If you backed up 100G of data and want to restore just one table, you can't (all the tables are in one file)
So mydumper is recommended instead:
- Backups run in parallel at row granularity, so even a single table can be dumped in parallel
- Recovery is also parallel
- When restoring, you can restore only a specified table
perfect(*^__^*)
Ⅱ. Installation
yum install -y glib2-devel mysql-devel zlib-devel pcre-devel openssl-devel cmake gcc gcc-c++
cd /usr/local/src
git clone https://github.com/maxbube/mydumper
cd mydumper
cmake .
make -j 4
make install
export LD_LIBRARY_PATH="/usr/local/mysql/lib:$LD_LIBRARY_PATH"
Ⅲ. Parameter introduction
The parameters are largely the same as mysqldump's:
-G --triggers   dump triggers
-E --events     dump events
-R --routines   dump stored procedures and functions
--trx-consistency-only   equivalent to mysqldump's --single-transaction
-t   number of threads to use (default 4)
-o   directory to write the backup to
-x   regular expression to match databases/tables
-c   compress the output files
-B   database to dump
-T   tables to dump
-F --chunk-filesize   split table data into files of this size (MB)
--rows 100000   write every 100,000 rows to a separate file
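A typical invocation combining these flags might look like the following sketch (the database name, paths, and row-chunk size are placeholders to adapt to your environment):

```shell
# Illustrative only: dump database dbt3 with 4 threads, compressing output
# and splitting each table into 100,000-row chunk files.
# Requires a reachable MySQL server and valid credentials.
mydumper -G -E -R --trx-consistency-only \
         -t 4 -c \
         -B dbt3 \
         --rows 100000 \
         -o /mdata/backup
```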
Ⅳ. Hands-on practice
4.1 Backup
[root@VM_0_5_centos backup]# mydumper -G -E -R --trx-consistency-only -t 4 -c -B dbt3 -o /mdata/backup
Open another session and run show processlist; you can see the four dump threads:
([email protected]) [(none)]> show processlist;
+--------+------+------------------+------+---------+------+-------------------+----------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+--------+------+------------------+------+---------+------+-------------------+----------------------------------------------------------+
| 137488 | root | 172.16.0.5:53046 | NULL | Query | 0 | starting | show processlist |
| 137523 | root | 172.16.0.5:53546 | NULL | Query | 3 | Sending to client | SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`customer` |
| 137524 | root | 172.16.0.5:53548 | NULL | Query | 3 | Sending to client | SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`lineitem` |
| 137525 | root | 172.16.0.5:53550 | NULL | Query | 1 | Sending to client | SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`partsupp` |
| 137526 | root | 172.16.0.5:53552 | NULL | Query | 3 | Sending to client | SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`orders` |
+--------+------+------------------+------+---------+------+-------------------+----------------------------------------------------------+
5 rows in set (0.00 sec)
tips:
A mydumper option and the value that follows it must be separated by a space; if they are run together, an error is reported:
option parsing failed: Error parsing option -r, try --help
4.2 Analyze backup content
Enter the backup directory
[root@VM_0_5_centos backup]# ll
total 305044
-rw-r--r-- 1 root root 281 Jan 24 10:41 dbt3.customer-schema.sql.gz
-rw-r--r-- 1 root root 9173713 Jan 24 10:41 dbt3.customer.sql.gz
-rw-r--r-- 1 root root 401 Jan 24 10:41 dbt3.lineitem-schema.sql.gz
-rw-r--r-- 1 root root 221097124 Jan 24 10:42 dbt3.lineitem.sql.gz
-rw-r--r-- 1 root root 228 Jan 24 10:41 dbt3.nation-schema.sql.gz
-rw-r--r-- 1 root root 1055 Jan 24 10:41 dbt3.nation.sql.gz
-rw-r--r-- 1 root root 294 Jan 24 10:41 dbt3.orders-schema.sql.gz
-rw-r--r-- 1 root root 47020810 Jan 24 10:41 dbt3.orders.sql.gz
-rw-r--r-- 1 root root 264 Jan 24 10:41 metadata
(Not all tables are listed here, for brevity.)
As you can see, each table is backed up into its own compressed files, which is why a single table can be restored later.
Let's take a look:
[root@VM_0_5_centos backup]# cat metadata
Started dump at: 2018-01-24 10:35:50
SHOW MASTER STATUS:
Log: bin.000001
Pos: 154
GTID:
Finished dump at: 2018-01-24 10:35:50
The metadata file records the binary log position (the equivalent of mysqldump's --master-data)
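Since the metadata file is plain text, the binlog coordinates can be pulled out with awk, e.g. to feed a CHANGE MASTER TO when seeding a replica. A minimal sketch, using a mock metadata file matching the sample above:

```shell
# Write a mock metadata file in the format shown above.
cat > /tmp/metadata <<'EOF'
Started dump at: 2018-01-24 10:35:50
SHOW MASTER STATUS:
 Log: bin.000001
 Pos: 154

Finished dump at: 2018-01-24 10:35:50
EOF

# Extract the binlog file name and position.
log=$(awk -F': ' '/Log:/ {print $2}' /tmp/metadata)
pos=$(awk -F': ' '/Pos:/ {print $2}' /tmp/metadata)
echo "MASTER_LOG_FILE='$log', MASTER_LOG_POS=$pos"
```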
Decompress and inspect the files:
[root@VM_0_5_centos backup]# gunzip dbt3.customer-schema.sql.gz dbt3.customer.sql.gz dbt3-schema-create.sql.gz
[root@VM_0_5_centos backup]# cat dbt3-schema-create.sql
CREATE DATABASE `dbt3` /*!40100 DEFAULT CHARACTER SET utf8mb4 */;
[root@VM_0_5_centos backup]# cat dbt3.customer-schema.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
CREATE TABLE `customer` (
`c_custkey` int(11) NOT NULL,
`c_name` varchar(25) DEFAULT NULL,
`c_address` varchar(40) DEFAULT NULL,
`c_nationkey` int(11) DEFAULT NULL,
`c_phone` char(15) DEFAULT NULL,
`c_acctbal` double DEFAULT NULL,
`c_mktsegment` char(10) DEFAULT NULL,
`c_comment` varchar(117) DEFAULT NULL,
PRIMARY KEY (`c_custkey`),
KEY `i_c_nationkey` (`c_nationkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
[root@VM_0_5_centos backup]# head -5 dbt3.customer.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
INSERT INTO `customer` VALUES
(1,"Customer#000000001","j5JsirBM9PsCy0O1m",15,"25-989-741-2988",711.56,"BUILDING","regular, regular platelets are fluffily according to the even attainments. blithely iron"),
In summary:

| file | purpose |
|---|---|
| -schema.sql | table structure, one per table |
| .sql | table data |
| -schema-create.sql | database creation statement |
4.3 Recovery
Recovery uses the myloader command:
-d   directory containing the backup files
-t   number of threads
-B   target database
[root@VM_0_5_centos mdata]# myloader -d /mdata/backup -t 4 -B test
tips:
With 4 threads on an SSD, restore is nearly twice as fast as the single-threaded original; on an HDD the improvement may be smaller.
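To restore a single table, one common approach is to copy just that table's files (plus the metadata file) into a scratch directory and point myloader at it, since myloader loads whatever dump files it finds. A sketch with mock files standing in for a real backup (the myloader line is illustrative and assumes a running server):

```shell
# Mock backup directory following mydumper's naming convention.
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src"/metadata \
      "$src"/dbt3-schema-create.sql.gz \
      "$src"/dbt3.customer-schema.sql.gz "$src"/dbt3.customer.sql.gz \
      "$src"/dbt3.orders-schema.sql.gz   "$src"/dbt3.orders.sql.gz

# Copy only the customer table's files into the scratch directory.
table=customer
cp "$src"/metadata "$src"/dbt3-schema-create.sql.gz "$dst"/
cp "$src"/dbt3."$table"-schema.sql.gz "$src"/dbt3."$table".sql.gz "$dst"/
ls "$dst"
# myloader -d "$dst" -t 4 -B test   # requires a running MySQL server
```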
Ⅴ. mydumper principle
Having already covered mysqldump, we won't enable the general log here for a detailed trace.
The core question: how is parallelism achieved, even within a single table, while keeping the backup consistent?
step1:
session1 (main thread):
flush tables with read lock; locks the whole instance read-only, other sessions can read but not write (needed for non-transactional engines such as MyISAM)
start transaction with consistent snapshot; opens a consistent-snapshot transaction (for InnoDB)
show master status; records the binary log position
step2:
The main thread spawns the worker threads that do the actual backup, switching each to the REPEATABLE READ isolation level:
session2: start transaction with consistent snapshot;
session3: start transaction with consistent snapshot;
session4: start transaction with consistent snapshot;
This way all the threads read from the same consistent view.
step3:
Back up the non-InnoDB tables (while the global read lock is still held).
step4:
session1: unlock tables;
Back up the InnoDB tables until the backup finishes.
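The steps above can be laid out as a session timeline (a sketch of the statement sequence, per the steps described, not an exact trace of mydumper's internals):

```sql
-- session1 (main thread)
FLUSH TABLES WITH READ LOCK;                 -- global read lock: writes blocked
SHOW MASTER STATUS;                          -- record binlog file/position

-- session2..sessionN (worker threads)
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;  -- all snapshots opened under the lock

-- workers dump non-InnoDB tables while the lock is held

-- session1
UNLOCK TABLES;                               -- writes resume

-- workers keep dumping InnoDB tables from their consistent snapshots
```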
summary:
Looking at the whole process, every thread sees the same consistent view, so a SELECT on any table returns consistent data; this relies on InnoDB's MVCC (leaving non-InnoDB engines aside).
Question:
How can a single table be dumped in parallel?
- Single-table parallelism requires the table to have a unique index, and that index must be on a single integer column (a composite index does not qualify)
- mydumper first detects such an index, splits the table into ranges on it, and backs up each range separately; the ranges are computed up front (and are not necessarily equal in size), which you can observe in show processlist;
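The range-splitting idea can be sketched with plain shell arithmetic. This is a toy model: real mydumper estimates boundaries from index statistics, so its intervals are uneven; the table and column names are taken from the customer example above:

```shell
# Toy sketch: split an integer primary-key range [min, max] into chunks of
# roughly --rows size, printing the range query each worker would run.
min=1; max=150000; rows=50000
start=$min
while [ "$start" -le "$max" ]; do
  end=$((start + rows - 1))
  [ "$end" -gt "$max" ] && end=$max
  echo "SELECT * FROM customer WHERE c_custkey BETWEEN $start AND $end"
  start=$((end + 1))
done
```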