Analysis of mydumper

Ⅰ. Background

  • mysqldump backs up with a single thread, which is very slow
  • Recovery is also slow: tables are restored one at a time
  • If you backed up 100G of data and want to restore just one of the tables, you can't (all the tables are in one file)

So mydumper is recommended for backups instead:

  • Backups run in parallel and are row-based, so even a single table can be dumped in parallel
  • Recovery is also parallel
  • When restoring, a single specified table can be restored on its own

Perfect (*^__^*)

Ⅱ. Installation

yum install -y  glib2-devel mysql-devel zlib-devel pcre-devel openssl-devel cmake gcc gcc-c++
cd /usr/local/src
git clone https://github.com/maxbube/mydumper
cd mydumper
cmake .
make -j 4
make install
export LD_LIBRARY_PATH="/usr/local/mysql/lib:$LD_LIBRARY_PATH"
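A quick sanity check of the build (the version printed depends on the commit you cloned):

mydumper --version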

Ⅲ. Parameter introduction

The parameters are much the same as mysqldump's.

-G --triggers             dump triggers
-E --events               dump events
-R --routines             dump stored procedures and functions
--trx-consistency-only    equivalent to mysqldump's --single-transaction
-t                        number of threads to use, default 4
-o                        directory to back up into
-x                        regex to filter databases/tables
-c                        compress the output files
-B                        database to dump
-T                        tables to dump
-F --chunk-filesize       split table data into files of this size (MB)
--rows 100000             export every 100,000 rows to a separate file
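For example, a hypothetical invocation combining several of these options (the table list, chunk size, and paths are only illustrative):

mydumper -B dbt3 -T orders,lineitem -c -t 8 -F 64 -o /mdata/backup_part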

Ⅳ. Hands-on

4.1 Backup

[root@VM_0_5_centos backup]# mydumper -G -E -R --trx-consistency-only -t 4 -c -B dbt3 -o /mdata/backup
Open another session and run show processlist; you can see the four dump threads:
([email protected]) [(none)]> show processlist;
+--------+------+------------------+------+---------+------+-------------------+----------------------------------------------------------+
| Id     | User | Host             | db   | Command | Time | State             | Info                                                     |
+--------+------+------------------+------+---------+------+-------------------+----------------------------------------------------------+
| 137488 | root | 172.16.0.5:53046 | NULL | Query   |    0 | starting          | show processlist                                         |
| 137523 | root | 172.16.0.5:53546 | NULL | Query   |    3 | Sending to client | SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`customer` |
| 137524 | root | 172.16.0.5:53548 | NULL | Query   |    3 | Sending to client | SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`lineitem` |
| 137525 | root | 172.16.0.5:53550 | NULL | Query   |    1 | Sending to client | SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`partsupp` |
| 137526 | root | 172.16.0.5:53552 | NULL | Query   |    3 | Sending to client | SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`orders`   |
+--------+------+------------------+------+---------+------+-------------------+----------------------------------------------------------+
5 rows in set (0.00 sec)

tips:

A mydumper option and its value must be separated by a space; gluing them together causes an error:

option parsing failed: Error parsing option -r, try --help
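For instance, these two forms differ only in the space (an illustrative pair):

mydumper -r100000 -B dbt3 -o /mdata/backup    # fails with the error above
mydumper -r 100000 -B dbt3 -o /mdata/backup   # parses correctly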

4.2 Analyze backup content

Enter the backup directory

[root@VM_0_5_centos backup]# ll
total 1200340
-rw-r--r-- 1 root root       281 Jan 24 10:41 dbt3.customer-schema.sql.gz
-rw-r--r-- 1 root root   9173713 Jan 24 10:41 dbt3.customer.sql.gz
-rw-r--r-- 1 root root       401 Jan 24 10:41 dbt3.lineitem-schema.sql.gz
-rw-r--r-- 1 root root 221097124 Jan 24 10:42 dbt3.lineitem.sql.gz
-rw-r--r-- 1 root root       228 Jan 24 10:41 dbt3.nation-schema.sql.gz
-rw-r--r-- 1 root root      1055 Jan 24 10:41 dbt3.nation.sql.gz
-rw-r--r-- 1 root root       294 Jan 24 10:41 dbt3.orders-schema.sql.gz
-rw-r--r-- 1 root root  47020810 Jan 24 10:41 dbt3.orders.sql.gz
-rw-r--r-- 1 root root       264 Jan 24 10:41 metadata

Due to limited space, not all tables are listed.

Each table is backed up to its own compressed file, which is why a single table can be restored on its own.

Take a look:

[root@VM_0_5_centos backup]# cat metadata
Started dump at: 2018-01-24 10:35:50
SHOW MASTER STATUS:
    Log: bin.000001
    Pos: 154
    GTID:

Finished dump at: 2018-01-24 10:35:50

The metadata file records the binary log position (the same information mysqldump captures with --master-data=1).
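This position could later be used, for example, to point a replica at the source after restoring; a sketch with placeholder connection values:

CHANGE MASTER TO
  MASTER_HOST='<source-host>',     -- placeholder
  MASTER_USER='<repl-user>',       -- placeholder
  MASTER_PASSWORD='<password>',    -- placeholder
  MASTER_LOG_FILE='bin.000001',    -- Log: from metadata
  MASTER_LOG_POS=154;              -- Pos: from metadata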

Decompress some of the files:

[root@VM_0_5_centos backup]# gunzip dbt3.customer-schema.sql.gz dbt3.customer.sql.gz dbt3-schema-create.sql.gz

[root@VM_0_5_centos backup]# cat dbt3-schema-create.sql
CREATE DATABASE `dbt3` /*!40100 DEFAULT CHARACTER SET utf8mb4 */;

[root@VM_0_5_centos backup]# cat dbt3.customer-schema.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;

CREATE TABLE `customer` (
  `c_custkey` int(11) NOT NULL,
  `c_name` varchar(25) DEFAULT NULL,
  `c_address` varchar(40) DEFAULT NULL,
  `c_nationkey` int(11) DEFAULT NULL,
  `c_phone` char(15) DEFAULT NULL,
  `c_acctbal` double DEFAULT NULL,
  `c_mktsegment` char(10) DEFAULT NULL,
  `c_comment` varchar(117) DEFAULT NULL,
  PRIMARY KEY (`c_custkey`),
  KEY `i_c_nationkey` (`c_nationkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

[root@VM_0_5_centos backup]# head -5 dbt3.customer.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
INSERT INTO `customer` VALUES
(1,"Customer#000000001","j5JsirBM9PsCy0O1m",15,"25-989-741-2988",711.56,"BUILDING","regular, regular platelets are fluffily according to the even attainments. blithely iron"),

In summary:

File suffix           Purpose
-schema.sql           table structure, one file per table
.sql                  table data
-schema-create.sql    creates the database

4.3 Recovery

Recover using the myloader command

-d  directory containing the backup files
-t  number of threads
-B  target database to restore into
[root@VM_0_5_centos mdata]# myloader -d /mdata/backup -t 4 -B test
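To restore just one table instead, one option is to decompress that table's files and feed them to mysql directly; a sketch reusing the customer table from above:

[root@VM_0_5_centos backup]# gunzip dbt3.customer-schema.sql.gz dbt3.customer.sql.gz
[root@VM_0_5_centos backup]# mysql -uroot -p test < dbt3.customer-schema.sql
[root@VM_0_5_centos backup]# mysql -uroot -p test < dbt3.customer.sql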

tips:

With 4 threads on an SSD, the restore is nearly twice as fast as the original single-threaded restore (on an HDD the improvement may be limited by the disk).

Ⅴ. How mydumper works

Building on the mysqldump analysis earlier, we won't turn on verbose logging (glog) for a detailed walk-through here.

The core question: how is the dump parallelized? Even a single table can be exported in parallel, yet consistency must be maintained.

step1:

session1 (main thread):

flush tables with read lock; locks the entire instance read-only, so other sessions can read but not write; this is needed for non-transactional engines such as MyISAM

start transaction with consistent snapshot; opens a consistent-snapshot transaction, for InnoDB

show master status; records the binary log file and position
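A simplified sketch of the statements the main connection issues:

flush tables with read lock;
start transaction /*!40108 WITH CONSISTENT SNAPSHOT */;
show master status;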

step2:

The main thread spawns the worker threads that perform the backup; each worker sets its transaction isolation level to REPEATABLE READ

session2: start transaction with consistent snapshot;

session3: start transaction with consistent snapshot;

session4: start transaction with consistent snapshot;

This way, every worker thread reads from the same consistent snapshot
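A sketch of what each worker session runs (simplified; the SELECT mirrors the processlist output in 4.1):

set session transaction isolation level repeatable read;
start transaction /*!40108 WITH CONSISTENT SNAPSHOT */;
SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`customer`;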

step3:

Back up the non-InnoDB tables while the global read lock is still held

step4:

session1: unlock tables;

The global read lock is released, and the worker threads continue backing up the InnoDB tables until the dump finishes

summary:

Looking at the whole process, every thread sees the same snapshot of the data, so a SELECT on each table returns consistent data; in essence this relies on MVCC (non-InnoDB engines aside)

Question:
How can a single table be dumped in parallel?

  • The premise of single-table parallelism is that the table has a unique index, and that unique index must be a single integer column, not a composite index
  • mydumper first detects the unique index, splits the table into ranges on that index, and backs up each range separately; the ranges are computed up front (and are not necessarily equal in size), and the range queries are visible in show processlist; see the sketch after this list
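A sketch of what the chunked range queries could look like in show processlist (boundary values are made up):

SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`orders` WHERE `o_orderkey` >= 1 AND `o_orderkey` < 150000;
SELECT /*!40001 SQL_NO_CACHE */ * FROM `dbt3`.`orders` WHERE `o_orderkey` >= 150000 AND `o_orderkey` < 300000;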
