Data Integration: Several Modes of Importing Data into MySQL through JDBC

MySQL JDBC currently provides several ways to write data into MySQL. This article introduces the modes supported by Data Integration (DataX, Synchronization Center, formerly CDP):

insert into xxx values (..), (..), (..)
replace into xxx values (..), (..), (..)
insert into xxx values (..), (..), (..), … on duplicate key update …
1. Functional differences
1.1 The insert into method

A regular SQL insert. If the submitted data violates a database constraint on the MySQL server (primary key conflict, data type mismatch), an error is reported directly; correspondingly, Data Integration reports the record as dirty data. This mode is typically used to insert data into an empty table.

1.2 The replace into method

Similar to insert into, with one difference: if the new record conflicts with an old record in the table on the primary key or a unique index (PRIMARY KEY or UNIQUE index), replace into handles the conflict itself:

1. On a primary key (pk) conflict, the old record is deleted first and the new record is then inserted
2. On a unique key (uk) conflict, the old record is updated directly
**Precautions when using replace into**

1. To use replace, you must have both the insert and delete privileges on the table;
2. Conflicting records: the new record has the same primary key (or unique index) value as an old record;
3. For conflicting records, the values of all columns are taken from the values specified in the replace statement. All missing columns are set to their default values. That is, if you do not synchronize all columns of the table each time, columns that had values in the old record will have lost them after replace into;
4. The replace statement returns a number indicating the number of affected rows. That number is the sum of the rows deleted and the rows inserted.
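
Precaution 3 is easy to overlook, so here is a minimal sketch of it. It rests on one loud assumption: SQLite (through Python's built-in sqlite3 module) stands in for MySQL, because it also accepts REPLACE INTO. MySQL privileges, affected-row counts and the binlog are not modelled, and the table and values are made up for illustration.

```python
import sqlite3

# Assumption: SQLite is only a stand-in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE test (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        k     INTEGER NOT NULL UNIQUE,
        v     TEXT,
        extra TEXT
    )
    """
)
conn.execute(
    "INSERT INTO test (k, v, extra) VALUES (1, '1', 'extra1'), (2, '2', 'extra2')"
)

# Only k and v are supplied, so extra falls back to its default (NULL).
conn.execute("REPLACE INTO test (k, v) VALUES (1, '1-1')")

replaced_row = conn.execute(
    "SELECT id, k, v, extra FROM test WHERE k = 1"
).fetchone()
row_count = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(replaced_row, row_count)
```

The row for k = 1 comes back with a new id and with extra empty, although the old record carried 'extra1'. This is why a job that synchronizes only some of the columns should not rely on replace into.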
1.3 The insert into … on duplicate key update method

If the new record conflicts with an old record in the table on the primary key or a unique index (PRIMARY KEY or UNIQUE index), that is, both have the same value, the old record is updated.
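
To illustrate "the old record is updated", the sketch below keeps the existing row and changes only the column that was supplied. Assumptions, stated loudly: SQLite (3.24 or newer) is used as a stand-in, and its ON CONFLICT … DO UPDATE clause plays the role of MySQL's ON DUPLICATE KEY UPDATE; the MySQL form is shown only as a comment and is not executed. The table and values are made up.

```python
import sqlite3

# Assumption: SQLite is only a stand-in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " k INTEGER NOT NULL UNIQUE,"
    " v TEXT,"
    " extra TEXT)"
)
conn.execute("INSERT INTO test (k, v, extra) VALUES (1, '1', 'extra1')")

# MySQL form (not executed here):
#   INSERT INTO test (k, v) VALUES (1, '1-1')
#   ON DUPLICATE KEY UPDATE v = VALUES(v);
# SQLite stand-in with the same intent:
conn.execute(
    "INSERT INTO test (k, v) VALUES (1, '1-1') "
    "ON CONFLICT (k) DO UPDATE SET v = excluded.v"
)

updated_row = conn.execute(
    "SELECT id, k, v, extra FROM test WHERE k = 1"
).fetchone()
print(updated_row)
```

Unlike replace into, the row keeps its id and its extra value; only v changes.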

3. Known pitfalls of replace into
If a table already holds data on both the primary and the standby database, a replace into that conflicts on a unique key (uk) leaves the auto_increment values of the primary and the standby inconsistent: the auto_increment of the standby is lower than the maximum id actually present in the data. After a primary/standby switchover, inserting through replace into on the new primary then fails with an error. After one failure, auto_increment is updated to the maximum value + 1.
3.1 Example
On the primary database:
use test;
CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL,
  `v` varchar(100) DEFAULT NULL,
  `extra` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_k` (`k`)
) ENGINE=InnoDB ;

insert into test(k,v,extra) values(1,1,'extra1'),(2,2,'extra2'),(3,3,'extra3');
After the insert, both the data and the schema of the primary and standby databases are completely consistent. Now execute replace into:

replace into test(k,v) values(1,'1-1');

After the replace, the data of the primary and standby databases is still consistent, but the table definitions are not: the AUTO_INCREMENT values differ.

Table structure on the primary database:
CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL,
  `v` varchar(100) DEFAULT NULL,
  `extra` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=gbk;
Table structure on the standby database:
CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL,
  `v` varchar(100) DEFAULT NULL,
  `extra` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=gbk;
Cause analysis:

The SQL recorded in the binlog:
### UPDATE test.test
### WHERE
### @1=1
### @2=1
### @3='1'
### @4='extra1'
### SET
### @1=4
### @2=1
### @3='1-1'
### @4=NULL
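As described in section 1, when replace into hits a uk conflict it performs a direct update, and that is what the binlog records. The primary allocated a new id (4) for the row and advanced its AUTO_INCREMENT to 5. The standby only replays the update, and an update operation does not modify auto_increment, so its value stays at 4.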

For this reason, it is recommended to use insert into … on duplicate key update instead of replace into for such operations.
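
The sequence of events can be replayed with a toy model. This is a sketch under stated assumptions, not MySQL itself: ToyTable and all of its methods are hypothetical names, the counter rules simply encode what the article describes (a replicated update leaves the counter alone; one failed insert moves it to the maximum id + 1), and the post-switchover write is a plain insert of a new key.

```python
class ToyTable:
    """Toy model (not MySQL): rows keyed by id plus an auto-increment counter."""

    def __init__(self):
        self.rows = {}            # id -> (k, v, extra)
        self.auto_increment = 1   # next id to hand out

    def insert(self, k, v, extra=None):
        new_id = self.auto_increment
        if new_id in self.rows:
            # Per the article: one failure moves the counter to max(id) + 1.
            self.auto_increment = max(self.rows) + 1
            raise ValueError(f"duplicate primary key {new_id}")
        self.rows[new_id] = (k, v, extra)
        self.auto_increment = new_id + 1
        return new_id

    def replace_on_uk_conflict(self, k, v):
        """Primary side: replace the row whose unique key is k; return the change."""
        old_id = next(i for i, row in self.rows.items() if row[0] == k)
        del self.rows[old_id]
        new_id = self.insert(k, v)   # takes a new id and advances the counter
        return old_id, new_id, self.rows[new_id]

    def apply_insert_event(self, row_id, row):
        """Standby side: a replicated insert also moves the counter forward."""
        self.rows[row_id] = row
        self.auto_increment = max(self.auto_increment, row_id + 1)

    def apply_update_event(self, old_id, new_id, row):
        """Standby side: a replicated UPDATE rewrites the row, counter untouched."""
        del self.rows[old_id]
        self.rows[new_id] = row


primary, standby = ToyTable(), ToyTable()
for k in (1, 2, 3):
    row_id = primary.insert(k, str(k), f"extra{k}")
    standby.apply_insert_event(row_id, primary.rows[row_id])

standby.apply_update_event(*primary.replace_on_uk_conflict(1, "1-1"))
data_consistent = primary.rows == standby.rows
counters_after_replace = (primary.auto_increment, standby.auto_increment)

# Switchover: the former standby now accepts writes.
try:
    standby.insert(4, "4")
    first_insert_failed = False
except ValueError:
    first_insert_failed = True
second_insert_id = standby.insert(4, "4")

print(data_consistent, counters_after_replace, first_insert_failed, second_insert_id)
```

In this model the two sides hold identical rows while their counters differ (5 on the primary, 4 on the standby), the first write after the switchover fails, and the retry succeeds with the next free id.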

2. Best practices for data integration
Data Integration currently supports all three modes above. They correspond to the writeMode field (insert, replace, or update) in the configuration of the DataX MySQLWriter plugin:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              {
                "value": "DataX",
                "type": "string"
              }
            ],
            "sliceRecordCount": 1000
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "writeMode": "insert/replace/update",
            "username": "root",
            "password": "root",
            "column": [
              "id",
              "name"
            ],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/datax?useUnicode=true&characterEncoding=gbk",
                "table": [
                  "test"
                ]
              }
            ]
          }
        }
      }
    ]
  }
}
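
To make the correspondence between writeMode and the three statements from section 1 concrete, here is a minimal sketch of how a writer could turn the configured mode into a parameterised statement. The function build_write_template is a hypothetical helper written for this article; it is not taken from the DataX source, and the backtick quoting and `?` placeholders are assumptions.

```python
def build_write_template(write_mode, table, columns):
    """Return a parameterised statement for the given writeMode (hypothetical helper)."""
    mode = write_mode.strip().lower()
    if mode not in ("insert", "replace", "update"):
        raise ValueError(f"unsupported writeMode: {write_mode!r}")

    column_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join("?" for _ in columns)

    if mode == "replace":
        return f"REPLACE INTO `{table}` ({column_list}) VALUES ({placeholders})"

    statement = f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders})"
    if mode == "update":
        # On a key conflict, overwrite each listed column with the incoming value.
        assignments = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in columns)
        statement += f" ON DUPLICATE KEY UPDATE {assignments}"
    return statement


templates = {
    mode: build_write_template(mode, "test", ["id", "name"])
    for mode in ("insert", "replace", "update")
}
for mode, sql in templates.items():
    print(f"{mode}: {sql}")
```

The table name and column list mirror the writer section of the configuration above; any other value of writeMode is rejected up front rather than passed through to the database.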

4.1 How Data Integration ensures the idempotency of jobs that synchronize to MySQL

A brief definition of idempotency: running the same synchronization job multiple times yields the same result.

Scenario 1: The data in the table can be deleted
When configuring the synchronization task in Data Integration, configure a pre-SQL statement (a delete or truncate table statement). On every run, the pre-SQL is executed before the actual synchronization starts and clears the table, which makes repeated runs of the synchronization task idempotent.

Scenario 2: The data in the table cannot be deleted
This is common when data flows back into the MySQL database of an online business. Configure writeMode as replace or update; during synchronization, the data is then written to MySQL using replace into or insert into … on duplicate key update.
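
Both scenarios can be checked with a small experiment: run the same job twice and compare the table contents. The sketch below makes several assumptions: SQLite again stands in for MySQL, run_job and new_target are hypothetical helpers rather than Data Integration APIs, and the three source rows are made-up data.

```python
import sqlite3

SOURCE_ROWS = [(1, "a"), (2, "b"), (3, "c")]  # made-up data to synchronize


def new_target():
    """Create an empty target table (SQLite stand-in for the MySQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def run_job(conn, write_mode, pre_sql=None):
    """One run of a hypothetical sync job: optional pre-SQL, then a batch write."""
    if pre_sql:
        conn.execute(pre_sql)
    verb = "REPLACE" if write_mode == "replace" else "INSERT"
    conn.executemany(f"{verb} INTO test (id, name) VALUES (?, ?)", SOURCE_ROWS)
    return conn.execute("SELECT id, name FROM test ORDER BY id").fetchall()


# Scenario 1: insert mode plus a pre-SQL statement that clears the table.
target = new_target()
first = run_job(target, "insert", pre_sql="DELETE FROM test")
second = run_job(target, "insert", pre_sql="DELETE FROM test")
scenario1_idempotent = first == second

# Scenario 2: no pre-SQL; replace mode overwrites conflicting rows.
target = new_target()
first = run_job(target, "replace")
second = run_job(target, "replace")
scenario2_idempotent = first == second

# Counter-example: plain insert without pre-SQL fails on the second run.
target = new_target()
run_job(target, "insert")
try:
    run_job(target, "insert")
    rerun_failed = False
except sqlite3.IntegrityError:
    rerun_failed = True

print(scenario1_idempotent, scenario2_idempotent, rerun_failed)
```

The counter-example corresponds to the dirty-data case from section 1.1: without pre-SQL, a rerun in insert mode collides with the rows written by the first run.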
Reference:

https://askdba.alibaba-inc.com/libary/control/getArticle.do?articleId=12735
https://blog.xupeng.me/2013/10/11/mysql-replace-into-trap/
