DataX makes it easy to migrate SQL Server data to GreatSQL

1. Environment description

1.1 Source SQLServer

Version                   | IP              | Port
Microsoft SQL Server 2017 | 192.168.140.160 | 1433

1.2 Target GreatSQL

Version         | IP             | Port
GreatSQL-8.0.32 | 192.168.139.86 | 3308

2. Install the environment

2.1 Install SQLServer environment

Environment description: SQL Server runs in a Docker container started from an image.

2.1.1 Install Docker

1. Install basic software packages

$ yum install -y wget net-tools nfs-utils lrzsz gcc gcc-c++ make cmake libxml2-devel openssl-devel curl curl-devel unzip sudo ntp libaio-devel vim ncurses-devel autoconf automake zlib-devel python-devel epel-release openssh-server socat ipvsadm conntrack yum-utils

2. Configure the docker-ce yum repository (Alibaba Cloud mirror, useful inside mainland China)

$ yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

3. Install docker dependency packages

$ yum install -y  device-mapper-persistent-data lvm2 

4. Install docker-ce

$ yum install docker-ce -y

5. Start the Docker service and enable it at boot

$ systemctl start docker && systemctl enable docker 

2.1.2 Pull the image

$ docker pull mcr.microsoft.com/mssql/server:2017-latest 

2.1.3 Running the container

$ docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=********" \
-p 1433:1433 --name sqlserver2017 \
-d mcr.microsoft.com/mssql/server:2017-latest 

Remember to set a complex password here.

Parameter explanation:

  • -e "ACCEPT_EULA=Y": accept the end-user license agreement

  • -e "SA_PASSWORD=********": set the password of the SA login. If the password is too short or too simple, it does not meet the SQL Server password policy and the container stops running.

  • -p 1433:1433: map a host port to a container port (the first number is the host port)

  • --name sqlserver2017: container name

  • -d: run in the background

  • mcr.microsoft.com/mssql/server:2017-latest: image name:tag
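
Since a weak SA_PASSWORD makes the container exit right after startup, it can help to check the password before running docker run. The sketch below assumes the policy is "at least 8 characters, with characters from at least three of these four classes: uppercase, lowercase, digits, symbols"; that rule is an assumption taken from Microsoft's container documentation, not from this article, and the function name is hypothetical.

```python
import string


def meets_sa_password_policy(password):
    """Return True if the password satisfies the assumed SQL Server policy:
    length >= 8 and at least three of the four character classes."""
    classes = [
        any(c.isupper() for c in password),             # uppercase letters
        any(c.islower() for c in password),             # lowercase letters
        any(c.isdigit() for c in password),             # digits
        any(c in string.punctuation for c in password), # symbols
    ]
    return len(password) >= 8 and sum(classes) >= 3


# Placeholder passwords for illustration only; do not reuse them.
for candidate in ("abc123", "Example#2017pw"):
    print(candidate, meets_sa_password_policy(candidate))
```

If the check fails, choose a longer or more varied password before starting the container.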

2.1.4 Using database

1. Enter the container

$ docker exec -it sqlserver2017 bash

2. Connect to the database

$ /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "********"

3. Query the database

1> select name from sys.Databases;
2> go

4. Create a database

1> create database testdb;
2> go

5. Create a table and insert data

use testdb
create table t1(id int)
go
Insert into t1 values(1),(2)
go

2.2 Install GreatSQL environment

GreatSQL is also installed from a Docker image. Pull the GreatSQL image directly:

$ docker pull greatsql/greatsql

Then create the GreatSQL container:

$ docker run -d --name greatsql --hostname=greatsql -e MYSQL_ALLOW_EMPTY_PASSWORD=1 greatsql/greatsql

2.3 Install DataX

DataX depends on the following environment:

  • JDK (1.8 or above, 1.8 recommended)

  • Python (Python2.6.X and above recommended)
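
Before unpacking DataX it is worth confirming that the JDK requirement is met. The following is a minimal sketch, assuming the usual `java -version` output format (for example `java version "1.8.0_251"` or `openjdk version "11.0.2"`); the function names are hypothetical. Note that `java -version` writes to stderr, not stdout.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major version from `java -version` output.
    Java 8 and earlier report themselves as 1.x, so "1.8.0_251" yields 8."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = int(match.group(1)), match.group(2)
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Run `java -version` and parse its output (printed on stderr)."""
    result = subprocess.run(["java", "-version"],
                            stderr=subprocess.PIPE, universal_newlines=True)
    return java_major_version(result.stderr)


print(java_major_version('java version "1.8.0_251"'))
```

Calling installed_java_major() on the DataX host and checking that the result is at least 8 covers the JDK item in the list above.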

Installation: extract the archive and DataX is ready to use. However, running a job right after extraction, without any further cleanup, reports the error shown below.

$ cd /soft

$ ll
total 3764708
-rw-r--r--  1 root root  853734462 Dec  9 04:06 datax.tar.gz

$ tar xf datax.tar.gz

$ python /soft/datax/bin/datax.py /soft/datax/job/job.json 

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2023-07-19 11:19:17.483 [main] WARN  ConfigParser - 插件[streamreader,streamwriter]加载失败,1s后重试... Exception:Code:[Common-00], Describe:[您提供的配置文件存在错误信息,请检查您的作业配置 .] - 配置信息错误,您提供的配置文件[/soft/datax/plugin/reader/._mysqlreader/plugin.json]不存在. 请检查您的配置文件. 
2023-07-19 11:19:18.488 [main] ERROR Engine - 
经DataX智能分析,该任务最可能的错误原因是:
com.alibaba.datax.common.exception.DataXException: Code:[Common-00], Describe:[您提供的配置文件存在错误信息,请检查您的作业配置 .] - 配置信息错误,您提供的配置文件[/soft/datax/plugin/reader/._mysqlreader/plugin.json]不存在. 请检查您的配置文件.
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
at com.alibaba.datax.common.util.Configuration.from(Configuration.java:95)
at com.alibaba.datax.core.util.ConfigParser.parseOnePluginConfig(ConfigParser.java:153)
at com.alibaba.datax.core.util.ConfigParser.parsePluginConfig(ConfigParser.java:125)
at com.alibaba.datax.core.util.ConfigParser.parse(ConfigParser.java:63)
at com.alibaba.datax.core.Engine.entry(Engine.java:137)
at com.alibaba.datax.core.Engine.main(Engine.java:204)

The log complains that the configuration file /soft/datax/plugin/reader/._mysqlreader/plugin.json does not exist. To fix the error, delete every file whose name starts with ._ in the following three directories:

  • plugin/
  • plugin/reader/
  • plugin/writer/

$ rm -rf /soft/datax/plugin/._*
$ rm -rf /soft/datax/plugin/reader/._*
$ rm -rf /soft/datax/plugin/writer/._*
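
The same cleanup can be scripted with a dry run first, so that the entries to be removed can be reviewed before anything is deleted. This is only a sketch: the function name is hypothetical and the /soft/datax path is the install location assumed in this article.

```python
import shutil
from pathlib import Path


def remove_dot_underscore_entries(plugin_dir, dry_run=True):
    """List (and optionally delete) entries named ._* in plugin/,
    plugin/reader/ and plugin/writer/. Returns the matching paths."""
    root = Path(plugin_dir)
    targets = []
    for directory in (root, root / "reader", root / "writer"):
        if directory.is_dir():
            targets.extend(sorted(directory.glob("._*")))
    if not dry_run:
        for path in targets:
            if path.is_dir():
                shutil.rmtree(path)  # a ._* entry may also be a directory
            else:
                path.unlink()
    return targets


if __name__ == "__main__":
    # Dry run only: print what would be removed.
    for entry in remove_dot_underscore_entries("/soft/datax/plugin"):
        print(entry)
```

Once the printed list looks right, call the function again with dry_run=False to delete the entries.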

Run the bundled test job to check that DataX is installed correctly:

$ python /soft/datax/bin/datax.py /soft/datax/job/job.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
 2023-07-19 11:22:12.298 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-07-19 11:22:12.305 [main] INFO  Engine - the machine info  => 
osInfo: Oracle Corporation 1.8 25.251-b08
jvmInfo: Linux amd64 4.19.25-200.1.el7.bclinux.x86_64
cpu num: 48
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]

MEMORY_NAME                    | allocation_size                | init_size                      
PS Eden Space                  | 256.00MB                       | 256.00MB                       
Code Cache                     | 240.00MB                       | 2.44MB                         
Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
PS Survivor Space              | 42.50MB                        | 42.50MB                        
PS Old Gen                     | 683.00MB                       | 683.00MB                       
Metaspace                      | -0.00MB                        | 0.00MB  

2023-07-19 11:22:12.320 [main] INFO  Engine - 
{"content":[{"reader":{"name":"streamreader",
"parameter":{"column":[
{"type":"string","value":"DataX"},
{"type":"long","value":19890604},
{"type":"date","value":"1989-06-04 00:00:00"},
{"type":"bool","value":true},
{"type":"bytes","value":"test"}
],"sliceRecordCount":100000}
},"writer":{"name":"streamwriter","parameter":{"encoding":"UTF-8","print":false}}}],
"setting":{"errorLimit":{"percentage":0.02,"record":0},
"speed":{"byte":10485760}}}

2023-07-19 11:22:12.336 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-07-19 11:22:12.337 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-07-19 11:22:12.338 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-07-19 11:22:12.339 [main] INFO  JobContainer - Set jobId = 0
2023-07-19 11:22:12.352 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-07-19 11:22:12.352 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2023-07-19 11:22:12.352 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2023-07-19 11:22:12.352 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-07-19 11:22:12.353 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2023-07-19 11:22:12.354 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2023-07-19 11:22:12.354 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2023-07-19 11:22:12.371 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-07-19 11:22:12.375 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-07-19 11:22:12.376 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-07-19 11:22:12.384 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-07-19 11:22:12.388 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-07-19 11:22:12.388 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-07-19 11:22:12.396 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-07-19 11:22:12.697 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[302]ms
2023-07-19 11:22:12.698 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-07-19 11:22:22.402 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.033s | Percentage 100.00%
2023-07-19 11:22:22.402 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-07-19 11:22:22.402 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2023-07-19 11:22:22.403 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2023-07-19 11:22:22.403 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-07-19 11:22:22.403 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-07-19 11:22:22.404 [job-0] INFO  JobContainer - 
 [total cpu info] => 
averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
-1.00%                         | -1.00%                         | -1.00%
 [total gc info] => 
 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2023-07-19 11:22:22.404 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-07-19 11:22:22.404 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.033s | Percentage 100.00%
2023-07-19 11:22:22.406 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2023-07-19 11:22:12
任务结束时刻                    : 2023-07-19 11:22:22
任务总计耗时                    :                 10s
任务平均流量                    :          253.91KB/s
记录写入速度                    :          10000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0

(The summary at the end of the log lists, in order: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, and total read/write failures.)

3. SQLServer to GreatSQL full migration

3.1 Create test data on the source side (SQLServer)

$ docker exec -it 47bd0ed79c26 /bin/bash

$ /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "********"

1> create database testdb
1> use testdb
1> insert into t1 values(1),(2),(3);
2> go
1> select * from t1;
2> go
id
-----------
          1
          2
          3

3.2 Create table structure on the target side (GreatSQL)

greatsql> create database testdb;
greatsql> use testdb;
greatsql> create table t1 (id int primary key);

3.3 Write the DataX job file

$ cat /soft/datax/job/sqlserver_to_greatsql.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlserverreader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
                                "table": ["t1"]
                            }
                        ],
                        "password": "********",
                        "username": "SA",
                        "column": ["*"]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://10.17.139.86:3308/testdb",
                                "table": ["t1"]
                            }
                        ],
                        "password": "******",
                        "session": [],
                        "username": "admin",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}
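
When many tables have to be migrated, writing one job file per table by hand becomes tedious. The sketch below generates a job with the same layout from a few parameters. The function name and its arguments are hypothetical; the reader and writer names, URLs and table come from the job file above. It lists the column explicitly, because DataX warns at run time that using * for columns is risky when the table structure changes.

```python
import json


def build_job(table, columns, source_url, target_url,
              reader_user, reader_password,
              writer_user, writer_password, channels=5):
    """Return a DataX job dict: sqlserverreader -> mysqlwriter for one table."""
    reader = {
        "name": "sqlserverreader",
        "parameter": {
            # The reader takes a list of JDBC URLs, as in the job file above.
            "connection": [{"jdbcUrl": [source_url], "table": [table]}],
            "username": reader_user,
            "password": reader_password,
            "column": list(columns),
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "column": list(columns),
            # The writer takes a single JDBC URL string.
            "connection": [{"jdbcUrl": target_url, "table": [table]}],
            "username": writer_user,
            "password": writer_password,
            "session": [],
            "writeMode": "insert",
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": str(channels)}},
        }
    }


job = build_job(
    table="t1",
    columns=["id"],
    source_url="jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb",
    target_url="jdbc:mysql://10.17.139.86:3308/testdb",
    reader_user="SA", reader_password="********",
    writer_user="admin", writer_password="******",
)
print(json.dumps(job, indent=4))
```

Redirecting the printed JSON into a file under /soft/datax/job/ gives a job file that can be passed to datax.py in the usual way.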

3.4 Run the DataX migration task

$ python /soft/datax/bin/datax.py /soft/datax/job/sqlserver_to_greatsql.json 

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-11-28 09:58:44.087 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-11-28 09:58:44.104 [main] INFO  Engine - the machine info  => 
osInfo: Oracle Corporation 1.8 25.181-b13
jvmInfo: Linux amd64 3.10.0-957.el7.x86_64

cpu num: 8
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME                    | allocation_size                | init_size                      
PS Eden Space                  | 256.00MB                       | 256.00MB                       
Code Cache                     | 240.00MB                       | 2.44MB                         
Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
PS Survivor Space              | 42.50MB                        | 42.50MB                        
PS Old Gen                     | 683.00MB                       | 683.00MB                       
Metaspace                      | -0.00MB                        | 0.00MB                         

2023-11-28 09:58:44.137 [main] INFO  Engine - 
{
"content":[
{"reader":{
"name":"sqlserverreader",
"parameter":{
"column":["*"],
"connection":[
{"jdbcUrl":["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
"table":["t1"]}],
"password":"*************",
"username":"SA"}},

"writer":{"name":"mysqlwriter","parameter":{"column":["*"],

"connection":[{"jdbcUrl":"jdbc:mysql://10.17.139.86:3308/testdb",
"table":["t1"]}],
"password":"********",
"session":[],
"username":"admin",
"writeMode":"insert"}}}],
"setting":{"speed":{"channel":"5"}}}

2023-11-28 09:58:44.176 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-11-28 09:58:44.179 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-11-28 09:58:44.180 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-11-28 09:58:44.183 [main] INFO  JobContainer - Set jobId = 0
2023-11-28 09:58:44.542 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb.
2023-11-28 09:58:44.544 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2023-11-28 09:58:45.099 [job-0] INFO  OriginalConfPretreatmentUtil - table:[t1] all columns:[id].
2023-11-28 09:58:45.099 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2023-11-28 09:58:45.102 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id) VALUES(?)
], which jdbcUrl like:[jdbc:mysql://10..17.139.86:16310/testdb?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2023-11-28 09:58:45.103 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-11-28 09:58:45.103 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do prepare work 
2023-11-28 09:58:45.104 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2023-11-28 09:58:45.104 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-11-28 09:58:45.105 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2023-11-28 09:58:45.112 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2023-11-28 09:58:45.114 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2023-11-28 09:58:45.135 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-11-28 09:58:45.139 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-11-28 09:58:45.142 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-11-28 09:58:45.151 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-11-28 09:58:45.157 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-11-28 09:58:45.158 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-11-28 09:58:45.173 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-11-28 09:58:45.181 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from t1 
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 09:58:45.398 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from t1 
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 09:58:45.454 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[284]ms
2023-11-28 09:58:45.455 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-11-28 09:58:55.175 [job-0] INFO  StandAloneJobContainerCommunicator - Total 3 records, 3 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 09:58:55.175 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-11-28 09:58:55.175 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2023-11-28 09:58:55.176 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2023-11-28 09:58:55.176 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-11-28 09:58:55.176 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-11-28 09:58:55.177 [job-0] INFO  JobContainer - 
 [total cpu info] => 
averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
-1.00%                         | -1.00%                         | -1.00%
 [total gc info] => 
 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
 PS MarkSweep         | 1                  | 1                  | 1                  | 0.061s             | 0.061s             | 0.061s             
 PS Scavenge          | 1                  | 1                  | 1                  | 0.039s             | 0.039s             | 0.039s             
2023-11-28 09:58:55.177 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-11-28 09:58:55.177 [job-0] INFO  StandAloneJobContainerCommunicator - Total 3 records, 3 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 09:58:55.179 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2023-11-28 09:58:44
任务结束时刻                    : 2023-11-28 09:58:55
任务总计耗时                    :                 10s
任务平均流量                    :                0B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   3
读写失败总数                    :                   0

3.5 Verify data at the target end

greatsql> select * from t1;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.01 sec)

4. SQLServer to GreatSQL incremental migration

4.1 Create test data on the source side (SQLServer)

2> create table t2 (id int,createtime datetime);
3> go
1> insert into t2 values(1,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(2,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(3,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(4,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(5,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(6,GETDATE());
2> go
(1 rows affected)
1> select * from t2;
2> go
id          createtime
----------- -----------------------
          1 2023-11-28 02:18:20.790
          2 2023-11-28 02:18:27.040
          3 2023-11-28 02:18:32.103
          4 2023-11-28 02:18:37.690
          5 2023-11-28 02:18:41.450
          6 2023-11-28 02:18:46.330

4.2 Write the DataX full migration job file

$ cat sqlserver_to_greatsql_inc.json 
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlserverreader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
                                "table": ["t2"]
                            }
                        ],
                        "password": "********",
                        "username": "SA",
                        "column": ["*"]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://10.17.139.86:3308/testdb",
                                "table": ["t2"]
                            }
                        ],
                        "password": "********",
                        "session": [],
                        "username": "admin",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}

4.3 Run the DataX full migration task

$ python /soft/datax/bin/datax.py /soft/datax/job/sqlserver_to_greatsql_inc.json 

 DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2023-11-28 10:19:59.279 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-11-28 10:19:59.286 [main] INFO  Engine - the machine info  => 
osInfo: Oracle Corporation 1.8 25.181-b13
jvmInfo: Linux amd64 3.10.0-957.el7.x86_64
cpu num: 8
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]

MEMORY_NAME                    | allocation_size                | init_size                      
PS Eden Space                  | 256.00MB                       | 256.00MB                       
Code Cache                     | 240.00MB                       | 2.44MB                         
Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
PS Survivor Space              | 42.50MB                        | 42.50MB                        
PS Old Gen                     | 683.00MB                       | 683.00MB                       
Metaspace                      | -0.00MB                        | 0.00MB                         

2023-11-28 10:19:59.302 [main] INFO  Engine - 
{"content":[{"reader":{"name":"sqlserverreader","parameter":{"column":[
"*"],"connection":[{"jdbcUrl":["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
"table":["t2"]}],"password":"*************","username":"SA"}},
"writer":{"name":"mysqlwriter","parameter":{"column":["*"],
"connection":[{"jdbcUrl":"jdbc:mysql://10..17.139.86:16310/testdb","table":["t2"]}],
"password":"********",
"session":[],
"username":"admin",
"writeMode":"insert"}}}],
"setting":{"speed":{"channel":"5"}}}

2023-11-28 10:19:59.319 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-11-28 10:19:59.321 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-11-28 10:19:59.321 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-11-28 10:19:59.324 [main] INFO  JobContainer - Set jobId = 0
2023-11-28 10:19:59.629 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb.
2023-11-28 10:19:59.630 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2023-11-28 10:20:00.027 [job-0] INFO  OriginalConfPretreatmentUtil - table:[t2] all columns:[
id,createtime].
2023-11-28 10:20:00.027 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2023-11-28 10:20:00.029 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id,createtime) VALUES(?,?)
], which jdbcUrl like:[jdbc:mysql://10..17.139.86:16310/testdb?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2023-11-28 10:20:00.030 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-11-28 10:20:00.031 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do prepare work .
2023-11-28 10:20:00.031 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2023-11-28 10:20:00.032 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-11-28 10:20:00.032 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2023-11-28 10:20:00.037 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2023-11-28 10:20:00.038 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2023-11-28 10:20:00.060 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-11-28 10:20:00.063 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-11-28 10:20:00.066 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-11-28 10:20:00.073 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-11-28 10:20:00.080 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-11-28 10:20:00.080 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-11-28 10:20:00.093 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-11-28 10:20:00.101 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from t2 
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:20:00.262 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from t2 
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:20:00.334 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[243]ms
2023-11-28 10:20:00.335 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-11-28 10:20:10.087 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6 records, 54 bytes | Speed 5B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 10:20:10.088 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-11-28 10:20:10.088 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2023-11-28 10:20:10.089 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2023-11-28 10:20:10.090 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-11-28 10:20:10.091 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-11-28 10:20:10.094 [job-0] INFO  JobContainer - 
 [total cpu info] => 
averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
-1.00%                         | -1.00%                         | -1.00%                      
 [total gc info] => 
 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
 PS MarkSweep         | 1                  | 1                  | 1                  | 0.034s             | 0.034s             | 0.034s             
 PS Scavenge          | 1                  | 1                  | 1                  | 0.031s             | 0.031s             | 0.031s             

2023-11-28 10:20:10.094 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-11-28 10:20:10.095 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6 records, 54 bytes | Speed 5B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 10:20:10.097 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2023-11-28 10:19:59
任务结束时刻                    : 2023-11-28 10:20:10
任务总计耗时                    :                 10s
任务平均流量                    :                5B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   6
读写失败总数                    :                   0

4.4 Verify all migrated data

greatsql> select * from t2;
+----+---------------------+
| id | createtime          |
+----+---------------------+
|  1 | 2023-11-28 02:18:21 |
|  2 | 2023-11-28 02:18:27 |
|  3 | 2023-11-28 02:18:32 |
|  4 | 2023-11-28 02:18:38 |
|  5 | 2023-11-28 02:18:41 |
|  6 | 2023-11-28 02:18:46 |
+----+---------------------+ 

For large tables, running select * on the whole table is not practical. You can use checksum table x to verify the data instead.
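
Note that the seconds in the GreatSQL output differ slightly from the SQLServer values, for example 02:18:20.790 became 02:18:21. This is consistent with the target column being a datetime without fractional seconds, in which case the value is rounded to the nearest second on insert. The GreatSQL definition of t2 is not shown in this article, so that column type is an assumption. A minimal sketch of the rounding:

```python
from datetime import datetime, timedelta


def round_to_second(value):
    """Round a 'YYYY-MM-DD HH:MM:SS.fff' string to the nearest whole second."""
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    if parsed.microsecond >= 500000:
        parsed += timedelta(seconds=1)
    return parsed.replace(microsecond=0).strftime("%Y-%m-%d %H:%M:%S")


source_values = [
    "2023-11-28 02:18:20.790",
    "2023-11-28 02:18:37.690",
    "2023-11-28 02:18:41.450",
]
for value in source_values:
    print(value, "->", round_to_second(value))
```

If the milliseconds must be preserved, the target column would need fractional-second precision, for example datetime(3).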

4.5 Insert incremental data on the source side (SQLServer)

1> use testdb
2> insert into t2 values(7,'2023-11-28 03:18:46.330');
3> go
Changed database context to 'testdb'.
(1 rows affected)
1> insert into t2 values(8,'2023-11-28 03:20:46.330');
2> go
(1 rows affected)
1> insert into t2 values(9,'2023-11-28 03:25:46.330');
2> go
(1 rows affected)
1> insert into t2 values(10,'2023-11-28 03:30:46.330');
2> go
(1 rows affected)
1> select * from t2;
2> go
id          createtime
----------- -----------------------
          1 2023-11-28 02:18:20.790
          2 2023-11-28 02:18:27.040
          3 2023-11-28 02:18:32.103
          4 2023-11-28 02:18:37.690
          5 2023-11-28 02:18:41.450
          6 2023-11-28 02:18:46.330
          7 2023-11-28 03:18:46.330
          8 2023-11-28 03:20:46.330
          9 2023-11-28 03:25:46.330
         10 2023-11-28 03:30:46.330

4.6 Write the DataX incremental migration job file

$ cat sqlserver_to_greatsql_inc.json 
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlserverreader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
                                "table": ["t2"]
                            }
                        ],
                        "password": "********",
                        "username": "SA",
                        "column": ["*"],
                        "where": "createtime > '${start_time}' and createtime < '${end_time}'"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://10..17.139.86:16310/testdb",
                                "table": ["t2"]
                            }
                        ],
                        "password": "!QAZ2wsx",
                        "session": [],
                        "username": "admin",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}
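
The ${start_time} and ${end_time} placeholders in the where clause are filled in when the job is launched, from -D parameters passed through the -p option of datax.py. The following is a minimal Python 3 sketch of assembling that command for a given time window. `build_datax_command` and `format_ts` are hypothetical helpers, and the default paths are simply the ones used in this walkthrough. Only the -p "-Dname='value'" form comes from the DataX invocation itself.

```python
from datetime import datetime


def format_ts(ts: datetime) -> str:
    """Format with millisecond precision, e.g. 2023-11-28 03:17:46.330."""
    return ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]


def build_datax_command(job_file, start, end,
                        datax_py="/soft/datax/bin/datax.py"):
    """Return the argument list that launches one incremental window."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    # Both values are substituted into the job's where clause by DataX.
    params = (
        f"-Dstart_time='{format_ts(start)}' "
        f"-Dend_time='{format_ts(end)}'"
    )
    return ["python", datax_py, job_file, "-p", params]


cmd = build_datax_command(
    "/soft/datax/job/sqlserver_to_greatsql_inc.json",
    datetime(2023, 11, 28, 3, 17, 46, 330000),
    datetime(2023, 11, 28, 3, 31, 46, 330000),
)
print(cmd[-1])
```

The returned list can be handed to `subprocess.run(cmd, check=True)` so that a non-zero exit code from DataX raises an exception in the calling script.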

4.7 Run Datax incremental migration task

$ python /soft/datax/bin/datax.py /soft/datax/job/sqlserver_to_greatsql_inc.json -p "-Dstart_time='2023-11-28 03:17:46.330' -Dend_time='2023-11-28 03:31:46.330'"

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-11-28 10:29:24.492 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl

2023-11-28 10:29:24.504 [main] INFO  Engine - the machine info  => 

osInfo: Oracle Corporation 1.8 25.181-b13
jvmInfo: Linux amd64 3.10.0-957.el7.x86_64
cpu num: 8

totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME                    | allocation_size                | init_size                      
PS Eden Space                  | 256.00MB                       | 256.00MB                       
Code Cache                     | 240.00MB                       | 2.44MB                         
Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
PS Survivor Space              | 42.50MB                        | 42.50MB                        
PS Old Gen                     | 683.00MB                       | 683.00MB                       
Metaspace                      | -0.00MB                        | 0.00MB                         

2023-11-28 10:29:24.524 [main] INFO  Engine - 
{"content":[{"reader":{"name":"sqlserverreader","parameter":{"column":["*"],
"connection":[{"jdbcUrl":["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
"table":["t2"]}],"password":"*************","username":"SA",
"where":"createtime > '2023-11-28 03:17:46.330' and createtime < '2023-11-28 03:31:46.330'"}},
"writer":{"name":"mysqlwriter","parameter":{"column":["*"],"connection":[{"jdbcUrl":"jdbc:mysql://10..17.139.86:16310/testdb","table":["t2"]}],
"password":"********",
"session":[],
"username":"admin",
"writeMode":"insert"}}}],
"setting":{"speed":{"channel":"5"}}}

2023-11-28 10:29:24.542 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-11-28 10:29:24.544 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-11-28 10:29:24.544 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-11-28 10:29:24.546 [main] INFO  JobContainer - Set jobId = 0
2023-11-28 10:29:24.830 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb.
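2023-11-28 10:29:24.831 [job-0] WARN  OriginalConfPretreatmentUtil - The column configuration in your job file carries some risk. Because you have not specified which columns to read from the table, changes to the number or types of the table's fields may affect the correctness of the task or even cause it to fail. Please check your configuration and modify it.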
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2023-11-28 10:29:25.113 [job-0] INFO  OriginalConfPretreatmentUtil - table:[t2] all columns:[id,createtime].
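2023-11-28 10:29:25.113 [job-0] WARN  OriginalConfPretreatmentUtil - The column configuration in your job file carries some risk. Because the columns to write to the table are configured as *, changes to the number or types of the table's fields may affect the correctness of the task or even cause it to fail. Please check your configuration and modify it.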
2023-11-28 10:29:25.115 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id,createtime) VALUES(?,?)
], which jdbcUrl like:[jdbc:mysql://10..17.139.86:16310/testdb?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2023-11-28 10:29:25.116 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-11-28 10:29:25.117 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do prepare work .
2023-11-28 10:29:25.117 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2023-11-28 10:29:25.118 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-11-28 10:29:25.118 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2023-11-28 10:29:25.123 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2023-11-28 10:29:25.124 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2023-11-28 10:29:25.146 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-11-28 10:29:25.150 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-11-28 10:29:25.153 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-11-28 10:29:25.159 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-11-28 10:29:25.165 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-11-28 10:29:25.165 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-11-28 10:29:25.176 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-11-28 10:29:25.183 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from t2 where (createtime > '2023-11-28 03:17:46.330' and createtime < '2023-11-28 03:31:46.330')
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:29:25.344 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from t2 where (createtime > '2023-11-28 03:17:46.330' and createtime < '2023-11-28 03:31:46.330')
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:29:25.606 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[431]ms
2023-11-28 10:29:25.607 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-11-28 10:29:35.173 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 37 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 10:29:35.173 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-11-28 10:29:35.174 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2023-11-28 10:29:35.175 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2023-11-28 10:29:35.175 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-11-28 10:29:35.177 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-11-28 10:29:35.179 [job-0] INFO  JobContainer - 

 [total cpu info] => 
averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
-1.00%                         | -1.00%                         | -1.00%

 [total gc info] => 
 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
 PS MarkSweep         | 1                  | 1                  | 1                  | 0.052s             | 0.052s             | 0.052s             
 PS Scavenge          | 1                  | 1                  | 1                  | 0.024s             | 0.024s             | 0.024s             
2023-11-28 10:29:35.180 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-11-28 10:29:35.181 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 37 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%

2023-11-28 10:29:35.183 [job-0] INFO  JobContainer - 
Job start time                 : 2023-11-28 10:29:24
Job end time                   : 2023-11-28 10:29:35
Total elapsed time             :                 10s
Average throughput             :                3B/s
Record write speed             :              0rec/s
Total records read             :                   4
Total read/write failures      :                   0
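
When the job runs unattended, the record counts can be pulled out of the log instead of being read by eye. The sketch below is a minimal Python 3 illustration. `parse_datax_stats` is a hypothetical helper, and its regular expression is based only on the "Total ... records ... Error ... records" line format seen in this log, so it may need adjusting for other DataX versions.

```python
import re

# Matches the statistics line printed by StandAloneJobContainerCommunicator.
STATS_PATTERN = re.compile(
    r"Total (?P<total>\d+) records, (?P<bytes>\d+) bytes.*?"
    r"Error (?P<errors>\d+) records"
)


def parse_datax_stats(log_text):
    """Return (total_records, error_records) from the last stats line.

    Returns None when the log contains no statistics line.
    """
    matches = list(STATS_PATTERN.finditer(log_text))
    if not matches:
        return None
    last = matches[-1]
    return int(last.group("total")), int(last.group("errors"))


sample = (
    "2023-11-28 10:29:35.181 [job-0] INFO  StandAloneJobContainerCommunicator - "
    "Total 4 records, 37 bytes | Speed 3B/s, 0 records/s | "
    "Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  "
    "All Task WaitReaderTime 0.000s | Percentage 100.00%"
)
print(parse_datax_stats(sample))
```

For this run the expected total is 4, matching the four rows inserted on the source in section 4.5, with no error records.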

4.8 Verify incremental data on the target (GreatSQL)

greatsql> select * from t2;

+----+---------------------+
| id | createtime          |
+----+---------------------+
|  1 | 2023-11-28 02:18:21 |
|  2 | 2023-11-28 02:18:27 |
|  3 | 2023-11-28 02:18:32 |
|  4 | 2023-11-28 02:18:38 |
|  5 | 2023-11-28 02:18:41 |
|  6 | 2023-11-28 02:18:46 |
|  7 | 2023-11-28 03:18:46 |
|  8 | 2023-11-28 03:20:46 |
|  9 | 2023-11-28 03:25:46 |
| 10 | 2023-11-28 03:30:46 |
+----+---------------------+
10 rows in set (0.00 sec)

Summary of incremental migration: incremental migration is achieved indirectly, by adding a filter condition. The where clause excludes the rows already copied by the full migration, so the job transfers only the rows added afterwards.
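
One detail matters when such runs are chained. Both bounds in the where clause used here are strict (createtime > start and createtime < end), so a row whose createtime falls exactly on a boundary would be skipped if each window starts where the previous one ended. The Python 3 sketch below generates back-to-back half-open windows [start, end). It assumes the job's where clause is changed to createtime >= '${start_time}' and createtime < '${end_time}', a modification not made in this article, and `iter_windows` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def iter_windows(first_start, last_end, step):
    """Yield (start, end) pairs covering [first_start, last_end).

    Each window ends exactly where the next one begins, so with a
    'createtime >= start and createtime < end' filter every row is
    read once and only once.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    start = first_start
    while start < last_end:
        end = min(start + step, last_end)
        yield start, end
        start = end


for start, end in iter_windows(
    datetime(2023, 11, 28, 3, 0, 0),
    datetime(2023, 11, 28, 3, 25, 0),
    timedelta(minutes=10),
):
    print(start, "->", end)
```

Each pair can then be passed as start_time and end_time to one run of the incremental job.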


Enjoy GreatSQL :)

About GreatSQL

GreatSQL is an independently developed open source database from China, suitable for financial-grade applications. It offers core features such as high performance, high reliability, ease of use, and high security. It can be used as an optional replacement for MySQL or Percona Server in online production environments, and it is completely free and compatible with MySQL or Percona Server.

Related links: GreatSQL Community Gitee GitHub Bilibili

Community reward suggestions and feedback: https://greatsql.cn/thread-54-1-1.html

Community blog prize-winning submission details: https://greatsql.cn/thread-100-1-1.html

(If you have any questions about the article or have unique insights, you can go to the official community website to ask or share them~)

Technical exchange group:

WeChat & QQ group:

QQ group: 533341697

WeChat group: Add GreatSQL Community Assistant (WeChat ID: wanlidbc) as a friend and wait for the community assistant to add you to the group.

Origin my.oschina.net/GreatSQL/blog/11053973