Hands-on with DataX: migrating massive time-series data from MySQL to TDengine 3.x

Background

In MySQL we have device history tables with over 100 million rows. How can we migrate them to TDengine 3.x quickly and at low cost?

As the title suggests, the data migration/synchronization tool we use is DataX; the data source (Source) is a traditional relational database, MySQL, and the target (Sink) is TDengine, a new type of time-series database designed for exactly this kind of scenario.

DataX: the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used within Alibaba Group. DataX implements efficient data synchronization between heterogeneous data sources including MySQL, Oracle, OceanBase, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, DRDS, databend, and more.

MySQL: needs no introduction.

TDengine: an open-source, high-performance, cloud-native time-series database (Time-Series Database, TSDB). TDengine is widely used in IoT, industrial internet, connected vehicles, IT operations, finance, and other fields. Beyond the core time-series database engine, TDengine also provides caching, data subscription, and stream processing. It is a minimalist time-series data platform that reduces system design complexity as well as R&D and operating costs.

Migrating from MySQL to TDengine 3.x means migrating between heterogeneous data stores, so we first need to understand how the two data models differ. For details, see the electricity-meter model comparison officially provided by TAOS Data: TDengine Getting Started Guide for MySQL Developers.

Data Model

Take reservoir water-level monitoring as an example. In MySQL we have one device information table (device code, manufacturer, model, etc.) and one device data table (the time-series data collected by the sensors).

2023-05-28-Device.jpg

2023-05-28-WaterTable.jpg

Following TDengine's modeling approach, the two MySQL tables become one super table plus N subtables (one per device) after migration, with each subtable named after a device code from the MySQL device information table. Concretely, the TDengine data model looks like this:

create database if not exists sensor;
create stable if not exists sensor.water(ts timestamp, level float, status int) tags(district_code nchar(6), unit_id nchar(36), sensor_code int);

Only the super table is created here; the subtables are created automatically during migration, based on the device codes in the MySQL device information table.

2023-05-28-Desc.jpg
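To make the auto-creation step concrete, the statement that creating one subtable amounts to can be sketched in Python. This is illustrative only: the device code and tag values below are invented, not taken from the real device table.

```python
# Illustrative sketch: render the DDL that auto-creating one subtable
# under the water super table amounts to. All values are made-up examples.

def subtable_ddl(code: str, district_code: str, unit_id: str, sensor_code: int) -> str:
    # 'd' prefix because TDengine table names cannot start with a digit
    tbname = "d" + code
    return (
        f"create table if not exists sensor.{tbname} "
        f"using sensor.water "
        f"tags('{district_code}', '{unit_id}', {sensor_code})"
    )

print(subtable_ddl("66057408201830", "330106", "demo-unit-id", 2))
```

Each device thus gets its own subtable under the water super table, named after its code, with the device's static attributes carried as tags.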

Preparing the Migration Tool

At first I downloaded the prebuilt DataX package linked from the README of https://github.com/taosdata/DataX, only to find that it contains no TDengine 3.x writer. So I downloaded the source code of https://github.com/taosdata/DataX instead, built it locally to produce the jar package, and placed it in DataX's plugin directory.

2023-05-28-mvn.jpg
Note: After building the source locally with mvn clean package -Dmaven.test.skip=true, which produces tdengine30writer-0.0.1-SNAPSHOT.jar, copy the tdenginewriter directory under \datax\plugin\writer, rename the copy to tdengine30writer, update the plugin.json and plugin_job_template.json inside it accordingly, and replace the jar in its libs directory with taos-jdbcdriver-3.0.2.jar.

2023-05-28-Plugin.jpg
At this point, the tool is ready, and the rest is to write the configuration script for data migration.
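The manual copy-and-rename steps in the note above can also be scripted. The following Python sketch assumes a standard DataX directory layout and takes the jar paths as arguments; it deliberately does not edit plugin.json / plugin_job_template.json, which still need their plugin name changed to tdengine30writer by hand.

```python
# Hedged sketch of the plugin installation steps; paths are illustrative.
import shutil
from pathlib import Path

def install_tdengine30writer(datax_home: str, writer_jar: str, driver_jar: str) -> Path:
    writer_dir = Path(datax_home) / "plugin" / "writer"
    src = writer_dir / "tdenginewriter"        # existing 2.x plugin, used as a template
    dst = writer_dir / "tdengine30writer"
    shutil.copytree(src, dst)                  # copy the directory under the new name
    shutil.copy(writer_jar, dst)               # add tdengine30writer-0.0.1-SNAPSHOT.jar
    libs = dst / "libs"
    for old in libs.glob("taos-jdbcdriver-2.*.jar"):
        old.unlink()                           # drop the 2.x JDBC driver
    shutil.copy(driver_jar, libs)              # add taos-jdbcdriver-3.0.2.jar
    return dst
```

After running it, remember to edit plugin.json and plugin_job_template.json inside the new directory so the plugin name reads tdengine30writer.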

Migrating the Device Information Table

job-water.json: the migration configuration script has two parts, the data source and the target database. Migrating the device information table creates all the subtables: one table per device.

  • Data source
    "name": "mysqlreader". When migrating the device information table, alias the device code as tbname; TDengine uses it as the subtable name when auto-creating the subtables.

Note: a letter d is prefixed to the device code here, because table names in TDengine cannot start with a digit.

  • Target database

"name": "tdengine30writer". The column section lists the column names queried from the data source, matching their order and names in MySQL; in table, simply write the name of the super table.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "your-password",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://your-ip:3306/iotdata?useSSL=false&serverTimezone=Asia/Shanghai"
                                ],
                                "querySql": [
                                    "select concat('d', code) as tbname, create_time as ts, sensor_code, district_code, unit_id from b_device WHERE sensor_code=2;"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "tdengine30writer",
                    "parameter": {
                        "username": "root",
                        "password": "taosdata",
                        "column": [
                            "tbname",
                            "ts",
                            "sensor_code",
                            "district_code",
                            "unit_id"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "water"
                                ],
                                "jdbcUrl": "jdbc:TAOS-RS://192.168.44.158:6041/sensor"
                            }
                        ],
                        "batchSize": 1000,
                        "ignoreTagsUnmatched": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
  • Execute migration/sync script
D:\datax\bin>datax.py ../job/job-water.json
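A quick way to confirm the subtables were created is TDengine's REST interface, served on the same taosAdapter port 6041 that the JDBC-RS URL above uses: SQL is POSTed to /rest/sql with HTTP Basic auth. The host and credentials below mirror the example job file; treat this as a sketch, not part of the migration itself.

```python
# Sketch: build an HTTP request for TDengine's /rest/sql endpoint
# (host/credentials copied from the example job file).
import base64
import urllib.request

def rest_sql_request(host, user, password, sql):
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"http://{host}:6041/rest/sql",
        data=sql.encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )

req = rest_sql_request("192.168.44.158", "root", "taosdata",
                       "select count(*) from sensor.water")
# urllib.request.urlopen(req) returns a JSON document with the result rows.
```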

Migrating the Device Data Table

job-water-data.json: again the migration configuration script has two parts, the data source and the target database. Migrating the device data table writes the sensor data into the subtable corresponding to each device code.

  • Data source

When migrating the device data table, query the fields collected by the sensors and, as before, alias the device code as tbname; TDengine automatically writes each row into the corresponding subtable.

  • Target database

The column section lists the column names queried from the data source, matching their order and names in MySQL. When configuring the device data table, note that table must list the names of all the subtables.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "your-password",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://your-ip:3306/iotdata?useSSL=false&serverTimezone=Asia/Shanghai&net_write_timeout=600"
                                ],
                                "querySql": [
                                    "select concat('d', code) as tbname, create_time as ts, value as level, status from sensor_water;"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "tdengine30writer",
                    "parameter": {
                        "username": "root",
                        "password": "taosdata",
                        "column": [
                            "tbname",
                            "ts",
                            "level",
                            "status"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "d66057408201830",
                                    "d66057408063030",
                                    "d66057408027630",
                                    "d66057408208130",
                                    "d66057408009630",
                                    "d66057408000530",
                                    "d66057408067330",
                                    "d66057408025430"
                                ],
                                "jdbcUrl": "jdbc:TAOS-RS://192.168.44.158:6041/sensor"
                            }
                        ],
                        "encoding": "UTF-8",
                        "batchSize": 1000,
                        "ignoreTagsUnmatched": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
  • Execute migration/sync script
D:\datax\bin>datax.py ../job/job-water-data.json
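With 3,000+ devices, maintaining the writer's table array by hand is impractical. The following Python sketch shows one way to generate the list from the device codes (in practice you would read them from the b_device table; here they are a literal list) and patch it into the job dictionary. The job structure below is a stripped-down stand-in for the real file.

```python
# Sketch: derive subtable names from device codes and patch the writer's
# "table" array in a DataX job dict (use json.load/json.dump for the real file).

def patch_table_list(job: dict, device_codes: list) -> dict:
    # same 'd' prefix rule as the reader's concat('d', code)
    tables = ["d" + code for code in device_codes]
    job["job"]["content"][0]["writer"]["parameter"]["connection"][0]["table"] = tables
    return job

codes = ["66057408201830", "66057408063030"]  # two sample codes from the article
job = {"job": {"content": [{"writer": {"parameter": {"connection": [{"table": []}]}}}]}}
patched = patch_table_list(job, codes)
print(patched["job"]["content"][0]["writer"]["parameter"]["connection"][0]["table"])
# → ['d66057408201830', 'd66057408063030']
```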

Problems that may be encountered when using DataX

Garbled Chinese output from DataX

After running D:\datax\bin>datax.py ../job/job.json, the Chinese output on the console is garbled.

  • Solution: run CHCP 65001 and press Enter; once "Active code page: 65001" appears, execute the job command again in that window and the Chinese output displays normally.

2023-05-28-SubTable.jpg

Plugin loading failed, specified plugin loading not completed: [mysqlreader, tdengine20writer]

  • Solution: make sure the plugin name in the job file is spelled correctly.

com.alibaba.datax.common.exception.DataXException: Code:[TDengineWriter-00], Description:[parameter value is missing]. - The parameter [username] is not set.

  • Solution: TDengine 2.0 and 3.0 use different configuration items. I had started from a TDengine 2.0 job file; adjust the parameters according to the 3.0 documentation.

java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List

  • Solution: the jdbcUrl and querySql values in the mysqlreader section must be enclosed in [] (they are arrays); this form is fixed by the plugin's job template.

com.alibaba.datax.common.exception.DataXException: Code:[TDengineWriter-02], Description:[runtime exception]. - No suitable driver found for ["jdbc:TAOS-RS://192.168.44.158:6041/sensor"]

  • Solution: on the writer side, "jdbcUrl": "jdbc:TAOS-RS://192.168.44.158:6041/sensor" must be a string, not an array.

Null pointer error: ERROR WriterRunner - Writer Runner Received Exceptions:

java.lang.NullPointerException: null
        at com.taosdata.jdbc.rs.RestfulDriver.connect(RestfulDriver.java:111) ~[taos-jdbcdriver-2.0.37.jar:na]
        at java.sql.DriverManager.getConnection(Unknown Source) ~[na:1.8.0_311]
        at java.sql.DriverManager.getConnection(Unknown Source) ~[na:1.8.0_311]
        at com.alibaba.datax.plugin.writer.tdenginewriter.DefaultDataHandler.handle(DefaultDataHandler.java:75) ~[tdenginewriter-0.0.1-SNAPSHOT.jar:na]
  • Solution: the stack trace shows taos-jdbcdriver is still the 2.0 jar. Download the DataX source code, build it to produce tdengine30writer-0.0.1-SNAPSHOT.jar, copy the tdenginewriter folder to tdengine30writer, put tdengine30writer-0.0.1-SNAPSHOT.jar into tdengine30writer, delete taos-jdbcdriver-2.0.37.jar from tdengine30writer\libs, and add taos-jdbcdriver-3.0.2.jar there instead.

com.alibaba.datax.common.exception.DataXException: Code:[TDengineWriter-02], Description:[Runtime Exception]. - TDengine ERROR (2600): sql: describe 66057408201830, desc: syntax error near "66057408201830"

  • Solution: a table name cannot start with a digit, so I prefixed the device code with the letter d.

com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Application was streaming results when the connection failed. Consider raising value of 'net_write_timeout' on the server.

  • Solution: add the parameter to the data source's connection URL, setting net_write_timeout / net_read_timeout somewhat higher than the 60s default.
    For example: jdbc:mysql://your-ip:3306/iotdata?useSSL=false&serverTimezone=Asia/Shanghai&net_write_timeout=600

To check the current values in MySQL: SHOW VARIABLES LIKE 'net%';

2023-05-28-NetParam.jpg

Summary

The above is a hands-on record of migrating time-series data from MySQL to TDengine 3.x with DataX: driven by a couple of configuration files, the tool completes a rapid migration of massive time-series data.

In the actual migration test, with 3,000+ reservoir water-level sensing devices and history tables of 100 million+ rows, 50 million+ rows were migrated in half a day.


If you have any questions or find any bugs, please feel free to contact me.

Your comments and suggestions are welcome!

Origin blog.csdn.net/u013810234/article/details/130910778