Background

There are over 100 million rows of historical device data in MySQL. How can they be migrated to TDengine 3.x quickly and at low cost?

As the title suggests, the data migration/synchronization tool used here is DataX: the data source (Source) is MySQL, a traditional relational database, and the target (Sink) is TDengine, a time-series database built for this kind of workload.
DataX: the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used inside Alibaba Group. DataX implements efficient data synchronization between many heterogeneous data sources, including MySQL, Oracle, OceanBase, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, DRDS, Databend, and more.
MySQL: needs no introduction.
TDengine: an open-source, high-performance, cloud-native time-series database (TSDB). TDengine can be widely used in IoT, Industrial Internet, Internet of Vehicles, IT operations, finance, and other fields. Besides the core time-series database features, TDengine also provides caching, data subscription, and stream processing, making it a minimalist time-series data platform that reduces system design complexity as well as R&D and operating costs.
Migrating from MySQL to TDengine 3.x is a heterogeneous migration, so the first step is to understand how the two data models differ. For details, see the electricity-meter model comparison officially provided by TAOS Data: "TDengine Getting Started Guide for MySQL Developers".
Data Model

Take reservoir water-level monitoring as an example. In MySQL there is one device information table (device code, manufacturer, model, etc.) and one device data table (the time series collected by the sensors).

Following TDengine's "one table per device" design, these two MySQL tables become one super table plus N sub-tables (one per device) after migration, where each sub-table is named after the device code in the device information table. Concretely, the TDengine data model looks like this:
create database if not exists sensor;
create stable if not exists sensor.water(ts timestamp, level float, status int) tags(district_code nchar(6), unit_id nchar(36), sensor_code int);
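For reference, the automatic sub-table creation the writer performs is equivalent to TDengine's insert-with-tags syntax. The sub-table name and tag values below are illustrative, not from the original post:

```sql
-- Create the sub-table on first insert, attached to the super table.
-- 'd66057408201830' is an example sub-table name (device code prefixed with 'd');
-- the tag values (district code, unit id, sensor code) are made up for illustration.
INSERT INTO sensor.d66057408201830
  USING sensor.water TAGS ('320100', 'unit-001', 2)
  VALUES (NOW, 1.23, 0);
```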
Only the super table is created here; the concrete sub-tables are created automatically during migration, based on the device codes in the MySQL device information table.
Prepare the Migration Tool

At first I downloaded DataX directly from the download link in the README at https://github.com/taosdata/DataX, but then found that build ships no TDengine 3.x writer. So I downloaded the source code from https://github.com/taosdata/DataX, compiled it locally to produce the jar package, and put it into DataX's plugin directory.
Note: after building locally with mvn clean package -Dmaven.test.skip=true, take the generated tdengine30writer-0.0.1-SNAPSHOT.jar, copy the tdenginewriter directory under \datax\plugin\writer and rename the copy to tdengine30writer, adjust the plugin.json and plugin_job_template.json inside it accordingly, and in its libs directory replace the old JDBC driver with taos-jdbcdriver-3.0.2.jar.
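The note above can be sketched as a shell session. The paths here are mocked for illustration (on Windows the equivalent copy/rename steps work the same way):

```shell
#!/bin/sh
set -e
# --- Mock layout standing in for a real DataX install and the mvn build
#     output; replace ROOT with your real paths and drop this section. ---
ROOT=$(mktemp -d)
WRITER="$ROOT/datax/plugin/writer"
mkdir -p "$WRITER/tdenginewriter/libs"
touch "$WRITER/tdenginewriter/plugin.json" \
      "$WRITER/tdenginewriter/plugin_job_template.json" \
      "$WRITER/tdenginewriter/libs/taos-jdbcdriver-2.0.37.jar"
touch "$ROOT/tdengine30writer-0.0.1-SNAPSHOT.jar" "$ROOT/taos-jdbcdriver-3.0.2.jar"

# 1) duplicate the 2.x writer directory under the 3.x name
cp -r "$WRITER/tdenginewriter" "$WRITER/tdengine30writer"
# 2) drop in the jar produced by `mvn clean package -Dmaven.test.skip=true`
cp "$ROOT/tdengine30writer-0.0.1-SNAPSHOT.jar" "$WRITER/tdengine30writer/"
# 3) swap the 2.x JDBC driver for the 3.x one in libs/
rm "$WRITER/tdengine30writer/libs/taos-jdbcdriver-2.0.37.jar"
cp "$ROOT/taos-jdbcdriver-3.0.2.jar" "$WRITER/tdengine30writer/libs/"
# 4) finally, edit plugin.json and plugin_job_template.json inside
#    tdengine30writer, changing "tdenginewriter" to "tdengine30writer"
```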
At this point, the tool is ready, and the rest is to write the configuration script for data migration.
Migrating the Device Information Table

job-water.json: the migration configuration script has two parts: one is the data source, the other is the target database. A side effect of migrating the device information table is that all sub-tables get created: one table per device.
- Data source
"name": "mysqlreader". When migrating the device information table, alias the device code as tbname; TDengine will automatically use it as the sub-table name.
Note: a letter "d" is prepended to the device code here, because a table name in TDengine cannot start with a digit.
- Target database
"name": "tdengine30writer". The column section lists the column names queried from the data source, matching the order and names of the MySQL query; table holds the super table name directly.
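The "d"-prefix rule from the note above can be expressed as a small helper. This is a sketch, not code from the original post; the function name and the validity check are my own:

```python
import re

def to_tbname(code: str) -> str:
    """Mirror the SQL alias concat('d', code): prefix 'd' so the
    sub-table name starts with a letter, since a TDengine table name
    must not begin with a digit."""
    name = f"d{code}"
    # Conservative sanity check: a letter first, then letters/digits/underscore.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid sub-table name: {name!r}")
    return name

print(to_tbname("66057408201830"))  # -> d66057408201830
```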
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "your-password",
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://your-ip:3306/iotdata?useSSL=false&serverTimezone=Asia/Shanghai"
],
"querySql": [
"select concat('d', code) as tbname, create_time as ts, sensor_code, district_code, unit_id from b_device WHERE sensor_code=2;"
]
}
]
}
},
"writer": {
"name": "tdengine30writer",
"parameter": {
"username": "root",
"password": "taosdata",
"column": [
"tbname",
"ts",
"sensor_code",
"district_code",
"unit_id"
],
"connection": [
{
"table": [
"water"
],
"jdbcUrl": "jdbc:TAOS-RS://192.168.44.158:6041/sensor"
}
],
"batchSize": 1000,
"ignoreTagsUnmatched": true
}
}
}
],
"setting": {
"speed": {
"channel": 1
}
}
}
}
- Execute the migration/sync script
D:\datax\bin>datax.py ../job/job-water.json
Migrating the Device Data Table

job-water-data.json: the migration configuration script again has two parts: one is the data source, the other is the target database. Migrating the device data table writes the sensor data into the sub-table corresponding to each device code.
- Data source
When migrating the device data table, query the fields collected by the sensors and again alias the device code as tbname; TDengine automatically writes each row into the corresponding sub-table.
- target library
column
List the column names queried in the data source in the part, corresponding to MySQL
the sequence and name in the data source. When configuring the device data table, you need to pay attention to write table
the names of all sub-tables in the table name.
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "your-password",
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://your-ip:3306/iotdata?useSSL=false&serverTimezone=Asia/Shanghai&net_write_timeout=600"
],
"querySql": [
"select concat('d', code) as tbname, create_time as ts, value as level, status from sensor_water;"
]
}
]
}
},
"writer": {
"name": "tdengine30writer",
"parameter": {
"username": "root",
"password": "taosdata",
"column": [
"tbname",
"ts",
"level",
"status"
],
"connection": [
{
"table": [
"d66057408201830",
"d66057408063030",
"d66057408027630",
"d66057408208130",
"d66057408009630",
"d66057408000530",
"d66057408067330",
"d66057408025430"
],
"jdbcUrl": "jdbc:TAOS-RS://192.168.44.158:6041/sensor"
}
],
"encoding": "UTF-8",
"batchSize": 1000,
"ignoreTagsUnmatched": true
}
}
}
],
"setting": {
"speed": {
"channel": 1
}
}
}
}
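The sub-table list in the writer's table array above does not have to be typed by hand. A sketch (the device codes and variable names are illustrative) of generating that JSON fragment from the MySQL device codes:

```python
import json

# Hypothetical result of: SELECT code FROM b_device WHERE sensor_code = 2
device_codes = ["66057408201830", "66057408063030", "66057408027630"]

# Apply the same 'd' prefix the reader's querySql adds via concat('d', code).
tables = [f"d{code}" for code in device_codes]

# Fragment for the writer's "connection" section.
print(json.dumps({"table": tables}, indent=2))
```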
- Execute the migration/sync script
D:\datax\bin>datax.py ../job/job-water-data.json
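After a run, a quick sanity check (my suggestion, not from the original post) is to compare row counts between source and target:

```sql
-- On MySQL:
SELECT COUNT(*) FROM sensor_water;
-- On TDengine (a query on the super table aggregates all sub-tables):
SELECT COUNT(*) FROM sensor.water;
```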
Problems You May Encounter with DataX

Garbled Chinese console output
After running D:\datax\bin>datax.py ../job/job.json, the Chinese output on the console is garbled.
- Solution: run CHCP 65001 and press Enter until "Active code page: 65001" appears, then execute the job command again; the Chinese will display normally.
Plugin loading failed, specified plugin loading not completed: [mysqlreader, tdengine20writer]
- Solution: write the plugin name correctly (here tdengine30writer, not tdengine20writer).
com.alibaba.datax.common.exception.DataXException: Code:[TDengineWriter-00], Description:[parameter value is missing]. - The parameter [username] is not set.
- Solution: the configuration items of TDengine 2.0 and 3.0 differ. I had initially used the 2.0 configuration, so the parameters had to be adjusted according to the 3.0 documentation.
java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
- Solution: the jdbcUrl and querySql values in the mysqlreader section must be wrapped in "[]" (they are arrays); this is part of the plugin's fixed template.
com.alibaba.datax.common.exception.DataXException: Code:[TDengineWriter-02], Description:[runtime exception]. - No suitable driver found for ["jdbc:TAOS-RS://192.168.44.158:6041/sensor"]
- Solution: on the writer side, "jdbcUrl": "jdbc:TAOS-RS://192.168.44.158:6041/sensor" must be a string, not an array.
Null pointer error: ERROR WriterRunner - Writer Runner Received Exceptions:
java.lang.NullPointerException: null
at com.taosdata.jdbc.rs.RestfulDriver.connect(RestfulDriver.java:111) ~[taos-jdbcdriver-2.0.37.jar:na]
at java.sql.DriverManager.getConnection(Unknown Source) ~[na:1.8.0_311]
at java.sql.DriverManager.getConnection(Unknown Source) ~[na:1.8.0_311]
at com.alibaba.datax.plugin.writer.tdenginewriter.DefaultDataHandler.handle(DefaultDataHandler.java:75) ~[tdenginewriter-0.0.1-SNAPSHOT.jar:na]
- Solution: the stack trace shows that the 2.0 taos-jdbcdriver jar is being used. Download the DataX source code, compile it to produce tdengine30writer-0.0.1-SNAPSHOT.jar, copy the tdenginewriter folder to tdengine30writer, put tdengine30writer-0.0.1-SNAPSHOT.jar into tdengine30writer, delete taos-jdbcdriver-2.0.37.jar from tdengine30writer\libs, and add taos-jdbcdriver-3.0.2.jar.
com.alibaba.datax.common.exception.DataXException: Code:[TDengineWriter-02], Description:[runtime exception]. - TDengine ERROR (2600): sql: describe 66057408201830, desc: syntax error near "66057408201830"
- Solution: a table name cannot start with a digit, so I prepended the letter "d" to the device code.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Application was streaming results when the connection failed. Consider raising value of 'net_write_timeout' on the server.
- Solution: add the net_write_timeout/net_read_timeout parameters to the data source's JDBC URL and set them somewhat higher; the default is 60s.
For example: jdbc:mysql://your-ip:3306/iotdata?useSSL=false&serverTimezone=Asia/Shanghai&net_write_timeout=600
To view the current values in MySQL: SHOW VARIABLES LIKE 'net%';
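Alternatively, the timeouts can be raised on the MySQL server side rather than per-connection (the values below are examples; SUPER/SYSTEM_VARIABLES_ADMIN privilege is required for SET GLOBAL):

```sql
-- Raise the streaming timeouts server-wide (values in seconds).
SET GLOBAL net_write_timeout = 600;
SET GLOBAL net_read_timeout  = 600;
-- Verify:
SHOW VARIABLES LIKE 'net%';
```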
Summary

The above is a record of an actual time-series data migration from MySQL to TDengine 3.x: with the help of DataX, massive amounts of time-series data were migrated quickly, driven entirely by configuration files.

In the actual migration test, data from 3,000+ reservoir water-level sensing devices and a historical table of 100 million+ rows were involved, and 50 million+ rows were migrated in half a day.
Reference
- https://github.com/taosdata/DataX
- MysqlReader plugin documentation
- DataX TDengineWriter plugin documentation
- https://developer.aliyun.com/ask/430332
- TDengine 2.* version data migration tool based on DataX
If you have any questions or any bugs are found, please feel free to contact me.
Your comments and suggestions are welcome!