Migrating from PostgreSQL to TDengine with DataX
Environment preparation
Running DataX requires Java 1.8 and a Python environment; both Python 2 and Python 3 are supported. Python 3.6 is used here.
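As a quick sanity check before going further (the output will vary by machine), you can confirm the required runtimes are present:

java -version        # should report a 1.8.x JVM
python3 --version    # Python 3.6.x is used in this article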
Download DataX
Unzip DataX
tar -zxvf dataX.tar.gz
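Extraction produces a datax directory; on a typical release its layout looks roughly like this (contents can differ slightly between versions):

datax/
├── bin/      # datax.py launcher script
├── conf/     # core configuration and logging settings
├── job/      # sample job files
├── lib/      # core jars
└── plugin/   # reader/ and writer/ plugin directories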
Write a DataX job
Write a job file with the .json suffix, for example:
{
    "job": {
        "setting": {
            "speed": {
                // Transfer speed limit in bytes/s; DataX will try to reach but not exceed it.
                "byte": 10485762
            },
            // Error limits
            "errorLimit": {
                // Maximum number of failed records; the job fails once this is exceeded.
                "record": 0,
                // Maximum ratio of failed records; 1.0 means 100%, 0.02 means 2%.
                "percentage": 0.02
            }
        },
        "content": [{
            "reader": {
                "name": "postgresqlreader",
                "parameter": {
                    // Database username
                    "username": "***",
                    // Database password
                    "password": "***",
                    "column": [
                        "data_time", "product_id", "type", "content", "create_time", "device_id", "id"
                    ],
                    // Split key for parallel reads
                    //"splitPk": "id",
                    "connection": [{
                        "table": [
                            "source_table_to_migrate"
                        ],
                        //"querySql": [
                        //    "SELECT 'device_log_djzttcc_d1' AS tbname,data_time AS _ts, product_id AS productid, type , content , device_id AS deviceid from logdev_d1;"
                        //],
                        "jdbcUrl": [
                            "jdbc:postgresql://ip:port/database_name"
                        ]
                    }]
                }
            },
            "writer": {
                "name": "tdengine30writer",
                "parameter": {
                    "username": "***",
                    "password": "***",
                    "column": [
                        "_ts",
                        "productid",
                        "type",
                        "content",
                        "createtime",
                        "deviceid",
                        "id"
                    ],
                    "connection": [{
                        "table": [
                            "destination_table"
                        ],
                        "jdbcUrl": "jdbc:TAOS://ip:port/database_name"
                    }],
                    "batchSize": 100,
                    "ignoreTagsUnmatched": true
                }
            }
        }]
    }
}
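If the destination table does not already exist in TDengine, you will typically create it yourself before running the job. A rough sketch via the taos CLI is shown below; the database name, table name, column types and lengths are assumptions here, so adjust them to your actual schema (the first column must be a TIMESTAMP):

# Hypothetical example: create a destination table whose columns match the writer's column list
taos -s "CREATE DATABASE IF NOT EXISTS mydb;"
taos -s "CREATE TABLE IF NOT EXISTS mydb.destination_table (_ts TIMESTAMP, productid NCHAR(64), type INT, content NCHAR(1024), createtime TIMESTAMP, deviceid NCHAR(64), id BIGINT);"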
One thing to note here is that the alibaba/DataX repository only ships a TDengine 2.0 writer. If you want to write to TDengine 3.0, you need to build tdengine30writer from the taosdata/DataX repository and place it in the DataX plugin directory.
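A rough sketch of that build-and-install step, assuming the fork builds with Maven and DataX was unpacked to /opt/datax (the exact output path and build flags may differ by branch):

# Clone the taosdata fork and build it, skipping tests to save time
git clone https://github.com/taosdata/DataX.git
cd DataX
mvn -U clean package assembly:assembly -Dmaven.test.skip=true

# Copy the compiled TDengine 3.0 writer into the DataX plugin directory
cp -r target/datax/datax/plugin/writer/tdengine30writer /opt/datax/plugin/writer/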
Terminology and points to modify
- reader and writer refer to the read and write plug-ins; the officially supported data sources are listed below:
Type | Data source | Reader (read) | Writer (write) | Document
---|---|---|---|---
RDBMS (relational databases) | MySQL | √ | √ | read, write
 | Oracle | √ | √ | read, write
 | OceanBase | √ | √ | read, write
 | SQLServer | √ | √ | read, write
 | PostgreSQL | √ | √ | read, write
 | DRDS | √ | √ | read, write
 | Universal RDBMS (supports all relational databases) | √ | √ | read, write
Alibaba Cloud data warehouse storage | ODPS | √ | √ | read, write
 | ADS | | √ | write
 | OSS | √ | √ | read, write
 | OCS | | √ | write
NoSQL data storage | OTS | √ | √ | read, write
 | Hbase0.94 | √ | √ | read, write
 | Hbase1.1 | √ | √ | read, write
 | Phoenix4.x | √ | √ | read, write
 | Phoenix5.x | √ | √ | read, write
 | MongoDB | √ | √ | read, write
 | Hive | √ | √ | read, write
 | Cassandra | √ | √ | read, write
Unstructured data storage | TxtFile | √ | √ | read, write
 | FTP | √ | √ | read, write
 | HDFS | √ | √ | read, write
 | Elasticsearch | | √ | write
Time-series databases | OpenTSDB | √ | | read
 | TSDB | √ | √ | read, write
 | TDengine2.0 | √ | √ | read, write
 | TDengine3.0 | √ | √ | read, write
- connection specifies the table(s) to read or write and the JDBC URL of the database.
Execute the job
python datax.py your_job_file.json
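For example, if DataX is unpacked under /opt/datax and the job above is saved as pg2td.json (both paths are placeholders chosen here, not from the original setup):

python3 /opt/datax/bin/datax.py /opt/datax/job/pg2td.json
# When the job finishes, DataX prints a summary including transfer speed,
# the total number of records read and the number of error records.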