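Scheduled data synchronization with DataX 3.0, Alibaba's offline data synchronization tool
1. Get familiar with using DataX 3.0; see https://github.com/alibaba/DataX/wiki/Quick-Start
2. Set up the data synchronization configuration by creating the job's JSON configuration file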
{
    "job": {
        "setting": {
            "speed": {
                "byte": 10485760
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {
                                "value": "DataX",
                                "type": "string"
                            },
                            {
                                "value": 19890604,
                                "type": "long"
                            },
                            {
                                "value": "1989-06-04 00:00:00",
                                "type": "date"
                            },
                            {
                                "value": true,
                                "type": "bool"
                            },
                            {
                                "value": "test",
                                "type": "bytes"
                            }
                        ],
                        "sliceRecordCount": 100000
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
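Before running a job, it can help to confirm that the configuration file parses as JSON and names the reader and writer plugins you expect. The following is only a minimal sketch: the helper name list_plugins and the trimmed-down SAMPLE_JOB are made up for illustration and are not part of DataX.

```python
import json


def list_plugins(job_config):
    """Return a (reader name, writer name) pair for each entry under job.content."""
    pairs = []
    for entry in job_config["job"]["content"]:
        pairs.append((entry["reader"]["name"], entry["writer"]["name"]))
    return pairs


# Hypothetical, trimmed-down config with the same shape as the job file above
SAMPLE_JOB = json.loads("""
{
    "job": {
        "setting": {"speed": {"byte": 10485760}},
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {"sliceRecordCount": 100000}
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {"print": false, "encoding": "UTF-8"}
                }
            }
        ]
    }
}
""")

if __name__ == "__main__":
    print(list_plugins(SAMPLE_JOB))
```

For a real job file, the same helper could be fed the result of json.load on the opened file instead of SAMPLE_JOB.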
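3. Test the data synchronization. This requires downloading the compiled (prebuilt) release of DataX and having Python 2.6 or later installed.
4. Write the Python script that the Windows batch (.bat) file will run to synchronize yesterday's data. The batch file in step 5 calls it as autoDataSync.py.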
# -*- coding:utf-8 -*-
## Windows scheduled task
## author zhujunbo
## Place this file in DataX's bin directory
import datetime
import os
import subprocess


def startask(path, yesterday):
    # Run every job file found directly in the job directory
    for f in os.listdir(path):
        file = os.path.join(path, f)
        if os.path.isfile(file):
            # Run the datax command, passing yesterday's date as a job parameter
            subprocess.call(['python', 'D:\\datax\\bin\\datax.py',
                             '-p', '-Dyesterday=' + str(yesterday), file])


if __name__ == "__main__":
    today = datetime.date.today()
    # Yesterday's date
    yesterday = today - datetime.timedelta(1)
    startask('D:\\datax\\job\\', yesterday)
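The date arithmetic and the command line passed to datax.py can be checked separately from actually launching DataX. The sketch below is illustrative only: the helper names yesterday_of and build_datax_command are hypothetical, and it assumes the job file reads the value through a ${yesterday} placeholder, which the sample job in step 2 does not contain.

```python
import datetime


def yesterday_of(today):
    """Return the calendar day before the given date."""
    return today - datetime.timedelta(days=1)


def build_datax_command(datax_py, job_file, day):
    """Build the argument list for one DataX run without executing it.

    The -p option carries the -Dyesterday=<date> parameter; the date is
    formatted as YYYY-MM-DD, the same text that str(day) produces.
    """
    return ["python", datax_py, "-p", "-Dyesterday=" + day.isoformat(), job_file]


if __name__ == "__main__":
    # Hypothetical job file name, used only for illustration
    cmd = build_datax_command("D:\\datax\\bin\\datax.py",
                              "D:\\datax\\job\\example.json",
                              yesterday_of(datetime.date.today()))
    print(" ".join(cmd))
```

Keeping the command as a list, rather than one concatenated string, avoids quoting problems when a job path contains spaces.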
5. Write the Windows scheduled-task script, then set up, test, and run the scheduled task.
@echo off
D:
cd D:\datax\bin
start python autoDataSync.py
exit
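One way to register the batch file as a daily job is the Windows schtasks command; the Task Scheduler GUI works just as well. The sketch below only builds the command line and does not run it. The task name DataXSync, the batch file path D:\datax\bin\autoDataSync.bat, and the 01:00 start time are assumptions chosen for illustration, and build_schtasks_command is a hypothetical helper.

```python
def build_schtasks_command(task_name, bat_path, start_time):
    """Build a schtasks command that runs bat_path once a day at start_time (HH:MM)."""
    return ["schtasks", "/Create",
            "/SC", "DAILY",      # schedule type: every day
            "/TN", task_name,    # name shown in Task Scheduler
            "/TR", bat_path,     # program the task runs
            "/ST", start_time]   # start time, 24-hour HH:MM


if __name__ == "__main__":
    # Assumed task name, batch file location, and start time
    cmd = build_schtasks_command("DataXSync",
                                 "D:\\datax\\bin\\autoDataSync.bat",
                                 "01:00")
    print(" ".join(cmd))
```

Running the printed command in a command prompt with sufficient privileges would create the task; choose a start time after midnight so that "yesterday" refers to the day that has just ended.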