Scheduled data synchronization with DataX 3.0, Alibaba's offline data synchronization tool

1. Get familiar with DataX 3.0 by working through the Quick Start guide: https://github.com/alibaba/DataX/wiki/Quick-Start

2. Set up the data synchronization by creating a job configuration file in JSON format:

{
    "job": {
        "setting": {
            "speed": {
                "byte":10485760
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column" : [
                            {
                                "value": "DataX",
                                "type": "string"
                            },
                            {
                                "value": 19890604,
                                "type": "long"
                            },
                            {
                                "value": "1989-06-04 00:00:00",
                                "type": "date"
                            },
                            {
                                "value": true,
                                "type": "bool"
                            },
                            {
                                "value": "test",
                                "type": "bytes"
                            }
                        ],
                        "sliceRecordCount": 100000
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
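
Before handing a job file to DataX, it can help to confirm that it parses as JSON and names the reader and writer plugins you expect. The following is a minimal sketch, not part of DataX: the helper name `summarize_job` and the trimmed-down sample string are assumptions made for illustration.

```python
import json

# Trimmed-down copy of the job above, kept inline so the sketch is self-contained.
SAMPLE_JOB = """
{
    "job": {
        "setting": {"speed": {"byte": 10485760}},
        "content": [
            {
                "reader": {"name": "streamreader", "parameter": {"sliceRecordCount": 100000}},
                "writer": {"name": "streamwriter", "parameter": {"print": false}}
            }
        ]
    }
}
"""


def summarize_job(job_text):
    """Parse a job definition and return (reader name, writer name) pairs.

    Hypothetical helper: it only checks the layout shown in this article.
    """
    job = json.loads(job_text)  # raises ValueError if the text is not valid JSON
    pairs = []
    for item in job["job"]["content"]:
        pairs.append((item["reader"]["name"], item["writer"]["name"]))
    return pairs
```

Calling `summarize_job(SAMPLE_JOB)` returns one pair per entry in `content`; a missing key or malformed JSON raises an exception instead of failing later inside a scheduled run.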

3. To test the synchronization, download the precompiled DataX release, install Python 2.6 or later, and run the job.
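
A single test run amounts to calling `datax.py` with the path of one job file, as the Quick Start guide describes. Here is a minimal sketch, assuming DataX is unpacked under `D:\datax` (the location used later in this article) and the job is saved as `D:\datax\job\job.json`, which is a hypothetical file name; the helpers `build_test_command` and `run_job` are illustrative as well.

```python
import subprocess

DATAX_PY = r"D:\datax\bin\datax.py"   # assumed install location
JOB_FILE = r"D:\datax\job\job.json"   # hypothetical job file name


def build_test_command(datax_py, job_file, python_exe="python"):
    """Return the argument list for one DataX run of a single job file."""
    return [python_exe, datax_py, job_file]


def run_job(datax_py, job_file):
    """Run the job and return the exit code of the datax.py process."""
    return subprocess.call(build_test_command(datax_py, job_file))
```

Calling `run_job(DATAX_PY, JOB_FILE)` starts the job and waits for it to finish; by the usual process convention, a non-zero return value indicates a failed run.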

 

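4. Write a Python script that synchronizes yesterday's data; on Windows it is launched from a batch (.bat) file. Save it as autoDataSync.py in the DataX bin directory, which is the name the batch file in step 5 expects.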

 

# -*- coding: utf-8 -*-
## Windows scheduled task
## author zhujunbo
## Place this file in the DataX bin directory

import datetime
import os

def startask(path, yesterday):
    # Run every job file found directly under `path`
    for f in os.listdir(path):
        job_file = os.path.join(path, f)
        if os.path.isfile(job_file):
            # Run the datax command; the date is passed as the job parameter "yesterday"
            os.system('python D:\\datax\\bin\\datax.py -p "-Dyesterday=' + str(yesterday) + '" ' + job_file)

if __name__ == "__main__":
    today = datetime.date.today()
    # Yesterday's date
    yesterday = today - datetime.timedelta(days=1)
    startask('D:\\datax\\job\\', yesterday)
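
The `-Dyesterday=...` value only has an effect if a job file refers to it as `${yesterday}`; according to the help text of `datax.py`, parameters given with `-p` are filled into such placeholders. The sketch below imitates that substitution with Python's `string.Template` purely to show the result. It is not DataX's own code, and the `where` clause and the column name `gmt_create` are hypothetical.

```python
import datetime
from string import Template

# Hypothetical fragment of a reader's "parameter" section in a job file.
JOB_FRAGMENT = '"where": "gmt_create >= \'${yesterday}\'"'


def fill_placeholders(text, **values):
    """Replace ${name} placeholders, leaving unknown ones untouched."""
    return Template(text).safe_substitute(**values)


def yesterday_of(today):
    """Return the day before `today`, as the script above computes it."""
    return today - datetime.timedelta(days=1)


if __name__ == "__main__":
    day = yesterday_of(datetime.date(2024, 3, 1))
    print(fill_placeholders(JOB_FRAGMENT, yesterday=str(day)))
    # prints: "where": "gmt_create >= '2024-02-29'"
```

Because `str()` of a `datetime.date` yields the `YYYY-MM-DD` form, the value can be compared directly against date columns stored in that format.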

5. Write the batch script for the Windows scheduled task, then configure, test, and run the scheduled task.

@echo off
D:
cd D:\datax\bin
start python autoDataSync.py
exit

 

 
