Data acquisition ETL tool Elasticsearch-datatran v6.5.5 released

The data acquisition ETL tool Elasticsearch-datatran v6.5.5 has been released.

v6.5.5 improvements

  1. The bboss official website has been redesigned and relaunched; you are welcome to try it: https://www.bbossgroups.com

  2. Optimized the data synchronization mechanism: the tran logic is now reused across plug-ins

  3. Optimized ftp/sftp file download locking, greatly improving the performance of the file collection plug-in

  4. Added a parallel download mechanism for ftp/sftp files. The number of parallel download threads is set through setDownloadWorkThreads; the default is 3, and a value of 0 means serial download

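    FtpConfig ftpConfig = new FtpConfig().setFtpIP("10.13.6.127").setFtpPort(21)
                 .setFtpUser("ecsftp").setFtpPassword("ecsftp").setDownloadWorkThreads(4) // use 4 threads to download files in parallel, so up to 4 files can be downloaded at the same time
                 .setRemoteFileDir("xcm").setRemoteFileValidate(new RemoteFileValidate() {
                     /**
                      * Validates the legitimacy and integrity of a data file.
                      *
                      * @param validateContext wraps the information about the data file to validate:
                      *     dataFile   the temporary data file to validate; its name can be used to derive the names of the matching md5 signature file, record count audit file, etc.
                      *     remoteFile the ftp/sftp path of the data file, from which the directory holding the md5 signature file and record count audit file can be computed
                      *     ftpContext the ftp configuration context object
                      *     The md5 signature file and record count audit file can then be downloaded through remoteFileAction and used to validate the data file
                      *     redownload marks whether this validation was triggered by a re-download after a failed validation: true means validation after a re-download, false means validation of the first download
                      * @return a Result carrying one of the following validation codes:
                      *     file content validated successfully
                      *     RemoteFileValidate.FILE_VALIDATE_OK = 1;
                      *     validation failed, file is not processed
                      *     RemoteFileValidate.FILE_VALIDATE_FAILED = 2;
                      *     file content validation failed, back up the downloaded file
                      *     RemoteFileValidate.FILE_VALIDATE_FAILED_BACKUP = 3;
                      *     file content validation failed, delete the downloaded file
                      *     RemoteFileValidate.FILE_VALIDATE_FAILED_DELETE = 5;
                      */
                     public Result validateFile(ValidateContext validateContext) {
    //                        if(redownload)
    //                            return Result.default_ok;
    ////                        return Result.default_ok;
    //                        Result result = new Result();
    //                        result.setValidateResult(RemoteFileValidate.FILE_VALIDATE_FAILED_REDOWNLOAD);
    //                        result.setRedownloadCounts(3);
    //                        result.setMessage("MD5 validation of "+remoteFile+" failed, retrying 3 times");// set the reason for the validation failure
    //                        // compute the md5 file path from the remoteFile information and download it; verify the signature once the download completes
    //                        //remoteFileAction.downloadFile("remoteFile.md5","dataFile.md5");
    //                        return result;
                         return Result.default_ok;
                     }
                 });

  5. Improved the monitoring metric statistics for data synchronization jobs and tasks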

  6. Added an example of writing synchronized data to Redis in batch and serial modes

  7. Added a remote data file verification mechanism that supports md5 signature verification, record count verification, and re-downloading files that fail verification (the number of re-download attempts is configurable)

  8. Improved data processing: the getValue method of the Context interface can now return the field values of parsed log file records

  9. The data source password is now masked in the job startup log
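
The file verification mechanism from item 7 can be pictured with a small, library-independent sketch. It is not the bboss implementation: the class `Md5RetrySketch`, the method `downloadVerified`, and the use of a `Supplier` to stand in for the ftp/sftp download are all assumptions made for illustration. It computes an MD5 digest with the JDK and re-downloads a bounded number of times when the digest does not match.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not part of the bboss API. */
class Md5RetrySketch {

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /**
     * Calls the download at most 1 + maxRedownloads times and returns the first
     * payload whose MD5 matches expectedMd5, or null if every attempt fails.
     */
    static byte[] downloadVerified(Supplier<byte[]> download, String expectedMd5, int maxRedownloads) {
        for (int attempt = 0; attempt <= maxRedownloads; attempt++) {
            byte[] data = download.get();
            if (md5Hex(data).equalsIgnoreCase(expectedMd5)) {
                return data;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] good = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] corrupted = "hellx".getBytes(StandardCharsets.UTF_8);
        String expected = md5Hex(good);
        int[] calls = {0};
        // Simulated download: the first attempt is corrupted, the second is intact.
        byte[] result = downloadVerified(() -> calls[0]++ == 0 ? corrupted : good, expected, 3);
        System.out.println("verified=" + (result != null) + ", attempts=" + calls[0]);
    }
}
```

In the real plug-in the retry count is carried on the validation result (see `setRedownloadCounts` in the commented-out code above); the loop here only mirrors that idea.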

Elasticsearch-datatran Features

Elasticsearch-datatran is an open source data acquisition and synchronization ETL tool from bboss. It provides data acquisition, data cleaning, transformation, and data storage functions. It supports massive data collection and synchronization between multiple data sources such as Elasticsearch, relational databases (mysql, oracle, db2, sqlserver, Dameng, etc.), Mongodb, HBase, Hive, Kafka, text files, excel files, and SFTP/FTP. It supports real-time incremental collection and full collection of data, splitting data records by field, and using multi-level file path information to write the data of different files into different database tables.

It also supports custom processing of collected data: you can process the collected data and deliver it to the destination according to your own requirements. If you need to save the data to a specific place in a custom way, you can implement the CustomOutPut interface.

What sets Elasticsearch-datatran apart is that its data synchronization jobs are developed in Java. The tool is compact, and a job can use everything that Java and existing component frameworks provide to process massive existing data and real-time incremental data freely. The memory, worker threads, and thread queue sizes used by data collection and synchronization jobs can be configured and tuned according to data scale and synchronization performance requirements. Jobs can run standalone or be embedded in Java-based applications and run together with them. With the job task monitoring api and the job start and stop api, you can easily build an ETL management tool of your own.
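
To make the "worker threads and thread queue sizes" idea concrete, here is a minimal sketch built only on the JDK's `ThreadPoolExecutor`. It does not use the Elasticsearch-datatran configuration API; `WorkerPoolSketch`, `newPool`, and `writeBatches` are hypothetical names, and the "write" is just a counter. A bounded queue plus `CallerRunsPolicy` is one common way to apply backpressure when the producer outpaces the workers.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of tuning worker threads and queue size; not the bboss API. */
class WorkerPoolSketch {

    /** Fixed number of workers with a bounded queue; when the queue is full the submitting thread runs the task. */
    static ThreadPoolExecutor newPool(int workerThreads, int queueSize) {
        return new ThreadPoolExecutor(
                workerThreads, workerThreads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits the given number of batches and returns the total number of records "written". */
    static int writeBatches(int batches, int batchSize, int workerThreads, int queueSize) {
        ThreadPoolExecutor pool = newPool(workerThreads, queueSize);
        AtomicInteger written = new AtomicInteger();
        for (int i = 0; i < batches; i++) {
            // Stand-in for writing one batch of records to the target data source.
            pool.execute(() -> written.addAndGet(batchSize));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("batches did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return written.get();
    }

    public static void main(String[] args) {
        System.out.println("records written: " + writeBatches(10, 100, 2, 4));
    }
}
```

Raising the worker count increases write concurrency, while the queue size caps how many pending batches are held in memory at once.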

If open source tools such as logstash, flume, and filebeat cannot meet your complex, massive data processing scenarios, Elasticsearch-datatran is a good choice.

Elasticsearch version compatibility: supports data migration between different Elasticsearch versions (1.x, 2.x, 5.x, 6.x, 7.x, 8.x+)

Full-featured file data collection plug-in: supports downloading various files from ftp/sftp in parallel, collecting and processing various file data in parallel
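
As a rough picture of parallel versus serial file download, the sketch below uses a fixed JDK thread pool and treats a thread count of 0 as "download serially", mirroring the setDownloadWorkThreads semantics described in the release notes. It is an assumption-laden illustration, not the plug-in's code: `ParallelDownloadSketch`, `downloadAll`, and the `Function` that stands in for a real ftp/sftp transfer are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical illustration of parallel file download; not the bboss implementation. */
class ParallelDownloadSketch {

    /**
     * Downloads every remote file and returns the local paths in input order.
     * workThreads <= 0 downloads one file at a time on the calling thread.
     */
    static List<String> downloadAll(List<String> remoteFiles,
                                    Function<String, String> downloadOne,
                                    int workThreads) {
        List<String> localPaths = new ArrayList<>();
        if (workThreads <= 0) {
            for (String file : remoteFiles) {
                localPaths.add(downloadOne.apply(file));
            }
            return localPaths;
        }
        ExecutorService pool = Executors.newFixedThreadPool(workThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String file : remoteFiles) {
                Callable<String> task = () -> downloadOne.apply(file);
                futures.add(pool.submit(task));
            }
            for (Future<String> future : futures) {
                localPaths.add(future.get()); // waits for each download to finish
            }
            return localPaths;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("download failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.txt", "b.txt", "c.txt");
        // Stand-in for a real transfer: just maps a remote name to a local path.
        System.out.println(downloadAll(files, f -> "local/" + f, 4));
    }
}
```

At most `workThreads` files are in flight at any time, which is the same upper bound the release notes describe for the plug-in.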

Importing bboss

A regular project can import the following maven coordinates:

        <dependency>
            <groupId>com.bbossgroups.plugins</groupId>
            <artifactId>bboss-elasticsearch-rest-jdbc</artifactId>
            <version>6.5.5</version>
        </dependency>

For a spring boot project, you also need to import the following maven coordinates:

        <dependency>
            <groupId>com.bbossgroups.plugins</groupId>
            <artifactId>bboss-elasticsearch-spring-boot-starter</artifactId>
            <version>6.5.5</version>
        </dependency>

Origin www.oschina.net/news/189887