ETL & stream-batch integrated framework bboss v7.1.3 released

bboss v7.1.3, a data collection ETL and stream-batch integrated computing framework, has been released: efficient, stable, fast, and safe.

The main highlight of this version: Clickhouse support has been optimized in both the persistence layer and the ETL layer, and a new Clickhouse client load balancing mechanism has been added.

bboss is an open source project released under the Apache License and maintained by the bboss open source team. It consists of the following three parts:

  • Elasticsearch Highlevel Java Restclient, a high-performance, highly compatible Elasticsearch/OpenSearch Java client framework.

  • Data collection synchronization ETL, a powerful Java-based ETL tool for data collection. It provides a wealth of input and output plug-ins, and new input and output plug-ins can easily be added based on the plug-in specification.

  • The stream-batch integrated computing framework, a simple framework for flexible data indicator statistics and stream-batch integrated processing. It can be combined with the data collection and synchronization ETL tool to implement stream processing and batch processing calculations, or it can be used independently. Calculation results can be saved to various relational databases and to distributed data stores such as Elasticsearch and Clickhouse. It is especially suitable for enterprise-level data analysis and computing scenarios with small data volume and scale. It is low cost, delivers results quickly, and is easy to operate and maintain, helping enterprises reduce costs and increase efficiency.

Project source code: see "Source code download and build"

Get started quickly: https://esdoc.bbossgroups.com/#/quickstart

v7.1.3 functional improvements

  1. Add a load balancing mechanism to the Clickhouse data source, to address the fact that the Clickhouse-native-jdbc driver provides only disaster recovery (failover) and no load balancing. Usage:

    Append the b.balance and b.enableBalance parameters to the JDBC URL:

    jdbc:clickhouse://101.13.6.4:29000,101.13.6.7:29000,101.13.6.6:29000/visualops?b.balance=roundbin&b.enableBalance=true

    When b.enableBalance is true, the load balancing mechanism is enabled in addition to the original disaster recovery function; otherwise, only the disaster recovery function is available.

    b.balance specifies the load balancing algorithm. Two algorithms are currently supported: random (random selection, an unfair mechanism) and roundbin (polling, a fair mechanism). The default is random.
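To make the difference between the two algorithms concrete, here is a minimal Java sketch of how a client might pick a node from the URL shown above. It is an illustration only, not bboss's internal implementation: the class name BalanceSketch, its methods, and the choice to always return the first node when b.enableBalance is not true are all assumptions made for this example, and node failures are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; BalanceSketch is not part of bboss.
class BalanceSketch {
    private final List<String> nodes;
    private final boolean enabled;     // value of b.enableBalance (assumed false when absent)
    private final boolean roundRobin;  // true when b.balance=roundbin, otherwise random
    private final AtomicInteger counter = new AtomicInteger();

    BalanceSketch(String jdbcUrl) {
        this.nodes = parseNodes(jdbcUrl);
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("No nodes in URL: " + jdbcUrl);
        }
        this.enabled = Boolean.parseBoolean(param(jdbcUrl, "b.enableBalance", "false"));
        this.roundRobin = "roundbin".equals(param(jdbcUrl, "b.balance", "random"));
    }

    // Extracts the comma-separated host:port list from a jdbc:clickhouse:// URL.
    static List<String> parseNodes(String jdbcUrl) {
        String prefix = "jdbc:clickhouse://";
        if (!jdbcUrl.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a Clickhouse JDBC URL: " + jdbcUrl);
        }
        String rest = jdbcUrl.substring(prefix.length());
        int end = rest.length();
        for (char stop : new char[] {'/', '?'}) {
            int i = rest.indexOf(stop);
            if (i >= 0 && i < end) {
                end = i;
            }
        }
        List<String> result = new ArrayList<>();
        for (String node : rest.substring(0, end).split(",")) {
            if (!node.trim().isEmpty()) {
                result.add(node.trim());
            }
        }
        return result;
    }

    // Reads a query parameter such as b.balance, or returns the fallback.
    static String param(String jdbcUrl, String name, String fallback) {
        int q = jdbcUrl.indexOf('?');
        if (q < 0) {
            return fallback;
        }
        for (String pair : jdbcUrl.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return pair.substring(eq + 1);
            }
        }
        return fallback;
    }

    // Picks the node for the next connection.
    String nextNode() {
        if (!enabled) {
            return nodes.get(0); // assumption: without balancing, stay on the first node
        }
        int index = roundRobin
                ? Math.floorMod(counter.getAndIncrement(), nodes.size())
                : ThreadLocalRandom.current().nextInt(nodes.size());
        return nodes.get(index);
    }
}
```

With b.balance=roundbin, successive calls to nextNode() walk the node list in order and wrap around, so every node receives the same share of connections. With random, each call is an independent draw, so the shares are only equal on average.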

Load balancing can also be configured on DBConf, for example:

DBConf tempConf = new DBConf();
tempConf.setPoolname(ds.getDbname());
tempConf.setDriver(ds.getDbdriver());
tempConf.setJdbcurl( ds.getDburl());
tempConf.setUsername(ds.getDbuser());
tempConf.setPassword(ds.getDbpassword());
tempConf.setValidationQuery(ds.getValidationQuery());
//tempConf.setTxIsolationLevel("READ_COMMITTED");
tempConf.setJndiName("jndi-"+ds.getDbname());
PropertiesContainer propertiesContainer = PropertiesUtil.getPropertiesContainer();
int initialConnections = propertiesContainer.getIntProperty("initialConnections",5);
tempConf.setInitialConnections(initialConnections);
int minimumSize = propertiesContainer.getIntProperty("minimumSize",5);
tempConf.setMinimumSize(minimumSize);
int maximumSize = propertiesContainer.getIntProperty("maximumSize",10);
tempConf.setMaximumSize(maximumSize);
tempConf.setUsepool(true);
tempConf.setExternal(false);
tempConf.setEncryptdbinfo(false);
boolean showsql = propertiesContainer.getBooleanProperty("showsql",true);
tempConf.setShowsql(showsql);
tempConf.setQueryfetchsize(null);
tempConf.setEnableBalance(true);
tempConf.setBalance(DBConf.BALANCE_RANDOM);
return SQLManager.startPool(tempConf);

Persistence layer example:

https://gitee.com/bboss/bestpractice/blob/master/persistent/src/com/frameworkset/sqlexecutor/TestClickHouseDB.java

ETL DB output plug-in example (the DB input plug-in is similar):

https://gitee.com/bboss/bboss-datatran-demo/blob/main/src/main/java/org/frameworkset/elasticsearch/imp/clickhouse/Db2Clickhousedemo.java

  2. Optimize the mechanism for stopping DB data sources: fix the problem that the stop operation was not performed when a data source was stopped and its information removed.

  3. Handle compatibility issues with SQLite database creation statements.

  4. Fix and handle Clickhouse-native-jdbc driver compatibility issues.

  5. Optimize the JVM exit mechanism: automatic closing of the IOC container and related resources on JVM exit is now disabled by default. Resources are closed and released automatically on JVM exit only when the automatic shutdown switch is enabled; otherwise, the ShutdownUtil.shutdown() method must be called manually to release resources. To enable automatic resource release:

    JVM command-line parameter:
        -DenableShutdownHook=true
    Environment variable:
        enableShutdownHook=true
    Default (disabled):
        enableShutdownHook=false

  6. Improve the file sequence number rolling mechanism of the file output plug-in in scenarios such as Kafka, MySQL CDC, and MongoDB CDC.

  7. Improve the persistence layer error log: give a friendly message when the data source does not exist.

  8. Optimize Jackson's handling of LocalDateTime types: if the jackson-datatype-jsr310 plug-in has not been added to the project, the exception raised when loading the LocalDateTime handling plug-in is ignored.

  9. Optimize the event context reset mechanism for message-stream-based processing.

  10. Remove the Maven coordinates that were kept for compatibility with older versions. The old coordinates map to the new ones as follows:

Old version coordinates              New version coordinates
bboss-elasticsearch-rest-file2ftp    bboss-datatran-fileftp
bboss-elasticsearch-rest-file        bboss-datatran-fileftp
bboss-elasticsearch-rest-hbase       bboss-datatran-hbase
bboss-elasticsearch-rest-jdbc        bboss-datatran-jdbc
bboss-elasticsearch-rest-kafka1x     bboss-datatran-kafka1x
bboss-elasticsearch-rest-kafka2x     bboss-datatran-kafka2x
bboss-elasticsearch-rest-mongodb     bboss-datatran-mongodb

To migrate, replace each old coordinate with the corresponding new coordinate listed above.
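The enableShutdownHook switch described above can be pictured with a short Java sketch. This is a hedged illustration of the general pattern, not bboss code: the class ShutdownSwitchSketch and its methods are hypothetical, and the precedence used here (JVM system property first, then environment variable) is an assumption. Only the switch name enableShutdownHook, its default of false, and the existence of ShutdownUtil.shutdown() come from the release notes.

```java
// Hypothetical sketch; ShutdownSwitchSketch and its methods are not bboss APIs.
class ShutdownSwitchSketch {

    // Resolves the switch value. Assumption: the system property wins over the
    // environment variable, and a missing value means "disabled".
    static boolean isHookEnabled(String systemProperty, String envValue) {
        String value = systemProperty != null ? systemProperty : envValue;
        return Boolean.parseBoolean(value);
    }

    // Registers the cleanup as a JVM shutdown hook only when the switch is on.
    // Returns whether the hook was registered.
    static boolean registerIfEnabled(Runnable cleanup) {
        boolean enabled = isHookEnabled(
                System.getProperty("enableShutdownHook"),
                System.getenv("enableShutdownHook"));
        if (enabled) {
            Runtime.getRuntime().addShutdownHook(new Thread(cleanup, "resource-cleanup"));
        }
        return enabled;
    }

    public static void main(String[] args) {
        Runnable cleanup = () -> System.out.println("releasing resources");
        if (!registerIfEnabled(cleanup)) {
            // Switch is off: release resources explicitly before exiting,
            // which is the role a manual ShutdownUtil.shutdown() call plays in bboss.
            cleanup.run();
        }
    }
}
```

Running the sketch with -DenableShutdownHook=true defers the cleanup to JVM exit; running it without the switch performs the cleanup explicitly, mirroring the manual release path.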

 

Import and use bboss

The latest bboss version number can be found in section [ 1.1 Importing bboss maven coordinates into the project ] of the following document:

https://esdoc.bbossgroups.com/#/db-es-tool

bboss ETL plug-in usage guide

https://esdoc.bbossgroups.com/#/datatran-plugins

bboss detailed introduction document

https://esdoc.bbossgroups.com/#/README

bboss hands-on videos

Elasticsearch Bboss Stream ETL introduction video

Video tutorial: real-time collection of MySQL binlog insert, delete, and update data

bboss stream-batch integrated computing introductory tutorial

Exporting Elasticsearch data to Excel files, splitting the files by record count so that no single exported file becomes too large

General database management tool: supports relational databases, Clickhouse, Doris, and other databases

https://doc.bbossgroups.com/#/tools


Origin www.oschina.net/news/274695/bboss-7-1-3-released