bboss v7.1.3 released: a data collection ETL and stream-batch integrated computing framework that is efficient, stable, fast, and safe
The main highlight of this release: Clickhouse support is optimized in both the persistence layer and ETL, and a new client-side load balancing mechanism for Clickhouse is added.
bboss is an open source project released under the Apache License and maintained by the bboss open source team. It consists of the following three parts:
- Elasticsearch Highlevel Java Restclient: a high-performance, highly compatible Java client framework for Elasticsearch/Opensearch.
- Data collection and synchronization ETL: a powerful Java-based ETL tool for data collection. It provides a rich set of input and output plug-ins, and new plug-ins can easily be added by following the plug-in specification.
- Stream-batch integrated computing framework: a simple framework for flexible metric statistics and integrated stream and batch processing. It can be combined with the data collection and synchronization ETL tool to implement stream and batch computations, or used independently. Results can be saved to relational databases and to distributed data stores such as Elasticsearch and Clickhouse. It is particularly suited to enterprise data analysis and computing scenarios with small data volumes and scale. It is low-cost, quick to deliver results, and easy to operate and maintain, helping enterprises reduce costs and increase efficiency.
For the project source code, see: Source code download and build
Get started quickly: https://esdoc.bbossgroups.com/#/quickstart
v7.1.3 functional improvements
- Add a load balancing mechanism to the Clickhouse data source. This addresses the limitation that the Clickhouse-native-jdbc driver provides only failover (disaster recovery), not load balancing. Usage:
Append the b.balance and b.enableBalance parameters to the JDBC URL:
jdbc:clickhouse://101.13.6.4:29000,101.13.6.7:29000,101.13.6.6:29000/visualops?b.balance=roundbin&b.enableBalance=true
When b.enableBalance is true, load balancing is enabled in addition to the existing failover function. Otherwise, only failover is available.
b.balance specifies the load balancing algorithm. Two algorithms are currently supported: random (random selection, an unfair mechanism) and roundbin (round-robin polling, a fair mechanism). The default is random.
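To make the difference between the two b.balance strategies concrete, here is a minimal, self-contained sketch of random versus round-robin node selection over the hosts from the URL above. It is not the bboss implementation: the class `NodeSelector` and its methods are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; this is not the bboss load balancer.
public class NodeSelector {
    private final List<String> nodes;
    private final AtomicInteger cursor = new AtomicInteger();

    public NodeSelector(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    // "random": each call picks any node with equal probability,
    // so the load over a short run of calls may be uneven.
    public String nextRandom() {
        return nodes.get(ThreadLocalRandom.current().nextInt(nodes.size()));
    }

    // "roundbin": nodes are visited in order, so each node receives
    // the same share of calls. floorMod keeps the index valid even
    // after the counter overflows.
    public String nextRoundRobin() {
        int index = Math.floorMod(cursor.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        NodeSelector selector = new NodeSelector(
                List.of("101.13.6.4:29000", "101.13.6.7:29000", "101.13.6.6:29000"));
        for (int i = 0; i < 4; i++) {
            System.out.println(selector.nextRoundRobin());
        }
    }
}
```

With three nodes, the fourth round-robin call wraps around to the first node again, which is what makes the strategy fair over time.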
Alternatively, you can configure it on DBConf, for example:
DBConf tempConf = new DBConf();
tempConf.setPoolname(ds.getDbname());
tempConf.setDriver(ds.getDbdriver());
tempConf.setJdbcurl( ds.getDburl());
tempConf.setUsername(ds.getDbuser());
tempConf.setPassword(ds.getDbpassword());
tempConf.setValidationQuery(ds.getValidationQuery());
//tempConf.setTxIsolationLevel("READ_COMMITTED");
tempConf.setJndiName("jndi-"+ds.getDbname());
PropertiesContainer propertiesContainer = PropertiesUtil.getPropertiesContainer();
int initialConnections = propertiesContainer.getIntProperty("initialConnections",5);
tempConf.setInitialConnections(initialConnections);
int minimumSize = propertiesContainer.getIntProperty("minimumSize",5);
tempConf.setMinimumSize(minimumSize);
int maximumSize = propertiesContainer.getIntProperty("maximumSize",10);
tempConf.setMaximumSize(maximumSize);
tempConf.setUsepool(true);
tempConf.setExternal(false);
tempConf.setEncryptdbinfo(false);
boolean showsql = propertiesContainer.getBooleanProperty("showsql",true);
tempConf.setShowsql(showsql);
tempConf.setQueryfetchsize(null);
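// Enable load balancing across the Clickhouse nodes listed in the JDBC URL
tempConf.setEnableBalance(true);
// Select the random algorithm (the default)
tempConf.setBalance(DBConf.BALANCE_RANDOM);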
return SQLManager.startPool(tempConf);
Persistence layer use cases:
ETL DB output plug-in case (DB input plug-in is similar):
- Optimize the mechanism for stopping DB data sources, and fix the problem that the stop operation was not actually performed when a data source was stopped and its information removed.
- Fix compatibility issues with SQLite database creation statements.
- Fix compatibility issues with the Clickhouse-native-jdbc driver.
- Optimize the JVM exit mechanism: automatically closing the IOC container and related resources when the JVM exits is now disabled by default. Resources are closed and released automatically on JVM exit only when automatic shutdown is enabled. Otherwise, ShutdownUtil.shutdown() must be called manually to release them. To enable automatic resource release:
JVM command-line argument: -DenableShutdownHook=true; environment variable: enableShutdownHook=true; disabled by default: enableShutdownHook=false
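The switch described above can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not bboss source code: the class `ResourceCloser` and both of its methods are hypothetical, `releaseResources()` merely stands in for the manual ShutdownUtil.shutdown() call, and the rule that the system property takes precedence over the environment variable is an assumption.

```java
// Hypothetical sketch; ResourceCloser is not a bboss class.
public class ResourceCloser {

    // Resolve the switch. Assumption: the -D system property wins over
    // the environment variable; if neither is set, the hook stays off.
    public static boolean isHookEnabled(String systemProperty, String envVariable) {
        String value = systemProperty != null ? systemProperty : envVariable;
        return Boolean.parseBoolean(value); // parseBoolean(null) is false
    }

    // Stand-in for the manual release call (ShutdownUtil.shutdown() in bboss).
    public static void releaseResources() {
        System.out.println("resources released");
    }

    public static void main(String[] args) {
        boolean enabled = isHookEnabled(
                System.getProperty("enableShutdownHook"),
                System.getenv("enableShutdownHook"));
        if (enabled) {
            // Automatic release: runs when the JVM exits.
            Runtime.getRuntime().addShutdownHook(new Thread(ResourceCloser::releaseResources));
        } else {
            // Hook disabled: the application releases resources itself before exiting.
            releaseResources();
        }
    }
}
```

Running it with `java -DenableShutdownHook=true ResourceCloser` takes the hook branch; running it without the flag and without the environment variable takes the manual branch.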
- Improve the file sequence number rolling mechanism of the file output plug-in in scenarios such as Kafka, MySQL CDC, and MongoDB CDC.
- Improve persistence layer error logging: show a friendly message when a data source does not exist.
- Optimize Jackson's handling of LocalDateTime types: if the jackson-datatype-jsr310 module has not been added to the project, the exception raised while loading the LocalDateTime handling module is ignored.
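The general technique behind this kind of tolerant loading is to look up the optional class by name and carry on if it is missing. The sketch below shows that pattern in isolation. It is not the bboss code: `OptionalModuleLoader` and `tryLoad` are hypothetical names, and only the class name `com.fasterxml.jackson.datatype.jsr310.JavaTimeModule` comes from the jackson-datatype-jsr310 library itself.

```java
import java.util.Optional;

// Hypothetical sketch of optional-dependency loading; not bboss source code.
public class OptionalModuleLoader {

    // Try to instantiate a class by name through its no-argument constructor.
    // Returns an empty Optional when the class is absent or cannot be created.
    public static Optional<Object> tryLoad(String className) {
        try {
            return Optional.of(Class.forName(className).getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError e) {
            // Dependency not on the classpath (or unusable): ignore and continue.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<Object> module = tryLoad("com.fasterxml.jackson.datatype.jsr310.JavaTimeModule");
        System.out.println(module.isPresent()
                ? "jsr310 module available"
                : "jsr310 module not found, skipping");
    }
}
```

In a real application the returned object would be registered with the Jackson ObjectMapper when present. That step is left out here so the sketch compiles without Jackson on the classpath.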
- Optimize the event context reset mechanism for message-stream-based processing.
- Remove the legacy Maven coordinates that were kept for compatibility with older versions. The old coordinates map to the new ones as follows:
Old version coordinates | New version coordinates |
---|---|
bboss-elasticsearch-rest-file2ftp | bboss-datatran-fileftp |
bboss-elasticsearch-rest-file | bboss-datatran-fileftp |
bboss-elasticsearch-rest-hbase | bboss-datatran-hbase |
bboss-elasticsearch-rest-jdbc | bboss-datatran-jdbc |
bboss-elasticsearch-rest-kafka1x | bboss-datatran-kafka1x |
bboss-elasticsearch-rest-kafka2x | bboss-datatran-kafka2x |
bboss-elasticsearch-rest-mongodb | bboss-datatran-mongodb |
To migrate, replace each old coordinate with the corresponding new coordinate from the table above.
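For projects with many modules, the mapping above could be applied mechanically. The following is a minimal sketch of that idea, assuming the coordinates appear in pom.xml as `<artifactId>` elements without surrounding whitespace. The class `CoordinateMigrator` is hypothetical; only the artifact names are taken from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; only the artifact names come from the table above.
public class CoordinateMigrator {
    private static final Map<String, String> OLD_TO_NEW = new LinkedHashMap<>();
    static {
        OLD_TO_NEW.put("bboss-elasticsearch-rest-file2ftp", "bboss-datatran-fileftp");
        OLD_TO_NEW.put("bboss-elasticsearch-rest-file", "bboss-datatran-fileftp");
        OLD_TO_NEW.put("bboss-elasticsearch-rest-hbase", "bboss-datatran-hbase");
        OLD_TO_NEW.put("bboss-elasticsearch-rest-jdbc", "bboss-datatran-jdbc");
        OLD_TO_NEW.put("bboss-elasticsearch-rest-kafka1x", "bboss-datatran-kafka1x");
        OLD_TO_NEW.put("bboss-elasticsearch-rest-kafka2x", "bboss-datatran-kafka2x");
        OLD_TO_NEW.put("bboss-elasticsearch-rest-mongodb", "bboss-datatran-mongodb");
    }

    // Replace whole <artifactId> elements, so that the shorter name
    // "bboss-elasticsearch-rest-file" never matches inside
    // "bboss-elasticsearch-rest-file2ftp".
    public static String migrate(String pomXml) {
        String result = pomXml;
        for (Map.Entry<String, String> entry : OLD_TO_NEW.entrySet()) {
            result = result.replace(
                    "<artifactId>" + entry.getKey() + "</artifactId>",
                    "<artifactId>" + entry.getValue() + "</artifactId>");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(migrate("<artifactId>bboss-elasticsearch-rest-jdbc</artifactId>"));
    }
}
```

The sketch leaves `<version>` elements untouched, so the version would still need to be updated by hand.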
Import and use bboss
The latest bboss version number is given in section [ 1.1 Importing bboss maven coordinates into the project ] of the following document:
https://esdoc.bbossgroups.com/#/db-es-tool
bboss ETL plug-in usage guide
https://esdoc.bbossgroups.com/#/datatran-plugins
bboss detailed introduction document
https://esdoc.bbossgroups.com/#/README
bboss hands-on videos
Elasticsearch Bboss Stream ETL introduction video
Video tutorial: real-time collection of MySQL binlog insert, update, and delete data
bboss stream-batch integrated computing introductory tutorial
General database management tool - supports relational databases, Clickhouse, Doris, and other databases