New release: Apache InLong (incubating) enters the 1.0 era

Apache InLong (Yinglong) is a one-stop massive data integration platform that provides automatic, secure, reliable, and high-performance data transmission. It supports both batch and streaming workloads, making it easier for businesses to build streaming-based data analysis, modeling, and applications. InLong covers collection, aggregation, caching, and sorting in the big data field. With only simple configuration, users can import data from a data source into a real-time computing engine or offline storage.

1. Introduction to Apache InLong (incubating) 

Apache InLong (Yinglong) is a one-stop massive data integration framework donated by Tencent to the Apache community. It provides automatic, secure, reliable, and high-performance data transmission to help businesses build streaming-based data analysis, modeling, and applications. The project was formerly known as TubeMQ and focused on high-performance, low-cost message queuing. To make fuller use of the ecosystem around TubeMQ, we upgraded the project to InLong, with the goal of building a one-stop massive data integration framework.

Apache InLong is based on TDBank, which is used internally at Tencent. Building on data access and processing capabilities proven at the scale of trillions of records, Apache InLong integrates the entire process of data collection, aggregation, storage, sorting, and processing. It is easy to use, flexible to extend, stable, and reliable.

 

 

Apache InLong serves the entire data life cycle, from collection to storage, and provides a dedicated module for each stage:

  • inlong-agent, the data collection agent, reads regular logs from specified directories or files and reports them record by record. DB collection and HTTP reporting will be added in the future.
  • inlong-dataproxy, a proxy component based on Flume-ng, supports blocking data transmission and retransmission from disk, and can forward received data to different MQs (message queues).
  • inlong-tubemq, a message queue service developed by Tencent, focuses on high-performance storage and transmission of massive data in big data scenarios. Its core advantages are its track record in large-scale production use and its low cost.
  • inlong-sort performs ETL processing on data consumed from different MQs, then aggregates the results and writes them to storage systems such as Hive, ClickHouse, HBase, and Iceberg.
  • inlong-manager provides complete data service management and control capabilities, including metadata, task flow, permissions, and OpenAPI.
  • inlong-audit provides data auditing that is independent of the data flow and covers the entire pipeline.
  • inlong-dashboard, the front-end page for managing data access, simplifies use of the entire InLong management and control platform.
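
To make the division of labor among these modules more concrete, here is a minimal, purely illustrative Python sketch of the data path (agent, DataProxy, message queue, sort). None of the function names are InLong APIs; they are hypothetical stand-ins, the record format `user,action` is made up, and an in-memory queue takes the place of TubeMQ or Pulsar.

```python
from collections import defaultdict, deque


def agent_collect(lines):
    """Stand-in for inlong-agent: report non-empty log lines one record at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def dataproxy_forward(records, queues, topic):
    """Stand-in for inlong-dataproxy: forward each record to an MQ topic."""
    for record in records:
        queues[topic].append(record)


def sort_etl(queues, topic):
    """Stand-in for inlong-sort: consume records, parse them, group rows by sink table."""
    tables = defaultdict(list)
    while queues[topic]:
        user, action = queues[topic].popleft().split(",")
        tables[action].append({"user": user, "action": action})
    return dict(tables)


def run_pipeline(lines, topic="demo_topic"):
    # The in-memory deque is an assumption that replaces a real message queue.
    queues = defaultdict(deque)
    dataproxy_forward(agent_collect(lines), queues, topic)
    return sort_etl(queues, topic)


result = run_pipeline(["alice,login\n", "\n", "bob,login\n", "alice,logout\n"])
```

In this sketch each stage only knows about its immediate neighbor, which mirrors why a message queue can be swapped without touching the collection or sorting logic.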

 

 

Before version 1.0 (that is, from 0.9.0 to 0.12.0), InLong focused on getting the basic data pipelines working end to end and on building supporting capabilities.

For the basic pipelines, data links based on two message queues, TubeMQ and Apache Pulsar, have been completed. TubeMQ serves low-cost, high-performance scenarios, and Pulsar serves high-consistency, high-performance scenarios.

For supporting capabilities, the deployment steps of each module have been simplified, and standalone, Docker Compose, and Kubernetes deployments have been added. Metrics have been built for each module, with monitoring metrics enriched across multiple dimensions. Full-link data auditing has also been completed, so that the location of data in the pipeline is clearly visible.

In subsequent versions, InLong will first add plug-in support so that new collection sources and storage destinations can be added quickly. It will then add data flow management, including heartbeat status and data flow start and stop. At the same time, it will strengthen the stability and performance of the whole link and add batch data collection and multi-cluster management.

 

2. Main features of Apache InLong (incubating) version 1.0.0

The just-released 1.0.0-incubating version closed 124+ issues, including 8 major features and 36 improvements. The main changes are described below.

InLong Sort supports single-tenant sorting

In version 1.0.0, Sort adds single-tenant sorting, which allows a single collection flow to start its own Flink task. This lays the groundwork for later data flow state management.

InLong Sort supports Flink version 1.13.5

Community members have previously asked for a Flink upgrade so that Flink SQL can be used in InLong. In version 1.0.0, Sort has been upgraded to Flink 1.13.5, which makes it easier for Sort to add new sinks and to connect to public cloud scenarios.

InLong Sort supports Standalone mode

Sort performs ETL processing on the data in MQ. Initially, Sort was available only as a Flink application. Although this let it use Flink's powerful real-time processing capabilities, it also raised the deployment requirements of the InLong project: users needed a Flink cluster to run InLong.

Starting from version 1.0.0, InLong introduced the Sort Standalone module to support data sorting in non-Flink scenarios.

Instrumentation and display of end-to-end audit data

The previous version of InLong introduced the data audit module, but instrumentation of the components and display of the audit data were not finished, so the audit service could not be fully used.

In version 1.0.0, InLong Audit not only optimizes the audit API and disaster recovery scenarios, but also completes the instrumentation of all components and the display of audit data, so that the audit module can be deployed and used.
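
The idea of an audit that runs independently of the data flow can be illustrated with a small sketch: each component reports how many records it handled in a time window, and a reconciliation step compares the counts of adjacent stages. The stage names, the `(window, stage, count)` event layout, and the `reconcile` function are assumptions made for illustration and are not the InLong Audit API.

```python
from collections import Counter

# Hypothetical pipeline stages, listed in the order data flows through them.
STAGES = ["agent", "dataproxy", "sort"]


def count_by_stage(audit_events):
    """Sum record counts per (window, stage) from (window, stage, count) tuples."""
    totals = Counter()
    for window, stage, count in audit_events:
        totals[(window, stage)] += count
    return totals


def reconcile(audit_events):
    """Return (window, upstream, downstream, missing) for each stage pair that lost records."""
    totals = count_by_stage(audit_events)
    windows = sorted({window for window, _ in totals})
    gaps = []
    for window in windows:
        for upstream, downstream in zip(STAGES, STAGES[1:]):
            missing = totals[(window, upstream)] - totals[(window, downstream)]
            if missing > 0:
                gaps.append((window, upstream, downstream, missing))
    return gaps


events = [
    ("10:00", "agent", 100), ("10:00", "dataproxy", 100), ("10:00", "sort", 97),
    ("10:01", "agent", 50), ("10:01", "dataproxy", 50), ("10:01", "sort", 50),
]
gaps = reconcile(events)
```

Because the counts are reported separately from the records themselves, a gap between two stages points to where in the pipeline the data went missing.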

 

Supports authenticated access to Apache Pulsar

Previous versions of InLong supported data links based on Apache Pulsar. In real deployments, however, Pulsar clusters usually have authentication enabled. Version 1.0.0 adds access to Apache Pulsar clusters that require authentication.

DataProxy supports HTTP/UDP protocol

To make it easier for users to extend InLong's collection side directly through the DataProxy SDK, version 1.0.0 opens up HTTP and UDP protocol support in DataProxy, in addition to the original TCP protocol.

Agent DB collection supports SQL collection

Database collection is a very common scenario in data integration. InLong has begun to fill in this capability, with the aim of supporting mainstream relational databases in both incremental and full collection scenarios.

Version 1.0.0 first implements collection of MySQL data through SQL. Collection from other databases and from Binlog will be completed in subsequent versions.
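
Full and incremental collection through SQL can be sketched as a query that remembers the last primary key it has read. The example below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere. The `orders` table, the `id` offset column, and the `collect_since` helper are hypothetical and do not reflect the InLong Agent's actual configuration or implementation.

```python
import sqlite3


def collect_since(conn, last_id, batch_size=100):
    """Fetch rows with id > last_id and return (rows, new_offset)."""
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    # Keep the old offset when nothing new was found.
    new_offset = rows[-1][0] if rows else last_id
    return rows, new_offset


# In-memory SQLite database standing in for a MySQL source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (item) VALUES (?)", [("book",), ("pen",)])

# Full collection: start from offset 0 and read everything.
full_rows, offset = collect_since(conn, last_id=0)

# Incremental collection: only rows added after the saved offset.
conn.execute("INSERT INTO orders (item) VALUES (?)", ("lamp",))
new_rows, offset = collect_since(conn, last_id=offset)
```

A polling approach like this only sees inserts with a growing key; capturing updates and deletes is the kind of case that Binlog-based collection addresses.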

Other features and bug fixes

For details, please refer to the release notes (linked at the end of the article), which list the features, enhancements, and bug fixes of this version, as well as the individual contributors.

 

3. Apache InLong (incubating) follow-up planning

In subsequent versions, we will further strengthen InLong's basic capabilities, add more data sources and targets, and cover more usage scenarios, including:

  • Plug-in capability
  • New Iceberg, ClickHouse, and Kafka data flows
  • New collection from relational database Binlog and from Kafka

4. Apache InLong (incubating) Contributor Recruitment

Apache InLong (incubating) currently has 84 contributors. The project is still in the early stage of incubation, and there are many open to-do items, including feature development, community operation, and document translation. We look forward to more open source enthusiasts joining InLong and working together to make it an Apache top-level project.

Below is the timeline of the InLong project:

  • On December 22, 2021, version 0.12.0 was released
  • On November 5, 2021, version 0.11.0 was released
  • On September 3, 2021, version 0.10.0 was released
  • On July 12, 2021, voting began on 0.9.0, the first version after the name change
  • On April 11, 2021, the community name change to Apache InLong was completed
  • On February 11, 2021, the application for the community name change was initiated
  • On December 20, 2020, the project name change was discussed and put to a vote
  • On May 30, 2020, the first community version was released in accordance with the Apache community specification
  • On November 3, 2019, the project entered Apache community incubation
  • On September 12, 2019, TubeMQ was open sourced and donated to the Apache community

 

Attached:

Apache InLong project official website

https://inlong.apache.org

 

Apache InLong GitHub address

https://github.com/apache/incubator-inlong

 

Apache InLong release history

https://github.com/apache/incubator-inlong/blob/master/CHANGES.md

 

For more information, follow the Tencent Big Data official account.
