This article tells you why TDengine data subscription is better than Kafka in time series scenarios

In TDengine 3.0, we further upgraded the stream computing and data subscription functions, helping users greatly simplify their data architecture and reduce overall operation and maintenance costs. The data subscription and consumption interface provided by TDengine is similar to that of message queue products. Its purpose is to let applications obtain data written to TDengine in real time, or process data in the order in which events arrive. Compared with other message queues, it provides greater flexibility, and it also effectively reduces both the amount of data transmitted and the complexity of applications.

In this article, TDengine developers walk through the process and implementation of TDengine data subscription in detail, as a reference for anyone who needs it. We previously summarized some important syntax rules in the article "About TDengine 3.0 Data Subscription, What You Need to Know". If you are studying the TDengine data subscription function, the two articles can be read together.

Classification of data subscriptions

TDengine supports several subscription types: column subscriptions (subscriptions to the result of a subquery), supertable subscriptions, and whole-database subscriptions. Supertable and database subscriptions support the parameter with meta. When this parameter is added, the subscription result also contains the meta information of the data, which is generally used for data synchronization and migration. The specific syntax is as follows:

  • Column subscription
CREATE TOPIC topic_name as subquery;
  • Database subscription
CREATE TOPIC topic_name as database db_name [with meta];
  • Supertable subscription
CREATE TOPIC topic_name as stable stb_name [with meta];
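
As a rough illustration, the three forms above can be assembled by a small helper. This is only a sketch: build_create_topic, its kind labels, the topic names, the table meters, and the database power are all hypothetical, and the statement shapes are copied directly from the syntax listed above rather than from a verified grammar.

```python
def build_create_topic(topic_name, kind, target, with_meta=False):
    """Return a CREATE TOPIC statement in one of the three forms listed above.

    kind is "query", "database", or "stable" (hypothetical labels).
    target is the subquery text, the database name, or the supertable name.
    """
    if kind == "query":
        if with_meta:
            # Per the article, only supertable and database topics take "with meta".
            raise ValueError("with meta is not supported for query topics")
        return f"CREATE TOPIC {topic_name} as {target};"
    if kind not in ("database", "stable"):
        raise ValueError(f"unknown topic kind: {kind}")
    suffix = " with meta" if with_meta else ""
    return f"CREATE TOPIC {topic_name} as {kind} {target}{suffix};"


# Example statements; every name here is made up for illustration.
print(build_create_topic("topic_over_limit", "query",
                         "SELECT ts, current FROM meters WHERE current > 10"))
print(build_create_topic("topic_db", "database", "power", with_meta=True))
print(build_create_topic("topic_stb", "stable", "meters"))
```

Only the query form lets the subscriber narrow the data to specific columns and rows; the other two forms subscribe to everything in the supertable or database.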

Comparison with Kafka

TDengine has always aimed to be simple and easy to use, so when we built the data subscription function, all of its APIs were modeled on Kafka's. Anyone who has studied TDengine's model in depth will find that its architecture corresponds to many of Kafka's designs: a topic is similar to a Kafka topic, a vnode is very close to a Kafka partition, and the table name of a subtable corresponds to the event key in Kafka. This architecture therefore naturally has the characteristics of a message queue, and it is precisely for this reason that TDengine could implement data subscription so easily.
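
The correspondence between subtable names and event keys can be pictured with a small sketch. It assumes a plain CRC32 hash modulo the number of slots, which is chosen here only because it is deterministic; it is not presented as the placement algorithm of either TDengine or Kafka, and pick_slot and the subtable names are hypothetical.

```python
import zlib


def pick_slot(key: str, slot_count: int) -> int:
    """Map a key to a slot with a stable hash (CRC32, an assumption of this sketch)."""
    return zlib.crc32(key.encode("utf-8")) % slot_count


VNODE_COUNT = 2  # plays the role of the partition count in Kafka
subtables = ["d1001", "d1002", "d1003", "d1004"]  # hypothetical subtable names

# A subtable name plays the role of an event key: the same name always
# maps to the same slot, so rows of one subtable stay together and in order.
placement = {name: pick_slot(name, VNODE_COUNT) for name in subtables}
for name, vnode in placement.items():
    print(f"subtable {name} -> vnode {vnode}")
```

The point of the analogy is that ordering is kept per key: all rows of one subtable are served from one vnode, just as all events with one key are served from one partition.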

The basic concepts of TDengine's data subscription function are the same as Kafka's, but the specific implementation differs. The implementation paths compare as follows:

(Figure: implementation comparison between TDengine data subscription and Kafka)

In time series data scenarios, TDengine reduces users' dependence on Kafka. Its vnodes allow different consumers to consume data at the same time, and users only need to subscribe to the part of the data they care about. For example, suppose you only care about current readings that exceed a limit. With a TDengine subscription, the total amount of data transmitted is very small. With a Kafka subscription, you may need to pull all the data from the server and then filter it on the client. In that case, the performance of the two is not even in the same order of magnitude.
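
The difference in transmitted volume can be illustrated with a toy comparison. This is a simulation in plain Python, not a benchmark of either system: the Row type, the generated readings, and the limit of 17.0 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    table: str
    ts: int
    current: float


def make_rows(n):
    """Generate deterministic fake readings; current cycles through 0..19."""
    return [Row(f"d{i % 4}", i, float(i % 20)) for i in range(n)]


def server_side_filter(rows, limit):
    """Subscription with a filter: only matching rows leave the server."""
    sent = [r for r in rows if r.current > limit]
    return sent, len(sent)


def client_side_filter(rows, limit):
    """Pull everything, then filter on the client."""
    sent = list(rows)  # every row crosses the network
    kept = [r for r in sent if r.current > limit]
    return kept, len(sent)


rows = make_rows(1000)
LIMIT = 17.0
kept_a, transferred_a = server_side_filter(rows, LIMIT)
kept_b, transferred_b = client_side_filter(rows, LIMIT)
print("rows the application keeps:", len(kept_a), len(kept_b))
print("rows transferred:", transferred_a, "vs", transferred_b)
```

Both paths hand the application the same rows; only the number of rows that cross the network differs, and the gap grows as the filter becomes more selective.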

TDengine data subscription key parameter description

(Figure: key parameters of TDengine data subscription)

Consumer sample code

(Figure: consumer sample code)

TDengine data subscription process

Client functions

  • Submit commits
  • Get the endpoint (EP)
  • Send heartbeats to stay alive
  • Consume data

(Figure: client-side functions)

The client-side processing logic within a single consumer thread is very simple, and no concurrency control over resources is needed.
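
The four client functions listed above can be sketched as a single-threaded loop. FakeServer is an in-memory stand-in written for this example and is not TDengine's real interface; the heartbeat is also folded into the loop for brevity, whereas a real client may well send it on its own timer.

```python
class FakeServer:
    """In-memory stand-in for the server side (hypothetical, for illustration)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.committed = 0   # last committed offset
        self.heartbeats = 0  # number of heartbeats received

    def get_endpoint(self, consumer_id):
        # Tell the consumer which vnodes it should read from.
        return {"consumer": consumer_id, "vnodes": [0]}

    def heartbeat(self, consumer_id):
        self.heartbeats += 1

    def poll(self, offset, max_rows):
        return self.rows[offset:offset + max_rows]

    def commit(self, offset):
        self.committed = offset


def run_consumer(server, consumer_id, batch_size=2):
    """Single-threaded loop: get endpoint, then heartbeat, poll, process, commit."""
    endpoint = server.get_endpoint(consumer_id)
    offset = server.committed  # resume from the last committed position
    processed = []
    while True:
        server.heartbeat(consumer_id)
        batch = server.poll(offset, batch_size)
        if not batch:
            break  # nothing left; a real consumer would keep waiting
        processed.extend(batch)  # "consume" the data
        offset += len(batch)
        server.commit(offset)    # record progress after processing
    return endpoint, processed


server = FakeServer(["r0", "r1", "r2", "r3", "r4"])
endpoint, processed = run_consumer(server, "c1")
print(endpoint, processed, server.committed)
```

Because one thread does everything in sequence, the loop needs no locks, which is the simplicity the paragraph above refers to.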

Server-side functions

  • Consumption allocation control (rebalance) (c1 represents consumer ID, g1 represents group ID)

(Figure: consumption allocation control (rebalance))

This function is driven by a timer that checks every 2 seconds whether a rebalance is required. After a rebalance, a consumer must obtain a new endpoint (EP) before it can consume normally. Otherwise, its consumer ID will not match and the request will be retried.
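
The rebalance behavior described above can be sketched as follows. This is a toy model under stated assumptions: the round-robin allocation, the Group class, and the vnode numbers are all hypothetical, and the timer is represented by calling on_timer by hand rather than by a real 2-second timer.

```python
def rebalance(vnodes, consumers):
    """Spread vnodes over consumers round-robin; returns {vnode: consumer_id}."""
    if not consumers:
        return {}
    return {v: consumers[i % len(consumers)] for i, v in enumerate(vnodes)}


class Group:
    """Toy consumer group (g1 in the article's notation)."""

    def __init__(self, group_id, vnodes):
        self.group_id = group_id
        self.vnodes = list(vnodes)
        self.consumers = []
        self.owner = {}  # vnode -> consumer ID currently allowed to read it
        self.needs_rebalance = False

    def join(self, consumer_id):
        self.consumers.append(consumer_id)
        self.needs_rebalance = True

    def on_timer(self):
        """What the periodic check (every 2 seconds in the article) would do."""
        if self.needs_rebalance:
            self.owner = rebalance(self.vnodes, self.consumers)
            self.needs_rebalance = False

    def endpoint(self, consumer_id):
        """Return the vnodes currently assigned to this consumer."""
        return [v for v, c in self.owner.items() if c == consumer_id]

    def poll(self, consumer_id, vnode):
        """A vnode serves only its current owner; others must refresh and retry."""
        return self.owner.get(vnode) == consumer_id


g1 = Group("g1", vnodes=[2, 3, 4])
g1.join("c1")
g1.on_timer()
stale = g1.endpoint("c1")       # c1 owns every vnode for now

g1.join("c2")
g1.on_timer()                   # membership changed, so vnodes are reallocated
rejected = [v for v in stale if not g1.poll("c1", v)]
fresh = g1.endpoint("c1")       # c1 must fetch the new endpoint before retrying
print(stale, rejected, fresh)
```

A consumer that keeps using its old endpoint is refused by the vnodes it no longer owns, which corresponds to the consumer ID mismatch and retry mentioned in the text.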

  • Consumption status control

(Figure: consumption status control)

  • Consumption progress control

(Figure: consumption progress control)

Conclusion

The advantages of TDengine's data subscription and stream computing functions are also reflected in real enterprise deployments. In Siemens' digital solution transformation project, TDengine helped SIMICAS® OEM 2.0 remove Flink, Kafka, and Redis, greatly simplifying the system architecture and saving operation and maintenance costs. During the data architecture transformation of Shiqiao Group's online goods platform and financial GPS system, deploying TDengine allowed an entire Redis cluster for last-known locations and an HBase cluster for trajectory queries to be taken offline.

If you are also facing data processing problems where performance and cost cannot both be satisfied, and you urgently need to upgrade your data architecture, you are welcome to add Little T on WeChat (ID: tdengine) to talk one-on-one with professional solution architects.

About TDengine

At its core, TDengine is a high-performance, clustered, open-source, cloud-native time series database (TSDB). It is specially designed and optimized for scenarios such as the Internet of Things, Industrial Internet, electric power, and IT operation and maintenance, and it scales elastically. It also has built-in caching, stream computing, data subscription, and other system functions, which can greatly reduce the complexity of system design and lower R&D and operating costs. It is a high-performance, distributed platform for Internet of Things and industrial big data. TDengine currently comes in two main editions: TDengine Enterprise, which supports private deployment, and TDengine Cloud, a fully managed cloud service platform for the Internet of Things and Industrial Internet. Both add enhancements on top of the open-source time series database TDengine OSS. Users can choose an edition based on their business volume and needs.


To learn more about the specific details of TDengine Database, you can view the relevant source code on GitHub.

Origin blog.csdn.net/taos_data/article/details/133028418