In-depth understanding of Apache Pulsar's tiered storage

1. The role of tiered storage

Pulsar lets users keep topic backlogs of any size. But if every message stays in BookKeeper, the BookKeeper cluster has to be expanded continuously with more nodes, and the system then rebalances the data automatically, which is very costly.

So Pulsar has the concept of tiered storage, which moves older historical messages to long-term storage such as HDFS. A topic's message log is made up of segments. Only the last segment, the one currently being written, still changes; every earlier segment has been sealed and will not change. Copying those historical segments to HDFS therefore does not break data integrity. Once a copy is complete, the data pointer in the message log metadata can be updated immediately, and the copy stored in BookKeeper can then be deleted.
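
The copy, repoint, delete sequence can be sketched with a small in-memory model. This is only an illustration of the idea, not Pulsar's implementation: `Segment`, `TopicLog`, `offload_sealed`, and the two dictionaries standing in for BookKeeper and HDFS are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    segment_id: int
    sealed: bool                   # True once the segment is closed and immutable
    location: str = "bookkeeper"   # where the metadata pointer currently points


@dataclass
class TopicLog:
    segments: list
    bookkeeper: dict = field(default_factory=dict)  # segment_id -> messages
    long_term: dict = field(default_factory=dict)   # stand-in for HDFS

    def offload_sealed(self):
        """Offload every sealed segment that still lives in BookKeeper."""
        moved = []
        for seg in self.segments:
            if not seg.sealed or seg.location != "bookkeeper":
                continue  # the open segment is still changing, so it stays put
            # 1. Copy the immutable segment to long-term storage.
            self.long_term[seg.segment_id] = list(self.bookkeeper[seg.segment_id])
            # 2. Update the metadata pointer to the new location.
            seg.location = "long_term"
            # 3. Only now delete the BookKeeper copy.
            del self.bookkeeper[seg.segment_id]
            moved.append(seg.segment_id)
        return moved

    def read(self, segment_id):
        """Readers follow the metadata pointer and never pick a store themselves."""
        seg = next(s for s in self.segments if s.segment_id == segment_id)
        store = self.bookkeeper if seg.location == "bookkeeper" else self.long_term
        return store[segment_id]


log = TopicLog(
    segments=[Segment(0, sealed=True), Segment(1, sealed=True), Segment(2, sealed=False)],
    bookkeeper={0: ["m0", "m1"], 1: ["m2"], 2: ["m3"]},
)
moved = log.offload_sealed()
```

Because the pointer is switched before the BookKeeper copy is removed, a reader calling `log.read(0)` always finds a complete copy of the segment, whichever store currently holds it.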

Whether a message is stored in BookKeeper or in tiered storage is transparent to users.

2. Enable tiered storage

Modify broker.conf on every server in the Pulsar cluster: configure the offload address and path, and enable automatic offloading.
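
As a rough sketch, the relevant broker.conf entries for an HDFS target might look like the following. It assumes the filesystem offloader is used; the HDFS address, the profile path, and the 10 GB threshold are placeholder values, and key names or defaults may differ between Pulsar versions, so check them against the documentation for your release.

```properties
# Use the filesystem offloader, which can write to HDFS
managedLedgerOffloadDriver=filesystem
# Directory where the offloader packages are installed
offloadersDirectory=offloaders
# Offload address: the HDFS URI (placeholder value)
fileSystemURI=hdfs://127.0.0.1:9000
# Offload path: the Hadoop profile file used by the offloader (placeholder value)
fileSystemProfilePath=conf/filesystem_offload_core_site.xml
# Enable automatic offloading once a topic holds more than about 10 GB
# in BookKeeper; a negative value turns the automatic trigger off
managedLedgerOffloadAutoTriggerSizeThresholdBytes=10737418240
```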

For details, refer to the tiered storage documentation on the official Pulsar website.


Origin blog.csdn.net/yy8623977/article/details/125254022