Apache Pulsar's separation of computing and storage

First, let's look at the cause of the expansion problem, which stems from Kafka's own architecture. Kafka uses the partition as its unit of reads and writes, and each partition is bound to a specific node; this binding is recorded in the metadata store. As a result, once a bottleneck appears in the computing layer (CPU or network card) or in the storage layer (disk utilization), there is no way to make other nodes take over the pressure. Solving this problem would require major changes to Kafka's architecture.
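The coupling described above can be sketched with a toy model. This is not Kafka's API: `KafkaLikeBroker`, `move_partition`, and the sizes are hypothetical names and numbers chosen only to show that relieving a node means moving the partition's data with it.

```python
from dataclasses import dataclass, field


@dataclass
class KafkaLikeBroker:
    """Toy broker (hypothetical) that serves a partition AND stores its data."""
    name: str
    # partition id -> bytes stored on this broker's local disk
    partitions: dict = field(default_factory=dict)


def move_partition(src, dst, partition_id):
    """Shift load off src by handing the partition to dst.

    Because serving and storage are bound together, the whole partition
    has to be copied. Returns the number of bytes that must be moved.
    """
    size = src.partitions.pop(partition_id)
    dst.partitions[partition_id] = size
    return size


hot = KafkaLikeBroker("broker-1", {"orders-0": 500_000_000_000})
idle = KafkaLikeBroker("broker-2")

copied = move_partition(hot, idle, "orders-0")
print(f"bytes copied to relieve {hot.name}: {copied}")
```

In this simplified picture, the cost of shifting pressure grows with the amount of data the partition already holds, which is why the bottleneck cannot be handed to another node quickly.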

From an architectural point of view, my personal understanding is that the solution has two parts: separation of computing and storage, plus storage segmentation. Apache Pulsar does both very well. Let's take a brief look at what Pulsar does.
Separation of computing and storage: this makes it possible to shift computing pressure quickly. Compute nodes and storage nodes are separate. Compute nodes are stateless and are only responsible for the computing logic, so when a compute node becomes a bottleneck, the computing layer can quickly be expanded horizontally.
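A minimal sketch of why stateless compute nodes are cheap to rebalance, under the same toy assumptions as before. `StorageLayer`, `StatelessBroker`, and `reassign_topic` are hypothetical names, not Pulsar classes; the point is only that moving ownership of a topic copies no data.

```python
from dataclasses import dataclass, field


@dataclass
class StorageLayer:
    """Toy shared storage (hypothetical): topic -> list of messages."""
    data: dict = field(default_factory=dict)


@dataclass
class StatelessBroker:
    """Toy compute node (hypothetical): it only records which topics it serves."""
    name: str
    owned_topics: set = field(default_factory=set)

    def read(self, storage, topic):
        # The broker holds no messages itself; it reads from the storage layer.
        if topic not in self.owned_topics:
            raise PermissionError(f"{self.name} does not own {topic}")
        return list(storage.data[topic])


def reassign_topic(topic, src, dst):
    """Move ownership from src to dst. Returns the bytes copied, which is 0."""
    src.owned_topics.discard(topic)
    dst.owned_topics.add(topic)
    return 0


storage = StorageLayer({"orders": ["m1", "m2", "m3"]})
busy = StatelessBroker("broker-1", {"orders"})
new = StatelessBroker("broker-2")

copied = reassign_topic("orders", busy, new)
messages = new.read(storage, "orders")
print(copied, messages)
```

The newly added broker can serve the topic immediately because the messages never lived on the old broker in the first place.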

Storage segmentation: this mainly makes it possible to shift IO pressure on the storage layer quickly. Pulsar uses BookKeeper as its storage layer. At the actual storage level, Pulsar splits each logical partition into multiple segments for management and storage. If a storage machine becomes a bottleneck, it is enough to stop writing to the segment on that machine and open a new segment on another machine.
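The segment idea can be illustrated with a small sketch. It is a deliberate simplification and not the BookKeeper API: `Segment`, `SegmentedPartition`, and `roll_over` are hypothetical names, and each segment is placed on a single machine here, whereas a real deployment would also replicate segments across several storage nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment (hypothetical): a run of entries stored on one machine."""
    bookie: str
    entries: list = field(default_factory=list)
    sealed: bool = False


class SegmentedPartition:
    """Toy logical partition made of an ordered list of segments."""

    def __init__(self, first_bookie):
        self.segments = [Segment(first_bookie)]

    def append(self, entry):
        # Writes always go to the last (open) segment.
        self.segments[-1].entries.append(entry)

    def roll_over(self, new_bookie):
        # Seal the current segment and open a new one on another machine.
        # Existing data stays where it is; only new writes move.
        self.segments[-1].sealed = True
        self.segments.append(Segment(new_bookie))

    def read_all(self):
        # Readers see one continuous log across all segments.
        return [e for s in self.segments for e in s.entries]


partition = SegmentedPartition("bookie-1")
partition.append("m1")
partition.append("m2")

# bookie-1 becomes an IO bottleneck: send new writes elsewhere.
partition.roll_over("bookie-2")
partition.append("m3")

placement = [(s.bookie, s.sealed, s.entries) for s in partition.segments]
print(placement)
print(partition.read_all())
```

Nothing already written has to be copied when the write path moves, so the overloaded machine stops receiving new writes right away while older entries remain readable from it.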

To sum up, when a Pulsar cluster runs into a bottleneck like the one described above for Kafka, expanding it is more elegant and convenient. This advantage comes from the architecture itself.

Origin blog.csdn.net/qq798280904/article/details/130454413