Next-generation distributed message queue: Apache Pulsar

About Pulsar

Apache Pulsar is an enterprise-grade distributed messaging system originally developed at Yahoo and open-sourced in 2016; at the time of writing it was incubating at the Apache Software Foundation. Yahoo has run Pulsar in production for more than three years, primarily serving Mail, Finance, Sports, Flickr, the Gemini Ads platform, and Sherpa (Yahoo's KV store).
Pulsar can be called a next-generation message queue mainly because of the following characteristics:

  • Linear scalability. It can scale out smoothly to hundreds of nodes. (Scaling out Kafka consumes a lot of system resources because data must be copied between nodes; Pulsar does not need to do this.)

  • High throughput. Proven in Yahoo's production environment at millions of messages per second.

  • Low latency. Maintains low latency (under 5 ms) even under massive message volumes.

  • Persistence. Pulsar's persistence is built on Apache BookKeeper, which provides I/O isolation between writes and reads.

  • Geo-replication. Pulsar supports replication across regions/availability zones as a first-class feature. Users simply configure the regions, and messages are continuously replicated to the other regions. When a region suffers a network partition or goes down, Pulsar keeps retrying and resumes replication once the region is reachable again.

  • Flexible deployment. Pulsar runs on bare metal and also supports container platforms such as Docker and Kubernetes (K8S) as well as various cloud vendors; for local development, a single command starts the entire environment.

  • Multiple consumption modes. A topic supports exclusive, shared, and failover consumption.
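The geo-replication behavior in the list above (keep replicating, retry through partitions) can be illustrated with a toy model. This is a minimal sketch, not Pulsar's replicator: the `Region` and `Replicator` classes and the region names are hypothetical, and everything runs in memory. It shows that the local write succeeds on its own, while each remote region has its own backlog that is drained once that region is reachable again.

```python
from collections import deque


class Region:
    """Hypothetical region (cluster) holding an ordered list of messages."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.messages = []


class Replicator:
    """Forwards messages from the local region to one remote region.

    Undelivered messages stay in a backlog and are retried later,
    so replication resumes after a partition heals.
    """

    def __init__(self, remote):
        self.remote = remote
        self.backlog = deque()

    def enqueue(self, msg):
        self.backlog.append(msg)

    def try_flush(self):
        # Deliver in order; stop while the remote is unreachable and retry later.
        while self.backlog and self.remote.available:
            self.remote.messages.append(self.backlog.popleft())


def publish(local, replicators, msg):
    local.messages.append(msg)  # the local write does not wait for remote regions
    for r in replicators:
        r.enqueue(msg)
        r.try_flush()


us, eu = Region("us-east"), Region("eu-west")
to_eu = Replicator(eu)
publish(us, [to_eu], "m1")
eu.available = False   # network partition
publish(us, [to_eu], "m2")
publish(us, [to_eu], "m3")
eu.available = True    # partition heals
to_eu.try_flush()      # the retry delivers the backlog
print(eu.messages)     # ['m1', 'm2', 'm3']
```

Keeping one backlog per remote region means a slow or partitioned region never blocks publishing locally or replication to the other regions.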

Architecture Overview

At the top level, a Pulsar instance consists of one or more clusters, and the clusters within an instance can replicate data among themselves. A Pulsar deployment generally has the following components:

  • Broker: handles messages sent by producers and dispatches them to consumers. It coordinates through a global ZooKeeper cluster to handle various tasks, such as geo-replication. Messages are stored in BookKeeper, and each cluster also needs its own ZooKeeper cluster to store some metadata.

  • BookKeeper cluster: consists of multiple bookies and is used to persist messages.

  • ZooKeeper cluster

Broker

In Kafka and RocketMQ, the broker stores both the message data and the consumers' consumption offsets. Pulsar's broker is different: it is a stateless node responsible for three things:

  • A REST interface exposing administrative commands and lookups, such as finding which broker currently owns a topic.

  • A TCP server that speaks a binary protocol for asynchronous communication between nodes; the protocol is currently built on Google's open-source Protocol Buffers.

  • To support geo-replication, the broker republishes messages published in its own cluster to the other regions.

Messages are first written to BookKeeper and then cached in the broker's local memory, so reads are generally served from memory. As for the topic owner mentioned above: because a BookKeeper ledger allows only a single writer, each topic is owned by exactly one broker at a time, and the REST interface can be called to find a topic's current owner.
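The write path just described (persist to BookKeeper first, then cache in broker memory, serve reads from memory when possible) can be sketched in plain Python. `FakeBookKeeper`, `Broker`, and the tiny `cache_size` are hypothetical stand-ins for illustration only; a real broker talks to bookies over the network and manages its cache very differently.

```python
from collections import OrderedDict


class FakeBookKeeper:
    """Hypothetical stand-in for durable storage, keyed by (ledger_id, entry_id)."""

    def __init__(self):
        self.entries = {}
        self.reads = 0  # counts reads that had to go to storage

    def add_entry(self, ledger_id, entry_id, data):
        self.entries[(ledger_id, entry_id)] = data

    def read_entry(self, ledger_id, entry_id):
        self.reads += 1
        return self.entries[(ledger_id, entry_id)]


class Broker:
    """Toy broker: persist first, then cache the entry in local memory."""

    def __init__(self, storage, cache_size=2):
        self.storage = storage
        self.cache = OrderedDict()  # insertion order doubles as age for eviction
        self.cache_size = cache_size
        self.ledger_id = 1
        self.next_entry_id = 0

    def publish(self, data):
        key = (self.ledger_id, self.next_entry_id)
        # 1. Write to storage first, so the message is durable before it is cached.
        self.storage.add_entry(*key, data)
        self.next_entry_id += 1
        # 2. Cache it so consumers reading near the tail are served from memory.
        self.cache[key] = data
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the oldest cached entry
        return key

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.storage.read_entry(*key)  # cache miss: fall back to storage


bk = FakeBookKeeper()
broker = Broker(bk, cache_size=2)
keys = [broker.publish(m) for m in ["a", "b", "c"]]
print(broker.read(keys[2]))  # c  (still cached, no storage read)
print(broker.read(keys[0]))  # a  (evicted from the cache, read from storage)
print(bk.reads)              # 1
```

Because storage is the source of truth, losing the cache (for example, when the broker restarts) costs only performance, which is what allows the broker itself to stay stateless.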

BookKeeper

BookKeeper is a horizontally scalable, fault-tolerant, low-latency distributed storage service. Its basic unit is the record (entry), which is essentially a byte array, and a sequence of records is called a ledger. BookKeeper replicates each record to multiple storage nodes called bookies, which provides higher availability and fault tolerance. BookKeeper was designed from the start to handle many kinds of failure: bookies may crash, lose data, or hold corrupted data, but as long as enough healthy bookies remain in the cluster, the service as a whole behaves correctly.
In Pulsar, each topic partition is composed of several ledgers. A ledger is an append-only data structure that allows only a single writer, and each record in a ledger is replicated to multiple bookies. Once a ledger is closed (for example, because the broker went down or the ledger reached a certain size), it only supports reads; when its data is no longer needed (for example, all consumers have consumed the messages in it), the ledger is deleted.
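The ledger lifecycle described above (single writer, append-only, replicated, read-only after closing, deleted once consumed) can be modeled in memory. This is a hypothetical sketch: `Ledger`, `TopicPartition`, the bookie and broker names, and the tiny `max_entries` rollover size are illustrative assumptions. For simplicity every entry is copied to every bookie; real BookKeeper lets you choose the ensemble, write-quorum, and ack-quorum sizes.

```python
class Ledger:
    """Toy append-only ledger with a single writer, replicated to every bookie."""

    def __init__(self, ledger_id, bookies, writer):
        self.ledger_id = ledger_id
        self.bookies = list(bookies)
        self.writer = writer
        self.replicas = {b: [] for b in self.bookies}  # bookie name -> stored entries
        self.closed = False

    def __len__(self):
        return len(self.replicas[self.bookies[0]])

    def append(self, writer, data):
        if writer != self.writer:
            raise PermissionError("a ledger allows only a single writer")
        if self.closed:
            raise RuntimeError("a closed ledger is read-only")
        for b in self.bookies:  # copy the entry to every bookie
            self.replicas[b].append(data)
        return len(self) - 1  # id of the new entry

    def read(self, entry_id):
        return self.replicas[self.bookies[0]][entry_id]

    def close(self):
        self.closed = True


class TopicPartition:
    """A partition stored as a sequence of ledgers, rolled over at max_entries."""

    def __init__(self, owner, bookies, max_entries=2):
        self.owner = owner
        self.bookies = bookies
        self.max_entries = max_entries
        self.ledgers = []
        self._next_ledger_id = 0
        self._roll_over()

    def _roll_over(self):
        if self.ledgers:
            self.ledgers[-1].close()  # the old ledger becomes read-only
        self.ledgers.append(Ledger(self._next_ledger_id, self.bookies, self.owner))
        self._next_ledger_id += 1

    def publish(self, data):
        if len(self.ledgers[-1]) >= self.max_entries:
            self._roll_over()
        return self.ledgers[-1].append(self.owner, data)

    def delete_consumed(self, consumed_ledger_ids):
        # Drop closed ledgers whose messages every consumer has already consumed.
        self.ledgers = [led for led in self.ledgers
                        if not (led.closed and led.ledger_id in consumed_ledger_ids)]


part = TopicPartition("broker-1", ["bookie-1", "bookie-2", "bookie-3"])
for m in ["m1", "m2", "m3", "m4", "m5"]:
    part.publish(m)
print([(led.ledger_id, led.closed, len(led)) for led in part.ledgers])
# [(0, True, 2), (1, True, 2), (2, False, 1)]
part.delete_consumed({0})
print([led.ledger_id for led in part.ledgers])  # [1, 2]
```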

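BookKeeper's main advantage is that it guarantees read consistency of ledgers in the presence of failures. Because a ledger can be written by only one writer at a time, there is no contention, so BookKeeper can implement writes more efficiently. When a broker crashes and restarts, Pulsar starts a recovery operation: it reads the last written ledger from ZooKeeper and reads that ledger's last committed record, after which all consumers are guaranteed to see the same content.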
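The recovery step above can be illustrated with a deliberately simplified model: treat an entry as committed once at least `ack_quorum` bookies hold it, and define the last committed entry as the highest id such that it and every earlier entry are committed. This is an assumption for illustration, not BookKeeper's actual recovery protocol (which also fences the ledger and may re-replicate entries). The `metadata` dict standing in for ZooKeeper and the functions `last_add_confirmed` and `recover_ledger` are hypothetical.

```python
def last_add_confirmed(replicas, ack_quorum):
    """Highest entry id such that it and every earlier entry are stored on at
    least ack_quorum bookies; -1 if even entry 0 is not confirmed.

    replicas maps a bookie name to the set of entry ids that bookie holds.
    """
    if ack_quorum < 1:
        raise ValueError("ack_quorum must be at least 1")
    entry_id = 0
    while sum(entry_id in held for held in replicas.values()) >= ack_quorum:
        entry_id += 1
    return entry_id - 1


def recover_ledger(metadata, replicas_by_ledger, ack_quorum):
    """Toy recovery after a broker crash (simplified, hypothetical model)."""
    ledger_id = metadata["ledgers"][-1]  # last ledger recorded in the metadata store
    lac = last_add_confirmed(replicas_by_ledger[ledger_id], ack_quorum)
    metadata["closed_at"][ledger_id] = lac  # seal the ledger at one agreed end point
    return ledger_id, lac


metadata = {"ledgers": [7, 8], "closed_at": {7: 41}}  # stands in for ZooKeeper
replicas_by_ledger = {
    8: {"bookie-1": {0, 1, 2}, "bookie-2": {0, 1}, "bookie-3": {0}},
}
print(recover_ledger(metadata, replicas_by_ledger, ack_quorum=2))  # (8, 1)
```

Entry 2 reached only one bookie, so it is not counted as committed; because the ledger is sealed at entry 1 in shared metadata, every consumer that reads ledger 8 afterwards sees exactly entries 0 and 1.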

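Before version 0.8, Kafka stored consumption progress (offsets) in ZooKeeper. But ZooKeeper is essentially a centralized service built on a single log. Simply put, ZooKeeper's write performance does not grow linearly as you add nodes; it actually decreases, because more nodes mean every log entry must be synchronized to more nodes. Its QPS is therefore bounded by single-machine performance, so after version 0.8 Kafka moved consumption progress into a Kafka topic. RocketMQ's early versions were similar, with several different implementations such as ZooKeeper and a database; the current version stores it on the local file system. Pulsar follows an idea similar to Kafka's: it also stores consumption progress in BookKeeper ledgers.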
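Storing consumption progress in a ledger means each acknowledgment becomes an append to a log rather than an in-place update in a central store, and the current position after a restart is simply the latest entry. The sketch below assumes one such log per subscription; `CursorLedger`, the subscription name, and the JSON encoding are hypothetical choices for illustration, not Pulsar's on-disk format.

```python
import json


class CursorLedger:
    """Toy cursor storage: one append-only log per subscription.

    Each acknowledgment appends the new position as a serialized entry, so the
    write is a log append rather than an in-place update in a central store.
    """

    def __init__(self, subscription):
        self.subscription = subscription
        self.entries = []  # stands in for a BookKeeper ledger of byte arrays

    def mark_consumed(self, ledger_id, entry_id):
        record = {"ledger_id": ledger_id, "entry_id": entry_id}
        self.entries.append(json.dumps(record).encode("utf-8"))

    def recover(self):
        """After a restart, the latest entry is the current position."""
        if not self.entries:
            return None
        record = json.loads(self.entries[-1].decode("utf-8"))
        return record["ledger_id"], record["entry_id"]


cursor = CursorLedger("sub-a")
cursor.mark_consumed(3, 10)
cursor.mark_consumed(3, 11)
cursor.mark_consumed(4, 0)
print(cursor.recover())  # (4, 0)
```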

Metadata

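Pulsar's metadata is stored mainly in ZooKeeper. For example, configuration related to the different regions lives in the global ZooKeeper, while each cluster's own ZooKeeper stores information such as which ledgers a topic's data has been written to and some of the brokers' current metrics data.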

Pulsar Core Concepts

Topic

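The core concept in a publish-subscribe system is the topic. Simply put, a topic can be understood as a pipe: producers put messages into one end, and consumers read messages from the other end, with multiple consumers able to read from the pipe at the same time.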
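Each topic can be divided into multiple partitions, and different partitions of the same topic contain different messages. Each message is assigned a unique offset when it is added to a partition, and messages within a partition are ordered. A client can therefore hash a key such as the user ID modulo the number of partitions so that all of a user's messages go to the same partition, which avoids race conditions to some extent.
Partitioning spreads large volumes of messages across different nodes for high throughput. By default, Pulsar topics are non-partitioned, but topics with a given number of partitions can be created through the CLI or the API.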
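Per-partition offsets and key-based partition selection can be shown with a small in-memory model. `PartitionedTopic` and the key names are hypothetical. One detail matters in Python: the built-in `hash()` of a `str` is randomized per process, so the sketch uses CRC32 from `zlib` as a stable hash; otherwise the same user could land on different partitions after a restart.

```python
import zlib
from collections import defaultdict


class PartitionedTopic:
    """Toy partitioned topic: per-partition offsets, key-based partition choice."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)  # index -> [(offset, key, value), ...]

    def partition_for(self, key):
        # Stable hash modulo the partition count: same key, same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def send(self, key, value):
        p = self.partition_for(key)
        offset = len(self.partitions[p])  # unique and increasing within partition p
        self.partitions[p].append((offset, key, value))
        return p, offset


topic = PartitionedTopic(num_partitions=4)
for i in range(3):
    topic.send("user-42", f"event-{i}")
topic.send("user-7", "event-x")

p = topic.partition_for("user-42")
print([v for _, k, v in topic.partitions[p] if k == "user-42"])
# ['event-0', 'event-1', 'event-2']
```

Ordering is only guaranteed within one partition, which is exactly why routing all of a user's messages to a single partition preserves that user's order.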

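By default, Pulsar automatically balances producers and consumers, but sometimes clients want to route messages according to their own business rules. Pulsar supports the following routing modes out of the box: single partition, round robin, hash, and custom (implementing the relevant interface to define your own routing rule).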
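The four routing modes can be sketched as small strategy objects that all expose the same `choose(key, num_partitions)` method. These classes and the "vip-" rule are hypothetical illustrations of the ideas, not the Pulsar client API; in the Java client, custom routing is done by implementing the `MessageRouter` interface, whose exact signature you should check in the client documentation.

```python
import itertools
import zlib
from typing import Callable, Optional


class SinglePartitionRouter:
    """Send every message to one fixed partition."""

    def __init__(self, partition: int):
        self.partition = partition

    def choose(self, key: Optional[str], num_partitions: int) -> int:
        return self.partition % num_partitions


class RoundRobinRouter:
    """Spread messages evenly by cycling through the partitions."""

    def __init__(self):
        self._counter = itertools.count()

    def choose(self, key: Optional[str], num_partitions: int) -> int:
        return next(self._counter) % num_partitions


class HashRouter:
    """Same key -> same partition via a stable hash (CRC32); requires a key."""

    def choose(self, key: Optional[str], num_partitions: int) -> int:
        return zlib.crc32(key.encode("utf-8")) % num_partitions


class CustomRouter:
    """Delegate to a user-supplied function implementing a business rule."""

    def __init__(self, fn: Callable[[Optional[str], int], int]):
        self.fn = fn

    def choose(self, key: Optional[str], num_partitions: int) -> int:
        return self.fn(key, num_partitions)


rr = RoundRobinRouter()
print([rr.choose(None, 3) for _ in range(5)])  # [0, 1, 2, 0, 1]

# Hypothetical business rule: "vip-" users get partition 0, everyone else the last one.
vip_first = CustomRouter(lambda key, n: 0 if key.startswith("vip-") else n - 1)
print(vip_first.choose("vip-alice", 4), vip_first.choose("bob", 4))  # 0 3
```

Sharing one method signature lets the producer swap routing strategies without changing the code that sends messages.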

Consumption Modes

The consumption mode determines how messages are dispatched to consumers. Pulsar supports several different modes: exclusive, shared, and failover.

  • Exclusive: only one consumer can consume from the topic (more precisely, attach to a given subscription). This is Pulsar's default mode.

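  • Shared: also called round-robin mode. Multiple consumers can connect to the same topic, and messages are distributed among them in turn. When a consumer crashes or disconnects, the messages that were sent to it but not yet acknowledged are rescheduled and delivered to the other consumers.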

  • Failover: multiple consumers can connect to the same topic. They are sorted lexicographically by consumer name, and the first one, called the master, starts receiving messages. When the master disconnects, all unacknowledged and remaining messages are delivered to the next consumer in line.
    Pulsar also provides a Reader interface that lets you pass in a message ID to start reading from; for example, MessageId.earliest starts reading from the earliest available message.
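The dispatch rules of the three modes can be modeled in a few dozen lines. `Subscription` and the consumer names are hypothetical, and the model is simplified: it follows the rules in the list above (one consumer for exclusive, round robin with redelivery of unacknowledged messages for shared, lexicographically first consumer as master for failover), while a real broker applies further rules, such as per-partition master assignment in failover mode.

```python
class Subscription:
    """Toy dispatcher for the exclusive, shared and failover modes."""

    def __init__(self, mode):
        if mode not in ("exclusive", "shared", "failover"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.consumers = []   # connected consumer names, in connection order
        self.unacked = {}     # consumer name -> delivered but unacknowledged messages
        self.backlog = []     # messages waiting because no consumer is connected
        self._next = 0        # round-robin position for shared mode

    def subscribe(self, name):
        if self.mode == "exclusive" and self.consumers:
            raise RuntimeError("exclusive subscription already has a consumer")
        self.consumers.append(name)
        self.unacked[name] = []
        pending, self.backlog = self.backlog, []
        for msg in pending:
            self.deliver(msg)

    def _pick(self):
        if self.mode == "shared":
            name = self.consumers[self._next % len(self.consumers)]
            self._next += 1
            return name
        if self.mode == "failover":
            return min(self.consumers)  # lexicographically first name is the master
        return self.consumers[0]        # exclusive: the single consumer

    def deliver(self, msg):
        if not self.consumers:
            self.backlog.append(msg)
            return None
        name = self._pick()
        self.unacked[name].append(msg)
        return name

    def ack(self, name, msg):
        self.unacked[name].remove(msg)

    def disconnect(self, name):
        self.consumers.remove(name)
        for msg in self.unacked.pop(name):  # redeliver what was never acknowledged
            self.deliver(msg)


shared = Subscription("shared")
shared.subscribe("c1")
shared.subscribe("c2")
for m in ["m1", "m2", "m3", "m4"]:
    shared.deliver(m)
shared.ack("c1", "m1")
shared.disconnect("c1")   # m3 was unacknowledged, so it moves to c2
print(shared.unacked)     # {'c2': ['m2', 'm4', 'm3']}

failover = Subscription("failover")
failover.subscribe("consumer-b")
failover.subscribe("consumer-a")
for m in ["m1", "m2"]:
    failover.deliver(m)   # both go to the master, consumer-a
failover.disconnect("consumer-a")
print(failover.unacked)   # {'consumer-b': ['m1', 'm2']}
```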

Summary

As a next-generation distributed message queue, Pulsar has many attractive properties and makes up for some shortcomings of competing products, for example through geo-replication, multi-tenancy, scalability, and read/write isolation.

Source: blog.csdn.net/u013411339/article/details/91488583