Apache Pulsar is an all-in-one messaging and streaming platform. Messages can be consumed and acknowledged individually, or as a stream with a latency of less than 10 milliseconds. Its layered architecture allows rapid scaling across hundreds of nodes without data reorganization.
Its features include multi-tenancy with resource separation and access control, geo-replication across regions, tiered storage, and official clients in six languages. It supports up to a million unique topics and is designed to simplify your application architecture.
Pulsar is an Apache Software Foundation top 10 project with a vibrant and enthusiastic community and a user base ranging from small businesses to large enterprises.
Official website: https://pulsar.apache.org/
theory
Apache Pulsar, a top-level project of the Apache Software Foundation, is a next-generation cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight function computing. Its architecture separates computing from storage, and it supports multi-tenancy, persistent storage, and cross-region replication. As a streaming data store it offers strong consistency, high throughput, low latency, and high scalability.
Pulsar was born at Yahoo in 2012. The original goal was to consolidate the other messaging systems in use within Yahoo into a single, logically unified messaging platform that supports large clusters and spans regions. The messaging systems available at the time (including Kafka) could not meet Yahoo's needs, such as multi-tenancy on large clusters, stable and reliable I/O quality of service, millions of topics, and cross-region replication, so Pulsar came into being.
The key features of Pulsar are as follows:
● A single Pulsar instance natively supports multiple clusters and can seamlessly replicate messages between clusters across data centers.
● Extremely low publishing latency and end-to-end latency
● Can seamlessly expand to more than one million topics
● Simple client API, supports Java, Go, Python and C++
● Supports multiple topic subscription modes (exclusive subscription, shared subscription, failover subscription)
● Guaranteed message delivery through the persistent message storage mechanism provided by Apache BookKeeper
● Stream-native data processing is realized by the lightweight serverless computing framework Pulsar Functions.
● Pulsar IO, a serverless connector framework based on Pulsar Functions, makes it easier to move data into and out of Apache Pulsar.
● Tiered storage can offload data from hot storage to cold/long-term storage (such as S3, GCS) when the data becomes stale.
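The three subscription modes in the list above differ in how messages are spread over the consumers attached to one subscription. The following is a toy simulation in plain Python, not Pulsar code: the function name `dispatch`, the round-robin policy for shared mode, and the rule that the first consumer in the list is the active one are all assumptions made for illustration.

```python
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Toy model: decide which consumer of one subscription gets each message."""
    delivered = {name: [] for name in consumers}
    if mode == "exclusive":
        # Only a single consumer may be attached to the subscription.
        if len(consumers) != 1:
            raise ValueError("exclusive subscriptions allow exactly one consumer")
        delivered[consumers[0]].extend(messages)
    elif mode == "failover":
        # The first consumer is treated as active; the rest are standbys
        # that would take over only if the active one disconnected.
        delivered[consumers[0]].extend(messages)
    elif mode == "shared":
        # Messages are spread over all consumers (round-robin in this sketch).
        for message, name in zip(messages, cycle(consumers)):
            delivered[name].append(message)
    else:
        raise ValueError(f"unknown subscription mode: {mode}")
    return delivered
```

With two consumers, shared mode alternates messages between them, failover mode gives everything to the first one, and exclusive mode refuses the second consumer.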
concept
The official website introduces the following concepts:
Producer The source of a message and also its publisher, responsible for sending messages to a topic.
Consumer The receiver of a message, responsible for subscribing to a topic and consuming messages from it.
Topic The carrier of message data. In Pulsar, a topic can be divided into multiple partitions. If no partition count is set, the topic is served as a single, non-partitioned topic.
Broker A stateless component that is mainly responsible for receiving messages sent by producers and delivering them to consumers.
BookKeeper A distributed write-ahead log system that provides storage services for messaging systems such as Pulsar, and provides cross-machine replication for multiple data centers.
Bookie An Apache BookKeeper server that provides persistence for messages.
Cluster A set of brokers and bookies together with the ZooKeeper service that coordinates them. One or more clusters make up a Pulsar instance.
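To make the idea of a partitioned topic concrete, the sketch below lists the internal topics behind a partitioned topic and picks a partition for a message key. The `-partition-N` suffix is how Pulsar names partitions; the CRC32 hash is only a stand-in chosen for this example (the real clients use their own hash functions), and both function names are hypothetical.

```python
import zlib


def partition_names(topic, num_partitions):
    """Names of the internal topics that back a partitioned topic."""
    return [f"{topic}-partition-{i}" for i in range(num_partitions)]


def choose_partition(key, num_partitions):
    """Map a message key to a partition index.

    CRC32 is an assumption for illustration, not the hash Pulsar clients use.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

The point of key-based routing is that the same key always lands on the same partition, so messages for one key keep their relative order.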
cloud native architecture
Apache Pulsar adopts an architecture that separates computing from storage: stored data is not coupled to the computing logic, which enables independent scaling of each layer and fast recovery. With the rise of cloud native, this separation appears more and more frequently in various systems. Pulsar's Broker layer is a stateless computing layer that is mainly responsible for receiving and distributing messages, while the storage layer is composed of Bookie nodes that are responsible for storing and reading messages.
This separation lets Pulsar scale horizontally with few practical limits. If the system has many producers and consumers, the Broker layer can be expanded directly without data-consistency concerns. In an architecture that couples the two, expanding capacity changes the computing logic and the storage at the same time, so scaling is easily constrained by data consistency. In addition, the logic of the computing layer is complex and error-prone, while the logic of the storage layer is relatively simple and less likely to fail. Under this architecture, if an error occurs in the computing layer, that layer can be recovered on its own without affecting the storage layer.
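The recovery argument can be illustrated with a small model. This is a deliberately simplified sketch, not how Pulsar assigns topics: the class `ToyCluster`, its least-loaded assignment rule, and the in-memory `storage` dictionary standing in for the Bookie layer are all assumptions. What it shows is that when a broker fails, only the ownership mapping changes, and no message data has to move.

```python
class ToyCluster:
    """Brokers hold only topic ownership; messages live in a separate store."""

    def __init__(self, brokers):
        self.brokers = list(brokers)
        self.storage = {}  # topic -> messages; stands in for the Bookie layer
        self.owner = {}    # topic -> broker; the only broker-related state

    def _least_loaded(self):
        # Count topics per live broker and pick the one with the fewest.
        load = {broker: 0 for broker in self.brokers}
        for broker in self.owner.values():
            if broker in load:
                load[broker] += 1
        return min(self.brokers, key=lambda broker: load[broker])

    def publish(self, topic, message):
        if self.owner.get(topic) not in self.brokers:
            self.owner[topic] = self._least_loaded()
        self.storage.setdefault(topic, []).append(message)
        return self.owner[topic]

    def fail_broker(self, broker):
        # Reassign the failed broker's topics; storage is left untouched.
        self.brokers.remove(broker)
        for topic, current in list(self.owner.items()):
            if current == broker:
                self.owner[topic] = self._least_loaded()
```

After `fail_broker`, the topics of the failed broker are served by a surviving broker and their stored messages are exactly what they were before.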
Pulsar also supports data tiered storage, which can move old messages to cheap storage solutions, while the latest messages can be stored in SSD. This can save costs and maximize the use of resources.
A Pulsar instance consists of one or more Pulsar clusters. Each cluster includes:
- Multiple Broker instances, responsible for receiving and distributing messages
- A ZooKeeper service that coordinates the cluster configuration
- A BookKeeper cluster of Bookie servers, used for message persistence
Messages are synchronized between clusters through cross-region replication.
design principle
Pulsar adopts the publish-subscribe design pattern (pub-sub). In this design pattern, the producer publishes messages to a topic, and the consumer subscribes to the topic, processes incoming messages, and sends an acknowledgement (ack) once processing is complete.
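The role of the acknowledgement can be sketched with an in-memory model. This is an illustration only and not Pulsar's implementation: the class `ToySubscription` and its method names are hypothetical. A delivered message stays in an "unacked" set until the consumer acknowledges it, and anything still unacked can be delivered again.

```python
from collections import deque


class ToySubscription:
    """Tracks which messages were delivered but not yet acknowledged."""

    def __init__(self):
        self.backlog = deque()  # (id, payload) pairs waiting for delivery
        self.unacked = {}       # id -> payload, delivered but not acked
        self._next_id = 0

    def publish(self, payload):
        self.backlog.append((self._next_id, payload))
        self._next_id += 1

    def receive(self):
        msg_id, payload = self.backlog.popleft()
        self.unacked[msg_id] = payload
        return msg_id, payload

    def acknowledge(self, msg_id):
        # Once acked, the message will never be delivered again.
        del self.unacked[msg_id]

    def redeliver_unacked(self):
        # Put every delivered-but-unacked message back into the backlog.
        for msg_id in sorted(self.unacked):
            self.backlog.append((msg_id, self.unacked[msg_id]))
        self.unacked.clear()
```

A consumer that crashes between `receive` and `acknowledge` therefore does not lose the message; it comes back after redelivery.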
features
deploy
Docker
Start Pulsar in Docker
docker run -it -p 6650:6650 -p 8080:8080 --mount source=pulsardata,target=/pulsar/data --mount source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:3.0.0 bin/pulsar standalone
To change the Pulsar configuration before starting Pulsar, pass environment variables with the PULSAR_PREFIX_ prefix, as in the following command. See the default configuration file for more details.
docker run -it -e PULSAR_PREFIX_xxx=yyy -p 6650:6650 -p 8080:8080 --mount source=pulsardata,target=/pulsar/data --mount source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:2.10.0 sh -c "bin/apply-config-from-env.py conf/standalone.conf && bin/pulsar standalone"
Recommendations:
● By default, docker containers run with UID 10000 and GID 0. Make sure that the mounted volume provides write permission for UID 10000 or GID 0. Note that UID 10000 is arbitrary, so it is recommended to make these mounts writable by the root group (GID 0).
● Data, metadata and configuration are persisted on Docker volumes so that the container does not start from scratch each time it restarts. To learn more about the volumes, you can use the docker volume inspect command.
● For Docker on Windows, make sure it is configured to use Linux containers.
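The idea behind the PULSAR_PREFIX_ variables is that each one names a configuration key and carries its new value. The sketch below is a simplified model of that idea, not the logic of bin/apply-config-from-env.py itself: the function name `apply_env_to_conf` and the exact override and append rules are assumptions made for illustration.

```python
def apply_env_to_conf(conf_lines, env, prefix="PULSAR_PREFIX_"):
    """Return config lines with values taken from prefixed environment variables."""
    result = []
    seen = set()
    for line in conf_lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            seen.add(key)
            if prefix + key in env:
                # Override the value of a key that already exists in the file.
                line = f"{key}={env[prefix + key]}"
        result.append(line)
    # Prefixed variables whose key is not in the file are appended as new entries.
    for name, value in sorted(env.items()):
        if name.startswith(prefix):
            key = name[len(prefix):]
            if key not in seen:
                result.append(f"{key}={value}")
    return result
```

In the docker command above, `PULSAR_PREFIX_xxx=yyy` would correspond to a line `xxx=yyy` in conf/standalone.conf under this model.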
After successfully starting Pulsar, you can see info level log messages as follows:
08:18:30.970 [main] INFO org.apache.pulsar.broker.web.WebService - HTTP Service started at http://0.0.0.0:8080
...
07:53:37.322 [main] INFO org.apache.pulsar.broker.PulsarService - messaging service is ready, bootstrap service port = 8080, broker url= pulsar://localhost:6650, cluster=standalone, configs=org.apache.pulsar.broker.ServiceConfiguration@98b63c1
...
If you need to perform a health check, you can use the bin/pulsar-admin brokers healthcheck command. (pulsar-admin is a tool for managing Pulsar entities.)
When you start a local standalone cluster, a public/default namespace is created automatically. This namespace is used for development purposes. All Pulsar topics are managed within namespaces.
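Because every topic lives in a namespace, a short topic name such as my-topic is shorthand for a full name of the form persistent://tenant/namespace/topic, with public/default as the default tenant and namespace. The helpers below sketch that expansion; the function names are hypothetical and the code does not validate names the way the broker does.

```python
def full_topic_name(name, tenant="public", namespace="default"):
    """Expand a short topic name to its fully qualified form."""
    if "://" in name:
        return name  # already fully qualified
    return f"persistent://{tenant}/{namespace}/{name}"


def split_topic_name(full_name):
    """Split a fully qualified topic name into its four parts."""
    domain, rest = full_name.split("://", 1)
    tenant, namespace, topic = rest.split("/", 2)
    return {
        "domain": domain,
        "tenant": tenant,
        "namespace": namespace,
        "topic": topic,
    }
```

For example, `full_topic_name("my-topic")` gives `persistent://public/default/my-topic`.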
Use Pulsar in Docker
If you're running a local standalone cluster, you can use one of these root urls to interact with your cluster:
pulsar://localhost:6650
http://localhost:8080
The following example guides you to get started with Pulsar by using the Python client API.
Install the Pulsar Python client library directly from PyPI:
pip install pulsar-client
Consume a message
Create a consumer and subscribe to the topic:
import pulsar

# Connect to the standalone broker
client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe('my-topic', subscription_name='my-sub')

# Receive and acknowledge messages until interrupted (Ctrl+C)
while True:
    msg = consumer.receive()
    print("Received message: '%s'" % msg.data())
    consumer.acknowledge(msg)

client.close()
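The loop above acknowledges every message unconditionally. A common variation is to acknowledge only after processing succeeds and to negatively acknowledge on failure so the message is redelivered. The helper below is a sketch under assumptions: `consume_batch` and `handler` are hypothetical names, and the function takes any object with `receive`, `acknowledge`, and `negative_acknowledge` methods (which the Python client's consumer provides) so it can be tried without a running broker.

```python
def consume_batch(consumer, handler, max_messages):
    """Receive up to max_messages; ack on success, nack when the handler fails."""
    processed = 0
    for _ in range(max_messages):
        msg = consumer.receive()  # blocks until a message is available
        try:
            handler(msg.data())
        except Exception:
            # Ask for the message to be redelivered later.
            consumer.negative_acknowledge(msg)
        else:
            consumer.acknowledge(msg)
            processed += 1
    return processed
```

With the real client, the consumer returned by `client.subscribe('my-topic', subscription_name='my-sub')` can be passed in directly, for example `consume_batch(consumer, print, 10)`.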
Produce a message
Start a producer to send some test messages:
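import pulsar

# Connect to the standalone broker
client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('my-topic')

# Message payloads are bytes, so encode each string before sending
for i in range(10):
    producer.send(('hello-pulsar-%d' % i).encode('utf-8'))

client.close()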
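Real payloads are usually structured rather than plain strings. The following sketch serializes dictionaries to JSON bytes and sets a partition key on each message. It is an illustration under assumptions: `send_events` and `key_field` are hypothetical names, and the function accepts any object whose `send` method takes a payload and a `partition_key` keyword (as the Python client's producer does), so it can be exercised without a broker.

```python
import json


def send_events(producer, events, key_field="id"):
    """Send each event as UTF-8 JSON, keyed by one of its fields."""
    sent = 0
    for event in events:
        payload = json.dumps(event, sort_keys=True).encode("utf-8")
        # Messages with the same key are routed to the same partition.
        producer.send(payload, partition_key=str(event[key_field]))
        sent += 1
    return sent
```

With the real client, the producer returned by `client.create_producer('my-topic')` can be passed in as `producer`.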
Get the topic statistics
In Pulsar, you can use the REST API, Java, or command-line tools to control every aspect of the system. For details about the API, see Admin API Overview.
In the simplest example, you can use curl to probe statistics for a specific topic:
curl http://localhost:8080/admin/v2/persistent/public/default/my-topic/stats | python -m json.tool
The output is like this:
{
  ···
  "consumers": [
    {
      "msgRateOut": 1.8332950480217471,
      "msgThroughputOut": 91.33142602871978,
      "bytesOutCounter": 6607,
      "msgOutCounter": 133,
      "msgRateRedeliver": 0.0,
      "chunkedMessageRate": 0.0,
      "consumerName": "3c544f1daa",
      "availablePermits": 867,
      "unackedMessages": 0,
      "avgMessagesPerEntry": 6,
      "blockedConsumerOnUnackedMsgs": false,
      "lastAckedTimestamp": 1625389546162,
      "lastConsumedTimestamp": 1625389546070,
      "metadata": {
      },
      "address": "/127.0.0.1:35472",
      "connectedSince": "2021-07-04T08:58:21.287682Z",
      "clientVersion": "2.8.0"
    }
  ],
  ···
}
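Once the statistics are parsed, a few of the per-consumer fields are enough for a quick health summary. The sketch below works on a list of consumer entries shaped like the one in the output above; the function name `summarize_consumers` is hypothetical, and where the "consumers" list sits inside the full response is not shown in the excerpt, so extracting it is left to the caller.

```python
def summarize_consumers(consumers):
    """Aggregate a few fields from a list of consumer stats entries."""
    return {
        "count": len(consumers),
        "total_msg_rate_out": sum(c.get("msgRateOut", 0.0) for c in consumers),
        "total_unacked": sum(c.get("unackedMessages", 0) for c in consumers),
        # Names of consumers the broker has stopped dispatching to
        # because they hold too many unacknowledged messages.
        "blocked": [
            c.get("consumerName")
            for c in consumers
            if c.get("blockedConsumerOnUnackedMsgs")
        ],
    }
```

A growing `total_unacked` or a non-empty `blocked` list usually means a consumer is receiving messages faster than it acknowledges them.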
Docker-compose
stand-alone
https://jpinjpblog.wordpress.com/2020/12/10/pulsar-with-manager-and-dashboard-on-docker-compose/
version: "3.5"
services:
  pulsar:
    image: "apachepulsar/pulsar:2.6.2"
    command: bin/pulsar standalone
    environment:
      PULSAR_MEM: " -Xms512m -Xmx512m -XX:MaxDirectMemorySize=1g"
    volumes:
      - ./pulsar/data:/pulsar/data
    ports:
      - "6650:6650"
      - "8080:8080"
    restart: unless-stopped
    networks:
      - network_test_bed
  pulsar-manager:
    image: "apachepulsar/pulsar-manager:v0.2.0"
    ports:
      - "9527:9527"
      - "7750:7750"
    depends_on:
      - pulsar
    environment:
      SPRING_CONFIGURATION_FILE: /pulsar-manager/pulsar-manager/application.properties
    networks:
      - network_test_bed
  redis:
    image: "redislabs/redistimeseries:1.4.7"
    ports:
      - "6379:6379"
    volumes:
      - ./redis/redis-data:/var/lib/redis
    environment:
      - REDIS_REPLICATION_MODE=master
      - PYTHONUNBUFFERED=1
    networks:
      - network_test_bed
  alertmanager:
    image: prom/alertmanager:v0.21.0
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/:/etc/alertmanager/
    networks:
      - network_test_bed
    restart: always
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'
  prometheus:
    image: prom/prometheus:v2.23.0
    volumes:
      - ./prometheus/standalone.prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - network_test_bed
  grafana:
    image: streamnative/apache-pulsar-grafana-dashboard:0.0.14
    environment:
      PULSAR_CLUSTER: "standalone"
      PULSAR_PROMETHEUS_URL: "http://163.221.68.230:9090"
    restart: unless-stopped
    ports:
      - "3000:3000"
    networks:
      - network_test_bed
    depends_on:
      - prometheus
networks:
  network_test_bed:
    name: network_test_bed
    driver: bridge
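After bringing the stack up, the containers need some time before their ports accept connections. A small polling helper makes that wait explicit in scripts and tests. This is a generic sketch, not part of Pulsar or Docker Compose; the name `wait_for_port` and its default timings are assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the file above, `wait_for_port("localhost", 6650)` and `wait_for_port("localhost", 8080)` would cover the broker's binary and HTTP ports. An open port only shows that something is listening; it does not prove the service is fully initialized.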
cluster
method one
# Install
curl -SL https://github.com/docker/compose/releases/download/v2.19.1/docker-compose-linux-x86_64 -o /usr/bin/docker-compose && chmod +x /usr/bin/docker-compose
# Deploy
version: '3'
services:
  # Start zookeeper
  zookeeper:
    image: apachepulsar/pulsar:latest
    container_name: zookeeper
    restart: on-failure # restart on failure
    # user: root # enable this when the image is apachepulsar/pulsar:3.0.0
    networks:
      - pulsar
    volumes:
      - ./data/zookeeper:/pulsar/data/zookeeper
    environment:
      - metadataStoreUrl=zk:zookeeper:2181
      - PULSAR_MEM=-Xms256m -Xmx256m -XX:MaxDirectMemorySize=256m
    command: >
      bash -c "bin/apply-config-from-env.py conf/zookeeper.conf && \
             bin/generate-zookeeper-config.sh conf/zookeeper.conf && \
             exec bin/pulsar zookeeper"
    healthcheck:
      test: ["CMD", "bin/pulsar-zookeeper-ruok.sh"]
      interval: 10s
      timeout: 5s
      retries: 30
  # Init cluster metadata
  pulsar-init:
    container_name: pulsar-init
    hostname: pulsar-init
    image: apachepulsar/pulsar:latest
    restart: on-failure # restart on failure
    # user: root # enable this when the image is apachepulsar/pulsar:3.0.0
    networks:
      - pulsar
    command: >
      bin/pulsar initialize-cluster-metadata \
               --cluster cluster-a \
               --zookeeper zookeeper:2181 \
               --configuration-store zookeeper:2181 \
               --web-service-url http://broker:8080 \
               --broker-service-url pulsar://broker:6650
    depends_on:
      zookeeper:
        condition: service_healthy
  # Start bookie
  bookie:
    image: apachepulsar/pulsar:latest
    container_name: bookie
    restart: on-failure
    # user: root # enable this when the image is apachepulsar/pulsar:3.0.0
    networks:
      - pulsar
    environment:
      - clusterName=cluster-a
      - zkServers=zookeeper:2181
      - metadataServiceUri=metadata-store:zk:zookeeper:2181
      # otherwise the bookie fails to start on every docker run because of the Cookie
      # see: https://github.com/apache/bookkeeper/blob/405e72acf42bb1104296447ea8840d805094c787/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Cookie.java#L57-68
      - advertisedAddress=bookie
      - BOOKIE_MEM=-Xms512m -Xmx512m -XX:MaxDirectMemorySize=256m
    depends_on:
      zookeeper:
        condition: service_healthy
      pulsar-init:
        condition: service_completed_successfully
    # map a local directory into the container so that the bookie does not fail to start when the container runs out of disk space
    volumes:
      - ./data/bookkeeper:/pulsar/data/bookkeeper
    command: bash -c "bin/apply-config-from-env.py conf/bookkeeper.conf && exec bin/pulsar bookie"
  # Start broker
  broker:
    image: apachepulsar/pulsar:latest
    container_name: broker
    hostname: broker
    restart: on-failure
    # user: root # enable this when the image is apachepulsar/pulsar:3.0.0
    networks:
      - pulsar
    environment:
      - metadataStoreUrl=zk:zookeeper:2181
      - zookeeperServers=zookeeper:2181
      - clusterName=cluster-a
      - managedLedgerDefaultEnsembleSize=1
      - managedLedgerDefaultWriteQuorum=1
      - managedLedgerDefaultAckQuorum=1
      - advertisedAddress=broker
      # publish the broker's listener information to ZooKeeper for clients (producers/consumers) to use
      - advertisedListeners=external:pulsar://broker:6650,external1:pulsar://127.0.0.1:6650
      - PULSAR_MEM=-Xms512m -Xmx512m -XX:MaxDirectMemorySize=256m
    depends_on:
      zookeeper:
        condition: service_healthy
      bookie:
        condition: service_started
    expose:
      - 8080
      - 6650
    ports:
      - "6650:6650"
      - "8080:8080"
    volumes:
      - ./data/broker/data:/pulsar/data/
      - ./data/broker/conf:/pulsar/conf
      - ./data/broker/logs:/pulsar/logs
      - ./data/ssl/:/pulsar/ssl
    command: bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar broker"
  pulsar-manager:
    image: apachepulsar/pulsar-manager:v0.3.0 # v0.4.0 is also available
    container_name: pulsar-manager
    hostname: pulsar-manager
    restart: always
    networks:
      - pulsar
    ports:
      - "9527:9527" # frontend port
      - "7750:7750" # backend port
    depends_on:
      - broker
    links:
      - broker
    environment:
      SPRING_CONFIGURATION_FILE: /pulsar-manager/pulsar-manager/application.properties
    volumes:
      - ./pulsar-manager/dbdata:/pulsar-manager/pulsar-manager/dbdata
      - ./pulsar-manager/application.properties:/pulsar-manager/pulsar-manager/application.properties
      - ./data/ssl:/pulsar-manager/ssl
networks:
  pulsar:
    driver: bridge
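The broker settings managedLedgerDefaultEnsembleSize, managedLedgerDefaultWriteQuorum, and managedLedgerDefaultAckQuorum are all 1 in this file because the deployment has a single bookie. BookKeeper requires ensemble size >= write quorum >= ack quorum, and an ensemble cannot be larger than the number of available bookies. The check below sketches those two rules; the function name `check_quorums` and its message wording are made up for this example.

```python
def check_quorums(ensemble, write_quorum, ack_quorum, bookies):
    """Return a list of problems with the given ledger replication settings."""
    problems = []
    if not ensemble >= write_quorum >= ack_quorum >= 1:
        problems.append("need ensemble >= write quorum >= ack quorum >= 1")
    if ensemble > bookies:
        problems.append(
            f"ensemble size {ensemble} exceeds the {bookies} available bookie(s)"
        )
    return problems
```

`check_quorums(1, 1, 1, bookies=1)` reports no problems, which matches the single-bookie file above. Raising the values for redundancy only makes sense after adding more bookie services.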
Method Two
version: '2.1'
services:
  zoo1:
    image: apachepulsar/pulsar:2.4.1
    hostname: zoo1
    ports:
      - "2181:2181"
    environment:
      ZK_ID: 1
      PULSAR_ZK_CONF: /conf/zookeeper.conf
    volumes:
      - ./zoo1/data:/pulsar/data/zookeeper/
      - ./zoo1/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_zk.sh"
  zoo2:
    image: apachepulsar/pulsar:2.4.1
    hostname: zoo2
    ports:
      - "2182:2181"
    environment:
      ZK_ID: 2
      PULSAR_ZK_CONF: /conf/zookeeper.conf
    volumes:
      - ./zoo2/data:/pulsar/data/zookeeper/
      - ./zoo2/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_zk.sh"
  zoo3:
    image: apachepulsar/pulsar:2.4.1
    hostname: zoo3
    ports:
      - "2183:2181"
    environment:
      ZK_ID: 3
      PULSAR_ZK_CONF: /conf/zookeeper.conf
    volumes:
      - ./zoo3/data:/pulsar/data/zookeeper/
      - ./zoo3/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_zk.sh"
  bookie1:
    image: apachepulsar/pulsar:2.4.1
    hostname: bookie1
    ports:
      - "3181:3181"
    environment:
      BOOKIE_CONF: /conf/bookkeeper.conf
    volumes:
      - ./bookie1/data:/pulsar/data/bookkeeper/
      - ./bookie1/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_mainbk.sh"
    depends_on:
      - zoo1
      - zoo2
      - zoo3
  bookie2:
    image: apachepulsar/pulsar:2.4.1
    hostname: bookie2
    ports:
      - "3182:3181"
    environment:
      BOOKIE_CONF: /conf/bookkeeper.conf
    volumes:
      - ./bookie2/data:/pulsar/data/bookkeeper/
      - ./bookie2/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_otherbk.sh"
    depends_on:
      - bookie1
  bookie3:
    image: apachepulsar/pulsar:2.4.1
    hostname: bookie3
    ports:
      - "3183:3181"
    environment:
      BOOKIE_CONF: /conf/bookkeeper.conf
    volumes:
      - ./bookie3/data:/pulsar/data/bookkeeper/
      - ./bookie3/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_otherbk.sh"
    depends_on:
      - bookie1
  broker1:
    image: apachepulsar/pulsar:2.4.1
    hostname: broker1
    environment:
      PULSAR_BROKER_CONF: /conf/broker.conf
    ports:
      - "6660:6650"
      - "8090:8080"
    volumes:
      - ./broker1/data:/pulsar/data/broker/
      - ./broker1/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_broker.sh"
    depends_on:
      - bookie1
      - bookie2
      - bookie3
  broker2:
    image: apachepulsar/pulsar:2.4.1
    hostname: broker2
    environment:
      PULSAR_BROKER_CONF: /conf/broker.conf
    ports:
      - "6661:6650"
      - "8091:8080"
    volumes:
      - ./broker2/data:/pulsar/data/broker/
      - ./broker2/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_broker.sh"
    depends_on:
      - bookie1
      - bookie2
      - bookie3
  pulsar-proxy:
    image: apachepulsar/pulsar:2.4.1
    hostname: pulsar-proxy
    ports:
      - "6650:6650"
      - "8080:8080"
    environment:
      PULSAR_PROXY_CONF: "/conf/proxy.conf"
    volumes:
      - ./proxy/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_proxy.sh"
    depends_on:
      - broker1
      - broker2