Pulsar

Apache Pulsar is an all-in-one messaging and streaming platform. Messages can be consumed and acknowledged individually, or as a stream with a latency of less than 10 milliseconds. Its layered architecture allows rapid scaling across hundreds of nodes without data reorganization.
Its features include multi-tenancy with resource separation and access control, geo-replication across regions, tiered storage, and six officially supported client languages. It supports up to one million unique topics and is designed to simplify your application architecture.
Pulsar is a Top 10 Apache Software Foundation project with a vibrant and enthusiastic community and a user base ranging from small businesses to large enterprises.

Official website: https://pulsar.apache.org/

Theory

Apache Pulsar, a top-level project of the Apache Software Foundation, is a next-generation cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight function computing. Its architecture separates computing from storage, and it supports multi-tenancy, persistent storage, and cross-region replication. As a streaming data store it offers strong consistency, high throughput, low latency, and high scalability.

Pulsar was created at Yahoo in 2012. The original goal was to consolidate the other messaging systems in use at Yahoo into a single, logically unified messaging platform that supports large clusters and cross-region replication. The messaging systems available at the time (including Kafka) could not meet Yahoo's requirements, such as multi-tenancy in large clusters, stable and reliable I/O quality of service, millions of topics, and cross-region replication, so Pulsar was developed.

The key features of Pulsar are as follows:

● A single Pulsar instance natively supports multiple clusters and can seamlessly replicate messages between clusters across data centers
● Extremely low publishing latency and end-to-end latency
● Scales seamlessly to more than one million topics
● Simple client API, supports Java, Go, Python and C++
● Supports multiple topic subscription modes (exclusive subscription, shared subscription, failover subscription)
● Guaranteed message delivery through the persistent message storage mechanism provided by Apache BookKeeper
● Stream-native data processing is realized by the lightweight serverless computing framework Pulsar Functions.
● Pulsar IO, a serverless connector framework based on Pulsar Functions, makes it easier to move data into and out of Apache Pulsar.
● Tiered storage can offload data from hot storage to cold/long-term storage (such as S3, GCS) when the data becomes stale.
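
The three subscription modes listed above differ mainly in which consumer receives each message. The following is a minimal in-memory sketch of that difference, not the Pulsar client API: the `dispatch` function and the consumer names are hypothetical, and a real broker also tracks acknowledgements, permits, and consumer priority.

```python
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Return a mapping of consumer name -> messages it would receive.

    A toy model of subscription modes (hypothetical helper, not Pulsar code).
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    received = {name: [] for name in consumers}
    if mode == "exclusive":
        # Only a single consumer may attach to the subscription.
        if len(consumers) > 1:
            raise ValueError("exclusive subscriptions allow one consumer")
        received[consumers[0]] = list(messages)
    elif mode == "failover":
        # All messages go to the active consumer; the others stand by.
        received[consumers[0]] = list(messages)
    elif mode == "shared":
        # Messages are spread round-robin across all consumers.
        for name, msg in zip(cycle(consumers), messages):
            received[name].append(msg)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return received


msgs = [f"m{i}" for i in range(4)]
print(dispatch(msgs, ["c1", "c2"], "shared"))
print(dispatch(msgs, ["c1", "c2"], "failover"))
```

In this sketch the shared mode trades ordering for parallelism, while failover keeps a single active reader and holds the second consumer in reserve.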

Concept

The official website introduces the following concepts:
Producer: The source of messages, also called the publisher; it sends messages to a topic.
Consumer: The receiver of messages; it subscribes to a topic and consumes messages from it.
Topic: The carrier of message data. In Pulsar, a topic can be divided into multiple partitions; if partitioning is not configured, the topic has a single partition.
Broker: A stateless component that receives messages from producers and delivers them to consumers.
BookKeeper: A distributed write-ahead log system that provides storage for messaging systems such as Pulsar and replicates data across machines and data centers.
Bookie: An Apache BookKeeper server that persists messages.
Cluster: A group of brokers, bookies, and a ZooKeeper service. A Pulsar instance consists of one or more clusters.
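
To make the relationship between topics and partitions concrete, here is a small sketch of how a short topic name expands to a fully qualified one and how a message key could be mapped to a partition. The helper names are hypothetical, and CRC32 is only a stand-in for whatever hash function the real client uses.

```python
import zlib


def full_topic_name(topic, tenant="public", namespace="default"):
    """Expand a short topic name into persistent://tenant/namespace/topic."""
    if "://" in topic:
        return topic  # already fully qualified
    return f"persistent://{tenant}/{namespace}/{topic}"


def partition_for_key(topic, key, num_partitions):
    """Pick the partition topic for a message key.

    CRC32 is an assumption made for illustration only.
    """
    index = zlib.crc32(key.encode("utf-8")) % num_partitions
    return f"{full_topic_name(topic)}-partition-{index}"


print(full_topic_name("my-topic"))
print(partition_for_key("my-topic", "user-42", 4))
```

Because the partition is derived from the key alone, messages that share a key always land in the same partition, which is what preserves per-key ordering.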

Cloud-native architecture

Apache Pulsar separates computing from storage, so stored data is not coupled to the computing logic; each layer can scale independently and recover quickly. As cloud-native systems have developed, this compute-storage separation has become increasingly common. Pulsar's Broker layer is a stateless computing layer that receives and dispatches messages, while the storage layer consists of Bookie nodes that store and serve messages.

Because computing and storage are separated, Pulsar can scale horizontally with few practical limits. If the system has many producers and consumers, you can add Brokers to the computing layer directly, without being constrained by data consistency. In an architecture that couples the two, expanding capacity changes computing logic and storage at the same time, and the expansion is easily limited by data consistency requirements. In addition, the computing layer's logic is complex and error-prone, while the storage layer's logic is comparatively simple and less likely to fail. With this architecture, an error in the computing layer can be recovered on its own without affecting the storage layer.
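
The recovery argument above can be illustrated with a toy model in which brokers only hold topic ownership while a separate structure holds the data. `ToyCluster` and its methods are hypothetical names for this sketch; it does not reflect Pulsar's actual load manager.

```python
class ToyCluster:
    """Toy model: brokers own topics, the storage layer holds the data."""

    def __init__(self, brokers):
        self.brokers = list(brokers)
        self.ownership = {}  # topic -> broker (compute layer, rebuildable)
        self.storage = {}    # topic -> messages (storage layer, durable)

    def _least_loaded(self):
        # Count topics per live broker and pick the one with the fewest.
        loads = {broker: 0 for broker in self.brokers}
        for owner in self.ownership.values():
            if owner in loads:
                loads[owner] += 1
        return min(self.brokers, key=loads.get)

    def publish(self, topic, message):
        if topic not in self.ownership:
            self.ownership[topic] = self._least_loaded()
        self.storage.setdefault(topic, []).append(message)

    def remove_broker(self, broker):
        # Only ownership moves; no stored message is copied or rewritten.
        self.brokers.remove(broker)
        for topic, owner in self.ownership.items():
            if owner == broker:
                self.ownership[topic] = self._least_loaded()


cluster = ToyCluster(["broker-1", "broker-2"])
cluster.publish("t1", "a")
cluster.publish("t2", "b")
cluster.remove_broker("broker-1")
print(cluster.ownership)
print(cluster.storage)
```

Losing a broker in this model changes only the ownership table, which is the property that lets a stateless compute layer recover without touching stored data.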

Pulsar also supports tiered storage: older messages can be moved to cheaper storage, while the latest messages stay on SSDs. This saves cost and makes better use of resources.
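
As a rough illustration of the tiered-storage idea, the sketch below moves the oldest segments out of hot storage once a size budget is exceeded. The size-based policy, the function name, and the segment tuples are all assumptions made for this example; Pulsar's real offload configuration is not shown here.

```python
from collections import deque


def offload_old_segments(hot, cold, max_hot_bytes):
    """Move the oldest segments from hot to cold storage until hot fits.

    `hot` is a deque of (segment_id, size_bytes), oldest first.
    The newest segment always stays hot because it is still being written.
    Returns the number of bytes left in hot storage.
    """
    hot_bytes = sum(size for _, size in hot)
    while hot_bytes > max_hot_bytes and len(hot) > 1:
        segment = hot.popleft()
        cold.append(segment)
        hot_bytes -= segment[1]
    return hot_bytes


hot = deque([("seg-0", 400), ("seg-1", 300), ("seg-2", 200)])
cold = []
remaining = offload_old_segments(hot, cold, max_hot_bytes=500)
print(cold, list(hot), remaining)
```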
A Pulsar cluster consists of the following components:

  • Multiple Broker instances, which receive and dispatch messages
  • A ZooKeeper service, which coordinates cluster configuration
  • A cluster of BookKeeper servers (Bookies), which persist messages
  • Cross-region replication, which synchronizes messages between clusters

Design principle

Pulsar uses the publish-subscribe (pub-sub) design pattern. Producers publish messages to a topic; consumers subscribe to the topic, process the incoming messages, and send an acknowledgement (ack) once processing is complete.
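
The publish, receive, and acknowledge cycle can be sketched with a small in-memory model. The `Subscription` class below is hypothetical and only mimics the idea that a message stays pending until it is acknowledged; it is not how the broker is implemented.

```python
from collections import deque


class Subscription:
    """Toy subscription: messages stay pending until acknowledged."""

    def __init__(self):
        self.backlog = deque()  # published, not yet delivered
        self.pending = {}       # delivered, waiting for an ack
        self.next_id = 0

    def publish(self, payload):
        self.backlog.append((self.next_id, payload))
        self.next_id += 1

    def receive(self):
        msg_id, payload = self.backlog.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def acknowledge(self, msg_id):
        # Acknowledged messages are dropped and never delivered again.
        del self.pending[msg_id]

    def redeliver_unacked(self):
        # Unacknowledged messages return to the front of the backlog in order.
        for msg_id in sorted(self.pending, reverse=True):
            self.backlog.appendleft((msg_id, self.pending.pop(msg_id)))


sub = Subscription()
sub.publish("a")
sub.publish("b")
id_a, _ = sub.receive()
id_b, _ = sub.receive()
sub.acknowledge(id_a)      # "a" is done
sub.redeliver_unacked()    # "b" was never acked, so it comes back
print(sub.receive())
```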


Deploy

Docker

Start Pulsar in Docker

docker run -it -p 6650:6650 -p 8080:8080 --mount source=pulsardata,target=/pulsar/data --mount source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:3.0.0 bin/pulsar standalone

To change the Pulsar configuration before starting it, pass environment variables with the PULSAR_PREFIX_ prefix, as in the following command. See the default configuration file for the available settings.

docker run -it -e PULSAR_PREFIX_xxx=yyy -p 6650:6650  -p 8080:8080 --mount source=pulsardata,target=/pulsar/data --mount source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:2.10.0 sh -c "bin/apply-config-from-env.py conf/standalone.conf && bin/pulsar standalone"
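
To show the idea behind the PULSAR_PREFIX_ convention, here is a simplified sketch that strips the prefix from environment variables and applies them to configuration text. It is an illustration only and is not the actual bin/apply-config-from-env.py script; the function name and the `exampleSetting` key are hypothetical.

```python
PREFIX = "PULSAR_PREFIX_"


def apply_prefixed_env(conf_text, env):
    """Override or append key=value lines from PULSAR_PREFIX_* variables."""
    overrides = {
        name[len(PREFIX):]: value
        for name, value in env.items()
        if name.startswith(PREFIX)
    }
    out = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in overrides:
            # Replace the value of a key that already exists in the file.
            out.append(f"{key}={overrides.pop(key)}")
        else:
            out.append(line)
    # Keys that were not in the file are appended at the end.
    out.extend(f"{key}={value}" for key, value in overrides.items())
    return "\n".join(out)


conf = "# sample settings\nbrokerServicePort=6650\nclusterName=standalone"
env = {
    "PULSAR_PREFIX_clusterName": "demo",
    "PULSAR_PREFIX_exampleSetting": "1",  # hypothetical key
    "HOME": "/root",                      # ignored: no prefix
}
print(apply_prefixed_env(conf, env))
```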

Recommendations:
● By default, docker containers run with UID 10000 and GID 0. Make sure that the mounted volume provides write permission for UID 10000 or GID 0. Note that UID 10000 is arbitrary, so it is recommended to make these mounts writable by the root group (GID 0).
● Data, metadata, and configuration are persisted on Docker volumes so that the container does not start from scratch each time it restarts. To learn more about a volume, use the docker volume inspect command.
● For Docker on Windows, make sure it is configured to use Linux containers.

After successfully starting Pulsar, you can see info level log messages as follows:

08:18:30.970 [main] INFO  org.apache.pulsar.broker.web.WebService - HTTP Service started at http://0.0.0.0:8080
...
07:53:37.322 [main] INFO  org.apache.pulsar.broker.PulsarService - messaging service is ready, bootstrap service port = 8080, broker url= pulsar://localhost:6650, cluster=standalone, configs=org.apache.pulsar.broker.ServiceConfiguration@98b63c1
...

If you need to perform a health check, you can use the bin/pulsar-admin brokers healthcheck command. (pulsar-admin is a tool for managing Pulsar entities.)
When you start a local standalone cluster, a public/default namespace is created automatically. This namespace is intended for development purposes. All Pulsar topics are managed within namespaces.

Use Pulsar in Docker

If you're running a local standalone cluster, you can use one of these root urls to interact with your cluster:

pulsar://localhost:6650
http://localhost:8080

The following example guides you to get started with Pulsar by using the Python client API.
Install the Pulsar Python client library directly from PyPI:

pip install pulsar-client

Consume messages

Create a consumer and subscribe to the topic:

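import pulsar

client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe('my-topic', subscription_name='my-sub')

try:
    while True:
        msg = consumer.receive()
        # msg.data() returns bytes; decode it for display
        print("Received message: '%s'" % msg.data().decode('utf-8'))
        consumer.acknowledge(msg)
except KeyboardInterrupt:
    pass  # stop consuming on Ctrl+C
finally:
    # Release the connection; the bare loop never reached this call
    client.close()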

Produce messages

Start a producer to send some test messages:

import pulsar

client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('my-topic')

for i in range(10):
    producer.send(('hello-pulsar-%d' % i).encode('utf-8'))

client.close()

Get the topic statistics

In Pulsar, you can use the REST API, Java, or command-line tools to control every aspect of the system. For details about the API, see Admin API Overview.
In the simplest example, you can use curl to probe statistics for a specific topic:

curl http://localhost:8080/admin/v2/persistent/public/default/my-topic/stats | python -m json.tool

The output is like this:

{
···
            "consumers": [
                {
                    "msgRateOut": 1.8332950480217471,
                    "msgThroughputOut": 91.33142602871978,
                    "bytesOutCounter": 6607,
                    "msgOutCounter": 133,
                    "msgRateRedeliver": 0.0,
                    "chunkedMessageRate": 0.0,
                    "consumerName": "3c544f1daa",
                    "availablePermits": 867,
                    "unackedMessages": 0,
                    "avgMessagesPerEntry": 6,
                    "blockedConsumerOnUnackedMsgs": false,
                    "lastAckedTimestamp": 1625389546162,
                    "lastConsumedTimestamp": 1625389546070,
                    "metadata": {},
                    "address": "/127.0.0.1:35472",
                    "connectedSince": "2021-07-04T08:58:21.287682Z",
                    "clientVersion": "2.8.0"
                }
            ],
···
}
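
Once the statistics are parsed as JSON, a few fields per consumer are usually enough for a quick check. The sketch below assumes the elided part of the output nests each `consumers` list under a `subscriptions` object keyed by subscription name; `summarize_consumers` is a hypothetical helper, and the sample dictionary reuses values from the output above.

```python
def summarize_consumers(stats):
    """Return (subscription, consumer, msgRateOut, unackedMessages) rows.

    Assumes stats["subscriptions"][name]["consumers"] is a list of dicts.
    """
    rows = []
    for sub_name, sub in stats.get("subscriptions", {}).items():
        for consumer in sub.get("consumers", []):
            rows.append((
                sub_name,
                consumer.get("consumerName"),
                consumer.get("msgRateOut", 0.0),
                consumer.get("unackedMessages", 0),
            ))
    return rows


sample_stats = {
    "subscriptions": {
        "my-sub": {
            "consumers": [
                {
                    "consumerName": "3c544f1daa",
                    "msgRateOut": 1.8332950480217471,
                    "unackedMessages": 0,
                }
            ]
        }
    }
}

for row in summarize_consumers(sample_stats):
    print(row)
```

In practice the dictionary would come from decoding the curl response with `json.loads`; a steadily growing `unackedMessages` value is a common hint that a consumer is falling behind.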

Docker Compose

Standalone

https://jpinjpblog.wordpress.com/2020/12/10/pulsar-with-manager-and-dashboard-on-docker-compose/

version: "3.5"
services:
  pulsar:
    image: "apachepulsar/pulsar:2.6.2"
    command: bin/pulsar standalone
    environment:
      PULSAR_MEM: " -Xms512m -Xmx512m -XX:MaxDirectMemorySize=1g"
    volumes:
      - ./pulsar/data:/pulsar/data
    ports:
      - "6650:6650"
      - "8080:8080"
    restart: unless-stopped
    networks:
      - network_test_bed
 
  pulsar-manager:
    image: "apachepulsar/pulsar-manager:v0.2.0"
    ports:
      - "9527:9527"
      - "7750:7750"
    depends_on:
      - pulsar
    environment:
      SPRING_CONFIGURATION_FILE: /pulsar-manager/pulsar-manager/application.properties
    networks:
      - network_test_bed

  redis:
    image: "redislabs/redistimeseries:1.4.7"
    ports:
      - "6379:6379"
    volumes:
      - ./redis/redis-data:/var/lib/redis
    environment:
      - REDIS_REPLICATION_MODE=master
      - PYTHONUNBUFFERED=1
    networks: 
      - network_test_bed
 
  alertmanager:
    image: prom/alertmanager:v0.21.0
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/:/etc/alertmanager/
    networks:
      - network_test_bed
    restart: always
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'
 
  prometheus:
    image: prom/prometheus:v2.23.0
    volumes:
      - ./prometheus/standalone.prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - network_test_bed
 
  grafana:
    image: streamnative/apache-pulsar-grafana-dashboard:0.0.14
    environment:
      PULSAR_CLUSTER: "standalone"
      # Replace this address with the host where your own Prometheus is reachable
      PULSAR_PROMETHEUS_URL: "http://163.221.68.230:9090"
    restart: unless-stopped
    ports:
      - "3000:3000"
    networks:
      - network_test_bed
    depends_on:
      - prometheus
 
networks:
  network_test_bed:
    name: network_test_bed
    driver: bridge

Cluster

Official website example

Method One

# Install Docker Compose
curl -SL https://github.com/docker/compose/releases/download/v2.19.1/docker-compose-linux-x86_64 -o /usr/bin/docker-compose && chmod +x /usr/bin/docker-compose

# Deploy
version: '3'
services:
  # Start zookeeper
  zookeeper:
    image: apachepulsar/pulsar:latest
    container_name: zookeeper
    restart: on-failure                       # restart after a failure
    # user: root                              # uncomment when the image is apachepulsar/pulsar:3.0.0
    networks:
      - pulsar
    volumes:
      - ./data/zookeeper:/pulsar/data/zookeeper
    environment:
      - metadataStoreUrl=zk:zookeeper:2181
      - PULSAR_MEM=-Xms256m -Xmx256m -XX:MaxDirectMemorySize=256m
    command: >
      bash -c "bin/apply-config-from-env.py conf/zookeeper.conf && \
             bin/generate-zookeeper-config.sh conf/zookeeper.conf && \
             exec bin/pulsar zookeeper"
    healthcheck:
      test: ["CMD", "bin/pulsar-zookeeper-ruok.sh"]
      interval: 10s
      timeout: 5s
      retries: 30

  # Init cluster metadata
  pulsar-init:
    container_name: pulsar-init
    hostname: pulsar-init
    image: apachepulsar/pulsar:latest
    restart: on-failure                       # restart after a failure
    # user: root                              # uncomment when the image is apachepulsar/pulsar:3.0.0
    networks:
      - pulsar
    command: >
      bin/pulsar initialize-cluster-metadata \
               --cluster cluster-a \
               --zookeeper zookeeper:2181 \
               --configuration-store zookeeper:2181 \
               --web-service-url http://broker:8080 \
               --broker-service-url pulsar://broker:6650
    depends_on:
      zookeeper:
        condition: service_healthy

  # Start bookie
  bookie:
    image: apachepulsar/pulsar:latest
    container_name: bookie
    restart: on-failure
    # user: root                              # uncomment when the image is apachepulsar/pulsar:3.0.0
    networks:
      - pulsar
    environment:
      - clusterName=cluster-a
      - zkServers=zookeeper:2181
      - metadataServiceUri=metadata-store:zk:zookeeper:2181
      # Without this, the bookie fails to start on every docker run because of the Cookie check
      # See: https://github.com/apache/bookkeeper/blob/405e72acf42bb1104296447ea8840d805094c787/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Cookie.java#L57-68
      - advertisedAddress=bookie
      - BOOKIE_MEM=-Xms512m -Xmx512m -XX:MaxDirectMemorySize=256m
    depends_on:
      zookeeper:
        condition: service_healthy
      pulsar-init:
        condition: service_completed_successfully
    # Map a local directory into the container so the bookie does not fail to start when the container runs out of disk space
    volumes:
      - ./data/bookkeeper:/pulsar/data/bookkeeper
    command: bash -c "bin/apply-config-from-env.py conf/bookkeeper.conf && exec bin/pulsar bookie"

  # Start broker
  broker:
    image: apachepulsar/pulsar:latest
    container_name: broker
    hostname: broker
    restart: on-failure
    # user: root                              # uncomment when the image is apachepulsar/pulsar:3.0.0
    networks:
      - pulsar
    environment:
      - metadataStoreUrl=zk:zookeeper:2181
      - zookeeperServers=zookeeper:2181
      - clusterName=cluster-a
      - managedLedgerDefaultEnsembleSize=1
      - managedLedgerDefaultWriteQuorum=1
      - managedLedgerDefaultAckQuorum=1
      - advertisedAddress=broker
      # Publish the broker's listener information to ZooKeeper for clients (producers/consumers) to use
      - advertisedListeners=external:pulsar://broker:6650,external1:pulsar://127.0.0.1:6650
      - PULSAR_MEM=-Xms512m -Xmx512m -XX:MaxDirectMemorySize=256m
    depends_on:
      zookeeper:
        condition: service_healthy
      bookie:
        condition: service_started
    expose:
      - 8080
      - 6650
    ports:
      - "6650:6650"
      - "8080:8080"
    volumes:
      - ./data/broker/data:/pulsar/data/
      - ./data/broker/conf:/pulsar/conf
      - ./data/broker/logs:/pulsar/logs
      - ./data/ssl/:/pulsar/ssl
    command: bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar broker"

  pulsar-manager:
    image: apachepulsar/pulsar-manager:v0.3.0    # v0.4.0 is also available
    container_name: pulsar-manager
    hostname: pulsar-manager
    restart: always
    networks:
      - pulsar
    ports:
      - "9527:9527"    # frontend port
      - "7750:7750"    # backend port
    depends_on:
      - broker
    links:
      - broker
    environment:
      SPRING_CONFIGURATION_FILE: /pulsar-manager/pulsar-manager/application.properties
    volumes:
      - ./pulsar-manager/dbdata:/pulsar-manager/pulsar-manager/dbdata
      - ./pulsar-manager/application.properties:/pulsar-manager/pulsar-manager/application.properties
      - ./data/ssl:/pulsar-manager/ssl
networks:
  pulsar:
    driver: bridge
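
The `depends_on` entries in the file above imply a startup order. As a rough check, the sketch below copies those dependencies into a dictionary and sorts them with the standard library's `graphlib` (Python 3.9 or later). It ignores the `condition` values, so it shows ordering only, not health or completion checks.

```python
from graphlib import TopologicalSorter

# service -> services it depends on, copied from the depends_on entries
depends_on = {
    "zookeeper": [],
    "pulsar-init": ["zookeeper"],
    "bookie": ["zookeeper", "pulsar-init"],
    "broker": ["zookeeper", "bookie"],
    "pulsar-manager": ["broker"],
}


def startup_order(graph):
    """Return the services in an order that satisfies every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = startup_order(depends_on)
print(order)
```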

Method Two

version: '2.1'

services:
  zoo1:
    image: apachepulsar/pulsar:2.4.1
    hostname: zoo1
    ports:
      - "2181:2181"
    environment:
        ZK_ID: 1
        PULSAR_ZK_CONF: /conf/zookeeper.conf
    volumes:
      - ./zoo1/data:/pulsar/data/zookeeper/
      - ./zoo1/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_zk.sh"
    
  zoo2:
    image: apachepulsar/pulsar:2.4.1
    hostname: zoo2
    ports:
      - "2182:2181"
    environment:
        ZK_ID: 2
        PULSAR_ZK_CONF: /conf/zookeeper.conf
    volumes:
      - ./zoo2/data:/pulsar/data/zookeeper/
      - ./zoo2/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_zk.sh"

  zoo3:
    image: apachepulsar/pulsar:2.4.1
    hostname: zoo3
    ports:
      - "2183:2181"
    environment:
        ZK_ID: 3
        PULSAR_ZK_CONF: /conf/zookeeper.conf
    volumes:
      - ./zoo3/data:/pulsar/data/zookeeper/
      - ./zoo3/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_zk.sh"

  bookie1:
    image: apachepulsar/pulsar:2.4.1
    hostname: bookie1
    ports:
      - "3181:3181"
    environment:
        BOOKIE_CONF: /conf/bookkeeper.conf
    volumes:
      - ./bookie1/data:/pulsar/data/bookkeeper/
      - ./bookie1/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_mainbk.sh"
    depends_on:
      - zoo1
      - zoo2
      - zoo3

  bookie2:
    image: apachepulsar/pulsar:2.4.1
    hostname: bookie2
    ports:
      - "3182:3181"
    environment:
        BOOKIE_CONF: /conf/bookkeeper.conf
    volumes:
      - ./bookie2/data:/pulsar/data/bookkeeper/
      - ./bookie2/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_otherbk.sh"
    depends_on:
      - bookie1

  bookie3:
    image: apachepulsar/pulsar:2.4.1
    hostname: bookie3
    ports:
      - "3183:3181"
    environment:
        BOOKIE_CONF: /conf/bookkeeper.conf
    volumes:
      - ./bookie3/data:/pulsar/data/bookkeeper/
      - ./bookie3/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_otherbk.sh"
    depends_on:
      - bookie1

  broker1:
    image: apachepulsar/pulsar:2.4.1
    hostname: broker1
    environment:
        PULSAR_BROKER_CONF: /conf/broker.conf
    ports:
      - "6660:6650"
      - "8090:8080"
    volumes:
      - ./broker1/data:/pulsar/data/broker/
      - ./broker1/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_broker.sh"
    depends_on:
      - bookie1
      - bookie2
      - bookie3

  broker2:
    image: apachepulsar/pulsar:2.4.1
    hostname: broker2
    environment:
        PULSAR_BROKER_CONF: /conf/broker.conf
    ports:
      - "6661:6650"
      - "8091:8080"
    volumes:
      - ./broker2/data:/pulsar/data/broker/
      - ./broker2/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_broker.sh"
    depends_on:
      - bookie1
      - bookie2
      - bookie3

  pulsar-proxy:
    image: apachepulsar/pulsar:2.4.1
    hostname: pulsar-proxy
    ports:
      - "6650:6650"
      - "8080:8080"
    environment:
        PULSAR_PROXY_CONF: "/conf/proxy.conf"
    volumes:
      - ./proxy/log/:/pulsar/logs
      - ./conf:/conf
      - ./scripts:/scripts
    command: /bin/bash "/scripts/start_proxy.sh"
    depends_on:
      - broker1
      - broker2

Origin blog.csdn.net/qq_50573146/article/details/131897568