Introduction to Apache Pulsar: principles, usage, and deployment

Introduction to Pulsar

Background

Apache Pulsar is an enterprise-grade distributed messaging system originally developed at Yahoo. It was open sourced in 2016 and graduated to a top-level Apache Software Foundation project in September 2018. Pulsar ran in Yahoo's production environment for more than three years, mainly serving Mail, Finance, Sports, Flickr, the Gemini Ads platform, and Sherpa (Yahoo's KV store).

Pulsar is a multi-tenant, high-performance solution for server-to-server messaging, originally developed by Yahoo and now managed by the Apache Software Foundation.

Features

  • Native support for multiple clusters in a single Pulsar instance, with seamless geo-replication of messages across clusters.
  • Very low publish and end-to-end latency.
  • Seamless scalability to over a million topics.
  • A simple client API with bindings for Java, Go, Python, and C++.
  • Multiple subscription modes for topics (exclusive, shared, and failover).
  • Guaranteed message delivery backed by persistent message storage in Apache BookKeeper.
  • Pulsar Functions, a serverless lightweight compute framework for stream-native data processing.
  • Pulsar IO, a serverless connector framework built on Pulsar Functions, which makes it easier to move data into and out of Apache Pulsar.
  • Tiered storage, which offloads data from hot/warm storage to cold/long-term storage (such as S3 and GCS) as it ages.

Messaging

Pulsar is built on the publish-subscribe (pub-sub) pattern: producers publish messages to topics, and consumers subscribe to those topics, process incoming messages, and send an acknowledgement to the broker when processing is complete.

Keyword | Description
value | The message payload; data in Pulsar is stored as raw bytes.
key | Messages can be tagged with a key, which is useful for features such as topic compaction.
properties | An optional map of user-defined properties.
sequence ID | Each message is placed in an ordered sequence on its topic; this field records the message's position in that sequence.
publish time | The timestamp of when the message was published (attached automatically by the producer).
event time | An optional application-attached timestamp representing when an event occurred, for example the time the record was processed. If not explicitly set, the event time is 0.
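To make these fields concrete, here is a minimal sketch using the Pulsar Java client; the service URL, topic name, key, and property are illustrative assumptions, not values from this article:

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class MessageFieldsExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")           // assumed broker address
                .build();
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/demo-topic") // hypothetical topic
                .create();
        producer.newMessage()
                .key("user-42")                          // key: used for routing and topic compaction
                .value("hello".getBytes())               // value: the payload, stored as raw bytes
                .property("source", "web")               // properties: user-defined key/value map
                .eventTime(System.currentTimeMillis())   // event time: when the event occurred
                .send();                                 // sequence ID and publish time are attached automatically
        producer.close();
        client.close();
    }
}
```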

Message producer

Send modes

Producers publish messages to a topic. There are two send modes, synchronous and asynchronous (a Java sketch follows the table):

Mode | Description
Synchronous send | The producer waits for an ack from the broker after each message; if no ack is received, the producer treats the send as failed.
Asynchronous send | The producer puts the message into a blocking queue and returns immediately; a background thread in the Pulsar client sends the queued messages to the broker. If the queue is full, the producer is notified of the failure when it tries to enqueue another message.
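A minimal sketch of both send modes with the Java client (broker address and topic are assumptions):

```java
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class SendModesExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650").build();
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/demo-topic").create();

        // Synchronous send: blocks until the broker returns an ack
        MessageId id = producer.send("sync message".getBytes());
        System.out.println("acked: " + id);

        // Asynchronous send: enqueued in the client, returns a CompletableFuture
        producer.sendAsync("async message".getBytes())
                .thenAccept(msgId -> System.out.println("acked async: " + msgId))
                .exceptionally(e -> { e.printStackTrace(); return null; });

        producer.flush();   // push any queued messages out before closing
        producer.close();
        client.close();
    }
}
```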

Message compression

When a producer publishes messages, the data can be compressed in transit. The compression codecs Pulsar currently supports are LZ4, ZLIB, ZSTD, and SNAPPY.

Batch sending (Batching)

If batching is enabled, the producer accumulates a batch of messages and sends them out in a single request. The batch size is bounded by a maximum number of messages and a maximum publish delay, as the sketch below shows.
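Compression and batching are both configured on the producer builder. A sketch with the Java client (codec choice, topic, and limits are illustrative assumptions):

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.CompressionType;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class CompressionBatchingExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650").build();
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/demo-topic")
                .compressionType(CompressionType.LZ4)   // or ZLIB, ZSTD, SNAPPY
                .enableBatching(true)
                .batchingMaxMessages(1000)              // flush when 1000 messages accumulate...
                .batchingMaxPublishDelay(10, TimeUnit.MILLISECONDS) // ...or after 10 ms
                .create();
        for (int i = 0; i < 5000; i++) {
            producer.sendAsync(("msg-" + i).getBytes()); // async sends fill batches naturally
        }
        producer.flush();
        producer.close();
        client.close();
    }
}
```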

Message consumer

Receive modes

Consumers receive messages from a topic and process them. Like sending, receiving has two modes, synchronous and asynchronous (a Java sketch follows the table):

Mode | Description
Synchronous receive | Blocks until a message arrives.
Asynchronous receive | Returns a future (e.g. a CompletableFuture in Java) immediately; it completes as soon as a new message is available.

Acknowledgement (ack)

  1. When the consumer successfully processes a message:

    It sends an acknowledgement to the broker, telling the broker the message can be deleted; otherwise the broker stores the message indefinitely. Messages can be acknowledged individually or cumulatively: with cumulative acknowledgement, the consumer acknowledges only the last message it received, and every earlier message in the stream is covered and will not be redelivered.

  2. When the consumer fails to process a message:

    It sends a negative acknowledgement to the broker, and the broker redelivers the message. Negative acknowledgements can be individual or cumulative, depending on the subscription mode: in the exclusive and failover modes, consumers negatively acknowledge only the last message they received. The Pulsar client can also be configured with an acknowledgement timeout that triggers automatic redelivery: if the consumer does not acknowledge a message within the timeout, the broker resends it to the consumer.

  3. Dead letter topic

    If a message repeatedly fails to be processed, the broker keeps redelivering it, which can prevent the consumer from making progress on other messages. The dead letter topic mechanism lets the consumer move on to new messages when certain messages cannot be consumed successfully: messages that cannot be consumed are stored in a separate topic (the dead letter topic), and users decide how to handle the messages in that topic.
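The sketch below wires these pieces together on one Java consumer: individual acks on success, negative acks on failure, an ack timeout for automatic redelivery, and a dead letter policy. Topic names, the redelivery limit, and the timeout are illustrative assumptions:

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.DeadLetterPolicy;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class AckHandlingExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650").build();
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/demo-topic")
                .subscriptionName("demo-sub")
                .subscriptionType(SubscriptionType.Shared) // DLQ requires Shared/Key_Shared
                .ackTimeout(30, TimeUnit.SECONDS)          // unacked messages redeliver after 30 s
                .deadLetterPolicy(DeadLetterPolicy.builder()
                        .maxRedeliverCount(3)              // after 3 redeliveries...
                        .deadLetterTopic("persistent://public/default/demo-topic-dlq")
                        .build())                          // ...the message lands in the dead letter topic
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        try {
            // ... process the message ...
            consumer.acknowledge(msg);             // success: individual ack
            // consumer.acknowledgeCumulative(msg); // or: ack everything up to msg
        } catch (Exception e) {
            consumer.negativeAcknowledge(msg);     // failure: ask the broker to redeliver
        }
        consumer.close();
        client.close();
    }
}
```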

Message persistence

Message persistence is provided by BookKeeper. Once a subscription is created, Pulsar retains all of its messages (even if the consumer disconnects); a message is only discarded after the consumer acknowledges that it has been processed successfully.

There are two kinds of message retention:

1. A retention policy keeps messages persistently stored in Pulsar even after the consumer has acknowledged them. Acknowledged messages not covered by the retention policy are deleted; with no retention policy at all, all acknowledged messages are deleted.

2. A message TTL (expiration time), applied at the namespace level, deletes messages once they expire, even if they have not been acknowledged.
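Both behaviors are namespace-level policies set through the admin API. A sketch with the Java admin client (admin URL, namespace, and limits are assumptions):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class RetentionExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080").build();
        // 1. Retention: keep acknowledged messages for 24 h, up to 512 MB
        admin.namespaces().setRetention("public/default",
                new RetentionPolicies(24 * 60, 512)); // minutes, MB
        // 2. TTL: messages expire after 7 days even if never acknowledged
        admin.namespaces().setNamespaceMessageTTL("public/default", 7 * 24 * 3600);
        admin.close();
    }
}
```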

When a message is sent more than once, you can choose between two persistence strategies:

1. Persist the duplicate messages to BookKeeper as well.

2. Detect that a message is a duplicate and skip persisting it.
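The second strategy corresponds to Pulsar's message deduplication feature, which can be enabled per namespace. A sketch under the same assumptions as above:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class DeduplicationExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080").build();
        // The broker tracks the highest sequence ID per producer and
        // skips persisting any message it has already stored.
        admin.namespaces().setDeduplicationStatus("public/default", true);
        admin.close();
    }
}
```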

Tenant

Pulsar has supported multi-tenancy from the beginning. Topic names are hierarchical, with the tenant at the top.

Namespace

A namespace is a logical grouping unit within a tenant. A tenant can create multiple namespaces through the admin API; for example, a tenant hosting multiple applications can create a separate namespace for each application.
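A sketch of that pattern with the Java admin client, assuming the 2.7.x admin API (where TenantInfo is a concrete class); the tenant and namespace names are hypothetical, and the cluster name matches the one created later in this article:

```java
import java.util.Collections;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TenantInfo;

public class TenantNamespaceExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080").build();
        // One tenant for the organization, one namespace per application
        admin.tenants().createTenant("my-tenant",
                new TenantInfo(Collections.emptySet(),               // admin roles
                        Collections.singleton("pulsar-cluster-1"))); // allowed clusters
        admin.namespaces().createNamespace("my-tenant/app1");
        admin.namespaces().createNamespace("my-tenant/app2");
        admin.close();
    }
}
```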

[Figure: tenant → namespace → topic hierarchy]

Topic

Like other pub-sub systems, a topic in Pulsar is a named channel that carries messages from producers to consumers. Topic names are URLs with a well-defined structure:

{persistent|non-persistent}://tenant/namespace/topic
Keyword | Description
persistent|non-persistent | The topic type. Persistent: all messages are persisted to disk (on BookKeeper nodes). Non-persistent: data lives only in memory, and messages are lost if the broker restarts.
tenant | The topic's tenant within the instance; tenants are central to Pulsar's multi-tenancy support and can be spread across clusters.
namespace | The grouping mechanism for topics; most topic configuration is done at the namespace level. Each tenant can have multiple namespaces.
topic | A user-defined, free-form name with no special meaning to the Pulsar instance.

Users do not need to create topics explicitly. If a client attempts to produce to or consume from a topic that does not exist, Pulsar automatically creates it under the namespace given in the topic name.

Consumers can subscribe to multiple topics at once, either by regular expression, e.g. persistent://public/default/finance-.*, or by configuring an explicit list of topics.

A regular topic is served by a single broker, which limits its maximum throughput. A partitioned topic is a special type of topic served by multiple brokers, allowing higher throughput. Subscription modes work identically on partitioned and ordinary topics. The number of partitions is specified when the topic is created.

Message routing modes

When publishing to a partitioned topic, a routing mode must be specified; it decides which partition each message goes to. There are three routing modes, and the default is round robin, similar to Kafka.

Mode | Description
RoundRobinPartition | If no key is provided, messages are published to all partitions in round-robin fashion to maximize throughput. This is the default mode. If a key is provided, the key is hashed and the message is assigned to a particular partition.
SinglePartition | If no key is provided, the producer picks one partition at random and publishes all messages to it. If a key is provided, the key is hashed (JavaStringHash by default; Murmur3_32Hash is recommended when multiple client languages are involved) and the message is assigned to the corresponding partition.
CustomPartition | A user-supplied message router decides the partition for each message; on the Java client, implement the MessageRouter interface to provide a custom routing mode (see the sketch below).
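A minimal sketch of a custom router with the Java client; the routing logic (modulo over the key hash) and the topic name are illustrative assumptions:

```java
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageRouter;
import org.apache.pulsar.client.api.MessageRoutingMode;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.TopicMetadata;

public class CustomRouterExample {
    // Route each message to a partition derived from its key
    static class KeyModRouter implements MessageRouter {
        @Override
        public int choosePartition(Message<?> msg, TopicMetadata metadata) {
            String key = msg.getKey();
            if (key == null) {
                return 0; // keyless messages all land on partition 0
            }
            return Math.floorMod(key.hashCode(), metadata.numPartitions());
        }
    }

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650").build();
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/demo-partitioned")
                .messageRoutingMode(MessageRoutingMode.CustomPartition)
                .messageRouter(new KeyModRouter())
                .create();
        producer.newMessage().key("order-7").value("payload".getBytes()).send();
        producer.close();
        client.close();
    }
}
```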

Subscription modes (subscription)

Pulsar has four subscription modes: exclusive, failover, shared, and key_shared.

Exclusive (exclusive)

Exclusive mode: only one consumer may attach to a subscription on a topic; any additional consumer gets an error.

In exclusive mode, only one consumer is allowed to use a given subscription on a topic. If multiple consumers subscribe to the same topic with the same subscription, an error occurs. Exclusive is the default subscription mode. As the figure below shows, Consumer A-0 and Consumer A-1 both use the same subscription (the same consumer group), but only Consumer A-0 is allowed to consume messages.

[Figure: exclusive subscription — only Consumer A-0 receives messages]

Failover (failover)

Failover mode: multiple consumers attach to the same subscription and are sorted by consumer name. The first consumer is the only one that receives messages (the master consumer); when it disconnects, all subsequent messages go to the next consumer in line.

In failover mode, multiple consumers are allowed to use the same subscription on a topic, but for a given topic the broker selects one consumer as the master consumer; the others become failover consumers. When the master consumer disconnects, the topic is reassigned to one of the failover consumers, which becomes the new master, and all unacknowledged messages are delivered to it. The process is similar to consumer-group rebalancing in Kafka.

As the figure below shows, Consumer B-0 is the master consumer of the topic. When Consumer B-0 disconnects, Consumer B-1 becomes the new master consumer.

[Figure: failover subscription — Consumer B-0 is master; Consumer B-1 takes over on disconnect]

Shared (shared)

Shared mode: multiple consumers subscribe to the same topic, messages are distributed among the consumers in round-robin fashion, and a given message is delivered to only one consumer. When a consumer disconnects, all unacknowledged messages sent to it are rescheduled to other consumers.

In shared mode, multiple consumers can use the same subscription on a topic. Messages are distributed to consumers in round-robin fashion, and each message is delivered to only one consumer. When a consumer disconnects, all unacknowledged messages sent to it are rescheduled for delivery to the remaining consumers on the subscription.

Note, however, that message ordering is not guaranteed in shared mode, and cumulative acknowledgement is not supported.

As the figure below shows, Consumer C-1, Consumer C-2, and Consumer C-3 receive messages in round-robin fashion.

[Figure: shared subscription — messages distributed round-robin across Consumers C-1, C-2, C-3]

Key shared (key_shared)

key_shared mode: multiple consumers subscribe to the same topic, messages are distributed across the consumers by key, messages with the same key are delivered to the same consumer, and when a consumer disconnects, the keys it served are reassigned to another consumer.

In key_shared mode, multiple consumers can use the same subscription on a topic. Messages are distributed to consumers according to their key, and messages carrying the same key are always delivered to the same consumer.

As the figure below shows, each consumer receives only the messages whose keys map to it.

[Figure: key_shared subscription — messages with the same key go to the same consumer]
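The subscription mode is chosen when building the consumer. A minimal Java sketch (topic, subscription name, and consumer count are assumptions):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class SubscriptionModesExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650").build();
        // Three consumers on one subscription; with Key_Shared, all messages
        // carrying the same key are delivered to the same consumer.
        for (int i = 0; i < 3; i++) {
            final int n = i;
            Consumer<byte[]> consumer = client.newConsumer()
                    .topic("persistent://public/default/demo-topic")
                    .subscriptionName("demo-sub")
                    // Exclusive | Failover | Shared | Key_Shared
                    .subscriptionType(SubscriptionType.Key_Shared)
                    .subscribe();
            consumer.receiveAsync().thenAccept(m -> System.out.println(
                    "consumer-" + n + " got key=" + m.getKey()));
        }
    }
}
```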

Pulsar architecture and principles

Architecture

At the highest level, a Pulsar instance consists of one or more Pulsar clusters, and the clusters within an instance can replicate data to one another. Within a Pulsar cluster, one or more brokers handle and load-balance incoming messages from producers, dispatch messages to consumers, and communicate with the Pulsar configuration store to handle various coordination tasks. A Pulsar cluster therefore comprises one or more brokers, ZooKeeper for cluster-level configuration and coordination, and BookKeeper for persistent message storage; clusters can replicate to one another using geo-replication.

[Figure: Pulsar cluster architecture — brokers, ZooKeeper, BookKeeper, geo-replication]

Pulsar components

Broker

Pulsar's broker is a stateless component that stores no data itself. It is mainly responsible for handling producer and consumer requests, message replication and dispatch, and data computation; the broker can be thought of as the serving instance of Pulsar itself.

It consists of two main parts:

  1. An HTTP server exposing a REST API for administrative tasks and topic lookup, used by producers and consumers;

  2. A dispatcher, an asynchronous TCP server that uses a custom binary protocol for all data transfers.

Each cluster has its own local ZooKeeper, used to store cluster-specific configuration and coordination state such as ownership metadata, broker load reports, and BookKeeper ledger metadata.

Pulsar uses BookKeeper for persistent message storage. BookKeeper is a distributed write-ahead log (WAL) system, which gives Pulsar several advantages:
• Pulsar can use many independent logs, called ledgers; multiple ledgers can be created per topic over time.
• It offers very efficient storage for sequential, replicated data.
• It guarantees read consistency of ledgers in the presence of various system failures.
• It provides even I/O distribution across bookies.
• It scales horizontally in both capacity and throughput; capacity can be increased simply by adding more bookies to the cluster.
• Bookies are designed to handle thousands of ledgers with concurrent reads and writes; by using separate disk devices (one for the journal, another for general storage), bookies isolate read operations from the latency of ongoing writes.
• Besides message data, consumers' subscription positions (cursors) are also persisted in BookKeeper.

Each topic partition is assigned to a broker, and producers and consumers connect to that broker to send and consume messages for the partition. Brokers are mainly responsible for message replication and dispatch and for data computation.

[Figure: topic partitions assigned across brokers]

ZooKeeper

ZooKeeper is mainly used for storing metadata, cluster configuration, and task coordination (e.g. which broker is responsible for which topic), and for service discovery (e.g. brokers discovering the addresses of bookies).

BookKeeper

BookKeeper is mainly used for persistent data storage. Besides message data, cursors are also persisted to BookKeeper; a cursor is the consumption position of a subscription. Each storage node in BookKeeper is called a bookie.

BookKeeper is a scalable, highly fault-tolerant, low-latency storage service optimized for real-time workloads. An enterprise-grade real-time storage platform should meet the following requirements:

  • Read and write entry streams with very low latency (under 5 milliseconds)
  • Store data durably, consistently, and fault-tolerantly
  • Support tailing reads — streaming entries as they are being written
  • Store and serve both historical and real-time data efficiently

[Figure: BookKeeper architecture]

Data storage

Data partitioning

The data written to a topic may amount to only a few MB, or to several TB, and topic throughput can likewise be very low or very high depending on the workload. So how do you handle some topics with very high throughput alongside others with very low? Pulsar's answer is to spread a topic's data across multiple machines, which is called partitioning.

When processing massive amounts of data, partitioning is a common way to guarantee high throughput. By default, Pulsar topics are not partitioned, but you can easily create partitioned topics with the command-line tools or the API and specify the number of partitions, as shown below.
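A minimal sketch using the Java admin API (admin URL, topic name, and partition count are assumptions):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class PartitionedTopicExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080").build();
        // Create a topic with 4 partitions; clients keep using the base
        // name and the client library spreads messages across partitions.
        admin.topics().createPartitionedTopic(
                "persistent://public/default/demo-partitioned", 4);
        admin.close();
    }
}
```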

After a partitioned topic is created, Pulsar partitions the data automatically without affecting producers or consumers: an application writes to the topic, and once the topic is partitioned, no application code needs to change. Partitioning is purely an operational concern; the application does not need to care how it is done.

Topic partitioning is handled by a process called the broker; every node in a Pulsar cluster runs its own broker.

[Figure: a topic's partitions spread across brokers]

Data persistence

After a Pulsar broker receives and acknowledges a message, it must guarantee that the message is never lost under any circumstances. Unlike some other messaging systems, Pulsar uses Apache BookKeeper for durability. BookKeeper provides low-latency persistent storage: when Pulsar receives a message, it sends it to multiple BookKeeper nodes (the number is determined by the replication factor), which append the data to a write-ahead log and also keep a copy in memory. Each node forces the log to persistent storage before acknowledging the message, so data survives even a power failure. Because the broker writes to multiple nodes, it sends the producer an acknowledgement only after a quorum of nodes has confirmed the write. This is how Pulsar avoids losing data in the face of hardware failures, network failures, or other faults. We will dig into the details in later articles.

Pulsar installation and deployment

Environmental preparation

Pulsar project official website: http://pulsar.apache.org/

Pulsar official download address: http://archive.apache.org/dist/pulsar/

This deployment uses the latest version: http://archive.apache.org/dist/pulsar/pulsar-2.7.2/apache-pulsar-2.7.2-bin.tar.gz

JDK official website download address: https://www.oracle.com/cn/java/technologies/javase/javase-jdk8-downloads.html

Machine planning

Building a Pulsar cluster requires at least three components: a ZooKeeper cluster, a BookKeeper cluster, and a broker cluster (the broker is the Pulsar service instance itself). The three components are as follows:

ZooKeeper cluster (composed of 3 ZooKeeper nodes)
BookKeeper (bookie) cluster (composed of 3 BookKeeper nodes)
broker cluster (composed of 3 Pulsar nodes)

In terms of components, at least 3 machines are needed if components share machines. The official recommendation is 6 machines (3 dedicated to ZooKeeper, plus 3 hosting both bookies and brokers); fully isolating every component takes 9 machines. This walkthrough is an experimental deployment, so 3 machines are enough. The machine count only changes which hosts run which services; it does not affect the deployment steps or their order.

[Note:] Pulsar's installation package already contains the component libraries needed to build a cluster; there is no need to download separate ZooKeeper or BookKeeper packages.

Machine application

Prepare 3 machines with a clean OS. This experiment uses Alibaba Cloud instances (pay-as-you-go, released after use); the procedure is the same for a test or production environment. In production, use more machines and deploy the components separately.

  1. Prepare 3 machines with a clean environment, or request 3 Alibaba Cloud hosts (4C8G, CentOS 7):
    172.23.118.214 pulsar01
    172.23.118.215 pulsar02
    172.23.118.216 pulsar03

Preparation

[Note:] This walkthrough uses the root user; non-root users can run privileged operations with sudo.

1. Change the server host names

172.23.118.214 -> hostnamectl set-hostname pulsar01
172.23.118.215 -> hostnamectl set-hostname pulsar02
172.23.118.216 -> hostnamectl set-hostname pulsar03

2. Add host entries to /etc/hosts on all 3 machines to simplify later steps.

[root@pulsar01 ~]# vim /etc/hosts

::1     localhost       localhost.localdomain   localhost6      localhost6.localdomain6
127.0.0.1       localhost       localhost.localdomain   localhost4      localhost4.localdomain4

# pulsar
172.23.118.214   pulsar01
172.23.118.215   pulsar02
172.23.118.216   pulsar03

3. Configure passwordless SSH, which makes it easy to transfer directories, files, and installation packages

[root@pulsar01 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:56uK0ddMedu48pTC639dlwE7bIJLOyqkSQttJEhPrNc root@pulsar01
The key's randomart image is:
+---[RSA 2048]----+
|  .              |
| . o         .   |
|o + .     . . o  |
|.o + E   o o = . |
|  =     S * + . o|
| . + o   @ . = .o|
|  + * . o B = ..o|
|   + + o  .= .. .|
|    . o..o+++.   |
+----[SHA256]-----+
[root@pulsar01 ~]# 
[root@pulsar01 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
[root@pulsar01 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
[root@pulsar01 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
# test
[root@pulsar01 ~]# for i in pulsar01 pulsar02 pulsar03;do echo "=== $i ===" && ssh $i hostname;done
=== pulsar01 ===
pulsar01
=== pulsar02 ===
pulsar02
=== pulsar03 ===
pulsar03

4. Prepare the installation packages

# Create working directories: module for installed applications, software for installation packages
[root@pulsar01 ~]# for i in pulsar01 pulsar02 pulsar03;do ssh $i mkdir -p /opt/{module,software};done
[root@pulsar01 ~]# ls /opt/
module  software
# Upload or wget the installation packages into /opt/software/
[root@pulsar01 ~]# cd /opt/software/
[root@pulsar01 software]# ll
total 490480
-rw-r--r-- 1 root root 307228973 May 29 09:04 apache-pulsar-2.7.2-bin.tar.gz
-rw-r--r-- 1 root root 195013152 May 29 09:05 jdk-8u212-linux-x64.tar.gz
[root@pulsar01 software]# scp * pulsar02:/opt/software/ 
[root@pulsar01 software]# scp * pulsar03:/opt/software/

Installation and deployment

1. Install JDK 8

Install the JDK on all 3 servers (version 8 or later is required).

This deployment uses: jdk-8u212-linux-x64.tar.gz

# Install and configure the JDK on all cluster machines
[root@pulsar01 ~]# cd /opt/software/
[root@pulsar01 software]# tar -xf jdk-8u212-linux-x64.tar.gz -C /opt/module/
[root@pulsar01 software]# cd /opt/module/jdk1.8.0_212/
[root@pulsar01 jdk1.8.0_212]# pwd
/opt/module/jdk1.8.0_212
# Add the Java environment variables
[root@pulsar01 jdk1.8.0_212]# vim /etc/profile
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=:$JAVA_HOME/lib/
export PATH JAVA_HOME CLASSPATH
[root@pulsar01 jdk1.8.0_212]#
# Source the profile to apply the environment variables
[root@pulsar01 jdk1.8.0_212]# source /etc/profile
# Verify that it took effect
[root@pulsar01 jdk1.8.0_212]# java -version
java version "1.8.0_212"
Java(TM) SE Runtime Environment (build 1.8.0_212-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.212-b11, mixed mode)
[root@pulsar01 jdk1.8.0_212]# echo $JAVA_HOME
/opt/module/jdk1.8.0_212

# pulsar02 | pulsar03: same steps

2. Unzip pulsar and configure environment variables

# Extract into /opt/module
[root@pulsar01 software]# tar -xf apache-pulsar-2.7.2-bin.tar.gz -C /opt/module/
[root@pulsar01 software]# cd /opt/module/
[root@pulsar01 module]# ll
total 8
drwxr-xr-x 8 root root 4096 May 29 09:18 apache-pulsar-2.7.2
drwxr-xr-x 7   10  143 4096 Apr  2  2019 jdk1.8.0_212
# Rename the application directory for easier management
[root@pulsar01 module]# mv apache-pulsar-2.7.2 pulsar
[root@pulsar01 module]# cd pulsar/
# Directory layout
[root@pulsar01 pulsar]# ll
total 84
drwxr-xr-x 3  501 games  4096 May  3 20:07 bin		# Pulsar command-line tools such as pulsar and pulsar-admin
drwxr-xr-x 5  501 games  4096 May  3 20:07 conf		# Configuration files for ZooKeeper, BookKeeper, Pulsar, etc.
drwxr-xr-x 3 root root   4096 May 27 16:12 examples
drwxr-xr-x 4 root root   4096 May 27 16:12 instances
drwxr-xr-x 3 root root  20480 May 27 16:13 lib		# JAR files used by Pulsar
-rw-r--r-- 1  501 games 31556 May  3 20:07 LICENSE
drwxr-xr-x 2  501 games  4096 May  3 20:07 licenses
-rw-r--r-- 1  501 games  6599 May  3 20:07 NOTICE
-rw-r--r-- 1  501 games  1269 May  3 20:03 README
[root@pulsar01 pulsar]# 
# Add the Pulsar environment variables
[root@pulsar01 pulsar]# vim /etc/profile
#PULSAR_HOME
export PULSAR_HOME=/opt/module/pulsar
export PATH=$PATH:$PULSAR_HOME/bin
# Apply the changes
[root@pulsar01 pulsar]# source /etc/profile

# pulsar02 | pulsar03: same steps

3. Configure ZooKeeper

# Edit the ZooKeeper configuration file with the required settings
[root@pulsar01 pulsar]# cd conf/
# Append the following at the end of the configuration file; adjust the IPs to your machines
[root@pulsar01 conf]# vim zookeeper.conf

server.1=172.23.118.214:2888:3888
server.2=172.23.118.215:2888:3888
server.3=172.23.118.216:2888:3888

# The resulting configuration
[root@pulsar01 conf]# cat zookeeper.conf |grep -vE "^$|^#"
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data/zookeeper
clientPort=2181
admin.enableServer=true
admin.serverPort=9990
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
forceSync=yes
server.1=172.23.118.214:2888:3888
server.2=172.23.118.215:2888:3888
server.3=172.23.118.216:2888:3888

# Per the dataDir=data/zookeeper setting, create the corresponding directory (a relative path under the Pulsar application directory)
[root@pulsar01 conf]# cd /opt/module/pulsar/
[root@pulsar01 pulsar]# mkdir -p data/zookeeper
[root@pulsar01 pulsar]# cd data/zookeeper/
# Each ZooKeeper node's ID must be unique and match its server.N entry; N is 1, 2, 3 (pulsar01->1 / pulsar02->2 / pulsar03->3)
[root@pulsar01 zookeeper]# echo 1 > myid
[root@pulsar01 zookeeper]# pwd
/opt/module/pulsar/data/zookeeper
[root@pulsar01 zookeeper]# ll
total 4
-rw-r--r-- 1 root root 2 May 27 16:37 myid
[root@pulsar01 zookeeper]# cat myid 
1
[root@pulsar01 zookeeper]# 
############################
# [pulsar02]
[root@pulsar02 conf]# cd /opt/module/pulsar/
[root@pulsar02 pulsar]# mkdir -p data/zookeeper
[root@pulsar02 ~]# cd /opt/module/pulsar/data/zookeeper/
[root@pulsar02 zookeeper]# ls
myid
# Change the content of myid to 2
[root@pulsar02 zookeeper]# vim myid 
2
[root@pulsar02 pulsar]# 

############################
# [pulsar03]
[root@pulsar03 conf]# cd /opt/module/pulsar/
[root@pulsar03 pulsar]# mkdir -p data/zookeeper
[root@pulsar03 software]# cd /opt/module/pulsar/data/zookeeper/
[root@pulsar03 zookeeper]# ls
myid
# Change the content of myid to 3
[root@pulsar03 zookeeper]# vim myid 
3
[root@pulsar03 pulsar]# 

4. Start ZooKeeper

# pulsar01
[root@pulsar01 zookeeper]# pulsar-daemon start zookeeper
[root@pulsar01 zookeeper]# jps
20237 ZooKeeperStarter
20415 Jps
[root@pulsar01 zookeeper]# netstat -tnlpu|grep 20237
tcp        0      0 172.23.118.214:3888      0.0.0.0:*               LISTEN      20237/java          
tcp        0      0 0.0.0.0:8000            0.0.0.0:*               LISTEN      20237/java          
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN      20237/java          
tcp        0      0 0.0.0.0:9990            0.0.0.0:*               LISTEN      20237/java  

# pulsar02
[root@pulsar02 zookeeper]# pulsar-daemon start zookeeper
[root@pulsar02 zookeeper]# jps
20257 Jps
20071 ZooKeeperStarter
[root@pulsar02 zookeeper]# netstat -tnlpu|grep 20071
tcp        0      0 172.23.118.215:3888      0.0.0.0:*               LISTEN      20071/java          
tcp        0      0 0.0.0.0:8000            0.0.0.0:*               LISTEN      20071/java          
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN      20071/java          
tcp        0      0 0.0.0.0:9990            0.0.0.0:*               LISTEN      20071/java          
tcp        0      0 172.23.118.215:2888      0.0.0.0:*               LISTEN      20071/java  

# pulsar03
[root@pulsar03 zookeeper]# pulsar-daemon start zookeeper
[root@pulsar03 zookeeper]# jps
10870 ZooKeeperStarter
20250 Jps
[root@pulsar03 zookeeper]# netstat -tnlpu|grep 10870
tcp        0      0 172.23.118.216:3888      0.0.0.0:*               LISTEN      10870/java          
tcp        0      0 0.0.0.0:8000            0.0.0.0:*               LISTEN      10870/java          
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN      10870/java          
tcp        0      0 0.0.0.0:9990            0.0.0.0:*               LISTEN      10870/java 

5. Initialize cluster metadata

After the ZooKeeper cluster has started successfully, some Pulsar cluster metadata must be written into ZooKeeper. Because ZooKeeper replicates data across the ensemble, the metadata only needs to be written to one ZooKeeper node:

# Note: run this on one machine only
[root@pulsar01 zookeeper]# pulsar initialize-cluster-metadata \
  --cluster pulsar-cluster-1 \
  --zookeeper 172.23.118.214:2181 \
  --configuration-store 172.23.118.214:2181 \
  --web-service-url http://172.23.118.214:8080,172.23.118.215:8080,172.23.118.216:8080 \
  --broker-service-url pulsar://172.23.118.214:6650,172.23.118.215:6650,172.23.118.216:6650
  
  ...
  ....
09:35:18.340 [main] INFO  org.apache.bookkeeper.stream.storage.impl.cluster.ZkClusterInitializer - Successfully initialized the stream cluster : 
num_storage_containers: 16
09:35:18.341 [Curator-Framework-0] INFO  org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
09:35:18.447 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x100002585920003 closed
09:35:18.447 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100002585920003
09:35:18.674 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x100002585920000 closed
09:35:18.674 [main-EventThread] WARN  org.apache.pulsar.zookeeper.ZookeeperClientFactoryImpl - Unexpected ZK event received: WatchedEvent state:Closed type:None path:null
09:35:18.674 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100002585920000
09:35:18.776 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x100002585920001 closed
09:35:18.776 [main-EventThread] WARN  org.apache.pulsar.zookeeper.ZookeeperClientFactoryImpl - Unexpected ZK event received: WatchedEvent state:Closed type:None path:null
09:35:18.776 [main] INFO  org.apache.pulsar.PulsarClusterMetadataSetup - Cluster metadata for 'pulsar-cluster-1' setup correctly
09:35:18.776 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100002585920001

Parameter description is as follows:

Parameter | Description
--cluster | Name of the Pulsar cluster
--zookeeper | ZooKeeper address; any one machine in the ZooKeeper ensemble suffices
--configuration-store | Configuration store address; any one machine in the ZooKeeper ensemble suffices
--web-service-url | URL of the Pulsar cluster's web service; the default port is 8080
--broker-service-url | URL of the broker service, used by clients to interact with the cluster's brokers; the default port is 6650
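Once the brokers are up (step 8 below), clients connect using these same URLs. A minimal Java sketch against the addresses registered above; the multi-host service URL lets the client fail over between brokers:

```java
import org.apache.pulsar.client.api.PulsarClient;

public class ClusterConnectExample {
    public static void main(String[] args) throws Exception {
        // The same broker service URL that was registered during initialization
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://172.23.118.214:6650,172.23.118.215:6650,172.23.118.216:6650")
                .build();
        System.out.println("client created against the cluster");
        client.close();
    }
}
```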

6. Verify the initialized metadata

Connect with the ZooKeeper shell client to check the initialization:

[root@pulsar01 zookeeper]# pulsar zookeeper-shell
Connecting to localhost:2181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
ls /
[admin, bookies, ledgers, managed-ledgers, namespace, stream, zookeeper]
ls /namespace
[]
ls /admin
[clusters, partitioned-topics, policies]
ls /admin/clusters
[global, pulsar-cluster-1]
quit

WATCHER::
WatchedEvent state:Closed type:None path:null
11:13:48.699 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1000010dffe0004
11:13:48.699 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x1000010dffe0004 closed

[Note 1:] After pressing Enter you are in the shell, with no prompt at the start of the line. You can use the usual ZooKeeper commands such as ls and get; use quit to exit.

[Note 2:] If the initialization fails, delete the following two paths in ZooKeeper, then troubleshoot and run the initialization again.

# zk
/namespace
/admin/clusters/pulsar-cluster-1

7. Deploy the BookKeeper cluster

Modify the BookKeeper configuration file

# Set the zkServers parameter [edit on all 3 machines]
[root@pulsar01 conf]# vim bookkeeper.conf 
zkServers=172.23.118.214:2181,172.23.118.215:2181,172.23.118.216:2181

Create the directory required by the bookies

# pulsar01, 02, 03
[root@pulsar01 conf]# cd /opt/module/pulsar/data/
[root@pulsar01 data]# mkdir bookkeeper

Run the metadata initialization command; if prompted, enter Y to continue (this only needs to be run once, on a single bookie node):

[root@pulsar01 data]# bookkeeper shell metaformat
JMX enabled by default
...
...
11:19:50.778 [main-EventThread] INFO  org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now.
Ledger root already exists. Are you sure to format bookkeeper metadata? This may cause data loss. (Y or N) Y
11:19:56.614 [main] INFO  org.apache.bookkeeper.discover.ZKRegistrationManager - Successfully formatted BookKeeper metadata
11:19:56.718 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x3000010a3350000 closed
11:19:56.718 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x3000010a3350000
[root@pulsar01 data]# 

Start BookKeeper

[root@pulsar01 conf]# pulsar-daemon start bookie
doing start bookie ...
starting bookie, logging to /opt/module/pulsar/logs/pulsar-bookie-pulsar01.log
Note: Set immediateFlush to true in conf/log4j2.yaml will guarantee the logging event is flushing to disk immediately. The default behavior is switched off due to performance considerations.

############################################
[root@pulsar02 conf]# pulsar-daemon start bookie
doing start bookie ...
starting bookie, logging to /opt/module/pulsar/logs/pulsar-bookie-pulsar02.log
Note: Set immediateFlush to true in conf/log4j2.yaml will guarantee the logging event is flushing to disk immediately. The default behavior is switched off due to performance considerations.

############################################
[root@pulsar03 zookeeper]# pulsar-daemon start bookie
doing start bookie ...
starting bookie, logging to /opt/module/pulsar/logs/pulsar-bookie-pulsar03.log
Note: Set immediateFlush to true in conf/log4j2.yaml will guarantee the logging event is flushing to disk immediately. The default behavior is switched off due to performance considerations.

Node startup information is stored in a VERSION file

[root@pulsar01 pulsar]# cd /opt/module/pulsar/data/bookkeeper/ledgers/current
[root@pulsar01 current]# cat VERSION 
4
bookieHost: "172.23.118.214:3181"
journalDir: "data/bookkeeper/journal"
ledgerDirs: "1\tdata/bookkeeper/ledgers"
instanceId: "f3d45b7f-f73a-4ded-acdc-3c2bad9e8311"

Verify cluster status

On any BookKeeper node, use the BookKeeper shell's simpletest command to verify that all bookies in the cluster have started; 3 is the number of BookKeeper nodes.

[root@pulsar01 pulsar]# bookkeeper shell simpletest --ensemble 3 --writeQuorum 3 --ackQuorum 3 --numEntries 3

The parameter meanings are as follows:

-a, --ackQuorum | Ack quorum size (default 2): a write is considered successful once this many bookies have acked it
-e, --ensemble | Ensemble size (default 3): the number of bookie nodes the data is written across
-n, --numEntries | Number of entries to write (default 1000)
-w, --writeQuorum | Write quorum size (default 2): the number of replicas of each entry

This command creates a ledger across the specified number of bookies, writes some entries into it, reads them back, and finally deletes the ledger.

8. Deploy the broker cluster

Modify the configuration file broker.conf

# pulsar01 pulsar02 pulsar03
[root@pulsar01 conf]# vim broker.conf 
# Configure the ZooKeeper cluster addresses that the Pulsar brokers connect to
zookeeperServers=172.23.118.214:2181,172.23.118.215:2181,172.23.118.216:2181
configurationStoreServers=172.23.118.214:2181,172.23.118.215:2181,172.23.118.216:2181
clusterName=pulsar-cluster-1

Start the Pulsar cluster

[root@pulsar01 software]# pulsar-daemon start broker
[root@pulsar02 software]# pulsar-daemon start broker
[root@pulsar03 software]# pulsar-daemon start broker

Check the broker nodes in the cluster:

[root@pulsar03 conf]# pulsar-admin brokers list pulsar-cluster-1
"pulsar01:8080"
"pulsar02:8080"
"pulsar03:8080"

9. Configure the client to connect to the Pulsar cluster

# pulsar01 pulsar02 pulsar03
[root@pulsar01 conf]# vim client.conf

webServiceUrl=http://172.23.118.214:8080,172.23.118.215:8080,172.23.118.216:8080
# URL for Pulsar Binary Protocol (for produce and consume operations)
# For TLS:
# brokerServiceUrl=pulsar+ssl://localhost:6651/
brokerServiceUrl=pulsar://172.23.118.214:6650,172.23.118.215:6650,172.23.118.216:6650

10. Verify producing and consuming from the command line

Start a consumer:

[root@pulsar01 conf]# pulsar-client consume \
  persistent://public/default/pulsar-test \
  -n 100 \
  -s "consumer-test" \
  -t "Exclusive"

In another window, produce a message:

[root@pulsar01 conf]# pulsar-client produce \
  persistent://public/default/pulsar-test \
  -n 1 \
  -m "Hello Pulsar"

Watch the consumer console output; if it prints content:Hello Pulsar, the round trip is complete. The sketch below shows the same verification using the Java client.
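A minimal sketch, using the same topic, subscription name, and broker URLs as the pulsar-client commands above:

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class HelloPulsar {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://172.23.118.214:6650,172.23.118.215:6650,172.23.118.216:6650")
                .build();
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/pulsar-test")
                .subscriptionName("consumer-test")
                .subscriptionType(SubscriptionType.Exclusive)
                .subscribe();
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/pulsar-test")
                .create();

        producer.send("Hello Pulsar".getBytes());
        Message<byte[]> msg = consumer.receive();
        System.out.println("content: " + new String(msg.getValue()));
        consumer.acknowledge(msg);

        producer.close();
        consumer.close();
        client.close();
    }
}
```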

11. pulsar-dashboard

Install apachepulsar/pulsar-dashboard with Docker.

# Install docker
[root@pulsar01 conf]# yum install -y docker
# Start docker and enable it at boot
[root@pulsar01 conf]# systemctl start docker && systemctl enable docker
# Check whether port 80 is already in use
[root@pulsar01 conf]# netstat -tnlpu|grep 80
tcp        0      0 0.0.0.0:8000            0.0.0.0:*               LISTEN      1882/java           
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      13468/java          
udp        0      0 0.0.0.0:68              0.0.0.0:*                           809/dhclient        
# Run apachepulsar/pulsar-dashboard (replace PULSARSEVERIP with a broker's web-service address)
[root@pulsar01 conf]# docker run --name pulsar-dashboard -dit -p 80:80 -e SERVICE_URL=http://PULSARSEVERIP:8080 apachepulsar/pulsar-dashboard
# Check that the container is running
[root@pulsar01 conf]# docker ps
CONTAINER ID        IMAGE                           COMMAND              CREATED             STATUS              PORTS                NAMES
ed6e1ec3da05        apachepulsar/pulsar-dashboard   "/pulsar/start.sh"   47 seconds ago      Up 44 seconds       0.0.0.0:80->80/tcp   pulsar-dashboard
[root@pulsar01 conf]# netstat -tnlpu|grep 80
tcp        0      0 0.0.0.0:8000            0.0.0.0:*               LISTEN      1882/java           
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      13468/java          
tcp6       0      0 :::80                   :::*                    LISTEN      14877/docker-proxy- 
udp        0      0 0.0.0.0:68              0.0.0.0:*                           809/dhclient        

[Figure: Pulsar dashboard web UI]


Origin: blog.csdn.net/wt334502157/article/details/117414153