DAY 75 [Distributed Application] Kafka + EFLFK Cluster Deployment of Message Queue

Official download address for Apache software packages: archive.apache.org/dist/

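Note: Kafka 2.8 and later can run without Zookeeper (KRaft mode, production-ready since 3.3), and Kafka 4.0 removed Zookeeper support entirely. The Kafka 2.7.1 release deployed below still requires Zookeeper.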

Zookeeper overview

Official download address: https://archive.apache.org/dist/zookeeper/

 Zookeeper definition

Zookeeper is an open-source distributed Apache project that provides coordination services for distributed frameworks.

Zookeeper working mechanism

From a design-pattern perspective, Zookeeper is a distributed service management framework built on the observer pattern. It stores and manages the data that everyone cares about and accepts registrations from observers. Once the state of that data changes, Zookeeper notifies the observers registered with it so that they can react accordingly.

That is to say, Zookeeper = file system + notification mechanism.

Zookeeper Features

(1) A Zookeeper cluster consists of one leader (Leader) and multiple followers (Follower).

(2) As long as more than half of the nodes in the cluster survive, the Zookeeper cluster can serve normally. For this reason Zookeeper is best installed on an odd number of servers.

(3) Global data consistency: each server saves a copy of the same data, so the data is consistent no matter which server the client connects to.

(4) Update requests are executed sequentially: update requests from the same Client are executed in the order in which they were sent, that is, first in, first out.

(5) Atomic data updates: a data update either succeeds completely or fails completely.

(6) Timeliness: within a certain time range, the Client can read the latest data.
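
The majority rule in (2) comes down to a little arithmetic. The sketch below is only an illustration; the function names quorum_size and tolerated_failures are made up for this example and are not part of Zookeeper.

```shell
# Smallest number of live nodes that still forms a majority of n servers.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of nodes that may fail while the cluster keeps serving.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5 6; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A 4-node cluster tolerates one failure, the same as a 3-node cluster, which is why an even node count adds cost without adding fault tolerance.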

  Zookeeper data structure

The ZooKeeper data model is structured much like the Linux file system: as a whole it can be regarded as a tree, and each node is called a ZNode. Each ZNode can store up to 1 MB of data by default, and each ZNode is uniquely identified by its path.

Zookeeper application scenarios

The services provided include: unified naming service, unified configuration management, unified cluster management, dynamic online/offline detection of server nodes, soft load balancing, etc.

 Unified Naming Service

In a distributed environment, applications and services often need uniform names so that they are easy to identify. For example, an IP address is hard to remember, but a domain name is easy to remember.

 Unified configuration management

(1) In a distributed environment, configuration file synchronization is very common. It is generally required that the configuration of all nodes in a cluster be consistent, as in a Kafka cluster. After a configuration file is modified, it should be synchronized to every node quickly.

(2) Configuration management can be implemented with ZooKeeper. Configuration information is written to a ZNode on ZooKeeper, and each client server watches this ZNode. Once the data in the ZNode is modified, ZooKeeper notifies each client server.

 Unified cluster management

(1) In a distributed environment, it is necessary to know the status of each node in real time. Some adjustments can be made according to the real-time status of the nodes.

(2) ZooKeeper can realize real-time monitoring of node status changes. Node information can be written to a ZNode on ZooKeeper. Listen to this ZNode to get its real-time status changes

Dynamic server online/offline detection

Clients can see in real time when servers go online or offline.

Soft load balancing

Record the number of visits of each server in Zookeeper, and let the server with the fewest visits handle the latest client requests.
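
The selection step of soft load balancing can be sketched as follows. This is a minimal illustration that assumes the visit counts have already been read out of Zookeeper into lines of "server count"; that input format and the function name least_loaded are assumptions made for the example.

```shell
# Pick the server with the fewest recorded visits.
# Input: lines of "<server> <visit_count>" on stdin (hypothetical format).
least_loaded() {
    awk 'NR == 1 || $2 + 0 < min { min = $2 + 0; best = $1 } END { print best }'
}

printf '%s\n' \
    "192.168.137.10 42" \
    "192.168.137.15 17" \
    "192.168.137.20 30" | least_loaded
```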

Zookeeper election mechanism

Election mechanism on first startup

Suppose there are 5 servers:

(1) Server 1 starts and initiates an election. Server 1 votes for itself. At this time, server 1 has one vote, less than half (3 votes), the election cannot be completed, and the status of server 1 remains LOOKING;

(2) Server 2 starts and initiates another election. Servers 1 and 2 each vote for themselves and exchange ballot information. Server 1 finds that the myid of server 2 is larger than that of its current choice (server 1), so it changes its vote to server 2. Server 1 now has 0 votes and server 2 has 2 votes. Neither has more than half of the votes, so the election cannot be completed, and servers 1 and 2 remain LOOKING;

(3) Server 3 starts and initiates an election. At this point, both servers 1 and 2 will change their votes to server 3. The result of this vote: Server 1 has 0 votes, Server 2 has 0 votes, and Server 3 has 3 votes. At this time, server 3 has more than half of the votes, and server 3 is elected as the leader. Servers 1 and 2 change the status to FOLLOWING, and server 3 changes the status to LEADING;

(4) Server 4 is started and an election is initiated. At this time, servers 1, 2, and 3 are no longer in the LOOKING state, and the ballot information will not be changed. The result of exchanging ballot information: Server 3 has 3 votes, and Server 4 has 1 vote. At this time, server 4 obeys the majority, changes the ballot information to server 3, and changes the state to FOLLOWING;

(5) Server 5 starts and, like server 4, becomes a follower.
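
The walkthrough above can be replayed with a small simulation. It is a simplified sketch, not Zookeeper's actual election code: it assumes the servers start one at a time in myid order and that every running server, while still LOOKING, ends up voting for the largest myid it has seen. The function name simulate_first_election is invented for the example.

```shell
# Simulate the first-startup election for a cluster of $1 servers.
simulate_first_election() {
    total=$1
    leader=0
    started=0
    while [ "$started" -lt "$total" ]; do
        started=$((started + 1))
        if [ "$leader" -eq 0 ]; then
            # All running servers are LOOKING and converge on the newest
            # (largest myid) server, which therefore holds $started votes.
            votes=$started
            if [ "$votes" -gt $((total / 2)) ]; then
                leader=$started
                echo "server $started starts: $votes votes, elected LEADING"
            else
                echo "server $started starts: $votes votes, still LOOKING"
            fi
        else
            echo "server $started starts: leader $leader exists, FOLLOWING"
        fi
    done
    echo "leader=$leader"
}

simulate_first_election 5
```

With 5 servers the last line printed is leader=3, matching step (3) above.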

Election mechanism on subsequent starts

1. A server in the ZooKeeper cluster enters Leader election in either of the following two situations:

(1) The server is initialized and started.

(2) The connection to the Leader cannot be maintained while the server is running.

2. When a machine enters the Leader election process, the cluster may be in one of the following two states:

(1) There is already a Leader in the cluster.

  • If a leader already exists, the machine is told who the current leader is when it tries to start an election. It then only needs to establish a connection with the leader and synchronize its state.

(2) There is no Leader in the cluster.

  • Suppose ZooKeeper consists of 5 servers, the SIDs are 1, 2, 3, 4, and 5, and the ZXIDs are 8, 8, 8, 7, and 7, and the server with SID 3 is the leader. At some point, servers 3 and 5 fail, so a Leader election begins among servers 1, 2, and 4. If all three are in the same epoch, the rules below elect server 2: its ZXID of 8 beats server 4's ZXID of 7, and its SID is larger than that of server 1.

  • Leader election rules:

    1. The larger EPOCH wins outright
    2. If the EPOCH is the same, the one with the larger transaction id wins
    3. If the transaction id is the same, the one with the larger server id wins
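
The three rules amount to sorting the candidates by (EPOCH, ZXID, SID) in descending order and taking the first one. The sketch below applies them to the surviving servers from the example. The epoch value 1 is an assumption, since the text does not give the epochs, and the function name pick_leader is made up for the illustration.

```shell
# Choose a leader from votes given as "<epoch> <zxid> <sid>" lines on stdin:
# larger epoch first, then larger ZXID, then larger SID.
pick_leader() {
    sort -k1,1nr -k2,2nr -k3,3nr | head -n 1 | awk '{ print $3 }'
}

# Survivors from the example: SIDs 1, 2, 4 with ZXIDs 8, 8, 7.
# The epoch (first column) is assumed to be 1 for all of them.
printf '%s\n' "1 8 1" "1 8 2" "1 7 4" | pick_leader
```

This prints 2: servers 1 and 2 tie on ZXID, and server 2 has the larger SID.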

Tips:

  • SID: server ID. It uniquely identifies a machine in a ZooKeeper cluster. It must not be repeated across machines, and it matches myid.
  • ZXID: transaction ID. It identifies a change of server state. At a given moment, the ZXID values of the machines in the cluster may not be exactly the same, depending on how quickly each ZooKeeper server has processed client update requests.
  • Epoch: the number of each Leader term. While there is no Leader, the logical clock value is the same for all servers within one round of voting. The number increases after each round of voting.

Deploy the Zookeeper cluster

Prepare 3 servers for the Zookeeper cluster

192.168.137.10
192.168.137.15
192.168.137.20

Turn off the firewall

# Run on all nodes
systemctl stop firewalld
systemctl disable firewalld
setenforce 0

 Install JDK

# Usually already present unless the OS was a minimal install
yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
java -version

 Download the installation package

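Official download address: https://archive.apache.org/dist/zookeeper/

cd /opt
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.6.3/apache-zookeeper-3.6.3-bin.tar.gz
tar -zxvf apache-zookeeper-3.6.3-bin.tar.gz
mv apache-zookeeper-3.6.3-bin /usr/local/zookeeper-3.6.3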

 Modify configuration files (all nodes)

 

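cd /usr/local/zookeeper-3.6.3/conf/
cp zoo_sample.cfg zoo.cfg
 
vim zoo.cfg
# Heartbeat interval between Zookeeper servers and clients, in milliseconds
tickTime=2000
# Maximum number of heartbeats (tickTime units) tolerated while a Follower first connects to the Leader; here 10*2s
initLimit=10
# Timeout for synchronization between Leader and Follower; if it exceeds 5*2s, the Leader considers the Follower dead and removes it from the server list
syncLimit=5
# Modify: directory where Zookeeper stores its data; it must be created separately
dataDir=/usr/local/zookeeper-3.6.3/data
# Add: directory where logs are stored; it must be created separately
dataLogDir=/usr/local/zookeeper-3.6.3/logs
# Client connection port
clientPort=2181
# Add the cluster information
server.1=192.168.137.10:3188:3288
server.2=192.168.137.15:3188:3288
server.3=192.168.137.20:3188:3288
# Port 3188 is used for communication between cluster nodes, port 3288 for leader election
-------------------------------------------------------------------------------------
 
server.A=B:C:D
● A is a number indicating which server this is. In cluster mode, a file named myid must be created in the directory specified by dataDir in zoo.cfg. The file contains the value of A. Zookeeper reads it at startup and compares it with the configuration in zoo.cfg to determine which server it is.
● B is the address of this server.
● C is the port through which this server, as a Follower, exchanges information with the Leader in the cluster.
● D is the port the servers use to communicate with each other during an election, for example to elect a new Leader if the current Leader goes down.

# Copy the configuration to the other nodes
scp zoo.cfg 192.168.137.15:`pwd`
scp zoo.cfg 192.168.137.20:`pwd`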
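
The initLimit and syncLimit values are counts of tickTime units, so the real time windows follow from a multiplication. The snippet below only works through that arithmetic for the values used in this zoo.cfg; the helper name limit_seconds is made up for the example.

```shell
# Convert a limit expressed in ticks into seconds.
# Arguments: tickTime in milliseconds, limit in ticks.
limit_seconds() {
    echo $(( $1 * $2 / 1000 ))
}

echo "initLimit window: $(limit_seconds 2000 10) s"
echo "syncLimit window: $(limit_seconds 2000 5) s"
```

With tickTime=2000 this gives 20 s for the initial connection and 10 s for synchronization.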

Create a data directory and a log directory on each node

mkdir /usr/local/zookeeper-3.6.3/data
mkdir /usr/local/zookeeper-3.6.3/logs

Create a myid file in the directory specified by dataDir on each node

# On 192.168.137.10
echo 1 > /usr/local/zookeeper-3.6.3/data/myid
# On 192.168.137.15
echo 2 > /usr/local/zookeeper-3.6.3/data/myid
# On 192.168.137.20
echo 3 > /usr/local/zookeeper-3.6.3/data/myid
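
Because the myid value must match the server.N entry for the same machine, it can be derived from zoo.cfg instead of typed by hand. The following is a minimal sketch under the assumption that the server.N lines list the three node addresses given earlier; the function name myid_for_ip and the temporary demo file are inventions for this example.

```shell
# Look up the myid for a node by matching its IP against the
# server.N=IP:port:port lines of a zoo.cfg-style file.
# Arguments: config file, IP address.
myid_for_ip() {
    awk -v ip="$2" '
        /^server\.[0-9]+=/ {
            split($0, kv, "=")        # kv[1] = "server.N", kv[2] = "IP:port:port"
            split(kv[2], addr, ":")   # addr[1] = IP
            if (addr[1] == ip) {
                sub(/^server\./, "", kv[1])
                print kv[1]
            }
        }' "$1"
}

# Demo against a throwaway file that mirrors the cluster section of zoo.cfg.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
server.1=192.168.137.10:3188:3288
server.2=192.168.137.15:3188:3288
server.3=192.168.137.20:3188:3288
EOF
myid_for_ip "$demo_cfg" 192.168.137.15
rm -f "$demo_cfg"
```

On a real node the output could be redirected into the myid file under dataDir.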

Configure the Zookeeper startup script

vim /etc/init.d/zookeeper
#!/bin/bash
#chkconfig:2345 20 90
#description:Zookeeper Service Control Script
ZK_HOME='/usr/local/zookeeper-3.6.3'
case "$1" in
start)
	echo "---------- zookeeper start ------------"
	$ZK_HOME/bin/zkServer.sh start
;;
stop)
	echo "---------- zookeeper stop ------------"
	$ZK_HOME/bin/zkServer.sh stop
;;
restart)
	echo "---------- zookeeper restart ------------"
	$ZK_HOME/bin/zkServer.sh restart
;;
status)
	echo "---------- zookeeper status ------------"
	$ZK_HOME/bin/zkServer.sh status
;;
*)
    echo "Usage: $0 {start|stop|restart|status}"
esac

# Transfer the service control script to the other nodes
scp /etc/init.d/zookeeper 192.168.137.15:/etc/init.d/
scp /etc/init.d/zookeeper 192.168.137.20:/etc/init.d/
# Enable start on boot (all nodes)
chmod +x /etc/init.d/zookeeper
chkconfig --add zookeeper
 
# Start Zookeeper on each node
service zookeeper start
 
# Check the current status
service zookeeper status
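
The status output includes a line such as "Mode: follower" or "Mode: leader", which is the quickest way to confirm that the election worked. The helper below is a small sketch for pulling out that value; the name zk_mode is made up, and the sample text is an abbreviated example rather than output captured from this cluster.

```shell
# Extract the role from `zkServer.sh status` output read on stdin.
zk_mode() {
    awk -F': ' '/^Mode:/ { print $2 }'
}

# Abbreviated sample output, piped in for demonstration.
printf '%s\n' \
    "ZooKeeper JMX enabled by default" \
    "Using config: /usr/local/zookeeper-3.6.3/bin/../conf/zoo.cfg" \
    "Mode: follower" | zk_mode
```

On a node it could be used as `service zookeeper status 2>&1 | zk_mode`. In a healthy three-node cluster, one node should report leader and the other two follower.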

Kafka overview

Why do you need Message Queuing (MQ)

The main reason is that in a high-concurrency environment, synchronous requests cannot be processed in time and often block. For example, a large number of requests hitting the database concurrently causes row locks and table locks. Eventually too many request threads pile up, which triggers "too many connections" errors and causes an avalanche effect.
Message queues ease the pressure on the system by processing requests asynchronously. They are commonly used for asynchronous processing, traffic peak shaving, application decoupling, and message communication.

Currently common middleware

Middleware for web applications: nginx, tomcat, apache, haproxy, squid, varnish.

MQ message queue middleware: ActiveMQ, RabbitMQ, RocketMQ, Kafka, redis, etc.

Benefits of using message queues

  • Decoupling

The processing on both sides can be extended or modified independently, as long as both sides adhere to the same interface constraints.

  • Recoverability

When one part of the system fails, it does not affect the whole system. The message queue reduces the coupling between processes, so even if a process that handles messages crashes, the messages already added to the queue can still be processed after the system recovers.

  • Buffering

It helps control and optimize the speed of data flowing through the system, and resolves the mismatch between the rate at which messages are produced and the rate at which they are consumed.

  • Flexibility & peak handling capability

When traffic surges, the application still needs to keep working, but such bursts are uncommon. Keeping resources on standby at all times just to handle these peaks would be a huge waste. With message queues, key components can withstand sudden access pressure without collapsing under overload.

  • Asynchronous communication

Often users do not want to, or do not need to, process messages right away. Message queues provide an asynchronous processing mechanism that allows a message to be put into a queue without being processed immediately. Put as many messages on the queue as you want, and process them when needed.

 Two modes of message queue

  • Point-to-point mode (one-to-one; consumers actively pull data, and a message is removed once it has been received)
  • The message producer produces a message and sends it to the message queue, and the message consumer then takes the message out of the queue and consumes it. Once a message has been consumed it is no longer stored in the queue, so a consumer cannot consume a message that has already been consumed. The message queue supports multiple consumers, but a given message can be consumed by only one of them.
     
  • Publish/subscribe mode (one-to-many, also known as observer mode; messages are not removed after consumers consume them)
  • A message producer (publisher) publishes a message to a topic, and multiple message consumers (subscribers) consume the message. Unlike the point-to-point mode, a message published to a topic is consumed by all subscribers. The publish/subscribe mode defines a one-to-many dependency between objects, so that whenever the state of one object (the target object) changes, all objects that depend on it (the observer objects) are notified and updated automatically.
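
The difference between the two modes can be shown with a toy model built on plain files. This is purely an illustration and has nothing to do with how Kafka stores data: the point-to-point queue deletes a message when it is taken, while the topic keeps every message and each subscriber only advances its own read position. All function and file names here are invented for the sketch.

```shell
work=$(mktemp -d)

# Point-to-point: taking a message removes it from the queue.
p2p_send() { echo "$1" >> "$work/queue"; }
p2p_receive() {
    [ -s "$work/queue" ] || return 1
    head -n 1 "$work/queue"
    tail -n +2 "$work/queue" > "$work/queue.tmp"
    mv "$work/queue.tmp" "$work/queue"
}

# Publish/subscribe: messages stay in the topic; each subscriber
# (first argument) only advances its own read position.
publish() { echo "$1" >> "$work/topic"; }
subscribe_read() {
    pos_file="$work/pos.$1"
    [ -f "$pos_file" ] || echo 0 > "$pos_file"
    pos=$(cat "$pos_file")
    tail -n +"$((pos + 1))" "$work/topic"
    wc -l < "$work/topic" | tr -d ' ' > "$pos_file"
}

p2p_send "order-1"
p2p_receive                      # the first consumer gets order-1
p2p_receive || echo "queue empty for the second consumer"

publish "news-1"
subscribe_read alice             # alice gets news-1
subscribe_read bob               # bob also gets news-1

rm -rf "$work"
```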
     

Kafka definition

  • Kafka is a distributed publish/subscribe-based message queue (MQ, Message Queue), mainly used in the field of big data real-time processing

Introduction to Kafka

Kafka was originally developed by LinkedIn. It is a distributed message middleware system that supports partitions and replicas and is coordinated by Zookeeper. Its biggest feature is that it can process large amounts of data in real time to meet various scenarios, such as Hadoop-based batch processing systems, low-latency real-time systems, Spark/Flink stream processing engines, nginx access logs, and messaging services. It is written in Scala and Java. LinkedIn open-sourced it in early 2011 and donated it to the Apache Foundation, where it became a top-level open source project.
 

Features of Kafka

High throughput, low latency

Kafka can process hundreds of thousands of messages per second, with latency as low as a few milliseconds. Each topic can be divided into multiple Partitions, and a Consumer Group consumes from the Partitions in parallel, which improves load balancing and consumption capacity.

Scalability

A Kafka cluster supports hot scaling (brokers can be added without downtime).

Persistence and reliability

Messages are persisted to the local disk, and data backup (replication) is supported to prevent data loss.

Fault tolerance

Nodes in the cluster are allowed to fail (with multiple replicas, if the number of replicas is n, n-1 nodes are allowed to fail).

High concurrency

Supports thousands of clients reading and writing at the same time.
 

Kafka system architecture

Broker

A Kafka server is a broker. A cluster consists of multiple brokers. A broker can hold multiple topics.

Topic

It can be understood as a queue; both producers and consumers work against a topic.
It is similar to a table name in a database or an index in ES.

Messages of different topics are stored separately at the physical level.

Partition

To achieve scalability, a very large topic can be distributed across multiple brokers (i.e. servers). A topic can be divided into one or more partitions, and each partition is an ordered queue. Kafka only guarantees that the records within a partition are in order; it does not guarantee ordering across different partitions of a topic.
 

Partition data routing rules

  • If a partition is specified, it is used directly;
  • If no partition is specified but a key is specified (the key is like an attribute of the message), a partition is selected by hashing the key and taking the result modulo the number of partitions;
  • If neither partition nor key is specified, a partition is selected by round-robin.
  • Each message has an auto-incrementing number that identifies the offset of the message; the sequence starts from 0.
  • Data in each partition is stored using multiple segment files.
  • If a topic has multiple partitions, the order of the data cannot be guaranteed when consuming. In scenarios where the consumption order must be strictly guaranteed (such as flash sales and grabbing red envelopes), the number of partitions needs to be set to 1.
  • The broker stores topic data. If a topic has N partitions and the cluster has N brokers, each broker stores one partition of the topic.
  • If a topic has N partitions and the cluster has (N+M) brokers, then N brokers each store one partition of the topic, and the remaining M brokers store no partition data for the topic.
  • If a topic has N partitions and the cluster has fewer than N brokers, then a broker stores one or more partitions of the topic. Try to avoid this in production, because it easily leads to data imbalance in the Kafka cluster.
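
The first three routing rules can be sketched as a single function. This is a simplified illustration, not Kafka's partitioner: cksum stands in for the hash function (Kafka's own partitioner hashes the key bytes with murmur2), and the names choose_partition, chosen, and rr_counter are made up. The result is returned in the variable chosen so that the round-robin counter survives between calls.

```shell
rr_counter=0

# Arguments: number of partitions, explicit partition (may be empty), key (may be empty).
# The selected partition is stored in the variable "chosen".
choose_partition() {
    num_partitions=$1; partition=$2; key=$3
    if [ -n "$partition" ]; then
        chosen=$partition                                   # rule 1: explicit partition
    elif [ -n "$key" ]; then
        hash=$(printf '%s' "$key" | cksum | cut -d' ' -f1)  # rule 2: hash of the key
        chosen=$((hash % num_partitions))
    else
        chosen=$((rr_counter % num_partitions))             # rule 3: round-robin
        rr_counter=$((rr_counter + 1))
    fi
}

choose_partition 3 2 "";         echo "explicit partition -> $chosen"
choose_partition 3 "" "user-42"; echo "key user-42        -> $chosen"
choose_partition 3 "" "";        echo "no key, first call  -> $chosen"
choose_partition 3 "" "";        echo "no key, second call -> $chosen"
```

The same key always maps to the same partition, which is what preserves per-key ordering.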
     

Reasons for partitioning

  • It makes the cluster easy to scale. Each Partition can be sized to fit the machine it lives on, and a topic can consist of multiple Partitions, so the whole cluster can handle data of any size;
  • It improves concurrency, because reads and writes are done per Partition.

Replica

  • Replicas exist so that when a node in the cluster fails, the partition data on that node is not lost and Kafka can keep working. Kafka provides a replica mechanism: each partition of a topic has several replicas, one leader and several followers.

Leader

  • Each partition has multiple replicas, exactly one of which is the leader. The leader is the replica currently responsible for reading and writing data.

Follower

  • Followers follow the Leader, all write requests are routed through the Leader, data changes are broadcast to all Followers, and Followers and Leaders maintain data synchronization. Follower is only responsible for backup, not for reading and writing data.
  • If the Leader fails, a new Leader is elected from the Followers.
  • When a Follower crashes, gets stuck, or synchronizes too slowly, the Leader removes it from the ISR list (the set of Followers, maintained by the Leader, that are in sync with the Leader). The Follower is added back to the ISR once it has caught up with the Leader again.
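
The ISR check can be pictured as a simple filter over follower lag. The sketch below is a rough illustration rather than broker code: the threshold argument plays the role of the broker setting replica.lag.time.max.ms, and the input format, sample numbers, and function name in_sync_followers are all invented for the example.

```shell
# Keep only the followers whose lag is within the allowed window.
# Argument: maximum allowed lag in milliseconds.
# Input: "<broker_id> <ms_since_last_caught_up>" lines on stdin.
in_sync_followers() {
    awk -v max="$1" '$2 + 0 <= max + 0 { print $1 }'
}

# Broker 1 is keeping up, broker 2 has fallen 45 seconds behind.
printf '%s\n' "1 120" "2 45000" | in_sync_followers 30000
```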
     

Producer

  • The producer is the publisher of data; it pushes messages to a Kafka topic.
  • After the broker receives the message sent by the producer, the broker appends the message to the segment file currently used for appending data.
  • The message sent by the producer is stored in a partition, and the producer can also specify the partition of the data storage
     

Consumer

  • Consumers can pull data from brokers. Consumers can consume data from multiple topics.

Consumer Group(CG)

  • A consumer group consists of multiple consumers.
  • All consumers belong to a consumer group, that is, a consumer group is a logical subscriber. A group name can be specified for each consumer, and if no group name is specified, it belongs to the default group.
  • Grouping multiple consumers to process the data of one topic together increases consumption throughput.
  • Each consumer in the consumer group is responsible for consuming data from different partitions. A partition can only be consumed by one consumer in the group to prevent data from being read repeatedly.
  • Consumer groups do not affect each other
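
How partitions are spread over the consumers of one group can be sketched with a plain round-robin assignment. This is a simplification made for illustration (Kafka ships several assignment strategies), and the function name assign_partitions is made up.

```shell
# Assign partitions 0..P-1 to consumers 0..C-1 in round-robin order.
# Arguments: number of partitions, number of consumers in the group.
assign_partitions() {
    awk -v p="$1" -v c="$2" 'BEGIN {
        for (i = 0; i < p; i++)
            printf "partition %d -> consumer %d\n", i, i % c
    }'
}

assign_partitions 3 2
```

Each partition appears exactly once, so no message is read twice within the group. If the group has more consumers than the topic has partitions, the extra consumers receive nothing.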
     

Offset

  • It uniquely identifies a message within a partition.
  • The offset determines the location of the read data, and there will be no thread safety issues. The consumer uses the offset to determine the message to be read next time (that is, the consumption location).
  • After the message is consumed, it is not deleted immediately, so that multiple businesses can reuse Kafka messages.
  • A consumer can also re-read messages by modifying its offset; the offset is controlled by the user.
  • The message will eventually be deleted, and the default life cycle is 1 week (7*24 hours).

Zookeeper

  • Kafka uses Zookeeper to store the metadata of the cluster. A consumer may fail during consumption (power outage, crash, and so on), and after it recovers it needs to continue from where it left off. The consumer therefore has to record in real time the offset it has consumed up to. Before Kafka 0.9, the consumer saved the offset in Zookeeper by default; starting from 0.9, the consumer saves the offset in a built-in Kafka topic, __consumer_offsets, by default. In other words, Zookeeper holds the information needed to locate the nodes of the Kafka cluster when a producer pushes data to it and, in versions before 0.9, the offset that records where the last consumption stopped so that the next message can be consumed.
     

Deploy kafka cluster

Download the installation package

Official download address: Apache Kafka

cd /opt
wget https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.7.1/kafka_2.13-2.7.1.tgz

Install Kafka

cd /opt/
tar zxvf kafka_2.13-2.7.1.tgz
mv kafka_2.13-2.7.1 /usr/local/kafka

Modify the configuration file

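cd /usr/local/kafka/config/
cp server.properties{,.bak}
 
vim server.properties
# Line 21: globally unique broker ID; it must not repeat, so set broker.id=1 and broker.id=2 on the other machines
broker.id=0
# Line 31: IP and port to listen on; each broker must use its own IP if you change it, or keep the default configuration
listeners=PLAINTEXT://192.168.137.10:9092
# Line 42: number of threads the broker uses to handle network requests; usually no need to change
num.network.threads=3
# Line 45: number of threads for disk I/O; should be greater than the number of disks
num.io.threads=8
# Line 48: send buffer size of the socket
socket.send.buffer.bytes=102400
# Line 51: receive buffer size of the socket
socket.receive.buffer.bytes=102400
# Line 54: maximum size of a request the socket server accepts
socket.request.max.bytes=104857600
# Line 60: path where Kafka stores its log files, which is also where the message data is stored
log.dirs=/usr/local/kafka/logs
# Line 65: default number of partitions per topic on this broker; overridden by the value given when a topic is created
num.partitions=1
# Line 69: number of threads used to recover and clean up data under each data directory
num.recovery.threads.per.data.dir=1
# Line 103: maximum retention time of segment files (data files), in hours; the default is 7 days, after which they are deleted
log.retention.hours=168
# Line 110: maximum size of a single segment file; the default is 1G, beyond which a new segment file is created
log.segment.bytes=1073741824
# Line 123: addresses of the Zookeeper cluster
zookeeper.connect=192.168.137.10:2181,192.168.137.15:2181,192.168.137.20:2181
 
mkdir /usr/local/kafka/logs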

Transfer kafka to other nodes

cd /usr/local
scp -r kafka/ 192.168.137.15:`pwd`
scp -r kafka/ 192.168.137.20:`pwd`

Modify other node configuration files

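# On 192.168.137.15
cd /usr/local/kafka/config/
vim server.properties
# Line 21: change the globally unique broker ID
broker.id=1
# Line 31: change the listening address
listeners=PLAINTEXT://192.168.137.15:9092
 
# On 192.168.137.20
cd /usr/local/kafka/config/
vim server.properties
# Line 21: change the globally unique broker ID
broker.id=2
# Line 31: change the listening address
listeners=PLAINTEXT://192.168.137.20:9092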
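
Instead of editing the two lines by hand on each node, the same change can be scripted with sed. This is a minimal sketch, assuming the file contains uncommented broker.id= and listeners= lines as configured above; the function name set_broker_identity is made up, and the demo runs against a throwaway file rather than the real server.properties.

```shell
# Rewrite broker.id and listeners in a server.properties-style file.
# Arguments: file, broker id, listen IP. A backup is kept as <file>.orig.
set_broker_identity() {
    sed -i.orig \
        -e "s|^broker\.id=.*|broker.id=$2|" \
        -e "s|^listeners=.*|listeners=PLAINTEXT://$3:9092|" \
        "$1"
}

# Demo on a temporary copy of the two relevant lines.
demo_props=$(mktemp)
printf '%s\n' "broker.id=0" "listeners=PLAINTEXT://192.168.137.10:9092" > "$demo_props"
set_broker_identity "$demo_props" 1 192.168.137.15
cat "$demo_props"
rm -f "$demo_props" "$demo_props.orig"
```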

Modify environment variables (all nodes)

cd /usr/local/kafka/bin
ls
vim /etc/profile
export KAFKA_HOME=/usr/local/kafka
export PATH=$PATH:$KAFKA_HOME/bin
source /etc/profile

Write the Kafka service control script

vim /etc/init.d/kafka
#!/bin/bash
#chkconfig:2345 22 88
#description:Kafka Service Control Script
KAFKA_HOME='/usr/local/kafka'
case "$1" in
start)
	echo "---------- Kafka start ------------"
	${KAFKA_HOME}/bin/kafka-server-start.sh -daemon ${KAFKA_HOME}/config/server.properties
;;
stop)
	echo "---------- Kafka stop ------------"
	${KAFKA_HOME}/bin/kafka-server-stop.sh
;;
restart)
	$0 stop
	$0 start
;;
status)
	echo "---------- Kafka status ------------"
	# Count kafka processes, excluding grep itself and this script's own PID
	count=$(ps -ef | grep kafka | grep -Ecv "grep|$$")
	if [ "$count" -eq 0 ]; then
		echo "kafka is not running"
	else
		echo "kafka is running"
	fi
;;
*)
	echo "Usage: $0 {start|stop|restart|status}"
esac

Transfer service control scripts to other nodes

scp /etc/init.d/kafka 192.168.137.20:/etc/init.d/
scp /etc/init.d/kafka 192.168.137.15:/etc/init.d/

Enable start on boot and start Kafka (all nodes)

chmod +x /etc/init.d/kafka
chkconfig --add kafka
service kafka start
ps -ef | grep kafka  # Check whether the service has started

Kafka Command Line Operations

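Create a topic

kafka-topics.sh --create --zookeeper 192.168.137.10:2181,192.168.137.15:2181,192.168.137.20:2181 --replication-factor 2 --partitions 3 --topic ky18
 
# --zookeeper: addresses of the Zookeeper cluster servers; separate multiple IP addresses with commas; one IP is usually enough (this option was removed from kafka-topics.sh in Kafka 3.0, where --bootstrap-server is used instead)
# --replication-factor: number of replicas per partition; 1 means a single replica, 2 is recommended
# --partitions: number of partitions
# --topic: topic name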

View all topics in the current server

kafka-topics.sh --list --zookeeper 192.168.137.10:2181,192.168.137.20:2181,192.168.137.15:2181 

View the details of a topic

kafka-topics.sh --describe --zookeeper 192.168.137.10:2181,192.168.137.20:2181,192.168.137.15:2181 --topic ky18
 

Publish messages

kafka-console-producer.sh --broker-list 192.168.137.10:9092,192.168.137.20:9092,192.168.137.15:9092  --topic ky18

Consume messages

kafka-console-consumer.sh --bootstrap-server 192.168.137.10:9092,192.168.137.20:9092,192.168.137.15:9092 --topic ky18 --from-beginning
# --from-beginning: reads all the data previously written to the topic

Modify the number of partitions

kafka-topics.sh --zookeeper 192.168.137.10:2181,192.168.137.20:2181,192.168.137.15:2181 --alter --topic ky18 --partitions 6

Delete a topic

kafka-topics.sh --delete --zookeeper 192.168.137.10:2181,192.168.137.20:2181,192.168.137.15:2181 --topic ky18
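
The commands above repeat the same long address lists. A small helper can build them once and keep them in variables; this is only a convenience sketch, and the function name connect_string and the variable names ZK and BROKERS are made up for it.

```shell
# Join host addresses with a port into a comma-separated connect string.
# Arguments: port, then one or more hosts.
connect_string() {
    port=$1; shift
    out=""
    for host in "$@"; do
        out="${out:+$out,}$host:$port"
    done
    echo "$out"
}

ZK=$(connect_string 2181 192.168.137.10 192.168.137.15 192.168.137.20)
BROKERS=$(connect_string 9092 192.168.137.10 192.168.137.15 192.168.137.20)
echo "$ZK"
echo "$BROKERS"
```

The variables can then be passed to the tools, for example `kafka-topics.sh --list --zookeeper "$ZK"` or `kafka-console-consumer.sh --bootstrap-server "$BROKERS" --topic ky18`.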

Summary

Kafka architecture

broker: a Kafka server. A Kafka cluster consists of multiple brokers.

topic: a message queue. Producers and consumers both work against topics.

producer: the producer pushes message data to a topic on the broker.

consumer: the consumer pulls message data from a topic on the broker.

partition: a topic can be divided into one or more partitions to speed up message transfer (reads and writes). Replicas back up each partition: the leader handles reads and writes, and the followers handle backup.

Message data within a partition is ordered; across partitions it is not. Scenarios that require strict ordering, such as flash sales and red-envelope grabs, can only use one partition.

offset: records the position up to which a consumer has consumed, so that consumption can continue from the next message.

zookeeper: stores the metadata of the Kafka cluster and, before Kafka 0.9, the consumer offsets.

zookeeper combined with kafka: the information that locates the brokers when a producer pushes data to the Kafka cluster is kept in zk, and in versions before 0.9 consumers also obtained their offsets from zk to know which data to consume next.


Origin blog.csdn.net/weixin_57560240/article/details/131040832