Zookeeper cluster + Kafka cluster

1. Overview of Zookeeper

1. Zookeeper definition

Zookeeper is an open source distributed Apache project that provides coordination services for distributed frameworks.

2. Zookeeper working mechanism

From a design-pattern perspective, Zookeeper is a distributed service management framework built on the observer pattern. It stores and manages the data that everyone cares about and accepts registrations from observers. Once the state of that data changes, Zookeeper notifies the observers registered with it so that they can react accordingly. In short, Zookeeper = file system + notification mechanism.
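
The "file system + notification mechanism" idea can be illustrated with a toy in-memory store. This is only a sketch of the observer pattern, not the Zookeeper client API: the class name `TinyStore`, its methods, and the path `/config/db_url` are made up for illustration. Like Zookeeper's standard watches, each registration here fires only once.

```python
from collections import defaultdict


class TinyStore:
    """Toy stand-in for 'file system + notification'; not the Zookeeper API."""

    def __init__(self):
        self.data = {}                     # path -> value (the "file system" part)
        self.watchers = defaultdict(list)  # path -> callbacks (the "notification" part)

    def watch(self, path, callback):
        """Register an observer for one path."""
        self.watchers[path].append(callback)

    def set(self, path, value):
        """Store the value, then notify and clear the observers of that path."""
        self.data[path] = value
        callbacks, self.watchers[path] = self.watchers[path], []
        for callback in callbacks:
            callback(path, value)


events = []
store = TinyStore()
store.watch("/config/db_url", lambda path, value: events.append((path, value)))
store.set("/config/db_url", "jdbc:mysql://db1/app")
store.set("/config/db_url", "jdbc:mysql://db2/app")  # no watcher left, so no second event
print(events)  # [('/config/db_url', 'jdbc:mysql://db1/app')]
```

An observer that wants to keep receiving changes has to register again after each notification.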

3. Features of Zookeeper

(1) A Zookeeper cluster consists of one leader (Leader) and multiple followers (Follower).

(2) As long as more than half of the nodes in the cluster survive, the cluster can serve normally. For this reason Zookeeper is best deployed on an odd number of servers: adding one more server to an odd-sized cluster does not increase the number of failures it can tolerate.

(3) Global data consistency: each server saves a copy of the same data, and the data is consistent no matter which server the client connects to.

(4) Update requests are executed in order: requests from the same Client are executed in the order in which they were sent, that is, first in, first out.

(5) Atomic updates: a data update either succeeds or fails as a whole.

(6) Timeliness: within a certain time range, the Client can read the latest data.
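
The "more than half" rule in feature (2) is simple arithmetic. The sketch below (the function names are chosen here for illustration) computes the majority size and the number of failures a cluster of n servers can tolerate.

```python
def quorum(n):
    """Smallest number of servers that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers may fail while a majority still survives."""
    return n - quorum(n)


for n in (3, 4, 5, 6):
    print(f"{n} servers: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Clusters of 3 and 4 servers both tolerate one failure, and clusters of 5 and 6 both tolerate two, so an even-sized cluster gains nothing over the odd size below it.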

4. Zookeeper data structure

The ZooKeeper data model is structured much like the Linux file system: it can be viewed as a tree in which each node is called a ZNode. By default each ZNode can store up to 1 MB of data, and each ZNode is uniquely identified by its path.

5. Zookeeper application scenarios

The services provided include: unified naming service, unified configuration management, unified cluster management, dynamic online and offline server nodes, soft load balancing, etc.

●Unified naming service

In a distributed environment, applications and services often need uniform names so that they are easy to identify. For example, an IP address is hard to remember, but a domain name is easy to remember.

●Unified configuration management

(1) In a distributed environment, configuration file synchronization is very common. All nodes in a cluster, such as a Kafka cluster, are generally required to have consistent configuration, and after a configuration file is modified it should be synchronized to every node quickly.

(2) Configuration management can be implemented with ZooKeeper: configuration information is written to a ZNode, and each client server watches that ZNode. Once the data in the ZNode is modified, ZooKeeper notifies each client server.

●Unified cluster management

(1) In a distributed environment, it is necessary to know the status of each node in real time. Some adjustments can be made according to the real-time status of the nodes.

(2) ZooKeeper can realize real-time monitoring of node status changes. Node information can be written to a ZNode on ZooKeeper. Listening to this ZNode can obtain its real-time status changes.

●Servers going online and offline dynamically

Clients can see servers going online and offline in real time.

●Soft load balancing

Record the number of visits of each server in Zookeeper, and let the server with the least number of visits handle the latest client requests.
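
A minimal sketch of that selection rule, assuming the visit counters have already been read into a dictionary. The server names and counts are hypothetical; in practice they would be stored in Zookeeper.

```python
def pick_server(visit_counts):
    """Return the server with the fewest recorded visits."""
    return min(visit_counts, key=visit_counts.get)


# Hypothetical counters for three servers.
visit_counts = {"server-a": 12, "server-b": 7, "server-c": 9}

chosen = pick_server(visit_counts)
visit_counts[chosen] += 1  # record the new visit
print(chosen, visit_counts[chosen])  # server-b 8
```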

6. Zookeeper election mechanism
●Election at first startup (in this example the cluster has 5 servers, so a majority is 3 votes)
(1) Server 1 starts and initiates an election. It votes for itself. Server 1 now has 1 vote, which is not a majority (3 votes), so the election cannot be completed and server 1 stays in the LOOKING state.

(2) Server 2 starts and another election is initiated. Servers 1 and 2 each vote for themselves and exchange ballot information. Server 1 finds that the myid of server 2 is larger than that of its current choice (server 1) and changes its vote to server 2. Server 1 now has 0 votes and server 2 has 2 votes. This is still not a majority, so the election cannot be completed and servers 1 and 2 stay in the LOOKING state.

(3) Server 3 starts and initiates an election. Servers 1 and 2 both change their votes to server 3. The result: server 1 has 0 votes, server 2 has 0 votes, and server 3 has 3 votes. Server 3 now has a majority and is elected leader. Servers 1 and 2 change their state to FOLLOWING, and server 3 changes its state to LEADING.

(4) Server 4 starts and initiates an election. Servers 1, 2, and 3 are no longer in the LOOKING state and do not change their ballots. The result of exchanging ballot information: server 3 has 3 votes and server 4 has 1 vote. Server 4 follows the majority, changes its ballot to server 3, and changes its state to FOLLOWING.

(5) Server 5 starts and, like server 4, becomes a follower.
●Election after the first startup
(1) A server in the ZooKeeper cluster enters Leader election in either of the following two situations:

1) The server is initialized and started.

2) The server cannot maintain its connection to the Leader while running.

(2) When a machine enters the Leader election process, the cluster may be in one of the following two states:

1) There is already a Leader in the cluster.

In this case, when the machine tries to elect a leader it is told which server is the current leader. It then only needs to connect to the leader and synchronize state.

2) There is no Leader in the cluster.

Suppose the ZooKeeper cluster consists of 5 servers with SIDs 1, 2, 3, 4, and 5 and ZXIDs 8, 8, 8, 7, and 7, and the server with SID 3 is the leader. At some point, servers 3 and 5 fail, so a Leader election begins.

Leader election rules:

1. The larger EPOCH wins outright.

2. If the EPOCH is the same, the larger transaction ID (ZXID) wins.

3. If the transaction ID is the same, the larger server ID (SID) wins.


SID: server ID. It uniquely identifies a machine in a ZooKeeper cluster. It must not be repeated across machines, and it is the same as myid.

ZXID: transaction ID. It identifies a change of server state. At a given moment, the ZXID values of the machines in the cluster may not all be the same, depending on how quickly each ZooKeeper server has processed client update requests.

Epoch: the number of each Leader term. While there is no Leader, the logical clock value is the same for all votes in the same round of voting, and it increases with each new round of voting.
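
The three comparison rules can be applied to the example above, in which servers 3 and 5 fail and servers 1, 2, and 4 remain. The sketch below only shows the ordering of candidates. It does not model the real exchange of ballots between servers, and the epoch value of 1 is an assumption, since the example does not state the epoch.

```python
from collections import namedtuple

Vote = namedtuple("Vote", ["epoch", "zxid", "sid"])


def elect(candidates):
    """Pick the winner: larger epoch first, then larger ZXID, then larger SID."""
    return max(candidates, key=lambda v: (v.epoch, v.zxid, v.sid))


# Survivors from the example: SIDs 1, 2, 4 with ZXIDs 8, 8, 7 (epoch assumed equal).
survivors = [Vote(epoch=1, zxid=8, sid=1),
             Vote(epoch=1, zxid=8, sid=2),
             Vote(epoch=1, zxid=7, sid=4)]

print(elect(survivors).sid)  # 2
```

Servers 1 and 2 tie on epoch and ZXID, so the larger SID decides and server 2 wins. Server 4 has the largest SID but loses on ZXID. Three of the five servers are still alive, which is a majority, so the election can complete.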

2. Deploy the Zookeeper cluster

1. Operation steps of deploying Zookeeper cluster

Prepare 3 servers for Zookeeper cluster
192.168.229.60
192.168.229.50
192.168.229.40
1.1 Preparation before installation
//Disable the firewall and put SELinux in permissive mode

systemctl stop firewalld
systemctl disable firewalld
setenforce 0 

//Install the JDK

yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
java -version 

//Download the installation package
Official download address: https://archive.apache.org/dist/zookeeper/

cd /opt
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.5.7/apache-zookeeper-3.5.7-bin.tar.gz

1.2. Install Zookeeper

cd /opt
tar -zxvf apache-zookeeper-3.5.7-bin.tar.gz
mv apache-zookeeper-3.5.7-bin /usr/local/zookeeper-3.5.7  

1.3 Modify the configuration file

cd /usr/local/zookeeper-3.5.7/conf/
cp zoo_sample.cfg zoo.cfg
 
vim zoo.cfg
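# Heartbeat interval between Zookeeper servers and clients, in milliseconds
tickTime=2000
# Maximum number of heartbeats (multiples of tickTime) tolerated while a Follower initially connects to the Leader; here 10*2s
initLimit=10
# Timeout for synchronization between Leader and Follower; if it exceeds 5*2s, the Leader considers the Follower dead and removes it from the server list
syncLimit=5
# Modified: directory where Zookeeper stores its data; it must be created separately
dataDir=/usr/local/zookeeper-3.5.7/data
# Added: directory where logs are stored; it must be created separately
dataLogDir=/usr/local/zookeeper-3.5.7/logs
# Port for client connections
clientPort=2181
# Added: cluster information
server.1=192.168.229.60:3188:3288
server.2=192.168.229.50:3188:3288
server.3=192.168.229.40:3188:3288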

server.A=B:C:D
●A is a number identifying the server. In cluster mode, create a file named myid in the directory specified by dataDir in zoo.cfg; the file contains the value of A. When Zookeeper starts, it reads this file and compares the value with the configuration in zoo.cfg to determine which server it is.
●B is the address of this server.
●C is the port this server's Follower uses to exchange information with the Leader of the cluster.
●D is the port the servers use to communicate with each other during an election, which is needed to elect a new Leader if the current Leader goes down.
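
As an illustration of the server.A=B:C:D format, the sketch below splits one such line into its four parts and compares A with a myid value. The function name and the dictionary keys are made up for this example; this is not how Zookeeper itself parses zoo.cfg.

```python
def parse_server_line(line):
    """Split 'server.A=B:C:D' into id (A), host (B) and the two ports (C, D)."""
    key, value = line.strip().split("=", 1)
    prefix, server_id = key.split(".", 1)
    if prefix != "server":
        raise ValueError(f"not a server entry: {line!r}")
    host, follower_port, election_port = value.split(":")[:3]
    return {
        "id": int(server_id),
        "host": host,
        "follower_port": int(follower_port),
        "election_port": int(election_port),
    }


entry = parse_server_line("server.2=192.168.229.50:3188:3288")
print(entry)

my_id = 2  # the value that would be read from the myid file on this node
print(entry["id"] == my_id)  # True
```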
1.4 Copy the configured Zookeeper configuration file to other machines

scp /usr/local/zookeeper-3.5.7/conf/zoo.cfg 192.168.229.50:/usr/local/zookeeper-3.5.7/conf/
scp /usr/local/zookeeper-3.5.7/conf/zoo.cfg 192.168.229.40:/usr/local/zookeeper-3.5.7/conf/ 

1.5 Create data directory and log directory on each node

mkdir /usr/local/zookeeper-3.5.7/data
mkdir /usr/local/zookeeper-3.5.7/logs 

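1.6 Create a myid file in the directory specified by dataDir on each node (run only the line that matches the node)

echo 1 > /usr/local/zookeeper-3.5.7/data/myid   # on 192.168.229.60 (server.1)
echo 2 > /usr/local/zookeeper-3.5.7/data/myid   # on 192.168.229.50 (server.2)
echo 3 > /usr/local/zookeeper-3.5.7/data/myid   # on 192.168.229.40 (server.3)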

1.7 Configure Zookeeper startup script

vim /etc/init.d/zookeeper
#!/bin/bash
#chkconfig:2345 20 90
#description:Zookeeper Service Control Script
ZK_HOME='/usr/local/zookeeper-3.5.7'
case "$1" in
start)
    echo "---------- zookeeper start ------------"
    $ZK_HOME/bin/zkServer.sh start
    ;;
stop)
    echo "---------- zookeeper stop ------------"
    $ZK_HOME/bin/zkServer.sh stop
    ;;
restart)
    echo "---------- zookeeper restart ------------"
    $ZK_HOME/bin/zkServer.sh restart
    ;;
status)
    echo "---------- zookeeper status ------------"
    $ZK_HOME/bin/zkServer.sh status
    ;;
*)
    echo "Usage: $0 {start|stop|restart|status}"
    ;;
esac

1.8 Enable start on boot

chmod +x /etc/init.d/zookeeper
chkconfig --add zookeeper  

1.9 Start Zookeeper on each node

service zookeeper start 

1.10 View the current status
service zookeeper status

2. Instance operation: Deploy Zookeeper cluster

2.1 Preparation before installation
[screenshots]
2.2 Install Zookeeper
[screenshot]
2.3 Modify the configuration file
[screenshot]
2.4 Copy the configured Zookeeper configuration file to other machines
[screenshot]
2.5 Create a data directory and a log directory on each node
[screenshot]
2.6 Create a myid file in the directory specified by the dataDir of each node
[screenshots]
2.7 Configure the Zookeeper startup script
[screenshot]
2.8 Enable start on boot and start the service
[screenshot]
2.9 View the current status
One leader, two followers
[screenshots]

3. Overview of Kafka

1. Why do you need a message queue (MQ)

The main reason is that in a high-concurrency environment, synchronous requests cannot be processed in time and often block. For example, a large number of concurrent requests to a database cause row locks and table locks. Eventually too many request threads pile up, which triggers "too many connections" errors and causes an avalanche effect.

We use message queues to ease the pressure on the system by processing requests asynchronously. Message queues are often used in scenarios such as asynchronous processing, traffic peak shaving, application decoupling, and message communication.

Currently, the more common MQ middleware include ActiveMQ, RabbitMQ, RocketMQ, Kafka, etc.

2. The benefits of using message queues

(1) Decoupling

Allows you to extend or modify the processing on both sides independently, as long as they adhere to the same interface constraints.

(2) Recoverability

When one part of the system fails, the whole system is not affected. The message queue reduces the coupling between processes, so even if a process that handles messages crashes, the messages already in the queue can still be processed after the system recovers.

(3) Buffering

It helps to control and optimize the speed of data flowing through the system, and smooths out the difference between the speed at which messages are produced and the speed at which they are consumed.

(4) Flexibility & peak processing capacity

In the case of a surge in traffic, the application still needs to continue to function, but such bursts of traffic are uncommon. It is undoubtedly a huge waste to invest resources on standby at all times to handle such peak access. The use of message queues can enable key components to withstand sudden access pressure without completely crashing due to sudden overload requests.

(5) Asynchronous communication

Many times, users don't want to and don't need to process messages right away. Message queues provide an asynchronous processing mechanism that allows users to put a message into a queue without processing it immediately. Put as many messages on the queue as you want, and process them when needed.

3. Two modes of message queue

(1) Point-to-point mode (one-to-one, consumers actively pull data, and the message is cleared after the message is received)

The message producer produces the message and sends it to the message queue, and then the message consumer takes it out from the message queue and consumes the message. After the message is consumed, there is no more storage in the message queue, so it is impossible for the message consumer to consume the message that has already been consumed. The message queue supports multiple consumers, but for a message, only one consumer can consume it.

(2) Publish/subscribe mode (one-to-many, also known as observer mode, consumers will not clear messages after consuming data)

A message producer (publisher) publishes a message to a topic, and multiple message consumers (subscribers) consume it. Unlike the point-to-point mode, a message published to a topic is consumed by all subscribers.

The publish/subscribe mode defines a one-to-many dependency relationship between objects, so that whenever the state of an object (target object) changes, all objects (observer objects) that depend on it will be notified and automatically updated.

4. Kafka definition

Kafka is a distributed publish/subscribe-based message queue (MQ, Message Queue), which is mainly used in the field of big data real-time processing.

5. Introduction to Kafka

Kafka was originally developed by LinkedIn. It is a distributed message middleware system that supports partitions and replicas and relies on Zookeeper for coordination, and it is written in Scala. Its biggest feature is that it can process large amounts of data in real time to meet various demand scenarios, such as Hadoop-based batch processing systems, low-latency real-time systems, Spark/Flink stream processing engines, nginx access logs, and message services.

LinkedIn contributed Kafka to the Apache Foundation in 2010, and it became a top-level open source project.

6. Features of Kafka

●High throughput, low latency

Kafka can process hundreds of thousands of messages per second, and its latency is as low as a few milliseconds. Each topic can be divided into multiple Partitions, and the Consumer Group performs consumption operations on the Partitions to improve load balancing and consumption capabilities.

●Scalability

A Kafka cluster supports hot expansion.

●Persistence, reliability

Messages are persisted to the local disk, and data backup is supported to prevent data loss.

●Fault tolerance

Nodes in the cluster are allowed to fail (with multiple replicas, if the number of replicas is n, up to n-1 nodes may fail).

●High concurrency

Support thousands of clients to read and write at the same time

7. Kafka system architecture

(1)Broker

A Kafka server is a broker. A cluster consists of multiple brokers, and one broker can hold multiple topics.

(2)Topic

A topic can be understood as a queue; both producers and consumers work against a topic.

It is similar to a table name in a database or an index in ES.

Messages of different topics are physically stored separately.

(3)Partition

In order to achieve scalability, a very large topic can be distributed to multiple brokers (ie servers), a topic can be divided into one or more partitions, and each partition is an ordered queue. Kafka only guarantees that the records in the partition are in order, but does not guarantee the order of different partitions in the topic.

Each topic has at least one partition. When the producer generates data, it will select a partition according to the allocation strategy, and then append the message to the end of the queue of the specified partition.

Partition data routing rules:
If a partition is specified, it is used directly;
If no partition is specified but a key (equivalent to an attribute of the message) is specified, a partition is selected by hashing the value of the key;
If neither a partition nor a key is specified, a partition is selected by polling (round-robin).
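
The three routing rules can be sketched as follows. This is an illustration rather than Kafka's partitioner: the class name `PartitionRouter` is made up, and `zlib.crc32` is used as a stand-in hash (the Java client hashes keys with murmur2), so the partition numbers produced here will not match a real cluster.

```python
import itertools
import zlib


class PartitionRouter:
    """Illustrative routing: explicit partition, else key hash, else round-robin."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, partition=None, key=None):
        if partition is not None:
            return partition  # rule 1: use the specified partition directly
        if key is not None:
            # rule 2: hash the key (crc32 is a stand-in for the real hash)
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        return next(self._round_robin)  # rule 3: polling


router = PartitionRouter(num_partitions=3)
print(router.choose(partition=2))            # 2
print(router.choose(key="order-1001"))       # the same key always maps to the same partition
print([router.choose() for _ in range(4)])   # [0, 1, 2, 0]
```

Because the key hash is deterministic, all messages with the same key land in the same partition, which is what preserves their relative order.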
Each message will have a self-incrementing number, which is used to identify the offset of the message, and the identification sequence starts from 0.

Data in each partition is stored using multiple segment files.

If the topic has multiple partitions, the order of the data cannot be guaranteed when consuming data. In the scenario where the order of consumption of messages is strictly guaranteed (for example, flash sales of products and grabbing red envelopes), the number of partitions needs to be set to 1.

●Brokers store topic data. If a topic has N partitions and the cluster has N brokers, each broker stores one partition of the topic.
●If a topic has N partitions and the cluster has (N+M) brokers, then there are N brokers that store a partition of the topic, and the remaining M brokers do not store the partition data of the topic.
● If a topic has N partitions and the number of brokers in the cluster is less than N, then one broker stores one or more partitions of the topic. In the actual production environment, try to avoid this situation, which can easily lead to data imbalance in the Kafka cluster.

Reasons for partitioning:
●It makes scaling within the cluster easy. Each Partition can be sized to fit the machine it resides on, and a topic can consist of multiple Partitions, so the cluster as a whole can handle data of any size;

●Concurrency is improved, because reads and writes are performed per Partition.

(4)Replica

A replica is a copy of a partition. To ensure that partition data is not lost and Kafka can keep working when a node in the cluster fails, Kafka provides a replication mechanism: each partition of a topic has several replicas, one leader and several followers.

(5)Leader

Each partition has multiple replicas, exactly one of which is the leader. The leader is the replica currently responsible for reading and writing data.

(6)Follower

Followers follow the Leader. All write requests go through the Leader, data changes are replicated to all Followers, and the Followers stay synchronized with the Leader. A Follower is only responsible for backup, not for reading and writing data.
If the Leader fails, a new Leader is elected from the Followers.
When a Follower hangs, gets stuck, or synchronizes too slowly, the Leader removes it from the ISR list (the set of Followers, maintained by the Leader, that are in sync with the Leader); the Follower can rejoin the ISR once it has caught up.

(7)Producer

The producer is the publisher of data; it publishes messages to Kafka topics.

After the broker receives the message sent by the producer, the broker appends the message to the segment file currently used for appending data.

The message sent by the producer is stored in a partition, and the producer can also specify the partition of the data storage.

(8)Consumer

Consumers can read data from brokers. Consumers can consume data from multiple topics.

(9)Consumer Group(CG)

A consumer group consists of multiple consumers.

All consumers belong to a consumer group, that is, a consumer group is a logical subscriber. A group name can be specified for each consumer, and if no group name is specified, it belongs to the default group.

Grouping multiple consumers together to process the data of one topic increases consumption capacity.

Each consumer in the consumer group is responsible for consuming data from different partitions. A partition can only be consumed by one consumer in the group to prevent data from being read repeatedly.

Consumer groups do not affect each other.
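
The rule that each partition is consumed by only one consumer in a group can be sketched with a simple round-robin assignment. This is a simplified illustration, not Kafka's actual assignment strategies, and the consumer names are hypothetical.

```python
def assign(partitions, consumers):
    """Give every partition to exactly one consumer of the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


print(assign([0, 1, 2], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1]}
print(assign([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

In the second call there are more consumers than partitions, so one consumer stays idle; adding consumers beyond the number of partitions does not add consumption capacity.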

(10) Offset

An offset uniquely identifies a message within a partition.

The offset determines the location of the read data, and there will be no thread safety issues. The consumer uses the offset to determine the message to be read next time (that is, the consumption location).

After the message is consumed, it is not deleted immediately, so that multiple businesses can reuse Kafka messages.

A service can also re-read messages by modifying its offset; the offset is controlled by the user.

The message will eventually be deleted, and the default life cycle is 1 week (7*24 hours).
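
The role of the offset can be sketched with a plain list standing in for one partition and a dictionary holding one offset per consumer group. The function `poll` and the group names are made up for illustration; this is not the Kafka consumer API.

```python
def poll(log, offsets, group, max_records=2):
    """Return the next batch for a group and advance that group's offset."""
    start = offsets.get(group, 0)
    batch = log[start:start + max_records]
    offsets[group] = start + len(batch)
    return batch


log = ["m0", "m1", "m2"]  # messages stay in the log after being read
offsets = {}              # group name -> next offset to read

print(poll(log, offsets, "billing"))    # ['m0', 'm1']
print(poll(log, offsets, "billing"))    # ['m2']
print(poll(log, offsets, "analytics"))  # ['m0', 'm1']  (groups do not affect each other)

offsets["billing"] = 0                  # rewind: the group re-reads from the start
print(poll(log, offsets, "billing"))    # ['m0', 'm1']
```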

(11)Zookeeper

Kafka uses Zookeeper to store the meta information of the cluster.

Since the consumer may experience failures such as power outages and downtime during the consumption process, after the consumer recovers, it needs to continue to consume from the location before the failure. Therefore, the consumer needs to record which offset it consumes in real time, so that it can continue to consume after the failure recovers.

Before Kafka version 0.9, the consumer saved the offset in Zookeeper by default; starting from version 0.9, the consumer saved the offset in a built-in Kafka topic by default, which is __consumer_offsets.

4. Deploy zookeeper + kafka cluster

1. Deploy zookeeper + kafka cluster

1.1 Download the installation package
Official download link: http://kafka.apache.org/downloads.html

cd /opt
wget https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.7.1/kafka_2.13-2.7.1.tgz

1.2. Install Kafka

cd /opt/
tar zxvf kafka_2.13-2.7.1.tgz
mv kafka_2.13-2.7.1 /usr/local/kafka

1.3 Modify the configuration file

cd /usr/local/kafka/config/
cp server.properties{,.bak}
 
vim server.properties
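# Line 21: globally unique ID of the broker; it must differ on each broker, so set broker.id=1 and broker.id=2 on the other machines
broker.id=0
# Line 31: IP and port to listen on; use each broker's own IP, or keep the default configuration unchanged
listeners=PLAINTEXT://192.168.229.60:9092
# Line 42: number of threads the broker uses to handle network requests; usually no need to change
num.network.threads=3
# Line 45: number of threads used for disk I/O; should be greater than the number of disks
num.io.threads=8
# Line 48: send buffer size of the socket
socket.send.buffer.bytes=102400
# Line 51: receive buffer size of the socket
socket.receive.buffer.bytes=102400
# Line 54: maximum size of a request accepted by the socket
socket.request.max.bytes=104857600
# Line 60: path where Kafka stores its logs, which is also where the data is stored
log.dirs=/usr/local/kafka/logs
# Line 65: default number of partitions per topic on this broker; overridden by the value given when a topic is created
num.partitions=1
# Line 69: number of threads used to recover and clean up data under the data directories
num.recovery.threads.per.data.dir=1
# Line 103: maximum time a segment file (data file) is retained, in hours; the default is 7 days, after which it is deleted
log.retention.hours=168
# Line 110: maximum size of one segment file; the default is 1 GB, beyond which a new segment file is created
log.segment.bytes=1073741824
# Line 123: addresses of the Zookeeper cluster to connect to
zookeeper.connect=192.168.229.60:2181,192.168.229.50:2181,192.168.229.40:2181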
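
Only broker.id and listeners differ between the three machines. The sketch below prints the values for each host; the function name is made up, and giving 192.168.229.50 the ID 1 and 192.168.229.40 the ID 2 is an assumption, since any assignment works as long as the IDs are unique.

```python
def broker_overrides(hosts, port=9092):
    """Return the per-broker settings that must differ on each machine."""
    return [
        {"host": host, "broker.id": index, "listeners": f"PLAINTEXT://{host}:{port}"}
        for index, host in enumerate(hosts)
    ]


hosts = ["192.168.229.60", "192.168.229.50", "192.168.229.40"]
for item in broker_overrides(hosts):
    print(f'{item["host"]}: broker.id={item["broker.id"]} listeners={item["listeners"]}')
```

All other settings in server.properties, including zookeeper.connect, can be identical on the three brokers.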

1.4 Modify environment variables

vim /etc/profile
export KAFKA_HOME=/usr/local/kafka
export PATH=$PATH:$KAFKA_HOME/bin
 
source /etc/profile  

1.5 Configure Kafka startup script

vim /etc/init.d/kafka
#!/bin/bash
#chkconfig:2345 22 88
#description:Kafka Service Control Script
KAFKA_HOME='/usr/local/kafka'
case "$1" in
start)
    echo "---------- Kafka start ------------"
    ${KAFKA_HOME}/bin/kafka-server-start.sh -daemon ${KAFKA_HOME}/config/server.properties
    ;;
stop)
    echo "---------- Kafka stop ------------"
    ${KAFKA_HOME}/bin/kafka-server-stop.sh
    ;;
restart)
    $0 stop
    $0 start
    ;;
status)
    echo "---------- Kafka status ------------"
    # Count processes mentioning kafka, excluding grep itself and this script
    count=$(ps -ef | grep kafka | egrep -cv "grep|$$")
    if [ "$count" -eq 0 ];then
        echo "kafka is not running"
    else
        echo "kafka is running"
    fi
    ;;
*)
    echo "Usage: $0 {start|stop|restart|status}"
    ;;
esac

1.6 Enable start on boot

chmod +x /etc/init.d/kafka
chkconfig --add kafka 

1.7 Start Kafka on each node

service kafka start 

1.8 Kafka command line operation
//create topic

kafka-topics.sh --create --zookeeper 192.168.229.60:2181,192.168.229.50:2181,192.168.229.40:2181 --replication-factor 2 --partitions 3 --topic test
 
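-------------------------------------------------------------------------------------
--zookeeper: addresses of the Zookeeper cluster servers; separate multiple IP addresses with commas; one IP is usually enough
--replication-factor: number of replicas per partition; 1 means a single replica; 2 is recommended
--partitions: number of partitions
--topic: name of the topic
-------------------------------------------------------------------------------------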

//View all topics in the current server

kafka-topics.sh --list --zookeeper 192.168.229.60:2181,192.168.229.50:2181,192.168.229.40:2181 

//View the details of a topic

kafka-topics.sh --describe --zookeeper 192.168.229.60:2181,192.168.229.50:2181,192.168.229.40:2181 --topic test

//Publish messages

kafka-console-producer.sh --broker-list 192.168.229.60:9092,192.168.229.50:9092,192.168.229.40:9092 --topic test

//Consume messages

kafka-console-consumer.sh --bootstrap-server 192.168.229.60:9092,192.168.229.50:9092,192.168.229.40:9092 --topic test --from-beginning
 
-------------------------------------------------------------------------------------
--from-beginning: reads out all the existing data in the topic
------------------------------------------------------------------------------------- 

//Modify the number of partitions

	
kafka-topics.sh --zookeeper 192.168.229.60:2181,192.168.229.50:2181,192.168.229.40:2181 --alter --topic test --partitions 6 

//delete topic

kafka-topics.sh --delete --zookeeper 192.168.229.60:2181,192.168.229.50:2181,192.168.229.40:2181 --topic test

2. Instance operation: deploy zookeeper + kafka cluster

1.1 Install the zookeeper cluster
See above for details, then continue from the previous experiment (operate on all cluster servers)
1.2 Download the installation package and install kafka
[screenshots]
1.3 Modify the configuration file
[screenshots]
1.4 Modify the environment variables
[screenshot]
1.5 Configure the Kafka startup script, enable start on boot, and start kafka
[screenshot]
1.6 Kafka command line operations
Create a topic and view it
[screenshot]
Publish a message and read it
[screenshot]
Modify the number of partitions
[screenshot]
Delete a topic
[screenshot]

5. Deploy Filebeat+Kafka+ELK

1. Operation steps of deploying Filebeat+Kafka+ELK

1.1. Deploy Zookeeper+Kafka cluster
See above; continue from the previous experiment

1.2. Deploy Filebeat

cd /usr/local/filebeat
 
vim filebeat.yml
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/messages
    - /var/log/*.log
......
# Add the configuration for output to Kafka
output.kafka:
  enabled: true
  hosts: ["192.168.229.60:9092","192.168.229.50:9092","192.168.229.40:9092"] # Kafka cluster addresses
  topic: "filebeat_test" # Kafka topic
 
# Start filebeat
./filebeat -e -c filebeat.yml

1.3. Deploy ELK and create a new Logstash configuration file on the node where the Logstash component is located

cd /etc/logstash/conf.d/
 
vim filebeat.conf
input {
    kafka {
        bootstrap_servers => "192.168.229.60:9092,192.168.229.50:9092,192.168.229.40:9092"
        topics => "filebeat_test"
        group_id => "test123"
        auto_offset_reset => "earliest"
    }
}
 
output {
    elasticsearch {
        hosts => ["192.168.229.70:9200"]
        index => "filebeat_test-%{+YYYY.MM.dd}"
    }
    stdout {
        codec => rubydebug
    }
}

1.4 Start Logstash

logstash -f filebeat.conf

1.5 Browser access test
Open http://192.168.80.90:5601 in a browser and log in to Kibana. Click the "Create Index Pattern" button and add the index "filebeat_test-*", then click the "create" button. Click the "Discover" button to view chart information and log information.

2. Instance operation: deploy Filebeat+Kafka+ELK

Environment preparation (servers built before)
[screenshot]
2.1. Deploy Zookeeper+Kafka cluster
See above; continue from the previous experiment

2.2. Deploy Filebeat (operated on the Node1 node)
To build ELK, see the previous blog for details
[screenshots]
2.3. Deploy ELK, and create a new Logstash configuration file on the node where the Logstash component is located (operated on the apache node)
[screenshots]

2.5 Browser Access Test
[screenshots]

Origin blog.csdn.net/weixin_59325762/article/details/130047717