Detailed Zookeeper + Kafka + ELK cluster deployment

1. Zookeeper Overview

1. Zookeeper definition

Zookeeper is an open-source Apache project that provides coordination services for distributed frameworks.

2. Zookeeper working mechanism

Viewed through the lens of design patterns, Zookeeper is a distributed service-management framework based on the observer pattern. It stores and manages the data that everyone cares about and accepts registrations from observers; once the state of that data changes, Zookeeper notifies the observers registered with it so they can react accordingly. In short, Zookeeper = file system + notification mechanism.

3. Zookeeper features

(1) A Zookeeper cluster consists of one Leader and multiple Followers.

(2) As long as more than half of the nodes in the Zookeeper cluster are alive, the cluster can serve requests normally. This is why Zookeeper is best deployed on an odd number of servers.

(3) Global data consistency: Each server saves a copy of the same data. No matter which server the client connects to, the data is consistent.

(4) Update requests are executed sequentially. Update requests from the same Client are executed sequentially in the order they are sent, that is, first in, first out.

(5) Data update atomicity: a data update either succeeds completely or fails completely.

(6) Real-time: within a certain time window, a client can read the latest data.

4. Zookeeper data structure

The ZooKeeper data model is structured much like a Linux file system: it can be viewed as a tree in which each node is called a ZNode. By default each ZNode can store up to 1MB of data, and every ZNode is uniquely identified by its path.
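
As a minimal sketch of this tree structure (the /app1 znode and its data are made-up examples; the server address matches the cluster deployed below), the bundled zkCli.sh client can create and read znodes:

/usr/local/zookeeper-3.5.7/bin/zkCli.sh -server 192.168.2.33:2181
#inside the zkCli shell: create a znode, list the root's children, read the data back
create /app1 "hello"
ls /
get /app1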

5. Zookeeper application scenarios

The services provided include: unified naming service, unified configuration management, unified cluster management, dynamic online and offline server nodes, soft load balancing, etc.

●Unified naming service

In a distributed environment, applications/services often need unified names for easy identification. For example, an IP address is hard to remember, while a domain name is easy.

●Unified configuration management

(1) In a distributed environment, synchronizing configuration files is very common. All nodes in a cluster, such as a Kafka cluster, are generally required to have consistent configuration information, and after a configuration file is modified we want it to propagate quickly to every node.

(2) Configuration management can be implemented by ZooKeeper. Configuration information can be written to a Znode on ZooKeeper. Each client server listens to this Znode. Once the data in the Znode is modified, ZooKeeper will notify each client server.
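
A hedged sketch of this notification flow in the zkCli shell (/config/app is a made-up znode that would have to be created first; the -w flag registers a one-shot watch, Zookeeper 3.5+ syntax):

#client A: read the config znode and register a watch on it
get -w /config/app
#client B: change the data
set /config/app "retries=5"
#client A receives a WatchedEvent of type NodeDataChanged and can re-read the znode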

●Unified cluster management

(1) In a distributed environment, it is necessary to grasp the status of each node in real time. Some adjustments can be made based on the real-time status of the node.

(2) ZooKeeper can monitor node status changes in real time. Node information can be written to a ZNode on ZooKeeper. Monitor this ZNode to obtain its real-time status changes.

●Server dynamic online and offline

The client can gain real-time insight into server online and offline changes.

●Soft load balancing

Record the number of requests handled by each server in Zookeeper, and let the server with the fewest handle the next client request.

6. Zookeeper election mechanism

●Start the election mechanism for the first time

(1) Server 1 starts and initiates an election. Server 1 casts its vote. At this time, server 1 has one vote, which is not enough for more than half (3 votes), the election cannot be completed, and the status of server 1 remains LOOKING;

(2) Server 2 starts and initiates another election. Servers 1 and 2 each vote for themselves and exchange vote information. Server 1 finds that server 2's myid is larger than that of its current candidate (itself), so it changes its vote to server 2. Now server 1 has 0 votes and server 2 has 2 votes; without more than half, the election cannot complete, and servers 1 and 2 remain LOOKING.

(3) Server 3 starts and initiates an election. Both servers 1 and 2 change their votes to server 3. The result: server 1 has 0 votes, server 2 has 0 votes, and server 3 has 3 votes. Server 3 now holds more than half of the votes and is elected Leader. Servers 1 and 2 change their status to FOLLOWING, and server 3 changes its status to LEADING.

(4) Server 4 starts and initiates an election. At this time, servers 1, 2, and 3 are no longer in the LOOKING state and will not change the voting information. The result of exchanging vote information: Server 3 has 3 votes and Server 4 has 1 vote. At this time, server 4 obeys the majority, changes the voting information to server 3, and changes the status to FOLLOWING;

(5) Server 5 starts and, like server 4, becomes a Follower of server 3.

●Elections after the first startup

(1) When a server in the ZooKeeper cluster encounters one of the following two situations, it will begin to enter Leader election:

1) Server initialization starts.

2) The server cannot maintain a connection with the Leader while it is running.

(2) When a machine enters the Leader election process, the current cluster may also be in the following two states:

1) There is already a Leader in the cluster.

If a Leader already exists, a machine that tries to start an election is informed of the current Leader's information. It then only needs to establish a connection with the Leader machine and synchronize its state.

2) There is indeed no Leader in the cluster.

Assume the ZooKeeper cluster consists of 5 servers with SIDs 1, 2, 3, 4, and 5 and ZXIDs 8, 8, 8, 7, and 7, and that the server with SID 3 is the Leader. At some point servers 3 and 5 fail, so a Leader election begins among servers 1, 2, and 4.

Leader election rules:

1. The server with the larger EPOCH wins outright.

2. If the EPOCHs are equal, the server with the larger transaction ID (ZXID) wins.

3. If the transaction IDs are also equal, the server with the larger server ID (SID) wins.

In the example above, the surviving servers 1, 2, and 4 share the same EPOCH; server 4 loses on ZXID (7 < 8), and servers 1 and 2 tie on ZXID (8), so server 2 wins on SID and becomes the new Leader.

-------------------------------------------------------------------------------------
SID: server ID. Uniquely identifies a machine in the ZooKeeper cluster; the value must not be duplicated and is the same as myid.

ZXID: transaction ID, used to identify a change in server state. At a given moment, the ZXID values of the machines in the cluster may not be exactly the same; this depends on how quickly each ZooKeeper server processes client update requests.

Epoch: the number of each Leader's term. While there is no Leader, the logical clock value within the same round of voting is the same; the number increases with each round of voting.

2. Deploy Zookeeper cluster

1. Steps to deploy a Zookeeper cluster

Prepare 3 servers for the Zookeeper cluster:

192.168.2.33
192.168.2.77
192.168.2.99

1.1 Preparation before installation

//Turn off the firewall

systemctl stop firewalld
systemctl disable firewalld
setenforce 0  

//Install JDK

yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
java -version  

//Download the installation package
Official download address: https://archive.apache.org/dist/zookeeper/

cd /opt
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.5.7/apache-zookeeper-3.5.7-bin.tar.gz  

1.2 Install Zookeeper

cd /opt
tar -zxvf apache-zookeeper-3.5.7-bin.tar.gz
mv apache-zookeeper-3.5.7-bin /usr/local/zookeeper-3.5.7  

1.3 Modify configuration file

cd /usr/local/zookeeper-3.5.7/conf/
cp zoo_sample.cfg zoo.cfg
 
vim zoo.cfg
tickTime=2000   #Heartbeat interval between the Zookeeper server and clients, in milliseconds
initLimit=10    #Maximum number of heartbeats (tickTime intervals) the Leader and Followers may take for the initial connection, here 10*2s
syncLimit=5     #Timeout for synchronization between Leader and Follower; if a Follower exceeds 5*2s, the Leader considers it dead and removes it from the server list
dataDir=/usr/local/zookeeper-3.5.7/data ●Modify: directory where Zookeeper data is saved; it must be created separately
dataLogDir=/usr/local/zookeeper-3.5.7/logs ●Add: directory where logs are stored; it must be created separately
clientPort=2181 #Client connection port
#Add the cluster information
server.1=192.168.2.33:3188:3288
server.2=192.168.2.77:3188:3288
server.3=192.168.2.99:3188:3288

server.A=B:C:D
●A is a number indicating which server this is. In cluster mode, create a file named myid in the directory specified by dataDir in zoo.cfg; the file contains the value of A. When Zookeeper starts, it reads this file and compares its value against the configuration in zoo.cfg to determine which server it is. For example, server.1=192.168.2.33:3188:3288 means A=1, B=192.168.2.33, C=3188, D=3288.
●B is the address of the server.
●C is the port this server's Follower uses to exchange information with the Leader of the cluster.
●D is the port used for re-election if the cluster's Leader goes down; the servers communicate with each other over this port during the election.
-------------------------------------------------------------------------------------

1.4 Copy the configured Zookeeper configuration file to other machines

scp /usr/local/zookeeper-3.5.7/conf/zoo.cfg 192.168.2.77:/usr/local/zookeeper-3.5.7/conf/
scp /usr/local/zookeeper-3.5.7/conf/zoo.cfg 192.168.2.99:/usr/local/zookeeper-3.5.7/conf/

1.5 Create data directory and log directory on each node

mkdir /usr/local/zookeeper-3.5.7/data
mkdir /usr/local/zookeeper-3.5.7/logs  

1.6 Create a myid file in the directory specified by dataDir of each node

echo 1 > /usr/local/zookeeper-3.5.7/data/myid    #on node 192.168.2.33
echo 2 > /usr/local/zookeeper-3.5.7/data/myid    #on node 192.168.2.77
echo 3 > /usr/local/zookeeper-3.5.7/data/myid    #on node 192.168.2.99

1.7 Configure Zookeeper startup script

vim /etc/init.d/zookeeper
#!/bin/bash
#chkconfig:2345 20 90
#description:Zookeeper Service Control Script
ZK_HOME='/usr/local/zookeeper-3.5.7'
case $1 in
start)
    echo "---------- starting zookeeper ------------"
    $ZK_HOME/bin/zkServer.sh start
;;
stop)
    echo "---------- stopping zookeeper ------------"
    $ZK_HOME/bin/zkServer.sh stop
;;
restart)
    echo "---------- restarting zookeeper ------------"
    $ZK_HOME/bin/zkServer.sh restart
;;
status)
    echo "---------- zookeeper status ------------"
    $ZK_HOME/bin/zkServer.sh status
;;
*)
    echo "Usage: $0 {start|stop|restart|status}"
esac

1.8 Set up auto-start at boot

chmod +x /etc/init.d/zookeeper
chkconfig --add zookeeper

1.9 Start Zookeeper on each node

service zookeeper start   

1.10 View current status

service zookeeper status
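
With all three nodes started, each node reports its role. A small loop (a sketch assuming passwordless SSH between the nodes; zkServer.sh status can also simply be run on each node by hand) shows one leader and two followers:

for ip in 192.168.2.33 192.168.2.77 192.168.2.99; do
    ssh root@$ip /usr/local/zookeeper-3.5.7/bin/zkServer.sh status 2>/dev/null | grep Mode
done
#expected: one "Mode: leader" and two "Mode: follower"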

2. Instance operation: Deploy Zookeeper cluster

2.1 Preparation before installation

2.2 Install Zookeeper

2.3 Modify configuration file

2.4 Copy the configured Zookeeper configuration file to other machines

2.5 Create data directory and log directory on each node

2.6 Create a myid file in the directory specified by dataDir of each node

2.7 Configure Zookeeper startup script

2.8 Set up auto-start at boot and start the service

2.9 View current status

One leader, two followers

3. Kafka Overview

1. Why do we need message queues (MQ)?

The main reason is that in a high-concurrency environment, synchronous requests are too slow to be handled in time and often block. For example, a large number of concurrent requests to the database cause row locks and table locks; eventually too many request threads pile up, triggering "too many connections" errors and an avalanche effect.

We use message queues to handle requests asynchronously to relieve the pressure on the system. Message queues are often used in asynchronous processing, traffic peak shaving, application decoupling, message communication and other scenarios.

Currently, the more common MQ middlewares include ActiveMQ, RabbitMQ, RocketMQ, Kafka, etc.

2. Benefits of using message queues

(1) Decoupling

Allows you to extend or modify the processes on both sides independently, as long as they adhere to the same interface constraints.

(2) Recoverability

When one component of the system fails, it does not bring down the entire system. A message queue reduces coupling between processes, so even if a process handling messages dies, the messages added to the queue can still be processed after the system recovers.

(3) Buffering

It helps control and optimize the speed of data flowing through the system, resolving the mismatch between the speed at which messages are produced and the speed at which they are consumed.

(4) Flexibility & peak processing capability

An application must keep functioning during a surge in traffic, but such bursts are uncommon, and it would be a huge waste to provision standby resources sized for the peak. A message queue lets key components withstand sudden access pressure without collapsing completely under overload.

(5) Asynchronous communication

Many times, users don't want or need to process messages immediately. Message queue provides an asynchronous processing mechanism that allows users to put a message into the queue but not process it immediately. Put as many messages as you want into the queue and process them when needed.

3. Two modes of message queue

(1) Point-to-point mode (one-to-one, consumers actively pull data, and messages are cleared after they are received)

The message producer produces the message and sends it to the message queue, and then the message consumer takes it out of the message queue and consumes the message. After the message is consumed, there is no longer storage in the message queue, so it is impossible for the message consumer to consume the message that has been consumed. The message queue supports the existence of multiple consumers, but for a message, only one consumer can consume it.

(2) Publish/subscribe mode (one-to-many, also called observer mode, the consumer will not clear the message after consuming the data)

A message producer (publisher) publishes a message to a topic, and multiple message consumers (subscribers) consume the message at the same time. Unlike the point-to-point model, a message published to a topic is consumed by all subscribers.

The publish/subscribe pattern defines a one-to-many dependency relationship between objects, so that whenever the state of an object (target object) changes, all objects that depend on it (observer objects) will be notified and automatically updated.

4. Kafka definition

Kafka is a distributed message queue (MQ, Message Queue) based on the publish/subscribe model, which is mainly used in the field of real-time processing of big data.

5. Introduction to Kafka

Kafka was originally developed by LinkedIn and written in Scala. It is a distributed message middleware system that supports partitions and multiple replicas and is coordinated by Zookeeper. Its biggest strength is processing large amounts of data in real time, covering scenarios such as Hadoop-based batch processing, low-latency real-time systems, Spark/Flink streaming engines, nginx access logs, and message services. LinkedIn contributed Kafka to the Apache Foundation in 2010, and it became a top-level open source project.

6. Features of Kafka

●High throughput, low latency

Kafka can process hundreds of thousands of messages per second with latency as low as a few milliseconds. Each topic can be divided into multiple Partitions, and a Consumer Group consumes the Partitions in parallel, improving load balancing and consumption capacity.

●Scalability

Kafka clusters support hot expansion.

●Durability and reliability

Messages are persisted to local disk, and data backup is supported to prevent data loss.

●Fault tolerance

Allows node failures in the cluster (with a replica count of n, up to n-1 node failures are tolerated).

●High concurrency

Supports thousands of clients reading and writing simultaneously.

7. Kafka system architecture

(1)Broker

A Kafka server is a broker. A cluster consists of multiple brokers, and one broker can host multiple topics.

(2)Topic

It can be understood as a queue, and both producers and consumers are oriented to the same topic.

Similar to the table name of the database or the index of ES

Messages from physically different topics are stored separately.

(3)Partition

To achieve scalability, a very large topic can be distributed across multiple brokers (i.e., servers). A topic can be divided into one or more partitions, and each partition is an ordered queue. Kafka only guarantees that records within a partition are ordered; it does not guarantee ordering across the different partitions of a topic.

Each topic has at least one partition. When the producer generates data, it will select the partition according to the distribution strategy, and then append the message to the end of the queue of the specified partition.

Partition data routing rules (see the sketch after this list):

  • If a partition is specified, use it directly;
  • If no partition is specified but a key is present (a key is an attribute of the message), select a partition by hashing the key's value modulo the partition count;
  • If neither a partition nor a key is specified, select a partition by round-robin.
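
As a quick illustration of the key rule, the console producer can send keyed messages against the test topic created in section 4 below; parse.key and key.separator are standard console-producer properties:

kafka-console-producer.sh --broker-list 192.168.2.33:9092 --topic test --property parse.key=true --property key.separator=:
#then type keyed messages at the ">" prompt; everything before ":" is the key
>user1:first order
>user1:second order
#both messages carry key "user1", so they land in the same partition and keep their order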

Each message has a monotonically increasing number that identifies its offset; the sequence starts from 0.

The data in each partition is stored using multiple segment files.

If a topic has multiple partitions, the order of data cannot be guaranteed when consuming data. In scenarios where the order of message consumption is strictly guaranteed (such as product flash sales and red envelope grabbing), the number of partitions needs to be set to 1.

●Broker stores topic data. If a topic has N partitions and the cluster has N brokers, then each broker stores one partition of the topic.
●If a topic has N partitions and the cluster has (N+M) brokers, then N brokers store a partition of the topic, and the remaining M brokers do not store partition data of the topic.
●If a topic has N partitions and the number of brokers in the cluster is less than N, then one broker stores one or more partitions of the topic. In actual production environments, try to avoid this situation, which can easily lead to Kafka cluster data imbalance.

Reason for partitioning

●It is convenient to expand in the cluster. Each Partition can be adjusted to adapt to the machine where it is located, and a topic can be composed of multiple Partitions, so the entire cluster can adapt to data of any size;

●Concurrency can be improved because it can be read and written in Partition units.

(4)Replica

Replica. To ensure that when a node in the cluster fails, its partition data is not lost and Kafka can keep working, Kafka provides a replica mechanism: each partition of a topic has several replicas, made up of one leader and several followers.

(5)Leader

Each partition has multiple replicas, of which one and only one is the Leader. The Leader is the replica currently responsible for reading and writing data for the partition.

(6)Follower

The Follower follows the Leader: all write requests are routed through the Leader, data changes are broadcast to all Followers, and the Followers keep their data synchronized with the Leader. A Follower is only responsible for backup, not for reading or writing data.
If the Leader fails, a new Leader is elected from among the Followers.
When a Follower dies, gets stuck, or synchronizes too slowly, the Leader removes it from the ISR list (the set of Followers maintained by the Leader that are in sync with it), and a new Follower is created.
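
The leader, replica assignment, and ISR of every partition can be inspected with the describe command shown in section 4 below; the output looks roughly like this (illustrative values for a 2-replica topic on brokers 0-2):

kafka-topics.sh --describe --zookeeper 192.168.2.33:2181 --topic test
#Topic: test  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
#Topic: test  Partition: 1  Leader: 2  Replicas: 2,0  Isr: 2,0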

(7)Producer

The producer is the publisher of data. This role publishes messages to the Kafka topic.

After the broker receives a message sent by the producer, it appends the message to the segment file currently used for appending data.

Messages sent by the producer are stored in a partition. The producer can also specify the partition for data storage.

(8)Consumer

Consumers can read data from the broker. Consumers can consume data from multiple topics.

(9)Consumer Group(CG)

A consumer group consists of multiple consumers.

Every consumer belongs to some consumer group; in other words, a consumer group is a logical subscriber. A group name can be specified for each consumer; if none is specified, the consumer belongs to the default group.

Bringing multiple consumers together to process the data of a certain Topic can quickly improve data consumption capabilities.

Each consumer in a group is responsible for consuming data from different partitions. A partition can only be consumed by one consumer within a group, which prevents data from being read repeatedly.

Consumer groups have no influence on each other.

(10) Offset

Can uniquely identify a message.

The offset determines the position from which data is read, and there are no thread-safety issues; the consumer uses the offset to decide which message to read next (i.e., the consumption position).

After the message is consumed, it is not deleted immediately, so that multiple businesses can reuse Kafka messages.

A certain business can also re-read messages by modifying the offset, which is controlled by the user.

Messages are eventually deleted; the default retention period is one week (7*24 hours).

(11)Zookeeper

Kafka uses Zookeeper to store meta information of the cluster.

Since a consumer may suffer a power outage or other failure while consuming, it needs to resume from the position it had reached before the failure, so the consumer must record in real time which offset it has consumed in order to continue after recovery.

Before Kafka version 0.9, the consumer saves offsets in Zookeeper by default; starting from version 0.9, the consumer saves offsets in a built-in topic of Kafka by default, which is __consumer_offsets.
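
Because the offsets now live in the __consumer_offsets topic, they can be inspected with the consumer-groups tool instead of zkCli; a quick sketch, reusing the test123 group name that appears in the Logstash configuration later in this post:

kafka-consumer-groups.sh --bootstrap-server 192.168.2.33:9092 --describe --group test123
#columns include TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET and LAG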

4. Deploy Zookeeper + Kafka cluster

1. Deploy Zookeeper + Kafka cluster

1.1 Download the installation package

Official download address: https://kafka.apache.org/downloads

cd /opt
wget https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.7.1/kafka_2.13-2.7.1.tgz 

1.2 Install Kafka

cd /opt/
tar zxvf kafka_2.13-2.7.1.tgz
mv kafka_2.13-2.7.1 /usr/local/kafka 

1.3 Modify configuration file

cd /usr/local/kafka/config/
cp server.properties{,.bak}
 
vim server.properties
broker.id=0                                    #Line 21: globally unique broker ID; it must differ on every broker, so set broker.id=1 and broker.id=2 on the other machines
listeners=PLAINTEXT://192.168.2.33:9092        #Line 31: IP and port to listen on; change the IP for each broker, or keep the default configuration
num.network.threads=3                          #Line 42: number of threads the broker uses to handle network requests; usually no need to change
num.io.threads=8                               #Line 45: number of threads handling disk I/O; should be greater than the number of disks
socket.send.buffer.bytes=102400                #Line 48: send buffer size of the socket
socket.receive.buffer.bytes=102400             #Line 51: receive buffer size of the socket
socket.request.max.bytes=104857600             #Line 54: maximum size of a socket request
log.dirs=/usr/local/kafka/logs                 #Line 60: path where Kafka run logs, and the data itself, are stored
num.partitions=1                               #Line 65: default number of partitions per topic on this broker; overridden by the parameter given at topic creation
num.recovery.threads.per.data.dir=1            #Line 69: number of threads used to recover and clean up data under the data dirs
log.retention.hours=168                        #Line 103: maximum retention time of a segment (data) file in hours; default 7 days, deleted when exceeded
log.segment.bytes=1073741824                   #Line 110: maximum size of a segment file, default 1G; a new segment file is created when exceeded
zookeeper.connect=192.168.2.33:2181,192.168.2.77:2181,192.168.2.99:2181    #Line 123: Zookeeper cluster connection addresses

1.4 Modify environment variables

vim /etc/profile
export KAFKA_HOME=/usr/local/kafka
export PATH=$PATH:$KAFKA_HOME/bin
 
source /etc/profile 
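
A quick sanity check that the variables took effect in the current shell:

echo $KAFKA_HOME              #should print /usr/local/kafka
which kafka-server-start.sh   #should resolve to /usr/local/kafka/bin/kafka-server-start.sh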

1.5 Configure Kafka startup script

vim /etc/init.d/kafka
#!/bin/bash
#chkconfig:2345 22 88
#description:Kafka Service Control Script
KAFKA_HOME='/usr/local/kafka'
case $1 in
start)
    echo "---------- starting Kafka ------------"
    ${KAFKA_HOME}/bin/kafka-server-start.sh -daemon ${KAFKA_HOME}/config/server.properties
;;
stop)
    echo "---------- stopping Kafka ------------"
    ${KAFKA_HOME}/bin/kafka-server-stop.sh
;;
restart)
    $0 stop
    $0 start
;;
status)
    echo "---------- Kafka status ------------"
    count=$(ps -ef | grep kafka | egrep -cv "grep|$$")
    if [ "$count" -eq 0 ];then
        echo "kafka is not running"
    else
        echo "kafka is running"
    fi
;;
*)
    echo "Usage: $0 {start|stop|restart|status}"
esac

1.6 Set up auto-start at boot

chmod +x /etc/init.d/kafka
chkconfig --add kafka 

1.7 Start Kafka on each node

service kafka start  
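
To confirm that all three brokers registered with Zookeeper, list the broker ids znode from any node (zkCli.sh can run a single command passed on its command line):

/usr/local/zookeeper-3.5.7/bin/zkCli.sh -server 192.168.2.33:2181 ls /brokers/ids
#expected: [0, 1, 2] once all three brokers are up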

1.8 Kafka command line operations

//Create topic

kafka-topics.sh --create --zookeeper 192.168.2.33:2181,192.168.2.77:2181,192.168.2.99:2181 --replication-factor 2 --partitions 3 --topic test
 
-------------------------------------------------------------------------------------
--zookeeper: Zookeeper cluster server addresses; separate multiple IPs with commas (one IP is generally enough)
--replication-factor: number of replicas per partition; 1 means a single replica, 2 is recommended
--partitions: number of partitions
--topic: topic name
-------------------------------------------------------------------------------------

//View all topics in the current server

kafka-topics.sh --list --zookeeper 192.168.2.33:2181,192.168.2.77:2181,192.168.2.99:2181 

//View details of a topic

kafka-topics.sh --describe --zookeeper 192.168.2.33:2181,192.168.2.77:2181,192.168.2.99:2181 --topic test

//Publish messages

kafka-console-producer.sh --broker-list 192.168.2.33:9092,192.168.2.77:9092,192.168.2.99:9092 --topic test  

//Consume messages

kafka-console-consumer.sh --bootstrap-server 192.168.2.33:9092,192.168.2.77:9092,192.168.2.99:9092 --topic test --from-beginning
 
-------------------------------------------------------------------------------------
--from-beginning: reads out all the existing messages in the topic from the start
-------------------------------------------------------------------------------------

//Modify the number of partitions

kafka-topics.sh --zookeeper 192.168.2.33:2181,192.168.2.77:2181,192.168.2.99:2181 --alter --topic test  --partitions 6  

//Delete topic

kafka-topics.sh --delete --zookeeper 192.168.2.33:2181,192.168.2.77:2181,192.168.2.99:2181 --topic  test

2. Instance operation: deploy zookeeper + kafka cluster

2.1 Install Zookeeper cluster

See the sections above for details; this continues the same experiment (performed on all cluster servers).

2.2 Download the installation package and install Kafka

2.3 Modify configuration file

2.4 Modify environment variables

2.5 Configure Kafka startup script and set auto-start at boot to start Kafka

2.6 Kafka command line operations

Create a topic and view it

Post and read messages

Modify the number of partitions

Delete topic

5. Deploy Filebeat+Kafka+ELK

1. Operation steps for deploying Filebeat+Kafka+ELK

1.1 Deploy Zookeeper + Kafka cluster

See above; this continues from the previous experiment.

1.2 Deploy Filebeat

To build ELK, please see the previous blog for details

cd /usr/local/filebeat
 
vim filebeat.yml
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/messages
    - /var/log/*.log
......
#Add the output-to-Kafka configuration
output.kafka:
  enabled: true
  hosts: ["192.168.2.33:9092","192.168.2.77:9092","192.168.2.99:9092"]   #Kafka cluster addresses
  topic: "filebeat_test"   #Kafka topic to write to

#Start filebeat
./filebeat -e -c filebeat.yml
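
Before wiring up Logstash, it is worth confirming that events actually reach the Kafka topic; a quick check with the console consumer (--max-messages just caps the output):

kafka-console-consumer.sh --bootstrap-server 192.168.2.33:9092 --topic filebeat_test --from-beginning --max-messages 5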

1.3 Deploy ELK and create a new Logstash configuration file on the node running the Logstash component

cd /etc/logstash/conf.d/
 
vim filebeat.conf
input {
    kafka {
        bootstrap_servers => "192.168.2.33:9092,192.168.2.77:9092,192.168.2.99:9092"   #Kafka cluster addresses
        topics => "filebeat_test"
        group_id => "test123"
        auto_offset_reset => "earliest"
    }
}

output {
    elasticsearch {
        hosts => ["192.168.229.70:9200"]
        index => "filebeat_test-%{+YYYY.MM.dd}"
    }
    stdout {
        codec => rubydebug
    }
}

1.4 Start Logstash

logstash -f filebeat.conf  
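
Once Logstash is consuming, the daily index should appear in Elasticsearch; a quick curl check (192.168.229.70 is the Elasticsearch address used in the config above):

curl -s 'http://192.168.229.70:9200/_cat/indices?v' | grep filebeat_test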

1.5 Browser access test

Use a browser to visit http://192.168.2.22:5601 and log in to Kibana. Click the "Create Index Pattern" button and add the index "filebeat_test-*", click the "create" button, then click the "Discover" button to view the charts and log information.
