[Elastic (ELK) Stack Practical Tutorial] 10. ELK Architecture Upgrade - Introducing Message Queue Redis, Kafka

Table of contents

1. Problems faced by ELK architecture

1.1 Too high coupling

1.2 Performance bottleneck

2. Practice of connecting ELK to Redis

2.1 Configure Redis

2.1.1 Install Redis

2.1.2 Configuring Redis

2.1.3 Start Redis

2.2 Configure Filebeat 

2.3 Configure Logstash

2.4 Data Consumption

2.5 Configure Kibana

3. Basic overview of message queue

3.1 What is a message queue

3.2 Classification of message queues

3.3 Message queue usage scenarios

3.3.1 Decoupling

3.3.2 Asynchronous

3.3.3 Peak shaving

4. Kafka overview and cluster deployment

4.1 Kafka cluster installation

4.2 Zookeeper cluster installation 

5. Kafka-eagle graphical interface installation

5.1 Install JDK 

5.2 Install Kafka-eagle

5.3 Deploy Kafka-eagle

5.4 Start Kafka-eagle

5.5 Enable eagle monitoring

5.6 Access Kafka-eagle

5.7 Pitfalls encountered

6. Connect ELK to Kafka

6.1 Configure Filebeat 

6.2 Configure Logstash

6.3 Configure Kibana


1. Problems faced by ELK architecture

1.1 Too high coupling

        Scenario description: Suppose the system writes logs very frequently, about 5 GB every ten minutes, i.e. roughly 30 GB per hour, while an application server typically has only about 40 GB of disk space by default, so application logs are usually rotated on an hourly basis. If Logstash is down for one hour, Filebeat cannot forward logs to it, but the application server still rotates the log every hour, which means we lose one hour of log data.

        Solution: introduce a message queue. As long as Filebeat can keep collecting logs and the queue can hold the data long enough, a later Logstash failure is no longer a problem: once Logstash is repaired, the logs are still written normally and no data is lost. This completes the decoupling.

1.2 Performance bottleneck

        Scenario description: Filebeat or Logstash writes directly to ES. If logs are written to ES very frequently, ES may time out or drop data, because ES has to both process and store the data, so its performance becomes very slow.

        Solution: use a message queue. Filebeat or Logstash writes to the message queue, which acts as a buffer; Logstash then consumes the data according to ES's processing capacity and writes to the ES cluster at an even pace, which effectively relieves the ES write-performance bottleneck.

2. Practice of connecting ELK to Redis

Use Redis as a message queue service:

2.1 Configure Redis

2.1.1 Install Redis

In production, Redis is usually installed from the binary package: CentOS 7 detailed installation Redis 6 graphic tutorial (Stars.Sky's Blog, CSDN)

In this experimental environment we install with yum, which is quicker and more convenient:

[root@es-node2 ~]# yum install -y redis

2.1.2 Configuring Redis

[root@es-node2 ~]# vim /etc/redis.conf 
bind 0.0.0.0
requirepass Qwe123456

2.1.3 Start Redis

[root@es-node2 ~]# systemctl enable --now redis
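
To confirm that Redis is up and accepting the configured password, a quick check with redis-cli can be run (a minimal sketch; the address and password match the configuration above):

# Verify that Redis answers on the configured address with the configured password
# (the expected reply is PONG)
[root@es-node2 ~]# redis-cli -h 192.168.170.133 -a Qwe123456 ping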

2.2 Configure Filebeat 

[root@es-node3 ~]# vim /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log                    # input type: log files
  enabled: true                # enable this input
  paths:
    - /var/log/nginx/access.log          # path to the log file
  tags: ["access"]

- type: log                    # input type: log files
  enabled: true                # enable this input
  paths:
    - /var/log/nginx/error.log          # path to the log file
  tags: ["error"]

output.redis:
  hosts: ["192.168.170.133:6379"]    # Redis address
  password: "Qwe123456"              # Redis password
  timeout: 5                         # connection timeout in seconds
  db: 0                              # write into DB 0
  keys:                              # destination key names
    - key: "nginx_access"
      when.contains:
        tags: "access"
    - key: "nginx_error"
      when.contains:
        tags: "error"

[root@es-node3 ~]# systemctl restart filebeat.service 
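
If the service does not come up cleanly, Filebeat's built-in config check helps catch YAML mistakes (an optional sanity check using the standard filebeat test subcommand):

# Validate the configuration file syntax; "Config OK" means the YAML parses correctly
[root@es-node3 ~]# filebeat test config -c /etc/filebeat/filebeat.yml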

2.3 Configure Logstash

[root@es-node1 ~]# vim /etc/logstash/conf.d/test6.conf 
input {
       redis {
                host => ["192.168.170.133"]
                port => "6379"
                password => "Qwe123456"
                data_type => "list"
                key => "nginx_access"
                db => "0"
        }

        redis {
                host => ["192.168.170.133"]
                port => "6379"
                password => "Qwe123456"
                data_type => "list"
                key => "nginx_error"
                db => "0"
        }
}

filter {
    if "access" in [tags][0] {
        grok {
            match => { "message" => "%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:hostname} (?:%{QS:referrer}|-) (?:%{NOTSPACE:post_args}|-) %{QS:useragent} (?:%{QS:x_forward_for}|-) (?:%{URIHOST:upstream_host}|-) (?:%{NUMBER:upstream_response_code}|-) (?:%{NUMBER:upstream_response_time}|-) (?:%{NUMBER:response_time}|-)" }
        }
        
        useragent {
            source => "useragent"
            target => "useragent"
        }
        
        geoip {
            source => "clientip"
        }
        
        date {
            match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
            target => "@timestamp"
            timezone => "Asia/Shanghai"
        }
        
        mutate {
            convert => ["bytes","integer"]
            convert => ["response_time", "float"]
            convert => ["upstream_response_time", "float"]
            remove_field => ["message"]
            add_field => { "target_index" => "redis-logstash-nginx-access-%{+YYYY.MM.dd}" }	   
	}

        # Extract the domain from the referrer /^"http/
        if [referrer] =~ /^"http/ {
            grok {
                match => { "referrer" => '%{URIPROTO}://%{URIHOST:referrer_host}' }
            }
        }
    
        # Extract the requested resource type and the resource ID
        if "sky.com" in [referrer_host] {
            grok {
                match => { "referrer" => '%{URIPROTO}://%{URIHOST}/(%{NOTSPACE:sky_type}/%{NOTSPACE:sky_res_id})?"' }
            }
        }
	}

    else if "error" in [tags][0] {
        date {
            match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
                target => "@timestamp"
                timezone => "Asia/Shanghai"
        }
        mutate {
            add_field => { "target_index" => "redis-logstash-nginx-error-%{+YYYY.MM.dd}" }
        }
    }
}

output {
	stdout {
		codec => rubydebug
	}

    elasticsearch {
        hosts => ["192.168.170.132:9200","192.168.170.133:9200","192.168.170.134:9200"]
        index => "%{[target_index]}"
        template_overwrite => true
    }
}

[root@es-node1 ~]# logstash -f /etc/logstash/conf.d/test6.conf -r 
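
Once events start flowing, the target indices should appear in Elasticsearch. A quick way to confirm (a sketch; any of the three ES nodes can be queried):

# List the indices created by this pipeline
[root@es-node1 ~]# curl -s 'http://192.168.170.132:9200/_cat/indices/redis-logstash-nginx-*?v'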

2.4 Data Consumption

        In the scenario described above, the log file data collected by Filebeat will be stored in Redis. Next, Logstash fetches the data from Redis and transfers it to Elasticsearch. This is a pipelined process where data is consumed as it flows.

        Redis acts as intermediate storage. Once Logstash has successfully read data from Redis and forwarded it to Elasticsearch, that data is removed from Redis. This is because the config file uses data_type => "list": Logstash pops entries off the Redis list with a command such as LPOP or RPOP. The data in Redis is therefore continuously consumed, which is why you may see nothing when you query with the KEYS * command.

        If you want to check whether there is data coming into Redis, you can query while Filebeat is sending data to Redis. However, be aware that while Logstash is consuming data, that data is likely to be quickly deleted from Redis. So, you may need to adjust the data sending rate between Filebeat and Logstash to view the data in Redis. However, this approach is not recommended for long-term monitoring of Redis data, as it may affect the performance of the entire pipeline.
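
If you want to inspect the backlog while it still exists (for example with Logstash stopped), you can look at the two list keys directly. A minimal sketch with redis-cli, using the host, password, and key names configured above:

# Check how many log entries are currently queued in each list
[root@es-node2 ~]# redis-cli -h 192.168.170.133 -a Qwe123456 -n 0 LLEN nginx_access
[root@es-node2 ~]# redis-cli -h 192.168.170.133 -a Qwe123456 -n 0 LLEN nginx_error
# Peek at the most recent entry without consuming it
[root@es-node2 ~]# redis-cli -h 192.168.170.133 -a Qwe123456 -n 0 LRANGE nginx_access -1 -1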

2.5 Configure Kibana

Create the Kibana index pattern:
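
The index pattern is normally created in the Kibana UI (Stack Management → Index Patterns), but it can also be created over the API. This is only a hedged sketch for Kibana 7.x: the saved-objects endpoint is version-dependent, and the Kibana host placeholder and pattern ID are assumptions, not values from this article.

# Create an index pattern matching the Redis pipeline indices (Kibana 7.x saved-objects API)
curl -s -X POST 'http://<kibana-host>:5601/api/saved_objects/index-pattern/redis-logstash-nginx' \
  -H 'kbn-xsrf: true' -H 'Content-Type: application/json' \
  -d '{"attributes":{"title":"redis-logstash-nginx-*","timeFieldName":"@timestamp"}}'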

3. Basic overview of message queue

3.1 What is a message queue

  • Message: any piece of data transmitted between two devices can be called a message.

  • Queue: a first-in, first-out data structure, similar to the mechanism of queuing to buy tickets.

A message queue (MQ) is a container used to store messages; it needs to provide two functional interfaces for external callers.

  • Producer: the party that puts data into the message queue.

  • Consumer: the party that fetches data from the message queue.

3.2 Classification of message queues

MQ is mainly divided into two categories: point-to-point and publish/subscribe.

  • Point-to-point: involves a message queue (Queue), a sender (Sender), and a receiver (Receiver).

        A message produced by a producer has exactly one consumer. Once the message is consumed, it no longer exists in the queue. It is like a phone call: a message sent to the queue can be received by only one receiver, and once it has been received it is destroyed.

  • Publish/subscribe: involves a message queue (Queue), a publisher (Publisher), subscribers (Subscriber), and a topic (Topic).

        Each message can have multiple consumers, which are independent of each other. For example, when I publish an article on my official account, everyone who follows me can see it; that is, a message published to the queue can be received by multiple receivers (subscribers). A small console demo of this model is sketched below.
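
As a concrete illustration of publish/subscribe (using the Kafka cluster set up later in this article), two consumers in different consumer groups both receive every message published to a topic. A small sketch with Kafka's standard console tools; the topic name demo_pubsub is made up for this demo only:

# Terminals 1 and 2: two consumers in different groups, each receives every message
[root@es-node1 /opt/kafka]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.170.132:9092 --topic demo_pubsub --group g1
[root@es-node3 /opt/kafka]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.170.132:9092 --topic demo_pubsub --group g2
# Terminal 3: the producer publishes messages to the topic
[root@es-node1 /opt/kafka]# bin/kafka-console-producer.sh --bootstrap-server 192.168.170.132:9092 --topic demo_pubsub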

3.3 Message queue usage scenarios

There are three main usage scenarios for message queues: decoupling, asynchronous processing, and peak shaving.

3.3.1 Decoupling

Scenario description: After the user places an order, the order system needs to notify the inventory system. Traditionally, the order system calls the interface of the inventory system.

Disadvantages of the traditional model:

  • If the inventory system cannot be accessed, the order will fail to reduce the inventory, resulting in the order failure;

  • The order system is coupled with the inventory system.

Middleware mode:

  • Order system: After the user places an order, the order system writes the message into the message queue and returns the user's order successfully.

  • Inventory system: Subscribe to the news of the order, obtain the order information, and the inventory system performs inventory operations according to the order information.

  • If the inventory system is not working properly when the order is placed, it does not affect placing the order: after the order is placed, the order system writes to the message queue and no longer cares about subsequent operations. This achieves application-level decoupling between the order system and the inventory system.

3.3.2 Asynchronous

Scenario description: After a user registers, the system needs to send a registration email and a registration SMS. Traditionally, the registration information is first written to the database, then the registration email is sent, then the registration SMS; only after all three tasks are completed is a response returned to the client.

Disadvantages of the traditional model: system performance (concurrency, throughput, response time) will have bottlenecks.

Middleware mode: business logic that is not strictly required is handled asynchronously. The modified structure is as follows:

        With this arrangement, the user's response time is roughly the time it takes to write the registration information to the database, say 50 milliseconds. The registration email and SMS tasks are simply written to the message queue and the call returns immediately; writing to the message queue is very fast and essentially negligible, so the user's response time is around 50 to 55 ms.

3.3.3 Peak shaving

Scenario description: During a flash-sale (seckill) activity, traffic spikes sharply, and the application can be brought down by the excessive load.

Middleware mode:

  1. After the server receives a user request, it first writes the request into the message queue. If the queue length exceeds the maximum limit, the request is discarded directly or redirected to an error page.

  2. The flash-sale service then pulls data from the message queue according to its own processing capacity and does the subsequent processing. In this way, even if 8,000 requests arrive at once, the flash-sale service will not crash.

4. Kafka overview and cluster deployment

PS: I already installed Kafka and ZooKeeper on es-node1 and es-node3 a couple of days ago.

4.1 Kafka cluster installation

You can check my article to learn about the installation and use of kafka and kafka clusters: [Kafka 3.x Primary] 01. Kafka Overview and Getting Started_Stars.Sky's Blog-CSDN Blog
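
With the cluster running, the topic that Filebeat will write to in section 6 can be created and checked up front. A sketch using the standard Kafka CLI; the partition and replication counts here are example values, not settings from the original article:

# Create the topic used later by Filebeat (partition/replication counts are examples)
[root@es-node1 /opt/kafka]# bin/kafka-topics.sh --bootstrap-server 192.168.170.132:9092 --create --topic nginx_kafka_prod --partitions 2 --replication-factor 2
# Confirm the topic exists and inspect its layout
[root@es-node1 /opt/kafka]# bin/kafka-topics.sh --bootstrap-server 192.168.170.132:9092 --describe --topic nginx_kafka_prod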

4.2 Zookeeper cluster installation 

You can check my article to understand the installation and use of Zookeeper and zookeeper cluster: 

[Zookeeper Primary] 02. Zookeeper Cluster Deployment_Stars.Sky's Blog-CSDN Blog

5. Kafka-eagle graphical interface installation

Official installation documentation: 2.Install on Linux/macOS - Kafka Eagle (kafka-eagle.org)

Kafka-eagle download address: Tags · smartloli/kafka-eagle-bin (GitHub)

5.1 Install JDK 

You can check my article: Detailed process of deploying JDK+MySQL+Tomcat in Linux (Stars.Sky's Blog, CSDN)

5.2 Install Kafka-eagle

[root@es-node2 ~]# tar -zxvf kafka-eagle-bin-3.0.2.tar.gz -C /usr/local/
[root@es-node2 ~]# cd /usr/local/kafka-eagle-bin-3.0.2/
[root@es-node2 /usr/local/kafka-eagle-bin-3.0.2]# tar -zxvf efak-web-3.0.2-bin.tar.gz 

[root@es-node2 ~]# vim /etc/profile
export KE_HOME=/usr/local/kafka-eagle-bin-3.0.2/efak-web-3.0.2
export PATH=$KE_HOME/bin:$PATH
[root@es-node2 ~]# source /etc/profile

5.3 Deploy Kafka-eagle

[root@es-node2 ~]# vim /usr/local/kafka-eagle-bin-3.0.2/efak-web-3.0.2/conf/system-config.properties 
######################################
# Fill in the ZooKeeper cluster information; we only have one ZooKeeper cluster, so cluster2 is commented out
efak.zk.cluster.alias=cluster1
cluster1.zk.list=es-node1:2181,es-node3:2181/kafka
#cluster2.zk.list=xdn10:2181,xdn11:2181,xdn12:2181

######################################
# kafka sqlite jdbc driver address
######################################
# Kafka SQLite database address (change the storage path as needed)
efak.driver=org.sqlite.JDBC
efak.url=jdbc:sqlite:/usr/local/kafka-eagle-bin-3.0.2/efak-web-3.0.2/db/ke.db
efak.username=root
efak.password=www.kafka-eagle.org

######################################
# kafka mysql jdbc driver address
######################################
# MySQL database address (the ke database must be created in advance; we are not using MySQL storage here, so this section is commented out)
#efak.driver=com.mysql.cj.jdbc.Driver
#efak.url=jdbc:mysql://127.0.0.1:3306/ke?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull
#efak.username=root
#efak.password=123456

5.4 Start Kafka-eagle

[root@es-node2 ~]# /usr/local/kafka-eagle-bin-3.0.2/efak-web-3.0.2/bin/ke.sh start
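
After starting, the script can report whether the web service came up; checking the listening port works as well. The status subcommand is part of the EFAK ke.sh script in recent releases — treat it as an assumption for your build:

# Check whether the EFAK web service started successfully
[root@es-node2 ~]# /usr/local/kafka-eagle-bin-3.0.2/efak-web-3.0.2/bin/ke.sh status
# Or simply confirm the default web port is listening
[root@es-node2 ~]# ss -lntp | grep 8048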

5.5 Enable eagle monitoring

        Kafka-eagle obtains data through JMX and provides visual monitoring of metrics such as Kafka clients, producers, message counts, request counts, and processing time.

# Enable JMX for Kafka (required on every Kafka cluster node)
[root@es-node1 /opt/kafka]# vim /opt/kafka/bin/kafka-server-start.sh 
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
    export JMX_PORT="9999"
fi

# Restart Kafka
[root@es-node1 /opt/kafka]# kf.sh stop
[root@es-node1 /opt/kafka]# kf.sh start
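
To confirm JMX is actually enabled after the restart, check that the broker is listening on the configured port on each node (a quick sanity check):

# The Kafka process should now be listening on JMX port 9999
[root@es-node1 /opt/kafka]# ss -lntp | grep 9999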

5.6 Access Kafka-eagle

http://192.168.170.133:8048

Click TV Dashboard in the list on the right: 

5.7 Pitfalls encountered 

If no information is monitored on the eagle dashboard, check the eagle error log:

[root@es-node2 ~]# cd /usr/local/kafka-eagle-bin-3.0.2/efak-web-3.0.2/logs/
[root@es-node2 /usr/local/kafka-eagle-bin-3.0.2/efak-web-3.0.2/logs]# tail -f error.log
[2023-04-11 15:17:00] KafkaServiceImpl.Thread-351 - ERROR - Get kafka consumer has error,msg is Failed create new KafkaAdminClient
 [2023-04-11 15:17:00] MetricsSubTask.Thread-351 - ERROR - Collector consumer topic data has error, msg is 
 java.lang.NullPointerException
	at org.smartloli.kafka.eagle.core.factory.KafkaServiceImpl.getKafkaConsumer(KafkaServiceImpl.java:749)
	at org.smartloli.kafka.eagle.web.quartz.MetricsSubTask.bscreenConsumerTopicStats(MetricsSubTask.java:113)
	at org.smartloli.kafka.eagle.web.quartz.MetricsSubTask.metricsConsumerTopicQuartz(MetricsSubTask.java:73)
	at org.smartloli.kafka.eagle.web.quartz.MetricsSubTask.run(MetricsSubTask.java:68)

Solution: Make sure that zookeeper.connect=192.168.170.132:2181,192.168.170.134:2181/kafka in your Kafka configuration file and cluster1.zk.list=192.168.170.132:2181,192.168.170.134:2181/kafka in the eagle configuration file are consistent, then restart eagle.
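
A quick way to compare the two settings side by side before restarting eagle (the server.properties path is assumed from the /opt/kafka install directory used in this article):

# ZooKeeper connection string on the Kafka side
[root@es-node1 /opt/kafka]# grep 'zookeeper.connect=' /opt/kafka/config/server.properties
# ZooKeeper connection string on the eagle side
[root@es-node2 ~]# grep 'zk.list' /usr/local/kafka-eagle-bin-3.0.2/efak-web-3.0.2/conf/system-config.properties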

6. Connect ELK to Kafka

6.1 Configure Filebeat 

[root@es-node3 ~]# vim /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log                    # input type: log files
  enabled: true                # enable this input
  paths:
    - /var/log/nginx/access.log          # path to the log file
  tags: ["access"]

- type: log                    # input type: log files
  enabled: true                # enable this input
  paths:
    - /var/log/nginx/error.log          # path to the log file
  tags: ["error"]

output.kafka:
  hosts: ["192.168.170.132:9092", "192.168.170.134:9092"]
  topic: nginx_kafka_prod
  required_acks: 1              # reliability: 0 = no ack, 1 = wait for the leader partition write (default), -1 = wait for replica writes
  compression: gzip             # compression
  max_message_bytes: 10000      # maximum size of a single message; anything beyond this is dropped

[root@es-node3 ~]# systemctl restart filebeat.service 
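
To confirm that nginx log lines are actually landing in the topic before wiring up Logstash, a console consumer can read a few messages (a sketch; run it on any broker node):

# Read a few messages from the beginning of the topic; Ctrl-C to stop
[root@es-node1 /opt/kafka]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.170.132:9092 --topic nginx_kafka_prod --from-beginning --max-messages 5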

6.2 Configure Logstash

[root@es-node1 ~]# vim /etc/logstash/conf.d/test6.conf 
input {
    kafka {
        bootstrap_servers => "192.168.170.132:9092,192.168.170.134:9092"
        topics => ["nginx_kafka_prod"]  # topic name
        group_id => "logstash"          # consumer group name
        client_id => "node1"            # consumer instance name within the group
        consumer_threads => "2"         # ideally as many threads as partitions for perfect balance; more threads than partitions means some threads sit idle
        #topics_pattern => "app_prod*"  # subscribe to topics matching a regular expression
        codec => "json"
    }
}

filter {
    if "access" in [tags][0] {
        grok {
            match => { "message" => "%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:hostname} (?:%{QS:referrer}|-) (?:%{NOTSPACE:post_args}|-) %{QS:useragent} (?:%{QS:x_forward_for}|-) (?:%{URIHOST:upstream_host}|-) (?:%{NUMBER:upstream_response_code}|-) (?:%{NUMBER:upstream_response_time}|-) (?:%{NUMBER:response_time}|-)" }
        }
        
        useragent {
            source => "useragent"
            target => "useragent"
        }
        
        geoip {
            source => "clientip"
        }
        
        date {
            match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
            target => "@timestamp"
            timezone => "Asia/Shanghai"
        }
        
        mutate {
            convert => ["bytes","integer"]
            convert => ["response_time", "float"]
            convert => ["upstream_response_time", "float"]
            remove_field => ["message", "agent", "tags"]
            add_field => { "target_index" => "kafka-logstash-nginx-access-%{+YYYY.MM.dd}" }	   
	}

        # Extract the domain from the referrer /^"http/
        if [referrer] =~ /^"http/ {
            grok {
                match => { "referrer" => '%{URIPROTO}://%{URIHOST:referrer_host}' }
            }
        }
    
        # Extract the requested resource type and the resource ID
        if "sky.com" in [referrer_host] {
            grok {
                match => { "referrer" => '%{URIPROTO}://%{URIHOST}/(%{NOTSPACE:sky_type}/%{NOTSPACE:sky_res_id})?"' }
            }
        }
	}

    else if "error" in [tags][0] {
        date {
            match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
                target => "@timestamp"
                timezone => "Asia/Shanghai"
        }
        mutate {
            add_field => { "target_index" => "kafka-logstash-nginx-error-%{+YYYY.MM.dd}" }
        }
    }
}

output {
	stdout {
		codec => rubydebug
	}

    elasticsearch {
        hosts => ["192.168.170.132:9200","192.168.170.133:9200","192.168.170.134:9200"]
        index => "%{[target_index]}"
        template_overwrite => true
    }
}
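
The pipeline is started the same way as in section 2.3. Once it is consuming, the lag of the logstash consumer group shows whether Logstash keeps up with Filebeat (kafka-consumer-groups.sh is part of the standard Kafka CLI):

# Start the pipeline (same command as before)
[root@es-node1 ~]# logstash -f /etc/logstash/conf.d/test6.conf -r
# Inspect the consumer group's offsets and per-partition lag
[root@es-node1 /opt/kafka]# bin/kafka-consumer-groups.sh --bootstrap-server 192.168.170.132:9092 --describe --group logstash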

6.3 Configure Kibana

Create the Kibana index pattern:

Previous article: [Elastic (ELK) Stack Practical Tutorial] 09. Kibana Analysis of Site Business Logs (Stars.Sky's Blog, CSDN)

Next article: [Elastic (ELK) Stack Practical Tutorial] 11. Use ElastAlert to Implement ES DingTalk Group Log Alerts (Stars.Sky's Blog, CSDN)
