In-depth analysis of cloud native: how to quickly build an enterprise-level cloud-native log system on Kubernetes

I. Overview

  • ELK is an acronym for three open source projects: Elasticsearch, Logstash, and Kibana. A fourth component, Filebeat, has since been added to the stack: a lightweight log collection and shipping agent. Filebeat uses few resources and is well suited to collecting logs on each server and forwarding them to Logstash; it is also the tool officially recommended for this role.
  • The general flow chart is as follows:

[Figure: overall ELK data flow]

① Elasticsearch storage

  • Elasticsearch is an open source distributed search engine that collects, analyzes, and stores data. Its features include: distributed operation, zero configuration, automatic discovery, automatic index sharding, index replicas, a RESTful interface, multiple data sources, and automatic search load balancing.

② Filebeat log data collection

  • Filebeat is a member of the Beats family, a set of lightweight log collectors; the family currently has six members. In the early ELK architecture, Logstash was used both to collect and to parse logs, but Logstash is relatively heavy on memory, CPU, and I/O. Compared with Logstash, the system CPU and memory used by Beats is almost negligible.
  • Filebeat is a lightweight shipper for forwarding and centralizing log data. It monitors specified log files or locations and collects log events.
  • Beats currently includes six tools:
    • Packetbeat: Network Data (collects network traffic data);
    • Metricbeat: Metrics (collects data such as CPU and memory usage at the system, process, and file system levels);
    • Filebeat: log files (collect file data);
    • Winlogbeat: windows event log (collects Windows event log data);
    • Auditbeat: audit data (collect audit logs);
    • Heartbeat: uptime monitoring (periodically probes services to check their availability).
  • Filebeat's workflow is shown in the following flow chart:

[Figure: Filebeat workflow]

  • Pros: Filebeat is a single binary with no dependencies and uses very few resources.
  • Disadvantages: Filebeat's scope is quite limited, so it can fall short in some scenarios. Since version 5.x it also has some filtering capability.

③ Kafka

  • Kafka absorbs traffic peaks (peak shaving). ELK can also use Redis as a message queue, but message queuing is not Redis's strong point, and a Redis cluster is no match for a dedicated message publishing system such as Kafka.

④ Logstash filtering

  • Logstash is mainly a tool for collecting, parsing, and filtering logs, and it supports a large number of input methods. It generally works in a client/server architecture: the client runs on the hosts whose logs need to be collected, and the server filters and transforms the logs received from each node and forwards them to Elasticsearch.
  • Advantages:
    • Scalability: Beats should be load balanced across a set of Logstash nodes, and at least two Logstash nodes are recommended for high availability. It is common to deploy only one Beats input per Logstash node, but each Logstash node can also run multiple Beats inputs to expose separate endpoints for different data sources;
    • Resilience: Logstash persistent queues provide protection across node failures; for disk-level resiliency it is important to ensure disk redundancy. For on-premises deployments, configuring RAID is recommended; when running in a cloud or containerized environment, use persistent disks with a replication policy that reflects your data SLA;
    • Filtering: performs regular transformations on event fields; fields in events can be renamed, deleted, replaced, and modified;
  • Disadvantages: Logstash is resource-hungry and uses a lot of CPU and memory; in addition, without a message queue as a buffer, there is a risk of data loss.

⑤ Kibana visualization

  • Kibana is also an open source, free tool. It provides a log-analysis-friendly web interface for Logstash and Elasticsearch and helps aggregate, analyze, and search important log data.
  • The relationship between Filebeat and Logstash: because Logstash runs on the JVM and consumes a lot of resources, its author later wrote logstash-forwarder in Go, a lightweight shipper with fewer features but much lower resource usage. After the author joined Elastic (http://elastic.co), and since Elastic had already acquired another Go-based open source project, Packetbeat, with a whole team behind it, the development of logstash-forwarder was merged into that same Go team, and the new project was named Filebeat.

II. helm3 install ELK

  • The overall flow chart is as follows:

[Figure: overall deployment flow]

① Prerequisites

  • Add helm repository:
$ helm repo add elastic   https://helm.elastic.co
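  • Optionally refresh the local chart index and confirm the available chart versions (standard Helm commands):
$ helm repo update
$ helm search repo elastic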

② helm3 install elasticsearch

  • Custom values: mainly to configure storage class persistence and resource limits; if your machine's resources are limited, you can reduce them:
# Cluster name
clusterName: "elasticsearch"
# Elasticsearch 6.8+ installs the x-pack plugin by default (some features are free); disable it here
esConfig:
  elasticsearch.yml: |
    network.host: 0.0.0.0
    cluster.name: "elasticsearch"
    xpack.security.enabled: false
resources:
  requests:
    memory: 1Gi
volumeClaimTemplate:
  storageClassName: "bigdata-nfs-storage"
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 3Gi
service:
  type: NodePort
  port: 9000
  nodePort: 31311
  • Setting xpack.security.enabled: false disables the built-in security features and suppresses the Kibana warning "Elasticsearch built-in security features are not enabled".
  • Start the installation of Elasticsearch:
$ helm install es elastic/elasticsearch -f my-values.yaml  --namespace bigdata


W1207 23:10:57.980283   21465 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1207 23:10:58.015416   21465 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
NAME: es
LAST DEPLOYED: Tue Dec  7 23:10:57 2021
NAMESPACE: bigdata
STATUS: deployed
REVISION: 1
NOTES:
1. Watch all cluster members come up.
  $ kubectl get pods --namespace=bigdata -l app=elasticsearch-master -w
2. Test cluster health using Helm test.
  $ helm --namespace=bigdata test es
  • Check the pods: everything is healthy only when all pods are Running. Pulling the images can take a while, so wait a bit before checking:
$ kubectl get pod -n bigdata -l app=elasticsearch-master
$ kubectl get pvc -n bigdata
$ watch kubectl get pod -n bigdata -l app=elasticsearch-master
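  • If the PVCs stay in Pending, the StorageClass referenced in the values above (bigdata-nfs-storage) may not exist in your cluster; a quick check (standard kubectl, name taken from the values file):
$ kubectl get storageclass bigdata-nfs-storage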


  • verify:
$ helm --namespace=bigdata test es
$ kubectl get pod,svc -n bigdata -l app=elasticsearch-master -o wide
$ curl 192.168.0.113:31311/_cat/health
$ curl 192.168.0.113:31311/_cat/nodes
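  • Once the health check returns, the indices can also be listed over the same NodePort (a sketch reusing the node IP and port from the commands above):
$ curl "192.168.0.113:31311/_cat/indices?v"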


  • clean up:
$ helm uninstall es -n bigdata
$ kubectl delete pvc elasticsearch-master-elasticsearch-master-0 -n bigdata
$ kubectl delete pvc elasticsearch-master-elasticsearch-master-1 -n bigdata
$ kubectl delete pvc elasticsearch-master-elasticsearch-master-2 -n bigdata

③ helm3 install Kibana

  • Custom values:
$ cat <<EOF> my-values.yaml
# Override the Kibana config file; default location /usr/share/kibana/config/kibana.yml
kibanaConfig:
  kibana.yml: |
    server.port: 5601
    server.host: "0.0.0.0"
    elasticsearch.hosts: [ "elasticsearch-master-headless.bigdata.svc.cluster.local:9200" ]
resources:
  requests:
    cpu: "1000m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
service:
  #type: ClusterIP
  type: NodePort
  loadBalancerIP: ""
  port: 5601
  nodePort: "30026"
EOF
  • Start installing Kibana:
$ helm install kibana elastic/kibana -f my-values.yaml  --namespace bigdata


  • verify:
$ kubectl get pod,svc -n bigdata -l app=kibana
  • Browser access: http://192.168.0.113:30026/
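  • A quick command-line check is also possible through Kibana's status endpoint (a sketch reusing the NodePort above; /api/status is Kibana's built-in status API):
$ curl -I http://192.168.0.113:30026/api/status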


  • clean up:
$ helm uninstall kibana -n bigdata

④ helm3 install Filebeat

  • By default, Filebeat collects Docker logs from the host path /var/lib/docker/containers. What if Docker is installed under a different path? Simply modify the hostPath parameter in the chart's DaemonSet file:
- name: varlibdockercontainers
  hostPath:
    path: /var/lib/docker/containers   # change this to the Docker installation path
  • Of course, you can also make this change through custom values; overriding the log collection path via custom values is the recommended approach.
  • Custom values: by default the data is sent to Elasticsearch; here the output is changed to Kafka:
$ cat <<'EOF' > my-values.yaml
daemonset:
  filebeatConfig:
    filebeat.yml: |
      filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log

      output.elasticsearch:
        enabled: false
        host: '${NODE_NAME}'
        hosts: '${ELASTICSEARCH_HOSTS:elasticsearch-master:9200}'
      output.kafka:
        enabled: true
        hosts: ["kafka-headless.bigdata.svc.cluster.local:9092"]
        topic: test
EOF
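  • Filebeat publishes to the test topic; if your Kafka cluster does not auto-create topics, you can create it first from the kafka-client pod used later for verification (a sketch; partition and replication settings are assumptions):
$ kubectl exec --tty -i kafka-client --namespace bigdata -- bash
$ kafka-topics.sh --bootstrap-server kafka.bigdata.svc.cluster.local:9092 --create --topic test --partitions 1 --replication-factor 1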
  • Start installing Filebeat:
$ helm install filebeat elastic/filebeat -f my-values.yaml  --namespace bigdata
$ kubectl get pods --namespace=bigdata -l app=filebeat-filebeat -w


  • verify:
# First, open a shell in the kafka client pod
$ kubectl exec --tty -i kafka-client --namespace bigdata -- bash
# Then consume data from the topic
$ kafka-console-consumer.sh --bootstrap-server kafka.bigdata.svc.cluster.local:9092 --topic test


  • The data can be consumed, which shows it has been written to Kafka. Check the Kafka consumer lag (backlog):
$ kubectl exec --tty -i kafka-client --namespace bigdata -- bash
$ kafka-consumer-groups.sh --bootstrap-server kafka-0.kafka-headless.bigdata.svc.cluster.local:9092 --describe --group mygroup
  • A large amount of data is found to be backlogged, waiting to be consumed.


  • The next step is to deploy logstash to consume kafka data, and finally store it in ES.
  • clean up:
$ helm uninstall filebeat -n bigdata

⑤ helm3 install Logstash

  • Custom values (replace the ES and Kafka addresses with those of your own environment):
$ cat <<EOF> my-values.yaml
logstashConfig:
  logstash.yml: |
    xpack.monitoring.enabled: false

logstashPipeline:
  logstash.conf: |
    input {
      kafka {
        bootstrap_servers => "kafka-headless.bigdata.svc.cluster.local:9092"
        topics => ["test"]
        group_id => "mygroup"
        # If metadata is used, the byte-array deserializers below must not be enabled, otherwise an error is raised
        #key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
        #value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
        consumer_threads => 1
        # Defaults to false; metadata is only available when set to true
        decorate_events => true
        auto_offset_reset => "earliest"
      }
    }
    filter {
      mutate {
        # Take the Kafka record key and split it on commas
        split => ["[@metadata][kafka][key]", ","]
        add_field => {
          # Put the first element of the split key into a custom "index" field
          "index" => "%{[@metadata][kafka][key][0]}"
        }
      }
    }
    output {
      elasticsearch {
        pool_max => 1000
        pool_max_per_route => 200
        hosts => ["elasticsearch-master-headless.bigdata.svc.cluster.local:9200"]
        index => "test-%{+YYYY.MM.dd}"
      }
    }

# Resource limits
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"

volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 3Gi
EOF
  • Output plugins send events to a specific destination:
stdout { codec => rubydebug }            # enable debug mode; output is printed to the console
  • stdout : Standard output, outputs events to the screen:
output {
    stdout {
        codec => "rubydebug"
    }
}
  • file: write events to a file:
output {
    file {
        path => "/data/logstash/%{host}/%{application}"
        codec => line { format => "%{message}" }
    }
}
  • kafka: Send events to kafka:
output {
    kafka {
        bootstrap_servers => "localhost:9092"
        topic_id => "test_topic"  # required: the topic that messages are produced to
    }
}
  • elasticsearch: store logs in ES:
output {
    elasticsearch {
        #user => elastic
        #password => changeme
        hosts => "localhost:9200"
        index => "nginx-access-log-%{+YYYY.MM.dd}"
    }
}
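  • Before deploying a changed pipeline, the configuration can be syntax-checked with Logstash itself (a sketch assuming a local Logstash installation and the pipeline saved as test-pipeline.conf; both names are examples):
$ logstash -f test-pipeline.conf --config.test_and_exit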
  • Start installing Logstash:
$ helm install logstash elastic/logstash -f my-values.yaml  --namespace bigdata


$ kubectl get pods --namespace=bigdata -l app=logstash-logstash


  • verify:
    • Log in to kibana to see if the index is created:


    • View logs:
$ kubectl logs -f  logstash-logstash-0 -n bigdata >logs
$ tail -100 logs


    • View kafka consumption:
$ kubectl exec --tty -i kafka-client --namespace bigdata -- bash
$ kafka-consumer-groups.sh --bootstrap-server kafka-0.kafka-headless.bigdata.svc.cluster.local:9092 --describe --group mygroup


  • View the index data through Kibana (Kibana version: 7.15.0). First create an index pattern:


  • Query the data (Discover) through the index pattern created above:


  • clean up:
$ helm uninstall logstash -n bigdata

III. ELK-related backup components and backup methods

  • Elasticsearch data can be backed up in two ways:
    • Export the data to a text file, for example with tools such as elasticdump or esm; this suits scenarios with small data volumes;
    • Snapshot the files in the Elasticsearch data directory using the snapshot API built into Elasticsearch; this suits scenarios with large data volumes.

① Elasticsearch snapshot backup

  • Advantages and disadvantages of snapshot backup:
    • Advantages: snapshots can be taken and snapshot backup policies defined, so snapshots are stored automatically, and different policies can be defined to meet different backup needs;
    • Disadvantages: restoration is not very flexible; taking a snapshot is fast, but a restore cannot be done arbitrarily at any point, similar to virtual machine snapshots.
  • Configure the backup directory:
    • In elasticsearch.yml, declare the paths that may be used as backup repositories via path.repo, for example:
path.repo: ["/mount/backups", "/mount/longterm_backups"]
    • Once configured, you can use the snapshot API to create a repository; the following creates a repository named my_backup:
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup"
  }
}
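    • To confirm the repository is registered and writable by the nodes, the snapshot verify API can be called; a sketch using curl (host and port are assumptions for a local node):
$ curl -X POST "localhost:9200/_snapshot/my_backup/_verify?pretty"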
  • Start backup through the API interface:
    • With a repository in place, you can take a backup, also called a snapshot, which records the current state of the data. Create a snapshot named snapshot_1 as follows:
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
    • wait_for_completion=true means the API returns only after the backup has finished; otherwise it runs asynchronously by default. It is set here so the effect is visible immediately; in production it is unnecessary, and the snapshot can simply run asynchronously in the background.
  • Incremental backup: after the call completes, you will find that /mount/backups/my_backup has grown, meaning new backup data has arrived. Note that when multiple snapshots are taken in the same repository, Elasticsearch checks whether each data segment file to be backed up has changed; unchanged segments are skipped and only changed segment files are backed up, which effectively gives incremental backups:
PUT /_snapshot/my_backup/snapshot_2?wait_for_completion=true
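  • The snapshots in the repository can be listed to confirm that both snapshot_1 and snapshot_2 exist (a sketch using curl; host and port are assumptions):
$ curl "localhost:9200/_snapshot/my_backup/_all?pretty"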
  • Data recovery: The recovery function can be quickly realized by calling the following API:
POST /_snapshot/my_backup/snapshot_1/_restore?wait_for_completion=true
{
  "indices": "index_1",
  "rename_pattern": "index_1",
  "rename_replacement": "restored_index_1"
}
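  • Restore progress can be watched with the recovery APIs (a sketch using curl; host and port are assumptions):
$ curl "localhost:9200/_cat/recovery?v"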

② elasticdump backup and migration of ES data

  • Index data is exported as a file (backup):
# Export the index mapping to a local file
$ elasticdump \
  --input=http://<ES-instance-IP>:9200/index_name/index_type \
  --output=/data/my_index_mapping.json \
  --type=mapping
# Export the index data
$ elasticdump \
  --input=http://<ES-instance-IP>:9200/index_name/index_type \
  --output=/data/my_index.json \
  --type=data
  • Index data files are imported into the index (restore):
# Import the mapping into the target index
$ elasticdump \
  --output=http://<ES-instance-IP>:9200/index_name \
  --input=/home/indexdata/roll_vote_mapping.json \
  --type=mapping
# Import the ES document data into the target index
$ elasticdump \
  --output=http://<ES-instance-IP>:9200/index_name \
  --input=/home/indexdata/roll_vote.json \
  --type=data
  • The backup data can be directly imported into another es cluster:
$ elasticdump --input=http://127.0.0.1:9200/test_event   --output=http://127.0.0.2:9200/test_event --type=data
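  • If elasticdump is not installed yet, it is distributed as an npm package and can be installed globally (a sketch, assuming Node.js and npm are available):
$ npm install -g elasticdump
$ elasticdump --help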
  • type is the elasticdump export/import data type; the tool supports the following types:
type        description
mapping     the index mapping structure of the ES index
data        ES document data
settings    the default configuration of the ES index
analyzer    the ES tokenizer (analyzer)
template    the ES template structure data
alias       the index aliases of ES

③ esm backup and migration of ES data

  • Back up ES data:
$ esm -s http://10.33.8.103:9201 -x "petition_data" -b 5 --count=5000 --sliced_scroll_size=10 --refresh -o=./es_backup.bin
  • -w specifies the number of threads, -b the size of a bulk request in MB (default 5 MB), and -c the scroll request batch size. Import the backup to restore the ES data:
$ esm -d http://172.16.20.20:9201 -y "petition_data6" -c 5000 -b 5 --refresh -i=./dump.bin


Origin blog.csdn.net/Forever_wj/article/details/131312210