[Record of production environment K8S from construction to operation and maintenance (5)] K8S environment log collection plan

1. Foreword

  Today I want to talk about log collection in a k8s environment. I will mainly introduce which log files our system produces, and how each kind of log file is collected and used.

2. Log types

  The classification here is based on the logs handled in our existing production environment. By source, they fall into three types:

  • K8S cluster log
  • PKS cluster log
  • Monitor machine log

  By business type, they fall into two types:

  • Business log
  • System log

3. Log system composition

  The composition diagram below shows the PKS Cluster, the K8S Cluster, and the Monitor System. For a detailed introduction to each of them, please refer to the other chapters of this series.
  [Figure: composition of the log system]

3-1 Log collection

  1. PKS cluster log collection and backup

     The system logs of all servers in the PKS cluster are transferred from the cluster to the NAS mounted on the monitoring server through the syslog service.

  2. K8S cluster log collection and backup

     The logs on the K8S cluster, that is, on the Nodes, are divided into two types: system logs and business logs.

     As in the PKS cluster, the system logs on each Node are forwarded through the syslog service to the NAS mounted on the monitoring server.

     The business logs are collected by the fluent-bit service, transferred to the monitoring server, and then aggregated by the fluentd service running there. The final log files are stored on the NAS. Note that the log files finally stored on the NAS are not the original files from the K8S cluster; they have been processed by fluent-bit and fluentd.

  3. Monitoring server log collection and backup

     The monitoring server is a virtual server independent of the K8S cluster and the PKS cluster. Log collection and backup on this machine are implemented with scripts. We use crontab to run them on a regular schedule, and the logs are finally backed up to the NAS.
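
     The scripts themselves are not shown in this article. As a rough illustration only, a cron-driven backup job of this kind could look like the Python sketch below. The function name `backup_logs`, the `*.log` pattern, and the directories passed to it are all assumptions made for the example, not our actual scripts.

```python
import shutil
from pathlib import Path
from typing import List


def backup_logs(source_dir: Path, backup_dir: Path, pattern: str = "*.log") -> List[Path]:
    """Copy log files matching `pattern` from source_dir into backup_dir.

    A file is skipped when the backup already holds a copy of the same size
    that is at least as new, so the job can safely be run again and again.
    Returns the list of files that were copied in this run.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.glob(pattern)):
        if not src.is_file():
            continue
        dst = backup_dir / src.name
        if dst.exists():
            src_stat, dst_stat = src.stat(), dst.stat()
            if dst_stat.st_size == src_stat.st_size and dst_stat.st_mtime >= src_stat.st_mtime:
                continue  # backup copy is already up to date
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(dst)
    return copied
```

     A script that calls `backup_logs` with the local log directory and the NAS mount point could then be scheduled with a crontab entry (for example, once a day); the schedule and paths depend on the environment.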

3-2 fluent-bit VS fluentd

  We mentioned above that the business logs on each node of the K8S cluster are collected and aggregated through fluent-bit and fluentd, so let's briefly introduce these two services.

  Fluent Bit is a pluggable, lightweight, multi-platform open source log collection tool written in C. It can collect data from different sources and send it to multiple destinations, and it is fully compatible with the Docker and Kubernetes ecosystems. In a nutshell, fluent-bit is a simple log collection and processing tool. fluentd is also a log collection and processing tool, but compared with fluent-bit it has more powerful aggregation capabilities. A comparison of the two is listed here for reference.

  Item           | fluentd                                     | fluent-bit
  Scope          | Container/server                            | Container/server
  Language       | C and Ruby                                  | C
  Size           | About 40MB                                  | About 450KB
  Performance    | High performance                            | High performance
  Dependencies   | Built as a Ruby Gem; depends mainly on gems | Zero dependencies, apart from build tools (GCC, CMake)
  Plugin support | More than 650 available plugins             | About 35 available plugins
  License        | Apache License Version 2.0                  | Apache License Version 2.0

3-3 Log Monitoring

  One of the most important uses of log collection is monitoring. Monitoring has its own chapter in this series, so here I will only briefly outline how log collection and monitoring fit together. A fluent-bit service runs on each Node in the K8S cluster to collect log information. fluentd on the monitoring server aggregates and filters the log information collected by fluent-bit. Prometheus monitors the information processed by fluentd, and Grafana finally displays the monitoring information visually. Note that fluent-bit in the K8S cluster runs as a Pod, while fluentd, Prometheus, and Grafana on the monitoring server all run as containers.
  [Figure: log collection and monitoring design]
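
  To make the "aggregate and filter" step more concrete, here is a small Python sketch of the idea: keep only error-level records and count them per node, which is the kind of number a monitoring system could then watch. This is not fluentd configuration and not our production setup; the record fields (`node`, `level`, `message`) and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_errors_by_node(records: Iterable[dict], levels=("ERROR", "FATAL")) -> Counter:
    """Keep only records whose level is in `levels` and count them per node."""
    counts = Counter()
    for record in records:
        # Filter step: ignore records below the levels we care about.
        if record.get("level", "").upper() in levels:
            # Aggregate step: one counter per node name.
            counts[record.get("node", "unknown")] += 1
    return counts


# Hypothetical records, as they might look after collection and parsing.
records = [
    {"node": "node-1", "level": "INFO", "message": "started"},
    {"node": "node-1", "level": "ERROR", "message": "db timeout"},
    {"node": "node-2", "level": "error", "message": "disk full"},
    {"node": "node-1", "level": "FATAL", "message": "out of memory"},
]

print(dict(count_errors_by_node(records)))  # {'node-1': 2, 'node-2': 1}
```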

4. Conclusion

  There are many tools and solutions for log collection and analysis, but whichever method is used for collection, log deletion also has to be considered. If logs are not deleted regularly, a Node will eventually crash because its disk fills up with log files. Our environment ran into exactly this problem in operation. Because some business logs were not deleted in time, other Pods running on the Node could not work normally, including fluent-bit, so the logs could no longer be collected properly.
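
  As one possible way to handle this, a periodic cleanup job can remove log files older than a retention period. The Python sketch below only illustrates the idea; the function name `delete_old_logs`, the `*.log` pattern, and the retention value are assumptions, and the right policy depends on how long the logs must be kept and whether they have already been backed up.

```python
import time
from pathlib import Path
from typing import List, Optional


def delete_old_logs(log_dir: Path, max_age_days: float,
                    pattern: str = "*.log", now: Optional[float] = None) -> List[Path]:
    """Delete files in log_dir matching `pattern` that are older than max_age_days.

    Age is judged by the file's modification time. `now` can be passed in
    (seconds since the epoch) to make the function easy to test.
    Returns the list of deleted paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    deleted = []
    for path in sorted(log_dir.glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```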

Author: rm * Group
Date: 2020/9/20

Origin blog.csdn.net/ashdfoiuasdhfoief/article/details/108437204