[Record of production environment K8S from construction to operation and maintenance (5)] K8S environment log collection plan
1. Foreword
Today I want to talk about the log collection solution in our K8S environment: what log files our system produces, and how the various logs are collected and used.
2. Log types
The classification here is based on the logs handled in our existing production environment. By cluster, they can be divided into the following three types:
- K8S cluster log
- PKS cluster log
- Monitor machine log
By business type, they can be divided into the following two types:
- Business log
- System log
3. Log system composition
For a detailed introduction to the PKS cluster, K8S cluster, and monitoring system shown in the composition diagram below, please refer to the other chapters of this series.
3-1 Log collection
- PKS cluster log collection and backup

  The system logs of all servers in the PKS cluster are forwarded by the syslog service to the NAS mounted on the monitoring server.
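As a sketch, this kind of syslog forwarding could be set up with rsyslog on each server; the target hostname and port below are placeholders, not our actual addresses:

```conf
# /etc/rsyslog.d/50-forward.conf (illustrative; host and port are assumptions)
# Forward all system logs over TCP ("@@") to the monitoring server,
# which writes them to its NAS mount.
*.* @@monitor.example.internal:514
```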
- K8S cluster log collection and backup

  Logs in the K8S cluster, that is, on the Node machines, fall into two types: system logs and business logs.

  System logs, as in the PKS cluster, are forwarded by the syslog service from each Node to the NAS mounted on the monitoring server.

  Business logs are collected by the fluent-bit service, forwarded to the monitoring server, and aggregated there by the fluentd service; the resulting log files are stored on the NAS. Note that the files that finally land on the NAS are not the original files from the K8S cluster, but the output of fluent-bit and fluentd processing.
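A minimal sketch of this pipeline follows; the paths, tags, and the monitoring-server address are assumptions for illustration, not our actual configuration. On each Node, fluent-bit tails the container logs and forwards them:

```conf
# fluent-bit.conf on each Node (illustrative)
[INPUT]
    Name   tail
    Path   /var/log/containers/*.log
    Tag    kube.*

[OUTPUT]
    Name   forward
    Match  kube.*
    Host   monitor.example.internal
    Port   24224
```

On the monitoring server, fluentd receives the forwarded records, aggregates them, and writes the processed files to the NAS mount:

```conf
# fluentd.conf on the monitoring server (illustrative)
<source>
  @type forward
  port 24224
</source>

<match kube.**>
  @type file
  path /mnt/nas/logs/k8s
</match>
```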
- Monitoring server log collection and backup

  The monitoring server is a virtual server independent of both the K8S cluster and the PKS cluster. Log collection and backup on this machine is implemented with scripts that crontab runs on a schedule, backing the logs up to the NAS.
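A minimal sketch of such a backup script, assuming the source directory and NAS mount point shown in the comments (both are hypothetical paths, not our real layout):

```shell
# backup_logs SRC DEST: compress *.log files older than one day from SRC
# (e.g. /var/log/app) into DEST (e.g. the NAS mount /mnt/nas/backup),
# then delete the originals to free local disk space.
backup_logs() {
  src=$1; dest=$2
  mkdir -p "$dest"
  find "$src" -name '*.log' -mtime +0 | while IFS= read -r f; do
    gzip -c "$f" > "$dest/$(basename "$f").$(date +%Y%m%d).gz" && rm -f "$f"
  done
}
```

A crontab entry such as `0 2 * * * /usr/local/bin/backup_logs.sh` (path assumed) would then run it nightly.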
3-2 fluent-bit vs. fluentd
We mentioned above that the business logs on each Node of the K8S cluster are collected and aggregated by fluent-bit and fluentd, so let's briefly introduce these two services.
Fluent Bit is a lightweight, pluggable, multi-platform, open-source log collector written in C. It can gather data from many different sources and send it to multiple destinations, and it is fully compatible with the Docker and Kubernetes ecosystems. In a nutshell, fluent-bit is a simple log collection and processing tool. Fluentd is a similar log collection and processing tool, but compared to fluent-bit it has much more powerful aggregation features. Their comparison is listed below for reference.
| | fluentd | fluent-bit |
|---|---|---|
| Scope | Containers / servers | Containers / servers |
| Language | C and Ruby | C |
| Size | About 40 MB | About 450 KB |
| Performance | High performance | High performance |
| Dependencies | Built as a Ruby gem; depends mainly on other gems | Zero dependencies, apart from build tools (GCC, CMake) for some plugins |
| Plugin support | More than 650 plugins available | About 35 plugins available |
| License | Apache License 2.0 | Apache License 2.0 |
3-3 Log Monitoring
One of the most important purposes of log collection is monitoring. Monitoring has a dedicated chapter of its own; here I will only outline the design of log collection and monitoring. A fluent-bit service runs on each Node of the K8S cluster to collect log information. The fluentd service on the monitoring server aggregates and filters the logs collected by fluent-bit, Prometheus monitors the information processed by fluentd, and finally Grafana visualizes the monitoring data. Note that fluent-bit runs inside the K8S cluster in the form of Pods, while fluentd, Prometheus, and Grafana all run on the monitoring server as containers.
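The usual way to run one fluent-bit Pod per Node is a DaemonSet; the sketch below illustrates the shape, but the namespace, image tag, and labels are assumptions, not our actual manifest:

```yaml
# Illustrative DaemonSet: runs one fluent-bit Pod on each Node,
# mounting the host's /var/log so it can tail the container logs.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.5
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```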
4. Conclusion
There are many tools and solutions for log collection and analysis, but whatever collection method is used, log deletion also needs to be considered. If logs are not deleted regularly, a Node will break down when log files fill up its disk. Our environment ran into exactly this problem during operation: because some business logs were not deleted in time, other Pods running on that Node, including fluent-bit, could not work normally, so logs could not be collected.
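One mitigation for the disk-full problem described above is a periodic cleanup job; a minimal sketch, where the log directory and the 7-day retention period are assumptions to adjust for your environment:

```shell
# cleanup_logs DIR DAYS: delete *.log files under DIR whose
# modification time is more than DAYS days ago.
cleanup_logs() {
  find "$1" -name '*.log' -mtime +"$2" -delete
}
```

Scheduled from crontab, e.g. `0 2 * * * find /var/log/app -name '*.log' -mtime +7 -delete` (path assumed), this keeps the Node's disk from filling up with stale business logs.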
Author: rm * Group
Date: 2020/9/20