Kubernetes business log collection and monitoring

As container technology and container orchestration have matured, more and more small and medium-sized enterprises are migrating their services to Kubernetes, which provides smarter, more convenient management and reduces operation and maintenance costs. During this migration, collecting business logs and monitoring both business services and cluster status become very important. This article describes how we collect business logs and monitor cluster and business status after migrating services to Kubernetes.


Cluster business log collection



Our company is not large and has only a few operation and maintenance staff, so the overall log collection process was not changed much during the migration to Kubernetes; we only made some adjustments. Because our log directories and log naming conventions were already well standardized, the adjustments were straightforward. The log collection process before the migration is shown in Figure 1.

Figure 1: The old ELK architecture

The old log collection architecture had some problems:
  1. When Logstash hangs, logs are lost. Previously, when we upgraded Logstash across a major version, logs were lost for a period of time.

  2. When one of the consumers behind Logstash hangs, for example ES, writing logs to files and to InfluxDB is also affected, while Filebeat keeps pushing logs to Logstash, which results in log loss.


The above two problems show that the components are tightly coupled, so we made some adjustments to the architecture when migrating to Kubernetes. The adjusted log collection architecture is shown in Figure 2.

Figure 2: The new ELK architecture
  1. Kafka is added between Filebeat and Logstash to temporarily buffer logs and to decouple the log consumers.

  2. Multiple Logstash instances are started to handle writing to files, writing to ES, and writing to InfluxDB separately, decoupling the log consumers from each other.

  3. Because the migration had to be done quickly and there was no time to make every service print all of its logs to the container stdout, the container log directories are mapped to the host machine and then collected by Filebeat; this is how business logs are collected in Kubernetes (a volume sketch follows this list).
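The article does not show the Pod spec for this mapping; the following is only a minimal sketch of the idea, assuming a hypothetical service called demo-service that writes its logs to /data/logs inside the container and a per-service hostPath directory on the node:

# Sketch only: map the container log directory to the host so the host-level
# Filebeat can pick up the files. Names and paths are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-service
  namespace: demo-ns
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-service
  template:
    metadata:
      labels:
        app: demo-service
    spec:
      containers:
        - name: demo-service
          image: registry.example.com/demo-service:latest
          volumeMounts:
            - name: app-logs
              mountPath: /data/logs        # where the service writes its logs
      volumes:
        - name: app-logs
          hostPath:
            path: /data/k8s-logs/demo-ns/demo-service   # host directory Filebeat watches
            type: DirectoryOrCreate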


Here are the configuration changes for points 1 and 3.
The configuration changes brought by point 1 are compared in Figure 3.

Figure 3: Configuration changes after adding Kafka
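The actual change is only shown as a screenshot in Figure 3; as a rough illustration of the idea, the Filebeat output section pointing at Kafka might look like the following sketch (broker addresses and topic name are assumptions, not the article's values):

# filebeat.yml (sketch): send logs to Kafka instead of directly to Logstash.
# Logstash then consumes from the same topic, so a slow or dead consumer
# no longer blocks Filebeat.
output.kafka:
  hosts: ["kafka-1:9092", "kafka-2:9092", "kafka-3:9092"]   # assumed broker list
  topic: "business-logs"                                     # assumed topic name
  partition.round_robin:
    reachable_only: true
  required_acks: 1
  compression: gzip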
The focus here is on the third change. Initially, whenever a new service went online, the collection rules for all of its log types had to be written directly into the filebeat.yml file, and Ansible was then used to deliver the file and restart Filebeat on all business machines. This made filebeat.yml grow longer and longer and hard to maintain, and a single incorrect configuration would prevent Filebeat from starting, so logs could not be collected in time. Later, we split the collection rules for each business into small YAML files stored under /etc/filebeat/inputs.d/. These per-business Filebeat rules are produced by a configuration-generation script, which obtains each business's namespace and log path from the Kubernetes cluster, generates the Filebeat configuration automatically, and then delivers it to the specified directory through Ansible. The configuration changes for this third point are compared in the three parts of Figure 4 below.

Figure 4: Filebeat collection configuration changes
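The generated per-service files are only shown as screenshots; a single file under /etc/filebeat/inputs.d/ might look roughly like this sketch (service name, paths, and fields are hypothetical):

# /etc/filebeat/inputs.d/demo-service.yml (sketch)
# One small file per service, generated by the configuration script, so a
# mistake in one service's rules does not break the whole filebeat.yml.
# (The main filebeat.yml loads these files via:
#    filebeat.config.inputs:
#      enabled: true
#      path: /etc/filebeat/inputs.d/*.yml )
- type: log
  enabled: true
  paths:
    - /data/k8s-logs/demo-ns/demo-service/*.log   # host path the container logs are mapped to
  fields:
    service: demo-service
    namespace: demo-ns
  fields_under_root: true
  multiline.pattern: '^\d{4}-\d{2}-\d{2}'         # lines not starting with a date are continuations
  multiline.negate: true
  multiline.match: after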
After the above changes, we quickly completed business log collection for the Kubernetes cluster and improved the stability of the log system, which also makes future upgrades of the log system more convenient.
Of course, our setup still has shortcomings. Filebeat is still started with systemd, so every time a new Node is added, Filebeat has to be installed on it and the configuration script has to discover the node and push the configuration to it. Later we plan to deploy Filebeat as a Kubernetes DaemonSet and update its configuration automatically when new services are added, letting Kubernetes handle Filebeat installation and configuration updates (a rough sketch follows).
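The DaemonSet deployment is still only a plan in the article; a minimal sketch of what it might look like, assuming the host log directory above and a hypothetical ConfigMap named filebeat-config:

# Sketch: run Filebeat on every node as a DaemonSet instead of via systemd.
# ConfigMap name, image tag, and paths are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: logging
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:7.9.0
          args: ["-c", "/etc/filebeat.yml", "-e"]
          volumeMounts:
            - name: config
              mountPath: /etc/filebeat.yml
              subPath: filebeat.yml
            - name: app-logs
              mountPath: /data/k8s-logs        # same host directory the services write to
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: filebeat-config
        - name: app-logs
          hostPath:
            path: /data/k8s-logs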


Cluster status monitoring and business status monitoring



For cluster status monitoring and business status monitoring we use the open-source tool set packaged around Prometheus Operator.
First, a brief introduction to the role of each component:
  • Prometheus Operator: Prometheus pulls monitoring data actively. In Kubernetes, Pod IPs keep changing because of scheduling, so maintaining scrape targets by hand is impossible, and DNS-based automatic discovery is still somewhat troublesome to extend. Prometheus Operator is the implementation of a set of custom CRD resources plus a Controller. The Prometheus Operator controller runs with RBAC permissions, watches for changes to these custom resources, and, based on their definitions, automatically manages the Prometheus Server itself and its configuration.

  • Kube-state-metrics: collects data about most built-in Kubernetes resources, such as Pod, Deployment, Service, and so on. It also exposes its own metrics, mainly statistics on the number of resources collected and the number of collection errors. For example: how many replicas did I schedule? How many are available now? How many Pods are running/stopped/terminated? How many times has a Pod restarted? Alerts can be defined on these status values by adding Prometheus rules (see the rule sketch after this list) so that operation and maintenance staff and developers are notified in time.

  • Prometheus: collects metrics from the cluster components, the status metrics of the various resources in the cluster, and any custom monitoring metrics we implement ourselves.

  • Alertmanager: handles alerts sent by clients such as the Prometheus server. It is responsible for deduplicating, grouping, and routing alerts to the correct receivers, such as email, webhook, and so on.
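As an illustration of the rule-based alerting mentioned in the kube-state-metrics item above, a PrometheusRule on its data might look like this sketch (the thresholds, labels, and rule name are assumptions):

# Sketch: alert when a Pod keeps restarting, based on kube-state-metrics data.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-restart-rules
  namespace: monitoring
  labels:
    prometheus: k8s          # must match the ruleSelector of the Prometheus resource
    role: alert-rules
spec:
  groups:
    - name: pod.rules
      rules:
        - alert: PodRestartingTooOften
          expr: increase(kube_pod_container_status_restarts_total[30m]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} restarted more than 3 times in 30 minutes"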


Before deploying the business in Kubernetes, the monitoring system and the log collection system need to be installed in advance. Because of how our business is divided, our company has multiple cluster environments deployed in different regions, so when deploying monitoring services we chose to create a complete monitoring system in each cluster. Since our business does not have large PV and the clusters are small, one monitoring system per cluster is reasonable. The monitoring components live in a separate namespace and are installed directly with YAML rather than Helm. The cluster monitoring architecture is shown in Figure 5.

Figure 5: Kubernetes cluster monitoring architecture

As can be seen from the figure, we collect monitoring data in three areas:
  • Cluster components: we collect metrics from five components: the API server, Controller Manager, Scheduler, Kubelet, and CoreDNS. Since our cluster was installed from binaries, scraping some component metrics (Controller Manager, Scheduler) requires manually configured Endpoints and a matching Service, with a ServiceMonitor to complete the metric collection. The Endpoints, Service, and ServiceMonitor configuration for the Controller Manager is shown in Figure 6 (a configuration sketch is also given after this list).

    Figure 6: Controller Manager configuration


  • Pod status and resource status: metrics in these two areas are collected by the kube-state-metrics component. There are already plenty of introductions to kube-state-metrics online and on the official website, so I will not go into more detail here.
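The Figure 6 configuration referenced in the cluster-components item above is only a screenshot. Under the assumptions that the Controller Manager exposes metrics on the old insecure port 10252 on each master node and that the master IPs are placeholders, a sketch of the Endpoints/Service/ServiceMonitor combination might look like this:

# Sketch: expose a binary-installed kube-controller-manager to Prometheus.
# IPs, port, and label values are assumptions.
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
subsets:
  - addresses:
      - ip: 192.168.0.11      # master node IPs (placeholders)
      - ip: 192.168.0.12
    ports:
      - name: http-metrics
        port: 10252
        protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
spec:
  clusterIP: None             # headless Service backed by the manual Endpoints above
  ports:
    - name: http-metrics
      port: 10252
      protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-controller-manager
  namespace: monitoring
  labels:
    k8s-app: kube-controller-manager
spec:
  jobLabel: k8s-app
  endpoints:
    - port: http-metrics
      interval: 30s
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager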


There are also many online tutorials on deploying Prometheus Operator, Prometheus, Alertmanager, and kube-state-metrics, and it is also fun to build it and solve the problems yourself. You can also refer to my GitHub repository for installation: https://github.com/doubledna/k8s-monitor.
The focus here is on Prometheus storage. Since Prometheus is deployed inside the Kubernetes cluster and its Pod can die at any time, plain local storage is not appropriate. A PV could of course be used, but there are many good, mature remote storage schemes for Prometheus, and a remote storage scheme is recommended. Our company uses InfluxDB as the Prometheus remote storage. A stand-alone InfluxDB is already very capable and basically meets daily needs, but because its clustering is not open source there is a single point of failure, so you may want to choose another remote storage scheme to avoid this problem. The Prometheus-to-InfluxDB configuration is shown in Figure 7.

Figure 7: Prometheus remote read and write configuration
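The actual settings are only shown in the Figure 7 screenshot. Since Prometheus is managed by the Operator, the remote storage would be declared on the Prometheus custom resource; a sketch, assuming a hypothetical InfluxDB 1.x service named influxdb in the monitoring namespace and a database called prometheus:

# Sketch: remote read/write from the Prometheus resource to InfluxDB 1.x,
# which exposes native Prometheus remote endpoints. Host and db are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector: {}        # pick up all ServiceMonitors
  remoteWrite:
    - url: "http://influxdb.monitoring:8086/api/v1/prom/write?db=prometheus"
  remoteRead:
    - url: "http://influxdb.monitoring:8086/api/v1/prom/read"
      readRecent: true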
About alerting: in order to unify the various alert sources and alert message formats within the company, our Architecture Department developed an alerting API. Other departments that want to use it only need to convert the alert content into a specific JSON format and send it to the API, which then pushes the alert to people by phone, email, SMS, or enterprise WeChat. So in Alertmanager we use webhook mode to send JSON-formatted alert data to that API. I recommend that those who are able process the alert data themselves this way rather than using the default email or WeChat configuration; the default alert format is not pretty.
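A sketch of the corresponding Alertmanager configuration, assuming a hypothetical internal endpoint http://alert-api.example.internal/api/alerts:

# alertmanager.yml (sketch): group alerts and forward them to an internal
# alerting API via webhook. The receiver URL is hypothetical.
route:
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'ops-webhook'
receivers:
  - name: 'ops-webhook'
    webhook_configs:
      - url: 'http://alert-api.example.internal/api/alerts'
        send_resolved: true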
Grafana is of course used for the dashboards; the effect is shown in Figure 8 and Figure 9. My dashboards are based on templates shared by others in the official Grafana template library, and you can download the templates that suit you from the official site.

Figure 8: Kubernetes component dashboard

Figure 9: Pod performance dashboard
To summarize: for small and medium-sized enterprises, the open-source tooling packaged around Prometheus Operator is a suitable way to quickly build a Kubernetes monitoring system, and it can basically cover business status and component status in Kubernetes. Of course, this system also has some shortcomings:
  • Event monitoring/collection: in Kubernetes, the Pod deployment process generates events that record the Pod's life cycle and scheduling, and these events are not stored permanently in the cluster. If you need to know how Pods are scheduled, want to collect events for auditing or other purposes, or can no longer check events in the cluster in real time as the number of businesses grows, then event monitoring/collection becomes very important. I recently learned about Alibaba's event collection tool kube-eventer, which looks quite good; students who are interested can look into how to deploy it.

  • In some cases Prometheus produces too many alerts, too frequently: although we have tuned some of the Prometheus rules and adjusted the alert frequency in them, in practice we occasionally run into a flood of alert messages when a node fails. The alerts are useful, but too much information gets in the way of troubleshooting.


JVM monitoring



On JVM monitoring: our back-end services are all written in Java, and I believe the back ends of most domestic companies are as well. As everyone knows, Java programs run on the JVM, and as the business grows, JVM monitoring becomes more and more important. At first our company had no JVM monitoring, and developers went to the business machines to look at JVM information by hand, but as the company added more developers the standards and requirements also grew, and a place to view JVM monitoring centrally became a must. So I started building the JVM monitoring service and compared three solutions: Pinpoint, JMX Exporter, and SkyWalking. Although Pinpoint and SkyWalking also support features such as distributed tracing, the company already had Zipkin and deploying Pinpoint or SkyWalking is relatively troublesome, so I chose the JMX Exporter solution, which is independent of the business code and simple to deploy. The JVM monitoring architecture for services deployed on hosts is shown in Figure 10.

Figure 10: The old JVM monitoring architecture
A brief introduction to the old architecture in the figure: with the popularity of microservices, there are many back-end services and new ones are added quickly, so configuring service IP + JVM port directly in prometheus.yml is not a wise choice. We therefore use Consul for automatic service registration: a Python script on each business machine sends the JVM metrics ports of all services deployed on that machine to Consul, and Prometheus obtains the JVM metrics addresses from Consul and scrapes the JVM data. Before we had a Kubernetes cluster, Prometheus was started with Docker and its remote storage was ClickHouse; however, connecting ClickHouse to Prometheus required prom2click for data conversion, and one more component meant more maintenance cost while that component was not fully featured, so the Prometheus remote storage in the Kubernetes cluster was changed to InfluxDB. Injecting JMX Exporter into a service is also very easy: just reference the JMX Exporter jar package and its configuration file in the program's startup command and specify a port, and JVM-related metric data becomes available. The service start command looks like this:
java -javaagent:/data/jmx_prometheus_javaagent-0.11.0.jar=57034:/data/jmx_exporter/jmx_exporter.yml \
     -DLOG_DIR=/<log path>/<service name>/ -Xmx512m -Xms256m \
     -jar <service name>.jar
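The jmx_exporter.yml referenced above is not shown in the article; a minimal configuration could be as small as the following sketch, which simply exposes all MBeans with lowercased metric names:

# /data/jmx_exporter/jmx_exporter.yml (sketch): a minimal JMX Exporter config.
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:
  - pattern: ".*"        # expose everything; tighten the rules later if needed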

Consul is configured in Prometheus as shown in Figure 11. Once Prometheus collects the business JVM metrics, aggregation, dashboards, alerting, and so on can be done in Grafana.

Figure 11: Prometheus Consul service discovery configuration
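As a rough illustration of the Figure 11 screenshot, a Consul service-discovery scrape job in prometheus.yml might look like this sketch (the Consul address and relabeling are assumptions):

# prometheus.yml (sketch): discover JVM metrics endpoints registered in Consul.
scrape_configs:
  - job_name: 'jvm'
    consul_sd_configs:
      - server: 'consul.example.internal:8500'   # assumed Consul address
        services: []                             # empty list = discover all registered services
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: service                    # keep the Consul service name as a label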
After the services were migrated to the Kubernetes cluster, we changed how JMX Exporter is deployed: the jar package and its configuration are shared with the business containers through an initContainer, which makes jmx_exporter jar upgrades and configuration changes convenient. The YAML is not shown in detail here. For JVM collection we also abandoned the previous automatic registration through Consul and instead use a ServiceMonitor together with a Service to obtain the JVM metrics. The YAML configuration is shown in Figure 12 and the Grafana dashboard in Figure 13.

Figure 12: JVM collection YAML
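The article only shows this as the Figure 12 screenshot; under the assumption of a hypothetical demo-service exposing JMX Exporter on port 57034, the initContainer and ServiceMonitor parts might look roughly like this:

# Sketch: share the JMX Exporter jar with the business container via an
# initContainer, and scrape it through a Service + ServiceMonitor.
# Image names, ports, and labels are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-service
  namespace: demo-ns
spec:
  selector:
    matchLabels:
      app: demo-service
  template:
    metadata:
      labels:
        app: demo-service
    spec:
      initContainers:
        - name: jmx-exporter
          image: registry.example.com/jmx-exporter-files:0.11.0   # contains the jar and yml
          command: ["sh", "-c", "cp /jmx/* /opt/jmx/"]            # copy into the shared volume
          volumeMounts:
            - name: jmx
              mountPath: /opt/jmx
      containers:
        - name: demo-service
          image: registry.example.com/demo-service:latest
          # the start command references /opt/jmx/jmx_prometheus_javaagent-0.11.0.jar=57034:...
          ports:
            - name: jmx-metrics
              containerPort: 57034
          volumeMounts:
            - name: jmx
              mountPath: /opt/jmx
      volumes:
        - name: jmx
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: demo-service-jmx
  namespace: demo-ns
  labels:
    app: demo-service
    metrics: jvm
spec:
  selector:
    app: demo-service
  ports:
    - name: jmx-metrics
      port: 57034
      targetPort: jmx-metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: demo-service-jvm
  namespace: monitoring
spec:
  endpoints:
    - port: jmx-metrics
      interval: 30s
  namespaceSelector:
    matchNames:
      - demo-ns
  selector:
    matchLabels:
      metrics: jvm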

Figure 13: JVM Grafana dashboard
In Kubernetes our JVM monitoring has also run into some problems. For example, when the program was deployed directly on a business machine, the host IP was fixed; if the JVM crashed and the program restarted automatically, we could still see the JVM state at the moment of the crash in Grafana, and the graph for a given machine was continuous. In the cluster, the Pod IP and Pod name keep changing, so when a service Pod dies it is troublesome to find the previous, dead Pod in Grafana, and the graph is not continuous along the time axis. I considered using StatefulSet to fix the Pod names, but in the end felt it went against the stateless principle and did not adopt it. This problem has yet to be resolved.


Q&A



Q: Why use JMX Exporter to monitor the JVM instead of Actuator/Prometheus? A: At first the company only considered distributed tracing and used Zipkin. Later, when JVM monitoring was requested, the requirement was that no code intrusion was allowed at all, so JMX Exporter was chosen. I am not very familiar with the tool you mentioned; I will look into it later.

Q: How do you implement business monitoring alerts, for example when the response time of an interface increases or it is called more frequently? A: Our approach is to have the program itself record interface response times and expose them as metric data; Prometheus collects them and the corresponding rules are configured, and once the configured response time is exceeded the alert is pushed to Alertmanager and then sent to the operation and maintenance staff.


Q: kube-state-metrics easily OOMs when the cluster is large; how do you solve it? A: Our cluster has very few nodes, so we have not run into kube-state-metrics OOM. Since your cluster is large and kube-state-metrics collects a lot of data, giving it more memory is always a reasonable option.

Q: How are internal application HTTP calls made? Do you use the Service domain name directly? Is there an Ingress controller on the intranet? Or another option? A: We use Eureka for service registration and discovery, but I hope the company will switch to Kubernetes Services; calling through Eureka has some problems.
Q: Any suggestions for a multi-tenant model of the Kubernetes logging system? A: If your multi-tenancy distinguishes namespaces, you can consider adding the namespace to the log path or the Kafka topic so that the logs can be told apart.
Q: Have you considered aggregating monitoring data across clusters and data centers? A: We used Prometheus federation before, but because our business faces many cross-border network restrictions we gave it up. If your business is within China, on a cloud provider, or otherwise has good network conditions, it can be done.



Origin blog.51cto.com/14992974/2547606