Kubernetes monitoring metrics

This blog address: https://security.blog.csdn.net/article/details/129654237

I. Introduction

Monitoring indicators (metrics) differ from logs. Logs are explicit records of what an application did, while metrics measure a program's behavior over a specific period of time through data aggregation. Metric data is cumulative and atomic: each metric is a logical unit of measurement. Metrics reveal the state and trend of a system, but they lack the detail needed to pinpoint the cause of a problem.
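To make the contrast concrete, here is a minimal sketch of how individual log records can be aggregated into a metric. The record format, the sample data, and the name `errors_per_window` are assumptions made for illustration only; they do not come from any particular logging or monitoring tool.

```python
from collections import Counter

# Hypothetical log records: (unix_timestamp_seconds, http_status) pairs.
LOG_RECORDS = [
    (0, 200), (12, 200), (47, 500),
    (61, 200), (75, 503), (118, 500),
    (125, 200),
]


def errors_per_window(records, window_seconds=60):
    """Aggregate individual log records into an error count per time window."""
    counts = Counter()
    for timestamp, status in records:
        window_start = (timestamp // window_seconds) * window_seconds
        # Adding 0 still creates the key, so error-free windows show up as 0.
        counts[window_start] += 1 if status >= 500 else 0
    return dict(sorted(counts.items()))


print(errors_per_window(LOG_RECORDS))  # {0: 1, 60: 2, 120: 0}
```

The result shows the trend (errors rose in the second minute) but no longer says which request failed or why. That detail remains only in the logs.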

II. Monitoring indicators

Kubernetes monitoring has to cover both the infrastructure platform as a whole and the workloads running on it. The specific metrics vary with the characteristics of the cluster. This section introduces several common ones:

1. Kubernetes component status indicators

A Kubernetes cluster consists of one or more control plane (master) nodes and multiple worker nodes. The main components include etcd, the API Server, the scheduler, and kube-controller-manager. Monitoring the running status of these components helps keep the underlying platform operating normally.
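One way to observe component status is through the verbose health endpoints of the API Server (for example `kubectl get --raw='/readyz?verbose'`), which report one line per check. The sketch below parses such output. The sample text is made up for illustration, and the assumed line format (`[+]name ok` / `[-]name failed: ...`) should be checked against your cluster version.

```python
def parse_health_checks(verbose_output):
    """Map each check name to True (ok) or False (failed)."""
    results = {}
    for line in verbose_output.splitlines():
        line = line.strip()
        if not line.startswith(("[+]", "[-]")):
            continue  # skip the trailing summary line and anything unexpected
        parts = line[3:].split()
        if parts:
            results[parts[0]] = line.startswith("[+]")
    return results


def failed_checks(verbose_output):
    """Return the sorted names of all checks that did not pass."""
    checks = parse_health_checks(verbose_output)
    return sorted(name for name, ok in checks.items() if not ok)


# Illustrative sample only, not captured from a real cluster.
SAMPLE = """\
[+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]poststarthook/start-apiextensions-informers ok
readyz check failed
"""

print(failed_checks(SAMPLE))  # ['etcd']
```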

2. Cluster status indicators

Cluster status is a key monitoring indicator. We need to know the current status and usage of all aggregated resources in the cluster, such as node status, available Pods, and unavailable Pods.

Monitoring the cluster status and evaluating the resulting data gives a summary view of the overall health of the cluster and exposes problems associated with nodes, Pods, and Services. From these status metrics we can judge whether the cluster is operating normally and whether it faces any risks.

Cluster status metrics also show how many nodes exist in total and how many of them are available, so the number and size of nodes can be adjusted as needed.
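As an illustration of counting total and available nodes, the following sketch summarizes node readiness from a dict shaped like the output of `kubectl get nodes -o json`. The sample data and the function name `summarize_nodes` are hypothetical. Only the `Ready` condition is inspected here, which is a simplification.

```python
def summarize_nodes(node_list):
    """Count total, ready, and not-ready nodes in a NodeList-shaped dict."""
    total = 0
    not_ready = []
    for node in node_list.get("items", []):
        total += 1
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            not_ready.append(node["metadata"]["name"])
    return {
        "total": total,
        "ready": total - len(not_ready),
        "not_ready": sorted(not_ready),
    }


# Hypothetical sample data with the same shape as a Kubernetes NodeList.
NODES = {
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
        {"metadata": {"name": "node-c"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    ]
}

print(summarize_nodes(NODES))
# {'total': 3, 'ready': 2, 'not_ready': ['node-b']}
```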

3. Resource Status Indicators

The first is CPU utilization. Accurate knowledge of node CPU usage is essential for keeping the system and its applications running smoothly and safely. Monitoring CPU usage also makes it possible to analyze resource usage behavior and to discover malicious abuse of computing resources, such as cryptocurrency mining and denial-of-service attacks.
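CPU usage is often exposed as a cumulative counter of CPU-seconds, so utilization has to be derived from two samples. The sketch below assumes that representation. The function name, the sample values, and the handling of counter resets are illustrative choices, not taken from any specific monitoring system.

```python
def cpu_utilization(prev_sample, curr_sample, cores):
    """Return average CPU utilization (0.0 to 1.0) between two samples.

    Each sample is (timestamp_seconds, cumulative_cpu_seconds_used).
    """
    (t0, used0), (t1, used1) = prev_sample, curr_sample
    elapsed = t1 - t0
    if elapsed <= 0 or cores <= 0:
        raise ValueError("samples must be in time order and cores must be positive")
    delta = used1 - used0
    if delta < 0:
        # Assumption: a decreasing counter means it was reset to zero.
        delta = used1
    return min(delta / (elapsed * cores), 1.0)


# 90 CPU-seconds consumed in 60 s on a 4-core node.
print(cpu_utilization((0, 1000.0), (60, 1090.0), cores=4))  # 0.375
```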

Second is memory pressure. This metric shows how much memory a node is using. With this data we can track the memory usage of the whole node in real time and act before memory exhaustion on the node disrupts the applications running on it.

Finally, there is disk pressure. A usage threshold is usually set for each disk. Comparing monitored disk usage against that threshold shows how much of the node's disk space is in use, whether more disk space needs to be added, whether the current applications' disk usage is normal, and whether it needs to be tuned.
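The threshold comparison described for memory and disk can be sketched as follows. The 80% and 90% thresholds and the name `classify_usage` are assumptions chosen for illustration. Suitable values depend on the cluster.

```python
def classify_usage(used_bytes, capacity_bytes, warn_at=0.80, critical_at=0.90):
    """Classify memory or disk usage against hypothetical thresholds."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    ratio = used_bytes / capacity_bytes
    if ratio >= critical_at:
        return "critical"
    if ratio >= warn_at:
        return "warning"
    return "ok"


GIB = 2 ** 30

print(classify_usage(50 * GIB, 100 * GIB))  # ok
print(classify_usage(85 * GIB, 100 * GIB))  # warning
print(classify_usage(95 * GIB, 100 * GIB))  # critical
```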

4. Network status indicators

Network status metrics are important indicators of both the communication health and the security of applications.

By monitoring network status metrics (such as bandwidth, speed, and connection status), network problems can be discovered promptly, then located and resolved. Beyond troubleshooting faults, network status data can also be analyzed to detect network-layer attacks, such as denial-of-service attacks, and other abnormal network behavior.
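As a rough sketch of this kind of analysis, the code below converts cumulative byte counters into throughput and flags intervals whose rate far exceeds the recent average. The sample data, the window size, and the factor of 3 are all assumptions. Real attack detection needs considerably more than a single threshold.

```python
from statistics import mean


def throughput(samples):
    """Convert (timestamp_seconds, cumulative_bytes) samples to bytes/second."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        if t1 > t0 and b1 >= b0:  # skip out-of-order samples and counter resets
            rates.append((b1 - b0) / (t1 - t0))
    return rates


def find_spikes(rates, baseline_size=5, factor=3.0):
    """Return indexes of rates above `factor` times the preceding window's mean."""
    spikes = []
    for i in range(baseline_size, len(rates)):
        baseline = mean(rates[i - baseline_size:i])
        if baseline > 0 and rates[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Hypothetical received-bytes counter sampled every 10 seconds.
SAMPLES = [
    (0, 0), (10, 1000), (20, 2000), (30, 3100),
    (40, 4000), (50, 5000), (60, 25000), (70, 26000),
]

rates = throughput(SAMPLES)
print(rates)               # [100.0, 100.0, 110.0, 90.0, 100.0, 2000.0, 100.0]
print(find_spikes(rates))  # [5]
```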

5. Job running indicators

In addition to monitoring basic infrastructure resources, we also need to monitor running jobs to make sure they execute correctly.

Kubernetes provides two resources, Job and CronJob, for one-time tasks and scheduled tasks respectively. Both are managed through the controller pattern. A Job creates Pods and ensures that a specified number of them terminate successfully. Each time a Pod owned by the Job completes its task successfully, the Job records that success. Once the specified number of Pods have completed successfully, the Job marks its own status as completed.

Because of this mechanism, Pods can be managed and controlled effectively, and the same status information can be monitored to detect problems such as job failures, crash loops, and resource exhaustion.
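The Job mechanism above can be observed through the status fields of Job and Pod objects. The sketch below reads dicts shaped like `kubectl get job -o json` and `kubectl get pod -o json` output. The sample objects and function names are hypothetical, and treating a missing `completions` value as 1 is a simplification.

```python
def job_state(job):
    """Return 'complete', 'failed', or 'running' from the Job's conditions."""
    for cond in job.get("status", {}).get("conditions", []):
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            return cond["type"].lower()
    return "running"


def job_progress(job):
    """Return (succeeded_pods, wanted_completions) for a Job."""
    wanted = job.get("spec", {}).get("completions") or 1  # simplification
    succeeded = job.get("status", {}).get("succeeded", 0)
    return succeeded, wanted


def crash_looping_containers(pod):
    """Return names of containers currently waiting in CrashLoopBackOff."""
    names = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            names.append(cs.get("name"))
    return names


# Hypothetical objects for illustration.
JOB = {
    "spec": {"completions": 3},
    "status": {"succeeded": 3,
               "conditions": [{"type": "Complete", "status": "True"}]},
}
POD = {
    "status": {"containerStatuses": [
        {"name": "worker",
         "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
        {"name": "sidecar", "state": {"running": {}}},
    ]},
}

print(job_state(JOB), job_progress(JOB))  # complete (3, 3)
print(crash_looping_containers(POD))      # ['worker']
```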

Origin blog.csdn.net/wutianxu123/article/details/129654237