Why does K8S kill my app

First published on the public account: Binary Community; for reprints, contact: [email protected]

Overview

"K8S provides us with the ability to automatically deploy and schedule applications, and automatically restart failed applications through the health check interface to ensure service availability. However, this automatic operation and maintenance will cause our applications to fall into a continuous scheduling process under certain special circumstances. This led to business damage. This article analyzes the frequent restart scheduling problem of a core platform application on the production line, and analyzes it step by step from the system to the application, and finally locates the code level to solve the problem."

Phenomenon

After our DevOps infrastructure was built, all business services were containerized and deployed on K8s. However, some services get restarted automatically by K8s after running for a while, and the restarts follow no pattern: sometimes in the afternoon, sometimes in the early morning. The K8s dashboard shows that some pods have been restarted hundreds of times.


Analysis

Platform-side analysis: understanding the platform's restart policy

K8s restarts a container according to the restart policy defined in the pod YAML. The policy is set via .spec.restartPolicy and supports the following three values:

  • Always: always restart the container after it terminates; this is the default policy.
  • OnFailure: restart the container only when it exits abnormally (exit code is non-zero).
  • Never: never restart the container after it terminates.

The problematic application is packaged and released automatically through CI/CD, and its YAML is also generated automatically in the CD stage without explicitly specifying a restart policy, so the default Always policy applies. So under what circumstances does K8s actually trigger a restart? Mainly the following:

    1. The pod exits normally.
    2. The pod exits abnormally.
    3. The CPU used by the pod exceeds the CPU limit set in its YAML, or the CPU limit configured for the namespace the container runs in.
    4. The memory used by the pod exceeds the memory limit set in its YAML, or the memory limit configured for the namespace the container runs in.
    5. The node the pod runs on can no longer satisfy the pod's resource needs (memory/CPU), so the pod is rescheduled to another node, which also increments the restart count.
    6. The image specified when creating the pod cannot be found, or no node satisfies the pod's resource (memory/CPU) requirements, in which case the pod keeps restarting.

The problematic application restarts only after running normally for a while, and neither the pod's own YAML nor its namespace sets a CPU limit, so scenarios 1, 3, 4 and 6 can be ruled out. The business is built on Spring Boot; if it exited for no apparent reason the JVM itself would generate a dump file, but here the restart is triggered by K8s. Even if a dump file was generated inside the pod, the dump directory was not mapped outside the container at runtime, so there is no way to check whether a dump file was produced before the last restart. That leaves scenarios 2 and 5 as possible reasons for K8s restarting the service. Fortunately, K8s provides commands to check why a pod was last restarted:

NAMESPACE=prod
SERVER=dts
POD_ID=$(kubectl get pods -n ${NAMESPACE} |grep ${SERVER}|awk '{print $1}')
kubectl describe pod $POD_ID -n ${NAMESPACE}

The output shows that the pod was killed and restarted by the kubelet because its memory usage exceeded the limit. (If the reason is empty or unknown, the cause may be scenario 2 above, or, in extreme cases, the pod may have been killed by the OS even though memory and CPU are not limited; in that case check /var/log/messages for further analysis.) When a business service is created, CI/CD sets the pod's memory limit to 2G by default, but the run script in the base image also sets both the maximum and minimum JVM heap to 2G:

exec java -Xmx2g -Xms2g -jar  ${WORK_DIR}/*.jar
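For contrast, here is a hedged sketch of a run script that leaves headroom below the container limit instead of pinning the heap to it. This is not the fix the team applied (that comes later); -XX:MaxRAMPercentage is available on JDK 10+ and 8u191+, and the 75% value is an illustrative assumption:

# sketch: size the heap from the container's cgroup memory limit, keeping roughly a
# quarter of the 2G limit free for metaspace, thread stacks and other native memory
exec java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar ${WORK_DIR}/*.jar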

Application-side analysis: analyzing the running state of the JVM

Having analyzed the environment the application runs in, we move on to the state of the JVM the application uses. First, check the JVM memory usage with: jmap -heap {PID}

The output shows the memory the JVM has reserved: (Eden) 675.5 + (From) 3.5 + (To) 3.5 + (Old Generation) 1365.5 = 2048 MB. In theory the JVM should be OOM-killed as soon as it starts, but in fact the service is killed only after running for a while: although the JVM declares that it needs 2G of memory, it does not consume 2G of physical memory immediately, which checking with the top command confirms.

PS: Inside Docker, the memory reported by the top and free commands is that of the host machine. To check the memory limit and usage inside the container, use the following command:

cat /sys/fs/cgroup/memory/memory.limit_in_bytes

With -Xmx2g -Xms2g configured, the JVM reserves 2G of memory, but committed pages do not consume any physical storage until they are first accessed. At the time of inspection the business process was actually using about 1.1G. As the business keeps running, the memory used by the JVM gradually grows until it reaches 2G and the pod is killed. Recommended reading on memory management: Reserving and Committing Memory, JvmMemoryUsage.
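To make the gap between reserved and actually used memory concrete, here is a minimal sketch (not part of the original troubleshooting) that prints the heap figures from inside the JVM alongside the cgroup limit quoted above; the class name is made up, and the cgroup v1 path is an assumption about the runtime environment:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HeapVsContainerLimit {
    public static void main(String[] args) throws Exception {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // With -Xms2g -Xmx2g, init, committed and max all report roughly 2048 MB,
        // while "used" starts far lower and climbs only as the service allocates objects
        System.out.printf("heap init=%dM used=%dM committed=%dM max=%dM%n",
                heap.getInit() >> 20, heap.getUsed() >> 20,
                heap.getCommitted() >> 20, heap.getMax() >> 20);

        // cgroup v1 memory limit visible inside the container (same file as the cat above)
        String limit = new String(Files.readAllBytes(
                Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes"))).trim();
        System.out.println("container memory limit (bytes) = " + limit);
    }
}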

Code-level analysis: dissecting the root cause of the problem

Execute the command:

jmap -dump:format=b,file=./dump.hprof [pid]

Importing the dump into JVisualVM shows a large number of Span objects that have not been garbage collected; they cannot be collected because they are still referenced by item objects in a queue. Running the following command at intervals:

jmap -histo pid |grep Span

The output shows that the number of Span objects keeps increasing. Span belongs to DTS, the distributed call-chain tracing system the business project depends on. DTS is a transparent, non-intrusive basic system, so business code never references Span explicitly. In the design of DTS, a Span is generated on the business thread and then placed into a blocking queue, where a serialization thread is supposed to consume it asynchronously (a hedged sketch of this producer/consumer pattern follows). Given that Spans keep accumulating, the consumer thread must be draining them more slowly than the producers create them. The consumption logic is sequential IO that writes to disk; at the roughly 30-40 MB/s an ordinary ECS disk sustains, and with the dump showing an average Span size of about 150 bytes, it could theoretically write 30 * 1024 * 1024 / 150 ≈ 209,715 Spans per second, so the write itself is not what slows consumption down. The code, however, contains a sleep(50), which caps consumption at no more than 20 Spans per second. The business runs a scheduled task that generates many Span objects each time it fires, and other business code running at the same time generates plenty more, far faster than they can be consumed. Objects therefore back up in the queue, memory consumption keeps growing, and the pod is eventually OOM-killed.
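A hypothetical reconstruction of the producer/consumer pattern described above; class and method names (Sender, report, writeToDisk) are assumptions, not the actual DTS source, but the structure follows the description: business threads enqueue Spans, and a single writer thread drains the queue with a 50 ms sleep per Span, capping throughput at about 20 Spans per second.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class Span { /* ~150 bytes of trace data on average */ }

class Sender {
    private final BlockingQueue<Span> queue = new LinkedBlockingQueue<>();
    private final ExecutorService worker =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "dts-span-writer"));

    Sender() {
        worker.execute(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Span span = queue.take();   // blocks while the queue is empty
                    writeToDisk(span);          // sequential IO, fast on its own
                    Thread.sleep(50);           // throttle: caps draining at ~20 Spans/s
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // called from business threads; Spans pile up whenever they are produced
    // faster than ~20/s, which is what slowly fills the 2G heap
    void report(Span span) {
        queue.offer(span);
    }

    private void writeToDisk(Span span) { /* serialize and append to the trace file */ }
}

Next, dump the thread stack of the business process: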

jstack  pid >stack.txt

The dump, however, shows two writer threads: one is always in the "waiting on condition" state, while the other appears to be sleeping across multiple dumps. But the pool is created through Executors.newSingleThreadExecutor(thf), so how can a single-threaded pool have two consumers? Looking further into the commit history, it turns out that in a change made in November the span-sending logic was folded directly into the core code, whereas in the previous version this function was auto-assembled through dependency injection from an external jar. The current version therefore ends up with two Sender objects: the automatically created Sender is never referenced by the DTS system, so its queue stays empty and its consumer thread blocks forever, while the built-in Sender is the one whose sleep(50) slows consumption down. Because the backlog keeps it cycling through sleep, its real running state is hard to capture in a dump and it looks as if it is sleeping all the time. Watching the files written by the serialization thread shows that data is in fact being written, which confirms the consumer thread really is running.
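A hedged illustration of that duplicate wiring, reusing the Sender sketch above; the wiring and the class name here are assumptions made for illustration:

public class DuplicateSenderDemo {
    public static void main(String[] args) {
        // leftover of the old wiring (auto-assembled from the external jar): nothing ever
        // calls report() on it, so its writer thread blocks in queue.take() forever and
        // shows up as "waiting on condition" in the thread dump
        Sender injected = new Sender();

        // created inside the core code after the November change: this is the instance the
        // business actually uses, and its Thread.sleep(50) keeps appearing as TIMED_WAITING
        Sender builtIn = new Sender();

        builtIn.report(new Span());
    }
}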
The commit history also shows that in a previous version the business could generate a large volume of Spans in certain situations, and consuming them quickly made this thread's CPU usage spike, so the sleep was added to mitigate that. Once the problem was identified, the business code was optimized and the DTS system itself did not need to change: DTS should surface problems and push the business to fix and optimize them, and changes to a basic system must be made very cautiously because their impact is so wide. As for the problem of the pod's memory limit being equal to the JVM's maximum heap, the CD code was modified so that 200M is added to the memory size configured for a business by default. Why 200M and not more? Because K8s evaluates how many pods a node can host based on the memory limits of the pods currently running on it; configuring +500M or more would make K8s believe node resources are insufficient and waste capacity. But the headroom cannot be too small either, because besides its own code the application also depends on third-party shared libraries and other native memory, and too little headroom would still cause the pod to restart frequently.

Summary

The root cause of the problem above is that the consumption speed of the asynchronous thread was deliberately lowered, so messages backed up and memory consumption kept growing until the process was OOM-killed. What I want to stress, though, is that when we deploy an application to K8s or Docker, **the memory allocated to the pod or container needs to be somewhat larger than the maximum memory the application will actually use**. Otherwise the application starts and runs normally but keeps restarting after running for a while, as in this case. The pod's memory limit here was 2G; in theory, if the JVM had used the full 2G right at startup it would have been OOM-killed immediately, development or operations could have analyzed the cause at once, and the cost would have been much lower. But because modern operating systems manage memory with VMM (virtual memory management), when the JVM is configured with -Xmx2g -Xms2g, **the JVM reserves 2G of memory, but committed pages do not consume any physical storage until they are first accessed**, so a problem that in theory should surface as an OOM at startup is deferred until the application slowly runs its memory up to 2G and gets killed, which makes locating and analyzing it very expensive. In addition, logs that are critical for problem analysis, such as JVM dumps, must be mapped to a host directory and protected from being overwritten; otherwise, once the container is destroyed, such logs are very hard to recover.

For more in-depth articles, follow: Binary Community


Origin: blog.51cto.com/14957687/2543829