Gracefully shutting down a K8s Pod is not as easy as you think!

When a Deployment is updated, the old Pods are terminated and new Pods take over. If an old Pod is in the middle of a long-running operation during the rollout, we want the Pod to be killed only after that operation has completed successfully (a graceful shutdown). If this cannot be done, requests in flight on the killed Pod may be lost, or callers may not notice that the Pod has been killed. In particular, for an API that receives a lot of traffic, the error rate rises noticeably during deployment.

In principle this is quite simple: just add a graceful shutdown. I previously wrote up a best practice for it, "the elegant and lossless switching practice of K8S Pod traffic", and later found that it was not graceful enough...

When Kubernetes kills a pod, the following 5 steps occur:

1. The pod switches to the Terminating state and is removed from the Service endpoints so that it stops receiving new traffic. The containers inside the pod are still running.

2. The preStop hook is executed. A preStop hook is a special command that is run in the container, or an HTTP request that is sent to it.

3. Once the preStop hook has finished, the SIGTERM signal is sent to the main process of each container in the pod, so the container knows that it will be shut down soon.

4. Kubernetes waits for the grace period (terminationGracePeriodSeconds, default 30 seconds). The countdown runs in parallel with the preStop hook and the SIGTERM handling, so Kubernetes does not wait for these to complete: once the period is over, it goes straight to the next step. It is therefore important to set the grace period long enough to cover both.

5. If a container is still running after the grace period, it is forcibly stopped with SIGKILL. The pod is then removed and the termination is complete.
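
Steps 3 to 5 can be tried out locally with a small shell sketch. This only emulates the behaviour for a single process and is not how the kubelet is implemented; the function name stop_with_grace, the two stand-in "services" and the short grace periods are my own assumptions for the demo.

```shell
#!/bin/sh
# stop_with_grace PID GRACE_SECONDS
# Sends SIGTERM, polls once per second, and escalates to SIGKILL if the
# process is still alive after GRACE_SECONDS. The result is printed and also
# stored in the variable "outcome".
stop_with_grace() {
  pid=$1
  grace=$2
  kill -TERM "$pid" 2>/dev/null
  waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$grace" ]; then
      # Grace period used up: SIGKILL cannot be caught or ignored.
      kill -KILL "$pid" 2>/dev/null
      wait "$pid" 2>/dev/null
      outcome="forced"
      echo "$outcome"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  outcome="graceful"
  echo "$outcome"
}

# Stand-in service that exits cleanly when it receives SIGTERM.
( trap 'exit 0' TERM; while :; do sleep 1; done ) &
polite=$!
sleep 1                        # let the trap get installed first
stop_with_grace "$polite" 5
polite_outcome=$outcome

# Stand-in service that ignores SIGTERM and therefore has to be killed.
( trap '' TERM; while :; do sleep 1; done ) &
stubborn=$!
sleep 1
stop_with_grace "$stubborn" 2
stubborn_outcome=$outcome
```

The first stand-in leaves within its grace period, while the second one ignores SIGTERM and is only stopped by SIGKILL. That second case is what happens to a service that never sees or handles the signal.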

To sum up, there are roughly two steps. The first is to define a preStop hook; generally it can simply sleep (for example for 30s) while residual traffic is processed. Because this sleep counts against terminationGracePeriodSeconds, the grace period has to be longer than the sleep, otherwise no time is left for the second step. The second step is the SIGTERM signal: after the service receives it, the service cleans up, for example by closing connections and deregistering from a third-party service registry...
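
To make the timing concrete, here is a small arithmetic sketch. It assumes that the preStop sleep is paid out of the same terminationGracePeriodSeconds budget as the SIGTERM handling; the function name remaining_for_sigterm is made up for this example.

```shell
#!/bin/sh
# remaining_for_sigterm GRACE_SECONDS PRESTOP_SECONDS
# Prints how many seconds are left for SIGTERM handling once the preStop
# hook has finished. Never goes below zero.
remaining_for_sigterm() {
  grace=$1
  prestop=$2
  left=$((grace - prestop))
  if [ "$left" -lt 0 ]; then
    left=0
  fi
  echo "$left"
}

remaining_for_sigterm 30 30   # default grace period, 30s sleep: prints 0
remaining_for_sigterm 60 30   # grace period raised to 60s: prints 30
```

In other words, a 30s preStop sleep combined with the default 30s grace period leaves the service no time to react to SIGTERM, so the grace period should be raised along with the sleep.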

Some readers ask: since the pod is already terminating and its K8s network endpoint has been removed, why is there still traffic coming in?

Because the removal of the endpoint is asynchronous, traffic can still be routed to the pod for a short time. This is why the preStop hook is executed first and the SIGTERM signal is sent only afterwards.

This basically guarantees that no traffic is lost, but only on the premise that the service can actually receive the SIGTERM signal.

Ideally a container runs only one process, but that is hard to achieve in real scenarios. For example, I use a shell script to manage and start the Java process, and besides the main process of the shell script I also need to run child processes for monitoring, log collection and so on, so multiple processes end up running in one container.

By default, the SIGTERM signal is delivered only to the container's main process (PID 1), and any child processes that remain are sent SIGKILL. The system does this because main-process scripts are usually not written to catch and forward signals, which would leave the child processes unable to terminate normally when the container is stopped. So the system falls back on SIGKILL, a signal that cannot be caught or blocked, in order to shut down every process in the container without any preconditions.

Specifically, you can use strace -p <pid> to trace the system calls and signals of the service process.

In other words, if the main process is not the service itself, the service may be killed forcibly. The solution is simple: have the main process forward the signal it receives to the other child processes in the container, so that all processes in the container receive SIGTERM instead of SIGKILL when they stop.

How can this be done? Catching the signal with trap, as below, is one way to implement it. A write-up of the best practice is at http://veithen.io/2014/11/16/sigterm-propagation.html.

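#startup.sh
...
# Forward SIGTERM from the shell (the container's main process) to the Java process.
trap 'kill -TERM "$child"' TERM
nohup java $JAVA_OPTS -jar ./xxx.jar --server.port=8080 &

child=$!
# The first wait is interrupted when SIGTERM arrives and the trap runs.
wait "$child"
# The second wait blocks until the Java process has actually exited.
wait "$child"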
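
The same idea extends to a wrapper that starts several children, such as the monitoring and log collection processes mentioned earlier. The following is a minimal, runnable sketch rather than a production script: worker and wrapper are hypothetical names, and the sleep loops merely stand in for the real Java service and log collector.

```shell
#!/bin/sh
# worker NAME: hypothetical stand-in for a real child process (the Java
# service, a log collector, ...). It shuts down cleanly on SIGTERM.
worker() {
  name=$1
  trap 'echo "$name: clean shutdown"; exit 0' TERM
  while :; do sleep 1; done
}

# wrapper: plays the role of startup.sh and forwards SIGTERM to every child.
wrapper() {
  pids=""
  trap 'for p in $pids; do kill -TERM "$p" 2>/dev/null; done' TERM
  worker "service" &
  pids="$pids $!"
  worker "log-collector" &
  pids="$pids $!"
  wait            # interrupted as soon as SIGTERM arrives and the trap runs
  trap - TERM
  wait            # now block until every child has really exited
}

# Demo: run the wrapper in the background and signal it the way a container
# runtime would signal the main process.
log=$(mktemp)
wrapper >"$log" 2>&1 &
wrapper_pid=$!
sleep 1                        # give all traps time to be installed
kill -TERM "$wrapper_pid"
wait "$wrapper_pid"
cat "$log"
```

Both stand-ins report a clean shutdown (in no guaranteed order) because the wrapper passed SIGTERM on to them instead of exiting and leaving them to be killed.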

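Of course, many mature frameworks already implement graceful shutdown. In Spring, for example, a CustomHealthCheck class can extend the AbstractHealthIndicator class, which lets us build a custom health check by overriding the doHealthCheck() method. Depending on the flag we receive from HealthService, we set the health status of the system to up or down.

That way, we can have preStop call this endpoint to implement graceful shutdown in another way.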


lifecycle:
  preStop:
    httpGet:
      path: /unhealthy
      port: http

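Finally, once the server has received the graceful shutdown signal, it can carry out its cleanup work.

That is K8s: simple in itself, but underneath it touches the Linux kernel, processes, networking, storage and many other areas of knowledge that the Kubernetes documentation does not explain clearly. Yet these are precisely where the essence of container technology lies.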


This article was shared from the WeChat official account 云原生技术爱好者社区 (programmer_java).