Brief
After we containerized our services, every release restarted the service, and callers of that service saw a large number of errors until the restart completed. The errors were failed RPC calls. Our RPC framework supports graceful shutdown: through the JVM ShutdownHook mechanism we register a hook that unregisters the RPC service. When the process receives SIGTERM, the application proactively removes itself from the registry as it shuts down, which keeps callers from hitting a flood of errors. So why did containerization bring the problem back?
Troubleshooting
The application starts normally.
View the processes inside the container:
# yum install psmisc
# pstree -p
bash(1)───java(22)─┬─{java}(23)
                   ├─{java}(24)
                   ├─{java}(25)
                   ├─{java}(26)
                   ├─{java}(27)
                   ├─{java}(28)
                   ├─{java}(29)
                   ├─{java}(30)
                   ...
# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 09:50 ? 00:00:00 /bin/bash run.sh start
root 22 1 15 09:50 ? 00:01:20 /app/3rd/jdk/default/bin/java -Xmx512m -Xms512m ...
root 49 0 0 09:51 pts/0 00:00:00 bash
root 263 49 0 09:59 pts/0 00:00:00 ps -ef
Inside the container, a plain kill of the java child process (PID 22) behaves as expected: the application's shutdown hook runs and cleans up properly.
In production, however, a rolling deployment deletes the pod. The logs showed that when the pod was terminated, the application process did not handle SIGTERM correctly.
Problem analysis
We looked into how Kubernetes terminates pods, as described here:
https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods
Our container is started through the run.sh script. As the process listing above shows, run.sh is PID 1 and the java process is its child. According to the Kubernetes termination behavior, the signal goes to the container's main process, so the java process (PID 22) is not guaranteed to receive SIGTERM when the pod is deleted. That is why the shutdown hook did not take effect.
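The effect can be reproduced outside Kubernetes with a small sketch. Everything here is a stand-in rather than our real setup: worker.sh is a hypothetical substitute for the java process, its TERM trap plays the role of the JVM shutdown hook, and the run.sh below is a minimal imitation of a wrapper that starts its child without exec. Only the wrapper is signalled, just as only the main process is signalled on pod deletion.

```shell
#!/usr/bin/env bash
# Sketch: a wrapper that starts its child WITHOUT exec does not pass SIGTERM on.
# All file names below are hypothetical stand-ins, not the real application.
export WORKDIR="$(mktemp -d)"

# Stand-in for the java process; the TERM trap mimics the shutdown hook.
cat > "$WORKDIR/worker.sh" <<'EOF'
#!/usr/bin/env bash
trap 'echo "hook ran" > "$WORKDIR/hook.log"; exit 0' TERM
echo $$ > "$WORKDIR/worker.pid"
while true; do sleep 1; done
EOF

# Stand-in for run.sh: the child runs as an ordinary foreground command.
cat > "$WORKDIR/run.sh" <<'EOF'
#!/usr/bin/env bash
bash "$WORKDIR/worker.sh"
echo "worker exited"
EOF

bash "$WORKDIR/run.sh" &
run_pid=$!

# Wait until the worker has installed its trap and written its PID.
until [ -s "$WORKDIR/worker.pid" ]; do sleep 0.1; done
worker_pid=$(cat "$WORKDIR/worker.pid")

# Signal only the wrapper, then give the worker time to react
# in case the signal had somehow reached it.
kill -TERM "$run_pid"
wait "$run_pid" 2>/dev/null || true
sleep 2

if [ -f "$WORKDIR/hook.log" ]; then hook="hook ran"; else hook="hook did not run"; fi
echo "wrapper pid=$run_pid, worker pid=$worker_pid: $hook"

# Clean up the orphaned worker and the scratch directory.
kill -KILL "$worker_pid" 2>/dev/null || true
rm -rf "$WORKDIR"
```

In this run the wrapper dies and the worker is simply orphaned. Inside a container the wrapper is PID 1, and as far as we understand the kernel drops signals that PID 1 has no handler for, so the shell may not even exit until it is force-killed. Either way the child never sees SIGTERM.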
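Solution
With the cause located, the fix follows naturally: when run.sh launches java, it should hand its own process context over to the java process, so that java takes over and becomes PID 1 inside the container.
We were inspired by this article: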
https://yeasy.gitbooks.io/docker_practice/content/image/dockerfile/entrypoint.html
Adding exec in front of the java command in run.sh is all it takes.
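A minimal sketch of why exec helps, using stand-ins instead of our real files: worker.sh is a hypothetical substitute for the java process, its TERM trap stands in for the shutdown hook, and this run.sh is a simplified imitation of ours. Because exec replaces the shell with the child instead of forking, the child keeps the wrapper's PID and receives the signal directly.

```shell
#!/usr/bin/env bash
# Sketch: with exec, the child takes over the wrapper's PID and gets SIGTERM.
# All file names below are hypothetical stand-ins, not the real application.
export WORKDIR="$(mktemp -d)"

# Stand-in for the java process; the TERM trap mimics the shutdown hook.
cat > "$WORKDIR/worker.sh" <<'EOF'
#!/usr/bin/env bash
trap 'echo "hook ran" > "$WORKDIR/hook.log"; exit 0' TERM
echo $$ > "$WORKDIR/worker.pid"
while true; do sleep 1; done
EOF

# Stand-in for run.sh: any setup runs first, then exec replaces the shell.
cat > "$WORKDIR/run.sh" <<'EOF'
#!/usr/bin/env bash
exec bash "$WORKDIR/worker.sh"
EOF

bash "$WORKDIR/run.sh" &
run_pid=$!

# Wait until the worker has installed its trap and written its PID.
until [ -s "$WORKDIR/worker.pid" ]; do sleep 0.1; done
worker_pid=$(cat "$WORKDIR/worker.pid")

# Signal the PID that was started as run.sh; it is now the worker itself.
kill -TERM "$run_pid"
wait "$run_pid" 2>/dev/null || true

hook=$(cat "$WORKDIR/hook.log" 2>/dev/null || echo "hook did not run")
echo "wrapper pid=$run_pid, worker pid=$worker_pid: $hook"

rm -rf "$WORKDIR"
```

In the real run.sh this corresponds to putting exec in front of the /app/3rd/jdk/default/bin/java command line seen in the ps output. Anything run.sh needs to do after java exits would no longer run, since the shell is gone once exec succeeds.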
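We then rebuilt the image, released it, restarted the service, and checked the logs left behind by the pod from before the restart.
Problem solved!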