Kubernetes operations: sharing production pitfalls with microservices

Production experience

1. Resource limits are set on the container, yet it still gets killed frequently?
2. Rolling updates: the importance of health checks
3. Rolling updates: traffic loss

Let me talk about the first question: why does the container keep getting killed even though resource limits are set?
Take a Java application: you deploy it and it soon restarts. A restart actually means the container was rebuilt, which means your pod was unhealthy and k8s pulled it back up for you, so that is where to start the investigation. To put it bluntly, the container was killed. You can run kubectl describe pod <pod-name> and look at the events; usually you will see that the health check failed and the pod was restarted. Because it is a Java application, the typical cause is a heap overflow that gets the process killed; the last lines of the log will contain a kill message, and from there you can see why it restarted. In a case I ran into, the heap memory had not been capped. The JVM does its main data exchange in heap memory, which is central to its performance design, so the heap can easily grow too large; once it grows past the limit, it is likely to be killed by k8s.

Why does k8s kill it? Because it exceeded its limit. By default a container can use all of the host's resources; without resource limits it affects the whole host, the host runs short of resources, pods start drifting and get rescheduled onto other machines, those become abnormal as well, and it can snowball into an avalanche effect. So in general we always set resource limits. Can we not just rely on that limit for a Java application? Not really. Deploying applications with k8s is still very convenient, but Java applications do not play along:
the JVM does not recognize the limits of the container it runs in, so the heap memory is not bounded by the limit you set on the container in the yaml. The heap can grow past that limit, and if it does, k8s kills the container; that is k8s's own policy: if usage exceeds the limit, it kills the container for you and then pulls it back up.
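For reference, resource limits go on the container spec roughly as below; the memory and CPU figures are illustrative assumptions, not recommended values.

resources:
  requests:
    memory: 512Mi      # what the scheduler reserves for the pod
    cpu: 250m
  limits:
    memory: 1Gi        # exceeding this limit is what gets the container killed and restarted
    cpu: 500m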

With a Java heap like that, a small burst of traffic can push memory usage far above the figure you see in monitoring, so the swing is quite large. The result is a cycle of k8s killing the pod and pulling it back up, possibly several hundred times in a day.

As for how the resource limits are enforced: they are actually still applied through Docker; k8s just translates what you configure and calls Docker's interface to set the limits. So the real problem is how to make the Java heap respect the limit the container is given, and that is what we need to solve.
There are two options to solve this. The first is to configure the Java heap size explicitly.

Configuring Java heap memory usage:
java -Xmx sets the maximum heap size and -Xms sets the initial heap size.

In general you should at least set the maximum heap size. If you do not, the JVM will keep consuming the host's memory until physical memory runs out and you get heap overflows, which is very common. With -Xmx set, when the heap is close to full the JVM runs garbage collection and the cycle repeats, which keeps the Java application running stably. So configuring the resource limit in the yaml is definitely not enough; we also have to set the heap size for the JVM. We do not want to hard-code it in the Dockerfile; instead the Dockerfile generally receives this value through a variable, and we set that variable in the yaml:

env:
  - name: JAVA_OPTS
    value: "-Xmx1g"

Like the container resource limits we configured earlier, this variable is passed into the pod, i.e. into the container built from the image. The container's CMD picks up the $JAVA_OPTS variable passed in this way; it reads the system environment variable, which has already been given a value, so the variable can safely be used when launching the application, thereby setting the heap size. The recommended value is a bit smaller than the limits, about 10% smaller, because once the heap exceeds the limits, k8s kills the container and pulls it back up.
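As a rough sketch of how the two settings relate (the 1152Mi figure is just an illustrative value, roughly 10% above a 1g heap):

resources:
  limits:
    memory: 1152Mi          # container limit, kept ~10% above the JVM heap
env:
  - name: JAVA_OPTS
    value: "-Xmx1g"         # JVM heap, kept below the container limit

The image's CMD is then expected to expand the variable, e.g. a hypothetical entrypoint such as java $JAVA_OPTS -jar app.jar, so the heap flag takes effect when the application starts.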
Once this is set up, rebuild the image, exec into the container, and look at the process; you will see the flag there. The other option is configured in a similar way.

The second issue: the importance of health checks in rolling updates
Rolling update is the default strategy in k8s, and it is usually the first thing you rely on after deploying to k8s, provided you configure health checks. The rolling update uses the probe status to judge whether it may continue updating and whether traffic may be sent to a replica, that is, whether your new pod can actually provide service. That way the rolling update keeps the number of unavailable replicas bounded and the upgrade goes smoothly, so configuring health checks matters right from the moment you set up rolling updates.
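For reference, the rolling update strategy itself is what bounds how many replicas may be unavailable at a time; a minimal sketch, with illustrative values rather than values from the original setup:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1   # at most one replica may be unavailable during the update
    maxSurge: 1         # at most one extra replica may be created above the desired count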
What role do health checks play during a rolling update?

When a new replica is added, the business inside it may need, say, a minute before it can provide service; a Java application, for example, starts quite slowly. Without a health check, there is no way to be sure it is ready: k8s considers it ready the moment it starts, so for that minute it cannot provide service, and the new traffic sent to it certainly cannot be handled. That is the first case. The second case is human configuration error, for example the database connection fails, some other dependency cannot be reached, or a configuration file is wrong. A rolling update is triggered, all the pods get rolled, and the result is broken: the new replicas have replaced the old ones. In a production environment the consequences are serious, with many services unavailable. So when you configure rolling updates, they must be accompanied by health checks. With health checks, a new replica is only forwarded new traffic after its check passes; if it does not pass, it will not replace the old replica, that is, the update will not continue, because the number of unavailable nodes is limited, and once that number is reached the update stops.

There are two kinds of health checks. readinessProbe is the readiness check, and there are three probe methods: http, which probes a URL; tcpSocket, which probes a port; and exec, which runs a shell command and judges by its return value. So three probe methods are provided. With the readiness check, if your pod's check fails (for http, you probe a page and judge by the returned status code; for tcpSocket, the local port does not answer), the pod is not added back to the service. The service is the unified entry point for access, so if the check does not pass, new traffic is not forwarded to the pod. That is the readiness check: no new traffic is forwarded to you if the health status does not pass. Then there is initialDelaySeconds: 60, which means start checking after 60 seconds, because a Java application generally takes about a minute to start, and periodSeconds: 10, which means the check is performed every 10 seconds. And then there is livenessProbe: the liveness check.

That is, if the liveness check fails, it kills the container and, according to your restart policy, generally rebuilds it and pulls up a new container for you. The way it judges is the same as for readiness: probing those ports, or you can use the other two methods, http and exec. Generally both probes are configured: the readiness check decides whether new traffic is allocated to you, and the liveness check pulls the container back up for you when it fails.
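A minimal sketch of the two probes on the container spec; the /healthz path and port 8080 are assumptions for illustration, so substitute whatever endpoint and port your application actually exposes:

readinessProbe:
  httpGet:
    path: /healthz         # assumed health endpoint
    port: 8080             # assumed application port
  initialDelaySeconds: 60  # wait for the slow Java startup before probing
  periodSeconds: 10        # probe every 10 seconds
livenessProbe:
  tcpSocket:
    port: 8080             # restart the container if the port stops answering
  initialDelaySeconds: 60
  periodSeconds: 10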

The last question: traffic loss during rolling updates
The symptom is generally connection-refused or error responses; calls fail.
A rolling update generally closes the existing pod and starts a new one. Closing the existing pod actually means deleting it: the apiserver notifies the kubelet, the kubelet shuts down the container, the pod is removed from the service's backends so that new traffic is no longer distributed to it, the endpoint is removed, kube-proxy is told so that it can refresh the forwarding rules, and the replacement pod is scheduled onto a node. That is roughly the lifecycle of a pod going offline.
There is also a window before traffic is forwarded to the new pod; in that interval, after the old pod starts shutting down, some new traffic may still reach it, but its service is no longer processing new requests, which causes connection refused. How do we solve this? The readiness probe plays a key role throughout the process, but once the pod-deletion event is received, the endpoint is removed regardless of the readiness probe's result.

How do we make sure the pod goes offline gracefully?
In fact, adding a sleep when the pod is closed solves this problem. There are hooks for both shutdown and startup, so before the container is closed you can execute a hook. The hook can be defined as a shell command or as an http request; both types are supported. It sits at the same level as the container's env field.
Sleeping for five seconds means the container being closed does not exit right away; it sleeps for five seconds before the application shuts down, and those five seconds are enough for kube-proxy to refresh its rules, so newly arriving traffic is no longer forwarded to the pod that is being closed. This hook adds a pause to the pod shutdown, giving kube-proxy more time to refresh its rules.
Add:

lifecycle:
  preStop:
    exec:
      command:
        - sh
        - -c
        - "sleep 5"

This way you do not need to modify your application's code; during a rolling update, traffic is no longer forwarded to a pod that is going offline, which solves the problem.

Origin blog.51cto.com/14143894/2438516