
A Kubernetes / Docker Network Troubleshooting Story

Yesterday, Friday evening, while covering a temporary shift, a user reported a rather strange problem to us: pods in a Kubernetes cluster could not access the network normally, and they asked us to help take a look. We followed up from about 17:30 until around 10:00 pm. Since we could not access the user's machines remotely, we could only troubleshoot by remotely directing the user. The problem turned out to be quite interesting, and I personally feel that the commands and troubleshooting methods used in the investigation are worth sharing, so I wrote this article.

Symptoms of the problem

The user said directly over WeChat that they had found a pod under Kubernetes that had been restarted hundreds or even thousands of times. When they opened up the pod to investigate, they found that the service on it could sometimes be accessed and sometimes not, i.e., there was a certain probability that it could not be reached, and they did not know why. It was not all pods that had the problem; only one or two specific pods had trouble with network access. The user said the pod ran a Java program; to rule out Java as the cause, they went straight into the container with docker exec -it and started a Python SimpleHTTPServer for testing, and saw the same problem.

We roughly knew the user's cluster setup: Kubernetes 1.7, networking via flannel in host-gw mode, Docker version unknown, operating system CentOS 7.4, with Docker running directly on physical machines. The physical machines are very high-spec: 512GB of memory and quite a few CPU cores, with hundreds of Docker containers running on top.

 

Troubleshooting the problem

Preliminary investigation

First of all, we ruled out flannel, because network communication across the whole cluster was normal; only one or two specific pods had the problem. When we manually tested connectivity with telnet ip port, there was a high probability of getting a connection refused error, roughly 1/4 of the time, while the other 3/4 of the time the connection went through normally.
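
To put a number on "sometimes connects, sometimes refused", a quick probe loop is enough. This is only a sketch: the port 8080 below is a stand-in for the user's actual service port, and it relies on bash's built-in /dev/tcp redirection.

$ for i in $(seq 1 100); do (echo > /dev/tcp/10.233.14.145/8080) 2>/dev/null && echo ok || echo refused; done | sort | uniq -c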

At that point, we asked the user to capture packets and take a look. The user then caught the problem: on the failing TCP connections, as soon as the SYN was received, a RST, ACK was returned immediately.

I asked the user where the two IP addresses involved were located, and learned that 10.233.14.129 is docker0 and 10.233.14.145 is the IP inside the container. That basically rules out any problem with Kubernetes or flannel; this is a problem on the Docker host's local network.

Such a direct reset is exactly what telnet reports as a connection refused error. In my personal experience, a SYN being answered directly with RST, ACK happens in only three circumstances:

  1. The TCP connection cannot be established at all, essentially because the five-tuple identifying the connection cannot be completed; in the vast majority of cases this means nothing is listening on the server port.
  2. The TCP connection is set up incorrectly, possibly because some TCP parameters have been modified, especially parameters that are off by default, and those parameters break the TCP handshake.
  3. Iptables firewall settings, including REJECT rules.

Because I happened to be driving at the time and was waiting at a red light, I felt this looked a bit like the symptoms of a server behind NAT having tcp_tw_recycle and tcp_tw_reuse turned on (see "Those things about TCP (Part 1)" for details), so I asked the user to check the TCP parameters. We found that the user had not changed any TCP parameters; everything was at the default, so we ruled out TCP parameters.
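
For reference, checking those two parameters is a one-liner; on the user's machine both were still at the kernel default of 0:

$ sysctl net.ipv4.tcp_tw_recycle net.ipv4.tcp_tw_reuse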

Then, I did not think anyone would set up iptables inside the container, and if that were the issue it would fail 100% of the time rather than working on and off. So I suspected that the port on the container was not being listened on all the time, which could be an application problem. I therefore asked the user to look at the application logs, check how the pod was running with kubectl describe, and take a look at the host's iptables.
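
Roughly, those checks look like the following. The pod and container names and the port are placeholders, and ss may need to be swapped for netstat depending on what is inside the image:

$ kubectl describe pod <pod-name>
$ docker exec -it <container-id> ss -lntp     # is anything actually listening on the port?
$ iptables -S | grep -i reject                # any REJECT rules on the host?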

However, we did not find any problems. At this point we had lost every lead, and it felt like we could not go any further...

Re-sorting the clues

By this time I was home and we had finished dinner. Over a phone call with the user, we went through all the details again, and this time the user gave a key piece of information: "the packets can be captured on docker0; however, it is the container that returns the RST, ACK"! Yet as far as I know, between docker0 and the container's veth card there are no other network devices (see "Docker basic technologies: Linux Namespace (Part 2)")!

So we were left with only the last possibility: an IP address conflict!

Checking for an IP address conflict on Linux is not a trivial matter, and there was no way to install extra tools in the user's production environment; we could only work with the commands already there. We found arping on the user's machine and used it to check whether the IP address was in conflict, with the following commands:

$ arping -D -I docker0 -c 2 10.233.14.145
$ echo $?

According to the documentation, the -D parameter enables duplicate address detection mode; if the command's exit status is 0, there is a conflict. The result came back as 1. Moreover, when we arping'ed the IP, we did not see a different MAC address. At this point, the trail of clues seemed to have gone cold.

Because the customer was also dealing with other things on their side, we worked in fits and starts, and some of the work had to be done by the user, so progress was slow, but it gave us some time to think.

A breakthrough

At this point we knew the likelihood of an IP conflict was very high, but we could not find which IP it was conflicting with. We also knew that rebooting the machine would certainly make the problem go away, but we felt that was not the way to solve it: a reboot would only fix the problem temporarily, and if we did not understand how it happened, it would come back. Besides, rebooting a production machine is far too costly.

So, curiosity drove us to keep digging. I had the user kubectl delete the two problematic pods; since the services were restarting constantly anyway, deleting them caused no harm. After the two pods were deleted (one with IP 10.233.14.145, the other 10.233.14.137), we found that Kubernetes restarted new instances of the two services on other machines. However, on the problematic machine, those two IP addresses could still be pinged.

Good, the IP address conflict was now confirmed. Since the 10.233.14.xxx segment belongs to Docker, these IPs had to be on this very machine. So we wanted to look at the IP addresses on the veth cards in all of the network namespaces.

This took us some time, because we were not that familiar with the relevant commands, so I spent a while on Google and reading the relevant man pages.

  • First, we went to the /var/run/netns directory to look at the system's network namespaces, and found nothing.
  • Then, we went to the /var/run/docker/netns directory to look at Docker's namespaces, and found quite a few entries there.
  • So, we looked at the IP addresses in Docker's network namespaces by pointing the command at those namespace files directly.

This requires the nsenter command, which can enter a namespace and execute commands in it. For example:

$ nsenter --net=/var/run/docker/netns/421bdb2accf1 ifconfig -a

The command above executes ifconfig -a inside the network namespace /var/run/docker/netns/421bdb2accf1. So we could use the following command to walk through all of the network namespaces:

$ ls /var/run/docker/netns | xargs -I {} nsenter --net=/var/run/docker/netns/{} ip addr

Then we found something rather strange.

  • We found the IP 10.233.14.145, which means Docker's namespaces still hold that IP.
  • 10.233.14.137 was not found in any of Docker's network namespaces.

Was a namespace leaking? I searched the Internet and found a Docker bug: when Docker removes/stops a container, it may fail to clean up the corresponding network namespace. The problem was reported in Issue #31597 and fixed in PR #31996, which was merged into Docker 17.05. The user's version is 17.09 and should include the fix, so that should not be the problem. Time to look somewhere else.

However, the fact that 10.233.14.137 could be pinged proves that the IP is bound to some network card, hidden in some network namespace.

At this point, to see all of the network namespaces, only one way was left: go through the /proc/ directory and exhaustively enumerate the /proc/<pid>/ns directory of every pid. Fortunately, there is a convenient command that does exactly this: lsns.

So I wrote the following command:

$ lsns -t net | awk '{print $4}' | xargs -t -I {} nsenter -t {} -n ip addr | grep -C 4 "10.233.14.137"

To explain:

  • lsns -t net lists every process that holds a network namespace open; the PID is in the 4th column.
  • Those PIDs are extracted and piped to the xargs command.
  • xargs then hands each PID in turn to the nsenter command.
    • xargs -t prints each command as it is executed, so we know which PID it belongs to.
    • xargs -I {} declares {} as a placeholder to be replaced with the PID.

Finally, we found that although nothing showed up under /var/run/docker/netns for 10.233.14.137, lsns turned up three processes that were all using the IP 10.233.14.137 (so that is where the conflict was), and their MAC addresses were all the same! (No wonder arping could not find it.) Using the ps command, we saw that of these three processes, two were java and one was /pause (the latter should be the Kubernetes sandbox).
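
Mapping those PIDs back to processes and their namespaces is just ps plus nsenter again; the PIDs below are made-up placeholders, not the real ones from the user's machine:

$ ps -fp 12345,23456,34567      # two java processes and one /pause
$ nsenter -t 12345 -n ip addr   # shows 10.233.14.137 and the shared MAC address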

We kept pressing on and printed out the whole process tree with the pstree command. We found that the parent processes of all three were the same docker-containe process!
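
One way to see that chain directly is to ask pstree for a single PID's ancestors (the PID here is again a placeholder):

$ pstree -sp 12345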

This was obviously still Docker, yet docker ps could not find the corresponding container. What the hell! We were on the verge of collapse...

Looking further at the process tree, we found that the parent of that docker-containe process was not dockerd, but the super parent process systemd with PID 1. Damn! And then we found a whole pile of such stray processes (these stray or zombie processes are harmful to the system, or at least put it into a sub-healthy state, because they still occupy resources).

docker-containe should be a child process of dockerd; there is only one reason for it to be hanging under pid 1, namely that its parent process "died", leaving it no choice but to take pid 1 as its adoptive parent. This shows that a serious problem occurred on this machine: the dockerd process exited, and exited in an abnormal, irregular way. One of the reasons systemd became pid 1 is precisely so that it can supervise all of its descendant processes, yet even it failed to keep these under control, which shows how irregular the situation was. (Note: about systemd, see "Linux PID 1 and Systemd"; about parent and child processes, see the book "Advanced Programming in the UNIX Environment".)
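
Counting these strays is easy enough with ps; this sketch simply looks for processes whose parent is PID 1 and whose name carries the docker- prefix used by the CentOS packages:

$ ps -eo pid,ppid,comm | awk '$2 == 1 && $3 ~ /^docker-/'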

The next step was to look through systemd's logs for dockerd... (however, the logs only covered three days, and in those three days dockerd showed no anomalies at all).
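
The log check itself is just systemd's journal for the docker unit; the grep pattern here is only a guess at what a crash or kill might look like:

$ journalctl -u docker --since "3 days ago" | grep -iE "panic|fatal|signal|killed"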

Summary

From this investigation, we can sum up a few points:

1) To troubleshoot problems you need a fairly solid foundation of knowledge, so that you know the principles behind the problem and how far it can reach.

2) If you hit a dead end, re-sort the clues, go back and take a closer look at what you already have, and weigh every detail carefully.

3) Being familiar with a variety of diagnostic tools lets you get twice the result with half the effort.

4) Doing system maintenance is a lot like housekeeping: you need to check regularly whether there are zombie processes or other garbage in the system, and such things should be cleaned up promptly.

Finally, let me say this: many people say that Docker is well suited to running directly on physical machines. That is not entirely right, because it only considers the performance cost and not the operations cost. Running several hundred containers on a 512GB machine like this is not a good way to play the game, because it is essentially one big monolith: the moment you have a reason to restart some key process or the machine itself, the impact you face is huge.

 

--------- Update 2018/12/10 ---------

Cause of the problem

Over the last two days I tested this in my own environment and found that as long as Docker is started and stopped with commands like systemctl start/stop docker, all processes and resources are cleaned up, and there is no problem. The only way I could reproduce the user's situation was to kill -9 <dockerd pid> directly, but the user should not have done something like that. Also, if Docker had crashed, systemd's log could be checked with a command like journalctl -u docker.
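
Concretely, the reproduction boils down to something like this, on a disposable test machine; the ps/awk check afterwards is just one way of spotting processes re-parented to PID 1:

$ sudo kill -9 "$(pgrep -x dockerd)"
$ ps -eo pid,ppid,comm | awk '$2 == 1'     # the shim processes now show up with PPID 1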

So I asked the user to find out whether anything went wrong in how they started and stopped Docker. The user said that when they executed systemctl stop docker, the command did not respond, so they most likely pressed Ctrl + C!

This should be what caused the large number of docker-containe processes to end up hanging under PID 1. As mentioned above, the user runs hundreds of containers on a single physical machine, so the process tree is very large. I suspect that when stopping, the system has to walk all of Docker's child processes and send each of them an exit signal, which can take a very long time. That makes the stop command look like it has hung, but pressing Ctrl + C meant a lot of containers and processes were never terminated...

 

Other matters

Some readers asked why this article writes docker-containe rather than the containerd process. That is because pstree truncates the name; with ps you can see the whole command. The only difference is that the process names carry a docker- prefix.

Here is the difference between the process trees of the two different installation packages (the sleep processes are started from a busybox image I used):

CentOS installation package
systemd───dockerd─┬─docker-containe─┬─3*[docker-containe─┬─sleep]
                  │                 │                    └─9*[{docker-containe}]]
                  │                 ├─docker-containe─┬─sleep
                  │                 │                 └─10*[{docker-containe}]
                  │                 └─14*[{docker-containe}]
                  └─17*[{dockerd}]
Docker official installation package
systemd───dockerd─┬─containerd─┬─3*[containerd-shim─┬─sleep]
                  │            │                    └─9*[{containerd-shim}]
                  │            ├─2*[containerd-shim─┬─sleep]
                  │            │                    └─9*[{containerd-shim}]]
                  │            └─11*[{containerd}]
                  └─10*[{dockerd}]

By the way, starting with Docker 1.11, the Docker process group changed to the model shown above.

  • dockerd is the Docker Engine daemon, the one user operations talk to directly. When dockerd starts, it launches containerd as a child process, and the two communicate via RPC.
  • containerd is the intermediate component between dockerd and runc. Decoupling it from dockerd makes Docker more neutral and lets it support the OCI standard.
  • containerd-shim is what actually runs the container. Each container started gets its own shim process, which is mainly given three parameters: the container id, the bundle directory (a per-container directory generated by containerd, usually located at /var/run/docker/libcontainerd/containerID), and the runtime command (runc by default), and uses them to create the container.
  • docker-proxy is a process you may also see in newer versions of Docker; it is a user-level routing proxy. If you print its command line with ps -elf, you can see that it does port mapping. If you do not want this proxy, you can add the --userland-proxy=false argument to dockerd's startup command line, as sketched below.
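
If you prefer the config file to the command line, the same switch can, as far as I know, also go into /etc/docker/daemon.json:

$ cat /etc/docker/daemon.json
{
    "userland-proxy": false
}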

For more details, you can Google them yourself.
