A Kubernetes/Docker Network Troubleshooting Story
Yesterday, Friday evening, an urgent request came in on short notice: a user reported a rather strange problem on a Kubernetes cluster, where a service could not be reached over the network reliably, and asked us to help take a look. We followed up from around 17:30 until about 10:00 pm; since we had no access to the user's machines ourselves, we could only find the problem by remotely directing the user at the keyboard. The problem turned out to be rather interesting, and I feel that the commands and troubleshooting methods we used along the way are worth sharing, hence this article.
Symptoms of the problem
The user said over WeChat that a pod in their Kubernetes cluster had been restarted hundreds, even thousands of times. Investigating the pod, they found that the service inside could sometimes be reached and sometimes not; in other words, requests failed with a certain probability, and they had no idea why. It was not a problem with all pods, only with one or two specific ones. The pod ran a Java program, so to rule out Java itself, the user used docker exec -it to get into the container and started a Python SimpleHTTPServer for testing, which showed exactly the same problem.
We learned roughly what the user's cluster looked like: Kubernetes 1.7, flannel in its gw mode for networking, Docker version unknown, CentOS 7.4 as the operating system, with Docker running directly on physical machines. The machines had very high specs, 512GB of RAM and dozens of CPU cores, each running several hundred Docker containers.
Troubleshooting
Preliminary Investigation
First, we ruled out flannel: network communication across the whole cluster was normal, and only one or two specific pods had problems. When testing connectivity manually with telnet ip port, there was a large probability of getting a connection refused error, roughly 1/4 of the time, while the other 3/4 of attempts connected normally.
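A quick way to quantify this kind of intermittent failure is a probe loop. The sketch below is only an illustration: the host and port are placeholders (in the real case they would be the pod's IP and service port), and it uses the shell's built-in /dev/tcp instead of telnet so it can run unattended:

```shell
#!/bin/bash
# Probe a TCP endpoint repeatedly and count how often the connect succeeds
# versus how often it is refused (host/port below are placeholders).
probe() {
  local host=$1 port=$2 tries=$3 i=0 ok=0 refused=0
  while [ "$i" -lt "$tries" ]; do
    # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      ok=$((ok + 1))
    else
      refused=$((refused + 1))
    fi
    i=$((i + 1))
  done
  echo "ok=$ok refused=$refused"
}

probe 127.0.0.1 9 20   # port 9 is normally closed, so every attempt fails
```

In the real investigation, telnet with a manual count served the same purpose; a loop like this just makes the failure ratio (about 1/4 in this case) easy to estimate.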
We then asked the user to capture packets, and the capture showed the problem clearly: on the failing TCP connections, the server received the SYN and immediately replied with RST, ACK.
I asked the user where the two IP addresses involved were located and learned that 10.233.14.129 was docker0 and 10.233.14.145 was the IP inside the container. That basically ruled out Kubernetes and flannel altogether; the problem was on the Docker network local to that machine.
A direct reset like this is exactly what telnet reports as the connection refused error. In my personal experience, a SYN being answered directly with RST, ACK happens in only three situations:
- The TCP connection cannot be established, essentially because the five-tuple identifying the connection cannot be completed; in the vast majority of cases this means nothing is listening on the server port.
- The TCP connection is set up incorrectly because some TCP parameters were modified, especially parameters that are off by default, leaving the TCP handshake unable to complete properly.
- Firewall settings, including iptables REJECT rules.
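As an illustration of the third case (not something found on the user's machine), an iptables rule like the following, with a hypothetical port, makes the kernel answer incoming SYNs on that port with RST, ACK even while a process is listening on it:

```shell
# Illustrative only: REJECT with tcp-reset answers every SYN on port 8080
# with RST, ACK, which a client like telnet reports as "connection refused".
iptables -A INPUT -p tcp --dport 8080 -j REJECT --reject-with tcp-reset
```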
While driving and waiting at a red light, I thought this looked a bit like the symptom of a server behind NAT with tcp_tw_recycle and tcp_tw_reuse turned on (for details, see "Those Things About TCP (Part 1)"). So I asked the user to check the TCP parameters, and we found that the user had not changed any of them; everything was at its default, so TCP parameters were ruled out.
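Checking those two parameters takes only a moment. Below is a minimal sketch that reads them straight from /proc; note that net.ipv4.tcp_tw_recycle was removed entirely in Linux 4.12, so the script tolerates its absence:

```shell
#!/bin/bash
# Print the TIME_WAIT-related TCP parameters suspected above; print "absent"
# for any the running kernel no longer has (tcp_tw_recycle is gone since 4.12).
for p in tcp_tw_reuse tcp_tw_recycle; do
  f="/proc/sys/net/ipv4/$p"
  if [ -r "$f" ]; then
    echo "$p=$(cat "$f")"
  else
    echo "$p=absent"
  fi
done
```

On the user's machine both values were at their defaults, which is what ruled this theory out.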
Next, I did not believe iptables rules would have been set up inside the container, and if they had been, the failure would be 100%, not intermittent. So I suspected that the port inside the container was not always being listened on, which would make it an application problem. I asked the user to check the application logs over there, look at the state of the pod with kubectl describe, and inspect the host's iptables as well.

However, we found nothing. At that point we had lost every lead and felt we could not go on...
Going Over It Again
By then I was home and had finished dinner, and over the phone we went through every detail with the user once more. This time the user provided a critical piece of information: "the packets can be captured on docker0, but when capturing inside the container, the container directly replies RST, ACK"! Yet as far as I know, between docker0 and the container's veth interface there are no other network devices in between (see "Docker Fundamentals: Linux Namespace (Part 1)")!
That pushed the whole thing into the one remaining possibility: an IP address conflict!
Finding an IP address conflict on Linux is not a simple matter, and there was no way to install additional tools in the user's production environment; we could only use commands that already existed there. We found that the user's machine had arping, so we used it to check for a conflicting IP address, with the following command:
```
$ arping -D -I docker0 -c 2 10.233.14.145
$ echo $?
```
According to the documentation, the -D parameter enables duplicate address detection mode; if the command's exit status is 0, there is a conflict. The result came back 1. Moreover, arping never showed us a MAC address for that IP other than the container's own. At this point, the trail seemed to have gone cold again.
Because the customer was also dealing with other matters, we worked on this intermittently and needed the user to carry out some of the steps for us, so progress was slow, but that also gave us some time to think.
A Breakthrough
By now we knew an IP conflict was very likely, but we could not find whose IP it conflicted with. We also knew that rebooting the machine would surely make the problem go away, but we felt that is not how you solve a problem: a reboot only fixes it temporarily, and if we did not understand how it happened, it would come back the next time. Besides, rebooting a production machine like this is far too costly.
So our curiosity drove us on. I had the user kubectl delete the two problematic pods; since the services were restarting continuously anyway, deleting them did no harm. After the two pods were deleted (one with IP 10.233.14.145, the other 10.233.14.137), Kubernetes started new instances of the two services on other machines. Yet on the problem machine, those two IP addresses could still be pinged.
Good: the IP address conflict was confirmed. Since the 10.233.14.xxx segment belongs to Docker, the conflicting IP had to be on this machine. So we wanted to look at the IP addresses of the veth interfaces in every network namespace on it.
This step took us a while, because we were not all that familiar with the relevant commands, so some time went into Googling and reading the man pages.
- First, we went to the /var/run/netns directory to look at the system's network namespaces, and found nothing there.
- Then we went to the /var/run/docker/netns directory, Docker's own namespaces, and found entries.
- So we looked for a way to view the IP addresses inside Docker's network namespaces at that location.
For that we used the nsenter command, which enters a namespace and executes a command there. For example:
```
$ nsenter --net=/var/run/docker/netns/421bdb2accf1 ifconfig -a
```
The command above executes ifconfig -a inside the network namespace /var/run/docker/netns/421bdb2accf1. So we could traverse all of the network namespaces with the following command:
```
$ ls /var/run/docker/netns | xargs -I {} nsenter --net=/var/run/docker/netns/{} ip addr
```
Then we found something rather strange:

- 10.233.14.145: we did find this IP, meaning it existed under one of Docker's network namespaces, as expected.
- 10.233.14.137: this IP could not be found in any of Docker's network namespaces.
A leaked namespace? I searched the web and found a Docker bug: when removing/stopping a container, Docker did not clean up the corresponding network namespace. It was reported in Issue #31597 and fixed in PR #31996, which was merged into Docker 17.05. The user's version was 17.09, which should include the fix, so this should not be the problem. Time to look elsewhere.
Still, the fact that 10.233.14.137 answered pings proved that this IP was bound to some network interface, hidden away in some network namespace we had not yet seen.
At this point, to see every network namespace on the system, only one approach was left: go through the /proc/ directory and enumerate /proc/<pid>/ns for every PID exhaustively. Fortunately there is a convenient command that does exactly that: lsns. So I wrote the following command:
```
$ lsns -t net | awk '{print $4}' | xargs -t -I {} nsenter -t {} -n ip addr | grep -C 4 "10.233.14.137"
```
To explain:

- lsns -t net lists every process that holds open a network namespace; the process PID is in the fourth column.
- awk '{print $4}' extracts all of those PIDs and passes them to the xargs command.
- xargs feeds each PID in turn to the nsenter command; xargs -t prints each command line before executing it, so we can see which PID it is, and xargs -I {} declares a placeholder that is replaced with the PID.
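The full pipeline needs root and a machine full of namespaces, but the xargs mechanics can be tried anywhere. In the toy below, a plain echo stands in for nsenter and two made-up PIDs stand in for the lsns output:

```shell
# Toy version of the pipeline: -I {} substitutes each input line (a fake PID)
# into the command, and -t echoes each constructed command to stderr first.
printf '101\n202\n' | xargs -t -I {} echo "would run: nsenter -t {} -n ip addr"
```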
Finally, we found that although 10.233.14.137 did not show up under /var/run/docker/netns, lsns turned up three processes all using the IP 10.233.14.137 (so that was the conflict), and their MAC addresses were all identical (no wonder arping found nothing). Using the ps command, we could see that of those three processes, two were Java and one was /pause (which would be the Kubernetes sandbox).
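The question lsns answered here, which processes share a network namespace, can also be answered directly from /proc by reading each process's ns/net symlink. A minimal Linux-only sketch:

```shell
#!/bin/bash
# Print the network-namespace inode of every process we may inspect; PIDs that
# show the same net:[...] value share one namespace (and thus its IP addresses).
for d in /proc/[0-9]*; do
  ns=$(readlink "$d/ns/net" 2>/dev/null) || continue   # skip PIDs we cannot read
  echo "$ns ${d#/proc/}"
done | sort | head -20
```

Sorting groups namespace-mates together, which is exactly how the three conflicting processes would have stood out.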
Pressing on in hot pursuit, we dumped the whole process tree with the pstree command, and found that the parent of all three processes was the same docker-containe process!
This clearly pointed back at Docker, yet docker ps showed no corresponding container. What the hell! We were close to losing it...
Looking further up the process tree, we found that this docker-containe process was not under dockerd; it hung under systemd, the super parent with PID 1. Unbelievable! And then we found a whole pile of such orphaned processes (orphan or zombie processes like these do harm the system, or at least leave it in a sub-healthy state, because they still hold resources).
docker-containe should be a child process of dockerd. There is only one reason for it to end up attached to pid 1: its parent process "died" out from under it, so it could only take PID 1 as a foster parent. This showed that on this machine the dockerd process had exited, and exited abnormally. The whole point of systemd being PID 1 is to supervise every descendant process, yet these had still escaped its management, which indicates something quite out of the ordinary happened. (On systemd, see "Linux PID 1 and Systemd"; on parent and child processes, see the book "Advanced Programming in the UNIX Environment".)
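This reparenting behavior is easy to reproduce. In the sketch below, a subshell plays the part of the dying dockerd: it spawns a sleep and exits immediately, and the surviving sleep is adopted, classically by PID 1 (on systems with a subreaper, such as inside a container or a systemd user session, the adopter may be another PID):

```shell
#!/bin/bash
# A parent (the subshell) exits before its child (sleep); the kernel then
# reparents the orphan, classically to PID 1.
orphan_parent() {
  tmp=$(mktemp)
  ( sleep 3 & echo $! > "$tmp" )   # the subshell dies here; sleep lives on
  sleep 1                          # give the kernel a moment to reparent
  child=$(cat "$tmp"); rm -f "$tmp"
  ps -o ppid= -p "$child" | tr -d ' '
}

new_parent=$(orphan_parent)
echo "orphan was adopted by PID $new_parent"
```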
The next step was to look through systemd's logs for records of dockerd... (however, the logs only went back three days, and in those three days dockerd showed no anomaly at all).
Summary
From this investigation, we can sum up a few points:

1) Troubleshooting requires a fairly solid foundation of knowledge, so that you understand the causes of a problem and its possible scope.
2) When you hit a dead end, go back and re-examine everything; take a closer look at the clues you already have and weigh every detail carefully.
3) Be very familiar with a range of diagnostic tools; it lets you get twice the result for half the effort.
4) System operations and maintenance is much like housekeeping: regularly check whether there are zombie processes or other kinds of system garbage lying around, and clean such things up promptly.
Finally, one more remark. Many people say that Docker is best run directly on physical machines, but that is not entirely right, because it only accounts for the performance cost, not the operations cost. Running several hundred containers on a 512GB machine like this is essentially building a big monolith: the moment you have any reason to restart a key process or the machine itself, the blast radius is enormous.
--------------- Update 2018/12/10 ---------------

Root Cause
Over the past two days I tested this in my own environment and found that as long as Docker is started and stopped with commands like systemctl start/stop docker, all of its processes and resources are cleaned up properly; no problem there. The only way I could reproduce the user's situation was with a direct kill -9 <dockerd pid>, but the user should not have done such a thing. And if dockerd had crashed, systemd's journal would show it via a command like journalctl -u docker.
So I went back to the user to dig into how they start and stop Docker, and the user said that when they ran the systemctl stop docker command, it did not respond, so they pressed Ctrl+C to kill it!
That would be exactly why so many docker-containe processes ended up hanging under PID 1. As mentioned above, the user runs hundreds of containers on a single physical machine, so the process tree is huge. I suspect that stopping Docker requires traversing all of its child processes and sending each one an exit signal, which can take a very long time. That made the command look hung, the operator pressed Ctrl+C, and as a result a large number of containers and processes were never terminated...
Other Notes
Some readers asked why I wrote docker-containe in this article rather than containerd. That is because pstree truncates process names; with ps you can see the full command line, and it is the same process, just with a docker- prefix in its name.
Here are the process trees of the two different installation packages, to show the difference (the sleep processes are from a busybox image I started):
```
systemd───dockerd─┬─docker-containe─┬─3*[docker-containe-shim─┬─sleep]
                  │                 │                         └─9*[{docker-containe}]]
                  │                 ├─docker-containe-shim─┬─sleep
                  │                 │                      └─10*[{docker-containe}]
                  │                 └─14*[{docker-containe-shim}]
                  └─17*[{dockerd}]
```
```
systemd───dockerd─┬─containerd─┬─3*[containerd-shim─┬─sleep]
                  │            │                    └─9*[{containerd-shim}]
                  │            ├─2*[containerd-shim─┬─sleep]
                  │            │                    └─9*[{containerd-shim}]]
                  │            └─11*[{containerd}]
                  └─10*[{dockerd}]
```
By the way, ever since Docker version 1.11, the Docker process group has followed the model shown above.
- dockerd is the Docker Engine daemon and the process that handles user operations directly. On startup, dockerd launches containerd as a child process; the two communicate over RPC.
- containerd is the intermediate component between dockerd and runc. Decoupling it from dockerd makes Docker more neutral and lets it support the OCI standard.
- containerd-shim is what actually runs a container. Every container starts its own shim process, which mainly takes three parameters: the container id, the bundle directory (the directory containerd generates for the container, usually located at /var/run/docker/libcontainerd/containerID), and the runtime command (runc by default) used to create the container.
- docker-proxy is a process you may also see in newer versions of Docker; it is a user-level port-forwarding proxy. If you print its full command line with ps -elf, you can see that it does port mapping. If you do not want this proxy, you can add the argument --userland-proxy=false to dockerd's startup command line.
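For completeness, the same switch can also be set persistently in Docker's daemon configuration file (/etc/docker/daemon.json) instead of on the command line:

```json
{
  "userland-proxy": false
}
```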
For more details, you can search on your own. Two articles are recommended: