K8S+DevOps Architect Practical Course | Docker Network

Video source: Bilibili (Station B), "The ceiling of Docker & k8s tutorials — all the core Docker knowledge you need for learning k8s is here"

These notes organize the teacher's course content together with my own testing while studying, and are shared for everyone's benefit. If there is any infringement, the post will be deleted. Thank you for your support!

See also the summary post: K8S+DevOps Architect Practical Course | Summary


A Docker container is an isolated virtual system, and it can have its own independent network namespace.

  • How do multiple containers communicate with each other?
  • How is the communication between the container and the host machine achieved?
  • How is the port mapping implemented using the -p parameter?

With these questions in mind, let's study Docker's network model. At the end, I will demonstrate, by capturing packets, how data packets are translated between the container and the host.

network mode

When we use docker run to create a Docker container, we can use the --net option to specify the network mode of the container. Docker has the following four network modes:

  • bridge mode, specified with --net=bridge; this is the default setting
  • host mode, specified with --net=host; the container shares the host's network namespace, which is similar to running the process directly on the host, so port usage is shared with the host
  • container mode, specified with --net=container:NAME_or_ID; the new container shares the network namespace of the specified container
  • none mode, specified with --net=none; the network setup is empty, that is, only the network namespace is kept, with no network-related configuration (network interface, IP, routing, etc.)
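As a quick illustration of how each mode is selected (the images and container names here are arbitrary examples, not part of the course; container mode assumes an existing container named "web"):

# bridge mode (the default, so --net=bridge can be omitted)
$ docker run -d --net=bridge nginx:alpine
# host mode: share the host's network stack
$ docker run -d --net=host nginx:alpine
# container mode: share the network namespace of the existing container "web"
$ docker run -d --net=container:web busybox sleep 3600
# none mode: only an empty network namespace
$ docker run -d --net=none busybox sleep 3600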

bridge mode

We did not specify a network mode when we created Docker containers in the earlier demonstrations; when none is specified, bridge mode is used by default. The word "bridge" literally means a bridge, and that is exactly what this mode uses.

How should we understand a bridge? As an analogy, you can think of a bridge as a layer-2 switch. Take a look at this picture:

Switch communication diagram

Schematic diagram of bridge mode

In Linux, the network device that acts as a virtual switch is the bridge. It works at the data link layer, and its main job is to forward packets to the different ports of the bridge according to their MAC addresses. So where is this bridge? Install bridge-utils and take a look:

$ yum install -y bridge-utils
$ brctl show
bridge name    bridge id           STP enabled        interfaces
docker0        8000.0242b5fbe57b   no                 veth3a496ed
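If bridge-utils is not available, roughly the same information can be viewed with the iproute2 tools (the output format differs from brctl):

$ ip link show type bridge   # list bridge devices, including docker0
$ bridge link show           # list interfaces attached to bridges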

Now that we have a bridge, let's see what Docker does when starting a container so that containers can reach each other.

When Docker creates a container, it will perform the following operations:

  • Create a pair of virtual interfaces/network cards, i.e. a veth pair;
  • The host-side end is attached to the default docker0 bridge (or the specified bridge) and given a unique name, such as veth9953b75;
  • The container-side end is placed inside the newly started container and renamed eth0; this interface is visible only inside the container's namespace;
  • A free address is taken from the bridge's available address range (i.e. the network the bridge corresponds to) and assigned to the container's eth0;
  • A default route pointing to the bridge is configured.
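These details can also be cross-checked from Docker's side; a quick look at the default bridge network, which is backed by docker0:

$ docker network ls               # the built-in "bridge" network corresponds to docker0
$ docker network inspect bridge   # shows the subnet, gateway (172.17.0.1) and attached containers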

Docker does all of this for us automatically. Let's clean up all existing containers and verify it step by step.

## Remove all containers
$ docker rm -f $(docker ps -aq)
$ docker ps
$ brctl show # list the interfaces on the bridge; none at the moment

## Create test container test1
$ docker run -d --name test1 nginx:alpine
$ brctl show # the veth end of test1 has now been attached to the bridge
$ ip a | grep veth # the host-side veth is visible on the host
$ docker exec -ti test1 sh
/# ifconfig  # check the container's eth0 interface and the IP assigned to it

# Start another test container to test communication between containers
$ docker run -d --name test2 nginx:alpine
$ docker exec -ti test2 sh
/# sed -i 's/dl-cdn.alpinelinux.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apk/repositories
/# apk add curl
/# curl 172.17.0.8:80

## Why can they communicate?
/# route -n  # check the routing table inside test2
Kernel IP routing table
Destination    Gateway      Genmask       Flags Metric Ref     Use Iface
0.0.0.0        172.17.0.1   0.0.0.0       UG    0      0         0 eth0
172.17.0.0     0.0.0.0      255.255.0.0   U     0      0         0 eth0

# The eth0 interface is the default routing device inside this container; all requests for the 172.17.0.0/16 network are also handed to eth0 (the second routing rule, for 172.17.0.0). The gateway of that rule is 0.0.0.0, which means it is a direct route: any IP packet matching this rule is sent out through the local eth0 interface and delivered directly to the destination host over the layer-2 network (the data link layer).

# To reach the test1 container over the layer-2 network, we need the MAC address corresponding to the IP address 172.17.0.8. So the network stack of the test2 container sends an ARP broadcast through eth0 to look up the MAC address for that IP.

# This eth0 interface is one end of a veth pair: one end sits in the test2 container's network namespace, while the other end lives on the host (in the host namespace) and is "plugged into" the host's docker0 bridge. A characteristic of a bridge device is that every interface plugged into it is treated as one of its ports, and a port's only job is to receive incoming packets and hand full control over them (forward or drop, for example) to the bridge device.

# Therefore the ARP broadcast is also forwarded by docker0. In this way the bridge maintains a table of ports and MAC addresses, so once test2's eth0 has obtained the MAC address, all subsequent requests likewise go through the docker0 bridge, which forwards them to the corresponding container.

# The bridge maintains a MAC mapping table, which we can roughly inspect with:
$ brctl showmacs docker0
## These MAC addresses correspond to the host-side veth interfaces, which you can check with:
$ ip a
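To observe the ARP resolution described above from inside test2, you can inspect its ARP/neighbor table (a sketch: busybox's arp applet is usually present in alpine-based images, and apk add iproute2 provides ip neigh if it is not):

/# arp -n        # should list test1's IP together with the MAC of its eth0
/# ip neigh      # the same information via iproute2, if installed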

How do we know which virtual interface on the bridge corresponds to which container?

Through ifindex, the network interface index number.

## Check the interface index of the test1 container's eth0
$ docker exec -ti test1 cat /sys/class/net/eth0/ifindex

## On the host, look at the @ifXX value after each virtual interface; if the values match, that virtual interface is paired with the container's eth0.
$ ip a | grep @if

Putting this into a script to quickly list the correspondence:

for container in $(docker ps -q); do
    iflink=`docker exec -it $container sh -c 'cat /sys/class/net/eth0/iflink'`
    iflink=`echo $iflink|tr -d '\r'`
    veth=`grep -l $iflink /sys/class/net/veth*/ifindex`
    veth=`echo $veth|sed -e 's;^.*net/\(.*\)/ifindex$;\1;'`
    echo $container:$veth
done
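Each line of output pairs a container ID with the name of its host-side veth device, so you can see at a glance which interface on docker0 belongs to which container.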

We have explained communication between containers above; so how is communication between a container and the host machine (and the outside world) achieved?

Add port mapping:

## When starting the container, use the -p parameter to map a host port to the container's service port
$ docker run --name test -d -p 8088:80 nginx:alpine
$ curl localhost:8088
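The mapping can also be confirmed from Docker's side:

$ docker port test    # should print something like: 80/tcp -> 0.0.0.0:8088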

How is port mapping implemented? First, review the iptables chain diagram.

When we access port 8088 on the host, the packet enters the machine in the inbound direction, so the PREROUTING and INPUT chains are involved. Since we are mapping a host port to a container port, port translation must be involved as well. Which table stores the port-translation information? The nat table, which is responsible for maintaining network address translation information. So let's look at the nat rules on the PREROUTING chain:

$ iptables -t nat -nvL PREROUTING
Chain PREROUTING (policy ACCEPT 159 packets, 20790 bytes)
 pkts bytes target     prot opt in     out    source        destination
    3   156 DOCKER     all  --  *      *      0.0.0.0/0     0.0.0.0/0      ADDRTYPE match dst-type LOCAL

This rule uses the addrtype extension of iptables to match packets whose destination address type is LOCAL. How do we know which addresses count as local?

$ ip route show table local type local
127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
172.17.0.1 dev docker0 proto kernel scope host src 172.17.0.1
172.21.51.67 dev eth0 proto kernel scope host src 172.21.51.67

In other words, if the destination address type matches one of these, the packet is handed to the TARGET. A TARGET is an action that says what to do with a packet matching the rule; the most common ones are ACCEPT and DROP. Here the TARGET is DOCKER, which is obviously not a standard action. So what is DOCKER? We often define a custom chain so that rules of a certain type can be grouped there, and the custom chain is then referenced from a standard chain; DOCKER is such a custom chain. Let's look at the rules in the DOCKER custom chain:

$ iptables -t nat -nvL DOCKER
Chain DOCKER (2 references)
 pkts bytes target     prot opt in       out    source        destination
    0     0 RETURN     all  --  docker0  *      0.0.0.0/0     0.0.0.0/0
    0     0 DNAT       tcp  --  !docker0 *      0.0.0.0/0     0.0.0.0/0   tcp dpt:8088 to:172.17.0.2:80

This rule performs DNAT on TCP traffic received by the host with destination port 8088, redirecting it to 172.17.0.2:80. 172.17.0.2 is exactly the IP address of the Docker container we created above. Once the traffic reaches the docker0 bridge, the bridge's forwarding takes care of the rest.

Therefore, the outside world only needs to visit 192.168.136.133:8088 to access the services in the container.

In the outbound direction, the packet goes through the POSTROUTING chain; let's check its rules:

$ iptables -t nat -nvL POSTROUTING
Chain POSTROUTING (policy ACCEPT 1099 packets, 67268 bytes)
 pkts bytes target     prot opt in     out    source        destination
   86  5438 MASQUERADE  all  --  *      !docker0  172.17.0.0/16  0.0.0.0/0
    0     0 MASQUERADE  tcp  --  *      *        172.17.0.4   172.17.0.4   tcp dpt:80

Pay attention to what the MASQUERADE action means: it is essentially a more flexible form of SNAT that rewrites the source address to the IP of the host's egress interface. Now let's explain what this rule does:

This rule takes packets whose source address is in 172.17.0.0/16 (that is, packets generated by Docker containers) and that are not leaving through the docker0 interface, and rewrites their source address to the address of the host's outgoing interface. Roughly, the flow is: a packet is sent out from the container and routed to the docker0 bridge, which forwards it to the host's eth0 according to the host's routing table; at that point the packet moves from docker0 to eth0 and is sent out through eth0, this rule takes effect, and the source address is replaced with eth0's IP address.

Note that this process involves forwarding packets between network interfaces, so IP forwarding (ip_forward) must be enabled on the host; otherwise packets cannot be forwarded and the service will not be reachable.
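A quick way to check and, if needed, enable it at runtime (to make it persistent, set net.ipv4.ip_forward = 1 in /etc/sysctl.conf):

$ sysctl net.ipv4.ip_forward        # 1 means forwarding is enabled
$ sysctl -w net.ipv4.ip_forward=1   # enable it if the value is 0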

Packet Capture Demo

Let's think about which network interfaces we want to capture packets on:

  • First, the traffic arrives at the host's port 8088, so capture on the host's eth0:
$ tcpdump -i eth0 port 8088 -w host.cap
  • The packet then flows into the container, so also capture on the container's eth0:
# Install tcpdump inside the container first
$ sed -i 's/dl-cdn.alpinelinux.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apk/repositories
$ apk add tcpdump
$ tcpdump -i eth0 port 80 -w container.cap

From another machine, access the service:

$ curl 172.21.51.67:8088/

Stop capturing and copy the capture file from the container to the host:

$ docker cp test:/root/container.cap /root/

Copy both capture files to your local machine and analyze them with Wireshark.

$ scp [email protected]:/root/*.cap /d/packages

(Merge the two capture files in Wireshark for analysis.)

Packets entering the container are DNATed and packets going out are SNATed, so the outside world does not know who is actually providing the service inside the machine. The effect is the same as multiple machines on an internal network sharing one external IP address to access the Internet, which is also a common application of NAT.

Host mode

No separate network namespace is created for the container; it shares the host's network namespace. For example, create a mysql container in host mode:

$ docker run --net host -d --name mysql -e MYSQL_ROOT_PASSWORD=123456 mysql:5.7

After the container starts, it listens on port 3306 by default. Since the network mode is host, the service can be accessed directly through the host's port 3306, which is equivalent to starting the mysqld process directly on the host.
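A quick check on the host (assuming ss from iproute2 is installed; netstat -tlnp works as well):

$ ss -tlnp | grep 3306                                     # mysqld is listening directly on the host's port 3306
$ docker inspect -f '{{.HostConfig.NetworkMode}}' mysql    # prints "host"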

Container mode

This mode makes a newly created container share a Network Namespace with an existing container, rather than with the host. The new container does not create its own network interface or configure its own IP; instead it shares the IP, port range, and so on with the specified container. Apart from the network, everything else, such as the file system and process list, remains isolated between the two containers. Processes in the two containers can communicate with each other through the lo device.

## Start a test container that shares mysql's network namespace
$ docker run -ti --rm --net=container:mysql busybox sh
/# ip a
/# netstat -tlp | grep 3306
/# telnet localhost 3306

This mode is very useful in certain scenarios, for example the kubernetes pod: kubernetes creates an infrastructure container for each pod, and the other containers in the same pod use container mode to share that infrastructure container's network namespace, so they can reach each other via localhost and behave as one unit.
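As a side note, on a kubernetes node that uses Docker as its container runtime, these infrastructure containers are visible as "pause" containers:

$ docker ps | grep pause    # one pause (infra) container per pod on this node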

None mode

Only the network namespace is created; the network stack (network interface, routes, etc.) is not configured.

# Create a container with the none network mode
$ docker run -it --rm --name=network-none --net=none nginx:alpine sh
# ifconfig

Operate on the host machine:

# Create a veth pair
$ ip link add A type veth peer name B
# Attach the A end to the docker0 bridge
$ brctl addif docker0 A
$ ip link set A up

# The B end goes into the network-none container; this requires ip netns, so we need to explicitly create a named network namespace
$ PID=$(docker inspect -f '{{.State.Pid}}' network-none)
$ mkdir -p /var/run/netns
$ ln -s /proc/$PID/ns/net /var/run/netns/$PID

# Move the B end into the container's network namespace
$ ip link set B netns $PID
$ ip netns exec $PID ip link set dev B name eth0  # rename the device to eth0, consistent with docker's default behaviour
$ ip netns exec $PID ip link set eth0 up

# Assign an IP address
$ ip netns exec $PID ip addr add 172.17.0.100/16 dev eth0
# Add a default route via the docker0 bridge
$ ip netns exec $PID ip route add default via 172.17.0.1

# Test communication between containers
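# For example (assuming docker0's gateway is 172.17.0.1 as above, and a bridge-mode container such as test1 is still running):
$ ip netns exec $PID ping -c 2 172.17.0.1      # reach the docker0 gateway from the hand-wired namespace
$ docker exec test1 ping -c 2 172.17.0.100     # reach the hand-wired container from a bridge-mode container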

Prerequisite knowledge:

  • The ip netns command is used to manage network namespaces. It can create a named network namespace, which can then be referred to by name.
  • A network namespace is logically a copy of the network stack, with its own routes, firewall rules, and network devices. By default, child processes inherit the network namespace of their parent process; that is, if you do not explicitly create a new network namespace, all processes inherit the same default network namespace from the init process.
  • By convention, a named network namespace is an object under /var/run/netns/ that can be opened. For example, if there is a network namespace object named net1, the file descriptor obtained by opening /var/run/netns/net1 refers to the network namespace net1, and by referencing this file descriptor a process's network namespace can be changed.
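A minimal standalone example of working with a named network namespace (the name net1 is arbitrary):

$ ip netns add net1              # creates the object /var/run/netns/net1
$ ip netns list                  # list named network namespaces
$ ip netns exec net1 ip addr     # run a command inside net1; initially only lo exists, still down
$ ip netns delete net1           # clean up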


Origin blog.csdn.net/guolianggsta/article/details/131206320