DockOne Technology Sharing (18): Understanding Flannel in One Article

[Editor's Note] Flannel is an overlay network tool designed by the CoreOS team for Kubernetes. Its purpose is to give every host in a Kubernetes cluster a complete subnet of its own. This sharing introduces the tool from three angles: an introduction to Flannel, how it works, and how to install and configure it.

Part 1: Introduction to Flannel

Flannel is a network planning service designed by the CoreOS team for Kubernetes. Simply put, its job is to ensure that Docker containers created on different node hosts in a cluster receive virtual IP addresses that are unique across the entire cluster.

The Kubernetes network model assumes that each physical node is given a dedicated subnet, and that all of these subnets belong to the same internal IP range. For example:

NodeA: 10.0.1.0/24
NodeB: 10.0.2.0/24
NodeC: 10.0.3.0/24


However, in Docker's default configuration, the Docker service on each node independently allocates IP addresses to the containers running on that node. This causes a problem: containers on different nodes may be given identical IP addresses, which makes it impossible for those containers to find each other by IP address, i.e. to ping each other.

Flannel's design goal is to re-plan how IP addresses are used by all nodes in the cluster, so that containers on different nodes obtain addresses that belong to the same internal network and never repeat, and so that containers on different nodes can communicate with each other directly over these internal IPs.

Part 2: How Flannel Works

Flannel is essentially an "overlay network": it wraps TCP data inside packets of another network for routing, forwarding, and communication. It currently supports several data-forwarding backends, including UDP, VxLAN, AWS VPC, and GCE routing.

The default communication method between nodes is UDP forwarding. Flannel's GitHub page includes the following schematic diagram:

[Figure 1: Packet flow through Flannel, from the project's GitHub page]


The diagram is packed with information; the following is a brief interpretation.

After data leaves the source container, it is forwarded by the docker0 virtual interface of the host on which the container lives to the flannel0 virtual interface. flannel0 is a P2P virtual interface, and the flanneld service listens on its other end.

Flannel maintains a routing table between the nodes through the Etcd service; we will come back to this in the configuration section.

The flanneld service on the source host encapsulates the original payload in UDP and, according to its own routing table, delivers it to the flanneld service of the destination node. When the data arrives it is unpacked and enters the destination node's flannel0 virtual interface, from where it is forwarded to the destination host's docker0 virtual interface. Finally, docker0 routes the packet to the target container, just as in local container-to-container communication.
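On a running node you can see both virtual interfaces involved in this path. A minimal check, assuming the interface names shown in the diagram above:

# Show the addresses of the two virtual interfaces on the forwarding path.
ip -4 addr show docker0
ip -4 addr show flannel0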

This completes the journey of the packet. Three questions are worth explaining here.

First question: what is UDP encapsulation?

Look at the figure below, a capture of a ping communication taken on one of the nodes. You can see that the data portion of the UDP packet is actually another complete ICMP (i.e. ping) packet.

[Figure 2: Packet capture showing an ICMP packet encapsulated in a UDP packet]


The original data is encapsulated in UDP by the Flannel service on the sending node; after delivery to the destination node, the Flannel service on the other end restores it to the original packet. The Docker services on both sides are entirely unaware that this is happening.
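You can observe this encapsulation yourself with a packet capture. A sketch, assuming Flannel's UDP backend on its default port 8285 and eth0 as the physical interface:

# Capture Flannel's inter-node UDP traffic on the physical interface.
sudo tcpdump -i eth0 -nn udp port 8285
# While this runs, ping a container on the other node; each ICMP packet
# appears here wrapped in a UDP datagram exchanged between the two hosts.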

Second question: why does Docker on each node use a different IP address range?

This may seem odd, but the reason is very simple: after allocating an available IP range to each node through Etcd, Flannel quietly modifies Docker's startup parameters, as shown in the figure below.

[Figure 3: Docker daemon startup parameters on a node running Flannel]


This is the Docker service process, with its runtime parameters, as seen on a node running the Flannel service.

Note the parameter "--bip=172.17.18.1/24": it restricts the range of IP addresses that containers on this node can obtain.

This IP range is allocated automatically by Flannel, which uses the records it keeps in the Etcd service to ensure that the ranges of different nodes do not overlap.
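If the figure is not available, the same thing can be checked directly on any node. A minimal sketch (the subnet value is illustrative):

# Show the Docker daemon's startup arguments and look for the --bip flag.
ps -ef | grep [d]ocker
# The output should contain something like: ... --bip=172.17.18.1/24 ...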

Third question: why is data on the sending node routed from docker0 to the flannel0 virtual interface, while on the destination node it is routed from flannel0 to docker0?

Let's take a look at the routing tables on nodes where Flannel is installed. Below is the routing table of the sending node:

[Figure 4: Routing table of the sending node]


And this is the routing table of the receiving node:

[Figure 5: Routing table of the receiving node]


For example, suppose a packet is to be sent from a container with IP 172.17.18.2 to a container with IP 172.17.46.2. According to the sending node's routing table, the destination matches only the 172.17.0.0/16 entry, so after leaving docker0 the packet is delivered to flannel0. Likewise on the destination node: since the destination address is a container, it must fall within the 172.17.46.0/24 entry that corresponds to docker0, so the packet is naturally delivered to the docker0 interface.
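Since the two routing-table figures may be hard to read, here is an illustrative reconstruction of the relevant entries, with values taken from the example addresses above:

# Routing table on the sending node (illustrative):
172.17.0.0/16 dev flannel0
172.17.18.0/24 dev docker0 proto kernel scope link src 172.17.18.1
# Routing table on the receiving node (illustrative):
172.17.0.0/16 dev flannel0
172.17.46.0/24 dev docker0 proto kernel scope link src 172.17.46.1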

Part 3: Installing and Configuring Flannel

Flannel is a program written in Golang, so installing it is very simple.

Download the latest binary releases of Flannel and Etcd from https://github.com/coreos/flannel/releases and https://github.com/coreos/etcd/releases respectively.

After unpacking, put Flannel's binary "flanneld" and its script "mk-docker-opts.sh", together with Etcd's binaries "etcd" and "etcdctl", into a directory on the system PATH, and the installation is done.

The configuration part is a little more involved.

First, start Etcd; for reference, see https://github.com/coreos/etcd ... overy

Visit this address to obtain a "discovery URL": https://discovery.etcd.io/new?size=3

Run the following startup command on each node:

etcd -initial-advertise-peer-urls http://<current node IP>:2380 -listen-peer-urls http://<current node IP>:2380 -listen-client-urls http://<current node IP>:2379,http://127.0.0.1:2379 -advertise-client-urls http://<current node IP>:2379 -discovery <the discovery URL obtained above> &
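Once the command has been run on all nodes, it is worth confirming that the cluster actually formed. With the etcdctl shipped in Etcd 2.x this is one command:

# Verify that all Etcd members joined the cluster and are healthy.
etcdctl cluster-health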


Once Etcd is up, Flannel can be configured.

All of Flannel's configuration is recorded in Etcd. Write the following minimal configuration into Etcd; it specifies only the virtual IP range that Flannel may allocate to the Docker nodes:

etcdctl set /coreos.com/network/config '{ "Network": "172.17.0.0/16" }'
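The same configuration object can also choose one of the forwarding backends mentioned in Part 2. A sketch, assuming a Flannel version that supports the VxLAN backend:

# Select VxLAN instead of the default UDP backend.
etcdctl set /coreos.com/network/config '{ "Network": "172.17.0.0/16", "Backend": { "Type": "vxlan" } }'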


Then start Flannel on each node:

flanneld &
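If Etcd is not listening on flanneld's default local address, the endpoint can be given explicitly. A sketch, assuming the client URL used when starting Etcd above:

# Point flanneld at a specific Etcd endpoint.
flanneld -etcd-endpoints=http://127.0.0.1:2379 &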


Finally, Docker needs a little surgery: its startup parameters and the docker0 address must be modified.

Run on each node:

sudo mk-docker-opts.sh -i
source /run/flannel/subnet.env
sudo rm /var/run/docker.pid
sudo ifconfig docker0 ${FLANNEL_SUBNET} 
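For reference, the /run/flannel/subnet.env file sourced above is written by flanneld and typically contains entries like the following (values illustrative):

FLANNEL_NETWORK=172.17.0.0/16
FLANNEL_SUBNET=172.17.18.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=false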



Restart Docker once, and the configuration is complete.

Now start a Docker container on each of the two nodes; they can already ping each other directly by IP address.
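A minimal way to verify this, with illustrative addresses (busybox is used here only because it ships with ifconfig and ping):

# On node A: start a test container and read its IP address.
docker run busybox ifconfig eth0
# Suppose it reports 172.17.18.2. On node B, ping it from another container:
docker run busybox ping -c 3 172.17.18.2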

At this point, the entire Flannel cluster is up and running.

Finally, as mentioned several times above, Flannel keeps a routing table in Etcd. These routing records can be found in the Etcd data, as shown in the figure below.

[Figure 6: Flannel's routing records stored in Etcd]
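The same records can be listed straight from Etcd. A sketch using the v2 etcdctl and the key prefix configured earlier (the specific subnet key is illustrative):

# List the subnet leases Flannel has recorded, one per node.
etcdctl ls /coreos.com/network/subnets
# Inspect one lease; its value contains the owning node's public IP.
etcdctl get /coreos.com/network/subnets/172.17.18.0-24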

 

Q&A

Q: After data leaves the source container, it is forwarded via the host's docker0 virtual interface to the flannel0 virtual interface. In real production, does this P2P link drop packets, and does the mechanism have any high-availability guarantee?
A: It is only a P2P interface on the local machine and does not cross the external network, so it should be fairly stable. But I have no concrete data on this.

Q: The data is encapsulated in UDP, and it is also forwarded as UDP? We generally understand UDP to be stateless; is that reliable?
A: Yes, the forwarding uses UDP. There may be problems under highly concurrent data flows; again, I have no data on this.

Q: In practice, Kubernetes de-emphasizes container IPs: outside users only care about the service they call, not the specific IP. What is the benefit of Flannel keeping IPs separate and unique? Are there real business scenarios for this?
A: Unique IPs are one of the preconditions for Kubernetes to build its network; if the network is not flattened first, everything that follows becomes difficult.

Q: After Flannel allocates each node's usable IP range via Etcd and quietly modifies Docker's startup parameters: if nodes are added or removed, do these ranges (in Etcd) change dynamically? If not, doesn't that waste IP addresses?
A: It does cause some waste; for that reason a 10.x.x.x range is commonly used.

Q: What exactly does the command "sudo mk-docker-opts.sh -i" do? And what is different about using Flannel on non-CoreOS systems?
A: It generates an environment-variable file for starting Docker, which adds startup parameters to Docker.
There is no real difference; it is just that CoreOS integrates Flannel, so starting Flannel on CoreOS is a single command: systemctl start flanneld.

Q: Are container IPs fixed? Can the external network ping the physical hosts, and also ping all the container IPs in the Docker cluster?
A: They are not fixed; IP assignment is still done by Docker. Flannel only allocates the subnets.

Q: Could Flannel be used to implement a VPN? Have you looked into that?
A: Probably not; it requires the containers to already be on one internal network.

Q: Who develops Flannel? Is it all secondary development on top of k8s?
A: The CoreOS company. It is not secondary development of k8s; it is an independent open-source project that provides the underlying network environment for k8s.

Q: Does Flannel support pure forwarding without encapsulation, so that there is no performance loss?
A: Without encapsulation, how would it be routed? The outgoing TCP packet itself carries no information for routing between networks. Don't forget that the two Flannel endpoints are not directly connected; an ordinary local network sits between them.

Q: What version is Flannel at now? Will later versions focus on performance optimization or on new features?
A: It has not reached 1.0 yet. Their roadmap is on GitHub, and performance is a large part of it.

Q: So on CoreOS, do customers still need to install Flannel?
A: No. In the cloud-init configuration used at boot, write the Flannel configuration into Etcd and add "flanneld.service command: start"; it is usable as soon as the system comes up. I won't dig out the documentation link, but it contains this configuration ready-made.

Q: Could we instead specify each host's IP range by hand and build GRE tunnels between nodes? That would also give containers on different hosts distinct IPs that can reach each other, wouldn't it?
A: Assigning a particular range to a particular node is not supported yet, although it can apparently be changed by hand in Etcd.

Q: Flannel only provides the communication layer; does that mean k8s still has to be installed?
A: Yes, k8s is separate.

Q: What other Docker networking components are currently available or recommended?
A: The commonly used overlay networks are Flannel and Weave; OVS and the like are a separate story.

===========================
The above content is based on the WeChat group sharing of August 25, 2015. The sharer, Lin Fan, is a Cloud & DevOps consultant at ThoughtWorks Chengdu; his main research areas are application containerization and the CoreOS ecosystem. He is the author of the "CoreOS Practice Guide" and the "CoreOS Things" series of articles, and will present CoreOS topics at the Container Technology Conference on August 28. DockOne organizes a targeted technical sharing every week; interested readers are welcome to add WeChat: liyingjiesx to join the group, and to leave us a message with any topic they would like to hear about.

 

Original article: http://dockone.io/article/618
