How Flannel Works

Author: [email protected]

Overview

Recently, our TaaS platform has run into many network problems. It turns out that the "contiv + ovs + vlan" solution is not suitable for large-scale, high-concurrency scenarios like TaaS. With time tight, we could only fall back on the simple and stable "Flannel + host-gw" network solution to build a small-scale cluster as an emergency alternative. This was also an opportunity to study Flannel, which has been so widely criticized for poor performance over the past two years that we had not dared to touch it. Half a month of stability and stress testing over the Spring Festival proved that it is indeed very stable. Of course, Calico (BGP) remains our primary network solution going forward.

Flannel supports multiple Backend protocols, but does not support changing the Backend at runtime. The officially recommended Backends are the following (a config sketch for selecting one follows the lists below):

  • VXLAN, with a performance loss of about 20~30%;
  • host-gw, with a performance loss of about 10%; it requires layer-2 connectivity between hosts, so it is only suitable for small clusters;
  • UDP, recommended only for debugging because its performance is poor; however, if the network card supports UDP offload, so that encapsulation and decapsulation are done directly by the NIC, the performance can still be good.

Experimental Backends, not recommended for production:

  • AliVPC
  • Alloc
  • AWS VPC
  • GCE
  • IPIP
  • IPSec
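
The Backend is selected in Flannel's network configuration (the net-conf.json shown later, or the config key in etcd). As a rough sketch, a VXLAN setup with its two commonly documented options might look like this (the VNI and Port values shown are the documented defaults, not taken from my environment):

{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan",
    "VNI": 1,
    "Port": 8472
  }
}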

Flannel configuration

The official Flannel configuration documentation is at https://github.com/coreos/flannel/blob/master/Documentation/configuration.md , but note that the configuration described there is neither up-to-date nor complete.

Configuration via command line

The command-line options and their descriptions for the latest version, Flannel v0.10.0, are as follows:

Usage: /opt/bin/flanneld [OPTION]...
  -etcd-cafile string
    	SSL Certificate Authority file used to secure etcd communication
  -etcd-certfile string
    	SSL certification file used to secure etcd communication
  -etcd-endpoints string
    	a comma-delimited list of etcd endpoints (default "http://127.0.0.1:4001,http://127.0.0.1:2379")
  -etcd-keyfile string
    	SSL key file used to secure etcd communication
  -etcd-password string
    	password for BasicAuth to etcd
  -etcd-prefix string
    	etcd prefix (default "/coreos.com/network")
  -etcd-username string
    	username for BasicAuth to etcd
  -healthz-ip string
    	the IP address for healthz server to listen (default "0.0.0.0")
  -healthz-port int
    	the port for healthz server to listen(0 to disable)
  -iface value
    	interface to use (IP or name) for inter-host communication. Can be specified multiple times to check each option in order. Returns the first match found.
  -iface-regex value
    	regex expression to match the first interface to use (IP or name) for inter-host communication. Can be specified multiple times to check each regex in order. Returns the first match found. Regexes are checked after specific interfaces specified by the iface option have already been checked.
  -ip-masq
    	setup IP masquerade rule for traffic destined outside of overlay network
  -kube-api-url string
    	Kubernetes API server URL. Does not need to be specified if flannel is running in a pod.
  -kube-subnet-mgr
    	contact the Kubernetes API for subnet assignment instead of etcd.
  -kubeconfig-file string
    	kubeconfig file location. Does not need to be specified if flannel is running in a pod.
  -log_backtrace_at value
    	when logging hits line file:N, emit a stack trace
  -public-ip string
    	IP accessible by other nodes for inter-host communication
  -subnet-file string
    	filename where env variables (subnet, MTU, ... ) will be written to (default "/run/flannel/subnet.env")
  -subnet-lease-renew-margin int
    	subnet lease renewal margin, in minutes, ranging from 1 to 1439 (default 60)
  -v value
    	log level for V logs
  -version
    	print version and exit
  -vmodule value
    	comma-separated list of pattern=N settings for file-filtered logging

A few of these options need explanation:

  • We pass -kube-subnet-mgr, configuring Flannel to read the corresponding ConfigMap through the Kubernetes APIServer. We do not set -kubeconfig-file or -kube-api-url, because we deploy Flannel as Pods via a DaemonSet, so Flannel authenticates to and communicates with the Kubernetes APIServer through its ServiceAccount.

  • The other way is to read the Flannel configuration directly from etcd, in which case you need to set the corresponding -etcd-* flags listed above (see the sketch after this list).

  • -subnet-file defaults to /run/flannel/subnet.env and generally does not need to be changed. Flannel writes the environment variables describing the local node's subnet into this file; this is where the subnet information is really read from, for example:

    FLANNEL_NETWORK=10.244.0.0/16
    FLANNEL_SUBNET=10.244.26.1/24
    FLANNEL_MTU=1500
    FLANNEL_IPMASQ=true
    
  • -subnet-lease-renew-margin indicates how long before the etcd lease expires it will be renewed automatically. The default is 1h (60 minutes). Since the lease TTL is 24h, this margin naturally must not exceed 24h, i.e. the allowed range is [1, 1439] minutes.
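
For the etcd-based approach, a minimal sketch (assuming the etcd v2 API that flanneld v0.10 speaks, with a made-up endpoint) is to write the network config under the etcd prefix and then point flanneld at it:

etcdctl --endpoints=http://10.0.0.2:2379 set /coreos.com/network/config \
  '{"Network":"10.244.0.0/16","Backend":{"Type":"host-gw"}}'
/opt/bin/flanneld -etcd-endpoints=http://10.0.0.2:2379 \
  -etcd-prefix=/coreos.com/network -ip-masq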

Configured via environment variables

Each of the command-line options above can also be set via an environment variable: convert the name to uppercase, replace dashes with underscores, and add the FLANNELD_ prefix.

For example, for --etcd-endpoints=http://10.0.0.2:2379 the corresponding environment variable is FLANNELD_ETCD_ENDPOINTS=http://10.0.0.2:2379.
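
As a minimal illustration (the endpoint is made up), the following two invocations are therefore equivalent:

/opt/bin/flanneld --etcd-endpoints=http://10.0.0.2:2379
FLANNELD_ETCD_ENDPOINTS=http://10.0.0.2:2379 /opt/bin/flanneld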

Deploy Flannel

Deploying Flannel as a Kubernetes DaemonSet is the undisputed choice, together with the corresponding ClusterRole, ClusterRoleBinding, ServiceAccount, and ConfigMap. The complete YAML manifest for reference:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    k8s-app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
         "type": "flannel",
         "delegate": {
           "hairpinMode": true,
           "isDefaultGateway": true
         }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "host-gw"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel
  namespace: kube-system
  labels:
    tier: node
    k8s-app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        k8s-app: flannel
    spec:
      imagePullSecrets:
      - name: harborsecret
      serviceAccountName: flannel
      containers:
      - name: kube-flannel
        image: registry.vivo.xyz:4443/coreos/flannel:v0.10.0-amd64
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr"]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - name: run
          mountPath: /run
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: registry.vivo.xyz:4443/coreos/flannel-cni:v0.3.0
        command: ["/install-cni.sh"]
        #command: ["sleep","10000"]
        env:
        # The CNI network config to install on each node.
        - name: CNI_NETWORK_CONFIG
          valueFrom:
            configMapKeyRef:
              name: kube-flannel-cfg
              key: cni-conf.json
        volumeMounts:
        #- name: cni
        #  mountPath: /etc/cni/net.d
        - name: cni
          mountPath: /host/etc/cni/net.d
        - name: host-cni-bin
          mountPath: /host/opt/cni/bin/
      hostNetwork: true
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      volumes:
        - name: run
          hostPath:
            path: /run
        #- name: cni
        #  hostPath:
        #    path: /etc/kubernetes/cni/net.d
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg
        - name: host-cni-bin
          hostPath:
            path: /etc/cni/net.d
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
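
Assuming the manifest above is saved as kube-flannel.yaml (the filename is arbitrary), deploying and checking it comes down to:

# kubectl apply -f kube-flannel.yaml
# kubectl -n kube-system get pods -l k8s-app=flannel -o wide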

Working principle

It's easy to get confused here. What we usually call Flannel (coreos/flannel) is actually flanneld. Everyone knows that Kubernetes connects to network plugins through the CNI standard, but if you look through the coreos/flannel code, you will not find any implementation of the CNI interface. If you have played with other CNI plugins, you will know there must also be a binary for the kubelet to call, which in turn invokes the backend network plugin. So what is this binary in Flannel's case, and where is its git repo?

That binary corresponds to /etc/cni/net.d/flannel on the host machine. Its source code lives at https://github.com/containernetworking/plugins. Most confusingly, it is also named flannel.
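
This flannel CNI plugin is a thin shim: it reads /run/flannel/subnet.env, merges it with the delegate section of the CNI config, and hands the result to the bridge plugin. With the subnet.env shown later in this article, the generated delegate config should look roughly like this (a sketch of the plugin's behavior, not captured output):

{
  "name": "cbr0",
  "type": "bridge",
  "hairpinMode": true,
  "isDefaultGateway": true,
  "mtu": 1500,
  "ipMasq": false,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.26.0/24",
    "routes": [ { "dst": "10.244.0.0/16" } ]
  }
}

Note that ipMasq ends up false precisely because FLANNEL_IPMASQ=true, i.e. flanneld itself already sets up the masquerade rule.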

There is also a container in the above Flannel Pod called install-cni, and its corresponding script is at https://github.com/coreos/flannel-cni.

kube-flannel container

Running inside the kube-flannel container is our protagonist, flanneld. The directories/files worth noting in this container are:

  • /etc/kube-flannel/cni-conf.json
  • /etc/kube-flannel/net-conf.json
  • /run/flannel/subnet.env
  • /opt/bin/flanneld

Here is what this looks like in my environment:

/run/flannel # ls /etc/kube-flannel/
cni-conf.json  net-conf.json
/run/flannel # cat /etc/kube-flannel/cni-conf.json 
{
  "name": "cbr0",
  "plugins": [
    {
     "type": "flannel",
     "delegate": {
       "hairpinMode": true,
       "isDefaultGateway": true
     }
    }
  ]
}
/run/flannel # cat /etc/kube-flannel/net-conf.json 
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "host-gw"
  }
}

/run/flannel # cat  /run/flannel/subnet.env 
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.26.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true

/run/flannel # ls /opt/bin/
flanneld           mk-docker-opts.sh
/run/flannel # cat /opt/bin/mk-docker-opts.sh 
#!/bin/sh

usage() {
	echo "$0 [-f FLANNEL-ENV-FILE] [-d DOCKER-ENV-FILE] [-i] [-c] [-m] [-k COMBINED-KEY]

Generate Docker daemon options based on flannel env file
OPTIONS:
	-f	Path to flannel env file. Defaults to /run/flannel/subnet.env
	-d	Path to Docker env file to write to. Defaults to /run/docker_opts.env
	-i	Output each Docker option as individual var. e.g. DOCKER_OPT_MTU=1500
	-c	Output combined Docker options into DOCKER_OPTS var
	-k	Set the combined options key to this value (default DOCKER_OPTS=)
	-m	Do not output --ip-masq (useful for older Docker version)
" >&2

	exit 1
}

flannel_env="/run/flannel/subnet.env"
docker_env="/run/docker_opts.env"
combined_opts_key="DOCKER_OPTS"
indiv_opts=false
combined_opts=false
ipmasq=true

while getopts "f:d:icmk:?h" opt; do
	case $opt in
		f)
			flannel_env=$OPTARG
			;;
		d)
			docker_env=$OPTARG
			;;
		i)
			indiv_opts=true
			;;
		c)
			combined_opts=true
			;;
		m)
			ipmasq=false
			;;
		k)
			combined_opts_key=$OPTARG
			;;
		[\?h])
			usage
			;;
	esac
done

if [ $indiv_opts = false ] && [ $combined_opts = false ]; then
	indiv_opts=true
	combined_opts=true
fi

if [ -f "$flannel_env" ]; then
	. $flannel_env
fi

if [ -n "$FLANNEL_SUBNET" ]; then
	DOCKER_OPT_BIP="--bip=$FLANNEL_SUBNET"
fi

if [ -n "$FLANNEL_MTU" ]; then
	DOCKER_OPT_MTU="--mtu=$FLANNEL_MTU"
fi

if [ -n "$FLANNEL_IPMASQ" ] && [ $ipmasq = true ] ; then
	if [ "$FLANNEL_IPMASQ" = true ] ; then
		DOCKER_OPT_IPMASQ="--ip-masq=false"
	elif [ "$FLANNEL_IPMASQ" = false ] ; then
		DOCKER_OPT_IPMASQ="--ip-masq=true"
	else
		echo "Invalid value of FLANNEL_IPMASQ: $FLANNEL_IPMASQ" >&2
		exit 1
	fi
fi

eval docker_opts="\$${combined_opts_key}"

if [ "$docker_opts" ]; then
	docker_opts="$docker_opts ";
fi

echo -n "" >$docker_env

for opt in $(set | grep "DOCKER_OPT_"); do

	OPT_NAME=$(echo $opt | awk -F "=" '{print $1;}');
	OPT_VALUE=$(eval echo "\$$OPT_NAME");

	if [ "$indiv_opts" = true ]; then
		echo "$OPT_NAME=\"$OPT_VALUE\"" >>$docker_env;
	fi

	docker_opts="$docker_opts $OPT_VALUE";

done

if [ "$combined_opts" = true ]; then
	echo "${combined_opts_key}=\"${docker_opts}\"" >>$docker_env
fi
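
For illustration, running this script against the subnet.env shown above should produce something like the following (derived by tracing the script, not captured from a live host):

# /opt/bin/mk-docker-opts.sh -d /run/docker_opts.env
# cat /run/docker_opts.env
DOCKER_OPT_BIP="--bip=10.244.26.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1500"
DOCKER_OPTS=" --bip=10.244.26.1/24 --ip-masq=false --mtu=1500"

Docker can then be pointed at this env file so that docker0 uses the Flannel subnet and MTU.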

install-cni container

As the name suggests, the install-cni container is responsible for installing the CNI plugin: it copies the flannel binary and the other CNI binaries from the image to the host's /etc/cni/net.d. Note that this directory must match the kubelet's CNI configuration; if you have not changed the kubelet's defaults, this is also the kubelet's default CNI directory. A simplified sketch of the install script follows the list below. The directories/files worth noting inside the install-cni container are:

  • /host/etc/cni/net.d/
  • /host/opt/cni/bin/
  • /host/etc/cni/net.d/10-flannel.conflist
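
A simplified sketch of what /install-cni.sh does (paraphrased from https://github.com/coreos/flannel-cni; not the verbatim script):

#!/bin/sh
# Copy the CNI plugin binaries shipped in the image onto the host.
cp -f /opt/cni/bin/* /host/opt/cni/bin/

# Render the CNI config from the CNI_NETWORK_CONFIG env var (populated from the
# kube-flannel-cfg ConfigMap) and move it into place atomically.
TMP_CONF=/host/etc/cni/net.d/.tmp-flannel-cfg
echo "$CNI_NETWORK_CONFIG" > $TMP_CONF
mv $TMP_CONF /host/etc/cni/net.d/10-flannel.conflist

# Sleep forever so the container, and thus the Pod, stays Running.
while true; do sleep 3600; done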

Here is what this looks like in my environment:


/host/etc/cni/net.d # pwd
/host/etc/cni/net.d
/host/etc/cni/net.d # ls
10-flannel.conflist  dhcp                 ipvlan               noop                 tuning
bridge               flannel              loopback             portmap              vlan
cnitool              host-local           macvlan              ptp


/host/etc/cni/net.d # cd /host/opt/cni/bin/
/host/opt/cni/bin # ls
10-flannel.conflist  dhcp                 ipvlan               noop                 tuning
bridge               flannel              loopback             portmap              vlan
cnitool              host-local           macvlan              ptp


/opt/cni/bin # ls
bridge      dhcp        host-local  loopback    noop        ptp         vlan
cnitool     flannel     ipvlan      macvlan     portmap     tuning

/opt/cni/bin # cat /host/etc/cni/net.d/10-flannel.conflist 
{
  "name": "cbr0",
  "plugins": [
    {
     "type": "flannel",
     "delegate": {
       "hairpinMode": true,
       "isDefaultGateway": true
     }
    }
  ]
}

Flannel working principle diagram

One diagram should make this very clear. Note that the colored parts correspond to the Volumes and are worth focusing on.

The call chain when creating a container network is: kubelet -> flannel (the CNI binary) -> flanneld. If Pods are being created concurrently on a host, you will see multiple flannel processes in the background; each normally exits within a few seconds, while flanneld is a resident process.

[Figure: Flannel working principle diagram]
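
To make the kubelet -> flannel step concrete: per the CNI spec, the kubelet (through libcni) execs the plugin binary with the call parameters passed as environment variables and the network configuration on stdin. A hedged illustration, with a made-up container ID and netns path, and assuming the standard /opt/cni/bin location:

echo '{"cniVersion":"0.3.1","name":"cbr0","type":"flannel","delegate":{"hairpinMode":true,"isDefaultGateway":true}}' \
  | CNI_COMMAND=ADD CNI_CONTAINERID=example-id CNI_NETNS=/proc/12345/ns/net \
    CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin \
    /opt/cni/bin/flannel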

Flannel host-gw Data Flow

OpenShift also uses the Flannel host-gw container network scheme by default, and its official documentation clearly draws the data flow of host-gw (the per-node routes follow, with a sketch of how flanneld programs them after that):

[Figure: OpenShift host-gw data flow diagram]

  • The corresponding ip routes in Node 1:

    default via 192.168.0.100 dev eth0 proto static metric 100
    10.1.15.0/24 dev docker0 proto kernel scope link src 10.1.15.1
    10.1.20.0/24 via 192.168.0.200 dev eth0
    
  • The corresponding ip routes in Node 2:

    default via 192.168.0.200 dev eth0 proto static metric 100
    10.1.20.0/24 dev docker0 proto kernel scope link src 10.1.20.1
    10.1.15.0/24 via 192.168.0.100 dev eth0
    
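The host-gw backend does nothing more exotic than programming exactly these routes: for every other node's subnet lease, flanneld adds a route via that node's host IP. A sketch of the equivalent manual command on Node 1 (addresses taken from the tables above):

ip route add 10.1.20.0/24 via 192.168.0.200 dev eth0

This is also why host-gw needs layer-2 connectivity between hosts: the next hop must be directly reachable.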

Notes on using Flannel in Kubernetes clusters

In my cluster, kube-subnet-mgr is used to manage subnets, rather than going through etcd v2 directly.

  • When flanneld starts, a PodCIDR must already be configured on the corresponding Node. You can check whether the .spec.podCIDR field has a value by getting the node info (see the one-liner after this list).
  • There are two ways to configure a Node's CIDR:
    • Manually configure --pod-cidr on each Node's kubelet;
    • Configure kube-controller-manager with --allocate-node-cidrs=true --cluster-cidr=xx.xx.xx.xx/yy, and the CIDR Controller will automatically assign a PodCIDR to each Node.
  • In addition, you will find that each Node is marked with a number of Annotations starting with flannel.; these Annotations are updated during RegisterNetwork every time flanneld starts, and they are mainly used for the Node lease.
    • flannel.alpha.coreos.com/backend-data: "null"
    • flannel.alpha.coreos.com/backend-type: host-gw
    • flannel.alpha.coreos.com/kube-subnet-manager: "true"
    • flannel.alpha.coreos.com/public-ip: xx.xx.xx.xx
    • flannel.alpha.coreos.com/public-ip-overwrite: yy.yy.yy.yy (optional)
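
A quick way to verify the PodCIDR (the node name and output below match the node shown next):

# kubectl get node 10.21.36.79 -o jsonpath='{.spec.podCIDR}'
10.244.29.0/24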

Below is the information for a node in my environment:

# kubectl get no 10.21.36.79 -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: "null"
    flannel.alpha.coreos.com/backend-type: host-gw
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.21.36.79
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: 2018-02-09T07:18:06Z
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: 10.21.36.79
  name: 10.21.36.79
  resourceVersion: "45074326"
  selfLink: /api/v1/nodes/10.21.36.79
  uid: 5f91765e-0d69-11e8-88cb-f403434bff24
spec:
  externalID: 10.21.36.79
  podCIDR: 10.244.29.0/24
status:
  addresses:
  - address: 10.21.36.79
    type: InternalIP
  - address: 10.21.36.79
    type: Hostname
  allocatable:
    alpha.kubernetes.io/nvidia-gpu: "0"
    cpu: "34"
    memory: 362301176Ki
    pods: "200"
  capacity:
    alpha.kubernetes.io/nvidia-gpu: "0"
    cpu: "40"
    memory: 395958008Ki
    pods: "200"
  conditions:
  - lastHeartbeatTime: 2018-02-27T14:07:30Z
    lastTransitionTime: 2018-02-13T13:05:57Z
    message: kubelet has sufficient disk space available
    reason: KubeletHasSufficientDisk
    status: "False"
    type: OutOfDisk
  - lastHeartbeatTime: 2018-02-27T14:07:30Z
    lastTransitionTime: 2018-02-13T13:05:57Z
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: 2018-02-27T14:07:30Z
    lastTransitionTime: 2018-02-13T13:05:57Z
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: 2018-02-27T14:07:30Z
    lastTransitionTime: 2018-02-13T13:05:57Z
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - registry.vivo.xyz:4443/bigdata_release/tensorflow1.5.0@sha256:6d61595c8e85d3724ec42298f8f97cdc782c5d83dd8f651c2eb037c25f525071
    - registry.vivo.xyz:4443/bigdata_release/tensorflow1.5.0:v2.0
    sizeBytes: 3217838862
  - names:
    - registry.vivo.xyz:4443/bigdata_release/tensorflow1.3.0@sha256:d14b7776578e3e844bab203b17ae504a0696038c7106469504440841ce17e85f
    - registry.vivo.xyz:4443/bigdata_release/tensorflow1.3.0:v1.9
    sizeBytes: 2504726638
  - names:
    - registry.vivo.xyz:4443/coreos/flannel-cni@sha256:dc5b5b370700645efcacb1984ae1e48ec9e297acbb536251689a239f13d08850
    - registry.vivo.xyz:4443/coreos/flannel-cni:v0.3.0
    sizeBytes: 49786179
  - names:
    - registry.vivo.xyz:4443/coreos/flannel@sha256:2a1361c414acc80e00514bc7abdbe0cd3dc9b65a181e5ac7393363bcc8621f39
    - registry.vivo.xyz:4443/coreos/flannel:v0.10.0-amd64
    sizeBytes: 44577768
  - names:
    - registry.vivo.xyz:4443/google_containers/pause-amd64@sha256:3b3a29e3c90ae7762bdf587d19302e62485b6bef46e114b741f7d75dba023bd3
    - registry.vivo.xyz:4443/google_containers/pause-amd64:3.0
    sizeBytes: 746888
  nodeInfo:
    architecture: amd64
    bootID: bc7a36a4-2d9b-4caa-b852-445a5fb1b0b9
    containerRuntimeVersion: docker://1.12.6
    kernelVersion: 3.10.0-514.el7.x86_64
    kubeProxyVersion: v1.7.4+793658f2d7ca7
    kubeletVersion: v1.7.4+793658f2d7ca7
    machineID: edaf7dacea45404b9b3cfe053181d317
    operatingSystem: linux
    osImage: CentOS Linux 7 (Core)
    systemUUID: 30393137-3136-4336-5537-3335444C4C30
