Author: [email protected]
Overview
Recently, our TaaS platform has run into many network problems. It turns out that the "contiv + OVS + VLAN" solution is not suitable for large-scale, high-concurrency scenarios like TaaS. With time tight, we fell back on the simple and stable "Flannel + host-gw" network solution to build a small cluster as an emergency alternative. I took the opportunity to study Flannel, which has been so widely criticized for poor performance over the past two years that many people are now afraid to touch it. Half a month of stability and stress testing over the Spring Festival proved that it is in fact very stable. Of course, Calico (BGP) remains our main network solution going forward.
Flannel supports multiple Backend protocols, but does not support changing the Backend at runtime. The officially recommended Backends are the following (a Backend is selected through Flannel's network config; see the net-conf.json sketch after the experimental list below):
- VXLAN, with a performance loss of roughly 20~30%;
- host-gw, with a performance loss of roughly 10%; it requires Layer 2 connectivity between hosts, so it is only suitable for small clusters;
- UDP, recommended only for debugging, because encapsulation happens in user space and performance is poor. If the NIC supports UDP offload, so that encapsulation and decapsulation are done directly by the NIC, performance is still decent.
Experimental Backend, not recommended for production:
- AliVPC
- Alloc
- AWS VPC
- GCE
- IPIP
- IPSec
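Switching Backends only requires changing Backend.Type in Flannel's network config. For instance, a VXLAN variant of the host-gw net-conf.json used later in this article might look like this (a sketch; VNI 1 and UDP port 8472 are Flannel's documented defaults, written out here only for illustration):
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan",
    "VNI": 1,
    "Port": 8472
  }
}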
Flannel configuration
The official configuration of Flannel can be found at https://github.com/coreos/flannel/blob/master/Documentation/configuration.md , but note that the documentation is neither fully up to date nor complete.
Configuration via command line
The command line configuration and description of the latest version of Flannel v0.10.0 are as follows:
Usage: /opt/bin/flanneld [OPTION]...
-etcd-cafile string
SSL Certificate Authority file used to secure etcd communication
-etcd-certfile string
SSL certification file used to secure etcd communication
-etcd-endpoints string
a comma-delimited list of etcd endpoints (default "http://127.0.0.1:4001,http://127.0.0.1:2379")
-etcd-keyfile string
SSL key file used to secure etcd communication
-etcd-password string
password for BasicAuth to etcd
-etcd-prefix string
etcd prefix (default "/coreos.com/network")
-etcd-username string
username for BasicAuth to etcd
-healthz-ip string
the IP address for healthz server to listen (default "0.0.0.0")
-healthz-port int
the port for healthz server to listen(0 to disable)
-iface value
interface to use (IP or name) for inter-host communication. Can be specified multiple times to check each option in order. Returns the first match found.
-iface-regex value
regex expression to match the first interface to use (IP or name) for inter-host communication. Can be specified multiple times to check each regex in order. Returns the first match found. Regexes are checked after specific interfaces specified by the iface option have already been checked.
-ip-masq
setup IP masquerade rule for traffic destined outside of overlay network
-kube-api-url string
Kubernetes API server URL. Does not need to be specified if flannel is running in a pod.
-kube-subnet-mgr
contact the Kubernetes API for subnet assignment instead of etcd.
-kubeconfig-file string
kubeconfig file location. Does not need to be specified if flannel is running in a pod.
-log_backtrace_at value
when logging hits line file:N, emit a stack trace
-public-ip string
IP accessible by other nodes for inter-host communication
-subnet-file string
filename where env variables (subnet, MTU, ... ) will be written to (default "/run/flannel/subnet.env")
-subnet-lease-renew-margin int
subnet lease renewal margin, in minutes, ranging from 1 to 1439 (default 60)
-v value
log level for V logs
-version
print version and exit
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging
A few of these options deserve explanation:
- -kube-subnet-mgr: we use this option, so Flannel reads its configuration from the corresponding ConfigMap via the Kubernetes APIServer.
- -kube-api-url, -kubeconfig-file: we do not set these, because we deploy Flannel as a Pod through a DaemonSet, so Flannel authenticates to and communicates with the Kubernetes APIServer through its ServiceAccount.
- The other way is for Flannel to read its configuration directly from etcd, in which case you need to set the corresponding -etcd-* flags.
- -subnet-file: the default is /run/flannel/subnet.env and normally does not need to be changed. Flannel writes the environment variables describing the local node's subnet into this file, and this is where Flannel actually gets its subnet information from, for example:
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.26.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
- -subnet-lease-renew-margin: how long before the etcd lease expires the lease is automatically renewed. The default is 60 minutes (1h). Because the lease TTL is 24h, this value naturally must stay below 24h, i.e. in the range [1, 1439] minutes.
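For completeness: a standalone, etcd-backed deployment (without -kube-subnet-mgr) would first write the network config into etcd and then start flanneld with the matching -etcd-* flags, along these lines (a sketch; the endpoint and interface are hypothetical, and Flannel reads etcd over the v2 API, hence etcdctl set):
etcdctl --endpoints=http://10.0.0.2:2379 set /coreos.com/network/config \
  '{ "Network": "10.244.0.0/16", "Backend": { "Type": "host-gw" } }'
/opt/bin/flanneld --etcd-endpoints=http://10.0.0.2:2379 \
  --etcd-prefix=/coreos.com/network --iface=eth0 --ip-masq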
Configured via environment variables
Each of the command line options above can also be set through an environment variable: convert the option name to uppercase, replace dashes with underscores, and add the FLANNELD_ prefix.
For example, --etcd-endpoints=http://10.0.0.2:2379 corresponds to the environment variable FLANNELD_ETCD_ENDPOINTS=http://10.0.0.2:2379.
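So the command line used in the DaemonSet below could just as well be expressed through the environment, for example in a systemd EnvironmentFile (a sketch; the values mirror the flags we actually pass):
FLANNELD_IP_MASQ=true
FLANNELD_KUBE_SUBNET_MGR=true
FLANNELD_SUBNET_FILE=/run/flannel/subnet.env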
Deploy Flannel
Deploying Flannel via a Kubernetes DaemonSet is the uncontroversial choice. Along with it, create the corresponding ClusterRole, ClusterRoleBinding, ServiceAccount, and ConfigMap. The complete YAML manifest is as follows:
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    k8s-app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "host-gw"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel
  namespace: kube-system
  labels:
    tier: node
    k8s-app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        k8s-app: flannel
    spec:
      imagePullSecrets:
      - name: harborsecret
      serviceAccountName: flannel
      containers:
      - name: kube-flannel
        image: registry.vivo.xyz:4443/coreos/flannel:v0.10.0-amd64
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - name: run
          mountPath: /run
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: registry.vivo.xyz:4443/coreos/flannel-cni:v0.3.0
        command: [ "/install-cni.sh" ]
        #command: [ "sleep", "10000" ]
        env:
        # The CNI network config to install on each node.
        - name: CNI_NETWORK_CONFIG
          valueFrom:
            configMapKeyRef:
              name: kube-flannel-cfg
              key: cni-conf.json
        volumeMounts:
        #- name: cni
        #  mountPath: /etc/cni/net.d
        - name: cni
          mountPath: /host/etc/cni/net.d
        - name: host-cni-bin
          mountPath: /host/opt/cni/bin/
      hostNetwork: true
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      volumes:
      - name: run
        hostPath:
          path: /run
      #- name: cni
      #  hostPath:
      #    path: /etc/kubernetes/cni/net.d
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
      - name: host-cni-bin
        hostPath:
          path: /etc/cni/net.d
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
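Two notes on this manifest. First, the host-cni-bin volume's hostPath is /etc/cni/net.d rather than the conventional /opt/cni/bin, which is why the CNI plugin binaries show up under /etc/cni/net.d on the host later in this article; point it at /opt/cni/bin if you want the conventional layout. Second, rolling it out and verifying is plain kubectl (the manifest file name here is hypothetical):
kubectl apply -f kube-flannel.yml
kubectl -n kube-system get pods -l k8s-app=flannel -o wide
# on any node, flanneld should have written the subnet env file:
cat /run/flannel/subnet.env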
Working principle
It is easy to confuse several things here. What we usually call Flannel (coreos/flannel) is actually flanneld. Everyone knows that Kubernetes hooks up network plugins through the CNI standard, but if you read the coreos/flannel code you will not find any implementation of the CNI interface. If you have played with other CNI plugins, you know there must also be a binary for kubelet to invoke, which in turn drives the backend network plugin. So what is that binary in Flannel's case, and where is its git repo?
On the host machine, this binary is /etc/cni/net.d/flannel (in this deployment). Its code lives at https://github.com/containernetworking/plugins. Most confusingly, it is also simply named flannel.
There is also a container in the above Flannel Pod called install-cni, and its corresponding script is at https://github.com/coreos/flannel-cni.
- /opt/bin/flanneld --> https://github.com/coreos/flannel
- /etc/cni/net.d/flannel --> https://github.com/containernetworking/plugins
- /install-cni.sh --> https://github.com/coreos/flannel-cni
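Because the CNI plugin is just an executable driven by config files, you can exercise the installed config by hand with cnitool (it ships alongside the plugin binaries, as the directory listings below show). A rough sketch (the netns name is hypothetical, and CNI_PATH must point at wherever the plugin binaries actually live on your host):
ip netns add cni-test
NETCONFPATH=/etc/cni/net.d CNI_PATH=/opt/cni/bin \
  cnitool add cbr0 /var/run/netns/cni-test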
kube-flannel container
Running in the kube-flannel container is our protagonist flanneld, and we need to pay attention to the directories/files in this container:
- /etc/kube-flannel/cni-conf.json
- /etc/kube-flannel/net-conf.json
- /run/flannel/subnet.env
- /opt/bin/flanneld
Here is what corresponds to my environment:
/run/flannel # ls /etc/kube-flannel/
cni-conf.json net-conf.json
/run/flannel # cat /etc/kube-flannel/cni-conf.json
{
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    }
  ]
}
/run/flannel # cat /etc/kube-flannel/net-conf.json
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "host-gw"
  }
}
/run/flannel # cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.26.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
/run/flannel # ls /opt/bin/
flanneld mk-docker-opts.sh
/run/flannel # cat /opt/bin/mk-docker-opts.sh
#!/bin/sh
usage() {
  echo "$0 [-f FLANNEL-ENV-FILE] [-d DOCKER-ENV-FILE] [-i] [-c] [-m] [-k COMBINED-KEY]

Generate Docker daemon options based on flannel env file
OPTIONS:
  -f  Path to flannel env file. Defaults to /run/flannel/subnet.env
  -d  Path to Docker env file to write to. Defaults to /run/docker_opts.env
  -i  Output each Docker option as individual var. e.g. DOCKER_OPT_MTU=1500
  -c  Output combined Docker options into DOCKER_OPTS var
  -k  Set the combined options key to this value (default DOCKER_OPTS=)
  -m  Do not output --ip-masq (useful for older Docker version)
" >&2
  exit 1
}

flannel_env="/run/flannel/subnet.env"
docker_env="/run/docker_opts.env"
combined_opts_key="DOCKER_OPTS"
indiv_opts=false
combined_opts=false
ipmasq=true

while getopts "f:d:icmk:?h" opt; do
  case $opt in
    f)
      flannel_env=$OPTARG
      ;;
    d)
      docker_env=$OPTARG
      ;;
    i)
      indiv_opts=true
      ;;
    c)
      combined_opts=true
      ;;
    m)
      ipmasq=false
      ;;
    k)
      combined_opts_key=$OPTARG
      ;;
    [\?h])
      usage
      ;;
  esac
done

if [ $indiv_opts = false ] && [ $combined_opts = false ]; then
  indiv_opts=true
  combined_opts=true
fi

if [ -f "$flannel_env" ]; then
  . $flannel_env
fi

if [ -n "$FLANNEL_SUBNET" ]; then
  DOCKER_OPT_BIP="--bip=$FLANNEL_SUBNET"
fi

if [ -n "$FLANNEL_MTU" ]; then
  DOCKER_OPT_MTU="--mtu=$FLANNEL_MTU"
fi

if [ -n "$FLANNEL_IPMASQ" ] && [ $ipmasq = true ] ; then
  if [ "$FLANNEL_IPMASQ" = true ] ; then
    DOCKER_OPT_IPMASQ="--ip-masq=false"
  elif [ "$FLANNEL_IPMASQ" = false ] ; then
    DOCKER_OPT_IPMASQ="--ip-masq=true"
  else
    echo "Invalid value of FLANNEL_IPMASQ: $FLANNEL_IPMASQ" >&2
    exit 1
  fi
fi

eval docker_opts="\$${combined_opts_key}"
if [ "$docker_opts" ]; then
  docker_opts="$docker_opts ";
fi

echo -n "" >$docker_env

for opt in $(set | grep "DOCKER_OPT_"); do
  OPT_NAME=$(echo $opt | awk -F "=" '{print $1;}');
  OPT_VALUE=$(eval echo "\$$OPT_NAME");
  if [ "$indiv_opts" = true ]; then
    echo "$OPT_NAME=\"$OPT_VALUE\"" >>$docker_env;
  fi
  docker_opts="$docker_opts $OPT_VALUE";
done

if [ "$combined_opts" = true ]; then
  echo "${combined_opts_key}=\"${docker_opts}\"" >>$docker_env
fi
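This script is not exercised in our kube-subnet-mgr deployment, but it is the traditional glue between flanneld and the Docker daemon: it translates subnet.env into Docker daemon flags. A sketch of that wiring, using the subnet.env values shown above (the systemd integration at the end is an assumption, not part of this deployment):
/opt/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
cat /run/flannel/docker
# DOCKER_OPT_BIP="--bip=10.244.26.1/24"
# DOCKER_OPT_IPMASQ="--ip-masq=false"
# DOCKER_OPT_MTU="--mtu=1500"
# DOCKER_NETWORK_OPTIONS=" --bip=10.244.26.1/24 --ip-masq=false --mtu=1500"
The docker systemd unit can then load this file with EnvironmentFile=/run/flannel/docker and pass $DOCKER_NETWORK_OPTIONS to dockerd, so the docker0 bridge gets the node's flannel subnet as its --bip.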
install-cni container
As the name suggests, the install-cni container is responsible for installing the CNI plugin: it copies the flannel binary and the other CNI plugin binaries from the image onto the host (here they land under /etc/cni/net.d, because that is where the host-cni-bin volume points) and writes out the CNI config. Note that the config directory must match kubelet's CNI settings; if you have not changed kubelet's defaults, kubelet already watches this CNI directory (see the flag sketch after the list below). We need to focus on the following directories/files inside the install-cni container:
- /host/etc/cni/net.d/
- /host/opt/cni/bin/
- /host/etc/cni/net.d/10-flannel.conflist
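For reference, these are the kubelet flags that have to agree with the directories above (flags from the v1.7-era kubelet; the values shown are kubelet's defaults):
kubelet --network-plugin=cni \
  --cni-conf-dir=/etc/cni/net.d \
  --cni-bin-dir=/opt/cni/bin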
Here is what corresponds to my environment:
/host/etc/cni/net.d # pwd
/host/etc/cni/net.d
/host/etc/cni/net.d # ls
10-flannel.conflist dhcp ipvlan noop tuning
bridge flannel loopback portmap vlan
cnitool host-local macvlan ptp
/host/etc/cni/net.d # cd /host/opt/cni/bin/
/host/opt/cni/bin # ls
10-flannel.conflist dhcp ipvlan noop tuning
bridge flannel loopback portmap vlan
cnitool host-local macvlan ptp
/opt/cni/bin # ls
bridge dhcp host-local loopback noop ptp vlan
cnitool flannel ipvlan macvlan portmap tuning
/opt/cni/bin # cat /host/etc/cni/net.d/10-flannel.conflist
{
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    }
  ]
}
Flannel working principle diagram
A diagram makes this much clearer; note that the colored parts are the pieces of information carried by the Volumes, which deserve special attention.
The flow for creating a container's network is: kubelet ——> flannel ——> flanneld. If Pods are created concurrently on a host, you will see multiple flannel processes running in the background, but each normally exits within a few seconds; flanneld is the resident process.
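What each short-lived flannel process does is read /run/flannel/subnet.env and delegate to the bridge plugin. Based on the subnet.env shown earlier, the delegate config that flannel generates looks roughly like this (a sketch; the flannel CNI plugin defaults to type bridge with host-local IPAM, and sets ipMasq to the inverse of FLANNEL_IPMASQ):
{
  "name": "cbr0",
  "type": "bridge",
  "mtu": 1500,
  "ipMasq": false,
  "isDefaultGateway": true,
  "hairpinMode": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.26.0/24"
  }
}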
Flannel host-gw Data Flow
OpenShift also supports the Flannel host-gw container network scheme, and its official documentation draws the host-gw data flow clearly:
- The corresponding ip routes on Node 1:
default via 192.168.0.100 dev eth0 proto static metric 100
10.1.15.0/24 dev docker0 proto kernel scope link src 10.1.15.1
10.1.20.0/24 via 192.168.0.200 dev eth0
- The corresponding ip routes on Node 2:
default via 192.168.0.200 dev eth0 proto static metric 100
10.1.20.0/24 dev docker0 proto kernel scope link src 10.1.20.1
10.1.15.0/24 via 192.168.0.100 dev eth0
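With the host-gw backend, all flanneld does is program routes like these whenever another node's subnet lease appears, which is equivalent to (a sketch using the addresses above):
# on Node 1: reach Node 2's pod subnet via Node 2's host IP
ip route add 10.1.20.0/24 via 192.168.0.200 dev eth0
Because the via address must be directly reachable on the same Layer 2 segment, this is also exactly why host-gw only suits small, flat networks.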
Notes on using Flannel in Kubernetes clusters
In my cluster, kube-subnet-mgr is used to manage the subnets, rather than going through etcd v2 directly.
- When flanneld starts, the corresponding Node must already have a PodCIDR configured. You can check whether the .spec.podCIDR field has a value by getting the node information (see the command sketch after this list).
- There are two ways to configure a Node's CIDR:
  - manually configure --pod-cidr on the kubelet of each Node;
  - configure kube-controller-manager with --allocate-node-cidrs=true --cluster-cidr=xx.xx.xx.xx/yy, and the CIDR controller automatically assigns a PodCIDR to each node.
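For example, checking the PodCIDR of a node in my environment:
# kubectl get no 10.21.36.79 -o jsonpath='{.spec.podCIDR}'
10.244.29.0/24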
- In addition, you will find that each Node carries quite a few Annotations prefixed with flannel., and these are refreshed each time flanneld runs RegisterNetwork on startup. These Annotations are mainly used for the Node lease:
- flannel.alpha.coreos.com/backend-data: "null"
- flannel.alpha.coreos.com/backend-type: host-gw
- flannel.alpha.coreos.com/kube-subnet-manager: "true"
- flannel.alpha.coreos.com/public-ip: xx.xx.xx.xx
- flannel.alpha.coreos.com/public-ip-overwrite: yy.yy.yy.yy (optional)
Below is the information for a node in my environment:
# kubectl get no 10.21.36.79 -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: "null"
    flannel.alpha.coreos.com/backend-type: host-gw
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.21.36.79
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: 2018-02-09T07:18:06Z
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: 10.21.36.79
  name: 10.21.36.79
  resourceVersion: "45074326"
  selfLink: /api/v1/nodes/10.21.36.79
  uid: 5f91765e-0d69-11e8-88cb-f403434bff24
spec:
  externalID: 10.21.36.79
  podCIDR: 10.244.29.0/24
status:
  addresses:
  - address: 10.21.36.79
    type: InternalIP
  - address: 10.21.36.79
    type: Hostname
  allocatable:
    alpha.kubernetes.io/nvidia-gpu: "0"
    cpu: "34"
    memory: 362301176Ki
    pods: "200"
  capacity:
    alpha.kubernetes.io/nvidia-gpu: "0"
    cpu: "40"
    memory: 395958008Ki
    pods: "200"
  conditions:
  - lastHeartbeatTime: 2018-02-27T14:07:30Z
    lastTransitionTime: 2018-02-13T13:05:57Z
    message: kubelet has sufficient disk space available
    reason: KubeletHasSufficientDisk
    status: "False"
    type: OutOfDisk
  - lastHeartbeatTime: 2018-02-27T14:07:30Z
    lastTransitionTime: 2018-02-13T13:05:57Z
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: 2018-02-27T14:07:30Z
    lastTransitionTime: 2018-02-13T13:05:57Z
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: 2018-02-27T14:07:30Z
    lastTransitionTime: 2018-02-13T13:05:57Z
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - registry.vivo.xyz:4443/bigdata_release/tensorflow1.5.0@sha256:6d61595c8e85d3724ec42298f8f97cdc782c5d83dd8f651c2eb037c25f525071
    - registry.vivo.xyz:4443/bigdata_release/tensorflow1.5.0:v2.0
    sizeBytes: 3217838862
  - names:
    - registry.vivo.xyz:4443/bigdata_release/tensorflow1.3.0@sha256:d14b7776578e3e844bab203b17ae504a0696038c7106469504440841ce17e85f
    - registry.vivo.xyz:4443/bigdata_release/tensorflow1.3.0:v1.9
    sizeBytes: 2504726638
  - names:
    - registry.vivo.xyz:4443/coreos/flannel-cni@sha256:dc5b5b370700645efcacb1984ae1e48ec9e297acbb536251689a239f13d08850
    - registry.vivo.xyz:4443/coreos/flannel-cni:v0.3.0
    sizeBytes: 49786179
  - names:
    - registry.vivo.xyz:4443/coreos/flannel@sha256:2a1361c414acc80e00514bc7abdbe0cd3dc9b65a181e5ac7393363bcc8621f39
    - registry.vivo.xyz:4443/coreos/flannel:v0.10.0-amd64
    sizeBytes: 44577768
  - names:
    - registry.vivo.xyz:4443/google_containers/pause-amd64@sha256:3b3a29e3c90ae7762bdf587d19302e62485b6bef46e114b741f7d75dba023bd3
    - registry.vivo.xyz:4443/google_containers/pause-amd64:3.0
    sizeBytes: 746888
  nodeInfo:
    architecture: amd64
    bootID: bc7a36a4-2d9b-4caa-b852-445a5fb1b0b9
    containerRuntimeVersion: docker://1.12.6
    kernelVersion: 3.10.0-514.el7.x86_64
    kubeProxyVersion: v1.7.4+793658f2d7ca7
    kubeletVersion: v1.7.4+793658f2d7ca7
    machineID: edaf7dacea45404b9b3cfe053181d317
    operatingSystem: linux
    osImage: CentOS Linux 7 (Core)
    systemUUID: 30393137-3136-4336-5537-3335444C4C30