Use vk virtual nodes to enhance the capacity and resilience of a Kubernetes cluster
Adding virtual-kubelet (vk) virtual nodes to a Kubernetes cluster is an approach that has been widely adopted by a large number of customers. Pods on vk virtual nodes run as ECI Pods, which greatly enhances the capacity and elasticity of the cluster: ECI Pods are created flexibly and on demand, eliminating the trouble of cluster capacity planning. vk virtual nodes are currently widely used in the following scenarios.
- Online businesses with pronounced demand peaks and valleys: industries such as online education and e-commerce have distinctly tidal computing patterns. Using vk can significantly reduce the fixed resource pool that must be maintained and lower computing costs.
- Increasing the Pod capacity of a cluster: when a cluster using the traditional flannel network model cannot add more nodes because of VPC route-table entry limits or vswitch network-planning constraints, virtual nodes sidestep these problems and provide a simple, fast way to raise the cluster's Pod capacity.
- Data computing: running Spark, Presto, and similar computing workloads on vk reduces computing costs.
- CI/CD and other Job-type tasks.
Below we describe how to use virtual nodes to quickly create 10,000 Pods. These ECI Pods are billed on demand and do not consume capacity from the cluster's fixed resource pool.
For comparison, AWS EKS supports at most 1,000 Fargate Pods per cluster, while the vk virtual-node approach can easily scale beyond 10,000 ECI Pods.
Create multiple vk virtual nodes
Deploy the ACK virtual node as described in the product documentation: https://help.aliyun.com/document_detail/118970.html
Because multiple vk virtual nodes are typically used to deploy a large number of ECI Pods, we recommend carefully reviewing the vpc/vswitch/security-group configuration: make sure the vswitches have enough free IP addresses (vk supports configuring multiple vswitches to solve the IP capacity problem), and use an enterprise-level security group to get past the 2,000-instance limit of a basic security group.
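As an illustration, the vswitches and security group are typically passed to the vk controller through environment variables in its manifest. The variable names below (ECI_VSWITCH, ECI_SECURITY_GROUP) are assumptions based on the ACK virtual node deployment; verify them against the product documentation for your version.

```yaml
# Hypothetical excerpt of the virtual-kubelet StatefulSet container spec.
# ECI_VSWITCH / ECI_SECURITY_GROUP are assumed names; check the ACK
# virtual node documentation for the exact variables in your release.
env:
  - name: ECI_VSWITCH
    # Multiple vswitch IDs can be listed to enlarge the available IP pool.
    value: "vsw-xxxxxx1,vsw-xxxxxx2"
  - name: ECI_SECURITY_GROUP
    # Use an enterprise-level security group to exceed the 2,000-instance
    # limit of a basic security group.
    value: "sg-xxxxxx"
```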
Generally speaking, if the number of ECI Pods in a single Kubernetes cluster is below 3,000, we recommend deploying a single vk node. To run more Pods, deploy additional vk nodes in the cluster to scale vk horizontally; multiple vk nodes ease the pressure on any single node and support a larger ECI Pod capacity. For example, 3 vk nodes can support 9,000 ECI Pods, and 10 vk nodes can support up to 30,000.
To make horizontal scaling easier, the vk controller is deployed as a StatefulSet: each vk controller Pod manages one vk node, and the StatefulSet's default replica count is 1. When more vk virtual nodes are needed, you only need to change the StatefulSet's replicas.
# kubectl -n kube-system scale statefulset virtual-kubelet --replicas=4
statefulset.apps/virtual-kubelet scaled
# kubectl get no
NAME STATUS ROLES AGE VERSION
cn-hangzhou.192.168.1.1 Ready <none> 63d v1.12.6-aliyun.1
cn-hangzhou.192.168.1.2 Ready <none> 63d v1.12.6-aliyun.1
virtual-kubelet-0 Ready agent 1m v1.11.2-aliyun-1.0.207
virtual-kubelet-1 Ready agent 1m v1.11.2-aliyun-1.0.207
virtual-kubelet-2 Ready agent 1m v1.11.2-aliyun-1.0.207
virtual-kubelet-3 Ready agent 1m v1.11.2-aliyun-1.0.207
# kubectl -n kube-system get statefulset virtual-kubelet
NAME READY AGE
virtual-kubelet 4/4 1m
# kubectl -n kube-system get pod|grep virtual-kubelet
virtual-kubelet-0 1/1 Running 0 1m
virtual-kubelet-1 1/1 Running 0 1m
virtual-kubelet-2 1/1 Running 0 1m
virtual-kubelet-3 1/1 Running 0 1m
When we create multiple nginx Pods in the vk namespace (the namespace carries a specific label, which forces Pods in it to be scheduled onto vk nodes), we can see that the Pods are scheduled across multiple vk nodes.
# kubectl create ns vk
# kubectl label namespace vk virtual-node-affinity-injection=enabled
# kubectl -n vk run nginx --image nginx:alpine --replicas=10
deployment.extensions/nginx created
# kubectl -n vk get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-546c47b569-blp88 1/1 Running 0 69s 192.168.1.26 virtual-kubelet-1 <none> <none>
nginx-546c47b569-c4qbw 1/1 Running 0 69s 192.168.1.76 virtual-kubelet-0 <none> <none>
nginx-546c47b569-dfr2v 1/1 Running 0 69s 192.168.1.27 virtual-kubelet-2 <none> <none>
nginx-546c47b569-jfzxl 1/1 Running 0 69s 192.168.1.68 virtual-kubelet-1 <none> <none>
nginx-546c47b569-mpmsv 1/1 Running 0 69s 192.168.1.66 virtual-kubelet-1 <none> <none>
nginx-546c47b569-p4qlz 1/1 Running 0 69s 192.168.1.67 virtual-kubelet-3 <none> <none>
nginx-546c47b569-x4vrn 1/1 Running 0 69s 192.168.1.65 virtual-kubelet-2 <none> <none>
nginx-546c47b569-xmxx9 1/1 Running 0 69s 192.168.1.30 virtual-kubelet-0 <none> <none>
nginx-546c47b569-xznd8 1/1 Running 0 69s 192.168.1.77 virtual-kubelet-3 <none> <none>
nginx-546c47b569-zk9zc 1/1 Running 0 69s 192.168.1.75 virtual-kubelet-2 <none> <none>
Run 10,000 ECI Pods
In the steps above we created 4 vk virtual nodes, which can support up to 12,000 ECI Pods; we only need to make sure the workloads are scheduled onto the vk nodes. What requires attention here is the scalability of kube-proxy.
- By default, ECI Pods created through vk can access ClusterIP Services in the cluster, so each ECI Pod keeps a watch connection to the apiserver to listen for svc/endpoints changes. When a large number of Pods are Running at the same time, the apiserver and the SLB in front of it hold a number of concurrent connections proportional to the Pod count, so make sure the SLB specification supports the desired number of concurrent connections.
- If the ECI Pods do not need to access ClusterIP Services, you can set the ECI_KUBE_PROXY environment variable of the vk StatefulSet to "false". There will then be no large number of concurrent connections to the SLB, and the pressure on the apiserver is reduced as well.
- Alternatively, you can expose the ClusterIP Services that ECI Pods need to access through an intranet SLB, and let the ECI Pods reach them via PrivateZone, so that ECI Pods can access in-cluster Services without depending on kube-proxy.
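To pin a workload to the vk virtual nodes, add a nodeSelector and a toleration for the virtual-kubelet taint. The label and taint keys below (type=virtual-kubelet, virtual-kubelet.io/provider) follow common virtual-kubelet conventions but are assumptions here; verify them with `kubectl describe no virtual-kubelet-0` on your cluster.

```yaml
# Sketch of a Deployment scheduled onto vk virtual nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-on-vk
spec:
  replicas: 10000          # ECI Pods are billed on demand, so a large
                           # replica count reserves no fixed capacity
  selector:
    matchLabels:
      app: nginx-on-vk
  template:
    metadata:
      labels:
        app: nginx-on-vk
    spec:
      nodeSelector:
        type: virtual-kubelet          # assumed vk node label
      tolerations:
        - key: virtual-kubelet.io/provider   # assumed vk node taint
          operator: Exists
          effect: NoSchedule
      containers:
        - name: nginx
          image: nginx:alpine
```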
Reduce the number of vk virtual nodes
Because ECI Pods on vk are created on demand, vk virtual nodes consume no real resources when no ECI Pods are running on them, so under normal circumstances there is no need to reduce the number of vk nodes. However, if you do want to reduce it, we recommend following the procedure below.
Suppose the cluster currently has 4 vk nodes, virtual-kubelet-0 through virtual-kubelet-3, and we want to shrink to a single vk node; then we need to delete the three nodes virtual-kubelet-1 through virtual-kubelet-3.
- First drain the vk nodes gracefully: evict the Pods on them to other nodes, and prevent new Pods from being scheduled onto the vk nodes that are about to be deleted.
# kubectl drain virtual-kubelet-1 virtual-kubelet-2 virtual-kubelet-3
# kubectl get no
NAME STATUS ROLES AGE VERSION
cn-hangzhou.192.168.1.1 Ready <none> 66d v1.12.6-aliyun.1
cn-hangzhou.192.168.1.2 Ready <none> 66d v1.12.6-aliyun.1
virtual-kubelet-0 Ready agent 3d6h v1.11.2-aliyun-1.0.207
virtual-kubelet-1 Ready,SchedulingDisabled agent 3d6h v1.11.2-aliyun-1.0.207
virtual-kubelet-2 Ready,SchedulingDisabled agent 3d6h v1.11.2-aliyun-1.0.207
virtual-kubelet-3 Ready,SchedulingDisabled agent 66m v1.11.2-aliyun-1.0.207
The reason for draining the vk nodes gracefully first is that the ECI Pods on a vk node are managed by that node's vk controller. If a vk controller is deleted while there are still ECI Pods on its node, those ECI Pods are orphaned and no vk controller can manage them any longer.
- After the vk nodes are drained, change the replica count of the virtual-kubelet StatefulSet to the desired number of vk nodes.
# kubectl -n kube-system scale statefulset virtual-kubelet --replicas=1
statefulset.apps/virtual-kubelet scaled
# kubectl -n kube-system get pod|grep virtual-kubelet
virtual-kubelet-0 1/1 Running 0 3d6h
After waiting a while, we will find that the drained vk nodes have entered the NotReady state.
# kubectl get no
NAME STATUS ROLES AGE VERSION
cn-hangzhou.192.168.1.1 Ready <none> 66d v1.12.6-aliyun.1
cn-hangzhou.192.168.1.2 Ready <none> 66d v1.12.6-aliyun.1
virtual-kubelet-0 Ready agent 3d6h v1.11.2-aliyun-1.0.207
virtual-kubelet-1 NotReady,SchedulingDisabled agent 3d6h v1.11.2-aliyun-1.0.207
virtual-kubelet-2 NotReady,SchedulingDisabled agent 3d6h v1.11.2-aliyun-1.0.207
virtual-kubelet-3 NotReady,SchedulingDisabled agent 70m v1.11.2-aliyun-1.0.207
- Manually delete the vk nodes in NotReady state.
# kubectl delete no virtual-kubelet-1 virtual-kubelet-2 virtual-kubelet-3
node "virtual-kubelet-1" deleted
node "virtual-kubelet-2" deleted
node "virtual-kubelet-3" deleted
# kubectl get no
NAME STATUS ROLES AGE VERSION
cn-hangzhou.192.168.1.1 Ready <none> 66d v1.12.6-aliyun.1
cn-hangzhou.192.168.1.2 Ready <none> 66d v1.12.6-aliyun.1
virtual-kubelet-0 Ready agent 3d6h v1.11.2-aliyun-1.0.207