Kubernetes large-scale cluster optimization series: part 1

Since v1.6, Kubernetes has officially supported clusters of up to 5,000 nodes. That figure is a theoretical ceiling, though. In practice, scaling from a handful of nodes to 5,000 takes considerable tuning, and this article walks through the main techniques.

The officially documented limits are:

  • No more than 5000 nodes

  • No more than 150,000 pods

  • No more than 300,000 containers

  • No more than 100 pods per node
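
The four limits above interact: a cluster can stay under the node limit and still exceed the pod or container limit. A minimal sketch of a sanity check follows. The function name `check_cluster_plan` and its parameters are hypothetical, and the thresholds are simply the numbers from the list above.

```python
# Thresholds copied from the list above.
MAX_NODES = 5000
MAX_PODS = 150_000
MAX_CONTAINERS = 300_000
MAX_PODS_PER_NODE = 100


def check_cluster_plan(nodes, pods_per_node, containers_per_pod):
    """Return the names of the limits that a planned cluster would exceed."""
    total_pods = nodes * pods_per_node
    total_containers = total_pods * containers_per_pod
    violations = []
    if nodes > MAX_NODES:
        violations.append("nodes")
    if pods_per_node > MAX_PODS_PER_NODE:
        violations.append("pods per node")
    if total_pods > MAX_PODS:
        violations.append("total pods")
    if total_containers > MAX_CONTAINERS:
        violations.append("total containers")
    return violations
```

For example, `check_cluster_plan(5000, 100, 2)` reports both the total pod and total container limits as exceeded, while `check_cluster_plan(5000, 30, 2)` reports nothing: at 5,000 nodes the pod limit works out to an average of 30 pods per node.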

Master node configuration optimization

GCE recommended configuration:

  • 1-5 nodes: n1-standard-1

  • 6-10 nodes: n1-standard-2

  • 11-100 nodes: n1-standard-4

  • 101-250 nodes: n1-standard-8

  • 251-500 nodes: n1-standard-16

  • More than 500 nodes: n1-standard-32

AWS recommended configuration:

  • 1-5 nodes: m3.medium

  • 6-10 nodes: m3.large

  • 11-100 nodes: m3.xlarge

  • 101-250 nodes: m3.2xlarge

  • 251-500 nodes: c4.4xlarge

  • More than 500 nodes: c4.8xlarge

The corresponding CPU and memory (for the GCE n1-standard machine types) are:

  • 1-5 nodes: 1vCPU 3.75G memory

  • 6-10 nodes: 2vCPU 7.5G memory

  • 11-100 nodes: 4vCPU 15G memory

  • 101-250 nodes: 8vCPU 30G memory

  • 251-500 nodes: 16vCPU 60G memory

  • More than 500 nodes: 32vCPU 120G memory
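
The sizing tiers above amount to a lookup by node count. A minimal sketch, assuming the tiers are exactly those listed; the function name `recommended_master_type` and the provider keys `"gce"` and `"aws"` are hypothetical.

```python
import bisect

# Upper bound of each node-count tier, and the matching master instance
# types copied from the GCE and AWS lists above.
TIER_UPPER_BOUNDS = [5, 10, 100, 250, 500]
MASTER_TYPES = {
    "gce": ["n1-standard-1", "n1-standard-2", "n1-standard-4",
            "n1-standard-8", "n1-standard-16", "n1-standard-32"],
    "aws": ["m3.medium", "m3.large", "m3.xlarge",
            "m3.2xlarge", "c4.4xlarge", "c4.8xlarge"],
}


def recommended_master_type(node_count, provider="gce"):
    """Return the master instance type recommended for a cluster size."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # bisect_left finds the first tier whose upper bound is >= node_count;
    # anything above the last bound falls into the final tier.
    tier = bisect.bisect_left(TIER_UPPER_BOUNDS, node_count)
    return MASTER_TYPES[provider][tier]
```

With this table, `recommended_master_type(120)` returns `"n1-standard-8"` and `recommended_master_type(501, "aws")` returns `"c4.8xlarge"`.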

kube-apiserver optimization

High availability

  • Method 1: Run multiple kube-apiserver instances behind an external load balancer (LB).

  • Method 2: Set --apiserver-count and --endpoint-reconciler-type so that multiple kube-apiserver instances join the endpoints of the Kubernetes Service, which provides high availability.

However, because clients reuse long-lived TLS connections and multiplex requests over them, neither method balances load evenly across instances. One way to address this is a rate limiter on the server side: when requests reach a threshold, the server tells the client to back off or refuses the connection, and the client cooperates by shifting its load to another instance.
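
The limiter-plus-backoff idea can be sketched in a few lines. This is an illustration of the pattern only, not how kube-apiserver or its clients implement it; the class `InflightLimiter`, the function `backoff_delays`, and their default values are hypothetical.

```python
import random


class InflightLimiter:
    """Server side: reject new requests once `limit` are already in flight."""

    def __init__(self, limit):
        self.limit = limit
        self.inflight = 0

    def try_acquire(self):
        if self.inflight >= self.limit:
            return False  # the server would answer "too many requests" here
        self.inflight += 1
        return True

    def release(self):
        self.inflight -= 1


def backoff_delays(base=0.5, cap=30.0, attempts=5, rng=random.random):
    """Client side: exponential backoff with full jitter, in seconds."""
    return [rng() * min(cap, base * 2 ** n) for n in range(attempts)]
```

A client that gets a rejection would sleep for the next delay before retrying, ideally against a different instance. The random jitter keeps many rejected clients from retrying at the same moment.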

Control connections

Two kube-apiserver flags cap the number of requests being processed concurrently (in flight):

--max-mutating-requests-inflight int           The maximum number of mutating requests in flight at a given time. When the server exceeds this, it rejects requests. Zero for no limit. (default 200)
--max-requests-inflight int                    The maximum number of non-mutating requests in flight at a given time. When the server exceeds this, it rejects requests. Zero for no limit. (default 400)

For clusters of 1,000 to 3,000 nodes, the recommended settings are:

--max-requests-inflight=1500
--max-mutating-requests-inflight=500

For clusters of more than 3,000 nodes, the recommended settings are:

--max-requests-inflight=3000
--max-mutating-requests-inflight=1000
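
The two recommendations can be folded into a small helper that emits the flags for a given cluster size. This is a sketch: the function name `inflight_flags` is hypothetical, and keeping the defaults (400 and 200) below 1,000 nodes is an assumption, since the text gives no recommendation for smaller clusters.

```python
def inflight_flags(node_count):
    """Return kube-apiserver in-flight flags for a given cluster size."""
    if node_count > 3000:
        read_limit, write_limit = 3000, 1000
    elif node_count >= 1000:
        read_limit, write_limit = 1500, 500
    else:
        # Assumption: smaller clusters keep the kube-apiserver defaults.
        read_limit, write_limit = 400, 200
    return [
        f"--max-requests-inflight={read_limit}",
        f"--max-mutating-requests-inflight={write_limit}",
    ]
```

For example, `inflight_flags(2000)` returns `["--max-requests-inflight=1500", "--max-mutating-requests-inflight=500"]`.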

kube-scheduler and kube-controller-manager optimization

High availability

kube-controller-manager and kube-scheduler achieve high availability through leader election. Enable it with the following flags:

--leader-elect=true
--leader-elect-lease-duration=15s
--leader-elect-renew-deadline=10s
--leader-elect-resource-lock=endpoints
--leader-elect-retry-period=2s
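
The three timing values have to be consistent with one another: the lease must outlast the renew deadline, and the renew deadline must leave room for at least one retry. A minimal sketch of that check follows. The function name `check_leader_election` is hypothetical, and the 1.2 jitter factor is assumed to match the one used by client-go's leader election package.

```python
JITTER_FACTOR = 1.2  # assumption: same jitter factor as client-go


def check_leader_election(lease_duration, renew_deadline, retry_period):
    """Return a list of problems with the timings (all values in seconds)."""
    problems = []
    if lease_duration <= renew_deadline:
        problems.append("lease duration must be greater than renew deadline")
    if renew_deadline <= JITTER_FACTOR * retry_period:
        problems.append("renew deadline must be greater than retry period * jitter")
    return problems
```

The values shown above (15s, 10s, 2s) pass: `check_leader_election(15, 10, 2)` returns an empty list.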

Control QPS

The recommended QPS limit for requests to kube-apiserver is:

--kube-api-qps=100
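
A QPS limit of this kind is commonly enforced on the client with a token bucket: tokens refill at the QPS rate, and a short burst may spend the tokens that have accumulated. The sketch below illustrates the technique only. The class `TokenBucket` and its `burst` parameter are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow `qps` requests per second on average, with bursts up to `burst`."""

    def __init__(self, qps, burst, now=0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = now

    def try_accept(self, now):
        # Refill according to the time elapsed, never above the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `TokenBucket(qps=100, burst=2)`, two back-to-back requests at time 0 are accepted and a third is rejected; one second later the bucket is full again.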

Kubelet optimization

  • Set --image-pull-progress-deadline=30m

  • Set --serialize-image-pulls=false (requires Docker to use the overlay2 storage driver)

  • Set the maximum number of Pods that the kubelet allows on a single node with --max-pods=110 (the default is 110; adjust it as needed)

Cluster DNS High Availability

Set pod anti-affinity so that the cluster DNS pods (kube-dns or CoreDNS) are spread across different nodes, avoiding a single point of failure:

affinity:
  podAntiAffinity:
    # Hard rule: never schedule two DNS pods onto the same node.
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values:
          - kube-dns
      topologyKey: kubernetes.io/hostname


Origin blog.csdn.net/weixin_38320674/article/details/128229636