A single cluster exceeding 100,000 edge nodes: KubeEdge tested at large scale

Introduction to KubeEdge
KubeEdge is the industry's first cloud-native edge computing framework designed specifically for edge-cloud collaboration. Building on the native container orchestration and scheduling capabilities of Kubernetes, it provides application collaboration, resource collaboration, data collaboration, and device collaboration between cloud and edge, fully opening up the cloud-edge-device collaboration scenario in edge computing.

The KubeEdge architecture consists of three parts: cloud, edge, and device. The cloud is the unified control plane, comprising the native Kubernetes management components and KubeEdge's self-developed CloudCore component, which watches for changes to cloud resources and provides reliable and efficient cloud-edge message synchronization. The edge side is mainly the EdgeCore component, which includes modules such as Edged, MetaManager, and EdgeHub, and manages container lifecycles based on the messages it receives from the cloud. The device side mainly consists of the device Mapper and EventBus, which are responsible for connecting terminal devices.
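To make the cloud-edge message synchronization between CloudCore and EdgeHub more concrete, below is a minimal Go sketch of a cloud-edge message envelope. It is loosely modeled on KubeEdge's Beehive messaging model, but it is a simplified illustration, not the project's actual definition; all field and value names here are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Message is a simplified cloud-edge message envelope (illustrative only,
// loosely modeled on KubeEdge's Beehive messaging model).
type Message struct {
	Header  Header      `json:"header"`
	Router  Router      `json:"route"`
	Content interface{} `json:"content"`
}

// Header carries identity and ordering metadata so the edge can acknowledge
// and deduplicate messages delivered over an unreliable cloud-edge channel.
type Header struct {
	ID        string `json:"msg_id"`
	ParentID  string `json:"parent_msg_id,omitempty"` // set on responses
	Timestamp int64  `json:"timestamp"`
	Sync      bool   `json:"sync,omitempty"` // true if a response is expected
}

// Router tells the receiving side which module and resource the message targets.
type Router struct {
	Source    string `json:"source"`    // e.g. a cloud controller module
	Group     string `json:"group"`     // e.g. "resource"
	Operation string `json:"operation"` // e.g. "insert", "update", "delete"
	Resource  string `json:"resource"`  // e.g. "default/pod/nginx-abc123"
}

func main() {
	msg := Message{
		Header: Header{ID: "42", Timestamp: time.Now().UnixMilli()},
		Router: Router{
			Source:    "edgecontroller", // hypothetical module name
			Group:     "resource",
			Operation: "insert",
			Resource:  "default/pod/nginx-abc123",
		},
		Content: map[string]string{"spec": "..."},
	}
	b, _ := json.MarshalIndent(msg, "", "  ")
	fmt.Println(string(b))
}
```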

[Figure: kubeedge-arch, the KubeEdge architecture]

KubeEdge uses the Kubernetes control plane as its base and extends nodes out to the edge, adding application, resource, data, and device collaboration capabilities between cloud and edge. The scale officially supported by the Kubernetes community is currently 5,000 nodes and 150,000 pods. In edge computing scenarios, with the arrival of the Internet of Everything era, this scale is far from enough. As large numbers of edge devices come online, the demands on the scalability and centralized management of edge computing platforms grow: the goal is to manage as many edge devices as possible with as few cloud resources and clusters as possible, simplifying infrastructure management and operations. While remaining fully compatible with native Kubernetes capabilities, KubeEdge optimizes the cloud-edge message channel and transmission mechanism, breaking through the management scale of native Kubernetes and supporting larger-scale edge node access and management.

SLIs/SLOs

Scalability and performance are important characteristics of Kubernetes clusters, and as a user of a Kubernetes cluster you expect service quality guarantees in both respects. Before conducting large-scale performance testing of Kubernetes + KubeEdge, we need to define how to measure service quality in large-scale cluster scenarios. The Kubernetes community has defined the following SLIs (Service Level Indicators) and SLOs (Service Level Objectives), and we use these indicators to measure cluster service quality.

API Call Latency

| Status | SLI | SLO |
| --- | --- | --- |
| Official | Mutating API call latency for single objects (excluding aggregated APIs and CRDs), P99 over the last 5 minutes | P99 <= 1s |
| Official | Non-streaming read-only API call latency (excluding aggregated APIs and CRDs), P99 over the last 5 minutes | Scope=resource: P99 <= 1s; Scope=namespace: P99 <= 5s; Scope=cluster: P99 <= 30s |
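As a concrete example, the mutating API call latency SLI above can be checked against a cluster's Prometheus metrics using the kube-apiserver's apiserver_request_duration_seconds histogram. The Go sketch below evaluates a histogram_quantile query via the Prometheus Go client; the Prometheus address is a placeholder assumption, and the query is a simplified version that does not filter out aggregated APIs and CRDs as the official measurement does.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Placeholder address; point this at the Prometheus scraping kube-apiserver.
	client, err := api.NewClient(api.Config{Address: "http://prometheus.monitoring:9090"})
	if err != nil {
		panic(err)
	}
	v1api := promv1.NewAPI(client)

	// P99 mutating API call latency over the last 5 minutes, mirroring the
	// "Official" SLI above; each resulting sample should satisfy P99 <= 1s.
	query := `histogram_quantile(0.99,
	  sum(rate(apiserver_request_duration_seconds_bucket{
	    verb=~"POST|PUT|PATCH|DELETE"}[5m])) by (resource, verb, le))`

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	result, warnings, err := v1api.Query(ctx, query, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}
```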
Pod Startup Latency

| Status | SLI | SLO |
| --- | --- | --- |
| Official | Stateless pod startup time (excluding image pulls and Init Containers), from pod createTimestamp to all containers reported as started and observed by watch, P99 | P99 <= 5s |
| WIP | Stateful pod startup time (excluding image pulls and Init Containers), from pod createTimestamp to all containers reported as started and observed by watch, P99 | TBD |
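To make the stateless pod startup SLI concrete, here is a client-go sketch that watches pods in a namespace and records the interval from createTimestamp to the first watch event in which all containers report started. It is a simplified illustration, not the official measurement tooling: it does not subtract image pull or Init Container time, and it assumes a reachable kubeconfig at the default path.

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// allContainersStarted reports whether every container in the pod spec
// has a status with Started == true.
func allContainersStarted(pod *corev1.Pod) bool {
	if len(pod.Status.ContainerStatuses) != len(pod.Spec.Containers) {
		return false
	}
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.Started == nil || !*cs.Started {
			return false
		}
	}
	return true
}

func main() {
	// Assumes a kubeconfig at the default home path.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	w, err := clientset.CoreV1().Pods("default").Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	seen := map[string]bool{} // pods whose startup we already recorded
	for event := range w.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok || seen[pod.Name] || !allContainersStarted(pod) {
			continue
		}
		seen[pod.Name] = true
		// Interval from createTimestamp to this watch observation;
		// the SLO expects the P99 of these samples to be <= 5s.
		latency := time.Since(pod.CreationTimestamp.Time)
		fmt.Printf("pod %s startup latency: %v\n", pod.Name, latency)
	}
}
```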
The community also defines indicators such as In-Cluster Network Programming Latency (the delay between a Service update, or a change in its Ready pods, and that change being reflected in iptables/IPVS rules), in-cluster network latency, DNS Programming Latency (the delay between a Service update, or a change in its Ready pods, and that change being reflected in the DNS server), and DNS latency. These indicators have not yet been quantified. Meeting all SLOs is the goal of large-scale cluster testing, so this report mainly tests the SLIs/SLOs with Official status.

Kubernetes scalability dimensions and thresholds
The scalability of Kubernetes does not refer to node count alone; that is, scalability != #nodes. Kubernetes scalability spans many dimensions, including the number of namespaces, pods, services, secrets/ConfigMaps, and so on. The following figure shows the important dimensions the Kubernetes community has defined to describe cluster scalability (still being continuously updated):

[Figure: k8s-scalability, the Kubernetes scalability envelope]

It is obviously impossible for a Kubernetes cluster to grow its resource objects without limit while still meeting all SLIs/SLOs. For this reason, the industry has defined upper limits on Kubernetes resources in multiple dimensions:

  1. Pods/node <= 30
  2. Backends <= 50k & Services <= 10k & Backends/service <= 250
  3. Pod churn <= 20/s
  4. Secrets & ConfigMaps/node <= 30
  5. Namespaces <= 10k & Pods <= 150k & Pods/namespace <= 3k
  6. …
These dimensions are not completely independent: when one dimension is stretched, the others are compressed, and they can be adjusted according to the usage scenario. For example, stretching from 5k nodes to 10k nodes inevitably affects the limits in the other dimensions. Testing and analyzing every combination would be an enormous workload, so this test focuses on typical scenario configurations. The goal is to show that, while meeting the SLIs/SLOs, a single cluster can support 100k edge nodes and 1000k pods (an average of 10 pods per node, well within the per-node threshold above).

Test tools
ClusterLoader2
ClusterLoader2 is an open-source Kubernetes cluster load testing tool. It can test the SLIs/SLOs defined by Kubernetes and verify whether a cluster meets the various service quality standards, and it also provides visualizable data for locating cluster problems and optimizing cluster performance. ClusterLoader2 ultimately outputs a Kubernetes cluster performance report containing a series of performance indicator test results.

ClusterLoader2 performance indicators:

APIResponsivenessPrometheusSimple
APIResponsivenessPrometheus
CPUProfile
EtcdMetrics
MemoryProfile
MetricsForE2E
PodStartupLatency
ResourceUsageSummary
SchedulingMetrics
SchedulingThroughput
WaitForControlledPodsRunning
WaitForRunningPods
Edgemark
Edgemark is a performance testing tool similar to Kubemark, used mainly in KubeEdge cluster scalability testing to simulate KubeEdge edge nodes. It makes it possible to build an ultra-large-scale Kubernetes + KubeEdge cluster with limited resources, with the goal of exposing cluster management problems that only appear in large-scale clusters. The Edgemark deployment is structured as follows:

[Figure: edgemark-deploy, the Edgemark deployment]

k8s master - master node of the physical Kubernetes cluster
edgemark master - master node of the simulated Kubernetes cluster
CloudCore - KubeEdge cloud management component, responsible for edge node access
hollow pod - a pod started on the physical cluster; by running edgemark inside the pod, it registers with the edgemark master and becomes a virtual edge node, onto which the edgemark master can then schedule pods
hollow edgeNode - a virtual node visible in the simulated cluster, registered by a hollow pod
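Conceptually, each hollow pod makes the edgemark master believe a real edge node has joined. The Go sketch below registers a virtual Node object via client-go, which is roughly what a hollow edge node's registration amounts to from the control plane's point of view. It is a simplified illustration, not Edgemark's actual implementation: the node name, labels, and capacities are assumptions, and a real hollow node also keeps its status fresh through kubelet-style heartbeats rather than a one-shot create.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a kubeconfig pointing at the edgemark master (default home path).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// A virtual edge node as the control plane would see it. Status set here is
	// illustrative: the apiserver manages node status via a subresource, so a
	// real hollow node reports readiness through ongoing status updates.
	node := &corev1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "hollow-edge-node-0", // hypothetical name
			Labels: map[string]string{"node-role.kubernetes.io/edge": ""},
		},
		Status: corev1.NodeStatus{
			Capacity: corev1.ResourceList{
				corev1.ResourcePods:   resource.MustParse("30"), // matches the Pods/node threshold
				corev1.ResourceCPU:    resource.MustParse("2"),
				corev1.ResourceMemory: resource.MustParse("4Gi"),
			},
			Conditions: []corev1.NodeCondition{
				{Type: corev1.NodeReady, Status: corev1.ConditionTrue},
			},
		},
	}
	created, err := clientset.CoreV1().Nodes().Create(context.Background(), node, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("registered virtual node:", created.Name)
}
```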
Test cluster deployment plan
[Figure: deploy, the test cluster deployment plan]

The Kubernetes base management plane is deployed with a single master: etcd, kube-apiserver, kube-scheduler, and kube-controller-manager each run as a single instance. The KubeEdge management plane component CloudCore is deployed with 5 instances. Northbound, CloudCore connects to kube-apiserver through the master node IP; southbound, it exposes its service through a Load Balancer, and edge nodes connect to a random CloudCore instance via the Load Balancer's round-robin policy.

Origin: blog.csdn.net/xiuqingzhouyang/article/details/129167344