[Learning Edge Containers from 0 to 1, Part 4] A Distributed Node Status Judgment Mechanism for Weak Network Environments

Introduction

In edge scenarios, the network is often unreliable, and the Kubernetes eviction mechanism is easily triggered by mistake, causing Pod evictions that do not match expectations. TKE Edge pioneered a distributed node status judgment mechanism that can better identify when eviction is appropriate, ensuring that the system keeps running normally on a weak network and avoiding service interruption and fluctuation.

In edge computing, the network environment between edge nodes and the cloud is very complex: network quality cannot be guaranteed, and the connection between the APIServer and the nodes is likely to be interrupted. With unmodified native Kubernetes, node status would frequently become abnormal, triggering the Kubernetes eviction mechanism; Pods would be evicted, Endpoints would be lost, and services would ultimately suffer interruption and fluctuation. To solve this problem, the TKE edge container team designed a distributed node status judgment mechanism for edge nodes under the weak networks of edge clusters, which can better identify when eviction is appropriate.

Background

Unlike the central cloud, edge scenarios must first contend with a weak cloud-edge network. Edge devices are often located in edge machine rooms or mobile edge sites; their network connection to the cloud is complicated and not as reliable as in the central cloud. This includes not only the unreliable network between the cloud (control plane) and the edge, but also the unreliable network between the edge nodes themselves. Even between different machine rooms in the same region, good network quality between nodes cannot be assumed.

Take a smart factory as an example: the edge nodes are located in the warehouse and the workshop, while the control-plane master nodes sit in a central Tencent Cloud machine room.

img

The network between the edge devices in the warehouse and workshop and the cloud cluster is complicated: it may run over the Internet, 5G, Wi-Fi, or other links, and its uneven quality comes with no guarantees. However, since the devices in the warehouse and workshop share a local network, the network quality between them is definitely better, and thus relatively more reliable, than their connection to the cloud cluster.

The challenge

Native Kubernetes behavior

A weak cloud-edge network disrupts communication between the kubelet running on an edge node and the cloud APIServer: the APIServer cannot receive the kubelet's heartbeats or lease renewals, and so cannot accurately learn the status of the node and the Pods on it. If this lasts longer than the configured threshold, the APIServer considers the node unavailable and takes the following actions:

  • The lost node's status is set to NotReady or Unknown, and NoSchedule and NoExecute taints are added
  • Pods on the lost node are evicted and rebuilt on other nodes
  • Pods on the lost node are removed from the Service's Endpoint list
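The timing of these actions can be pictured with a small sketch. The thresholds below mirror common kube-controller-manager defaults (a 40s grace period before marking a node NotReady, and a 300s NoExecute toleration before Pods are evicted), but they are illustrative assumptions here, not an excerpt from Kubernetes source:

```python
from datetime import datetime, timedelta

# Illustrative values mirroring common defaults:
# node-monitor-grace-period (40s) before a node is marked NotReady,
# and the default 300s NoExecute toleration before Pods are evicted.
NODE_MONITOR_GRACE_PERIOD = timedelta(seconds=40)
POD_EVICTION_DELAY = timedelta(seconds=300)

def node_actions(last_heartbeat: datetime, now: datetime) -> list[str]:
    """Return the actions the control plane would take for a silent node."""
    silence = now - last_heartbeat
    actions = []
    if silence >= NODE_MONITOR_GRACE_PERIOD:
        actions.append("mark NotReady / Unknown")
        actions.append("add NoSchedule and NoExecute taints")
    if silence >= NODE_MONITOR_GRACE_PERIOD + POD_EVICTION_DELAY:
        actions.append("evict Pods and remove them from Service Endpoints")
    return actions

now = datetime(2021, 1, 1, 12, 0, 0)
print(node_actions(now - timedelta(seconds=10), now))   # healthy: []
print(node_actions(now - timedelta(seconds=60), now))   # NotReady + taints
print(node_actions(now - timedelta(seconds=400), now))  # eviction as well
```

The key point for edge scenarios is that all three actions key off a single signal, the cloud-edge heartbeat, which is exactly the signal that is unreliable here.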

Demand scenario

Consider an audio and video streaming scenario. Audio and video services are an important application of edge computing, as shown in the figure:

img

To balance user experience and cost, audio and video streaming usually tries to raise the edge cache hit rate and reduce back-to-origin traffic. A common practice is to schedule all user requests for the same file to the same service instance, which caches the file.

With native Kubernetes, however, if Pods are frequently rebuilt because of network fluctuations, the service instances' caches are invalidated, and the scheduling system ends up directing user requests to other instances. Both effects significantly hurt CDN performance and may even be unacceptable.

In fact, the edge node is operating completely normally, so evicting or rebuilding its Pods is entirely unnecessary. To overcome this problem and keep services continuously available, the TKE edge container team designed a distributed node status judgment mechanism.

Solution

Design Principles

Obviously, in edge computing it is unreasonable to judge whether a node is healthy based solely on its connection to the APIServer. To make the system more robust, an additional judgment mechanism is needed.

Compared with the cloud-edge link, the network between edge nodes is more stable. How can this more stable infrastructure be used to improve accuracy? We pioneered Edge Health, a distributed node status judgment mechanism: in addition to a node's connection to the APIServer, the edge nodes are introduced as an additional evaluation factor, yielding a more comprehensive status judgment. After testing and extensive practice, this mechanism greatly improves the accuracy of node status judgment under weak cloud-edge networks and safeguards the stable operation of services.

The main principles of the mechanism:

  • Each node periodically probes the health status of other nodes
  • All nodes in the cluster periodically vote to determine the status of each node
  • The cloud and the edge nodes jointly determine node status
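The voting step can be sketched as follows. The function name, vote format, and majority threshold are illustrative assumptions for exposition, not SuperEdge's actual API:

```python
from collections import Counter

def judge_node(votes: dict[str, bool]) -> str:
    """Decide a node's status from peer votes.

    votes maps a voter node's name to True if that voter could reach
    the target node's health endpoint, False otherwise.
    """
    if not votes:
        return "unknown"
    tally = Counter(votes.values())
    # Majority rule: more than half of the voters must agree that
    # the node is unreachable before it is judged abnormal.
    if tally[False] > len(votes) / 2:
        return "abnormal"
    return "normal"

# Three peers probe node-a; a single failed probe is outvoted.
print(judge_node({"node-b": True, "node-c": True, "node-d": False}))   # normal
# Two of three peers cannot reach node-a: the majority wins.
print(judge_node({"node-b": False, "node-c": False, "node-d": True}))  # abnormal
```

The majority requirement is what makes a single flaky inter-node link insufficient to flip a node's status.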

First, nodes probe and vote among themselves to jointly determine whether a specific node is abnormal, so that a majority of nodes agrees on that node's state. In addition, although the network between nodes is generally better than the cloud-edge link, it should be noted that the network between edge nodes is also complicated and not 100% reliable.

Therefore, the inter-node network cannot be fully trusted either, and a node's status should not be decided by the edge nodes alone; a joint decision by the cloud and the edge is more reliable. Based on these considerations, we made the following design:

img

Solution characteristics

Note that when the cloud judges a node abnormal while the other nodes consider it normal, existing Pods are not evicted, but to protect incremental workloads no new Pods are scheduled to that node. The continued normal operation of the existing Pods also relies on the edge cluster's edge autonomy capability.
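This combined cloud-edge decision can be summarized as a small truth table. The sketch below is a hedged reconstruction of the behavior described above (the function and action names are made up for illustration); the case the article highlights is a lost cloud heartbeat with healthy peer votes:

```python
def final_action(cloud_ok: bool, edge_ok: bool) -> str:
    """Combine the cloud's view (APIServer heartbeat) with the edge
    peers' majority vote into a final action for the node."""
    if cloud_ok:
        # Heartbeats still reach the APIServer: the node is Ready.
        return "normal"
    if edge_ok:
        # The cloud lost the node but its peers can still reach it:
        # keep existing Pods running (edge autonomy), but be
        # conservative and schedule no new Pods onto it.
        return "no-schedule"
    # Both the cloud and the edge peers lost the node: treat it as down.
    return "evict"

print(final_action(cloud_ok=False, edge_ok=True))  # no-schedule
```

The `no-schedule` branch is what prevents the cache-busting Pod rebuilds described in the streaming scenario while still guarding new workloads.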

In addition, because of the particularities of edge networks and topologies, the network between node groups often has a single point of failure. In the factory example, although the warehouse and the workshop both belong to the factory, the network between them may depend on a single critical link; once that link is interrupted, the node groups split. Our solution ensures that when such a split occurs, node status is still decided by the side holding the majority, avoiding the situation where nodes are wrongly judged abnormal and Pods end up scheduled onto only a small number of nodes, overloading them.

Edge devices may also be located in different regions with no connectivity between one another, and it is obviously inappropriate to make nodes that cannot reach each other check each other. To handle this, our solution also supports grouping nodes, with health checks performed only between nodes in the same group. Since nodes may need to be regrouped, the mechanism also supports regrouping in real time without redeploying the detection component or reinitializing.
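Grouping can be pictured as partitioning nodes by a label and restricting health checks to within each group. The label key `edge-region` below is made up for illustration and is not the component's actual label:

```python
from collections import defaultdict

def group_by_label(nodes: dict[str, dict[str, str]],
                   key: str = "edge-region") -> dict[str, list[str]]:
    """Partition nodes by a grouping label; nodes only health-check
    peers within their own group. The label key is illustrative."""
    groups = defaultdict(list)
    for name, labels in nodes.items():
        # An ungrouped node falls into its own single-node group
        # and therefore checks no other nodes.
        groups[labels.get(key, name)].append(name)
    return dict(groups)

nodes = {
    "node-1": {"edge-region": "warehouse"},
    "node-2": {"edge-region": "warehouse"},
    "node-3": {"edge-region": "workshop"},
    "node-4": {},  # no label: becomes its own group
}
print(group_by_label(nodes))
# {'warehouse': ['node-1', 'node-2'], 'workshop': ['node-3'], 'node-4': ['node-4']}
```

Regrouping is then just a label change, which is why it needs no redeployment of the detection component.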

The detection mechanism is off by default. To enable it, go to Basic Information and turn on Edge Health (off by default). To group nodes, additionally turn on "Enable Multi-Region" and then group the nodes by editing them and adding the corresponding labels. If multi-region checking is enabled but the nodes are not grouped, each node forms its own group by default and does not check other nodes.

img

img

During the development of this feature, we also discovered a Kubernetes community bug related to node taints and proposed a fix.

Future outlook

In the future, we will support more detection methods to enhance stability in various scenarios. In addition, some existing open-source decentralized cluster status detection and management projects cannot fully handle edge scenarios such as cluster splits; we will later try to integrate and learn from them to meet our needs.

Open source project SuperEdge

This component has been open sourced as part of the edge container project SuperEdge (https://github.com/superedge/superedge); stars are welcome. You can also join our WeChat group via the QR code below.

img

Public cloud product TKE Edge

The product is now fully available; feel free to head to the edge container service console to try it out.



Source: blog.51cto.com/14120339/2595412