An illustrated look at the core implementation principles of the Kubernetes StatefulSet controller

StatefulSet is the standard way to manage stateful applications in k8s. Today, let's look at the scenarios and principles behind its design, so as to understand where it applies.

1. Basic Concepts

First, let's introduce some basics that stateful applications need to consider, and then look at the key parts of the StatefulSet implementation in the next chapter.

1.1 Stateful and Stateless

In day-to-day development, applications can usually be divided into two categories: stateful and stateless. Web services, for example, are usually stateless: their data mainly comes from back-end storage and middleware such as caches, and the service itself stores nothing. For applications such as Redis and ES, the data is part of the application itself. A stateful application therefore consists of two parts: the application and its data.

1.2 Consistency and Data

Consistency is a very common problem in distributed systems. As mentioned above, stateful applications include a data part. Are data and consistency the same thing? Not necessarily. In applications such as ZooKeeper, data is written to a majority of the nodes in the cluster through the ZAB protocol, while in applications such as Kafka the consistency requirements of the design are relatively low. So the consistency of a stateful application's data is determined mostly by the system design for its scenario.

1.3 Identification

In some applications, identity is part of the system itself. For example, in ZooKeeper the server id influences the outcome of the election under the ZAB protocol. In Kafka, partitions are likewise allocated according to the corresponding id.

1.4 Monotonic ordered update

In a distributed system, partition tolerance usually has to be guaranteed at a minimum, so that the failure of some nodes does not make the whole system unavailable. The Pod management strategy of StatefulSet in k8s is to update Pod by Pod to keep things as safe as possible, rather than starting or stopping all Pods in parallel.

1.5 Scaling and Failover

In k8s, horizontal scaling out and in is very simple: it is a matter of adding or deleting Pods. For stateful applications, however, k8s knows nothing about things such as how to rebalance data after scaling out or how to fail over after a node fails. These are all things a stateful application has to handle itself.

2. Core Implementation

The overall flow of the StatefulSet implementation is relatively simple. The sections below cover, in turn, Pod management, state calculation, state management, and update strategy.

2.1 Pod release and adopt

The names of the Pods in a StatefulSet follow fixed rules, and the names themselves carry meaning. When k8s syncs a StatefulSet, it first filters the Pods that belong to it and then performs the following operations.

The association between a controller and its Pods in k8s rests mainly on two things: the controllerRef and labels. While filtering Pods, if the StatefulSet controller finds a Pod whose controllerRef points to the current StatefulSet but whose labels or name do not match, it tries to release that Pod.

Conversely, if a Pod's labels and name match but it has no controllerRef (an orphan Pod), its controllerRef is set to the current StatefulSet. This is called adopt.

This process ensures that every Pod that looks related to the current StatefulSet is either owned by it or released from it, which keeps the set of Pods consistent. Even if someone modifies a Pod by hand, it is eventually brought back to a consistent state.
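
The release and adopt decision can be sketched roughly as follows. This is a simplified illustration, not the controller's actual code: the `pod` and `set` types, the `claim` function, and the `web` names are hypothetical, and the sketch assumes that a Pod owned by some other controller is simply left alone.

```go
package main

import (
	"fmt"
	"regexp"
)

// pod and set are hypothetical, stripped-down stand-ins for the real API objects.
type pod struct {
	Name     string
	Labels   map[string]string
	OwnerUID string // UID from the controllerRef; empty means the Pod is an orphan
}

type set struct {
	Name     string
	UID      string
	Selector map[string]string
}

// memberName matches names of the form "<parent>-<ordinal>".
var memberName = regexp.MustCompile(`^(.*)-([0-9]+)$`)

// matches reports whether the Pod carries every selector label and is named after the set.
func matches(s set, p pod) bool {
	for k, v := range s.Selector {
		if got, ok := p.Labels[k]; !ok || got != v {
			return false
		}
	}
	m := memberName.FindStringSubmatch(p.Name)
	return m != nil && m[1] == s.Name
}

// claim returns the action for one Pod: "keep", "release", "adopt" or "ignore".
func claim(s set, p pod) string {
	switch {
	case p.OwnerUID == s.UID && matches(s, p):
		return "keep" // already owned and still matching
	case p.OwnerUID == s.UID:
		return "release" // owned, but labels or name no longer match
	case p.OwnerUID == "" && matches(s, p):
		return "adopt" // orphan that matches: take ownership
	default:
		return "ignore" // belongs to someone else or is unrelated
	}
}

func main() {
	s := set{Name: "web", UID: "uid-1", Selector: map[string]string{"app": "web"}}
	web := map[string]string{"app": "web"}
	fmt.Println(claim(s, pod{Name: "web-0", Labels: web, OwnerUID: "uid-1"}))
	fmt.Println(claim(s, pod{Name: "web-1", Labels: map[string]string{"app": "db"}, OwnerUID: "uid-1"}))
	fmt.Println(claim(s, pod{Name: "web-2", Labels: web}))
}
```

Running either branch repeatedly is harmless, which is why a Pod edited by hand drifts back to a consistent state after the next sync.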

2.2 Classification of copies

After correcting Pod ownership in the first step, the StatefulSet controller traverses all of its Pods and divides them into two categories: valid copies and invalid copies (condemned). As mentioned above, Pod names are ordered: with N replicas, the ordinals are {0...N-1}. Valid and invalid Pods are distinguished by ordinal. A Pod whose ordinal is greater than or equal to the desired replica count is an invalid copy.
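
As a rough sketch of this classification, assuming Pod names of the form `<set>-<ordinal>`: the helper names `ordinalOf` and `classify` and the `web-N` Pod names are made up for illustration, and the real controller works on Pod objects rather than plain strings.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

var ordinalPattern = regexp.MustCompile(`^(.*)-([0-9]+)$`)

// ordinalOf returns the trailing index of a Pod name such as "web-2", or -1 if there is none.
func ordinalOf(name string) int {
	m := ordinalPattern.FindStringSubmatch(name)
	if m == nil {
		return -1
	}
	n, err := strconv.Atoi(m[2])
	if err != nil {
		return -1
	}
	return n
}

// classify splits Pod names into valid copies (ordinal < desired) and
// condemned copies (ordinal >= desired). valid[i] is "" when the Pod for
// ordinal i does not exist yet.
func classify(names []string, desired int) (valid, condemned []string) {
	valid = make([]string, desired)
	for _, n := range names {
		ord := ordinalOf(n)
		switch {
		case ord >= 0 && ord < desired:
			valid[ord] = n
		case ord >= desired:
			condemned = append(condemned, n)
		}
	}
	// Keep condemned Pods in ascending ordinal order.
	sort.Slice(condemned, func(i, j int) bool {
		return ordinalOf(condemned[i]) < ordinalOf(condemned[j])
	})
	return valid, condemned
}

func main() {
	valid, condemned := classify([]string{"web-3", "web-0", "web-2", "web-4"}, 3)
	fmt.Printf("valid=%q condemned=%q\n", valid, condemned)
}
```

With a desired count of 3, the slot for ordinal 1 stays empty because that Pod is missing, while `web-3` and `web-4` land in the condemned list.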

2.3 Monotonic Update

Monotonic update applies when the Pod management policy is not Parallel. As long as any Pod among the current replicas (valid copies) is being created, is terminating, or is not ready, the controller waits for that Pod to become ready before going further. In other words, to update the Pods of a StatefulSet, the Pods involved must be RunningAndReady.

func allowsBurst(set *apps.StatefulSet) bool {
    return set.Spec.PodManagementPolicy == apps.ParallelPodManagement
}
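
How a burst check like the one above gates the sync loop can be sketched as follows. This is a simplified model under stated assumptions: the `replica` type and the `plan` function are hypothetical, and the returned strings only describe what one sync pass would do.

```go
package main

import "fmt"

// replica is a hypothetical summary of one Pod slot, indexed by ordinal.
type replica struct {
	Name        string
	Exists      bool
	Ready       bool // running and ready
	Terminating bool
}

// plan walks the replicas in ordinal order and lists the actions of one sync pass.
// In ordered mode (parallel == false) it stops at the first Pod that is missing,
// terminating, or not ready. In parallel mode it creates every missing Pod at once.
func plan(replicas []replica, parallel bool) []string {
	var actions []string
	for _, r := range replicas {
		if !r.Exists {
			actions = append(actions, "create "+r.Name)
			if !parallel {
				return actions // create one Pod, then wait for the next sync
			}
			continue
		}
		if (r.Terminating || !r.Ready) && !parallel {
			return append(actions, "wait for "+r.Name)
		}
	}
	return actions
}

func main() {
	replicas := []replica{
		{Name: "web-0", Exists: true, Ready: true},
		{Name: "web-1", Exists: true, Ready: false},
		{Name: "web-2", Exists: false},
	}
	fmt.Println(plan(replicas, false))
	fmt.Println(plan(replicas, true))
}
```

In ordered mode the pass stops at `web-1` and never reaches the missing `web-2`. In parallel mode the unready Pod does not block, so `web-2` is created right away.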

2.4 Counter-based rolling update

The implementation of rolling update is fairly subtle. It works mainly by adjusting the replica count. First, the Pods are checked in reverse ordinal order to see whether each one is at the latest revision. If a Pod is not, it is deleted directly and the currentReplicas count is decremented by one. On the next pass the controller finds that the Pod no longer exists and has to generate a new Pod for that ordinal. Because the ordinal is no longer below the current replica count, the update revision is used to create it.

func newVersionedStatefulSetPod(currentSet, updateSet *apps.StatefulSet, currentRevision, updateRevision string, ordinal int) *v1.Pod {
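    // If the Pod's ordinal is less than the current replica count, the Pod has not been
    // reached by the update yet. If it has to be regenerated for some other reason,
    // the old (current) revision configuration is still used.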
    if currentSet.Spec.UpdateStrategy.Type == apps.RollingUpdateStatefulSetStrategyType &&
        (currentSet.Spec.UpdateStrategy.RollingUpdate == nil && ordinal < int(currentSet.Status.CurrentReplicas)) ||
        (currentSet.Spec.UpdateStrategy.RollingUpdate != nil && ordinal < int(*currentSet.Spec.UpdateStrategy.RollingUpdate.Partition)) {
        pod := newStatefulSetPod(currentSet, ordinal)
        setPodRevision(pod, currentRevision)
        return pod
    }
    // Otherwise, generate the Pod from the new (update) revision configuration.
    pod := newStatefulSetPod(updateSet, ordinal)
    setPodRevision(pod, updateRevision)
    return pod
}

2.5 Cleanup of invalid copies

Invalid copies are mainly cleaned up when the StatefulSet scales down. Copies found to be condemned are deleted directly. By default the monotonic principle applies here as well, that is, only one copy is deleted at a time.
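
A minimal sketch of this scale-down step, assuming condemned Pods are removed from the highest ordinal downward: the `condemnedPod` type and the `nextToDelete` function are hypothetical, and the real controller performs additional health checks that are left out here.

```go
package main

import (
	"fmt"
	"sort"
)

// condemnedPod is a hypothetical summary of a Pod whose ordinal is beyond the desired count.
type condemnedPod struct {
	Name        string
	Ordinal     int
	Terminating bool
}

// nextToDelete returns the Pods to delete in one sync pass, highest ordinal first.
// In ordered mode (parallel == false) at most one Pod is deleted per pass, and
// nothing is deleted while an earlier deletion is still in progress.
func nextToDelete(condemned []condemnedPod, parallel bool) []string {
	sorted := append([]condemnedPod(nil), condemned...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Ordinal > sorted[j].Ordinal })

	var out []string
	for _, p := range sorted {
		if p.Terminating {
			if !parallel {
				return out // wait for this Pod to go away first
			}
			continue
		}
		out = append(out, p.Name)
		if !parallel {
			return out // one deletion per pass
		}
	}
	return out
}

func main() {
	condemned := []condemnedPod{{Name: "web-3", Ordinal: 3}, {Name: "web-4", Ordinal: 4}}
	fmt.Println(nextToDelete(condemned, false))
	fmt.Println(nextToDelete(condemned, true))
}
```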

2.6 Deletion-Based Monotonic Updates

        if getPodRevision(replicas[target]) != updateRevision.Name && !isTerminating(replicas[target]) {
            klog.V(2).Infof("StatefulSet %s/%s terminating Pod %s for update",
                set.Namespace,
                set.Name,
                replicas[target].Name)
            err := ssc.podControl.DeleteStatefulPod(set, replicas[target])
            status.CurrentReplicas--
            return &status, err
        }

The Pod version check comes at the end of the consistency sync. When the code reaches this point, the current StatefulSet satisfies monotonicity and all Pods in the valid copies are RunningAndReady. The version check then runs in reverse ordinal order. If a Pod's revision does not match the update revision, the Pod is deleted; the partition setting bounds how far down the ordinals the update is allowed to go. The deletion produces a Pod event, which triggers the next sync and with it the next round of consistency checks.
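
The reverse-order pass can be modeled roughly like this. It is a sketch under stated assumptions, not the controller's code: the `replica` type, the `nextUpdate` function, and the revision strings are hypothetical, and ordinals below `partition` are assumed to be left on the old revision.

```go
package main

import "fmt"

// replica is a hypothetical summary of one valid copy, indexed by ordinal.
type replica struct {
	Name        string
	Revision    string
	Ready       bool
	Terminating bool
}

// nextUpdate scans from the highest ordinal down to partition and returns the
// name of the single Pod to delete in this pass, or "" if there is nothing to
// do yet. Only one Pod is touched per pass, so the update proceeds step by step.
func nextUpdate(replicas []replica, updateRevision string, partition int) string {
	if partition < 0 {
		partition = 0
	}
	for i := len(replicas) - 1; i >= partition; i-- {
		r := replicas[i]
		if r.Revision != updateRevision && !r.Terminating {
			return r.Name // delete it; it is recreated from the update revision
		}
		if !r.Ready {
			return "" // wait for this Pod before moving to a lower ordinal
		}
	}
	return ""
}

func main() {
	replicas := []replica{
		{Name: "web-0", Revision: "rev-1", Ready: true},
		{Name: "web-1", Revision: "rev-1", Ready: true},
		{Name: "web-2", Revision: "rev-2", Ready: true},
	}
	fmt.Printf("%q\n", nextUpdate(replicas, "rev-2", 0))
	fmt.Printf("%q\n", nextUpdate(replicas, "rev-2", 2))
}
```

With `partition` set to 0 the next Pod to replace is `web-1`. With `partition` set to 2 nothing is returned, because only `web-2` is in range and it is already on the update revision.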

2.7 OnDelete Strategy

    if set.Spec.UpdateStrategy.Type == apps.OnDeleteStatefulSetStrategyType {
        return &status, nil
    }

Besides RollingUpdate, StatefulSet has another update strategy, OnDelete. With it, the corresponding Pod must be deleted manually to trigger the update. So for StatefulSets where you want to update only specific indexes, you can try this strategy and delete only the chosen index each time, so that only that index is updated to the latest version.

2.8 State Storage

State storage is what we usually call PVC (PersistentVolumeClaim). When a Pod is created or updated, if the controller finds that the corresponding PVC does not exist, it creates one according to the volume claim configuration in the StatefulSet and updates the Pod's configuration to reference it.
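
The create-if-missing step can be sketched as follows, assuming the usual naming scheme in which a claim is named `<template>-<set>-<ordinal>` (for example `www-web-0`). The functions `claimName` and `missingClaims` and the `existing` map are hypothetical stand-ins for the controller's lookups against the API server.

```go
package main

import "fmt"

// claimName builds the PVC name for one volume claim template and one Pod ordinal.
func claimName(template, setName string, ordinal int) string {
	return fmt.Sprintf("%s-%s-%d", template, setName, ordinal)
}

// missingClaims returns the PVC names that still have to be created for the
// Pod with the given ordinal. existing stands in for a lookup of current PVCs.
func missingClaims(templates []string, setName string, ordinal int, existing map[string]bool) []string {
	var missing []string
	for _, t := range templates {
		name := claimName(t, setName, ordinal)
		if !existing[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	existing := map[string]bool{"www-web-0": true}
	fmt.Println(missingClaims([]string{"www", "logs"}, "web", 0, existing))
	fmt.Println(missingClaims([]string{"www", "logs"}, "web", 1, existing))
}
```

Because the claim name depends only on the set name and the ordinal, a recreated Pod with the same ordinal finds the same claim again, which is how its data survives a restart.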

3. Stateful Application Summary

As the core implementation analysis shows, support for stateful applications is built on a combination of consistent state, monotonic updates, and persistent storage. Ordering is guaranteed by requiring the RunningAndReady state, and data is saved through persistent storage.

On the importance of ordering: two common designs in distributed systems are partitions and replicas. Replicas mainly ensure availability, and partitions mainly distribute data evenly; both are usually assigned based on the nodes currently in the cluster. If one of our nodes goes offline briefly for an upgrade, its data stays in the corresponding PVC, so after recovery the node can quickly restore its information and rejoin the cluster. Therefore, if we develop this kind of distributed application in the future, the underlying recovery and management can be handed over to k8s, with the data stored in PVCs. The application itself only needs to care about cluster management and data distribution. This is one of the changes that cloud native brings.

That's all for today. I haven't updated in a long time, and reading the source code is not easy. You are welcome to forward, share, and discuss, so that we can make progress together.

kubernetes study notes address: https://www.yuque.com/baxiaoshi/tyado3

WeChat account: baxiaoshi2020. Follow the official account to read more source code analysis articles. For more illustrated source code articles, see www.sreguide.com
