Kubernetes StatefulSet source code analysis

Author: [email protected], based on Kubernetes 1.9

Abstract: Kubernetes StatefulSet became stable (GA) in version 1.9, and it is likely that more and more enterprises will use it to deploy stateful applications such as MySQL, ZooKeeper, Elasticsearch and Redis. This article is a source code analysis of StatefulSet, covering its inner structure, the core logic of sync, the main flow of Update, a complete code logic diagram and some thoughts.

Inner Structure

Below is a diagram of the internal structure of the StatefulSet Controller at work.

NewStatefulSetController

Like the other controllers, the StatefulSet Controller is started by the kube-controller-manager during its initialization.


// NewStatefulSetController creates a new statefulset controller.
func NewStatefulSetController(
	podInformer coreinformers.PodInformer,
	setInformer appsinformers.StatefulSetInformer,
	pvcInformer coreinformers.PersistentVolumeClaimInformer,
	revInformer appsinformers.ControllerRevisionInformer,
	kubeClient clientset.Interface,
) *StatefulSetController {
	
    ...

	ssc := &StatefulSetController{
		kubeClient: kubeClient,
		control: NewDefaultStatefulSetControl(
			NewRealStatefulPodControl(
				kubeClient,
				setInformer.Lister(),
				podInformer.Lister(),
				pvcInformer.Lister(),
				recorder),
			NewRealStatefulSetStatusUpdater(kubeClient, setInformer.Lister()),
			history.NewHistory(kubeClient, revInformer.Lister()),
		),
		pvcListerSynced: pvcInformer.Informer().HasSynced,
		queue:           workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "statefulset"),
		podControl:      controller.RealPodControl{KubeClient: kubeClient, Recorder: recorder},

		revListerSynced: revInformer.Informer().HasSynced,
	}

	podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		// lookup the statefulset and enqueue
		AddFunc: ssc.addPod,
		// lookup current and old statefulset if labels changed
		UpdateFunc: ssc.updatePod,
		// lookup statefulset accounting for deletion tombstones
		DeleteFunc: ssc.deletePod,
	})
	ssc.podLister = podInformer.Lister()
	ssc.podListerSynced = podInformer.Informer().HasSynced

	setInformer.Informer().AddEventHandlerWithResyncPeriod(
		cache.ResourceEventHandlerFuncs{
			AddFunc: ssc.enqueueStatefulSet,
			UpdateFunc: func(old, cur interface{}) {
				oldPS := old.(*apps.StatefulSet)
				curPS := cur.(*apps.StatefulSet)
				if oldPS.Status.Replicas != curPS.Status.Replicas {
					glog.V(4).Infof("Observed updated replica count for StatefulSet: %v, %d->%d", curPS.Name, oldPS.Status.Replicas, curPS.Status.Replicas)
				}
				ssc.enqueueStatefulSet(cur)
			},
			DeleteFunc: ssc.enqueueStatefulSet,
		},
		statefulSetResyncPeriod,
	)
	ssc.setLister = setInformer.Lister()
	ssc.setListerSynced = setInformer.Informer().HasSynced

	// TODO: Watch volumes
	return ssc
}

The code follows the familiar controller pattern: create the corresponding EventBroadcaster, then register EventHandlers on the relevant object informers:

  • The StatefulSetController mainly Lists/Watches Pod and StatefulSet objects.
  • The Pod Informer registers add/update/delete EventHandlers; all three look up the StatefulSet that owns the Pod and add it to the StatefulSet Queue.
  • The StatefulSet Informer likewise registers add/update/delete EventHandlers, which also add the StatefulSet to the StatefulSet Queue.
  • At present the StatefulSetController registers no EventHandler on the PVC Informer, so PVCs continue to be processed by the PVC controller. When the StatefulSet Controller creates a Pod, it calls the apiserver to create the corresponding PVC if it does not already exist; it does not delete PVCs when Pods are removed.
  • The corresponding ControllerRevisions are created and deleted by the StatefulSet Controller itself during Reconcile, similar to how other controllers manage their revision history.

StatefulSetController sync

Next, the StatefulSetController worker takes over (there is only one worker, i.e. a single goroutine). The worker pops a StatefulSet key off the StatefulSet Queue and calls sync to perform the Reconcile operation.
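
The worker loop itself is not quoted in this post. Roughly, it just drains the queue and requeues, with rate limiting, any key whose sync fails; a paraphrased sketch (not the verbatim source, error handling simplified):

// worker runs until the queue is shut down.
func (ssc *StatefulSetController) worker() {
	for ssc.processNextWorkItem() {
	}
}

// processNextWorkItem pops one StatefulSet key, runs sync, and requeues the key
// with rate limiting if sync returns an error.
func (ssc *StatefulSetController) processNextWorkItem() bool {
	key, quit := ssc.queue.Get()
	if quit {
		return false
	}
	defer ssc.queue.Done(key)
	if err := ssc.sync(key.(string)); err != nil {
		utilruntime.HandleError(fmt.Errorf("error syncing StatefulSet %v, requeuing: %v", key, err))
		ssc.queue.AddRateLimited(key)
	} else {
		ssc.queue.Forget(key)
	}
	return true
}

The sync it dispatches to is shown next.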


// sync syncs the given statefulset.
func (ssc *StatefulSetController) sync(key string) error {
	startTime := time.Now()
	defer func() {
		glog.V(4).Infof("Finished syncing statefulset %q (%v)", key, time.Now().Sub(startTime))
	}()

	namespace, name, err := cache.SplitMetaNamespaceKey(key)
	if err != nil {
		return err
	}
	set, err := ssc.setLister.StatefulSets(namespace).Get(name)
	if errors.IsNotFound(err) {
		glog.Infof("StatefulSet has been deleted %v", key)
		return nil
	}
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("unable to retrieve StatefulSet %v from store: %v", key, err))
		return err
	}

	selector, err := metav1.LabelSelectorAsSelector(set.Spec.Selector)
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("error converting StatefulSet %v selector: %v", key, err))
		// This is a non-transient error, so don't retry.
		return nil
	}

	if err := ssc.adoptOrphanRevisions(set); err != nil {
		return err
	}

	pods, err := ssc.getPodsForStatefulSet(set, selector)
	if err != nil {
		return err
	}

	return ssc.syncStatefulSet(set, pods)
}
  • In sync, adoptOrphanRevisions lists all ControllerRevisions that match the set's labels and checks whether any of them has an empty controller OwnerReference. If so, there are orphaned revisions to adopt.

    Note: as soon as one orphaned ControllerRevision is detected, all matching revisions are patched with:
    {"metadata":{"ownerReferences":[{"apiVersion":"%s","kind":"%s","name":"%s","uid":"%s","controller":true,"blockOwnerDeletion":true}],"uid":"%s"}}

  • Call getPodsForStatefulSet to get the Pods that this StatefulSet should manage.

    • Get all Pods in the Namespace of the StatefulSet;
    • Execute the ClaimPods operation: check whether the labels of the set and of the Pod match, and whether the Pod name follows the StatefulSet's naming pattern. Pods that match and whose ControllerRef UID is the same need no processing.
    • If the Pod is owned by this StatefulSet (ControllerRef matches) but the selector or name format no longer matches, execute the ReleasePod operation and patch the Pod: {"metadata":{"ownerReferences":[{"$patch":"delete","uid":"%s"}],"uid":"%s"}}
    • For Pods whose labels and name format match but whose controllerRef is empty, execute AdoptPod and patch the Pod: {"metadata":{"ownerReferences":[{"apiVersion":"%s","kind":"%s","name":"%s","uid":"%s","controller":true,"blockOwnerDeletion":true}],"uid":"%s"}} (the overall decision is sketched below).
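
To make that adopt/keep/release decision concrete, here is a simplified illustration. classifyPod is a hypothetical name (the real logic lives in the shared ControllerRefManager that getPodsForStatefulSet calls), and the name-pattern check is reduced to a prefix test; the real code parses the trailing ordinal, as sketched later in the Identity Match section.

// classifyPod is a hypothetical, simplified illustration of the per-Pod decision
// described above; selector is the StatefulSet's label selector.
func classifyPod(set *apps.StatefulSet, selector labels.Selector, pod *v1.Pod) string {
	controllerRef := metav1.GetControllerOf(pod)
	ownedByUs := controllerRef != nil && controllerRef.UID == set.UID
	// Simplified name-pattern check for "<setName>-<ordinal>"; the real code also
	// parses and validates the trailing ordinal.
	matches := selector.Matches(labels.Set(pod.Labels)) && strings.HasPrefix(pod.Name, set.Name+"-")

	switch {
	case controllerRef != nil && !ownedByUs:
		return "ignore" // owned by a different controller
	case ownedByUs && !matches:
		return "release" // patch our ownerReference away
	case ownedByUs:
		return "keep" // already owned and matching: nothing to do
	case matches:
		return "adopt" // orphan that matches: patch this StatefulSet in as controller
	default:
		return "ignore"
	}
}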

UpdateStatefulSet

The implementation of syncStatefulSet just calls UpdateStatefulSet.

func (ssc *defaultStatefulSetControl) UpdateStatefulSet(set *apps.StatefulSet, pods []*v1.Pod) error {

	// list all revisions and sort them
	revisions, err := ssc.ListRevisions(set)
	if err != nil {
		return err
	}
	history.SortControllerRevisions(revisions)

	// get the current, and update revisions
	currentRevision, updateRevision, collisionCount, err := ssc.getStatefulSetRevisions(set, revisions)
	if err != nil {
		return err
	}

	// perform the main update function and get the status
	status, err := ssc.updateStatefulSet(set, currentRevision, updateRevision, collisionCount, pods)
	if err != nil {
		return err
	}

	// update the set's status
	err = ssc.updateStatefulSetStatus(set, status)
	if err != nil {
		return err
	}

	glog.V(4).Infof("StatefulSet %s/%s pod status replicas=%d ready=%d current=%d updated=%d",
		set.Namespace,
		set.Name,
		status.Replicas,
		status.ReadyReplicas,
		status.CurrentReplicas,
		status.UpdatedReplicas)

	glog.V(4).Infof("StatefulSet %s/%s revisions current=%s update=%s",
		set.Namespace,
		set.Name,
		status.CurrentRevision,
		status.UpdateRevision)

	// maintain the set's revision history limit
	return ssc.truncateHistory(set, pods, revisions, currentRevision, updateRevision)
}

The main process of UpdateStatefulSet is:

  • ListRevisions gets all the ControllerRevisions of the StatefulSet and sorts them by revision number in ascending order.
  • getStatefulSetRevisions obtains the currentRevision and the updateRevision.
    • Only when Partition is not 0 in the RollingUpdate strategy will some Pods be at updateRevision while the rest stay at currentRevision.
    • Otherwise, all Pods end up at currentRevision.
  • updateStatefulSet is the core logic of the StatefulSet Controller. It creates, updates and deletes Pods so that the declared target state is maintained:
    • The target state always has Spec.Replicas Running and Ready Pods.
    • If the update strategy is RollingUpdate and Partition is 0, all Pods end up corresponding to Status.CurrentRevision.
    • If the update strategy is RollingUpdate and Partition is not 0, Pods whose ordinal is smaller than Partition keep Status.CurrentRevision, and Pods whose ordinal is greater than or equal to Partition are updated to Status.UpdateRevision.
    • If the update strategy is OnDelete, an update of a Pod is triggered only when that Pod is deleted; the rollout is not driven automatically by the Revisions.
  • truncateHistory keeps the number of retained History Revisions within Spec.RevisionHistoryLimit (see the pruning sketch below).
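
As a rough illustration of the pruning rule (this is not the actual truncateHistory code; the helper name and the live map are hypothetical), assume revisions is already sorted oldest-first and live marks revisions still in use (currentRevision, updateRevision, or any revision a live Pod points to):

// revisionsToDelete returns the oldest non-live revisions that exceed the history
// limit; revisions still in use are never returned.
func revisionsToDelete(revisions []*apps.ControllerRevision, live map[string]bool, historyLimit int) []*apps.ControllerRevision {
	nonLive := make([]*apps.ControllerRevision, 0, len(revisions))
	for _, rev := range revisions {
		if !live[rev.Name] {
			nonLive = append(nonLive, rev) // keeps the oldest-first order
		}
	}
	if excess := len(nonLive) - historyLimit; excess > 0 {
		return nonLive[:excess]
	}
	return nil
}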

updateStatefulSet

updateStatefulSet is the core of the entire StatefulSetController.


func (ssc *defaultStatefulSetControl) updateStatefulSet(
	set *apps.StatefulSet,
	currentRevision *apps.ControllerRevision,
	updateRevision *apps.ControllerRevision,
	collisionCount int32,
	pods []*v1.Pod) (*apps.StatefulSetStatus, error) {
	// get the current and update revisions of the set.
	currentSet, err := ApplyRevision(set, currentRevision)
	if err != nil {
		return nil, err
	}
	updateSet, err := ApplyRevision(set, updateRevision)
	if err != nil {
		return nil, err
	}

	// set the generation, and revisions in the returned status
	status := apps.StatefulSetStatus{}
	status.ObservedGeneration = new(int64)
	*status.ObservedGeneration = set.Generation
	status.CurrentRevision = currentRevision.Name
	status.UpdateRevision = updateRevision.Name
	status.CollisionCount = new(int32)
	*status.CollisionCount = collisionCount

	replicaCount := int(*set.Spec.Replicas)
	// slice that will contain all Pods such that 0 <= getOrdinal(pod) < set.Spec.Replicas
	replicas := make([]*v1.Pod, replicaCount)
	// slice that will contain all Pods such that set.Spec.Replicas <= getOrdinal(pod)
	condemned := make([]*v1.Pod, 0, len(pods))
	unhealthy := 0
	firstUnhealthyOrdinal := math.MaxInt32
	var firstUnhealthyPod *v1.Pod

	// First we partition pods into two lists valid replicas and condemned Pods
	for i := range pods {
		status.Replicas++

		// count the number of running and ready replicas
		if isRunningAndReady(pods[i]) {
			status.ReadyReplicas++
		}

		// count the number of current and update replicas
		if isCreated(pods[i]) && !isTerminating(pods[i]) {
			if getPodRevision(pods[i]) == currentRevision.Name {
				status.CurrentReplicas++
			} else if getPodRevision(pods[i]) == updateRevision.Name {
				status.UpdatedReplicas++
			}
		}

		if ord := getOrdinal(pods[i]); 0 <= ord && ord < replicaCount {
			// if the ordinal of the pod is within the range of the current number of replicas,
			// insert it at the indirection of its ordinal
			replicas[ord] = pods[i]

		} else if ord >= replicaCount {
			// if the ordinal is greater than the number of replicas add it to the condemned list
			condemned = append(condemned, pods[i])
		}
		// If the ordinal could not be parsed (ord < 0), ignore the Pod.
	}

	// for any empty indices in the sequence [0,set.Spec.Replicas) create a new Pod at the correct revision
	for ord := 0; ord < replicaCount; ord++ {
		if replicas[ord] == nil {
			replicas[ord] = newVersionedStatefulSetPod(
				currentSet,
				updateSet,
				currentRevision.Name,
				updateRevision.Name, ord)
		}
	}

	// sort the condemned Pods by their ordinals
	sort.Sort(ascendingOrdinal(condemned))

	// find the first unhealthy Pod
	for i := range replicas {
		if !isHealthy(replicas[i]) {
			unhealthy++
			if ord := getOrdinal(replicas[i]); ord < firstUnhealthyOrdinal {
				firstUnhealthyOrdinal = ord
				firstUnhealthyPod = replicas[i]
			}
		}
	}

	for i := range condemned {
		if !isHealthy(condemned[i]) {
			unhealthy++
			if ord := getOrdinal(condemned[i]); ord < firstUnhealthyOrdinal {
				firstUnhealthyOrdinal = ord
				firstUnhealthyPod = condemned[i]
			}
		}
	}

	if unhealthy > 0 {
		glog.V(4).Infof("StatefulSet %s/%s has %d unhealthy Pods starting with %s",
			set.Namespace,
			set.Name,
			unhealthy,
			firstUnhealthyPod.Name)
	}

	// If the StatefulSet is being deleted, don't do anything other than updating
	// status.
	if set.DeletionTimestamp != nil {
		return &status, nil
	}

	monotonic := !allowsBurst(set)

	// Examine each replica with respect to its ordinal
	for i := range replicas {
		// delete and recreate failed pods
		if isFailed(replicas[i]) {
			glog.V(4).Infof("StatefulSet %s/%s is recreating failed Pod %s",
				set.Namespace,
				set.Name,
				replicas[i].Name)
			if err := ssc.podControl.DeleteStatefulPod(set, replicas[i]); err != nil {
				return &status, err
			}
			if getPodRevision(replicas[i]) == currentRevision.Name {
				status.CurrentReplicas--
			} else if getPodRevision(replicas[i]) == updateRevision.Name {
				status.UpdatedReplicas--
			}
			status.Replicas--
			replicas[i] = newVersionedStatefulSetPod(
				currentSet,
				updateSet,
				currentRevision.Name,
				updateRevision.Name,
				i)
		}
		// If we find a Pod that has not been created we create the Pod
		if !isCreated(replicas[i]) {
			if err := ssc.podControl.CreateStatefulPod(set, replicas[i]); err != nil {
				return &status, err
			}
			status.Replicas++
			if getPodRevision(replicas[i]) == currentRevision.Name {
				status.CurrentReplicas++
			} else if getPodRevision(replicas[i]) == updateRevision.Name {
				status.UpdatedReplicas++
			}

			// if the set does not allow bursting, return immediately
			if monotonic {
				return &status, nil
			}
			// pod created, no more work possible for this round
			continue
		}
		// If we find a Pod that is currently terminating, we must wait until graceful deletion
		// completes before we continue to make progress.
		if isTerminating(replicas[i]) && monotonic {
			glog.V(4).Infof(
				"StatefulSet %s/%s is waiting for Pod %s to Terminate",
				set.Namespace,
				set.Name,
				replicas[i].Name)
			return &status, nil
		}
		// If we have a Pod that has been created but is not running and ready we can not make progress.
		// We must ensure that all for each Pod, when we create it, all of its predecessors, with respect to its
		// ordinal, are Running and Ready.
		if !isRunningAndReady(replicas[i]) && monotonic {
			glog.V(4).Infof(
				"StatefulSet %s/%s is waiting for Pod %s to be Running and Ready",
				set.Namespace,
				set.Name,
				replicas[i].Name)
			return &status, nil
		}
		// Enforce the StatefulSet invariants
		if identityMatches(set, replicas[i]) && storageMatches(set, replicas[i]) {
			continue
		}
		// Make a deep copy so we don't mutate the shared cache
		replica := replicas[i].DeepCopy()
		if err := ssc.podControl.UpdateStatefulPod(updateSet, replica); err != nil {
			return &status, err
		}
	}

	// At this point, all of the current Replicas are Running and Ready, we can consider termination.
	// We will wait for all predecessors to be Running and Ready prior to attempting a deletion.
	// We will terminate Pods in a monotonically decreasing order over [len(pods),set.Spec.Replicas).
	// Note that we do not resurrect Pods in this interval. Also note that scaling will take precedence over
	// updates.
	for target := len(condemned) - 1; target >= 0; target-- {
		// wait for terminating pods to expire
		if isTerminating(condemned[target]) {
			glog.V(4).Infof(
				"StatefulSet %s/%s is waiting for Pod %s to Terminate prior to scale down",
				set.Namespace,
				set.Name,
				condemned[target].Name)
			// block if we are in monotonic mode
			if monotonic {
				return &status, nil
			}
			continue
		}
		// if we are in monotonic mode and the condemned target is not the first unhealthy Pod block
		if !isRunningAndReady(condemned[target]) && monotonic && condemned[target] != firstUnhealthyPod {
			glog.V(4).Infof(
				"StatefulSet %s/%s is waiting for Pod %s to be Running and Ready prior to scale down",
				set.Namespace,
				set.Name,
				firstUnhealthyPod.Name)
			return &status, nil
		}
		glog.V(4).Infof("StatefulSet %s/%s terminating Pod %s for scale dowm",
			set.Namespace,
			set.Name,
			condemned[target].Name)

		if err := ssc.podControl.DeleteStatefulPod(set, condemned[target]); err != nil {
			return &status, err
		}
		if getPodRevision(condemned[target]) == currentRevision.Name {
			status.CurrentReplicas--
		} else if getPodRevision(condemned[target]) == updateRevision.Name {
			status.UpdatedReplicas--
		}
		if monotonic {
			return &status, nil
		}
	}

	// for the OnDelete strategy we short circuit. Pods will be updated when they are manually deleted.
	if set.Spec.UpdateStrategy.Type == apps.OnDeleteStatefulSetStrategyType {
		return &status, nil
	}

	// we compute the minimum ordinal of the target sequence for a destructive update based on the strategy.
	updateMin := 0
	if set.Spec.UpdateStrategy.RollingUpdate != nil {
		updateMin = int(*set.Spec.UpdateStrategy.RollingUpdate.Partition)
	}
	// we terminate the Pod with the largest ordinal that does not match the update revision.
	for target := len(replicas) - 1; target >= updateMin; target-- {

		// delete the Pod if it is not already terminating and does not match the update revision.
		if getPodRevision(replicas[target]) != updateRevision.Name && !isTerminating(replicas[target]) {
			glog.V(4).Infof("StatefulSet %s/%s terminating Pod %s for update",
				set.Namespace,
				set.Name,
				replicas[target].Name)
			err := ssc.podControl.DeleteStatefulPod(set, replicas[target])
			status.CurrentReplicas--
			return &status, err
		}

		// wait for unhealthy Pods on update
		if !isHealthy(replicas[target]) {
			glog.V(4).Infof(
				"StatefulSet %s/%s is waiting for Pod %s to update",
				set.Namespace,
				set.Name,
				replicas[target].Name)
			return &status, nil
		}

	}
	return &status, nil
}

Main process:

  • Get the StatefulSet objects corresponding to currentRevision and updateRevision (via ApplyRevision), and record the generation, currentRevision, updateRevision and other fields in the StatefulSet status.

  • Divide the Pods obtained by getPodsForStatefulSet into two slices:

    • valid replicas slice: 0 <= getOrdinal(pod) < set.Spec.Replicas
    • condemned pods slice: set.Spec.Replicas <= getOrdinal(pod)
  • If some ordinals in the valid replicas slice have no corresponding Pod, build a Pod object at the appropriate revision; the actual Pod instance is created later, once the loop detects that the Pod has not been created yet:

    • If the update strategy is RollingUpdate and ordinal < Partition, the Pod object is built with currentRevision.
    • If the update strategy is RollingUpdate and ordinal >= Partition (Partition defaults to 0), the Pod object is built with updateRevision. (Once a rollout completes, currentRevision is set to updateRevision, so with Partition 0 the two coincide.)
  • Find the first unhealthy Pod across the valid replicas and condemned pods slices (the unhealthy Pod with the smallest ordinal).

    A healthy Pod means: the Pod is Running and Ready, and not terminating.

  • If the StatefulSet is being deleted (DeletionTimestamp is not empty), do nothing else and return the current status directly.

  • Traverse the Pods in valid replicas to ensure that the Pods with ordinal in [0, spec.replicas) are Running and Ready:

    • If a Pod is Failed (pod.Status.Phase == Failed), delete it and rebuild the Pod object at the matching revision.
    • If a Pod has not actually been created yet, create it.
    • After creating a Pod: if PodManagementPolicy is "OrderedReady", return the current status immediately; if it is "Parallel", continue the loop with the next ordinal.
    • If a Pod is terminating and PodManagementPolicy is "OrderedReady", return the status and end this round.
    • If a Pod is not Running and Ready and PodManagementPolicy is "OrderedReady", return the status and end this round.
    • Check whether the Pod matches the identity and storage of the StatefulSet. If either does not match, call UpdateStatefulPod against the apiserver to update the identity and storage (creating the missing PVCs if necessary), then continue with the next replica.

    A Pod being Running and Ready means:
    pod.Status.Phase == Running, and
    the Ready condition in pod.Status.Conditions is True
    (see the predicate sketch after this list).

  • Traverse the Pods in the condemned slice, ordinal from large to small, to ensure that these Pods are eventually deleted:

    • If a Pod is already terminating (DeletionTimestamp set) and PodManagementPolicy is OrderedReady, block: return the status and end this round.
    • If PodManagementPolicy is OrderedReady, the Pod is not Running and Ready, and it is not the first unhealthy Pod, return the status and end this round.
    • Otherwise, delete the StatefulSet Pod.
    • Read the Pod's revision from its controller-revision-hash label; if it equals currentRevision, decrement status.CurrentReplicas; if it equals updateRevision, decrement status.UpdatedReplicas.
    • If PodManagementPolicy is OrderedReady, return the status and end this round.
  • OnDelete update strategy: updates are triggered only by deleting Pods. If UpdateStrategy.Type is OnDelete, recreation happens only when the corresponding Pods are deleted manually, so the status is returned directly and the round ends.

  • RollingUpdate update strategy (a Partition that is not set equals 0, meaning all Pods are updated in a rolling fashion): updateMin is taken from the Partition configured in RollingUpdate as the lower bound of the update index range, and the valid replicas are traversed in descending ordinal order from the maximum down to updateMin:

    • If the Pod's revision is not updateRevision and the Pod is not already terminating, delete it, decrement status.CurrentReplicas, return the status, and the round ends.
    • If the Pod is not healthy, wait for it to become healthy: the status is returned directly and the round ends.
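
For reference, the health predicates used throughout this flow behave roughly as follows (paraphrased from the helpers in the statefulset package, with the Ready-condition lookup inlined):

// isCreated: the Pod object has been persisted (it has been assigned a phase).
func isCreated(pod *v1.Pod) bool {
	return pod.Status.Phase != ""
}

// isFailed: the Pod has permanently failed.
func isFailed(pod *v1.Pod) bool {
	return pod.Status.Phase == v1.PodFailed
}

// isTerminating: a deletion has been requested but has not completed yet.
func isTerminating(pod *v1.Pod) bool {
	return pod.DeletionTimestamp != nil
}

// isRunningAndReady: the Pod is in phase Running and its Ready condition is True.
func isRunningAndReady(pod *v1.Pod) bool {
	if pod.Status.Phase != v1.PodRunning {
		return false
	}
	for _, cond := range pod.Status.Conditions {
		if cond.Type == v1.PodReady {
			return cond.Status == v1.ConditionTrue
		}
	}
	return false
}

// isHealthy: Running and Ready, and not terminating.
func isHealthy(pod *v1.Pod) bool {
	return isRunningAndReady(pod) && !isTerminating(pod)
}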

Identity Match

During the updateStatefulSet Reconcile, the identity match is checked. What does it cover?


StatefulSetPodNameLabel        = "statefulset.kubernetes.io/pod-name"


// identityMatches returns true if pod has a valid identity and network identity for a member of set.
func identityMatches(set *apps.StatefulSet, pod *v1.Pod) bool {
	parent, ordinal := getParentNameAndOrdinal(pod)
	return ordinal >= 0 &&
		set.Name == parent &&
		pod.Name == getPodName(set, ordinal) &&
		pod.Namespace == set.Namespace &&
		pod.Labels[apps.StatefulSetPodNameLabel] == pod.Name
}
  • The Pod name matches the <StatefulSet name>-<ordinal> format.
  • The namespaces match.
  • The Pod's label statefulset.kubernetes.io/pod-name matches the Pod name (see the parsing sketch below).
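
The <name>-<ordinal> parsing behind this check can be sketched like this (paraphrased, consistent with the behavior described above; a name that does not end in -<number> yields ordinal -1):

// statefulPodRegex splits a Pod name like "web-2" into the parent set name "web"
// and the ordinal 2.
var statefulPodRegex = regexp.MustCompile(`(.*)-([0-9]+)$`)

func getParentNameAndOrdinal(pod *v1.Pod) (string, int) {
	parent := ""
	ordinal := -1
	subMatches := statefulPodRegex.FindStringSubmatch(pod.Name)
	if len(subMatches) < 3 {
		return parent, ordinal
	}
	parent = subMatches[1]
	if i, err := strconv.ParseInt(subMatches[2], 10, 32); err == nil {
		ordinal = int(i)
	}
	return parent, ordinal
}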

Storage Match

During the updateStatefulSet Reconcile, the storage match is also checked. How does it match?

// storageMatches returns true if pod's Volumes cover the set of PersistentVolumeClaims
func storageMatches(set *apps.StatefulSet, pod *v1.Pod) bool {
	ordinal := getOrdinal(pod)
	if ordinal < 0 {
		return false
	}
	volumes := make(map[string]v1.Volume, len(pod.Spec.Volumes))
	for _, volume := range pod.Spec.Volumes {
		volumes[volume.Name] = volume
	}
	for _, claim := range set.Spec.VolumeClaimTemplates {
		volume, found := volumes[claim.Name]
		if !found ||
			volume.VolumeSource.PersistentVolumeClaim == nil ||
			volume.VolumeSource.PersistentVolumeClaim.ClaimName !=
				getPersistentVolumeClaimName(set, &claim, ordinal) {
			return false
		}
	}
	return true
}
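
The PVC name checked here follows the convention <volumeClaimTemplate name>-<StatefulSet name>-<ordinal>, e.g. www-web-0 for template www of StatefulSet web at ordinal 0. A paraphrased sketch of getPersistentVolumeClaimName:

// getPersistentVolumeClaimName returns the name of the PVC that backs the given
// volumeClaimTemplate for the Pod at the given ordinal.
func getPersistentVolumeClaimName(set *apps.StatefulSet, claim *v1.PersistentVolumeClaim, ordinal int) string {
	return fmt.Sprintf("%s-%s-%d", claim.Name, set.Name, ordinal)
}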

Code Logic Diagram

Based on the above analysis, the following is a relatively complete code logic diagram of the StatefulSetController. (The hosting site does not support images larger than 2 MB, so it is not very sharp, but everything in it has been covered in the text above.)


Thoughts

An exception occurs during a rolling update

A question was left open in the previous blog post's analysis of Kubernetes StatefulSet: what happens if a Pod fails to update while the StatefulSet is being rolling-updated?

From the rolling update part of the source code analysis above, we know:

  • If UpdateStrategy.Type is RollingUpdate (a Partition that is not set equals 0, meaning all Pods are rolled), updateMin is taken from the Partition in RollingUpdate as the lower bound of the update index range, and the valid replicas are traversed in descending ordinal order from the maximum down to updateMin:
    • If the Pod's revision is not updateRevision and the Pod is not already terminating, delete it, decrement status.CurrentReplicas, return the status, and the round ends.
    • If the Pod is not healthy, wait for it to become healthy: the status is returned directly and the round ends.

Knowing this, the answer to this question is simple:

  • If the update strategy is RollingUpdate, Pods are updated one at a time. If the Pod at some ordinal cannot reach the Running and Ready state after being updated, the whole rolling update blocks there: replicas that have not yet been updated are not touched, replicas that were already updated successfully keep the updated version, and there is no automatic rollback. If a later sync detects that the Pod is Failed (pod.Status.Phase == Failed), the failed Pod is deleted and recreated.

When podManagementPolicy is set to Parallel, where is it reflected?

Question: when does podManagementPolicy: "Parallel" take effect? During scaling? During a RollingUpdate?

  • In the code analysis above, in the part of updateStatefulSet that "traverses the Pods in valid replicas to ensure that the Pods with ordinal in [0, spec.replicas) are Running and Ready": if a replica at some ordinal should exist but has not been created yet, a create is triggered. With podManagementPolicy set to Parallel, the controller goes on to create the other missing replicas without waiting for the previously created ones to become Running and Ready.
  • In the part that "traverses the Pods in the condemned slice, ordinal from large to small, to ensure that these Pods are eventually deleted": with podManagementPolicy set to Parallel, if a replica is found to be terminating, the controller continues deleting the other replicas that should be deleted without waiting for the earlier deletions to complete or for other Pods to be Running and Ready.

Therefore Parallel is reflected in the following scenarios:

  • When the StatefulSet is initially deployed, create pods in parallel.
  • When deleting a StatefulSet in a cascade, delete pods in parallel.
  • When Scale up, create pods in parallel.
  • When Scale down, delete pods in parallel.

A rolling update itself is not affected by podManagementPolicy: RollingUpdate always proceeds one Pod at a time, in descending ordinal order, and waits for each updated Pod to become Running and Ready before moving on to the next.
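
This is visible in the code: monotonic := !allowsBurst(set) looks only at the Pod management policy, and the rolling-update loop at the end of updateStatefulSet returns after every destructive action regardless of it. A paraphrased sketch of the helper:

// allowsBurst reports whether the set may act on several Pods in one pass, i.e.
// whether PodManagementPolicy is Parallel. The update strategy is not consulted.
func allowsBurst(set *apps.StatefulSet) bool {
	return set.Spec.PodManagementPolicy == apps.ParallelPodManagement
}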

What about the OnDelete update strategy? That case is different from RollingUpdate: the update is then carried out through the create and delete stages described above, so Parallel does take effect there.
