[Repost] Getting started from scratch K8s | Stateful Application Orchestration – StatefulSet

https://www.kubernetes.org.cn/6724.html

 

Author | Jiuzhu Alibaba Technical Expert

This article is compiled from Lecture 22 of "CNCF x Alibaba Cloud Native Technology Open Course".

Introduction: Deploying and delivering stateful applications has always been one of the hard problems in application operations. Typical stateful requirements include persistent on-disk state, an independent and stable network identity for each instance, and a deterministic release order. To address these problems, Kubernetes provides the StatefulSet controller, a workload that helps stateful applications be deployed and run in a K8s environment.

1. "Stateful" requirements

We discussed Deployment earlier as an application orchestration and management tool. What capabilities does it give us?

As shown below:

1.png

  • First, it lets us define the expected number of Pods. The Controller keeps the number of Pods at the desired version equal to that expected number;
  • Second, it lets us configure how Pods are released. Once configured, the Controller updates Pods according to the strategy we specify, and during the update it also keeps the number of unavailable Pods within the range we define;
  • Third, if we run into problems during a release, Deployment also supports one-click rollback.

Put simply, Deployment treats all Pods of the same version that it manages as identical replicas. In other words, from the Deployment Controller's point of view, all Pods of the same version are exactly the same, regardless of the application deployed inside them or how it behaves.

This capability is sufficient for stateless applications. But what if we have stateful applications?

Requirements analysis

For example, some requirements shown in the figure below:
2.png

None of the above requirements can be met by Deployment, so the Kubernetes community provides a resource called StatefulSet for managing stateful applications.

StatefulSet: controller for stateful application management

In fact, many stateless applications in the community are also managed through StatefulSet. After studying this article, you will understand why some stateless applications are managed that way too.

3.png

As shown on the right side of the figure above, the Pods in a StatefulSet are all numbered, starting from 0 and going up to the defined number of replicas minus one. Each Pod has an independent network identity (a hostname) and its own PVC and PV storage. As a result, different Pods under the same StatefulSet have different network identities and their own dedicated storage disks, which meets the needs of most stateful applications.

As shown on the right side of the above picture:

  • First, each Pod has an ordinal number, and Pods are created, deleted, and updated in order of that number;
  • Second, by configuring a headless Service, each Pod gets a unique network identity (hostname);
  • Third, by configuring a PVC template (volumeClaimTemplates), each Pod gets one or more PV storage disks;
  • Finally, partial grayscale (canary) releases are supported. For example, for a StatefulSet with three replicas, we can choose to upgrade only one or two of them, or all three, to the new version, which achieves a grayscale upgrade.
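
The numbering and network identity described above can be sketched in a few lines of Python. This is only an illustration: the names nginx-web and nginx are the ones used in the example later in this article, while the "default" namespace and the "cluster.local" cluster domain are assumptions that may differ in a real cluster.

```python
def pod_identities(statefulset, service, replicas,
                   namespace="default", cluster_domain="cluster.local"):
    """Return the stable Pod name and DNS name for ordinals 0..replicas-1."""
    identities = []
    for ordinal in range(replicas):
        pod_name = f"{statefulset}-{ordinal}"
        identities.append({
            "ordinal": ordinal,
            "pod": pod_name,
            # DNS name published through the headless Service (assumed defaults).
            "dns": f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}",
        })
    return identities


ids = pod_identities("nginx-web", "nginx", 3)
for identity in ids:
    print(identity["pod"], identity["dns"])
```

Because the name depends only on the StatefulSet name and the ordinal, a Pod that is deleted and recreated with the same ordinal gets the same identity back.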

2. Use case interpretation

StatefulSet example creation

4.png

The left side of the figure above is a Service configuration. By configuring a headless Service, what we want is for each Pod in the StatefulSet to have an independent network identity. The Service here is named nginx.

The right side of the figure above is a StatefulSet configuration. Its spec contains a serviceName that is also nginx; this serviceName specifies which Service the StatefulSet corresponds to.

The spec contains several other familiar fields, such as selector and template. selector is a label selector, and the selection logic it defines must match the labels in the template's metadata, which include app: nginx. The template defines an nginx container that uses the alpine version of the image and exposes port 80 as a web service.

Finally, template.spec defines a volumeMounts entry. This volumeMounts does not come from a volumes entry in the spec but from volumeClaimTemplates, that is, the PVC template. The PVC template defines a PVC named www-storage. We use the same name as the volume name in volumeMounts and mount it at the /usr/share/nginx/html directory. In this way, each Pod has its own PVC mounted at the corresponding directory in the container.
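
Since the figure itself is not reproduced here, the two objects can be approximated as Python dictionaries that mirror the manifest structure. The field values named in the text (nginx, nginx-web, app: nginx, www-storage, port 80, the alpine image, the mount path) come from the article; the access mode, the 1Gi storage request, and the port name are assumptions added only to make the sketch complete. The check function is a hypothetical helper that verifies the relationships the text describes.

```python
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "nginx"},
    "spec": {
        "clusterIP": "None",  # "None" is what makes the Service headless
        "selector": {"app": "nginx"},
        "ports": [{"port": 80, "name": "web"}],  # port name is an assumption
    },
}

statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "nginx-web"},
    "spec": {
        "serviceName": "nginx",
        "replicas": 3,
        "selector": {"matchLabels": {"app": "nginx"}},
        "template": {
            "metadata": {"labels": {"app": "nginx"}},
            "spec": {"containers": [{
                "name": "nginx",
                "image": "nginx:alpine",
                "ports": [{"containerPort": 80, "name": "web"}],
                "volumeMounts": [{"name": "www-storage",
                                  "mountPath": "/usr/share/nginx/html"}],
            }]},
        },
        "volumeClaimTemplates": [{
            "metadata": {"name": "www-storage"},
            "spec": {
                "accessModes": ["ReadWriteOnce"],               # assumption
                "resources": {"requests": {"storage": "1Gi"}},  # assumption
            },
        }],
    },
}


def check_consistency(svc, sts):
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    spec = sts["spec"]
    if spec["serviceName"] != svc["metadata"]["name"]:
        problems.append("serviceName does not match the Service name")
    if svc["spec"].get("clusterIP") != "None":
        problems.append("Service is not headless")
    labels = spec["template"]["metadata"]["labels"]
    for key, value in spec["selector"]["matchLabels"].items():
        if labels.get(key) != value:
            problems.append(f"selector {key}={value} does not match template labels")
    pod_spec = spec["template"]["spec"]
    known = {v["name"] for v in pod_spec.get("volumes", [])}
    known |= {c["metadata"]["name"] for c in spec.get("volumeClaimTemplates", [])}
    for container in pod_spec["containers"]:
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in known:
                problems.append(f"volumeMount {mount['name']} has no backing volume")
    return problems


print(check_consistency(service, statefulset))
```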

Service, StatefulSet state

5.png

After creating the two objects above, the get command shows that the Service nginx has been created successfully.

Looking at the endpoints, you can also see that three IPs and ports have been registered as backends. The three IPs are the Pods' IPs, and the port is the port 80 configured in the spec earlier.

Finally, get sts (short for StatefulSet) nginx-web. The result has a READY column with the value 3/3. The denominator 3 is the desired number of replicas in the StatefulSet, and the numerator 3 is the number of Pods that have reached the READY state.

Pod, PVC status

The get pod output in the figure below shows that the three Pods are Running and READY. Their IPs are the endpoint addresses seen earlier.

6.png

get pvc shows the PVC names in the NAME column: the prefix is www-storage, the middle part is nginx-web, and the suffix is a serial number. From this we can tell that www-storage is the name defined in volumeClaimTemplates, the middle part is the name of the StatefulSet, and the serial number at the end matches the serial number of the Pod. In other words, the three PVCs are bound to the three Pods respectively. In this way, different Pods get different PVCs, and each PVC binds a corresponding PV, so different Pods end up bound to different PVs.
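
The naming rule for these PVCs is simple enough to write down as a small sketch. The helper name pvc_name is hypothetical; the three parts of the name are the ones identified in the paragraph above.

```python
def pvc_name(claim_template, statefulset, ordinal):
    """Build the PVC name from the claim template name, StatefulSet name and ordinal."""
    return f"{claim_template}-{statefulset}-{ordinal}"


# Map each Pod of the example StatefulSet to the PVC it uses.
pvcs = {f"nginx-web-{i}": pvc_name("www-storage", "nginx-web", i) for i in range(3)}
for pod, pvc in pvcs.items():
    print(pod, "->", pvc)
```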

Pod version

7.png

Earlier we learned that Deployment uses ReplicaSets to manage Pod versions and the expected number of Pods. A StatefulSet, by contrast, has its own Controller manage the Pods directly, so it uses a Pod label to identify the version of each Pod. That label is called controller-revision-hash, and it is similar to the pod-template-hash label that a Deployment and its ReplicaSets inject into Pods.

As shown in the figure above, get pod displays the controller-revision-hash. The hash here is the template version the Pods were first created with; its suffix is 677759c9b8. Note it down for now, then upgrade the Pods and see whether the controller-revision-hash changes.

Update the image

8.png

After running the command shown at the top of the figure, the StatefulSet configuration in the lower part of the figure shows that the image has been updated to the new mainline version.

View new version status

9.png

Querying the revision hash with get pod shows that the controller-revision-hash of all three Pods has changed to the new revision hash, whose suffix is now 7c55499668. From the creation times of the three Pods, you can see that the Pod with serial number 2 was created earliest, followed by serial numbers 1 and 0. This means the actual upgrade order was 2-1-0: the Pods were upgraded to the new version one at a time in reverse order. The upgraded Pods also reuse the PVCs used by the previous Pods, so the data on the PV storage disks is still mounted in the new Pods.

The upper right of the figure above shows the data in the StatefulSet's status. There are several important fields:

  • currentReplicas: the number of Pods at the current version
  • currentRevision: the current version number
  • updatedReplicas: the number of Pods at the new version
  • updateRevision: the version number being updated to

You can also see that currentReplicas and updatedReplicas are equal, as are currentRevision and updateRevision, which means that all Pods have been upgraded to the required version.
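
The comparison described above can be expressed as a small check over the status fields. This is a minimal sketch: the function name is hypothetical, and the full revision names assume that a revision is named after the StatefulSet plus the hash suffixes quoted in this article.

```python
def rollout_complete(status, desired_replicas):
    """True when every Pod is on the update revision, judged from status alone."""
    return (
        status["currentRevision"] == status["updateRevision"]
        and status["currentReplicas"] == desired_replicas
        and status["updatedReplicas"] == desired_replicas
    )


finished = {
    "currentReplicas": 3,
    "updatedReplicas": 3,
    "currentRevision": "nginx-web-7c55499668",
    "updateRevision": "nginx-web-7c55499668",
}
# Hypothetical snapshot taken partway through the upgrade.
in_progress = {
    "currentReplicas": 2,
    "updatedReplicas": 1,
    "currentRevision": "nginx-web-677759c9b8",
    "updateRevision": "nginx-web-7c55499668",
}

print(rollout_complete(finished, 3), rollout_complete(in_progress, 3))
```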

3. Operation demonstration

StatefulSet orchestration file

First, we are connected to an Alibaba Cloud cluster that has three nodes.
10.png

Now let's create a StatefulSet and its corresponding Service. First, look at the orchestration file.
11.png

As shown in the figure above, the Service named nginx exposes port 80. In the StatefulSet configuration, metadata defines the name nginx-web, the containers in the template define the image information, and volumeClaimTemplates at the end defines the PVC template.

Start creating

12.png

After running the command above, the Service and StatefulSet have been created. get pod shows that the first Pod created has serial number 0, and get pvc shows that the PVC with serial number 0 has already been bound to a PV.

13.png

The Pod with serial number 0 has been created and its status is ContainerCreating.

14.png

Once the Pod with serial number 0 has been created, the Pod with serial number 1 starts to be created, and a new PVC is created for it as well. The Pod with serial number 2 follows.

15.png

You can see that a PVC is created before each Pod. After the PVC is created and bound to a PV, the Pod moves from Pending to ContainerCreating and finally to Running.

View status

Then check the state of StatefulSet through kubectl get sts nginx-web -o yaml.

16.png

As shown in the figure above, the expected number of replicas is 3, the currently available number is 3, and all of them are at the latest version.

17.png

Next, look at the Service and endpoints: the Service port is 80, and the endpoints contain three corresponding IP addresses.

18.png

Running get pod again shows that the three Pods have the IP addresses listed in the endpoints above.

To sum up the results so far: three PVCs and three Pods have reached the desired state, and the status reported by the StatefulSet shows 3 for both replicas and currentReplicas.

Upgrade operation

19.png

Here, kubectl set image is the fixed syntax for setting an image; statefulset is the resource type; nginx-web is the resource name; and in nginx=nginx:mainline, the nginx before the equals sign is the container name we defined in the template, while the nginx:mainline after it is the image version to update to.

With the command above, the image in the StatefulSet has been updated to the new version.
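
To make the parts of the command explicit, the following sketch assembles the same argument list in Python. The helper name set_image_command is hypothetical, and the snippet only builds and prints the command line; it does not run kubectl.

```python
import shlex


def set_image_command(resource_type, resource_name, container, image):
    """Assemble the argv for `kubectl set image TYPE NAME CONTAINER=IMAGE`."""
    return ["kubectl", "set", "image", resource_type, resource_name,
            f"{container}={image}"]


cmd = set_image_command("statefulset", "nginx-web", "nginx", "nginx:mainline")
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to pass to subprocess.run later without worrying about shell quoting.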

20.png

Checking with get pod, nginx-web-1 and nginx-web-2 are already Running, and their controller-revision-hash is the new version. For nginx-web-0, the old Pod has been deleted and the new Pod is still in the ContainerCreating state.

21.png

Checking again, all Pods are now Running.

22.png

Looking at the StatefulSet, the currentRevision in its status has been updated to the new version, indicating that all three Pods owned by the StatefulSet are now on the new version.

23.png

How can we check whether these three Pods still reuse the previous network identity and storage disks?

In fact, the hostname configured through the headless Service depends only on the Pod name, so as long as the upgraded Pod has the same name as the old Pod, it keeps the network identity the old Pod used.

As for storage, the figure above shows the state of the PVCs. Their creation times have not changed; they are still the times at which the Pods were first created. So the upgraded Pods are using the same PVCs as the old Pods.

24.png

For example, inspect one of the Pods. It has a declared volume whose persistentVolumeClaim name, www-storage-nginx-web-0, corresponds to the PVC with serial number 0 in the PVC list, the one previously used by the old Pod. During the upgrade, the Controller deletes the old Pod and creates a new Pod with the same name, and the new Pod reuses the old Pod's PVC.

In this way, both the network identity and the storage are reused across the upgrade.

4. Architecture design

Management Mode

StatefulSet may create three types of resources.

  • The first resource: ControllerRevision

Through this resource, StatefulSet can conveniently manage different versions of the Pod template.

For example, for the nginx StatefulSet above, a ControllerRevision is created for the first template version when the StatefulSet is created. When the image version is modified, the StatefulSet Controller creates a new ControllerRevision. You can think of each ControllerRevision as corresponding to one version of the template and to one controller-revision-hash. In fact, the controller-revision-hash defined in the Pod label is the name of the ControllerRevision. The StatefulSet Controller uses this resource to manage the different template versions.

  • The second resource: PVC

If volumeClaimTemplates is defined in the StatefulSet spec, the StatefulSet creates PVCs from the template before creating each Pod and adds them to the Pod's volumes. You can also leave the PVC template out of the spec, in which case the created Pods do not mount a PV of their own.

  • The third resource: Pod

StatefulSet creates, deletes, and updates Pods in sequence, and each Pod has a unique serial number.
25.png

As shown in the figure above, the StatefulSet Controller owns three kinds of resources: ControllerRevision, Pod, and PVC.

The difference is that the current version of StatefulSet only adds an OwnerReference to ControllerRevisions and Pods, not to PVCs. As mentioned in earlier articles in this series, a resource that carries an OwnerReference is cascade-deleted by default when its owning resource is deleted. Therefore, when a StatefulSet is deleted, the ControllerRevisions and Pods it created are deleted by default, but the PVCs are not cascade-deleted because no OwnerReference was written to them.
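
The effect of the missing OwnerReference can be illustrated with a toy model of cascading deletion. This is a deliberately simplified sketch and not how the garbage collector is implemented: objects are plain dictionaries, and the "owners" key is a hypothetical stand-in for metadata.ownerReferences.

```python
def delete_with_cascade(owner, objects):
    """Split objects into (deleted, kept) when `owner` is deleted with default cascading."""
    deleted, kept = [], []
    for obj in objects:
        if owner in obj["owners"]:
            deleted.append(obj["name"])  # owned: removed together with the owner
        else:
            kept.append(obj["name"])     # no OwnerReference: left behind
    return deleted, kept


objects = [
    {"name": "controllerrevision/nginx-web-7c55499668", "owners": ["statefulset/nginx-web"]},
    {"name": "pod/nginx-web-0", "owners": ["statefulset/nginx-web"]},
    {"name": "persistentvolumeclaim/www-storage-nginx-web-0", "owners": []},
]

deleted, kept = delete_with_cascade("statefulset/nginx-web", objects)
print("deleted:", deleted)
print("kept:", kept)
```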

StatefulSet controller

26.png

The figure above shows the workflow of the StatefulSet controller. Let's briefly walk through it.

First, the Controller registers Informer event handlers to handle changes to StatefulSets and Pods. In the Controller's logic, whenever a StatefulSet or Pod changes, the corresponding StatefulSet is found and put into the queue. Once it is taken off the queue for processing, the first operation is Update Revision: check whether a ControllerRevision already exists for the template in the current StatefulSet. If not, the template has been updated, so the Controller creates a new Revision, which comes with a new controller-revision-hash version number.

Next, the Controller takes all the revisions and sorts them by sequence number. During this pass, if a Pod is missing it is created according to its serial number, and if there are surplus Pods they are deleted according to serial number. Once the number of Pods and their serial numbers match replicas, the Controller checks whether any Pod needs to be updated. In other words, the difference between the two steps is that Manage pods in order checks whether all Pods satisfy the serial numbering, while Update in order checks whether the Pods are at the desired version and updates them by serial number.

The Update in order process is shown in the figure above. It is actually fairly simple: the Controller deletes the Pod. The deletion triggers the next event; when the Controller handles it, it finds that a Pod is missing and creates a new one in the Manage pods in order step from before. After that, the Controller performs Update Status, which produces the status information we saw earlier on the command line.

Through this entire process, StatefulSet achieves the ability to manage stateful applications.
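
The delete-then-recreate loop described in this workflow can be simulated with a toy reconcile function. This is a rough sketch under strong assumptions, not the real controller: Pods are modelled as a dictionary from serial number to revision, every Pod is treated as Ready immediately, and each call performs at most one action. The names reconcile_once, rev1, and rev2 are hypothetical.

```python
def reconcile_once(pods, replicas, update_revision):
    """Perform at most one action and describe it; return None when nothing is left to do.

    pods maps serial number -> revision of the existing Pod.
    """
    # Manage pods in order: create the lowest missing serial number first.
    for ordinal in range(replicas):
        if ordinal not in pods:
            pods[ordinal] = update_revision
            return f"create pod-{ordinal} at {update_revision}"
    # Delete surplus Pods, highest serial number first.
    surplus = [o for o in pods if o >= replicas]
    if surplus:
        victim = max(surplus)
        del pods[victim]
        return f"delete pod-{victim}"
    # Update in order: delete the highest-numbered Pod still on an old revision.
    outdated = [o for o, rev in pods.items() if rev != update_revision]
    if outdated:
        victim = max(outdated)
        del pods[victim]
        return f"delete pod-{victim}"
    return None


pods = {0: "rev1", 1: "rev1", 2: "rev1"}
actions = []
while True:
    action = reconcile_once(pods, replicas=3, update_revision="rev2")
    if action is None:
        break
    actions.append(action)

for line in actions:
    print(line)
```

Running the loop shows each deletion being followed by a creation of the same serial number on the next pass, which is the 2-1-0 order observed earlier.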

Capacity expansion simulation

27.png

Suppose the StatefulSet starts with replicas set to 1, so there is one Pod, Pod0. After changing replicas from 1 to 3, Pod1 is created first. By default, the Controller waits until Pod1 is READY before creating Pod2.

As you can see from the figure above, the Pods under each StatefulSet are created starting from serial number 0. Therefore, a StatefulSet with replicas N creates Pods with serial numbers in [0, N): the interval is closed at 0 and open at N, that is, when N > 0 the serial numbers run from 0 to N-1.

Expansion management strategy

28.png

Some readers may ask: what if I do not want Pods to be created and deleted in serial-number order? StatefulSet supports other creation and deletion logic as well, which is why some people in the community also manage stateless applications with StatefulSet. The advantage is that Pods get a unique network identity and their own storage while still being scaled out and in concurrently.

StatefulSet.spec has a field called podManagementPolicy. Its possible values are OrderedReady and Parallel, and the default is OrderedReady.

In the example we just created, podManagementPolicy is not defined in the spec, so the Controller uses OrderedReady by default. With OrderedReady, scaling out and in strictly follows the serial-number order: the previous Pod must be Ready before the next one is created. When scaling in, Pods are deleted in reverse order, from the largest serial number to the smallest.

For example, when scaling from Pod0 to Pod0, Pod1, and Pod2 as on the right side of the figure above, Pod1 must be created first, and Pod2 is created only once Pod1 is Ready. There is one more case to consider: while Pod1 is being created, Pod0 may become NotReady for some reason, caused either by the host or by the application itself. In that case the Controller will not create Pod2. So it is not just the immediately preceding Pod that must be Ready; all preceding Pods must be Ready before the next Pod is created. In the example above, Pod2 is created only when both Pod0 and Pod1 are Ready.

The other strategy is Parallel. As the name implies, Pods are scaled out and in concurrently: there is no need to wait for the previous Pod to become Ready, or to finish being deleted, before handling the next one.
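
The difference between the two policies during scale-out can be captured in a small decision function. It is only a sketch of the rule as described above; the function name and the set-based inputs are hypothetical.

```python
def ordinals_to_create(replicas, existing, ready, policy="OrderedReady"):
    """Return the serial numbers that may be created right now.

    existing: set of serial numbers that already have a Pod.
    ready:    set of serial numbers whose Pod is Ready.
    """
    missing = [o for o in range(replicas) if o not in existing]
    if policy == "Parallel":
        return missing  # everything missing can be created at once
    if policy != "OrderedReady":
        raise ValueError(f"unknown podManagementPolicy: {policy}")
    if not missing:
        return []
    candidate = missing[0]
    # All lower-numbered Pods must be Ready, not just the previous one.
    if all(o in ready for o in range(candidate)):
        return [candidate]
    return []


print(ordinals_to_create(3, existing={0}, ready={0}))
print(ordinals_to_create(3, existing={0, 1}, ready={1}))
print(ordinals_to_create(3, existing={0}, ready={0}, policy="Parallel"))
```

The second call models the case where Pod0 has become NotReady while Pod1 is Ready, so Pod2 is held back.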

Release simulation

29.png

Suppose template1 of the StatefulSet corresponds to the logical Revision1, so the three Pods under the StatefulSet all belong to Revision1. After we modify the template, for example the image, the Controller upgrades the Pods one by one in reverse order. In the figure above, you can see that the Controller first creates Revision2, which corresponds to a ControllerRevision2 resource, and the name of ControllerRevision2 serves as the new revision hash. After upgrading Pod2 to the new version, it deletes and recreates Pod1, and then Pod0.

The logic is actually very simple. During an upgrade, the Controller deletes the Pod with the highest serial number that meets the conditions. The next time the Controller reconciles, it finds that the Pod with that serial number is missing and creates it at the new version.

Spec field analysis

30.png

First, look at the first few fields in the spec. Replicas and Selector are fields we are already familiar with.

  • Replicas: the expected number of Pods;
  • Selector: a label selector, which must match the labels defined in spec.template.metadata.labels;
  • Template: the Pod template, which defines the basic information of the Pods to be created;
  • VolumeClaimTemplates: a list of PVC templates. If this is defined in the spec, PVCs are created from the templates before the Pod is created. The names of the created PVCs are then injected as volumes into the Pod created from the Template.

31.png

  • ServiceName: the name of the corresponding Headless Service. If you do not need this feature, you can set it to the name of a Service that does not exist; the Controller does not validate it, so a fake ServiceName works. However, it is recommended to configure a Headless Service for each StatefulSet, regardless of whether the Pods under it need a network identity;
  • PodManagementPolicy: the Pod management strategy. As mentioned earlier, the possible values are OrderedReady and Parallel, and the default is OrderedReady;
  • UpdateStrategy: the Pod upgrade strategy. This is a struct, described in detail below;
  • RevisionHistoryLimit: the number of historical ControllerRevisions to keep (default 10). Note that a revision can only be cleaned up when no Pod corresponds to it; if any Pod is still on that version, its ControllerRevision cannot be deleted.
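
The cleanup rule for RevisionHistoryLimit can be sketched as follows. This is a simplified model, assuming revisions are listed from oldest to newest and that any revision still referenced by a Pod is exempt from cleanup; the function and revision names are hypothetical.

```python
def revisions_to_prune(revisions, in_use, history_limit=10):
    """Return the revisions that can be deleted, oldest first.

    revisions: revision names ordered from oldest to newest.
    in_use:    set of revisions that some Pod still belongs to.
    """
    # Only revisions without any Pod count as history that may be cleaned up.
    history = [r for r in revisions if r not in in_use]
    excess = len(history) - history_limit
    return history[:excess] if excess > 0 else []


revisions = ["rev1", "rev2", "rev3", "rev4"]
print(revisions_to_prune(revisions, in_use={"rev1", "rev4"}, history_limit=1))
```

In this example rev1 survives even though it is the oldest, because a Pod is still on that version.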

Upgrade strategy field analysis

32.png

On the right side of the figure above, you can see that StatefulSetUpdateStrategy has a type field with two possible values: RollingUpdate and OnDelete.

  • RollingUpdate is similar to the upgrade in Deployment: Pods are upgraded in a rolling fashion;
  • OnDelete upgrades a Pod only when it is deleted, which effectively disables active upgrades. The Controller does not proactively upgrade live Pods. For example, suppose there are three old-version Pods and the upgrade strategy is OnDelete. When the image in the spec is updated, the Controller does not upgrade the three Pods to the new version one by one. Instead, when we scale in, the Controller deletes Pods, and when we next scale out, the Controller creates the additional Pods at the new version.

In RollingUpdateStatefulSetStrategy, there is a field called Partition. Partition is the number of Pods to keep at the old version during a rolling upgrade. Many people who are new to StatefulSet assume it is the number of new-version Pods in the grayscale release, which is wrong.

For example, suppose there is a StatefulSet with replicas set to 10. If Partition is 8 when we update the version, it does not mean that 8 Pods are updated to the new version; it means that 8 Pods are kept at the old version and only 2 are updated to the new version as the grayscale batch. With replicas set to 10, the Pod serial numbers are 0 through 9, so with Partition set to 8 the 8 Pods numbered 0 through 7 stay at the old version and only the Pods numbered 8 and 9 move to the new version.

To summarize, suppose replicas = N and Partition = M (M <= N). Then the old-version Pods are those with serial numbers in [0, M), and the new-version Pods are those in [M, N).
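
The Partition rule is easy to check with a few lines of Python. This sketch follows the description above and assumes 0 <= M <= N; the function name split_by_partition is hypothetical.

```python
def split_by_partition(replicas, partition):
    """Return (old_version_ordinals, new_version_ordinals) after a rolling update."""
    if not 0 <= partition <= replicas:
        raise ValueError("this sketch assumes 0 <= partition <= replicas")
    old = list(range(0, partition))         # serial numbers below Partition stay old
    new = list(range(partition, replicas))  # serial numbers at or above it are updated
    return old, new


old, new = split_by_partition(replicas=10, partition=8)
print("old version:", old)
print("new version:", new)
```

Lowering Partition step by step, for example from 8 to 5 to 0, widens the grayscale batch until every Pod is on the new version.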

5. Summary of this section

That concludes the main content of this article. Here is a brief summary:

  • StatefulSet is a common workload in Kubernetes. Its original goal is to deploy stateful applications, but it also supports deploying stateless applications;
  • Unlike Deployment, StatefulSet operates on Pods directly to scale out, scale in, and release, without going through another workload such as ReplicaSet;
  • The key features of StatefulSet are that each Pod has its own dedicated PVC and a unique network identity, and that the PVC and network identity are reused after an upgrade.

Origin www.cnblogs.com/jinanxiaolaohu/p/12503531.html