Cloud Native: An in-depth analysis of Kubernetes Operator best practices and common problems

1. Introduction to Kubernetes Operator

  • A Kubernetes operator is a set of processes that connect to the main API and watch events, generally on a limited set of resource types. When a relevant watch event is triggered, the operator reacts and performs specific actions. This may be limited to interacting with the main API, but will often involve performing operations on some other system (which can be an in-cluster or off-cluster resource).


  • An operator is implemented as a set of controllers, each of which watches a specific resource type. When a relevant event occurs on a watched resource, a reconcile cycle is started. During the reconcile cycle, the controller is responsible for checking whether the current state matches the desired state described by the watched resource. Interestingly, by design, the event itself is not passed into the reconcile cycle, which forces you to consider the entire state of the instance. This approach is called level-based, as opposed to edge-based. The terminology comes from electronic circuit design: level-triggered means receiving an event (such as an interrupt) and reacting to a state, while edge-triggered means receiving an event and reacting to a state change.
  • Level-based triggering is less efficient because it forces re-evaluation of the complete state rather than focusing only on what has changed, but it is considered more suitable in complex and unreliable environments where signals may be lost or retransmitted multiple times. This design choice affects the way controller code is written. The following provides a high-level summary:

(figure: the stages a request goes through in the API server)

  • When requests are sent to the API server, especially create and delete requests, they go through the stages shown above. Note that webhooks can also be specified to mutate and validate requests. If the operator introduces a CRD (custom resource definition), these webhooks may also have to be defined. Generally speaking, the operator process opens a port to serve the webhook endpoints.
  • If the operator introduces a new CRD, the Operator SDK will help you build it. To ensure that the CRD conforms to the best practices for extending the Kubernetes API, follow these conventions. All the best practices mentioned here are embodied as runnable examples in the operator-utils code base; operator-utils can also be imported as a library in an operator project to provide some useful utilities.

2. Create watches

  • As we said, a controller watches events on resources; this is achieved through the watch abstraction. A watch is a mechanism for receiving events of a given type (a core type or a CRD). A watch is generally created by specifying the following:
    • The resource type you want to watch;
    • The handler, which maps events on the watched type to one or more instances for which the reconcile cycle is invoked. The watched type and the instance type do not have to be the same;
    • The predicate, a set of customizable functions for filtering events.
  • The above is summarized in the figure below:

(figure: a watch is built from a resource type, a handler, and a predicate)

  • Generally speaking, it is fine to open multiple watches on the same type (kind), because watches are multiplexed. You should also filter events as much as possible. Here is an example of a predicate that filters events on secret resources, where we are only interested in secrets of type TLS:
isAnnotatedSecret := predicate.Funcs{
    UpdateFunc: func(e event.UpdateEvent) bool {
        oldSecret, ok := e.ObjectOld.(*corev1.Secret)
        if !ok {
            return false
        }
        newSecret, ok := e.ObjectNew.(*corev1.Secret)
        if !ok {
            return false
        }
        if newSecret.Type != util.TLSSecret {
            return false
        }
        oldValue, _ := e.MetaOld.GetAnnotations()[certInfoAnnotation]
        newValue, _ := e.MetaNew.GetAnnotations()[certInfoAnnotation]
        old := oldValue == "true"
        new := newValue == "true"
        // if the content has changed we trigger if the annotation is there
        if !reflect.DeepEqual(newSecret.Data[util.Cert], oldSecret.Data[util.Cert]) ||
            !reflect.DeepEqual(newSecret.Data[util.CA], oldSecret.Data[util.CA]) {
            return new
        }
        // otherwise we trigger if the annotation has changed
        return old != new
    },
    CreateFunc: func(e event.CreateEvent) bool {
        secret, ok := e.Object.(*corev1.Secret)
        if !ok {
            return false
        }
        if secret.Type != util.TLSSecret {
            return false
        }
        value, _ := e.Meta.GetAnnotations()[certInfoAnnotation]
        return value == "true"
    },
}
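  • For reference, a watch using this predicate might be registered roughly as follows (a sketch based on the controller-runtime API version used in the snippets above; the controller variable c is assumed to come from the usual controller setup):
// register a watch on Secrets, enqueuing the secret itself, filtered by the predicate above
err = c.Watch(
    &source.Kind{Type: &corev1.Secret{}},
    &handler.EnqueueRequestForObject{},
    isAnnotatedSecret,
)
if err != nil {
    return err
}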
  • A very common pattern is to watch events on resources that the operator creates (and owns) and to schedule a reconcile cycle on the CR that owns those resources. To do this, you can use the EnqueueRequestForOwner handler. In the snippet below, the OwnerType field is filled in with the MyCRD type used later in this article as the owning CR:
err = c.Watch(&source.Kind{Type: &examplev1alpha1.MyControlledType{}}, &handler.EnqueueRequestForOwner{
    // OwnerType is the owning CR type; MyCRD here is the example CR used later in this article
    OwnerType:    &examplev1alpha1.MyCRD{},
    IsController: true,
})
  • Another, less common case is when an event needs to fan out to multiple resources. Consider a controller that injects TLS secrets into routes. Multiple routes in the same namespace can point to the same secret, and if the secret changes, all of those routes need to be updated. Therefore, a watch needs to be created on the secret type, with a handler like this:
type enqueueRequestForReferecingRoutes struct {
    client.Client
}

// Create implements EventHandler
// trigger a route reconcile event for those routes that reference this secret
func (e *enqueueRequestForReferecingRoutes) Create(evt event.CreateEvent, q workqueue.RateLimitingInterface) {
    routes, _ := matchSecret(e.Client, types.NamespacedName{
        Name:      evt.Meta.GetName(),
        Namespace: evt.Meta.GetNamespace(),
    })
    for _, route := range routes {
        q.Add(reconcile.Request{
            NamespacedName: types.NamespacedName{
                Namespace: route.GetNamespace(),
                Name:      route.GetName(),
            }})
    }
}

// Update implements EventHandler
// trigger a route reconcile event for those routes that reference this secret
func (e *enqueueRequestForReferecingRoutes) Update(evt event.UpdateEvent, q workqueue.RateLimitingInterface) {
    routes, _ := matchSecret(e.Client, types.NamespacedName{
        Name:      evt.MetaNew.GetName(),
        Namespace: evt.MetaNew.GetNamespace(),
    })
    for _, route := range routes {
        q.Add(reconcile.Request{
            NamespacedName: types.NamespacedName{
                Namespace: route.GetNamespace(),
                Name:      route.GetName(),
            }})
    }
}
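  • A sketch of how such a handler might be wired up follows (note that handler.EventHandler also requires Delete and Generic methods, which are omitted from the excerpt above; mgr and c are the usual manager and controller variables):
// watch Secrets and enqueue reconcile requests for the routes that reference them
err = c.Watch(
    &source.Kind{Type: &corev1.Secret{}},
    &enqueueRequestForReferecingRoutes{Client: mgr.GetClient()},
)
if err != nil {
    return err
}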

3. Resource Reconciliation Cycle

  • The reconcile cycle is where the framework hands control back to us after a watch event is delivered. As explained before, no event information is passed into the reconcile cycle, because it works on level-based triggering.
  • The following is a blueprint for a common reconcile cycle of a CRD controller. Like any model, it does not reflect any specific use case, but it should help in tackling the problems encountered when writing an operator:

(figure: flow of a common reconcile cycle)

  • As the picture above shows, the main steps are:
    • Retrieve the CR instance of interest;
    • Validate the instance: nothing is done with invalid instances;
    • Initialize the instance: if some values of the instance are not initialized, they are handled in this step;
    • Determine whether the instance is being deleted: if so, special cleanup is needed;
    • Manage the controller's business logic: if all the previous steps pass, the reconcile logic for the instance can finally be executed. This logic is different for each controller. A sketch of this skeleton follows below.
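  • A minimal sketch of a Reconcile method following these steps, assuming the helper methods shown throughout this article (IsValid, IsInitialized, ManageError, ManageSuccess, manageCleanUpLogic and the finalizer utilities) plus a hypothetical manageOperatorLogic for the controller-specific business logic:
func (r *ReconcileMyCRD) Reconcile(request reconcile.Request) (reconcile.Result, error) {
    // 1. Retrieve the CR instance of interest.
    instance := &examplev1alpha1.MyCRD{}
    err := r.GetClient().Get(context.TODO(), request.NamespacedName, instance)
    if err != nil {
        // apierrors is k8s.io/apimachinery/pkg/api/errors
        if apierrors.IsNotFound(err) {
            // the CR was deleted after the request was queued; nothing to do
            return reconcile.Result{}, nil
        }
        return reconcile.Result{}, err
    }
    // 2. Validate the instance; do nothing with invalid instances.
    if ok, err := r.IsValid(instance); !ok {
        return r.ManageError(instance, err)
    }
    // 3. Initialize the instance if needed, then update and requeue.
    if ok := r.IsInitialized(instance); !ok {
        if err := r.GetClient().Update(context.TODO(), instance); err != nil {
            return r.ManageError(instance, err)
        }
        return reconcile.Result{}, nil
    }
    // 4. Handle deletion via the finalizer.
    if util.IsBeingDeleted(instance) {
        if !util.HasFinalizer(instance, controllerName) {
            return reconcile.Result{}, nil
        }
        if err := r.manageCleanUpLogic(instance); err != nil {
            return r.ManageError(instance, err)
        }
        util.RemoveFinalizer(instance, controllerName)
        if err := r.GetClient().Update(context.TODO(), instance); err != nil {
            return r.ManageError(instance, err)
        }
        return reconcile.Result{}, nil
    }
    // 5. Controller-specific business logic (manageOperatorLogic is hypothetical).
    if err := r.manageOperatorLogic(instance); err != nil {
        return r.ManageError(instance, err)
    }
    return r.ManageSuccess(instance)
}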

4. Resource validation

  • There are two types of validation:
    • Syntactic validation: done by defining OpenAPI validation rules;
    • Semantic validation: done by creating a ValidatingAdmissionConfiguration.
  • Note: the validity of a CR cannot be enforced from the controller. Once the CR is accepted by the API server, it is stored in etcd, and from that point the controller managing that CR cannot reject it; if the CR is invalid, the controller will fail when it tries to use or process it.
  • Recommendation: since there is no guarantee that the ValidatingAdmissionConfiguration has been created or is working, the CR should still be validated inside the controller, and if it is invalid, an infinite error loop should be avoided.

① Syntactic validation

  • OpenAPI validation rules can be added as described in Generating CRDs.
  • Recommendation: model custom resources so that as much as possible can be checked syntactically, and use syntactic validation whenever possible: it is relatively simple and prevents malformed CRs from being stored in etcd.
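  • As an illustration, OpenAPI validation constraints are usually expressed as kubebuilder markers on the CRD's Go types; the MyCRDSpec fields below are hypothetical examples:
// MyCRDSpec is a hypothetical spec type illustrating OpenAPI validation markers.
type MyCRDSpec struct {
    // Replicas must be between 1 and 10.
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=10
    Replicas int32 `json:"replicas"`

    // LogLevel may only take one of the listed values.
    // +kubebuilder:validation:Enum=debug,info,warn,error
    LogLevel string `json:"logLevel,omitempty"`
}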

② Semantic validation

  • Semantic validation ensures that fields have reasonable values and that the resource as a whole is meaningful. The semantic validation logic depends on the concept the CR represents and must be coded by the operator developer.
  • If semantic validation is needed for a given CR, then the operator needs to expose a webhook, and a ValidatingAdmissionConfiguration should be created as part of the operator deployment.
  • Current limitations:
    • In OpenShift 3.11, ValidatingAdmissionConfigurations are still in technology preview (they will be supported starting in 4.1);
    • The Operator SDK does not support scaffolding webhooks; you can use kubebuilder to do it:
kubebuilder webhook --group crew --version v1 --kind FirstMate --type=mutating --operations=create,update
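  • To give a sense of what such a webhook does, here is a minimal hand-written sketch of a validating webhook endpoint using the admission/v1beta1 API of that era; semanticCheck is a hypothetical function holding the CR-specific semantic rules:
// Sketch: HTTP handler implementing the validating webhook for MyCRD.
// admissionv1beta1 is k8s.io/api/admission/v1beta1.
func validateMyCRD(w http.ResponseWriter, r *http.Request) {
    body, err := ioutil.ReadAll(r.Body)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    review := &admissionv1beta1.AdmissionReview{}
    if err := json.Unmarshal(body, review); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    mycrd := &examplev1alpha1.MyCRD{}
    if err := json.Unmarshal(review.Request.Object.Raw, mycrd); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    // semanticCheck is a hypothetical function returning whether the CR is valid and, if not, why
    allowed, message := semanticCheck(mycrd)
    review.Response = &admissionv1beta1.AdmissionResponse{
        UID:     review.Request.UID,
        Allowed: allowed,
        Result:  &metav1.Status{Message: message},
    }
    resp, err := json.Marshal(review)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    w.Write(resp)
}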

③ Validating resources in the controller

  • The best approach is to reject an invalid CR outright rather than accepting it, storing it in etcd, and then managing the error condition afterwards. That said, the ValidatingAdmissionConfiguration may not be deployed or may be unavailable, so it is still good practice to perform semantic validation in the controller code as well. The validation code should be structured so that it can be shared between the ValidatingAdmissionConfiguration and the controller.
  • The code in the controller that calls the verification method is as follows:
if ok, err := r.IsValid(instance); !ok {
    return r.ManageError(instance, err)
}
  • Note that if validation fails, the error is managed as described in the error management section. The IsValid function looks like this:
func (r *ReconcileMyCRD) IsValid(obj metav1.Object) (bool, error) {
    mycrd, ok := obj.(*examplev1alpha1.MyCRD)
    if !ok {
        return false, errors.New("not a MyCRD object")
    }
    // validation logic on mycrd goes here; return false and an error for invalid instances
    _ = mycrd
    return true, nil
}

5. Resource initialization

  • A good convention in Kubernetes is that the user initializes only the resource fields they need and omits the others. From the perspective of whoever writes and debugs the controller, however, it is better to have all fields initialized: you do not have to keep checking whether a field is defined, and error conditions are easier to troubleshoot.
  • To initialize resources, there are two options:
    • Define the initialization method in the controller;
    • Define a MutatingAdmissionConfiguration (a mechanism similar to ValidatingAdmissionConfiguration);
  • If you define the initialization method in the controller, the code should look like this example:
if ok := r.IsInitialized(instance); !ok {
    err := r.GetClient().Update(context.TODO(), instance)
    if err != nil {
        log.Error(err, "unable to update instance", "instance", instance)
        return r.ManageError(instance, err)
    }
    return reconcile.Result{}, nil
}
  • If the IsInitialized method returns false, the (now initialized) instance is updated and the reconcile returns, which immediately triggers another reconcile cycle. On that second pass, IsInitialized returns true and execution continues with the next part of the logic.
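  • A minimal sketch of what IsInitialized might look like; it mutates the in-memory instance and reports whether everything was already initialized (Spec.Replicas is a hypothetical defaulted field, and util.AddFinalizer is assumed as the counterpart of the finalizer helpers used in the next section):
// IsInitialized fills in defaults on the in-memory instance and returns false
// if anything had to be initialized (so the caller knows to update and requeue).
func (r *ReconcileMyCRD) IsInitialized(obj metav1.Object) bool {
    mycrd, ok := obj.(*examplev1alpha1.MyCRD)
    if !ok {
        return false
    }
    initialized := true
    // hypothetical defaulted field
    if mycrd.Spec.Replicas == 0 {
        mycrd.Spec.Replicas = 1
        initialized = false
    }
    // add our finalizer at initialization time (see the finalization section)
    if !util.HasFinalizer(mycrd, controllerName) {
        util.AddFinalizer(mycrd, controllerName)
        initialized = false
    }
    return initialized
}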

① Resource Finalization

  • If action needs to be taken when a CR is deleted on resources that are not owned by that CR, a finalizer must be used. Finalizers provide a mechanism to notify the Kubernetes control plane that an action needs to be performed before the standard Kubernetes garbage-collection logic runs. A resource can have one or more finalizers; each controller should manage its own finalizer and ignore the others.
  • Pseudocode algorithm for managing finalizers:
    • If needed, add the finalizer in the initialization method.
    • When the resource is being deleted, check whether the finalizer owned by this controller is present:
      • If it is not present, return directly;
      • If it is present, execute the cleanup logic:
        • If cleanup succeeds, remove the finalizer and update the CR;
        • If it fails, decide whether to retry or give up, possibly leaving garbage behind (this is acceptable in some cases).
    • If the cleanup logic requires creating additional resources, remember that new resources cannot be created in a namespace that is being deleted: deleting a namespace triggers the finalizers and deletes all resources under it.
  • The code looks like this:
if util.IsBeingDeleted(instance) {
    if !util.HasFinalizer(instance, controllerName) {
        return reconcile.Result{}, nil
    }
    err := r.manageCleanUpLogic(instance)
    if err != nil {
        log.Error(err, "unable to delete instance", "instance", instance)
        return r.ManageError(instance, err)
    }
    util.RemoveFinalizer(instance, controllerName)
    err = r.GetClient().Update(context.TODO(), instance)
    if err != nil {
        log.Error(err, "unable to update instance", "instance", instance)
        return r.ManageError(instance, err)
    }
    return reconcile.Result{}, nil
}
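  • For completeness, the util helpers used above could be implemented roughly as follows (a sketch; the actual operator-utils implementations may differ):
// IsBeingDeleted reports whether the object has a deletion timestamp set.
func IsBeingDeleted(obj metav1.Object) bool {
    return !obj.GetDeletionTimestamp().IsZero()
}

// HasFinalizer reports whether the object carries the given finalizer.
func HasFinalizer(obj metav1.Object, finalizer string) bool {
    for _, f := range obj.GetFinalizers() {
        if f == finalizer {
            return true
        }
    }
    return false
}

// RemoveFinalizer removes the given finalizer from the object, if present.
func RemoveFinalizer(obj metav1.Object, finalizer string) {
    finalizers := []string{}
    for _, f := range obj.GetFinalizers() {
        if f != finalizer {
            finalizers = append(finalizers, f)
        }
    }
    obj.SetFinalizers(finalizers)
}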

② Resource ownership

  • Resource ownership is a native concept in Kubernetes that determines how resources are deleted. By default, when a resource is deleted, its child resources are also deleted (this behavior can be turned off by setting cascade=false). This behavior helps ensure proper garbage collection of resources, especially when resources control other resources across multiple levels (deployment -> replicaset -> pod).
  • Recommendation: if a controller creates a resource whose lifecycle is tied to another resource (a Kubernetes core resource or another CR), that other resource should be set as the owner of the created resource, as follows:
controllerutil.SetControllerReference(owner, obj, r.GetScheme())
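  • For example, creating a ConfigMap owned by the CR being reconciled might look roughly like this (a sketch; instance is the CR instance):
// create a ConfigMap and make the CR instance its owner,
// so that deleting the CR garbage-collects the ConfigMap
cm := &corev1.ConfigMap{
    ObjectMeta: metav1.ObjectMeta{
        Name:      instance.GetName() + "-config",
        Namespace: instance.GetNamespace(),
    },
    Data: map[string]string{"key": "value"},
}
if err := controllerutil.SetControllerReference(instance, cm, r.GetScheme()); err != nil {
    return r.ManageError(instance, err)
}
if err := r.GetClient().Create(context.TODO(), cm); err != nil {
    return r.ManageError(instance, err)
}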
  • Additional rules regarding ownership are as follows:
    • Parent and child resources must be in the same namespace;
    • A namespaced resource can own a cluster-scoped resource. An object can have a list of owners: if multiple namespaced objects own the same cluster-scoped resource, each should declare its ownership without overwriting the ownership of the others;
    • A cluster-scoped resource cannot own a namespaced resource;
    • A cluster resource can own another cluster resource.

6. Status management

  • Status is a standard part of a resource and is used to report the state of the resource. Here, status will be used to report the outcome of the last execution of the reconcile cycle; more information can also be added to it.
  • Under normal circumstances, updating the resource on every reconcile cycle would trigger an update event, which would in turn trigger another reconcile cycle, in an endless loop. Therefore, status should be modeled as a subresource. This way, the status of the resource can be updated without incrementing the ResourceGeneration metadata field.
  • The status can then be updated with the following call:
err = r.Status().Update(context.Background(), instance)
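  • For the subresource approach to work, the status subresource must be enabled on the CRD; with kubebuilder-style markers this is typically declared on the CR type (MyCRDSpec is hypothetical, MyCRStatus is the status type shown below):
// +kubebuilder:subresource:status
type MyCRD struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   MyCRDSpec  `json:"spec,omitempty"`
    Status MyCRStatus `json:"status,omitempty"`
}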
  • Now a predicate is needed on the watch to discard update events that do not increment the ResourceGeneration; the GenerationChangedPredicate can be used for this. As mentioned above, when using a finalizer, the finalizer should be set at initialization time. If the finalizer is the only item being initialized then, since it is part of the metadata, the ResourceGeneration will not be incremented.
  • To account for this use case, here is a modified version of the predicate:
type resourceGenerationOrFinalizerChangedPredicate struct {
    predicate.Funcs
}

// Update implements default UpdateEvent filter for validating resource version change
func (resourceGenerationOrFinalizerChangedPredicate) Update(e event.UpdateEvent) bool {
    if e.MetaNew.GetGeneration() == e.MetaOld.GetGeneration() && reflect.DeepEqual(e.MetaNew.GetFinalizers(), e.MetaOld.GetFinalizers()) {
        return false
    }
    return true
}
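  • Such a predicate is then attached to the watch on the CR type, for example (a sketch in the same style as the earlier watch examples):
// reconcile MyCRD only when its spec (generation) or finalizers change, ignoring pure status updates
err = c.Watch(
    &source.Kind{Type: &examplev1alpha1.MyCRD{}},
    &handler.EnqueueRequestForObject{},
    resourceGenerationOrFinalizerChangedPredicate{},
)
if err != nil {
    return err
}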
  • Assume status looks like this:
type MyCRStatus struct {
    // +kubebuilder:validation:Enum=Success,Failure
    Status     string      `json:"status,omitempty"`
    LastUpdate metav1.Time `json:"lastUpdate,omitempty"`
    Reason     string      `json:"reason,omitempty"`
}
  • You can then write a function to manage the status of a successful execution of the reconcile cycle:
func (r *ReconcilerBase) ManageSuccess(obj metav1.Object) (reconcile.Result, error) {
    runtimeObj, ok := (obj).(runtime.Object)
    if !ok {
        log.Error(errors.New("not a runtime.Object"), "passed object was not a runtime.Object", "object", obj)
        return reconcile.Result{}, nil
    }
    if reconcileStatusAware, updateStatus := (obj).(apis.ReconcileStatusAware); updateStatus {
        status := apis.ReconcileStatus{
            LastUpdate: metav1.Now(),
            Reason:     "",
            Status:     "Success",
        }
        reconcileStatusAware.SetReconcileStatus(status)
        err := r.GetClient().Status().Update(context.Background(), runtimeObj)
        if err != nil {
            log.Error(err, "unable to update status")
            return reconcile.Result{
                RequeueAfter: time.Second,
                Requeue:      true,
            }, nil
        }
    } else {
        log.Info("object is not ReconcileStatusAware, not setting status")
    }
    return reconcile.Result{}, nil
}

7. Error management

  • If the controller enters an error condition and returns an error from the reconcile method, the operator logs the error to standard output and the reconcile event is immediately rescheduled (the default scheduler should in principle detect repeated occurrences of the same error and increase the scheduling interval, but in our experience this does not happen). If the error persists, this produces a tight error loop, and the error condition is invisible to the user.
  • There are two ways to notify the user that an error has occurred, and they can be used simultaneously:
    • Return an error in the object's status field;
    • Generate an event describing the error.
  • Additionally, if the error is one that might resolve itself, the reconcile cycle should be rescheduled after a period of time. Typically this period grows exponentially, so on each iteration the reconcile event is pushed further into the future (for example, doubling the delay each time).
  • Building on the status management above, error conditions can be handled as follows:
func (r *ReconcilerBase) ManageError(obj metav1.Object, issue error) (reconcile.Result, error) {
    runtimeObj, ok := (obj).(runtime.Object)
    if !ok {
        log.Error(errors.New("not a runtime.Object"), "passed object was not a runtime.Object", "object", obj)
        return reconcile.Result{}, nil
    }

    var retryInterval time.Duration
    r.GetRecorder().Event(runtimeObj, "Warning", "ProcessingError", issue.Error())
    if reconcileStatusAware, updateStatus := (obj).(apis.ReconcileStatusAware); updateStatus {
        lastUpdate := reconcileStatusAware.GetReconcileStatus().LastUpdate.Time
        lastStatus := reconcileStatusAware.GetReconcileStatus().Status
        status := apis.ReconcileStatus{
            LastUpdate: metav1.Now(),
            Reason:     issue.Error(),
            Status:     "Failure",
        }

        reconcileStatusAware.SetReconcileStatus(status)
        err := r.GetClient().Status().Update(context.Background(), runtimeObj)
        if err != nil {
            log.Error(err, "unable to update status")
            return reconcile.Result{
                RequeueAfter: time.Second,
                Requeue:      true,
            }, nil
        }

        if lastUpdate.IsZero() || lastStatus == "Success" {
            retryInterval = time.Second
        } else {
            retryInterval = status.LastUpdate.Sub(lastUpdate).Round(time.Second)
        }
    } else {
        log.Info("object is not ReconcileStatusAware, not setting status")
        retryInterval = time.Second
    }

    return reconcile.Result{
        RequeueAfter: time.Duration(math.Min(float64(retryInterval.Nanoseconds()*2), float64(time.Hour.Nanoseconds()*6))),
        Requeue:      true,
    }, nil
}
  • Note that this function first emits an event, then updates the status with the error condition, and finally computes when to reschedule the next reconcile. The algorithm tries to double the delay on each loop, capped at six hours. Six hours is a good upper limit because events last approximately six hours, so there should always be an active event describing the current error condition.
