A brief source-code analysis of the Pod deletion process in Kubernetes

1 Overview:

1.1 Code environment

The version information is as follows:
a. Kubernetes cluster: v1.15.4

1.2 Brief description of the process of Pod deletion

When a user runs the kubectl delete pod command (which implicitly uses grace-period=30s), kubectl calls the DELETE interface of kube-apiserver. At this point the business logic only updates the metadata of the Pod object (the DeletionTimestamp and DeletionGracePeriodSeconds fields); the record is not removed from etcd. The kubectl command blocks and shows that the pod is being deleted. When the kubelet component observes the update event of the Pod object, it runs the corresponding callback (because the DeletionTimestamp field is set, the business logic calls killPod()). A short time later, the statusManager module of kubelet calls the DELETE interface of kube-apiserver again, this time with grace-period=0, and kube-apiserver deletes the Pod object from etcd. Only then does kubectl get pod stop showing the pod, because the record is really gone.


The first time the DELETE interface of kube-apiserver is triggered, the method enters the if statement (if !deleteImmediately || err != nil in Store.Delete) and returns directly.
[Figure: debug screenshot of the first DELETE request]
The second time the DELETE interface of kube-apiserver is triggered, the if statement is skipped and the subsequent business logic runs (the pod object is deleted from etcd).
[Figure: debug screenshot of the second DELETE request]
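
The two-step behaviour described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: fakeAPIServer, pod and deletePod are hypothetical names invented for this example, not kube-apiserver code, and the model ignores finalizers, preconditions and admission.

```go
package main

import (
	"fmt"
	"time"
)

// pod is a hypothetical, stripped-down stand-in for a Pod record in etcd.
type pod struct {
	name                       string
	deletionTimestamp          *time.Time
	deletionGracePeriodSeconds *int64
}

// fakeAPIServer is a toy in-memory store, not the real kube-apiserver.
type fakeAPIServer struct {
	pods map[string]*pod
}

// deletePod mimics the two behaviours of the DELETE interface.
// It returns true only when the record is actually removed.
func (s *fakeAPIServer) deletePod(name string, gracePeriodSeconds int64) bool {
	p, ok := s.pods[name]
	if !ok {
		return false
	}
	if gracePeriodSeconds > 0 {
		// First call: only update the metadata, keep the record.
		ts := time.Now().Add(time.Duration(gracePeriodSeconds) * time.Second)
		p.deletionTimestamp = &ts
		p.deletionGracePeriodSeconds = &gracePeriodSeconds
		return false
	}
	// Second call (grace period 0): remove the record.
	delete(s.pods, name)
	return true
}

func main() {
	s := &fakeAPIServer{pods: map[string]*pod{"poda": {name: "poda"}}}

	// What kubectl triggers: the pod is only marked for deletion.
	removed := s.deletePod("poda", 30)
	fmt.Println(removed, s.pods["poda"].deletionTimestamp != nil)

	// What kubelet triggers later: the record disappears.
	removed = s.deletePod("poda", 0)
	_, stillThere := s.pods["poda"]
	fmt.Println(removed, stillThere)
}
```

Between the two calls the record still exists with its deletion metadata set, which matches the window in which the pod is shown as being deleted.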


2 Main source code analysis

2.1 HANDLER of the DELETE interface of kube-apiserver

kube-apiserver is a web server that registers its HTTP handlers at startup. The DELETE handler is registered as follows:

func (a *APIInstaller) registerResourceHandlers(path string, storage rest.Storage, ws *restful.WebService) (*metav1.APIResource, error) {

    switch action.Verb {

    case "DELETE": // Delete a resource.
        /*
        other code
        */
        // The handler's main logic lives in restfulDeleteResource().
        handler := metrics.InstrumentRouteFunc(action.Verb, group, version, resource, subresource, requestScope, metrics.APIServerComponent,
            restfulDeleteResource(gracefulDeleter, isGracefulDeleter, reqScope, admit))

        route := ws.DELETE(action.Path).To(handler).
            /*
            other code
            */
            Returns(http.StatusOK, "OK", versionedStatus).
        /*
        other code
        */
        addParams(route, action.Params)
        routes = append(routes, route)
    }
}

The business logic of the DELETE interface is a package-level function located in staging/src/k8s.io/apiserver/pkg/endpoints/handlers/delete.go:

func restfulDeleteResource(r rest.GracefulDeleter, allowsOptions bool, scope handlers.RequestScope, admit admission.Interface) restful.RouteFunction {
    return func(req *restful.Request, res *restful.Response) {
        // Call a package-level function from staging/src/k8s.io/apiserver/pkg/endpoints/handlers/delete.go.
        handlers.DeleteResource(r, allowsOptions, &scope, admit)(res.ResponseWriter, req.Request)
    }
}


// When a user runs kubectl delete pod PODA, this function is triggered twice.
// The first time by the request from kubectl.
// The second time by the request from the statusManager module of kubelet.
func DeleteResource(r rest.GracefulDeleter, allowsOptions bool, scope *RequestScope, admit admission.Interface) http.HandlerFunc {
    return func(w http.ResponseWriter, req *http.Request) {
        trace := utiltrace.New("Delete " + req.URL.Path)
        /*
        other code
        */
        options := &metav1.DeleteOptions{}
        trace.Step("About to delete object from database")

        result, err := finishRequest(timeout, func() (runtime.Object, error) {
            // The key call is r.Delete(...).
            obj, deleted, err := r.Delete(ctx, name, rest.AdmissionToValidateObjectDeleteFunc(admit, staticAdmissionAttrs, scope), options)
            /*
            other code
            */
            return obj, err
        })
        /*
        validation code
        */
        trace.Step("Object deleted from database")

        status := http.StatusOK
        /*
        other code
        */
        // Return the response to the client.
        transformResponseObject(ctx, scope, trace, req, w, status, outputMediaType, result)
    }
}

func (e *Store) Delete(ctx context.Context, name string, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions) (...) {
    key, err := e.KeyFunc(ctx, name)
    /*
    validation code and other code that is not relevant here
    */
    if graceful || pendingFinalizers || shouldUpdateFinalizers {
        // Update the metadata of the pod object.
        err, ignoreNotFound, deleteImmediately, out, lastExisting = e.updateForGracefulDeletionAndFinalizers(ctx, name, key, options, preconditions, deleteValidation, obj)
    }

    // The first request returns here.
    // !deleteImmediately covers all cases where err != nil. We keep both to be future-proof.
    if !deleteImmediately || err != nil {
        return out, false, err
    }

    // Only the second request reaches this point.
    klog.V(6).Infof("going to delete %s from registry: ", name)
    // Delete the object from etcd.
    e.Storage.Delete(...)
}

func (e *Store) updateForGracefulDeletionAndFinalizers(...) (...){
    /*
    other code
    */
    graceful, pendingGraceful, err := rest.BeforeDelete(e.DeleteStrategy, ctx, existing, options)
    /*
    other code
    */
}

func BeforeDelete(...) (...){
    // Update the metadata of the target object: the DeletionTimestamp and DeletionGracePeriodSeconds fields.
    objectMeta.SetDeletionTimestamp(&now)
    objectMeta.SetDeletionGracePeriodSeconds(options.GracePeriodSeconds)
}

2.2 kubelet processing flow

2.2.1 kubelet listens to the update event of the pod object

syncPod() is executed in kubelet's main loop, and its logic calls the kl.killPod(...) method.


func (kl *Kubelet) syncPod(o syncPodOptions) error {
    /*
    other code
    */
    // Enter the if statement when the pod object has a DeletionTimestamp.
    if !runnable.Admit || pod.DeletionTimestamp != nil || apiPodStatus.Phase == v1.PodFailed {
        // killPod(...) asks the container runtime to stop the containers in the pod.
        if err := kl.killPod(pod, nil, podStatus, nil); err != nil {
            /*
            other code
            */
        } else {
            /*
            other code
            */
        }
        return syncErr
    }

}

func (kl *Kubelet) killPod(pod *v1.Pod, runningPod *kubecontainer.Pod, status *kubecontainer.PodStatus, gracePeriodOverride *int64) error {
    var p kubecontainer.Pod
    /*
    other code
    */
    // Ask the container runtime to stop the containers in the pod.
    if err := kl.containerRuntime.KillPod(pod, p, gracePeriodOverride); err != nil {
        return err
    }
    if err := kl.containerManager.UpdateQOSCgroups(); err != nil {
        klog.V(2).Infof("Failed to update QoS cgroups while killing pod: %v", err)
    }
    return nil
}

2.2.2 kubelet listens to the delete event of the pod object

A goroutine of the statusManager calls m.kubeClient.CoreV1().Pods(pod.Namespace).Delete(pod.Name, deleteOptions), which makes kube-apiserver delete the pod object from etcd.


// kubelet has a statusManager module that calls syncPod() in a for loop.
// Inside this method it may call the DELETE interface of kube-apiserver (forced, non-graceful deletion).
func (m *manager) syncPod(uid types.UID, status versionedPodStatus) {
    /*
    other code
    */
    // The if body is entered only when several conditions hold: the pod has a DeletionTimestamp,
    // its containers have been removed, its persistent volumes have been cleaned up, and so on.
    if m.canBeDeleted(pod, status.status) {
        deleteOptions := metav1.NewDeleteOptions(0)
        deleteOptions.Preconditions = metav1.NewUIDPreconditions(string(pod.UID))

        // Force-delete the pod object, equivalent to: kubectl delete pod podA --grace-period=0
        err = m.kubeClient.CoreV1().Pods(pod.Namespace).Delete(pod.Name, deleteOptions)

        /*
        other code
        */
    }
}
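
The condition guarding the final delete can be pictured as a simple predicate. The sketch below is an assumption-laden simplification: podState and its fields are hypothetical names invented for this example, and the real canBeDeleted gets its answer from other kubelet components rather than from a flat struct.

```go
package main

import "fmt"

// podState is a hypothetical summary of what kubelet knows about a pod.
type podState struct {
	HasDeletionTimestamp bool
	RunningContainers    int
	VolumesStillMounted  bool
}

// canBeDeleted reports whether the final DELETE (grace period 0) may be sent.
// A pod that was never marked for deletion is never removed by this path.
func canBeDeleted(p podState) bool {
	if !p.HasDeletionTimestamp {
		return false
	}
	// All node-local resources must be reclaimed first.
	return p.RunningContainers == 0 && !p.VolumesStillMounted
}

func main() {
	// Marked for deletion, but one container is still running.
	fmt.Println(canBeDeleted(podState{HasDeletionTimestamp: true, RunningContainers: 1}))
	// Marked for deletion and fully cleaned up.
	fmt.Println(canBeDeleted(podState{HasDeletionTimestamp: true}))
	// Not marked for deletion at all.
	fmt.Println(canBeDeleted(podState{}))
}
```

Checking the deletion mark first means a healthy pod with no running containers, for example one that has not started yet, is not deleted by mistake.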

3 Official English documentation: Termination of Pods

Because Pods represent running processes on nodes in the cluster, it is important to allow those processes to gracefully terminate when they are no longer needed (vs being violently killed with a KILL signal and having no chance to clean up).
Users should be able to request deletion and know when processes terminate, but also be able to ensure that deletes eventually complete.
When a user requests deletion of a Pod, the system records the intended grace period before the Pod is allowed to be forcefully killed, and a TERM signal is sent to the main process in each container.
Once the grace period has expired, the KILL signal is sent to those processes, and the Pod is then deleted from the API server.
If the Kubelet or the container manager is restarted while waiting for processes to terminate, the termination will be retried with the full grace period.

An example flow:
	1. User sends command to delete Pod, with default grace period (30s)
	2. The Pod in the API server is updated with the time beyond which the Pod is considered “dead” along with the grace period.
	3. Pod shows up as "Terminating" when listed in client commands
	4. (simultaneous with 3) When the Kubelet sees that a Pod has been marked as terminating because the time in 2 has been set, it begins the pod shutdown process.
		4.1. If the pod has defined a preStop hook, it is invoked inside of the pod. If the preStop hook is still running after the grace period expires, step 2 is then invoked with a small (2 second) extended grace period.
		4.2. The processes in the Pod are sent the TERM signal.
	5. (simultaneous with 3) Pod is removed from endpoints list for service, and are no longer considered part of the set of running pods for replication controllers. Pods that shutdown slowly cannot continue to serve traffic as load balancers (like the service proxy) remove them from their rotations.
	6. When the grace period expires, any processes still running in the Pod are killed with SIGKILL.
	7. The Kubelet will finish deleting the Pod on the API server by setting grace period 0 (immediate deletion). The Pod disappears from the API and is no longer visible from the client.

By default, all deletes are graceful within 30 seconds. The kubectl delete command supports the --grace-period=<seconds> option which allows a user to override the default and specify their own value. 
The value 0 force deletes the pod. In kubectl version >= 1.5, you must specify an additional flag --force along with --grace-period=0 in order to perform force deletions.

Origin blog.csdn.net/nangonghen/article/details/109305635