1 Problem to be solved
When a cluster is shared by multiple users, quotas are needed to limit each user's resource usage, including the number of CPU cores, the memory size, and the number of GPU cards. Without them, a few users can exhaust the resources and leave the others with an unfair share.
In most cases, the cluster's native ResourceQuota mechanism solves this problem. However, as clusters grow and the number of task types increases, the rules of quota management need to change:
- ResourceQuota is designed for a single cluster, but in practice development and production often run in a multi-cluster environment.
- Most tasks in such clusters are submitted through higher-level resource objects such as deployment and mpijob, and we want the quota to be checked when these higher-level objects are submitted. ResourceQuota, however, calculates resource requests at pod granularity, so it cannot meet this requirement.
Given these problems, we need to implement quota management ourselves. Kubernetes provides a dynamic admission control mechanism that lets us write custom plug-ins to admit or reject requests, and our quota management scheme is built on it.
2 Principles of dynamic admission control
After the API server receives a request to the K8s cluster, the request passes through the following stages in order:
1. Authentication/Authorization
2. Admission control (mutating)
3. Schema validation
4. Admission control (validating)
5. Persistence
Each of the first four stages processes the request and decides whether it may proceed. Only after all of them pass is the request persisted, that is, stored in the etcd database, and it then becomes a successful request. In the admission control (mutating) stage, mutating admission webhooks are called and may modify the request content. In the admission control (validating) stage, validating admission webhooks are called to check whether the request content meets certain requirements and to decide whether to allow or deny the request. Both kinds of webhook are extension points that can be developed independently and deployed to the cluster.
A webhook in the admission control (mutating) stage can also inspect and reject a request. However, the order in which mutating webhooks are called is not guaranteed, so such a webhook cannot prevent other webhooks from modifying the resource requests after its check. Therefore, we deploy the quota check as a validating admission webhook, which is called in the admission control (validating) stage. Checking the requested resources there achieves the goal of resource quota management.
3 Solution
3.1 How to deploy the verification service in the cluster
Using a custom validating admission webhook in a K8s cluster requires deploying two things:
- A ValidatingWebhookConfiguration (the cluster must have the ValidatingAdmissionWebhook admission plug-in enabled). It defines which resources (pod, deployment, mpijob, etc.) are validated, and it provides the callback address of the service that actually performs the validation. Inside a cluster, we recommend providing this address through a Service.
- The validation service that does the actual processing, reachable at the address configured in the ValidatingWebhookConfiguration.
In a single-cluster environment, deploy the validation service in the cluster as a deployment. In a multi-cluster environment, there are two options:
- Use virtual kubelet, cluster federation, or a similar solution to merge multiple clusters into one. This reduces to the single-cluster deployment.
- Deploy the validation service as a deployment in one or more of the clusters. The network between the service and every cluster must then be reachable.
Note that in both single-cluster and multi-cluster environments, the validation service has to watch resource usage, which is generally done by a single instance. Therefore, when the service runs with several replicas, one of them must be elected as the master.
3.2 How to implement verification service
3.2.1 Validation service architecture design
3.2.1.1 Basic component composition
- API server: the cluster's request entry point. It calls the validating admission webhook to validate requests.
- API: the interface of the admission service. It uses the AdmissionReview data structure defined by the cluster for both the request and the response.
- Quota usage service: the interface for querying resource usage.
- Admissions: implementations of the admission service for different resource types, such as deployment and mpijob.
- Resource validator: performs quota validation on resource requests.
- Quota adapter: connects to the external quota service so that the validator can query quotas.
- Resource usage manager: maintains resource usage and makes the quota decision.
- Informers: use the watch mechanism provided by K8s to monitor cluster resources, including deployment and mpijob, in order to maintain the current resource usage.
- Store: stores resource usage data, either in the local memory of the service or in a Redis service.
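The components above can be sketched as Go interfaces and types. This is a minimal sketch that assumes resources are plain integers (CPU in millicores, memory in bytes). The type and method names are hypothetical, and only the local-memory store variant is shown. The check here is not atomic with the usage update; section 3.2.2 discusses why that matters. A resource that is missing from the quota is treated as having quota 0.

```go
package main

import (
	"fmt"
	"sync"
)

// Resources maps a resource name (e.g. "cpu" in millicores) to an amount.
type Resources map[string]int64

// QuotaAdapter looks up an application group's quota in an external service.
type QuotaAdapter interface {
	Quota(group string) (Resources, error)
}

// Store keeps per-group usage; it could be local memory or Redis.
type Store interface {
	Usage(group string) Resources
	Add(group string, delta Resources)
}

// memStore is the local-memory Store variant.
type memStore struct {
	mu    sync.Mutex
	usage map[string]Resources
}

func newMemStore() *memStore { return &memStore{usage: map[string]Resources{}} }

func (s *memStore) Usage(group string) Resources {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := Resources{}
	for k, v := range s.usage[group] {
		out[k] = v
	}
	return out
}

func (s *memStore) Add(group string, delta Resources) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.usage[group] == nil {
		s.usage[group] = Resources{}
	}
	for k, v := range delta {
		s.usage[group][k] += v
	}
}

// UsageManager checks apply + usage <= quota for every requested resource.
type UsageManager struct{ store Store }

func (m *UsageManager) Fits(group string, apply, quota Resources) bool {
	usage := m.store.Usage(group)
	for name, amount := range apply {
		if amount+usage[name] > quota[name] {
			return false
		}
	}
	return true
}

// Validator combines the quota adapter and the resource usage manager.
type Validator struct {
	quotas QuotaAdapter
	usage  *UsageManager
}

func (v *Validator) Validate(group string, apply Resources) (bool, error) {
	quota, err := v.quotas.Quota(group)
	if err != nil {
		return false, err
	}
	return v.usage.Fits(group, apply, quota), nil
}

// staticQuotas is a hypothetical fixed-quota adapter for illustration.
type staticQuotas map[string]Resources

func (q staticQuotas) Quota(group string) (Resources, error) {
	r, ok := q[group]
	if !ok {
		return nil, fmt.Errorf("no quota for group %q", group)
	}
	return r, nil
}

func main() {
	store := newMemStore()
	store.Add("1", Resources{"cpu": 8000}) // maintained by informers in the real design
	v := &Validator{quotas: staticQuotas{"1": {"cpu": 10000}}, usage: &UsageManager{store: store}}
	ok, err := v.Validate("1", Resources{"cpu": 2000})
	fmt.Println(ok, err)
}
```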
3.2.1.2 The basic process of the resource quota check
Take a user creating a deployment resource as an example:
1. The deployment created by the user must specify its application group in an annotation, for example ti.cloud.tencent.com/group-id: 1, which here means the resource belongs to application group 1. If the resource carries no application group information, the request is either rejected directly or assigned to a default application group, such as application group 0, depending on the scenario.
2. The API server receives the request. Because the cluster has a correctly configured ValidatingWebhookConfiguration, the API server calls the API of the validating admission webhook deployed in the cluster during the admission control (validating) stage. It uses the AdmissionReview structure defined by K8s, with the request carried as an AdmissionRequest and an AdmissionResponse expected in return.
3. The quota check service receives the request and enters the admission logic for deployment resources. If the operation is CREATE, it calculates the requested resources. If the operation is UPDATE, it calculates the resources newly requested or released.
4. The requested resources, such as cpu: 2 and memory: 1Gi, are extracted from the spec.template.spec.containers[*].resources.requests field of the deployment. They are denoted apply.
5. The resource validator queries the quota adapter for the quota of application group 1, such as cpu: 10 and memory: 20Gi, denoted quota. Together with apply from the previous step, it requests the resources from the resource usage manager.
6. The resource usage manager continuously watches deployment resource usage through the informers and maintains it in the store. The store can use local memory, so there is no external dependency. Alternatively, it can use Redis as the storage medium, which makes it easy to scale out the service.
7. When the resource usage manager receives the request from the resource validator, it looks up the resources currently occupied by application group 1 in the store, for example cpu: 8 and memory: 16Gi, denoted usage. If apply + usage <= quota, the quota is not exceeded (here cpu: 2 + 8 <= 10 and memory: 1Gi + 16Gi <= 20Gi), the request passes, and the result is returned to the API server.
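Steps 1 and 4 can be sketched in Go as follows. The sketch reads the group annotation and sums the container requests of a deployment. It rests on several assumptions. Only the annotation key comes from the text. Multiplying per-pod requests by `spec.replicas` is an assumption, because the text shows only per-container requests. Falling back to group 0 is one of the two options the text mentions. `parseQuantity` handles only a small subset of the Kubernetes quantity syntax and is not `resource.Quantity` from k8s.io/apimachinery.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Annotation key from the text; its value names the application group.
const groupAnnotation = "ti.cloud.tencent.com/group-id"

// deployment declares only the fields this sketch reads.
type deployment struct {
	Metadata struct {
		Annotations map[string]string `json:"annotations"`
	} `json:"metadata"`
	Spec struct {
		Replicas *int32 `json:"replicas"`
		Template struct {
			Spec struct {
				Containers []struct {
					Resources struct {
						Requests map[string]string `json:"requests"`
					} `json:"resources"`
				} `json:"containers"`
			} `json:"spec"`
		} `json:"template"`
	} `json:"spec"`
}

var binarySuffixes = []struct {
	suffix string
	mult   int64
}{{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}}

// parseQuantity supports integer CPU cores or millicores ("2", "500m"),
// returned as millicores, and integers with an optional Ki/Mi/Gi suffix,
// returned in base units (e.g. bytes). Decimals are not supported.
func parseQuantity(name, s string) (int64, error) {
	if name == "cpu" {
		if strings.HasSuffix(s, "m") {
			return strconv.ParseInt(strings.TrimSuffix(s, "m"), 10, 64)
		}
		n, err := strconv.ParseInt(s, 10, 64)
		return n * 1000, err
	}
	for _, b := range binarySuffixes {
		if strings.HasSuffix(s, b.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, b.suffix), 10, 64)
			return n * b.mult, err
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

// applyFor returns the application group and the total requested resources.
func applyFor(raw []byte) (string, map[string]int64, error) {
	var d deployment
	if err := json.Unmarshal(raw, &d); err != nil {
		return "", nil, err
	}
	group, ok := d.Metadata.Annotations[groupAnnotation]
	if !ok {
		group = "0" // default application group; rejecting is the other option
	}
	replicas := int64(1) // Kubernetes defaults spec.replicas to 1
	if d.Spec.Replicas != nil {
		replicas = int64(*d.Spec.Replicas)
	}
	apply := map[string]int64{}
	for _, c := range d.Spec.Template.Spec.Containers {
		for name, q := range c.Resources.Requests {
			n, err := parseQuantity(name, q)
			if err != nil {
				return "", nil, fmt.Errorf("request %s=%q: %w", name, q, err)
			}
			apply[name] += n * replicas // assumption: quota counts all replicas
		}
	}
	return group, apply, nil
}

func main() {
	raw := []byte(`{"metadata":{"annotations":{"ti.cloud.tencent.com/group-id":"1"}},
		"spec":{"replicas":1,"template":{"spec":{"containers":[
		{"resources":{"requests":{"cpu":"2","memory":"1Gi"}}}]}}}}`)
	group, apply, err := applyFor(raw)
	fmt.Println(group, apply, err)
}
```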
The above is the basic process of the resource quota check. A few details are worth adding:
- The API of the validation service must be exposed over HTTPS.
- Each resource type, such as deployment and mpijob, needs its own admission and informer implementation.
- Each resource type may have several versions. For example, deployment has apps/v1, apps/v1beta1, and so on. Compatibility must be handled according to the actual cluster.
- When an UPDATE request is received, the service must determine, based on the resource type, whether the changed fields cause the existing pod instances to be rebuilt. Only then can it calculate the amount of requested resources correctly.
- Besides the resource types built into K8s, such as cpu, other resource types may need customized quota control, such as GPU types. For these, the corresponding annotations must be agreed on in the resource request in advance, such as ti.cloud.tencent.com/gpu-type: V100.
- While the resource usage manager maintains usage, processes requests, and makes quota decisions, two problems can occur: resource requests may compete with each other, and a resource may fail to be created even though it passed the quota check. These two issues are explained next.
3.2.2 About resource request competition
Because resource requests can arrive concurrently:
- Usage must be updated immediately after a resource request is admitted.
- Updates to usage require concurrency control.
In step 7 above, when the resource usage manager verifies the quota, it queries the current resource occupation of the application group, that is, the group's usage value. The informers update and maintain this value, but there is a delay between a request passing the validating admission webhook and the informer observing the new resource. Other resource requests may arrive during this interval and see a stale usage value. Therefore, usage must be updated immediately after a resource request is admitted.
The update of usage also requires concurrency control. For example:
- Application group 2 has a quota of cpu: 10 and a usage of cpu: 8.
- Two requests, deployment1 and deployment2, arrive. Both use application group 2, and each applies for cpu: 2.
- deployment1 is judged first: apply + usage = cpu: 10, which does not exceed the quota, so deployment1 is allowed.
- usage is updated to cpu: 10.
- deployment2 is judged next: because usage is now cpu: 10, apply + usage = cpu: 12, which exceeds the quota, so the request is rejected.
This process shows that usage is a critical shared variable: each query and the update that follows it must happen in order. If deployment1 and deployment2 both read usage as cpu: 8 without any control, both requests pass and the actual usage exceeds the quota limit. Users could then occupy more resources than their quota allows.
Possible solutions:
- Put resource requests into a queue and have a single service instance consume and process them in turn.
- Lock the critical section around the shared variable usage, and query and update usage while holding the lock.
3.2.3 About resource creation failure
Because of resource competition, we require usage to be updated immediately after a resource request, but this creates a new problem. After stage 4, admission control (validating), the requested resource object enters stage 5, persistence. This stage can still fail: for example, another webhook rejects the request, the cluster goes down, or etcd fails. In that case the task is never actually stored in the cluster database. The validation stage, however, has already increased usage, counting quota that the task never actually occupied. As a result, users may be able to use fewer resources than their quota allows.
To solve this problem, a background service periodically recalculates the usage value of every application group globally. If the validation stage increased usage but the task then failed to be committed to the database, the next global update resets usage to the amount of resources the application group actually uses in the cluster at that moment.
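The global update can be sketched as a recomputation from the persisted objects only. Any reservation made by a request that never reached etcd is dropped. `Object`, `Usage`, `Reserve`, and `Resync` are hypothetical names, and the list of persisted objects is assumed to come from the informer cache. Resync also drops reservations for requests that are still in flight, which is the race window described in the next paragraph.

```go
package main

import (
	"fmt"
	"sync"
)

// Object is a hypothetical persisted workload as listed from the informer
// cache, reduced to its application group and total CPU request.
type Object struct {
	Group string
	CPU   int64
}

// Usage holds per-group usage shared by the webhook and the resync job.
type Usage struct {
	mu      sync.Mutex
	byGroup map[string]int64
}

// Reserve is the immediate update made when a request passes validation.
func (u *Usage) Reserve(group string, cpu int64) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.byGroup[group] += cpu
}

// Resync recomputes usage from persisted objects only, dropping every
// reservation whose request never made it into etcd.
func (u *Usage) Resync(persisted []Object) {
	fresh := map[string]int64{}
	for _, o := range persisted {
		fresh[o.Group] += o.CPU
	}
	u.mu.Lock()
	defer u.mu.Unlock()
	u.byGroup = fresh
}

func (u *Usage) Get(group string) int64 {
	u.mu.Lock()
	defer u.mu.Unlock()
	return u.byGroup[group]
}

func main() {
	cluster := []Object{{"1", 5}, {"1", 3}} // group 1 really uses 8 cores
	u := &Usage{byGroup: map[string]int64{}}
	u.Resync(cluster)
	u.Reserve("1", 2) // passes the webhook, then persistence fails
	fmt.Println("after reserve:", u.Get("1"))
	u.Resync(cluster) // periodic global update
	fmt.Println("after resync:", u.Get("1"))
}
```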
However, in rare cases a global update may happen at an unlucky moment: a creation request that will eventually be persisted to etcd has passed webhook validation but has not yet completed persistence. A global update at such a moment can still let users occupy more than their quota.
For example, in the previous example, suppose a global update happens right after deployment1 updates usage. At this moment deployment1 has not yet been written to etcd, so the global update resets usage to the old value, cpu: 8. This allows deployment2 to pass as well, which exceeds the quota limit.
In general, however, the time from validation to persistence is very short, and with infrequent global updates such cases rarely occur. If further requirements arise in the future, more complex solutions can be used to avoid this problem.
3.2.4 How the native ResourceQuota works
ResourceQuota, the native quota management of K8s clusters, uses a similar approach for the same two problems of resource request competition and resource creation failure:
Real-time updates to resolve request competition
After the quota is checked, resource usage is updated immediately. The optimistic concurrency control built into K8s provides concurrency control and solves the problem of resource competition. Under this control, every update carries a resourceVersion and is rejected with a conflict if the object has changed since it was read. See the implementation of checkQuotas in the K8s source code for details.
The most relevant part of the checkQuotas source code:
```go
// now go through and try to issue updates. Things get a little weird here:
// 1. check to see if the quota changed. If not, skip.
// 2. if the quota changed and the update passes, be happy
// 3. if the quota changed and the update fails, add the original to a retry list
var updatedFailedQuotas []corev1.ResourceQuota
var lastErr error
for i := range quotas {
	newQuota := quotas[i]
	// if this quota didn't have its status changed, skip it
	if quota.Equals(originalQuotas[i].Status.Used, newQuota.Status.Used) {
		continue
	}
	if err := e.quotaAccessor.UpdateQuotaStatus(&newQuota); err != nil {
		updatedFailedQuotas = append(updatedFailedQuotas, newQuota)
		lastErr = err
	}
}
```
Here quotas holds the quota information after the check, and its newQuota.Status.Used field records the resource usage of each quota. If a resource request against a quota has passed, the newly requested resources have already been added to Used by the time this code runs. The Equals function compares Used with its original value. If Used has not changed, no new resources were requested and the quota is skipped. Otherwise, e.quotaAccessor.UpdateQuotaStatus is called, which immediately updates the quota in etcd according to newQuota.Status.Used. If that update fails, the quota is added to updatedFailedQuotas so that it can be retried.
Periodic global updates to resolve creation failures
Resource usage is periodically recalculated for all quotas to handle possible resource creation failures (see the implementation of Run in the K8s source code for details).
The most relevant part of the Run source code:
```go
// the timer for how often we do a full recalculation across all quotas
go wait.Until(func() { rq.enqueueAll() }, rq.resyncPeriod(), stopCh)
```
Here rq is a reference to the ResourceQuota controller object. While the controller's Run method is running, wait.Until calls enqueueAll once per resync period until stopCh is closed. enqueueAll puts every ResourceQuota object into the work queue so that its Used value is recalculated, which performs the global update.