Volcano v1.1.0 released, CNCF's only container batch computing project

On October 30, the Volcano community officially released version 1.1.0.

Volcano is a batch computing platform built on Kubernetes, derived from the HUAWEI CLOUD AI container. It provides job management, batch scheduling, dependency management, resource reservation, and other capabilities, and supports mainstream computing frameworks including TensorFlow, Spark, MPI, and Slurm. It mainly helps users quickly migrate AI, big data, and other resource-hungry, compute-intensive workloads from traditional batch and HPC systems to cloud native environments.

New features in this version include:

HDRF support. HDRF (hierarchical dominant resource fairness) is a fair scheduling policy based on a weight tree. In Volcano, the leaf nodes of the weight tree represent the Pods to be scheduled, and the non-leaf nodes include Task, Job, PodGroup, and Queue. Every node has a positive weight that indicates its relative importance. During scheduling, the Pod scheduling order is determined automatically from the weight tree, and weights can be adjusted dynamically.
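To make the weight tree idea concrete, here is a minimal sketch in Python. It is not Volcano's implementation and not the full HDRF algorithm: it assumes a single abstract resource and simply descends from the root to a pending leaf, always picking the child with the smallest allocated-to-weight ratio. The names `Node`, `pick_path`, and `schedule_all`, as well as the example queues and pods, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in a hypothetical weight tree (queue, job, or pod)."""
    name: str
    weight: float                 # relative importance, must be positive
    allocated: float = 0.0        # amount already granted to this subtree
    pending: bool = False         # only meaningful for leaves (pods)
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.weight <= 0:
            raise ValueError("weight must be positive")

    def has_pending(self) -> bool:
        """True if this leaf, or any leaf below this node, is still pending."""
        if not self.children:
            return self.pending
        return any(child.has_pending() for child in self.children)


def pick_path(root: Node) -> List[Node]:
    """Return the root-to-leaf path of the next pod to schedule, or []."""
    if not root.has_pending():
        return []
    path = [root]
    while path[-1].children:
        candidates = [c for c in path[-1].children if c.has_pending()]
        # Simplified fairness rule: least allocated per unit of weight wins.
        path.append(min(candidates, key=lambda c: c.allocated / c.weight))
    return path


def schedule_all(root: Node, demand: float = 1.0) -> List[str]:
    """Schedule every pending pod and return the resulting order."""
    order = []
    while True:
        path = pick_path(root)
        if not path:
            return order
        for node in path:          # charge the demand to every ancestor
            node.allocated += demand
        path[-1].pending = False
        order.append(path[-1].name)


# Hypothetical example: queue A has twice the weight of queue B.
queue_a = Node("A", weight=2, children=[
    Node("a1", 1, pending=True),
    Node("a2", 1, pending=True),
    Node("a3", 1, pending=True),
])
queue_b = Node("B", weight=1, children=[
    Node("b1", 1, pending=True),
    Node("b2", 1, pending=True),
])
root = Node("root", weight=1, children=[queue_a, queue_b])
order = schedule_all(root)  # ['a1', 'b1', 'a2', 'a3', 'b2']
```

In this sketch, changing a node's `weight` between calls changes the order of later picks, which loosely mirrors the dynamic weight adjustment mentioned above.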

Automatic target job identification and resource reservation. The scheduler automatically identifies the job with the highest priority and the longest waiting time in the current pending job queue as the target job. In subsequent scheduling cycles, it locks several nodes for that job. Until the target job is scheduled, a locked node rejects new jobs and waits for its currently running workloads to finish gradually, freeing as many resources as possible for the target job.
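The selection and locking rules can be sketched roughly as follows. This is an illustration under stated assumptions, not Volcano's code: it assumes priority is compared first and waiting time only breaks ties (the announcement does not say how the two criteria are combined), and the names `PendingJob`, `select_target`, and `can_place` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class PendingJob:
    name: str
    priority: int
    enqueued_at: float  # timestamp in seconds; smaller means waiting longer


def select_target(jobs: List[PendingJob], now: float) -> Optional[PendingJob]:
    """Pick the target job: highest priority, then longest waiting time.

    Assumption: priority dominates and waiting time is the tie-breaker.
    """
    if not jobs:
        return None
    return max(jobs, key=lambda j: (j.priority, now - j.enqueued_at))


def can_place(job_name: str, node: str, locked_nodes: Set[str],
              target: Optional[PendingJob]) -> bool:
    """A locked node accepts only the target job; other nodes accept any job."""
    if node in locked_nodes:
        return target is not None and job_name == target.name
    return True


# Hypothetical pending queue, evaluated at time 100.
jobs = [
    PendingJob("a", priority=1, enqueued_at=0.0),
    PendingJob("b", priority=5, enqueued_at=50.0),
    PendingJob("c", priority=5, enqueued_at=10.0),
]
target = select_target(jobs, now=100.0)   # job "c": top priority, waited longest
locked = {"node-1"}                        # nodes reserved for the target job
```

With this setup, `can_place("a", "node-1", locked, target)` is false while `can_place("c", "node-1", locked, target)` is true, so the locked node drains until the target job fits.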

Scheduling performance monitoring. Together with Grafana, Prometheus, and other open source components, Volcano lets you view its real-time scheduling status at a glance, including core metrics such as the total number of jobs currently in the system, their status distribution, real-time throughput, and latency. The monitoring component also provides a reference for automatic tuning of scheduler performance.

Other updates. This version also adds further improvements, such as queue weight validation, a customizable number of re-enqueue retries for Pending jobs, and Arm64 support.

Bug fixes. This version fixes bugs such as low-priority task scheduling being blocked after the allocate action fails to schedule high-priority tasks, and queue capacity exceeding its limit when the specified minAvailable is less than the number of job replicas.

Release details page: https://github.com/volcano-sh/volcano/releases/tag/v1.1.0

Origin www.oschina.net/news/119662/volcano-1-1-0-released