Supporting high-performance computing scenarios: Boyun Container Cloud builds an intelligent computing power engine

As the trend of Kubernetes becoming the next-generation infrastructure for AI, big data, and high-performance batch computing grows clearer, more and more enterprises are placing higher demands on Kubernetes for deep learning, scientific computing, and high-performance rendering.

 

Project Challenges

As a general container scheduling solution, native Kubernetes still falls short of the business scheduling requirements of high-performance computing scenarios, mainly in the following areas:

 

Job-oriented scheduling needs improvement

Kubernetes schedules from a resource perspective, with the Pod as its basic scheduling unit. It schedules containers sequentially, one Pod at a time, and lacks the ability to schedule from the perspective of a business job.

 

In big data, artificial intelligence, and high-performance computing scenarios, multiple containers often need to run their computation at the same time, and Kubernetes's native sequential scheduling cannot meet this requirement.

 

For example, suppose a big data application needs to run 1 Driver container plus 10 Executor containers. If the containers are scheduled one by one and scheduling of the last Executor fails for lack of resources, the service as a whole cannot start. Even though the Driver and the first nine Executors run normally, the platform cannot provide service for the application, and the resources those containers occupy are simply wasted.
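To make this concrete, here is a minimal sketch of gang scheduling for exactly this Driver-plus-Executors shape, using the CNCF batch scheduler Volcano (introduced in the solution below) and the Python Kubernetes client. With minAvailable set to 11, the scheduler admits the whole group or nothing, so no partial set of Executors ever wastes resources. The image names and namespace are hypothetical.

```python
# Sketch: a gang-scheduled job using the Volcano Job CRD
# (batch.volcano.sh/v1alpha1). minAvailable=11 means the Driver and all
# ten Executors are scheduled together or not at all, avoiding the
# partial-start failure described above. Images are placeholders.
from kubernetes import client, config

config.load_kube_config()

def task(name, replicas, image):
    """One task group (a set of identical Pods) in a Volcano Job."""
    return {
        "name": name,
        "replicas": replicas,
        "template": {"spec": {
            "restartPolicy": "Never",
            "containers": [{"name": name, "image": image}],
        }},
    }

job = {
    "apiVersion": "batch.volcano.sh/v1alpha1",
    "kind": "Job",
    "metadata": {"name": "bigdata-job", "namespace": "default"},
    "spec": {
        "schedulerName": "volcano",
        "minAvailable": 11,  # 1 Driver + 10 Executors: all or nothing
        "tasks": [
            task("driver", 1, "example.com/bigdata-driver:latest"),
            task("executor", 10, "example.com/bigdata-executor:latest"),
        ],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="batch.volcano.sh", version="v1alpha1",
    namespace="default", plural="jobs", body=job,
)
```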

 

Likewise, when multiple jobs are submitted at the same time, insufficient resources can produce a deadlock: each job holds part of the cluster while waiting for the rest, the cluster's resources end up fully occupied, and in the worst case none of the jobs can run.

 

GPU sharing and partitioning need to be implemented

Open-source Kubernetes itself has relatively weak GPU management capabilities and cannot meet the requirement of on-demand GPU sharing:

  • A container can request one or more whole GPUs, but fractional GPU allocation is not supported

  • Kubernetes nodes must have the corresponding NVIDIA drivers pre-installed

  • The nvidia-docker runtime must be pre-installed

 

For a container to use an NVIDIA GPU, it must go through nvidia-docker. nvidia-docker is a GPU-enabled wrapper around Docker: it adds a layer of encapsulation on top of Docker and exposes the GPU to containers through nvidia-docker-plugin.

 

Since Kubernetes 1.8, the recommended way for containers to consume GPUs has been the device plugin mechanism.
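As an illustration of the device-plugin path, the sketch below requests one whole GPU through the `nvidia.com/gpu` extended resource that the NVIDIA device plugin registers, using the Python Kubernetes client; the CUDA image is just an example. Note that only integer counts can be requested, which is the whole-card limitation listed above.

```python
# Sketch: requesting a whole GPU via the NVIDIA device plugin, the
# mechanism recommended since Kubernetes 1.8. GPUs may only be set in
# "limits" and only as integers -- no fractional cards.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="cuda-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="cuda",
            image="nvidia/cuda:12.0.0-base-ubuntu22.04",  # any CUDA image
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"},  # one whole card
            ),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```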

 

Big data and similar workloads need first-class support

Kubernetes's core workloads are designed for Internet-style applications such as stateless services and microservices, and excel at horizontal scaling and rolling upgrades. In big data and high-performance computing, it is very difficult to run the corresponding jobs and tasks directly on Kubernetes.

 

Resource isolation in traditional solutions needs improvement

With the rise of the Hadoop ecosystem, YARN came to use cgroups to isolate and manage CPU resources, and relied on the JVM's memory mechanisms to isolate and manage memory. Isolation of disk I/O and network I/O is still under discussion in the community, and a complete solution for isolating the file-system environment has never been delivered.

 

Overall, YARN's resource isolation is relatively weak. When multiple tasks run on the same worker node, they compete for resources and interfere with one another.

 

On-demand elastic scaling needs enhancement

Big data workloads often show clear periodic peaks; real-time computing, for example, consumes resources mainly during the day. Yet big data resource-management platforms generally lack elasticity and cannot expand capacity quickly on demand, so the only way to handle business peaks and sudden computing tasks is to reserve enough resources in advance to keep tasks running.

 

Solution

Since 2020, more and more big data, high-performance computing, and similar workloads have been migrating to Kubernetes. Boyun's intelligent computing power engine introduces gang scheduling, fair scheduling, and other policies to achieve job-oriented scheduling, overcoming the Pod-oriented limitation of the native Kubernetes scheduler. It introduces the CNCF batch-computing project Volcano to provide job and queue management for the big data ecosystem, and through the Boyun container cloud platform it delivers GPU scheduling, resource isolation, and resource abstraction. Together these further strengthen containers as a resource platform and pave the way for computing power services in big data, artificial intelligence, and high-performance computing scenarios.

 

The Boyun intelligent computing engine consists of three layers:

  • Business layer: business software calls the engine's interfaces to manage batch computing and job scheduling;

  • Scheduling layer: the engine provides the overall scheduling, computing, and related services;

  • Resource layer: large numbers of physical or virtual machines supply computing power to Boyun's enterprise-grade Kubernetes clusters.

 

Implementing flexible job scheduling algorithms

Jobs are queued according to the following principles to improve cluster-wide resource utilization and job throughput (a brief sketch follows the list):

  • Cluster resource usage

  • Job submission time

  • Job resource requests

  • Job priority

  • Job exclusivity

  • Job starvation prevention
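Boyun's own scheduler policy is internal, but two of these principles, priority and starvation prevention, can be sketched with Kubernetes-native PriorityClass objects; the names and values below are illustrative only.

```python
# Sketch: job priority plus starvation prevention with PriorityClass.
# A fair-share scheduler ages low-priority jobs up the queue, so marking
# them non-preempting keeps them safe without letting them starve.
from kubernetes import client, config

config.load_kube_config()

high = client.V1PriorityClass(
    metadata=client.V1ObjectMeta(name="batch-high"),
    value=100000,  # larger value = scheduled earlier, may preempt
    preemption_policy="PreemptLowerPriority",
    description="Urgent batch jobs",
)
low = client.V1PriorityClass(
    metadata=client.V1ObjectMeta(name="batch-low"),
    value=1000,
    preemption_policy="Never",  # never evicts others
    description="Background batch jobs",
)

api = client.SchedulingV1Api()
for pc in (high, low):
    api.create_priority_class(pc)  # Pods then set priorityClassName
```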

     

Realizing multi-dimensional GPU scheduling

The platform provides unified GPU management, multi-dimensional isolation, resource sharing, and related scheduling capabilities. Nodes can be configured with different numbers and types of GPU cards while being managed uniformly; GPU resources can be isolated at the tenant or namespace level and by card type; and multiple workloads can share a GPU card with GPU memory isolation, improving GPU utilization.
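How GPU memory sharing is requested depends on the GPU-sharing device plugin deployed; the resource name below, example.com/gpu-mem, is a hypothetical stand-in (each vendor's plugin registers its own name). The sketch shows the general shape: a Pod asks for a slice of GPU memory rather than a whole card, so several Pods can be packed onto one GPU.

```python
# Sketch: GPU sharing by requesting GPU memory instead of whole cards.
# "example.com/gpu-mem" is a hypothetical extended-resource name; the
# real name comes from whichever GPU-sharing plugin is installed.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="shared-gpu-pod"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="inference",
            image="example.com/inference:latest",  # hypothetical image
            resources=client.V1ResourceRequirements(
                # 4 GiB of GPU memory; other Pods can share the same card,
                # with memory isolation enforced by the plugin.
                limits={"example.com/gpu-mem": "4"},
            ),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```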

 

Implementing MPI-based jobs

A running MPI job generally consists of two parts, a master and workers: the master is responsible for launching the mpirun command, and the workers execute the actual computation.

 

The platform defines the master and the workers through multiple Pod templates. With gang scheduling, all Pods of a job start at the same time, giving true job-level management. The platform also manages the job life cycle: when mpirun exits, the whole job is terminated.
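A minimal sketch of this master/worker pattern on the Volcano Job CRD follows; the image, command, and sizes are hypothetical. The ssh and svc plugins give the Pods passwordless SSH and a hostfile, and the TaskCompleted policy ends the whole job when mpirun finishes, matching the life-cycle behavior described above.

```python
# Sketch: an MPI job as a Volcano Job -- one master that runs mpirun,
# four workers that compute. Gang scheduling (minAvailable=5) starts all
# Pods together; TaskCompleted -> CompleteJob tears everything down when
# mpirun exits.
from kubernetes import client, config

config.load_kube_config()

mpi_job = {
    "apiVersion": "batch.volcano.sh/v1alpha1",
    "kind": "Job",
    "metadata": {"name": "mpi-demo", "namespace": "default"},
    "spec": {
        "schedulerName": "volcano",
        "minAvailable": 5,                  # 1 master + 4 workers
        "plugins": {"ssh": [], "svc": []},  # passwordless ssh + hostfile
        "tasks": [
            {
                "name": "mpimaster",
                "replicas": 1,
                # When mpirun finishes, complete the whole job.
                "policies": [{"event": "TaskCompleted",
                              "action": "CompleteJob"}],
                "template": {"spec": {
                    "restartPolicy": "OnFailure",
                    "containers": [{
                        "name": "master",
                        "image": "example.com/mpi-app:latest",
                        "command": ["mpirun", "-n", "4", "/app/solver"],
                    }],
                }},
            },
            {
                "name": "mpiworker",
                "replicas": 4,
                "template": {"spec": {
                    "restartPolicy": "OnFailure",
                    "containers": [{
                        "name": "worker",
                        "image": "example.com/mpi-app:latest",
                        "command": ["/usr/sbin/sshd", "-D"],  # wait for work
                    }],
                }},
            },
        ],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="batch.volcano.sh", version="v1alpha1",
    namespace="default", plural="jobs", body=mpi_job,
)
```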

 

Implementing jobs based on the Spark framework

Spark has natively supported deployment on Kubernetes since version 2.3.

 

Containers bring their own advantages: packaging the runtime into an image speeds up distribution and improves portability. Kubernetes adds rapid deployment, elastic scaling, performance monitoring, log collection, and other management functions for the containerized application.
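For reference, a submission against Kubernetes looks like the sketch below (wrapped in Python for consistency with the other examples); the API-server address, image, and executor count are placeholders for your environment.

```python
# Sketch: spark-submit targeting Kubernetes, supported natively since
# Spark 2.3. Executors run as Pods built from the container image.
import subprocess

subprocess.run([
    "spark-submit",
    "--master", "k8s://https://kubernetes.example.com:6443",  # API server
    "--deploy-mode", "cluster",
    "--name", "spark-pi",
    "--class", "org.apache.spark.examples.SparkPi",
    "--conf", "spark.executor.instances=10",
    "--conf", "spark.kubernetes.container.image=example.com/spark:3.4.0",
    # "local://" means the jar is already inside the image.
    "local:///opt/spark/examples/jars/spark-examples.jar",
], check=True)
```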

 

Implementing queues and priority preemption

The resources of the entire cluster are divided among different queues, so different users can configure and use queues as needed. When resources in one queue are idle, they can be lent to jobs in another queue.

 

When many jobs run at the same time, high-priority jobs can preempt the resources of low-priority jobs and be scheduled first. At the same time, the platform prevents low-priority tasks from "starving", i.e., waiting indefinitely without ever running.
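As one concrete way to express such queues, the sketch below uses Volcano's Queue CRD; the weights and capacities are illustrative. A reclaimable queue lends idle resources to other queues and takes them back under pressure, which is the borrowing behavior described above.

```python
# Sketch: weighted, reclaimable queues via the Volcano Queue CRD
# (scheduling.volcano.sh/v1beta1). Queues are cluster-scoped objects.
from kubernetes import client, config

config.load_kube_config()

def queue(name, weight):
    return {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "Queue",
        "metadata": {"name": name},
        "spec": {
            "weight": weight,        # relative share under contention
            "reclaimable": True,     # idle resources may be lent out
            "capability": {"cpu": "200", "memory": "800Gi"},  # hard cap
        },
    }

api = client.CustomObjectsApi()
for q in (queue("production", 8), queue("research", 2)):
    api.create_cluster_custom_object(
        group="scheduling.volcano.sh", version="v1beta1",
        plural="queues", body=q,
    )
```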

 

Implementing high-performance container networking

In big data, supercomputing, and similar scenarios, thousands of computing instances must be started and brought online within a short time, which places high demands on container network performance.

 

Boyun's self-developed BeyondFabric network solution has gone through five iterations and runs stably in many customers' production environments. BeyondFabric delivers high performance for computing services on metrics such as container startup time, network bandwidth, and time to network readiness.

 

BeyondFabric also provides strong container network performance for Windows systems.

 

Implementing deep isolation of user resources

At the tenant level, the platform lets multiple tenants share the underlying physical resources (compute, storage, and network) while isolating each tenant's applications, data, and virtual networks. For their own applications, tenants are free to choose between isolation and interconnection.

 

At the resource level, Docker runs multiple applications in independent sandboxes on a single Linux host without mutual interference, isolating each container's CPU, memory, network, storage, processes, and so on.

 

Realizing on-demand elastic scaling of resources

Elastic scaling is a key feature of the container cloud and one of its most important business scenarios. With the container cloud's elasticity, resources can be expanded rapidly during business peaks, avoiding excessive reservation of resources for peak load.
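The platform's own elasticity features are proprietary, but the underlying Kubernetes-native mechanism can be sketched with a HorizontalPodAutoscaler; the deployment name and thresholds below are illustrative.

```python
# Sketch: on-demand elasticity with a HorizontalPodAutoscaler
# (autoscaling/v1): scale a Deployment between 2 and 100 replicas,
# targeting 70% average CPU utilization.
from kubernetes import client, config

config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="render-workers"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="render-workers",
        ),
        min_replicas=2,     # small steady-state footprint off-peak
        max_replicas=100,   # burst capacity for business peaks
        target_cpu_utilization_percentage=70,
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa,
)
```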

 

Application scenarios

HPC, the representative of the traditional distributed computing model, is still widely used in fields such as industrial simulation, visual rendering, meteorology, oil exploration, and scientific research.

 

With the explosion of cloud-native technology, Kubernetes, as the tool for orchestrating and managing cloud-native applications, is being adopted by more and more applications, and many users now want to migrate HPC applications into containers and use Kubernetes's powerful capabilities for job management.

 

Industrial Simulation

One customer runs a large number of Windows-based HPC applications. Before migrating to a container environment, they regularly faced high resource usage, a lack of isolation between jobs, and maintenance that required manual back-end operations.

 

After the HPC applications migrated to the container cloud, the platform optimized job tasks to shorten computing time, and its sound resource isolation reduced the data-security risks of different departments submitting jobs, greatly improving operational efficiency. The number of instances created by a single job submission has grown from 300+ to 1000+, using roughly 15-20 TB of memory.

 

Visual Rendering

This customer's rendering business was still dominated by stand-alone deployments. The existing software scheduled batch-computing jobs inflexibly, offered poor cluster control, and left job configuration, creation, and release largely manual.

By containerizing the business, the customer gained high-performance resource scheduling, second-level elastic scaling, and unified GPU management, easily meeting large-scale rendering needs.

 

Solution Advantages

Boyun's intelligent computing power engine solution builds a unified scheduling and management platform for high-performance computing scenarios on container technology. The solution has the following technical advantages:

 

 

Unified Resource Management

  • Support Linux/Windows computing resource pools

  • Support GPU computing

  • Support batch and node group management

  • Improve resource utilization

 

Flexible scheduling mechanism

  • Support gang-scheduling mechanism

  • Support mainstream scheduling algorithms

  • Support mechanisms such as job starvation prevention and exclusivity

  • Improve job throughput

 

Efficient job submission

  • Support HPC, big data, artificial intelligence, and other job types

  • Support quick submission of MPI and TensorFlow jobs

  • Support routine jobs such as ETL through DAG mode

  • Run jobs in containers for better job isolation

 

Various queuing strategies

  • Multi-queue management to support resource preemption

  • Support setting priority

  • Support queue visualization

 

Comprehensive data visualization

  • Interfacing with storage systems such as S3

  • View data online in real time

 

Centralized logging and alerting

  • View job logs online

  • View job monitoring data online

  • Configure alarm rules online

     

Complete tenant system

  • Divide tenant resources

  • Tenant data isolation

  • Operation permission audit

 

Compatible with the full Xinchuang ecosystem

  • Support X86, Hygon (Haiguang), and ARM platforms

  • Support operating systems such as Kylin and UOS (Tongxin)

  • Mainstream certifications obtained

 

Summary

Boyun's intelligent computing power engine solution makes full use of container technology, enabling big data, artificial intelligence, high-performance computing, and similar scenarios to further improve resource utilization and reduce the complexity of operations and maintenance. It further unlocks the value of cloud-native technology and supports enterprises' digital transformation in high-performance computing scenarios.
