Empowering artificial intelligence: Ray on vSphere open source plug-in released

The last year has seen explosive progress in machine learning and artificial intelligence. High-quality generative AI solutions such as ChatGPT have captured public interest, and that interest has extended to the commercial sector. Organizations and individuals alike are considering how to leverage this technology to accelerate impact and delight customers.

While these general-purpose models are impressive, they are often a poor fit for industry-specific use cases. Publicly available training data cannot give models the expertise needed for every enterprise-specific problem. To meet these demands, many organizations are investing in tuning and training their own models. To do so, they need compute capacity beyond an engineer's laptop or existing build tools. Data scientists and machine learning engineers need tools that help them scale their workloads, with controlled access to computing resources that match those workloads.

To address these challenges, we're excited to announce a partnership between VMware and Anyscale, the creator of Ray. Ray is a distributed Python workload scheduler optimized for machine learning, bringing serverless scalability to training and inference workloads. Ray has a wide range of applications in parallel processing and distributed computing, and it performs well in both.

Anyscale and VMware have partnered to create an open source plugin for running Ray on vSphere using virtual machines. The plugin enables system administrators to provide data science teams with computing infrastructure that meets their needs. When data science teams have the compute to run the workloads behind their data exploration, cleansing, and model experimentation, businesses can reduce the time it takes to go from raw data to a tuned and differentiated model, driving targeted business outcomes. The process is like DevOps, but this time the goal is to deliver a working model into production.

How does it work?

A Ray cluster consists of a head node and worker nodes.

The head node is responsible for managing the cluster and adjusting the number of worker nodes in it. The distributed worker nodes are responsible for training, fine-tuning, and serving models.
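To make the head node's scaling role concrete, here is a toy sketch of a sizing decision. It is not Ray's actual autoscaling algorithm; the function name `workers_needed` and all of its parameters are hypothetical, chosen only to illustrate the idea of matching worker count to pending demand within configured bounds.

```python
import math


def workers_needed(pending_cpus: float, cpus_per_worker: int,
                   min_workers: int, max_workers: int) -> int:
    """Return how many worker nodes a toy autoscaler would request.

    Illustrative only: the real Ray Autoscaler considers more than CPU demand.
    """
    if cpus_per_worker <= 0:
        raise ValueError("cpus_per_worker must be positive")
    # Round up so that all pending CPU demand fits on the requested workers.
    wanted = math.ceil(pending_cpus / cpus_per_worker)
    # Clamp to the bounds configured for the cluster.
    return max(min_workers, min(max_workers, wanted))


if __name__ == "__main__":
    # 10 pending CPUs on 4-CPU workers, with between 1 and 5 workers allowed.
    print(workers_needed(10, 4, min_workers=1, max_workers=5))
```

The clamping step is the part that matters for administrators: however large the demand, the cluster never grows beyond the limit they configured.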

To start working, the head node's autoscaler needs to know how large the cluster can grow and where it can be deployed. It reads this information from a cluster configuration file.

To achieve this, our plugin extends the Ray Autoscaler to work directly with virtual machines on vSphere.
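One way to picture such an extension is as a provider object that the autoscaler asks to create, list, and terminate worker VMs. The sketch below is a minimal in-memory stand-in under that assumption; the class `InMemoryVMProvider` and its method names are hypothetical and are neither the plugin's code nor Ray's API. A real provider would call vSphere where this one only records state.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class InMemoryVMProvider:
    """Hypothetical stand-in for a provider that manages worker VMs."""

    _vms: Dict[str, dict] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create_nodes(self, node_config: dict, n: int) -> List[str]:
        # A real provider would clone VMs through vSphere here;
        # this sketch only records the requested configuration.
        created = []
        for _ in range(n):
            vm_id = f"ray-worker-{next(self._ids)}"
            self._vms[vm_id] = dict(node_config)
            created.append(vm_id)
        return created

    def running_nodes(self) -> List[str]:
        # Report the VMs the provider currently tracks.
        return sorted(self._vms)

    def terminate_node(self, vm_id: str) -> None:
        # Forget the VM; terminating an unknown ID is a no-op.
        self._vms.pop(vm_id, None)


if __name__ == "__main__":
    provider = InMemoryVMProvider()
    provider.create_nodes({"resource_pool": "ray-pool"}, 2)
    provider.terminate_node("ray-worker-1")
    print(provider.running_nodes())
```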

To coordinate Ray workloads, the Ray Autoscaler plugin uses a vSphere cluster. A vSphere cluster is a group of hosts whose resources are pooled; the cluster manages the resources of all the hosts in it. The cluster supports vSphere High Availability (HA) and vSphere Distributed Resource Scheduler (DRS). These features help keep Ray clusters fault-tolerant and isolated from other mission-critical workloads, and help allocate computing resources efficiently.

Configuring the vSphere provider

The following figure shows a sample Ray cluster configuration file for use with vSphere. In the provider section, we must set the type to vSphere and supply the credentials for the vSphere cluster, along with the datastore where the Ray cluster will be deployed.

Also, in the head and worker node configuration, we can configure resource pools to isolate Ray workers from other workloads. To improve performance, we can also specify a frozen virtual machine (frozen VM). The frozen VM serves as the source for instant clones, so worker nodes can be scaled out quickly.
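The shape of such a configuration can be sketched as a Python dictionary. This is an illustration of the structure described above, not the plugin's actual schema: the real file is written in YAML, and every key name here other than the provider type (including `vsphere_config`, `datastore`, `resource_pool`, and `frozen_vm`) is an assumption that should be checked against the plugin's documentation. The helper `build_cluster_config` is likewise hypothetical.

```python
import json


def build_cluster_config(server: str, user: str, password: str,
                         datastore: str, resource_pool: str,
                         frozen_vm: str, max_workers: int = 2) -> dict:
    """Assemble an illustrative cluster config; key names are assumptions."""
    return {
        "cluster_name": "ray-on-vsphere-demo",
        "max_workers": max_workers,
        # Provider section: type, credentials, and target datastore.
        "provider": {
            "type": "vsphere",
            "vsphere_config": {
                "credentials": {
                    "server": server,
                    "user": user,
                    "password": password,
                },
                "datastore": datastore,
            },
        },
        "head_node_type": "head",
        "available_node_types": {
            # A resource pool keeps Ray nodes apart from other workloads.
            "head": {"node_config": {"resource_pool": resource_pool}},
            "worker": {
                "min_workers": 0,
                "max_workers": max_workers,
                "node_config": {
                    "resource_pool": resource_pool,
                    # Frozen VM used as the source for instant clones.
                    "frozen_vm": frozen_vm,
                },
            },
        },
    }


if __name__ == "__main__":
    cfg = build_cluster_config("vcenter.example.com", "ray-admin", "secret",
                               "datastore1", "ray-pool", "ray-frozen-vm")
    print(json.dumps(cfg, indent=2))
```

In practice the credentials would come from a secret store or environment variables instead of being written into the file.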

What's next?

What we share today is just the first step. We are currently exploring how to leverage unused compute to train ML models when data centers are idle, enabling organizations to get more value from their data centers without impacting production workloads. It's also good for the planet!

We're ready to usher in a new era of automation and simplify access to machine learning with our Ray on vSphere plugin. Trials and feedback are welcome; please send questions to [email protected].

This article was written by Ala Dewberry, Senior Product Manager, OCTO xLabs; and Sean Huntley, Product Engineer, OCTO.

Content Source|Public Account: VMware China R&D Center


Origin my.oschina.net/u/4238514/blog/10102420