Intensive management and control, on-demand allocation: Heywhale focuses on AI for Science and brings scientific research computing power to universities.

As human society enters the intelligent stage of the information age, data has gradually become a basic factor of production, and computing power has in turn become an important productive force. On September 3, Study Times published the article "Why Computing Power Is So Important", pointing out that breakthroughs in artificial intelligence and the digitalization of industry place higher demands on computing power. At the Huawei Connect conference on September 20, Meng Wanzhou likewise said in her speech that computing power is the core driving force of artificial intelligence and determines the speed of AI iteration and innovation.

At present, domestic government, industry, academic, and research circles are paying increasing attention to artificial intelligence and computing power, the scientific research community in particular. In March this year, the Ministry of Science and Technology, together with the National Natural Science Foundation of China, officially launched the special deployment of "Artificial Intelligence-driven Scientific Research (AI for Science)", which aims to use artificial intelligence methods, supported by scientific data and computing power, to carry out computationally intensive, efficient, and iterative scientific exploration and bring new breakthroughs to research work.

The scientific research paradigm keeps upgrading, but traditional infrastructure is increasingly unable to provide the software and hardware support that the emerging AI for Science requires. This article focuses on how to schedule and manage scientific research computing power efficiently, and introduces the data science collaboration platform ModelWhale to research teams in various fields, in the hope of helping artificial intelligence-driven scientific research.

1. Scientific research expectations and current situation

Scientific research expectation: deploy computing power efficiently and precisely throughout the research process, maximizing the availability of resources within the organization and team.

Actual situation: there is no suitable channel for integrating computing and storage resources, and it is hard to accommodate both the computing cluster and each researcher's personal working environment.

Artificial intelligence-driven scientific research projects generally involve large, complex computing tasks, such as model training on GPU clusters or the deployment and invocation of large language models, which place high demands on hardware. Ordinary personal PCs cannot meet the corresponding computing power requirements locally.

Universities and research institutions therefore purchase high-specification servers in advance, but these servers are usually scattered and difficult to integrate and utilize at the organizational level. Scheduling resources as a cluster on the cloud is feasible in theory, but the associated deployment, operation, and maintenance work is cumbersome and highly specialized, and finding suitable people within the organization takes considerable time and effort. Even when operation and maintenance succeed, researchers often struggle to balance the computing cluster with their personal working environment.

2. Efficient computing power scheduling and management for artificial intelligence-driven scientific research

Heterogeneous integration, intensive management and control, on-demand allocation, and agile response: ModelWhale's powerful computing power scheduling and management make it possible to call large language models from a personal computer, while maximizing the availability of computing resources within the organization and team.

Heterogeneous integration: privatized deployment and operation and maintenance of computing power access

A privately deployed ModelWhale can access computing power through local servers (the first choice for customers who need to make efficient use of existing hardware), private clouds, or cloud services from the major cloud vendors. Whichever approach is used, ModelWhale, built on cloud-native technology, connects to these resources flexibly and supports cross-cloud scheduling.

After deployment is completed, ModelWhale provides a full set of operation and maintenance services and a complete after-sales mechanism that follows up throughout. General problems are handled through online remote support; for serious or complex problems, the ModelWhale team can also come on site to resolve them, so the research organization does not need to spend its own staff on this operation and maintenance work.

Intensive management and control: unified management of computing power of various specifications

Accessing computing power through local servers means the existing resources of universities and research institutions have been integrated. The next step is the unified management of computing power across specifications, that is, how to make the integrated resources easier to use and assign them to the project groups of different teachers and researchers.

Through ModelWhale, administrators of the larger organization can use the graphical interface to split computing power by number of cores and memory size, and then allocate it to different groups according to their needs. For example, the AI for Science workflow often involves large, complex computing tasks that call for higher-specification CPU resources or GPU clusters, while more basic computing resources can be allocated at the same time to teaching teams within the university for course exercises, ensuring that computing power of every specification is never left idle.
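
As a rough illustration of this kind of quota splitting (a hypothetical Python sketch, not ModelWhale's actual interface), an organization-wide pool integrated from local servers can be sliced by cores, memory, and GPUs and handed to different groups:

```python
from dataclasses import dataclass

@dataclass
class Quota:
    """A slice of the shared pool: CPU cores, memory (GiB), and GPUs."""
    cpu_cores: int
    mem_gib: int
    gpus: int = 0

# Hypothetical pool integrated from the organization's local servers.
POOL = Quota(cpu_cores=256, mem_gib=2048, gpus=16)

# Split the pool into group-level quotas according to usage needs:
# research groups get GPU-heavy slices, teaching teams get basic CPU slices.
allocations = {
    "ai4science_lab": Quota(cpu_cores=128, mem_gib=1024, gpus=16),
    "teaching_team":  Quota(cpu_cores=96,  mem_gib=768),
    "reserve":        Quota(cpu_cores=32,  mem_gib=256),
}

def check_allocations(pool: Quota, allocs: dict) -> None:
    """Verify that the group quotas do not exceed the integrated pool."""
    assert sum(q.cpu_cores for q in allocs.values()) <= pool.cpu_cores
    assert sum(q.mem_gib for q in allocs.values()) <= pool.mem_gib
    assert sum(q.gpus for q in allocs.values()) <= pool.gpus

check_allocations(POOL, allocations)
```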

In addition, ModelWhale provides a resource application mechanism: when existing computing and storage resources are insufficient, project group managers can submit an application and obtain additional computing power in time to meet different research needs.

Computing resources are allocated to different project groups on demand

On-demand allocation: refined and flexible computing power scheduling

If unified management of computing power across specifications focuses on the path from the university or research institution down to the project groups within it, then refined and flexible computing power scheduling is more concerned with how computing power is allocated among the members of each project group.

Computing power usage in an artificial intelligence-driven research group is a "high specification plus high concurrency" scenario: how can limited computing power serve more researchers in the team? Like organization administrators, project group managers can allocate and control remote resources through simple point-and-click operations, down to each member of the group, including the type of resource a member may use and for how long. With computing power moved to the cloud, researchers are no longer tied to the network or machines in their offices and laboratories; a personal PC is enough to carry out the work anytime, anywhere.
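
The per-member grants described above might be modeled roughly as follows (a hypothetical sketch under assumed names such as MemberGrant and resource tiers like "gpu-v100"; ModelWhale handles this through its graphical interface):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class MemberGrant:
    """Hypothetical per-member grant: which resource tier may be used, and until when."""
    member: str
    resource_tier: str          # e.g. "cpu-8c32g" or "gpu-v100"
    expires_at: datetime

def grant(member: str, resource_tier: str, days: int) -> MemberGrant:
    """Issue a time-limited grant for a specific resource specification."""
    return MemberGrant(member, resource_tier, datetime.now() + timedelta(days=days))

def can_use(g: MemberGrant, requested_tier: str) -> bool:
    """A member may start a job only on the granted tier and before expiry."""
    return g.resource_tier == requested_tier and datetime.now() < g.expires_at

alice = grant("alice", "gpu-v100", days=30)
print(can_use(alice, "gpu-v100"))   # True while the grant is valid
print(can_use(alice, "cpu-8c32g"))  # False: not the granted tier
```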

Real-time control by project group administrators and supervisors over the computing power used by researchers in the group is also a way to eliminate resource waste. When computing power is in short supply, ModelWhale not only provides a resource queuing mechanism but also lets team members be assigned priorities for resource use, so that the more important research work is completed first. Finally, the computing power application mechanism also applies within the project group: applications are reviewed by the managers and, once approved, the corresponding resources are issued automatically according to each researcher's needs.
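
The queuing-with-priority behavior can be pictured with a minimal priority-queue sketch (illustrative only, not ModelWhale's internals): pending jobs wait when resources are scarce, and higher-priority members' jobs are scheduled first, with submission order breaking ties.

```python
import heapq
import itertools
from typing import Optional

_counter = itertools.count()   # preserves FIFO order within the same priority
_queue = []                    # min-heap of (priority, submit_order, job)

def submit(job: str, priority: int) -> None:
    """Queue a job; a lower number means higher priority."""
    heapq.heappush(_queue, (priority, next(_counter), job))

def next_job() -> Optional[str]:
    """Pop the most urgent waiting job once resources are released."""
    return heapq.heappop(_queue)[2] if _queue else None

submit("course-homework", priority=5)
submit("paper-deadline-training", priority=1)
print(next_job())  # "paper-deadline-training" runs first
```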

Computing Resource Management - Resource Usage Interface

Agile response: ready-to-use computing resources

Like the analysis environment and images, computing power in ModelWhale is ready to use: after receiving the computing power assigned by the project group manager, researchers can select the resources they need before starting a project and invoke them with one click to begin their data research; during the work they can check the usage of computing power, memory, and disk at any time; and once the project is closed and the computing power is no longer needed, the resources are released automatically for other researchers in the group.
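
The acquire-then-auto-release semantics can be sketched with a context manager (a hypothetical helper named compute, not ModelWhale's API): the resource is attached for the duration of the work and handed back to the group pool when the work ends.

```python
from contextlib import contextmanager

@contextmanager
def compute(spec: str):
    """Attach a compute resource for the duration of a piece of work."""
    print(f"attaching {spec} ...")                    # one-click resource call
    try:
        yield spec                                    # run notebooks / training here
    finally:
        print(f"releasing {spec} back to the pool")   # automatic release afterwards

with compute("gpu-v100-1x"):
    print("training model ...")   # usage of compute, memory, and disk is checked here
```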

For large, complex computing tasks, the newly launched Pipeline feature supports task orchestration and parallel computing. It is part of offline model training and makes the computing power scheduling involved in training more efficient.
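
The general idea of orchestrating parallel stages can be illustrated with plain Python (an assumed fan-out/fan-in example using standard-library tools, not the Pipeline feature itself): independent preprocessing steps run in parallel, and training starts once all of them finish.

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(shard: int) -> str:
    """One independent preprocessing stage per data shard."""
    return f"features-{shard}"

def train(features: list) -> str:
    """Downstream stage that depends on every preprocessing stage."""
    return f"model trained on {len(features)} shards"

with ThreadPoolExecutor(max_workers=4) as pool:
    features = list(pool.map(preprocess, range(4)))   # fan-out: parallel stages

print(train(features))                                # fan-in: dependent stage
```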

3. Conclusion

Driven by technological change and top-level policy, the scientific research community is paying ever more attention to data, computing power, and artificial intelligence.

The data science collaboration platform ModelWhale Scientific Research Edition focuses on collaborative innovation in data-driven research. It is digital infrastructure whose mission is to advance the AI for Science research paradigm and strengthen organized research. It offers one-stop, full-process management of research objects from data and algorithms to models, improving the reproducibility of research at the infrastructure level and helping build a healthy, collaborative research ecosystem. Guided by the FAIR principles and open science, it provides a secure, complete public sharing portal and an online interactive workbench for data and other research materials. Its heterogeneous integration, intensive management and control, on-demand allocation, agile response, and powerful computing power scheduling make it possible to call large language models from a personal computer while maximizing the availability of computing resources within the organization and team. It also introduces the ModelOps concept to support full life-cycle management of large models.

The ModelWhale Scientific Research Edition covers earth sciences, biomedicine, the humanities and social sciences, and other professional fields, and has established best practices at national research institutions such as the National Meteorological Information Center and the China Natural Resources Aviation Geophysical Exploration and Remote Sensing Center. We hope to support every pioneer and team engaged in innovative data research. For any related needs, you are welcome to register on the ModelWhale official website for a trial, or click [Contact Product Consultant (Mobile Jump)] to get in touch with us.
