DeepSpeed4Science: Leveraging advanced AI system optimization technology to enable scientific discovery



Reprinted from the official Chinese blog of the Microsoft DeepSpeed team

Translated from the official English blog: Announcing the DeepSpeed4Science Initiative: Enabling large-scale scientific discovery through sophisticated AI system technologies


Figure 1: Overview of the DeepSpeed4Science approach: developing AI system technologies tailored to accelerating scientific discovery and coping with its complexity.




Over the next decade, deep learning may revolutionize the natural sciences, enhancing our ability to model and predict natural phenomena. This could herald a new era of scientific exploration, leading to major advances in everything from drug development to renewable energy. In response to this opportunity, and to Microsoft's mission of "empowering every person and every organization on the planet to achieve more," the Microsoft DeepSpeed team has launched a new initiative called DeepSpeed4Science, which aims to help domain experts unlock today's biggest scientific mysteries through innovations in AI system technology.

The DeepSpeed system is an industry-leading open-source AI system framework developed by Microsoft that delivers unprecedented scale and speed for deep learning training and inference on a wide range of AI hardware. Figure 1 illustrates our basic approach to this new initiative. Building on DeepSpeed's current technology pillars (training, inference, and compression) as foundational enablers, DeepSpeed4Science will create a suite of AI system technologies tailored to accelerating scientific discovery, addressing unique complexities that go beyond the common techniques used to accelerate general-purpose large language models (LLMs).

We work closely with internal and external teams that own scientific AI models to identify and solve domain-specific AI system challenges, in areas including climate science, drug design, biological understanding, molecular dynamics simulation, cancer diagnosis and surveillance, and catalyst/materials discovery.

Our long-term vision is to develop DeepSpeed4Science into a unified software platform and code repository for sharing advanced AI system technologies that support scientific discovery. DeepSpeed4Science is designed to be inclusive, echoing Microsoft's "AI for Good" commitment. This is reflected in the initiative's support for a set of flagship scientific models that represent some of the most critical AI4Science application scenarios.

In this article, we show how DeepSpeed4Science can help solve two key AI system challenges in structural biology research:

(1) eliminating the memory explosion problem in Evoformer-centric protein structure prediction models;

(2) providing long-sequence support for AI models that seek to better understand the evolution of pandemic-causing viruses.


DeepSpeed4Science's new system technologies can be applied to many flagship models that push the boundaries of science, empowering AI-driven scientific discovery.

Currently, DeepSpeed4Science is proud to support several key scientific models from Microsoft Research AI4Science, Microsoft WebXT/Bing, U.S. Department of Energy national laboratories, and multiple universities.


Scientific Foundation Model (SFM), Microsoft Research AI4Science


Figure 2: Scientific Foundation Model (SFM) and its current exploration: Distributional Graphormer.

The Scientific Foundation Model (SFM) aims to create a unified, large-scale foundation model to power natural scientific discovery, supporting multiple input modalities, multiple scientific domains (e.g., drugs, materials, biology, health), and a range of computational tasks. The DeepSpeed4Science partnership will provide the SFM team with new training and inference technologies to support ongoing research on projects such as their new generative AI method, Distributional Graphormer.

ClimaX, Microsoft Research AI4Science


Figure 3: ClimaX is the first foundation model designed to perform a wide variety of weather and climate modeling tasks.

Our climate is changing, leading to more frequent extreme weather events. To mitigate their negative impacts, it is increasingly important to predict where these events will occur. ClimaX is the first foundation model designed to perform a wide variety of weather and climate modeling tasks. It can ingest many datasets with different variables and resolutions to improve the accuracy of weather forecasts. DeepSpeed4Science is creating new system support and acceleration strategies for ClimaX to efficiently pretrain and fine-tune larger foundation models while handling very large high-resolution image data (e.g., tens to hundreds of petabytes) and long sequences.

AI Powered Ab Initio Molecular Dynamics (AI2MD), Microsoft Research AI4Science


Figure 4: One million steps of molecular dynamics simulation: RBD-protein interaction with protein inhibitor.

This project simulates the dynamics of large (million-atom) molecular systems with AI-driven force-field models that approach first-principles accuracy while retaining the efficiency and scalability of classical molecular dynamics. The simulations are efficient enough to generate trajectories long enough to observe chemically meaningful events, which typically requires millions or even billions of inference steps. This poses a major challenge for optimizing the inference speed of graph neural network (GNN) + LLM models, and DeepSpeed4Science will provide new acceleration strategies for it.
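
To make the scale concrete, here is a toy sketch of a classical MD driver that queries a learned model at every step. The stand-in force field and all names and shapes are hypothetical, not the project's actual model or API; the point is that at millions to billions of steps, per-step inference latency dominates total wall-clock time.

```python
import torch

# Toy stand-in for an AI force field mapping atom positions -> forces.
# (Illustrative only: the real AI2MD models are GNN-based and far more
# sophisticated; names and shapes here are hypothetical.)
force_field = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Tanh(), torch.nn.Linear(64, 3)
)

positions = torch.randn(1_000, 3)   # 1,000 atoms; real systems reach ~1M atoms
velocities = torch.zeros_like(positions)
dt = 1e-3                           # integration time step (arbitrary units)

with torch.no_grad():
    for step in range(10_000):      # real trajectories need 1e6 to 1e9 steps
        forces = force_field(positions)   # one model inference per MD step
        velocities += forces * dt         # simplistic Euler-style update
        positions += velocities * dt

# Even at 1 ms per inference call, 1e9 steps would take ~11.6 days of pure
# model time, which is why per-step inference latency is the bottleneck.
```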

Microsoft Weather, Microsoft WebXT/Bing


Figure 5: Microsoft precipitation forecast (updated every 4 minutes for the next 4 hours)

Microsoft Weather provides precise weather information to help users make better decisions about their lifestyle, health, work, and activities, including accurate 10-day global weather forecasts updated multiple times per hour. Previously, Microsoft Weather benefited from DeepSpeed technology to accelerate its multi-GPU training environment. Currently, DeepSpeed4Science is working with the Microsoft WebXT weather team to further enhance the latest features and improvements of Microsoft's weather forecasting services.


DeepSpeed4Science's journey begins with two groundbreaking LLM-based AI models for structural biology research: OpenFold from Columbia University, an open-source high-fidelity protein structure prediction model; and GenSLMs from Argonne National Laboratory, an ACM Gordon Bell Special Prize-winning language model for learning the evolution of SARS-CoV-2 (COVID-19) genomes.

Featured in this release, they represent two common AI system challenges facing today’s AI-driven structural biology research. We will discuss how DeepSpeed4Science empowers these scientific studies in the next section.

Additionally, DeepSpeed4Science recently expanded its scope to support a more diverse range of scientific models. For example, in our work with Argonne National Laboratory to train a trillion-parameter scientific model on the Aurora exascale system, DeepSpeed4Science technologies will help them meet the performance and scalability requirements of this critical mission.

In partnership with Oak Ridge National Laboratory and the National Cancer Institute (NCI) on cancer surveillance, DeepSpeed4Science will help extract and classify information from unstructured clinical text with high fidelity for the MOSSAIC project. Brookhaven National Laboratory will also adopt DeepSpeed4Science technology to support the development of large-scale digital twin models that use LLMs to produce more realistic simulation data for clean energy research.

You can find more details about our external collaborators and their scientific missions at deepspeed4science.ai.


Demonstration (I): DeepSpeed4Science eliminates the memory explosion problem of Evoformer-centric structural biology models through DS4Sci_EvoformerAttention


Figure 6: OpenFold’s prediction of PDB chain 7B3A_A during training.

OpenFold is an open-source community reproduction of DeepMind's AlphaFold2 that makes it possible to train or fine-tune AlphaFold2 on new datasets. Researchers have used it to retrain AlphaFold2 from scratch to produce new sets of model parameters, to study AlphaFold2's early training phase (Figure 6), and to develop new protein folding systems.


Figure 7: Peak memory requirements for training variants of the multiple sequence alignment (MSA) attention kernel (with bias) in OpenFold. DS4Sci_EvoformerAttention, a new solution from DeepSpeed4Science, reduces OpenFold's peak training memory requirement by up to 13x without affecting model quality.

Although OpenFold uses state-of-the-art system techniques for performance and memory optimization, training AlphaFold2 from scratch remains computationally expensive. The model is small by today's standards, with only 93 million parameters, but it contains several special attention variants that produce extremely large intermediate tensors.

During the "fine-tuning" phase of standard AlphaFold2 training, just one of these variants generated over 12GB of tensors at half precision, making its peak memory requirement far greater than that of a language model of the same size. Even with techniques like activation checkpointing and DeepSpeed ​​ZeRO optimization, this memory explosion problem still severely limits the sequence length and MSA depth of trainable models.

Furthermore, existing approximation strategies still suffer memory explosion while significantly affecting model accuracy and convergence, as shown on the left side of Figure 7 (orange). To address this common system challenge in structural biology research (e.g., protein structure prediction and equilibrium distribution prediction), DeepSpeed4Science designed customized exact-attention kernels for EvoformerAttention, a variant of attention that appears widely in such scientific models, to solve this memory-efficiency problem.

Specifically, we designed DS4Sci_EvoformerAttention, a set of highly memory-efficient kernels that combine sophisticated fusion and matrix-tiling strategies with dynamic memory reduction methods, as high-quality machine learning modules for the broader biology research community. Integrated into OpenFold, these custom kernels provide substantial speedups during training and significantly reduce the model's peak memory requirements for training and inference. This allows OpenFold to experiment with bigger and more complex models, using longer sequences, on a wider spectrum of hardware. Detailed information about this technology can be found here.
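
For readers who want a feel for the interface, the sketch below shows roughly how such a fused kernel is invoked. The module path, call signature, and tensor layout are our reading of the DeepSpeed DS4Sci_EvoformerAttention tutorial; treat them as assumptions and verify against the current documentation.

```python
# A minimal usage sketch (requires a CUDA GPU and a DeepSpeed build that
# includes the DS4Sci kernels; module path and tensor layout are assumptions
# based on the DeepSpeed tutorial at the time of writing).
import torch
from deepspeed.ops.deepspeed4science import DS4Sci_EvoformerAttention

batch, n_seq, n_res, heads, dim = 1, 64, 256, 8, 32
q = torch.randn(batch, n_seq, n_res, heads, dim,
                dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Optional additive biases: e.g. an MSA mask bias and a pair-representation bias.
mask_bias = torch.randn(batch, n_seq, 1, 1, n_res,
                        dtype=torch.float16, device="cuda")
pair_bias = torch.randn(batch, 1, heads, n_res, n_res,
                        dtype=torch.float16, device="cuda")

# The fused kernel computes exact bias-augmented attention without ever
# materializing the full [n_res x n_res] logits tensor in GPU memory.
out = DS4Sci_EvoformerAttention(q, k, v, [mask_bias, pair_bias])
```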

Demonstration (II): DeepSpeed4Science provides long-sequence support for genome-scale models (e.g., GenSLMs) through system and algorithmic approaches


Figure 8: GenSLMs: the 2022 ACM Gordon Bell Special Prize-winning COVID genome model (a 25B/33B model based on GPT-NeoX). It is used to learn a latent space that describes biologically meaningful properties of SARS-CoV-2 genomes. This GIF shows a projection of the latent space, colored by features important to a key protein family, malate dehydrogenase.

GenSLMs, the 2022 ACM Gordon Bell Special Prize-winning genome-scale model from Argonne National Laboratory, learns the evolution of the SARS-CoV-2 (COVID-19) genome by applying large language model (LLM) training to genomic data. It aims to change how new variants of pandemic-causing viruses, especially SARS-CoV-2, are identified and classified.

GenSLMs is among the first genome-scale models whose learned representations can generalize to other prediction tasks. A good understanding of its latent space can help GenSLMs move beyond viral sequences into new domains, extending its ability to model bacterial pathogens and even eukaryotic organisms (e.g., understanding function, pathway membership, and evolutionary relationships).

To achieve this scientific goal, GenSLMs and similar models require very long sequence support for both training and inference, beyond what general-purpose long-sequence strategies for LLMs, such as FlashAttention, can provide. With DeepSpeed4Science's new designs, scientists can now build and train models with significantly longer context windows, allowing them to explore relationships that were previously inaccessible.


Figure 9: Maximum sequence lengths of the two GenSLMs models supported by different frameworks at different GPU counts, on NVIDIA DGX nodes with eight 40GB A100 GPUs each.


Specifically, at the system level, we released the latest Megatron-DeepSpeed framework with long-sequence support and other new optimizations. Scientists can now train large scientific models such as GenSLMs with much longer sequences through a synergistic combination of our newly added memory optimization techniques (such as attention mask offloading and position embedding partitioning), tensor parallelism, pipeline parallelism, sequence parallelism, ZeRO-based data parallelism, and model state offloading. Figure 9 shows that the new release increases the maximum sequence length of GenSLMs' 25B and 33B models by 12x and 14x, respectively, compared with the previous Megatron-DeepSpeed release.
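
To give a flavor of the system-level pieces, here is a minimal sketch of a standard DeepSpeed configuration enabling two of the ingredients listed above: ZeRO-based data parallelism (stage 3) and model-state offloading. The stand-in model and hyperparameters are purely illustrative, and the sequence-parallel degree itself is set through the Megatron-DeepSpeed launcher (flag names vary across versions), so it is not shown here.

```python
# A minimal sketch (illustrative model and hyperparameters; run under the
# `deepspeed` launcher so distributed state is initialized).
import torch
import deepspeed

model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8)  # stand-in model

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # partition params/grads/optimizer states
        "offload_optimizer": {"device": "cpu"},  # keep optimizer states in CPU memory
        "offload_param": {"device": "cpu"},      # page parameters off-GPU when idle
    },
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```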

In terms of supported sequence length, the new Megatron-DeepSpeed framework also significantly outperforms NVIDIA's Megatron-LM (by up to 9.8x and 9.1x for the 25B and 33B models, respectively). For example, the Argonne team's 25B GenSLMs model, previously limited to a sequence length of 42K on 64 GPUs, can now be trained on sequences of 512K nucleotides.

This greatly improves model quality and widens the scope of scientific discovery without any loss of accuracy. For domain scientists who prefer algorithmic strategies, the new release also integrates techniques such as relative position encoding.

Reprinted from the official Zhihu account of the Microsoft DeepSpeed team: zhihu.com/people/deepspeed

Editor丨Qian Xinyue

Related Reading

DeepSpeed Ulysses: System optimization for training extremely long sequence Transformer models

A stunning release that unlocks superpowers for everyone | DeepSpeed-Chat is open source!


Introduction to Kaiyuanshe

Kaiyuanshe (English name: "KAIYUANSHE") was founded in 2014 as an open-source community formed by individual volunteers contributing to the open-source cause, based on the principles of "contribution, consensus, and co-governance." Kaiyuanshe upholds the concepts of vendor neutrality, public welfare, and non-profit operation, with the vision of "rooted in China, contributing to the world, and making open source a way of life in the new era" and the mission of "open-source governance, international integration, community development, and project incubation," aiming to create a healthy and sustainable open-source ecosystem.

Kaiyuanshe cooperates closely with communities, universities, enterprises, and government-related organizations that support open source, and was the first Chinese member of the Open Source Initiative (OSI), the global organization for open-source license certification.

Since 2016, it has continuously hosted the China Open Source Annual Conference (COSCon), published the annual "China Open Source Annual Report," and co-launched initiatives such as the "China Open Source Pioneers" and "China Open Source Code Power" lists, with wide influence at home and abroad.

