COMPUTEX 2023 | NVIDIA's Grace Hopper superchip, designed to accelerate generative AI, enters full production

Jensen Huang | Generative AI | COMPUTEX 2023


2023 is the year of large language models, generative AI, ChatGPT, and AIGC. GPUs are the essential hardware foundation for large-scale deep learning and high-performance computing: large language models and generative AI systems such as ChatGPT rely on GPU compute to train and run inference quickly, yielding better model quality and a wider range of application scenarios. In game development in particular, large language models can enrich plot progression and the behavior of AI-driven characters, and NVIDIA GPU-accelerated training makes these features run more smoothly. NVIDIA's leading position in GPU hardware design and optimization provides a solid technical foundation for the rapid development of large language models.

At present, large AI models developed in China and the United States account for more than 80% of the global total; China ranks second in the world, behind only the United States, and has released 79 large models with over one billion parameters. Zhao Zhiyun, director of the Institute of Scientific and Technical Information of China and of the New Generation Artificial Intelligence Development Research Center of the Ministry of Science and Technology, said that China's early deployments in artificial intelligence have laid a solid foundation for large-model development: the country has established comprehensive theoretical methods and systematic software and hardware R&D capabilities, forming a cluster of large-model technologies that keeps pace with the global state of the art.

China currently has 79 large AI models with more than one billion parameters, and their geographic and domain distribution is relatively concentrated: 14 provinces and regions are carrying out large-model R&D, led by Beijing with 28 models and Guangdong with 22. At the same time, applications of large models keep expanding and deepening. On one hand, general-domain models such as Wenxin Yiyan, Tongyi Qianwen, Zidong Taichu, and Xinghuo are developing rapidly, creating cross-industry general AI capability platforms and accelerating penetration into healthcare, industry, education, and other sectors. On the other hand, specialized models in vertical fields such as biopharmaceuticals, remote sensing, and meteorology leverage deep domain expertise to provide high-quality professional solutions for specific business scenarios.

On May 29, NVIDIA launched the DGX GH200 AI supercomputer at COMPUTEX 2023, a cutting-edge system equipped with 256 Grace Hopper superchips and the NVIDIA NVLink Switch System, delivering 1 exaflop of performance and 144TB of shared memory. The launch caused a sensation in the artificial intelligence field, marking another NVIDIA lead in large-model technology and hardware design. Its powerful computing and networking technologies broaden the prospects for generative AI, large language models, and recommendation systems, further expanding the boundaries of AI. The DGX GH200 is also the first supercomputer to pair the Grace Hopper superchip with the NVLink Switch System, providing 48 times the NVLink bandwidth of the previous generation and opening the door to new areas of exploration for AI pioneers and cloud service providers.

DGX GH200 and generative AI

Nvidia released a series of products and services oriented toward generative AI: the large-memory generative AI supercomputer DGX GH200, full production of the GH200 Grace Hopper superchip, the new accelerated Ethernet platform Spectrum-X, a custom AI model foundry service, and a partnership with WPP to build a generative AI content engine. Together, these initiatives open broader prospects for the application and development of generative AI.

In addition, Nvidia released the MGX server specification, and more than 1,600 generative AI companies have adopted Nvidia technology.

Nvidia's market capitalization has reached $963.2 billion, just one step away from the "trillion-dollar club". It is the fifth most valuable listed company in the United States, and crossing that threshold would make it the first trillion-dollar company founded by an ethnic-Chinese entrepreneur.

Exascale computing power: Google Cloud, Meta, and Microsoft get first access

Nvidia recently unveiled a flagship system built on its latest GPU and CPU: the new large-memory AI supercomputer DGX GH200, which is expected to be available by the end of this year.

The supercomputer is designed to support next-generation large models for generative AI language applications, recommendation systems, and data analytics workloads. The DGX GH200 integrates advanced accelerated computing and networking technologies, and is the first supercomputer to pair the Grace Hopper superchip with the NVIDIA NVLink Switch System.

Using a new interconnect, its 256 Grace Hopper superchips work together like a single giant GPU, providing 1 EFLOPS of performance and 144TB of shared memory, nearly 500 times the memory of the previous-generation DGX A100 320GB system.

Google Cloud, Meta, and Microsoft are among the first companies expected to gain access. Nvidia also intends to provide the DGX GH200 design blueprint to other cloud service providers and hyperscale computing vendors so they can further customize it for their infrastructure.

NVIDIA is also building Helios, its own large-scale AI supercomputer based on the DGX GH200, which will come online by the end of this year. In addition, the DGX GH200 includes NVIDIA software that provides AI workflow management, enterprise-grade cluster management, libraries for accelerated computing, storage, and network infrastructure, and more than 100 frameworks, pre-trained models, and development tools to simplify AI development and production deployment.

Nvidia's Base Command software manages AI workflows, enterprise clusters, accelerated computing, storage, network infrastructure, and more, while the NVIDIA AI Enterprise software layer provides frameworks, pre-trained models, and development tools that simplify AI development and deployment. The launch of the DGX GH200 will help advance AI technology, provide faster and more powerful AI computing to every industry, and accelerate the application and adoption of AI.

GH200 chip enters full production

Nvidia announced that the GH200 Grace Hopper superchip, which will power AI and high-performance computing workloads, is now in full production.

GH200-based systems have been adopted by manufacturers worldwide, with more than 400 configurations built on NVIDIA's latest Grace Hopper and Ada Lovelace architectures.

The GH200 Grace Hopper superchip uses NVIDIA NVLink-C2C interconnect technology to combine the NVIDIA Grace CPU and Hopper GPU architectures in a single package, providing total bandwidth of up to 900GB/s, seven times the bandwidth of the standard PCIe Gen5 lanes found in traditional accelerated systems, while cutting interconnect power consumption to one fifth. This meets the needs of demanding generative AI and high-performance computing (HPC) applications. Several global hyperscale computing companies and supercomputing centers are expected to adopt GH200-powered systems, which will be available later this year.
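As a rough sanity check on the "seven times" figure, the sketch below assumes a PCIe Gen5 x16 link at about 128GB/s bidirectional (roughly 4GB/s usable per lane per direction); these are back-of-the-envelope approximations, not vendor specifications:

```python
# Back-of-the-envelope check of the "7x PCIe Gen5" claim.
# Assumption: PCIe Gen5 moves ~4 GB/s usable per lane per direction,
# so an x16 link is ~64 GB/s each way, ~128 GB/s bidirectional.
pcie_gen5_x16_gbps = 16 * 4 * 2   # ~128 GB/s bidirectional
nvlink_c2c_gbps = 900             # GH200 total CPU-GPU bandwidth

print(nvlink_c2c_gbps / pcie_gen5_x16_gbps)  # ~7.0
```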

Building a generative AI supercomputer worth hundreds of millions of dollars

In addition, Jensen Huang also announced the launch of the NVIDIA Spectrum-X platform, designed to improve the performance and efficiency of Ethernet-based AI clouds.

Built on networking innovations, Spectrum-X tightly couples NVIDIA Spectrum-4 switches with BlueField-3 DPUs to deliver a 1.7x improvement in overall AI performance and energy efficiency, and it strengthens multi-tenancy through performance isolation to keep performance consistent and predictable.

Spectrum-X is highly versatile and can be used across AI applications; it interoperates with Ethernet-based stacks and supports developers building software-defined, cloud-native AI applications. The world's major cloud computing providers are using the Spectrum-X platform to expand generative AI services, and Spectrum-X, Spectrum-4 switches, BlueField-3 DPUs, and more are available now from system builders such as Dell Technologies, Lenovo, and Supermicro.

NVIDIA is building Israel-1, a hyperscale generative AI supercomputer, in its Israeli data center as a blueprint and testbed for the Spectrum-X reference design. The system will use Dell PowerEdge XE9680 servers, the NVIDIA HGX H100 platform, and the Spectrum-X platform with built-in BlueField-3 DPUs and Spectrum-4 switches, and it is expected to cost hundreds of millions of dollars. The platform supports 256 ports of 200Gb/s through a single switch, or 16,000 ports in a two-tier leaf-spine topology, supporting the growth and expansion of an AI cloud while maintaining high performance and minimizing network latency.
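For a sense of where those port counts come from, here is a minimal sketch: 256 ports at 200Gb/s implies 51.2Tb/s of switch throughput, and a two-tier leaf-spine design that devotes half of each leaf's ports to spine uplinks reaches 16,000 host-facing ports with 125 leaves. The half-and-half topology split is an illustrative assumption, not NVIDIA's published design:

```python
# Rough port math behind the Spectrum-4 figures quoted above.
port_gbps = 200
ports_per_switch = 256
print(port_gbps * ports_per_switch / 1000)  # 51.2 Tb/s through one switch

# Two-tier leaf-spine: each leaf splits its ports between hosts
# (downlinks) and spines (uplinks). With half of 256 ports facing
# hosts, 125 leaves provide the quoted 16,000 host-facing ports.
leaves = 125
print(leaves * (ports_per_switch // 2))     # 16,000
```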


MGX server specification: a modular reference architecture

Jensen Huang also released the NVIDIA MGX server specification, which gives system manufacturers a modular reference architecture covering a wide range of AI, HPC, and NVIDIA Omniverse applications.

MGX supports NVIDIA's full line of GPUs, CPUs, DPUs, and network adapters, as well as a wide range of x86 and Arm processors, allowing manufacturers to more effectively meet each customer's unique budget, power delivery, thermal design, and mechanical requirements.

ASRock Rack, ASUS, GIGABYTE, Pegatron, QCT, Supermicro, and others will use MGX to build next-generation accelerated computers, cutting development costs by up to three quarters and reducing development time by two thirds, to as little as six months. With MGX, a manufacturer starts from a base system architecture optimized for accelerated computing in its server chassis, then selects the GPUs, DPUs, and CPUs. MGX also provides flexible multi-generation compatibility with NVIDIA products, so manufacturers can reuse existing designs and easily adopt next-generation products, and it integrates easily into cloud and enterprise data centers.

Beyond the MGX specification, Jensen Huang announced that Nvidia is partnering with Japanese telecommunications giant SoftBank to build a distributed network of data centers in Japan. The network will deliver 5G services and generative AI applications on a common cloud platform. The data centers will use MGX-based systems (including Grace Hopper, BlueField-3 DPUs, and Spectrum Ethernet switches) to provide the high-precision timing that 5G protocols require and to improve spectral efficiency, reducing cost and energy consumption.

These systems help explore applications in areas such as autonomous driving, AI factories, AR/VR, computer vision, and digital twins. Future uses could include 3D videoconferencing and holographic communications. This will provide more efficient, flexible and advanced solutions in these fields, and promote the development of technology and industry.

Application of GH200 in the game industry

Jensen Huang announced Avatar Cloud Engine (ACE) for Games, a custom AI model foundry service that middleware, tool, and game developers can use to build and deploy custom speech, conversation, and animation AI models.

ACE gives non-player characters (NPCs) intelligent, evolving conversational skills, letting them answer player questions with lifelike personality. ACE for Games provides optimized AI foundation models for speech, conversation, and character animation, including: NVIDIA NeMo, for building, customizing, and deploying language models using proprietary data; NVIDIA Riva, for automatic speech recognition and text-to-speech, enabling real-time voice conversation; and NVIDIA Omniverse Audio2Face, for instantly generating facial animation for game characters to match any voice track.
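Conceptually, the three components chain into a loop: player speech in, animated NPC speech out. The sketch below illustrates only that data flow; every function in it is a hypothetical stub standing in for the role the article assigns to Riva (ASR/TTS), NeMo (the language model), and Audio2Face (animation), not a real NVIDIA API:

```python
# Hypothetical NPC dialogue turn; all functions are illustrative stubs,
# not real Riva/NeMo/Audio2Face APIs.

def transcribe(player_audio: bytes) -> str:           # Riva ASR role (stub)
    return "What ramen do you recommend?"

def generate_reply(text: str, backstory: str) -> str: # NeMo LLM role (stub)
    return "Try the tonkotsu, it's our house specialty."

def synthesize(text: str) -> bytes:                   # Riva TTS role (stub)
    return text.encode("utf-8")                       # placeholder "audio"

def animate_face(audio: bytes) -> dict:               # Audio2Face role (stub)
    return {"blendshape_frames": len(audio)}          # placeholder animation

def npc_dialogue_turn(player_audio: bytes, backstory: str):
    text = transcribe(player_audio)                   # speech -> text
    reply = generate_reply(text, backstory)           # text -> in-character reply
    audio = synthesize(reply)                         # reply -> speech
    return audio, animate_face(audio)                 # speech -> facial animation

audio, anim = npc_dialogue_turn(b"...", "Jin runs a ramen shop.")
```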

Additionally, Nvidia and its partner Convai demonstrated how quickly a game NPC can be built with NVIDIA ACE for Games. In a demo called "Kairos," Nvidia showed a scene in which the player talks with Jin, the proprietor of a ramen shop. Although Jin is an NPC, generative AI lets him answer natural-language questions realistically, with responses consistent with the narrative backstory. Developers can integrate the entire NVIDIA ACE for Games solution or use only the components they need. Several game developers and startups have already adopted Nvidia's generative AI technology.

Jensen Huang also described how Nvidia and Microsoft are working together to drive innovation for Windows PCs in the generative AI era. New and enhanced tools, frameworks, and drivers make it easier for PC developers to develop and deploy AI; for example, Microsoft's Olive toolchain optimizes and deploys GPU-accelerated AI models, and new graphics drivers will improve DirectML performance on Windows PCs with NVIDIA GPUs. The partnership will strengthen and expand the installed base of 100 million PCs with RTX GPUs and boost the performance of more than 400 AI-accelerated Windows applications and games. This will bring higher performance and better experiences to PC games, and will also promote the application and development of AI on Windows PCs.

Overall, Jensen Huang's announcements covered NVIDIA's latest progress and partnerships in game AI, including the Avatar Cloud Engine (ACE) for Games service and the collaboration with Microsoft on Windows PC innovation. These technologies and partnerships will give game developers more AI tools and solutions, and bring players a better gaming experience.

Application of DGX GH200 in digital advertising

Nvidia's generative AI technology will also bring new opportunities in the digital advertising industry. Engines based on NVIDIA AI and Omniverse technologies connect multiple creative 3D and AI tools to revolutionize business content and experiences at scale. 

WPP, the UK-based group that is the world's largest marketing services organization, is working with NVIDIA to build the first generative AI content engine on Omniverse Cloud, enabling it to create commercial content for clients more efficiently and at higher quality.

The new engine connects an ecosystem of 3D design, manufacturing, and creative supply-chain tools from providers such as Adobe and Getty Images. In his presentation, Jensen Huang showed how creative teams can connect their 3D design tools and build a digital twin of a client's product in Omniverse. Generative AI trained on responsibly sourced data and built with NVIDIA Picasso can then rapidly generate virtual sets, and WPP clients can use the completed scenes to produce large volumes of advertisements, videos, and 3D experiences for global markets and users on any web device.

This collaboration pushes generative AI further into digital advertising. Mark Read, CEO of WPP, said that generative AI is changing the marketing world at incredible speed, and the unique competitive advantage of this partnership will change how brands create content for commercial use, reinforcing WPP's industry leadership in applying AI for the world's top brands.

Application of DGX GH200 in electronics manufacturing

Electronics manufacturers around the world are using a new comprehensive reference workflow that combines multiple NVIDIA technologies, including generative AI, 3D collaboration, simulation, and autonomous machines, to help them plan, build, operate, and optimize their factories. These technologies include NVIDIA Omniverse, which connects top computer-aided design tools with generative AI APIs and cutting-edge frameworks; the NVIDIA Isaac Sim application, for simulating and testing robots; and the NVIDIA Metropolis vision AI framework, for automated optical inspection.

NVIDIA enables electronics manufacturers to easily build and operate virtual factories and digitize their manufacturing and inspection workflows, dramatically improving quality and safety while reducing costly last-minute surprises and delays. Jensen Huang demonstrated a fully digitized smart factory live on stage.

Foxconn Industrial Internet, Innodisk, Pegatron, Quanta, and Wistron are using NVIDIA's reference workflow to optimize workcell and assembly-line operations while reducing production costs. Specific use cases include automating circuit-board quality-assurance inspection points, automating optical inspection, building virtual factories, simulating collaborative robots, and building and operating digital twins.

Nvidia is working with several leading manufacturing tool and service providers to build a full-stack solution on a single architecture, with offerings at every level of the workflow.

At the system level, NVIDIA IGX Orin provides an all-in-one edge AI platform that combines industrial-grade hardware with enterprise-grade software and support. IGX meets the unique durability and low power requirements of edge computing while delivering the high performance required to develop and run AI applications. Its manufacturer partners are developing IGX-driven systems to serve the industrial and medical markets.

At the platform level, Omniverse connects the world's leading 3D, simulation, and generative AI providers so teams can build interoperability between their favorite applications, such as those from Adobe, Autodesk, and Siemens.

The integration of these technologies enables manufacturers to design, simulate, test and produce on a unified platform, thereby greatly improving efficiency and quality. In addition, Nvidia offers a range of tools and services to help manufacturers manage and optimize their production lines, including real-time monitoring, data analytics and predictive maintenance.

NVIDIA's digital factory solutions are not only applicable to electronics manufacturing, but can also be applied to other industries, such as automobile manufacturing, aerospace, medical equipment, etc. These industries all require highly automated and digitalized production lines to meet ever-increasing market demands and quality standards.

DGX GH200 product parameters

The DGX GH200 is the latest supercomputer launched by Nvidia; it can house up to 256 GPUs and is suited to deploying very large AI models. Compared with previous DGX servers, the DGX GH200 scales linearly and offers a larger GPU shared-memory programming model: through NVLink it provides high-speed access to 144TB of memory, nearly 500 times that of the previous-generation DGX, and its architecture delivers 48 times the NVLink bandwidth of the previous generation. Models with hundreds of billions or even trillions of parameters can therefore fit in a single DGX, improving model efficiency and the development of multimodal models.

The GPU's unified memory programming model has been the cornerstone of breakthroughs in complex accelerated computing applications. The NVIDIA Grace Hopper superchip, paired with the NVLink Switch System, unites 256 GPUs in the NVIDIA DGX GH200 and gives them high-speed NVLink access to 144TB of memory. Compared with a single NVIDIA DGX A100 320GB system, the DGX GH200 provides nearly 500 times more memory to the GPU shared-memory programming model, and it is the first supercomputer to break the 100TB barrier for GPU-accessible memory over NVLink. NVIDIA Base Command's rapid deployment and simplified system management let users get to accelerated computing faster.
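The "nearly 500 times" figure follows directly from the quoted capacities; a quick check, taking the DGX A100's 320GB of aggregate GPU memory as the baseline:

```python
# NVLink-addressable GPU memory: DGX GH200 vs. DGX A100 320GB.
dgx_gh200_tb = 144
dgx_a100_tb = 320 / 1024           # 320 GB expressed in TB
print(dgx_gh200_tb / dgx_a100_tb)  # ~460.8, i.e. "nearly 500x"
```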

The NVIDIA DGX GH200 uses the NVIDIA Grace Hopper superchip and the NVLink Switch System as its building blocks. The Grace Hopper superchip combines a CPU and a GPU connected over NVIDIA NVLink-C2C, which provides a coherent memory model and seven times the bandwidth of PCIe Gen5 for a seamless multi-GPU system. Each Grace Hopper superchip carries 480GB of LPDDR5 CPU memory and 96GB of fast HBM3 GPU memory.
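Those per-chip figures also account for the system's 144TB total: 480GB of CPU memory plus 96GB of HBM3 per superchip, multiplied by 256 superchips:

```python
# Total NVLink-addressable memory across a 256-superchip DGX GH200.
per_chip_gb = 480 + 96           # LPDDR5 CPU memory + HBM3 GPU memory
print(per_chip_gb * 256 / 1024)  # 144.0 TB
```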

The NVLink Switch System uses fourth-generation NVLink technology to extend NVLink beyond the superchip, creating a two-level, non-blocking NVLink fabric that fully connects all 256 Grace Hopper superchips. The fabric gives each superchip 900GB/s of memory access bandwidth; the compute baseboards hosting the Grace Hopper superchips connect to the first level of the fabric through custom cable harnesses, while LinkX cables extend connectivity in the second level.

In the DGX GH200, GPU threads use NVLink page tables to address memory on other Grace Hopper superchips, and the NVIDIA Magnum IO acceleration libraries optimize GPU communication for efficiency. The system offers 128 TB/s of bisection bandwidth and 230.4 TFLOPS of NVIDIA SHARP in-network computing, which accelerates the collective operations common in AI and doubles the effective bandwidth of the NVLink network. To scale beyond 256 GPUs, each Grace Hopper superchip is paired with an NVIDIA ConnectX-7 network adapter and an NVIDIA BlueField-3 NIC so that multiple DGX GH200 systems can be interconnected, while BlueField-3 DPUs can transform any enterprise computing environment into a secure, accelerated virtual private cloud.

For AI and HPC applications bottlenecked by GPU memory capacity, a generational leap in GPU memory can significantly improve performance. Many mainstream AI and HPC workloads fit entirely within the aggregate GPU memory of a single NVIDIA DGX H100. For other workloads, such as deep learning recommendation models (DLRM) with terabyte-scale embedding tables, terabyte-scale graph neural network training, or large-scale data analytics, the DGX GH200 achieves a 4x to 7x speedup. This makes the DGX GH200 the better choice for advanced AI and HPC models that require massive memory for GPU shared-memory programming.
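To see why such workloads outgrow a single GPU, consider the footprint of one DLRM embedding table; the sizes below are illustrative assumptions, not measurements of any particular model:

```python
# Illustrative DLRM embedding-table footprint (assumed sizes).
rows = 2_000_000_000          # one categorical feature with 2B IDs
dim = 128                     # embedding vector width
bytes_per_value = 4           # float32
table_tb = rows * dim * bytes_per_value / 1024**4
print(f"{table_tb:.2f} TB")   # ~0.93 TB for a single table
# A few such tables dwarf one GPU's HBM (tens of GB), which is what a
# 144TB NVLink-addressable memory pool is meant to absorb.
```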

The DGX GH200 is designed for the most demanding workloads, with each component carefully selected to minimize bottlenecks, maximize network performance for key workloads, and fully utilize all scale-out hardware capabilities. The result is highly linear scalability and high utilization of the enormous shared memory space.

To take full advantage of this advanced system, NVIDIA also built an extremely high-speed memory fabric that operates at peak capacity and processes various data types (text, tabular data, audio, and video) with consistent and parallel performance.

The DGX GH200 comes with NVIDIA Base Command, which includes an AI workload-optimized operating system, cluster manager, libraries for accelerated computing, storage and networking infrastructure, all optimized for the DGX GH200 system architecture. In addition, DGX GH200 also includes NVIDIA AI Enterprise, which provides an optimized set of software and frameworks to simplify AI development and deployment. This full-stack solution enables customers to focus on innovating without worrying about managing their IT infrastructure.


Source: blog.csdn.net/LANHYGPU/article/details/131004439