Why does NVIDIA hold an unassailable position in high-performance computing GPUs?

NVIDIA | GTC2022 | High Performance Computing

NVIDIA | RTX4090 | Liquid Cooled Server

With the rapid development of fields such as east-to-west computing (the "Eastern Data, Western Computing" initiative), life sciences, remote sensing and mapping, geological exploration, vacuum plume simulation, and cryo-electron microscopy, high-performance computing has attracted growing attention. GTC 2022 identified high-performance computing as one of the key tools driving scientific progress.

The GeForce RTX 4090 graphics card was officially announced yesterday. It is the flagship of the new GeForce RTX 40 series and the world's first graphics card based on NVIDIA's new Ada Lovelace architecture. Compared with the previous-generation RTX 3090 Ti running DLSS 2, the RTX 4090 with DLSS 3 offers up to a 4x performance improvement. The RTX 4090 packs 76 billion transistors, 16,384 CUDA cores, and 24 GB of high-speed Micron GDDR6X memory.
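
As a rough illustration of where such headline figures come from, the sketch below estimates the card's theoretical FP32 peak from the CUDA-core count quoted above; the boost clock used here (about 2.52 GHz) is an assumption made for the arithmetic, not a figure taken from the announcement.

```python
# Rough estimate of theoretical single-precision (FP32) peak throughput for the
# RTX 4090, using the CUDA-core count quoted above. The boost clock is an
# assumed value, not a figure from the article.

cuda_cores = 16384              # from the article
boost_clock_hz = 2.52e9         # assumed boost clock, ~2.52 GHz
flops_per_core_per_cycle = 2    # one fused multiply-add (FMA) counts as 2 FLOPs

peak_fp32_tflops = cuda_cores * flops_per_core_per_cycle * boost_clock_hz / 1e12
print(f"Theoretical FP32 peak: {peak_fp32_tflops:.1f} TFLOPS")  # ~82.6 TFLOPS
```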

This article explains why NVIDIA remains dominant in high-performance computing, surveys HPC development trends, and outlines high-performance computing solutions.

High-end GPUs: an NVIDIA one-man show?

As the "acceleration artifact" of general computing - high-end GPU is becoming a rigid demand in large data centers, artificial intelligence, supercomputing and other fields. Nvidia has long dominated the high-end GPU market, with a market share of more than 90%. At present, domestic enterprises still have a long way to go to break through the monopoly of foreign companies such as Nvidia. The domestic DSA (programmable processor for specific domain) chip products based on architecture innovation are increasingly abundant, which may bring some hope.

  • High-end GPUs and traditional GPUs are "distinct" 

    Traditional GPUs focus on graphics, emphasizing metrics such as frame rate, rendering fidelity, and faithfulness to real-world scenes. They are mainly used for gaming, professional image processing, and cryptocurrency mining. High-end GPUs, by contrast, are compute accelerators aimed at supercomputing fields such as basic science and at large-scale AI workloads such as training and inference.

    The main dimensions for evaluating high-end GPUs are versatility, ease of use, and performance. Versatility means the hardware architecture is flexible enough to keep up with rapidly evolving AI algorithms and scenarios. Ease of use means a low development threshold: developers can get started easily and tailor development to real workloads. Performance means a chip's raw capability and cost-effectiveness must reach an internationally competitive level before the product can win a market.

Discrete GPU market share (including AIB partner graphics cards) in the second quarter of 2022

Source: Jon Peddie Research

Raw compute is often the market's "first impression" of GPU performance. However, a high-end GPU's real-world performance is not equal to its paper specification, and it certainly cannot be judged by a single headline number.

In actual use, a GPU's versatility, ease of use, and practicality matter far more than the compute figure printed on the datasheet. However high the paper number, one must also ask whether memory capacity and bandwidth are sufficient and whether chip-to-chip interconnect is handled well. Judging whether a GPU is "high-end" by a single performance figure is a common misconception.
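
A common way to reason about why bandwidth matters as much as headline FLOPS is the roofline model: attainable throughput is capped either by the compute peak or by the product of memory bandwidth and arithmetic intensity. The sketch below uses assumed, illustrative figures for both limits rather than any vendor's specifications.

```python
# Minimal roofline-style estimate: attainable throughput is capped either by the
# compute peak or by memory bandwidth times arithmetic intensity (FLOPs per byte).
# All numbers here are illustrative assumptions, not measured or official figures.

peak_tflops = 30.0      # assumed double-precision compute peak, TFLOPS
bandwidth_tb_s = 3.0    # assumed memory bandwidth, TB/s

def attainable_tflops(arithmetic_intensity_flops_per_byte: float) -> float:
    """Roofline bound: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_tflops, bandwidth_tb_s * arithmetic_intensity_flops_per_byte)

for ai in (0.5, 1, 4, 10, 50):
    print(f"AI = {ai:5.1f} FLOP/byte -> at most {attainable_tflops(ai):5.1f} TFLOPS")
```

Below an arithmetic intensity of about 10 FLOP/byte in this toy setting, the memory system, not the compute peak, sets the ceiling, which is exactly why a single paper figure can mislead.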

  • HPC will be the main "arena"

NVIDIA has long dominated the high-end GPU market with a share of more than 90%, especially in AI computing. To date, NVIDIA has introduced the Volta, Ampere, and Hopper architectures for high-performance computing and AI training, and on top of them has launched high-end GPUs such as the V100, A100, and H100, whose vector double-precision floating-point throughput climbs from 7.8 TFLOPS all the way to roughly 30 TFLOPS.
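
To put that progression in perspective, here is a minimal sketch comparing the double-precision figures by generation; the V100 and H100 values are the ones quoted above, while the A100 value (~9.7 TFLOPS) is a commonly cited approximation and should be treated as such.

```python
# Generational growth of FP64 vector throughput across NVIDIA data-center GPUs.
# The V100 and H100 figures are those quoted in the article; the A100 figure
# (~9.7 TFLOPS) is a commonly cited value and should be treated as approximate.

fp64_tflops = {"V100": 7.8, "A100": 9.7, "H100": 30.0}

baseline = fp64_tflops["V100"]
for gpu, tflops in fp64_tflops.items():
    print(f"{gpu}: {tflops:4.1f} TFLOPS FP64 ({tflops / baseline:.1f}x V100)")
```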

AMD, the world's second-largest independent GPU supplier, still trails NVIDIA by a wide margin in overall high-end GPU share, but it has made a breakthrough in supercomputing. On the latest TOP500 list, the world's fastest supercomputer, Frontier at Oak Ridge National Laboratory (ORNL), and the third-ranked LUMI both use AMD EPYC processors and AMD Instinct MI250X GPU accelerators.

AMD's strong showing in supercomputing rests on targeted hardware and software design. The combination of GPU accelerators based on the CDNA 2 architecture, the ROCm software platform, and AMD Infinity Hub, an open resource center for applications, gives researchers a fairly complete solution with friendly hardware performance and a friendly programming environment.

Although using GPUs directly for high-performance or AI computing is the more convenient option, upper-layer applications' core demand for lower cost and higher efficiency places greater requirements on the underlying compute. The AI chips launched by foreign AI start-ups are often built on new architectures that broadly improve, and specifically optimize for, parallel computing. Leading domestic AI chip companies have launched series of AI compute chips based on DSA architectures for the same reason.

In the domestic market, DSA chips built on architectural innovation are increasingly abundant. For example, Huawei's self-developed Da Vinci architecture for AI computing, Kunlun Technology's first-generation XPU-K and second-generation XPU-R architectures, and Suiyuan Technology's self-developed GCU-CARA architecture are all entering the stage of large-scale deployment. As AI computing scenarios become ever more segmented and complex, customized, heterogeneous DSAs are expected to play a larger role in next-generation computing platforms.

Development Trends in High-Performance Computing (HPC)

New application fields keep emerging

Catastrophic climate events are increasing worldwide, and predicting them in advance is becoming ever more important for protecting human safety, so climate-prediction applications will attract considerable attention in the HPC field in the coming year. In addition, as HPC moves into the cloud, more of it will be used to build consumer-facing software. The emergence of virtual worlds and the metaverse also opens new opportunities for HPC, which can serve games (AR/VR) and other entertainment applications as well as simulation workloads such as digital twins.

The HPC market is expanding into new territory, adding artificial intelligence (AI) and data analytics to traditional simulation and modeling workflows. The COVID-19 pandemic has increased demand for flexible, scalable cloud-based HPC solutions. This demand, together with the growing need for fast data processing and high precision, will be the main driver of HPC application growth in the coming years. AI, edge computing, 5G, and other technologies will broaden what HPC can do, shaping new chip and system architectures that deliver efficient processing and analysis to a wide range of industries.

Improving HPC security will be key

As the market becomes more digital overall, security risks will grow. More and more high-performance computing is moving outside the data center, which directly increases the number of attacks that software patches alone cannot handle. This puts enormous pressure on development teams to ship hardware fixes quickly, compressing hardware design cycles. Improving developer productivity to keep pace with market demand will therefore become the next focus.

Diversified HPC processor architectures

As data volumes grow, it is not only security that must improve: infrastructure storage and the compute used to process the data must improve as well. New architectures, including chip-to-chip interconnects, are also needed to meet these new requirements.

Driven by changing AI workloads, flexible computing (CPU, GPU, FPGA, DPU, and so on), cost, memory, and I/O throughput, HPC architectures are undergoing dramatic change. At the micro-architecture level, the goals are faster interconnects, higher compute density, scalable storage, greater infrastructure efficiency, eco-friendliness, better space management, and stronger security. At the system level, next-generation HPC architectures will see explosive growth in disaggregated architectures and heterogeneous systems: different dedicated processing architectures will be integrated in a single node, enabling precise and flexible switching between modules. Such complex systems also bring huge verification challenges, particularly around system- and node-level IP, dynamic hardware/software coordination, workload-based performance, and power delivery, so new hardware and software verification methods will need to be developed.
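
To make the idea of "switching between modules" concrete, here is a purely conceptual sketch, not any vendor's scheduler, in which hypothetical workload types are dispatched to the kind of processing element a heterogeneous node might dedicate to them.

```python
# Conceptual sketch (not a real scheduler) of dispatching work across the kinds of
# dedicated processors a heterogeneous HPC node might combine. Device names and the
# task-to-device mapping are hypothetical and only illustrate the idea.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    kind: str  # e.g. "dense-linear-algebra", "packet-processing", "control"

# Hypothetical mapping from workload type to the best-suited processing element.
DISPATCH_TABLE = {
    "dense-linear-algebra": "GPU",
    "packet-processing": "DPU",
    "signal-processing": "FPGA",
    "control": "CPU",
}

def dispatch(task: Task) -> str:
    """Pick a device class for a task, falling back to the CPU."""
    return DISPATCH_TABLE.get(task.kind, "CPU")

for t in (Task("matmul", "dense-linear-algebra"), Task("nic-offload", "packet-processing")):
    print(f"{t.name} -> {dispatch(t)}")
```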

Moving data costs a great deal of power and time, one of the challenges system administrators face today, and reducing data movement will be a lasting trend. Resources must keep expanding to support higher-performance devices with advanced packaging and die-to-die interfaces, that is, scaling processing capability within a device by using multiple dies, which is expected to become a practical reality within the next year.
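
A quick back-of-the-envelope calculation shows why data movement dominates. The sketch below estimates transfer times for an assumed 40 GB dataset over two assumed link bandwidths, roughly corresponding to a PCIe-class link and a faster chip-to-chip interconnect; none of the numbers are measurements.

```python
# Back-of-the-envelope cost of moving a dataset between devices over two assumed
# link bandwidths. The bandwidth figures are illustrative assumptions that roughly
# correspond to a PCIe-class link versus a high-speed chip-to-chip interconnect.

data_gb = 40.0                                              # assumed dataset size, GB
links_gb_s = {"PCIe-class link": 32.0, "high-speed interconnect": 600.0}  # assumed GB/s

for name, bw in links_gb_s.items():
    seconds = data_gb / bw
    print(f"{name:24s}: {seconds * 1000:7.1f} ms to move {data_gb:.0f} GB")
```

Keeping data on-package, or avoiding the move entirely, is what closes that gap, which is why multi-die packaging and die-to-die interfaces matter.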

High-Performance Computing Liquid Cooling Solutions

Against the backdrop of rapid progress in deep learning, visual computing, image rendering, data science, and machine learning, high-performance computing (HPC) and liquid cooling are no longer the preserve of a few large companies or major research institutions. They are now needed and adopted by a growing range of customers in government, education and research, remote sensing and mapping, pharmaceutical R&D, small-molecule research, cell therapy, image recognition, and more.

To meet customer needs, Blue Ocean Brain offers comprehensive solutions tailored to each industry, covering compute nodes, networking, storage, power consumption, expansion, and heat dissipation.

Product Features

  • Rack-mounted liquid-cooling design; plug-and-play, quick and easy to put into service;

  • Supports up to 9 GPUs and 2 CPUs;

  • Rack storage can be greatly expanded and used for cloud storage services;

  • The liquid-cooling system offers higher density, better energy efficiency, and lower noise;

  • Highly efficient, energy-saving, and environmentally friendly (a rough energy comparison is sketched after this list)
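
As a rough illustration of the energy-saving claim, the sketch below compares annual facility energy under two assumed PUE (Power Usage Effectiveness) values typical of air-cooled and liquid-cooled deployments in published industry figures; the IT load and PUE values are assumptions, not measurements of this product.

```python
# Illustrative comparison of annual facility energy under two assumed PUE values
# (PUE = total facility power / IT power). The PUE figures are typical published
# ranges, not measurements of any specific product.

it_load_kw = 100.0                                   # assumed IT load of the racks
hours_per_year = 24 * 365
pue = {"air cooling": 1.5, "liquid cooling": 1.1}    # assumed typical values

for method, p in pue.items():
    total_mwh = it_load_kw * p * hours_per_year / 1000
    print(f"{method:15s}: PUE {p:.1f} -> {total_mwh:,.0f} MWh/year total facility energy")
```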

Customer benefits

  • The hyper-converged architecture serves as both the compute resource pool and the distributed storage resource pool, greatly simplifying data-center infrastructure. Through software-defined compute virtualization and a distributed storage architecture, it delivers no single point of failure, no single bottleneck, elastic scaling, linear performance growth, and related capabilities.

  • A simple, unified management interface provides centralized monitoring, management, and operations for compute, storage, network, virtualization, and other data-center resources.

  • The compute and storage resource pools formed by the hyper-converged infrastructure can be allocated directly by the cloud platform to serve IaaS, PaaS, and SaaS stacks such as OpenStack, EDP, Docker, Hadoop, R, and HPC, and to support upper-layer application systems or application clusters.

  • The distributed storage architecture simplifies disaster recovery, enabling intra-city active-active data and remote disaster recovery. The existing hyper-converged infrastructure can be extended to the public cloud, and private cloud services can be migrated to public cloud services with ease.

Source: blog.csdn.net/LANHYGPU/article/details/126988788