Technology Cloud Report: In the Battle for Computing Power, Nvidia Drops Another AI "Bombshell"

Original article by Technology Cloud Report.

Recently, at SIGGRAPH 2023, the premier computer graphics conference, Nvidia dropped another late-night "bombshell": its special-purpose chip for large models received an upgrade.

At the conference, Nvidia unveiled the next-generation GH200 Grace Hopper platform, built around the GH200, a new Grace Hopper superchip that is the first in the world to be equipped with HBM3e memory, and designed to handle the world's most complex generative AI workloads.

The GH200 is reportedly the world's first GPU chip equipped with HBM3e (High Bandwidth Memory 3e).

Compared with the current generation, the latest GH200 superchip offers 3.5x the memory capacity and 3x the bandwidth; compared with the popular H100 chip, it offers 1.7x the memory and 1.5x the transmission bandwidth.

Amid surging demand for generative AI, the launch of the GH200 superchip has further escalated the battle over AI computing power.

A higher-performance GH200 chip

According to reports, the HBM3e memory of the GH200 Grace Hopper platform is 50% faster than current HBM3 and provides a combined bandwidth of 10TB/s. This lets the new platform run models 3.5x larger than the previous version, while improving performance with 3x faster memory bandwidth.

The platform is also available in a dual configuration: a single server with 144 Arm Neoverse cores, 8 petaflops of AI performance, and 282GB of the latest HBM3e memory.

Jensen Huang, founder and CEO of Nvidia, said: "To meet the surging demand for generative AI, data centers need accelerated computing platforms built for specialized needs. The new GH200 Grace Hopper superchip platform delivers exceptional memory bandwidth to improve throughput, the ability to connect multiple GPUs to aggregate performance without compromise, and a server design that can be easily deployed across an entire data center."

According to information released by NVIDIA, the new platform can be connected to other superchips via NVIDIA NVLink™, allowing them to work together to deploy today's giant generative AI models. This high-speed, coherent interconnect gives the GPU full access to CPU memory, providing a combined 1.2TB of fast memory in the dual configuration.

Notably, the new GH200 superchip uses the same GPU as the previously released H100, but pairs it with up to 141GB of memory and a 72-core Arm CPU, delivering 5TB/s of bandwidth: 1.7x the memory and 1.5x the bandwidth of the H100.

The new platform and chip also effectively lower the cost of working with large models. Jensen Huang said a single server can hold two GH200 superchips at once, and that the inference cost of large language models will drop significantly.
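Why would inference get cheaper? A rough back-of-envelope sketch (our own illustrative arithmetic, not an Nvidia benchmark) shows the role bandwidth plays: generating each token requires streaming roughly all of a model's weights through the GPU, so memory bandwidth caps decode speed.

```python
# Bandwidth-bound ceiling on LLM decode speed (illustrative assumptions only,
# not Nvidia benchmarks): each generated token streams roughly all model
# weights from memory once.

def max_tokens_per_second(params_billion: float, bytes_per_param: float,
                          bandwidth_tb_per_s: float) -> float:
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_per_s * 1e12 / model_bytes

# A hypothetical 70B-parameter model stored in FP16 (2 bytes per parameter):
for name, bw in [("H100 (~3.35 TB/s)", 3.35), ("GH200 HBM3e (5 TB/s)", 5.0)]:
    print(f"{name}: ~{max_tokens_per_second(70, 2, bw):.0f} tokens/s ceiling")
```

On these assumptions, the 1.5x bandwidth advantage translates almost directly into 1.5x more tokens per second per chip, which is where much of the claimed inference saving would come from.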

According to the presentation, an $8 million investment in Grace Hopper delivers the equivalent AI performance of 8,800 x86 CPUs costing $100 million ($100M / $8M ≈ 12.5), roughly a 12-fold reduction in cost along with a 20-fold reduction in energy consumption.

Nvidia said the original GH200 entered full production in May, and systems based on the new GH200 Grace Hopper platform are expected to ship in the second quarter of 2024.

One key catch: Nvidia has not disclosed the price of the GH200 superchip. Pricing matters greatly for large models, where compute is expensive; the H100 series currently sells for around $40,000 per card.

Why is memory important for large models?

In fact, the GH200 superchip is not a brand-new product, but an updated version of the GH200 chip unveiled at Computex in Taipei, China, in May this year.

"We're very excited about this new GH200," said Ian Buck, vice president and general manager of hyperscale and high-performance computing at NVIDIA. HBM3e not only increases the capacity and amount of memory of the GPU, but it's also faster .”

But why is GPU memory so important?

As the AI models underpinning generative applications grow, they require more memory to run without being split across separate chips and systems, which degrades performance.

More memory allows a model to reside on a single GPU rather than being spread across multiple GPUs or systems, and the extra capacity directly improves GPU performance.

Even with Nvidia's top-of-the-line H100, some models must be partitioned across multiple GPUs to run.
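For intuition, here is a minimal sketch (our own arithmetic, with hypothetical model sizes) of how quickly weights alone outgrow a single GPU:

```python
# Memory footprint of model weights alone (illustrative; real deployments also
# need room for the KV cache and activations, so practical limits are tighter).

def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1e9

H100_GB, GH200_GB = 80, 141  # capacities cited for the H100 and the new GH200

for params in (7, 70, 175):  # hypothetical model sizes, in billions of parameters
    need = weight_memory_gb(params, 2)  # FP16: 2 bytes per parameter
    print(f"{params}B params -> {need:.0f} GB | "
          f"fits H100: {need <= H100_GB} | fits GH200: {need <= GH200_GB}")
```

Under FP16, a 70B-parameter model's weights (~140GB) just squeeze into the GH200's 141GB but overflow the 80GB H100, which is exactly the partitioning problem described above.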

The latest version, the GH200, comes with 141GB of HBM3e memory and is designed to handle "the world's most complex generative AI workloads, spanning large language models, recommendation systems, and vector databases," according to Nvidia.

Impact on the field of AI

NVIDIA's GH200 superchip and DGX GH200 supercomputer are major breakthroughs in the field of AI. They provide unprecedented performance and memory capacity for large-scale generative AI workloads, making it possible to train huge models with hundreds of billions or even trillions of parameters.

Such models can achieve higher accuracy and efficiency in natural language processing, computer vision, recommendation systems, graph analytics, and other fields, giving humanity powerful tools for solving more complex problems.

In the view of many AI practitioners, demand for large-model training is urgent and performance requirements are high, while adapting to alternative GPUs and migrating software ecosystems takes a long time. As a result, most teams currently prioritize NVIDIA, though testing and validation of other manufacturers' chips is also underway.

A new battle over computing power has begun. If computing power were a martial-arts world, Nvidia would be its reigning grandmaster at this moment.

Its signature move is accelerated computing, and on the AI battlefield in particular it seems to catch every wave at exactly the right moment. From the gaming PC market, to the rise of deep learning, to the spread of cloud computing, to the advent of generative AI, Nvidia's technology has proven nearly unbeatable.

In retrospect, Nvidia has outgrown the GPU concept itself: AI has become its biggest label, and its mastery of computing power now props up a trillion-dollar empire.

In 2022, Nvidia announced a string of blockbuster products: the H100 GPU based on the new Hopper architecture, the Grace Hopper superchip combining a CPU and a GPU, and the Grace CPU Superchip combining two CPUs, with the CPU products slated to ship in 2023.

Notably, when designing the new Hopper GPU architecture, Nvidia added a Transformer Engine: hardware specifically optimized for the Transformer algorithm to accelerate AI computation.
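For a sense of how software targets that hardware, below is a minimal sketch using NVIDIA's open-source Transformer Engine library for PyTorch. Treat the exact API as an assumption on our part: it varies by version, and FP8 execution requires a Hopper-class GPU.

```python
# Minimal FP8 sketch with NVIDIA's Transformer Engine for PyTorch
# (github.com/NVIDIA/TransformerEngine). Illustrative, not canonical:
# API details vary by version, and a Hopper-class GPU is required.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(4096, 4096, bias=True).cuda()  # FP8-capable drop-in linear layer
x = torch.randn(16, 4096, device="cuda")

fp8_recipe = recipe.DelayedScaling()  # default delayed-scaling FP8 recipe
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # the matmul runs in FP8 on Hopper's Transformer Engine
print(y.shape)
```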

A domestic chip practitioner put it bluntly: "The release of the H100 really marks a new era. The Grace-Hopper combination, coupled with high-end interconnect, leaves rivals almost no room to survive. Nvidia takes it all, while AMD and Intel are left chasing hard."

He added: "Some domestic companies are still optimizing for CNNs, while Nvidia already has a Transformer Engine, and now that AIGC is booming, that hardware is exactly what is needed. For that kind of foresight, I can only admire their scientists' deep understanding of the field."

An academic researcher offered a similar analysis: "From the H100, with its dedicated Transformer Engine and support for the FP8 format, you can see computing hardware moving toward application-specific customization. The Grace CPU illustrates the importance of tightly integrating heterogeneous computing systems. Simply optimizing and designing accelerators can no longer meet today's requirements for computing power and energy efficiency; the parts of the system must be co-optimized and co-designed."

He added that the Grace CPU attacks the compute bottleneck by improving communication bandwidth and establishing a coherent memory model between CPU and GPU, the same direction that academia (near-memory computing, in-memory computing) and industry (system interconnect protocols such as CXL and CCI) have been pursuing.

All in all, through various permutations and combinations of GPU and CPU, Nvidia has lifted computing power to a new level. As Jensen Huang put it: "We are reinventing the computer. Accelerated computing and artificial intelligence mean that computing is being redefined."

Huang also noted in an interview that data centers will need fewer and fewer CPUs: instead of buying millions of CPUs as in the past, they will buy millions of GPUs. In other words, in his view, the AI computing arena already belongs to the GPU.

Nvidia's ambitions

As ChatGPT drives a surge in demand for large AI models, Nvidia, the leader in accelerated computing, has seen its stock rise more than 210% this year, including 56% in the past three months. Over the past seven years its share price has increased more than 40-fold, and its market capitalization now exceeds $1.1 trillion.

According to public data, Nvidia holds more than 80% of the global GPU server market and 91.4% of the global enterprise GPU market.

According to a research report released by Moody's Investors Service in May, Nvidia will achieve "unparalleled" revenue growth over the next few quarters, with data center revenue exceeding that of rivals Intel and AMD combined.

But Morgan Stanley strategist Edward Stanley said in a recent report that, judged against historical precedent, Nvidia's stock surge is in its "late stage", which Morgan Stanley reads as a sign of a "bubble" in the AI industry.

Due to the continued GPU shortage, prices of Nvidia products have risen more than 30% year-on-year: the spot price of a single Nvidia A800 card is nearly 130,000 yuan, and the H100 has fetched as much as $45,000 on eBay.

Meanwhile, OpenAI's GPT-4 large model is reported to require at least 25,000 Nvidia A100 GPUs, and the company is said to currently hold at least 10 million GPU chips.

As Jensen Huang likes to say, "The more GPUs you buy, the more money you save." The logic is that new GPU products dramatically accelerate computing, delivering more computing power at lower energy consumption than CPUs.

But Nvidia's strategic play doesn't stop there.

A practical problem is that high-performance computing comes at a high price: training a large model can cost tens of millions of dollars, which not every company can afford.

In response, Nvidia introduced a cloud service offering, NVIDIA AI Foundations, with Jensen Huang saying he wants Nvidia to be "the TSMC of the AI world". Just as TSMC dramatically lowered the barrier to entry for chip design companies, Nvidia intends to play the role of a foundry, offering cost-effective cloud services in partnership with large-model developers and cloud providers.

While helping downstream companies cut the cost of large-model training, Nvidia is also gradually involving itself in upgrading the upstream industrial chain. This year, Nvidia teamed up with TSMC, ASML, and Synopsys to release cuLitho, a computational lithography library.

Computational lithography is a critical step in chip design and fabrication, and one of its largest computational workloads. cuLitho's breakthrough is that computational lithography can be accelerated on GPU-packed DGX AI systems, running tens of times faster than the previous CPU-based approach while reducing the total energy consumption of the computation.

This will help fabs shorten prototype cycle times, improve yields, and reduce carbon emissions, laying the groundwork for 2nm and beyond, and opening up new possibilities for curvilinear masks, high-NA extreme ultraviolet lithography, sub-atomic photoresist models, and the other new solutions and technologies that future process nodes will require.

Many in the industry believe that while downstream applications will not feel the effect in the short term, this upstream R&D will shape the industry's development over the long term, opening up a generational gap.

"Nvidia has always had its own development path in the iteration of the GPU architecture. The development of the past few years has also made Nvidia the leader in the field of AI computing power chips. Because of its leadership, Nvidia will think about how to do more. Yuan's layout and in-depth cooperation in the industry, so that we can better understand the needs of the industry, such as cooperation with TSMC is a good example," said a chip industry expert.

Of course, both Intel and AMD have sounded the clarion call to counterattack.

In July, Intel launched the Habana Gaudi 2 AI chip for the Chinese market; in June, AMD launched the Instinct MI300X AI chip. Both compete head-on with Nvidia's 100-series chips.

In the data center market, Nvidia, Intel, and AMD currently form a three-way standoff. But with the official release of the GH200, the Grace CPU has formally entered the arena, and Intel and AMD will feel it most keenly. Everyone knew the GH200 would arrive sooner or later, yet its actual release still stings.

The power game around computing power will continue.

[About Technology Cloud Report]

Technology Cloud Report specializes in original enterprise-level content. Founded in 2015, it is a top-10 media outlet in the cutting-edge enterprise IT field, recognized by the Ministry of Industry and Information Technology, and one of the officially designated media for Trusted Cloud and the Global Cloud Computing Conference. It publishes in-depth original reporting on cloud computing, big data, artificial intelligence, blockchain, and other fields.
