11 minutes to train GPT-3! Nvidia H100 sweeps all 8 MLPerf benchmarks, and the next generation of graphics cards won't arrive until 2025

Source: Xinzhiyuan (ID: AI-era)

In the latest MLPerf training benchmark test, the H100 GPU set new records in all eight tests!

Today, the NVIDIA H100 pretty much dominates all categories and is the only GPU used in the new LLM benchmark.

A cluster of 3,584 H100 GPUs completed a large-scale benchmark based on GPT-3 in just 11 minutes.

The MLPerf LLM benchmark is based on OpenAI's GPT-3 model, which has 175 billion parameters.

Lambda Labs estimates that training a model this large requires about 3.14E23 FLOPs of compute (total floating-point operations).
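That figure lines up with a common back-of-the-envelope estimate. The sketch below assumes the widely used 6 × parameters × tokens approximation for dense Transformer training and GPT-3's reported ~300 billion training tokens; both numbers are assumptions layered on top of the article, not figures taken from the MLPerf submission.

```python
# Rough estimate of the compute needed to train GPT-3 from scratch.
# Assumes the common "6 * parameters * tokens" rule of thumb for dense
# Transformer training; 300e9 tokens is GPT-3's reported training set size.

params = 175e9               # GPT-3 parameter count
tokens = 300e9               # approximate training tokens (assumption)
flops_per_param_token = 6    # ~2 for the forward pass, ~4 for the backward pass

total_flops = flops_per_param_token * params * tokens
print(f"Estimated training compute: {total_flops:.2e} FLOPs")
# -> Estimated training compute: 3.15e+23 FLOPs, close to Lambda Labs' 3.14E23
```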

How the "monster" that trains GPT-3 in 11 minutes was built

The top-ranked system on the LLM and BERT natural language processing (NLP) benchmarks was jointly developed by NVIDIA and Inflection AI, and hosted by CoreWeave, a cloud service provider specializing in enterprise-grade GPU-accelerated workloads. The system combines 3,584 NVIDIA H100 accelerators with 896 Intel Xeon Platinum 8462Y+ processors.

This is because Nvidia built a new Transformer Engine into the H100, designed specifically to accelerate Transformer model training and inference and boosting training speed by as much as 6x. The performance CoreWeave delivers from the cloud comes very close to what Nvidia can achieve with an AI supercomputer running in an on-premises data center, thanks to the low-latency NVIDIA Quantum-2 InfiniBand networking that CoreWeave uses.
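For readers curious what this looks like in practice, below is a minimal, hedged sketch of enabling FP8 execution through Nvidia's Transformer Engine Python library on an H100. The article itself shows no code; the module names (transformer_engine.pytorch, te.Linear, fp8_autocast, DelayedScaling) follow the library's public documentation, and the layer sizes are arbitrary.

```python
# Hedged sketch: running one layer in FP8 via NVIDIA's Transformer Engine.
# The API calls below come from the library's public Python interface, not
# from anything described in the article.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(4096, 4096, bias=True).cuda()   # drop-in FP8-capable Linear
fp8_recipe = recipe.DelayedScaling()               # default FP8 scaling recipe

x = torch.randn(16, 4096, device="cuda", requires_grad=True)

# Inside fp8_autocast, supported matmuls run on the H100's FP8 tensor cores.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()   # the backward pass also uses the accelerated kernels
```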

As the number of H100 GPUs involved in training scaled from hundreds to more than 3,000, careful optimization allowed the entire technology stack to achieve near-linear performance scaling on the demanding LLM test. In fact, if the number of GPUs is cut in half, the time to train the same model rises to 24 minutes, which suggests the overall system's efficiency scales slightly better than linearly as GPUs are added. The main reason is that Nvidia designed for this problem from the start, using NVLink technology to make communication between GPUs efficient.
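As a quick check of that scaling claim, the sketch below compares the two reported runs against ideal linear scaling; the 1,792-GPU figure is inferred as half of the 3,584-GPU cluster rather than taken from the published results.

```python
# Compare the two reported MLPerf LLM runs against ideal linear scaling.
# The 1,792-GPU count is an assumption (half of the 3,584-GPU cluster).

small_gpus, small_minutes = 1_792, 24   # half-size run (assumed GPU count)
large_gpus, large_minutes = 3_584, 11   # full-size run

speedup = small_minutes / large_minutes        # actual speedup: ~2.18x
ideal_speedup = large_gpus / small_gpus        # perfect linear scaling: 2.0x
scaling_efficiency = speedup / ideal_speedup   # ~1.09 -> ~109%, i.e. superlinear

print(f"speedup: {speedup:.2f}x, scaling efficiency: {scaling_efficiency:.0%}")
```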

Of the 90 systems tested, 82 were accelerated using NVIDIA GPUs.

Per-GPU training efficiency

System cluster training time: comparison with Intel

The Intel systems in the test used 64 to 96 Intel Xeon Platinum 8380 processors and 256 to 389 Intel Habana Gaudi2 accelerators. Intel's GPT-3 submission came in at a training time of 311 minutes, which looks rather bleak next to Nvidia's results.

Analysts: Nvidia's advantage is too great

Industry analysts believe Nvidia's technical edge in GPUs is obvious.

As an AI infrastructure provider, Nvidia's dominant position is also reflected in the stickiness of the ecosystem it has built up over the years. The AI community is heavily dependent on Nvidia's software: almost every AI framework is built on the underlying CUDA libraries and tools that Nvidia provides.

Nvidia also offers full-stack AI tools and solutions. Beyond supporting AI developers, it keeps investing in enterprise-grade tools for managing workloads and models. For the foreseeable future, Nvidia's leading position in the industry looks very secure. Analysts go further: as the MLPerf results show, the capability and efficiency of Nvidia systems for AI training in the cloud are Nvidia's biggest asset in the "war for the future".

Next-generation GPU after Ada Lovelace to be released in 2025

Zhiye Liu, a freelance writer at Tom's Hardware, also recently published an article covering Nvidia's plans for the successor to its Ada Lovelace graphics cards.

There is no doubt about the H100's ability to train large models: with just 3,584 H100s, a GPT-3 model can be trained in only 11 minutes. At a recent press conference, Nvidia shared a new roadmap detailing next-generation products, including the successor to the Ada Lovelace GPUs that power the GeForce RTX 40 series, some of the best gaming graphics cards available today.

According to the roadmap, Nvidia plans to launch the "Ada Lovelace-Next" graphics card in 2025. If the current naming scheme continues, the next generation of GeForce products should arrive as the GeForce RTX 50 series. According to information obtained by the South American hacker group LAPSU$, Hopper's successor ("Hopper Next") is likely to be named Blackwell. On consumer graphics cards, Nvidia has maintained a two-year cadence: Pascal in 2016, Turing in 2018, Ampere in 2020, and Ada Lovelace in 2022. If Ada Lovelace's successor does not launch until 2025, Nvidia will be breaking that rhythm.

The recent AI boom has created huge demand for Nvidia GPUs, whether the latest H100 or the previous-generation A100. According to reports, one major manufacturer has ordered $1 billion worth of Nvidia GPUs this year. Despite export restrictions, China remains one of Nvidia's largest markets in the world. (Reportedly, small quantities of Nvidia A100s can be bought in Shenzhen's Huaqiangbei electronics market for about $20,000 each, roughly twice the usual price.) In response, Nvidia has adapted some of its AI products, releasing export-compliant SKUs such as the A800 and H800.

Zhiye Liu offers another take on this: export regulations actually work in Nvidia's favor, because they mean customers must buy more of the cut-down GPU variants to obtain the same performance. That also helps explain why Nvidia prioritizes compute GPUs over gaming GPUs; recent reports indicate it has ramped up production of compute-grade GPUs. With no serious competition from AMD's RDNA 3 product stack, and with Intel posing no real threat to the GPU duopoly, Nvidia can afford to stall on the consumer side.

More recently, Nvidia has expanded its GeForce RTX 40-series product stack with the GeForce RTX 4060 and GeForce RTX 4060 Ti. There is room for a GeForce RTX 4050 below, plus an RTX 4080 Ti or GeForce RTX 4090 Ti above. If it has to, Nvidia can also take a page from the old Turing playbook and give Ada Lovelace a "Super" refresh, further expanding the Ada lineup. Finally, Zhiye Liu concludes that the Lovelace architecture will not see a real successor this year or next.

References: https://blogs.nvidia.com/blog/2023/06/27/generative-ai-debut-mlperf/
