Training GPT-5 may take 50,000 H100s! Global demand for the H100 totals about 430,000 units, and Nvidia GPUs are caught in a shortage storm

Source | Xinzhiyuan (ID: AI-era)

"Who will get how many H100s, and when, is the hottest topic in Silicon Valley." OpenAI founding member and research scientist Andrej Karpathy recently shared his view of the Nvidia GPU shortage.

Recently, a chart titled "How many GPUs do we need" has been widely circulated in the community and has sparked discussion among many netizens.

According to the figure:

- GPT-4 may have been trained on roughly 10,000-25,000 A100s
- Meta has about 21,000 A100s
- Tesla has about 7,000 A100s
- Stability AI has about 5,000 A100s
- Falcon-40B was trained on 384 A100s
- Inflection used 3,500 H100s to train a model comparable in capability to GPT-3.5

In addition, according to Musk, training GPT-5 may require 30,000-50,000 H100s. Morgan Stanley had previously claimed that GPT-5 was being trained on 25,000 GPUs and had been in training since February, but Sam Altman later clarified that GPT-5 training had not yet begun. Altman has, however, acknowledged that OpenAI is severely short of GPUs: "We'd be happy if people used [our products] less, because we don't have enough GPUs."

An article titled "Nvidia H100 GPUs: Supply and Demand" analyzes in depth how much GPU capacity technology companies currently use and need. It speculates that the large-scale H100 cluster capacity of both small and large cloud providers is about to run out, and that demand for the H100 will continue at least through the end of 2024.

So, is GPU supply really the bottleneck?

GPU demand from major companies: about 430,000 H100s

The generative AI boom shows no sign of slowing, and it keeps raising the bar on compute. Some startups are training their models on Nvidia's expensive but extremely high-performance H100. "GPUs are harder to get than drugs at this point," as Musk put it. Sam Altman has said that OpenAI is GPU-limited, which is delaying its short-term plans (fine-tuning, dedicated capacity, 32k context windows, multimodality).

Karpathy's comments come as even the annual reports of major tech companies discuss GPU access. Last week, Microsoft released its annual report, highlighting to investors that GPUs are a "key raw material" for its rapidly growing cloud business, and warning that data-center outages are a risk factor if it cannot obtain the infrastructure it needs.

The analysis is attributed to the author of the post shared on Hacker News (see references).

He guesses that OpenAI may need 50,000 H100s and Inflection 22,000; Meta may want 25,000; and each of the large cloud providers (Azure, Google Cloud, AWS, Oracle) may want around 30,000. Lambda, CoreWeave, and the other private clouds might need 100,000 in total, and Anthropic, Helsing, Mistral, and Character might each want 10,000. The author stresses that these are rough estimates and guesses, some of which double-count both the cloud and the end customers renting from that cloud.

Altogether, companies worldwide need about 432,000 H100s. At roughly $35k per H100, that is about $15 billion worth of GPUs. And this excludes Chinese internet companies, which need large numbers of the export-compliant H800. There are also well-known financial firms, such as Jane Street, JPMorgan, and Two Sigma, each deploying clusters that start at hundreds of A100s/H100s and scale to thousands.
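As a quick sanity check on these figures, here is a minimal back-of-envelope sketch in Python. The per-company numbers and the ~$35k unit price come straight from the estimates above; the "30k each" reading for the four hyperscalers is an interpretation, and the itemized list deliberately undershoots the 432k headline figure since smaller buyers are not listed.

```python
# Back-of-envelope check of the H100 demand estimates quoted above.
# All figures are the article's guesses, not confirmed orders.

demand = {
    "OpenAI": 50_000,
    "Inflection": 22_000,
    "Meta": 25_000,
    "Hyperscalers (Azure, GCP, AWS, Oracle, ~30k each)": 4 * 30_000,
    "Private clouds (Lambda, CoreWeave, etc.)": 100_000,
    "Anthropic + Helsing + Mistral + Character (~10k each)": 4 * 10_000,
}

itemized = sum(demand.values())   # ~357k from the named buyers alone
headline_total = 432_000          # the article's overall estimate
unit_price = 35_000               # ~$35k per H100 (article's figure)

print(f"Itemized demand: {itemized:,} H100s")
print(f"Headline total:  {headline_total:,} H100s")
print(f"Implied spend:   ${headline_total * unit_price / 1e9:.1f}B")
# -> roughly $15.1B, matching the article's ~$15 billion figure
```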

All the large labs, including OpenAI, Anthropic, DeepMind, Google, and X.ai, are training large language models, and for that Nvidia's H100 is irreplaceable.

Why is the H100 the first choice?

The H100 is more popular than the A100 partly because of its lower cache latency and FP8 compute: its efficiency is up to 3 times higher, while its cost is only 1.5-2 times higher. Factoring in total system cost, the H100 delivers far more performance per dollar. In terms of raw speed, the H100 is about 3.5 times faster than the A100 at 16-bit inference and about 2.3 times faster at 16-bit training.
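To make the cost argument concrete, here is a minimal sketch that turns the quoted speedups into rough performance-per-dollar ratios. The 3.5x/2.3x speedups and the 1.5-2x cost range are the article's figures; the 1.75x midpoint cost ratio is an assumption for illustration.

```python
# Rough perf-per-dollar comparison of H100 vs A100, using the
# article's quoted numbers (not official benchmarks).

speedup_inference_16bit = 3.5   # H100 vs A100, per the article
speedup_training_16bit = 2.3    # H100 vs A100, per the article
cost_ratio = 1.75               # assumed midpoint of the quoted 1.5-2x

print(f"16-bit inference: {speedup_inference_16bit / cost_ratio:.1f}x per dollar")  # ~2.0x
print(f"16-bit training:  {speedup_training_16bit / cost_ratio:.1f}x per dollar")   # ~1.3x
# On top of the per-dollar edge, the H100 finishes the same job in a
# fraction of the wall-clock time, which is often what labs care about most.
```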

Figure: A100 vs H100 speed comparison

Figure: H100 training MoE

Figure: H100 large-scale acceleration

Most companies buy the H100 for both training and inference, while the A100 is now used mainly for inference. Some companies are still hesitant to switch, though, because of the cost, the capacity constraints, the risk of deploying and debugging new hardware, and the fact that their existing software is already optimized for the A100.

It's not a GPU shortage, it's a supply chain problem

An Nvidia executive said the problem is not a shortage of GPUs but how those GPUs get to market. Nvidia is producing GPUs at full capacity, but production is mainly constrained by the supply chain: the chip itself may be in sufficient supply, while shortfalls in other components severely limit how many complete GPUs can ship. The production of these components depends on suppliers around the world. Demand, however, is predictable, so the problem is gradually being resolved.

GPU die production capacity

First of all, Nvidia produces the H100 exclusively with TSMC; all of Nvidia's 5nm GPUs are made only at TSMC.

Partnering with Intel or Samsung may be possible in the future, but not in the short term, which limits H100 production. According to the post's author, TSMC has four production nodes in its 5nm family: N5, N5P, N4, and N4P. The H100 is produced only on 4N, an enhanced 5nm-class node, and Nvidia has to share that node's capacity with Apple, Qualcomm, and AMD. TSMC's fabs plan each customer's capacity 12 months in advance, so if Nvidia and TSMC underestimated H100 demand earlier, capacity is constrained now. The author estimates it takes about half a year for an H100 to go from production to delivery, and quotes a retired semiconductor industry professional as saying that the fab itself is not TSMC's bottleneck: CoWoS (chip-on-wafer-on-substrate) packaging is the gate on TSMC's capacity.

H100 memory production capacity

Another key H100 component, its memory, may also be capacity-constrained. HBM (High Bandwidth Memory), which is integrated with the GPU through advanced packaging, is critical to the GPU's performance.

"The main problem is HBM. Making it is a nightmare. Since HBM is difficult to produce, supply is very limited, and both production and design have to follow its rhythm."

For HBM3 memory, Nvidia uses SK Hynix parts almost exclusively, possibly with some from Samsung and probably none from Micron. Nvidia wants SK Hynix to increase capacity, and it is doing so, but both Samsung and Micron have limited capacity. Beyond memory, GPU manufacturing also depends on many other materials and processes, including rare earth elements, any of which could become a limiting factor.

How will the GPU situation develop?

Nvidia's statement: Nvidia has only revealed that it will be able to supply more GPUs in the second half of the year, without giving any quantitative details.

"We're dealing with supply for the quarter today, but we've also sourced a lot of supply for the second half of the year. We believe supply in the second half will be much higher than in the first half." - Nvidia CFO Colette Kress, on the earnings call for the February-April 2023 quarter

What does a private cloud executive say? GPU supply is now a vicious circle: scarcity makes GPU ownership look like a moat, which leads to more GPU hoarding, which exacerbates the scarcity.

When will the H100's successor appear? According to Nvidia's roadmap, the next-generation chip will not be announced until late 2024 to early 2025; until then, the H100 will remain Nvidia's flagship. In the interim, Nvidia will launch a 120GB water-cooled version of the H100. And according to industry insiders interviewed for the post, the H100 will be sold out by the end of 2023.

How do you get H100 compute?

As the Nvidia executive noted, the compute that H100 GPUs provide ultimately reaches the industry through cloud providers. So the H100 crunch has two causes: GPU production on one side, and on the other, how cloud providers obtain H100s from Nvidia and deliver that compute to the customers who need it. The process, simply put: a cloud provider buys H100 systems from OEMs, builds cloud compute services on them, and sells those services to AI companies, so that end users get H100 compute. Frictions at each step of that chain contribute to the current shortage, and the post offers plenty of industry detail on them.

Who can you buy an H100 board from? OEMs such as Dell, Lenovo, HPE, Supermicro, and Quanta sell both the H100 and the HGX H100. Cloud providers like CoreWeave and Lambda buy GPUs from OEMs and lease them to startups. Hyperscalers (Azure, GCP, AWS, Oracle) work more directly with Nvidia but also buy from OEMs, much the way gamers buy graphics cards. Even to buy a DGX system, customers must order through an OEM; they cannot order directly from Nvidia. Delivery times are terrible for 8-GPU HGX servers and fine for 4-GPU ones, but every customer wants the 8-GPU servers.

Do startups buy from OEMs and resellers?
If a startup wants H100 compute, it does not end up buying H100s and plugging them into its own GPU cluster. It usually rents compute from a large cloud such as Oracle, a private cloud such as Lambda or CoreWeave, or a provider such as FluidStack that works with OEMs and data centers.

If you want to build your own data center, you have to weigh how long the build will take, whether you have the people and hardware experience, and whether you can afford the capital expenditure. Renting and hosting servers, by contrast, has become much easier.

"If you want to build your own data center, you need to lay dark fiber to connect to the internet - $10,000 per kilometer. Most of that infrastructure was already built and paid for during the dot-com boom. Just rent it; it's cheap." - a private cloud executive

The spectrum from renting to building your own runs roughly: on-demand cloud (pure rental), reserved cloud, managed hosting (you buy the servers, and a provider hosts and manages them), and full self-hosting (you buy and host the servers yourself). Most startups that need H100 compute choose reserved cloud or managed hosting; a rough break-even sketch follows below.

Comparison among the large cloud platforms

For many startups, the big cloud providers are the ultimate source of their H100s, so the choice of platform determines whether they get stable H100 compute. The overall view: Oracle is not as reliable as the big three clouds, but it offers more technical support. The other main differences among the large clouds are:

- Networking: most startups looking for large A100/H100 clusters want InfiniBand. AWS and Google Cloud have been slower to adopt InfiniBand because they provide connectivity through their own approaches.
- Availability: most of Microsoft Azure's H100s are dedicated to OpenAI, and Google Cloud has had a harder time acquiring H100s.
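To illustrate the rent-versus-buy trade-off described above, here is a minimal break-even sketch. Only the ~$35k unit price comes from earlier in the article; the hourly rates and overhead factor are purely hypothetical placeholders, not quoted prices from any provider.

```python
# Hypothetical rent-vs-buy break-even for one H100.
# Only the ~$35k unit price is from the article; the rest is assumed.

gpu_price = 35_000      # ~$35k per H100 (article's figure)
overhead_factor = 1.5   # assumed: server, networking, hosting, power
rates = {
    "on-demand": 4.00,  # assumed $/GPU-hour, pure rental
    "reserved": 2.50,   # assumed $/GPU-hour with a long-term commitment
}

owned_cost = gpu_price * overhead_factor

for name, rate in rates.items():
    breakeven_hours = owned_cost / rate
    print(f"{name}: buying pays off after ~{breakeven_hours / 8760:.1f} "
          f"years of 24/7 use")
# With these made-up numbers, buying only wins after ~1.5-2.4 years of
# constant use, which is why most startups rent or reserve instead.
```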

That may be because Nvidia seems inclined to give more H100 quota to clouds that are not developing competing machine-learning chips. (This is all speculation, not established fact.) Apart from Microsoft, the big three clouds are all developing their own ML chips, and Nvidia alternatives from AWS and Google are already on the market with some share. In terms of the relationship with Nvidia, the ranking might be: Oracle and Azure > GCP and AWS. But that's just guesswork. Smaller cloud providers are cheaper, and in some cases a provider will trade compute for equity.

How Nvidia allocates the H100

Nvidia gives each customer an H100 quota, but the quotas are not neutral. If Azure says "we want 10,000 H100s, all for Inflection," it gets a different quota than if it says "we want 10,000 H100s for the Azure cloud." Nvidia cares who the end customer is: if Nvidia likes the end customer, the cloud platform gets more H100s. Nvidia wants to know as much as possible about end customers, and it prefers customers with strong brands or startups with a strong pedigree.

"Yes, that seems to be the case. NVIDIA likes to guarantee GPU access to emerging AI companies (many of which have close ties to them). See Inflection, an AI company they invest in, testing a huge H100 cluster on CoreWeave, which they also invest in." - a private cloud executive

Concluding remarks

The current hunger for GPUs has elements of bubble and hype, but it is objectively real. Companies like OpenAI, with hit products like ChatGPT, still cannot get enough GPUs. Other companies are buying and hoarding GPUs for future use, or to train large language models the market may never use. That creates a GPU-shortage bubble. But however you look at it, Nvidia is the green king sitting securely in its fortress.

References:
https://news.ycombinator.com/item?id=36951872
https://twitter.com/lpolovets/status/1686545776246390784
https://venturebeat.com/ai/nvidia-gpu-shortage-is-top-gossip-of-silicon-valley/

Origin: blog.csdn.net/lqfarmer/article/details/132238210