Detailed - Nvidia H100 GPU: Supply and Demand

This post is an exploration of supply and demand for GPUs, specifically Nvidia H100s. We'll also be releasing the song and music video the same day as this post.

The article went viral. It was featured on HN and Techmeme, landed on the front page of many email newsletters, got tweets from Andrej Karpathy and others, a comment from Inflection's Mustafa Suleyman (whose company will soon have a massive H100 cluster online) and from Stability's Emad, the song was mentioned in the New York Times, and various asset managers and AI founders reached out. If you haven't read it, I hope you enjoy it!

Introduction #

As of March 2023, AI appears likely to be bottlenecked by the supply of GPUs.

"One reason the AI boom is underestimated is the shortage of GPUs/TPUs. This shortage leads to various constraints on product launches and model training, but those constraints are invisible. Instead, all we see is Nvidia's price spiking. Things will accelerate once supply meets demand." — Adam D'Angelo, Quora CEO, Poe.com, former Facebook CTO

These are the CEOs and companies that matter most to GPU supply and demand as well as AI. big version

Is there really a bottleneck? #

Elon Musk said, "GPUs are harder to get than drugs at this point." 1

Sam Altman stated that OpenAI is GPU limited and it's pushing back their short-term plans (fine-tuning, dedicated capacity, 32k context windows, multimodality). 2

Large-scale H100 clusters at small and large cloud providers alike are running out of capacity. 3

"Everyone wants Nvidia to make more A/H100" 4 — Message from cloud provider exec "We are sorely short on GPUs, the fewer people using our products the better" "We would love it if they used less It, because we don't have enough GPUs" Sam Altman, CEO, OpenAI5

It's a nice soundbite to remind the world how much users love your product, but it's also true that OpenAI needs more GPUs.

For Azure/Microsoft:

  1. Internally they rate-limit GPUs for employees. They have to queue up like universities did for mainframe time in the 1970s. I think OpenAI is absorbing all of that right now.

  2. The CoreWeave deal is glued on top of their GPU infrastructure.

- Anonymous

In short: yes, H100 GPUs are in short supply. I've been told that for companies seeking hundreds or thousands of H100s, Azure and GCP are effectively out of capacity, and AWS is close to being out as well. 6

This "capacity deficit" is based on the allocation Nvidia gave them.

What do we want to know about bottlenecks?

  1. What caused it (demand or supply)?

  2. How long will it last?

  3. What will help resolve it?

Table of Contents #

The GPU Song #

Well... we also released a song the same day we published this article. It's fire.

If you haven't heard the GPU song, do yourself a favor and play it.

I just watched the video. Very funny. Well done. —Mustafa Suleyman, CEO, Inflection AI

It's on Spotify, Apple Music, and YouTube.

Check out more about the song here.

H100 GPU Demand #

Cause of Bottleneck - Demand

  1. Specifically, what do people want to buy that they can't?

  2. How many of these GPUs do they need?

  3. Why can't they use different GPUs?

  4. What are the different product names?

  5. Where do companies buy them and how much do they cost?

Who needs H100s? #

"It seems like everyone and their dog is buying GPUs at this point" 7 – Elon Musk

Who needs/has 1,000+ H100s or A100s #

  • Startups training LLMs

    • OpenAI (via Azure), Anthropic, Inflection (via Azure 8 and CoreWeave 9 ), Mistral

  • Cloud service providers

    • The big three: Azure, GCP, AWS

    • The other public cloud: Oracle

    • Larger private clouds like CoreWeave and Lambda

  • Other large companies

    • Tesla 7 10

Who needs/has 100+ H100s or A100s #

Startups doing significant fine-tuning of large open-source models.

What are most high-end GPUs used for? #

For companies working with private clouds (CoreWeave, Lambda) — companies with hundreds to thousands of H100s — it's almost all LLM work plus some diffusion-model work. Some of it is fine-tuning of existing models, but most of it is from newer startups you may not have heard of yet that are building new models from scratch. They're signing $100,000-$500,000 contracts over 3 years with hundreds to thousands of GPUs.

For companies using on-demand H100 with a small number of GPUs, it's still probably >50% LLM-related usage.

Private clouds are now starting to see inbound demand from enterprises that would normally use their default big cloud providers, but everyone is out.

Are large AI labs more constrained in inference or training? #

It depends on how much product traction they have! Sam Altman said that, if forced to choose, OpenAI would rather have more inference capacity, but OpenAI is still constrained by both. 11

Which GPUs do people need? #

Mainly H100s. Why? It's the fastest for both LLM inference and training. (The H100 is also usually the best price/performance for inference.)

Specifically: the 8-GPU HGX H100 SXM server.

"My analysis is that it's also cheaper to run for the same job. The V100 would be great if you could find them, which you can't." – Anonymous

"Honestly, not sure [the H100 is the best bang for the buck]? The training price/performance of the A100 looks about the same as the H100. For inference, we found the A10G to be more than adequate, and much less expensive." – Private cloud exec

"This [the A10G being more than enough] was true for a while. But in the world of Falcon 40B and Llama 2 70B, which we're seeing a lot of use of, it's no longer true. We need A100s for these — 2xA100, to be exact. So interconnect speed matters for inference too." – (Various) private cloud execs
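As a rough check on that last point, here's why Llama 2 70B spills past one 80GB card — a back-of-the-envelope sketch assuming 16-bit weights, ignoring the KV cache and activations (which only make it worse):

```python
# Why Llama 2 70B needs 2x A100 80GB: the 16-bit weights alone exceed one card.
params = 70e9                 # Llama 2 70B parameter count
bytes_per_param = 2           # fp16 / bf16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB vs 80 GB per A100")  # ~140 GB
```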

What are the most common needs of LLM startups? #

For training LLMs: H100s with 3.2Tb/s InfiniBand.
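The 3.2Tb/s figure is consistent with the common HGX H100 build of one 400Gb/s NDR InfiniBand NIC per GPU — an assumption about the typical configuration, not something stated by the people quoted here:

```python
# Per-node fabric bandwidth if each of the 8 GPUs gets a 400 Gb/s NDR NIC.
nics_per_node = 8
gbps_per_nic = 400
print(f"{nics_per_node * gbps_per_nic / 1000:.1f} Tb/s per node")  # 3.2 Tb/s
```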

What do companies want for LLM training vs. inference? #

For training they tend to want H100s; for inference, it's much more about performance per dollar.

It's still a performance-per-dollar question for the H100 vs. the A100, but H100s are generally favored because they scale better to higher GPU counts and deliver faster training times, and the speed of starting, training, or improving a model is critical for startups.

"For multi-node training, they all ask for A100 or H100 with InfiniBand networking. The only non-A/H100 requests we see are for inference where the workload is a single GPU or a single node" – Private Cloud Executive

What matters most for LLM training? #

  • memory bandwidth

  • FLOPS (tensor core or equivalent matrix multiply unit)

  • Caching and cache latency

  • Additional features such as FP8 calculations

  • Computing performance (related to the number of cuda cores)

  • Interconnect speed (e.g. Infiniband)

The H100 outperforms the A100 in part due to factors such as lower cache latency and FP8 compute.

The H100 is preferred because it's up to 3x more efficient at only (1.5-2x) the cost. Combined with overall system cost, the H100 yields much more performance per dollar (if you look at whole-system performance, perhaps 4-5x better performance per dollar). — Deep learning researcher
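Taking the researcher's numbers at face value, the GPU-only math looks like this (illustrative figures from the quote above, not measured benchmarks):

```python
# Performance-per-dollar implied by "3x more efficient at 1.5-2x the cost".
h100_speedup = 3.0                 # relative to A100
h100_cost_ratios = (1.5, 2.0)      # relative to A100
for cost in h100_cost_ratios:
    print(f"At {cost}x cost: ~{h100_speedup / cost:.1f}x perf per dollar")
# Fixed system costs (CPU, RAM, networking, hosting) don't scale with GPU price,
# which is how the system-level advantage stretches toward ~4-5x.
```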

What are the other costs of training and running an LLM? #

The GPU is the most expensive individual component, but there are other costs.

System RAM and NVMe SSDs are expensive.

InfiniBand networks are expensive.

Perhaps 10-15% of the total cost of running a cluster goes to power and hosting (electricity, data center building costs, land, staff) — roughly split between the two: it might be 5-8% for power and 5-10% for the other hosting cost elements (land, building, staff).

It's mostly networking and reliable data centers. AWS is difficult to use due to network limitations and unreliable hardware - Deep Learning Researcher
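A minimal sketch of how the 5-8% power share mentioned above might pencil out, using assumed numbers (roughly $370k for an 8-GPU HGX H100 server, ~10kW average draw, $0.10/kWh, 4-year amortization — none of these figures come from the sources quoted here):

```python
# Rough share of power in total per-server cost, under the stated assumptions.
server_capex = 370_000            # assumed 8-GPU HGX H100 server cost, USD
avg_power_kw = 10                 # assumed average draw incl. cooling share
usd_per_kwh = 0.10                # assumed electricity price
years = 4                         # assumed amortization period

power_cost = avg_power_kw * 24 * 365 * years * usd_per_kwh   # ~$35k
hosting_cost = power_cost          # section says hosting is roughly comparable
total = server_capex + power_cost + hosting_cost
print(f"Power share of total: ~{power_cost / total:.0%}")     # ~8% with these inputs
```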

What about GPUDirect? #

GPUDirect is not a critical requirement, but it can help.

I wouldn't say it's super critical, but it does have an impact on performance. I guess it depends on where your bottleneck is. For some architectures/software implementations, the bottleneck isn't necessarily the network, but if it is, GPUDirect can make a 10-20% difference, which is a decent number for an expensive training run. Having said that, GPUDirect RDMA is so ubiquitous these days that it almost goes without saying that it's supported. I think support for non-InfiniBand networking is less strong, but most GPU clusters optimized for neural network training have InfiniBand networks/cards. A bigger factor for performance might be NVLink, since it's rarer than InfiniBand, but it's only critical if you have a specific parallelization strategy. So features like strong networking and GPUDirect let you be less careful, and guarantee that naive software works out of the box. However, if you're concerned about cost or are using infrastructure you already have, it's not a strict requirement. – Deep learning researcher

What prevents LLM companies from using AMD GPUs? #

"In theory, a company could buy a bunch of AMD GPUs, but it takes time to get everything working. That development time (even just 2 months) could mean being later to market than the competition. So CUDA is Nvidia's moat right now." – Private cloud exec

"I suspect that 2 months is off by an order of magnitude, and the difference may not be meaningful — see Training LLMs with AMD MI250 GPUs and MosaicML." – Machine learning engineer

"Who would take the risk of deploying 10,000 AMD GPUs or 10,000 chips from some random startup's silicon? That's almost a $300 million bet." – Private cloud exec

"Re: MosaicML/MI250 — has anyone asked about AMD availability? AMD doesn't seem to have made much more than Frontier needed, and now TSMC's CoWoS capacity is being absorbed by Nvidia. The MI250 might be a viable alternative, but it isn't available." – Retired semiconductor industry professional

H100 vs. A100: How much faster is the H100 than the A100? #

16-bit inference: about 3.5x faster. 12 16-bit training: about 2.3x faster. 13

Here's some more reading for you: 1 2 3 .

Does everyone want to upgrade from A100s to H100s? #

Most people will want to buy the H100 and use it for training and inference, and switch their A100 to use it primarily for inference. However, some may be hesitant to switch due to the cost, capacity, risk of using and setting up new hardware, and the fact that their existing software is already optimized for the A100.

"Yes, the A100 will be what the V100 is today in a few years. I don't know of anyone training LLMs on V100s right now because of the performance constraints. But they're still used for inference and other workloads. Likewise, A100 pricing will come down as more AI companies move workloads to H100s, but there will always be demand, especially for inference." – Private cloud exec

"I think it's also plausible that some startups that raised huge sums end up going out of business, and then a lot of A100s come back onto the market." – (Various) private cloud execs

Over time, people will move, and the A100 will be used more for inferencing.

What about the V100? Higher VRAM cards are better suited for larger models, so cutting-edge groups prefer the H100 or A100.

The main reason not to use the V100 is the lack of the brainfloat16 (bfloat16, BF16) data type. Without it, it's hard to train models easily. The poor performance of OPT and BLOOM is mainly attributable to not having this data type (OPT was trained in float16, and BLOOM's prototyping was largely done in fp16, which produced data that didn't generalize to the training run done in bf16). — Deep learning researcher
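To see why the bfloat16 point matters, here's a tiny illustration of the dynamic-range difference (it assumes PyTorch is installed; it's not from the researcher quoted above):

```python
import torch

# float16 tops out around 65,504, so moderately large values overflow to inf,
# which is what makes fp16 training fragile without careful loss scaling.
x = torch.tensor(70000.0)
print(x.to(torch.float16))    # inf  (out of fp16 range)
print(x.to(torch.bfloat16))   # ~70144 (bf16 keeps float32's exponent range,
                              #  trading mantissa precision for range)
```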

What is the difference between H100s, GH200s, DGX GH200s, HGX H100s and DGX H100s? #

  • H100 = 1x H100 GPU

  • HGX H100 = Nvidia server reference platform for OEMs to build 4 GPU or 8 GPU servers. Built by 3rd party OEMs like Supermicro.

  • DGX H100 = Official Nvidia H100 server with 8 H100s. 14 Nvidia is the sole supplier.

  • GH200 = 1x H100 GPU plus 1x Grace CPU. 15

  • DGX GH200 = 256x GH200, 16 available toward the end of 2023. 17 Likely offered only by Nvidia.

There's also MGX for the big cloud companies.

Which of these will be the most popular? #

Most companies will buy the 8-GPU HGX H100, rather than DGX H100s 18 or 4-GPU HGX H100 servers.

How much do these GPUs cost? #

1x DGX H100 (SXM) with 8x H100 GPUs is $460k including the required support; $100k of the $460k is required support. The specifications are as follows. Startups can get the Inception discount of roughly $50k, usable on up to 8x DGX H100 boxes, for a total of 64 H100s.

1x HGX H100 (SXM) with 8x H100 GPUs costs between $300k-380k depending on specs (networking, storage, RAM, CPUs) and the margins and support level of whoever is selling it. The top end of that range, $360k-380k including support, is what you'd expect for specs identical to a DGX H100.

1x HGX H100 (PCIe) and 8x H100 GPUs are around $300k including support, depending on specs.

The market price for a PCIe card is around $30k-32k.

SXM cards aren't really sold as individual cards, so it's hard to give pricing there. Usually only sold as 4-GPU and 8-GPU servers.

About 70-80% of demand is for SXM H100s, and the rest is for PCIe H100s. Demand for the SXM share has been rising, since PCIe cards were the only ones available in the first few months. Given that most companies buy 8-GPU HGX H100s (SXM), the approximate spend is $360k-380k per 8 H100s, including other server components.
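Dividing those server prices out gives a rough per-GPU figure (simple arithmetic on the ranges above):

```python
# Effective per-GPU spend implied by the 8-GPU HGX H100 (SXM) server prices above.
sxm_server_cost = (360_000, 380_000)   # USD, incl. support and server components
gpus_per_server = 8
low, high = (c / gpus_per_server for c in sxm_server_cost)
print(f"SXM H100, all-in per GPU: ${low:,.0f}-${high:,.0f}")   # ~$45k-47.5k

pcie_card = (30_000, 32_000)           # market price for a single PCIe H100 card
print(f"PCIe H100 card alone:     ${pcie_card[0]:,}-${pcie_card[1]:,}")
```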

The DGX GH200 (as a reminder: 256x GH200s, each GH200 containing 1x H100 GPU and 1x Grace CPU) will likely cost in the range of $15mm-25mm — though this is a guess, not based on a pricing sheet. 19

How many GPUs are needed? #

  • GPT-4 may have been trained on between 10,000 and 25,000 A100s. 20

  • Meta has about 21,000 A100s, Tesla has about 7,000 A100s, and Stability AI has about 5,000 A100s. 21

  • Falcon-40B was trained on 384 A100s. 22

  • Inflection is using 3,500 H100s for its GPT-3.5-equivalent model. 23

By the way, we'll have 22k up and running by March, and run more than 5.<>k today. —Mustafa Suleyman, CEO, Inflection AI

According to Elon Musk, GPT-5 may require 30k-50k H100s. Morgan Stanley said in early 2023 that GPT-5 would use 25,000 GPUs and that training was already underway, but Sam Altman has since indicated that GPT-5 isn't being trained yet, so Morgan Stanley's information may be outdated.

GCP has about 25k H100, Azure maybe 10k-40k H100. It should be similar for Oracle. Most of Azure's capacity will be dedicated to OpenAI.

CoreWeave is at roughly 35k-40k H100s — not live, but based on bookings.

How many H100s do most startups order? #

For LLMs: for fine-tuning, tens or low hundreds. For training, thousands.

How many H100s might companies want? #

OpenAI might want 50k. Inflection wants 22k. 24 Meta maybe 25k (though I'm told Meta actually wants 100k or more). The big clouds might want 30k each (Azure, Google Cloud, AWS, and Oracle). Lambda, CoreWeave, and the other private clouds might want 100k in total. Anthropic, Helsing, Mistral, and Character might want 10k each. That's all approximation and guesswork, and some of it double-counts both the cloud and the end customers who will rent from the cloud. But it comes to roughly 432k H100s. At about $35k apiece, that's about $15b worth of GPUs. And it excludes Chinese companies like ByteDance (TikTok), Baidu, and Tencent, which want a lot of H800s.
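As a quick check on that last figure (pure arithmetic on the estimates above):

```python
# Back-of-the-envelope value of the estimated H100 demand.
total_h100s = 432_000        # rough total from the guesses in this section
usd_per_h100 = 35_000        # approximate all-in spend per GPU (see pricing section)
print(f"~${total_h100s * usd_per_h100 / 1e9:.0f}B worth of GPUs")   # ~$15B
```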

There are also a number of financial firms doing deployments starting with hundreds of A100s or H100s and scaling up to thousands of A/H100s: names like Jane Street, JP Morgan, Two Sigma, and Citadel.

How does this compare to Nvidia's data center revenue?

Nvidia's data center revenue was $4.28b for the quarter ending April 30, 2023. 25 Data center revenue for the quarter ending July 30, 2023 could be significantly higher, assuming most of the raised guidance for that quarter comes from data center growth rather than other segments.

So it may take a while for the supply shortage to clear. But all my demand estimates could also be overblown, and many of these companies won't buy H100s outright today — they'll upgrade over time. In addition, Nvidia is aggressively increasing production capacity.

It seems possible. 400k H100 doesn't sound out of reach, especially considering everyone is doing massive 4 or 5 figure H100 deployments these days. – Private Cloud Executive

Summary: H100 Requirements #

The main thing to keep in mind going into the next section is that most of the big CSPs (Azure, AWS, GCP, and Oracle) and private clouds (CoreWeave, Lambda, and various others) want more H100s than they can get access to. Most of the big AI product companies also want more H100s than they can get. Typically they want 8-GPU HGX H100 boxes with SXM cards, which run roughly $300k-400k per 8-GPU server depending on specs and support. There may be excess demand for hundreds of thousands of H100 GPUs ($15b+ worth of GPUs). With limited supply, Nvidia could raise prices purely to find a market-clearing price, and to some extent it is doing so. But it's important to know that ultimately the allocation of H100s comes down to whom Nvidia prefers to give that allocation to.

H100 GPU Supply #

Cause of Bottleneck - Supply

  1. What are the bottlenecks in production?

  2. Which components?

  3. Who produces them?

Who made the H100? #

TSMC.

Can Nvidia use other chip fabs for H100 production? #

Not really, at least not yet. They've worked with Samsung in the past, but for the H100 and their other 5nm GPUs they only use TSMC. That implies Samsung hasn't been able to meet their needs for leading-edge GPUs. They may work with Intel in the future, and with Samsung again, but neither will happen soon in a way that eases the H100 supply crunch.

How are the different TSMC nodes related? #

TSMC's 5nm family:

  • N5 26

    • 4N fits either as an enhanced version of N5, or as sitting below N5P

  • N5P

    • 4N fits either as an enhanced version of N5P, or as an enhanced version of N5 sitting below N5P

  • N4

  • N4P

On which TSMC node is the H100 manufactured? #

TSMC 4N. It's a special node for Nvidia in the 5nm family — an enhanced 5nm, not a true 4nm.

Who else uses this node? #

Mostly Apple, but they've largely moved to N3 and have reserved most of the initial N3 capacity. Qualcomm and AMD are the other big customers of the N5 family.

Which TSMC node does the A100 use? #

N7 27

How far in advance is fab capacity typically reserved? #

Not sure, maybe 12+ months though.

This applies to TSMC and their big clients — they plan it all out together, which is why TSMC/NVDA probably underestimated their needs – Anonymous

How long does production take (production, packaging, testing)? #

Roughly 6 months from the start of production of an H100 until an H100 is ready to be sold to a customer (mentioned in a conversation; hoping to confirm).

Where is the bottleneck? #

Wafer starts are not the bottleneck at TSMC. The CoWoS (3D stacking) packaging mentioned earlier is the gate at TSMC. – Retired semiconductor industry professional

H100 RAM #

What affects memory bandwidth on a GPU? #

Memory type, memory bus width, and memory clock speed.

Mainly HBM. Making it is a nightmare. Supplies are also mostly limited because HBM is hard to produce. Once you have the HBM, the design follows intuitively - Deep Learning Researcher
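As a rough illustration of how the factors above combine, the H100 SXM's headline bandwidth falls out of bus width times per-pin data rate (approximate public figures, used here only for illustration):

```python
# Approximate reconstruction of H100 SXM memory bandwidth from HBM3 parameters.
bus_width_bits = 5 * 1024        # five active HBM3 stacks, 1024-bit interface each
gbps_per_pin = 5.2               # approximate effective HBM3 data rate on H100 SXM
bandwidth_tbs = bus_width_bits * gbps_per_pin / 8 / 1000
print(f"~{bandwidth_tbs:.2f} TB/s")   # ~3.3 TB/s, near the published ~3.35 TB/s figure
```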

What memory is used on the H100s? #

On the H100 SXM, it's HBM3. 28 On H100 PCIe, it's actually HBM2e. 29

Who made the memory on the H100? #

Bus width and clock speed are designed by Nvidia as part of the GPU architecture.

For the HBM3 memory itself, I believe Nvidia uses all or mostly SK Hynix. I'm not sure whether Nvidia uses anything from Samsung for the H100, and I don't believe Micron supplies anything for the H100.

For HBM3, SK Hynix makes the most, Samsung is not far behind, and Micron is a distant third. It looks like SK Hynix is ramping production, but Nvidia still wants them to make more, and Samsung and Micron haven't managed to ramp up yet.

What else goes into making a GPU? #

Note that some of these parts are more bottlenecked than others.

  • Metal elements: These elements are essential in the production of GPUs. They include:

  • Copper: Used to create electrical connections due to its high conductivity.

  • Tantalum: Often used in capacitors because of its ability to hold a high charge.

  • Gold: Used for high-quality plating and connectors due to its corrosion resistance.

  • Aluminum: Often used in heat sinks to help dissipate heat.

  • Nickel: Often used as a coating for connectors due to its corrosion resistance.

  • Tin: Used to solder components together.

  • Indium: Used in thermal interface materials for its good thermal conductivity.

  • Palladium: Used in some types of capacitors and semiconductor devices.

  • Silicon (Metalloid): This is the main material used to make semiconductor devices.

  • Rare Earth Elements: These elements are used in various parts of the GPU due to their unique properties.

  • Other metals and chemicals: These are used in various stages of production, from the creation of the silicon wafers to the final assembly of the GPU.

  • Substrate: These are the materials on which the GPU components are mounted.

  • Encapsulating Materials: These are used to house and protect the GPU die.

  • Solder balls and bond wires: These are used to connect the GPU chip to the substrate and other components.

  • Passive Components: These include capacitors and resistors, which are critical to the operation of the GPU.

  • Printed Circuit Board (PCB): This is the circuit board on which all the components of the GPU are mounted. It provides electrical connection between components.

  • Thermally Conductive Compounds: These are used to improve heat transfer between the chip and the heatsink.

  • Semiconductor manufacturing equipment: including photolithography machines, etching equipment, ion implantation equipment, etc.

  • Clean room facilities: These are necessary for the production of GPUs to prevent contamination of silicon wafers and other components.

  • Testing and Quality Control Equipment: These are used to ensure that the GPU meets the required performance and reliability standards.

  • Software and Firmware: These are critical to controlling the operation of the GPU and interfacing with the rest of the computer system.

  • Packaging and Shipping Materials: These are necessary to deliver the final product to the customer in perfect condition.

  • Software Tools: Software tools for computer-aided design (CAD) and simulation are essential for designing the architecture and testing functionality of the GPU.

  • Energy consumption: Due to the use of high-precision machinery, the manufacturing process of GPU chips requires a lot of electricity.

  • Waste management: The production of GPUs generates waste that must be managed and disposed of properly, as many of the materials used can be harmful to the environment.

  • Testing capacity: custom/specialized testing equipment to verify functionality and performance.

  • Chip Packaging: The assembly of silicon wafers into component packages that can be used in larger systems.

Outlook and Forecast #

What is Nvidia talking about? #

Nvidia revealed that they have more supply in the second half of the year, but other than that, they didn't say much or quantify it.

"We're looking at supply for the quarter today, but we've also sourced a lot of supply for the second half of the year" "We believe our supply for the second half of the year will be significantly greater than h1" - Nvidia CFO Colette Kress in 2023 <> month to <> month earnings call

What's next? #

I think we may now have a self-reinforcing cycle where scarcity causes GPU capacity to be seen as a moat, which leads to more GPU hoarding, which exacerbates scarcity. – Private Cloud Executive

When will there be an H100 successor? #

Might not be announced until late 2024 (mid 2024 to early 2025), based on historical timing between Nvidia architectures.

Until then, the H100 will be Nvidia's top-of-the-line GPU. (The GH200 and DGX GH200 don't count; they're not pure GPUs, and both use the H100 as their GPU.)

Will there be a higher-memory H100? #

Possibly liquid-cooled 120GB H100s.

When will the shortage end? #

One group I spoke with mentioned that they're effectively sold out through the end of 2023.

Purchasing H100 #

Who sells H100? #

OEMs such as Dell, HPE, Lenovo, Supermicro and Quanta sell the H100 and HGX H100. 30

When you need InfiniBand, you need to talk directly to Nvidia's Mellanox. 31

So GPU clouds like CoreWeave and Lambda buy from OEMs and lease them to startups.

Hyperscalers (Azure, GCP, AWS, Oracle) work more directly with Nvidia, but they often work with OEMs as well.

Even for DGX, you'll still be buying through OEM. You can talk to Nvidia, but you'll be buying through OEM. You do not place an order directly with Nvidia.

How about the delivery time? #

The lead time on the 8-GPU HGX server is terrible, while the lead time on the 4-GPU HGX server is good. Everyone wants an 8-GPU server!

If a startup places an order today, when will they get SSH access? #

It will be a staggered deployment. Say it's an order for 5,000 GPUs — they might get access to 2,000 or 4,000 of them within 4 months or so, and then the rest within about 6 months total.

Do startups buy from OEMs and resellers? #

No. Startups typically go to big clouds like Oracle to lease access, to private clouds like Lambda and CoreWeave, or to providers that work with OEMs and data centers, such as FluidStack.

When Do Startups Build Their Own Data Centers vs Hosting? #

When it comes to building a data center, the considerations are how long it will take to build, whether you have the people and experience with the hardware, and the fact that it's capex-expensive.

Easier to rent and colo servers. If you want to build your own DC, you have to run a dark fiber line at your location to connect to the internet - $100,000 per kilometer. Much of the infrastructure has already been built and paid for during the dot-com boom. Now you can rent it, pretty cheap - Private Cloud Executive

The spectrum from leasing to owning is: on-demand cloud (pure leasing using cloud services), reserved cloud, colo (buying servers, working with a provider to host and manage them), self-hosting (buying and hosting servers yourself).

Most startups that need a lot of H100 will do a reserved cloud or colo.

How do the big clouds compare? #

Oracle infrastructure is believed to be less reliable than the big three clouds. In exchange, Oracle will provide additional technical support assistance and time.

"100%. A whole bunch of unhappy customers lol" – Private cloud exec

"I think [Oracle] has a better network" – (Different) private cloud exec

Generally, startups choose whoever offers the best combination of support, price, and capacity.

The main differences between the big clouds are:

  • Networking (AWS and Google Cloud have been slower to adopt InfiniBand as they have their own approaches, although most startups looking for large A100/H100 clusters are looking for InfiniBand)

  • Availability (Azure's H100s are largely going to OpenAI; GCP is struggling to get H100s)

Nvidia seems inclined to give better allocations to clouds that aren't building competing machine learning chips. (This is all speculation, not hard truth. All three big clouds are developing machine learning chips, but AWS's and Google's Nvidia alternatives are already on the market and may already be costing Nvidia dollars.)

Also guesswork, but I agree that Nvidia likes Oracle for this reason – Private Cloud Exec

Some big clouds have better pricing than others. As one private cloud exec noted, "For example, the A100 is much more expensive on AWS/Azure than on GCP."

Oracle told me they have "tens of thousands of H100s" coming online later this year. They boasted about their special relationship with Nvidia. But... on pricing, they were much higher than everyone else. They didn't give me H100 pricing, but for the A100 80GB they quoted me close to $4/hour, which is almost 2x more than GCP quoted for the same hardware and the same commitment. – Anonymous

The smaller clouds are better priced, except in some cases where one of the big clouds does a weird deal in exchange for equity.

The relationships probably rank something like: Oracle and Azure > GCP and AWS. But that's just guesswork.

Oracle was the first to launch A100s, and they partner with Nvidia to host an Nvidia-based cluster. Nvidia is also an Azure customer.

Which big cloud has the best network? #

Azure, CoreWeave, and Lambda all use InfiniBand. Oracle has a good network, it's 3200 Gbps, but it's Ethernet and not InfiniBand, which can be around 15-20% slower than IB for use cases like high parameter count LLM training. The networking of AWS and GCP is not so good.

Which big clouds are enterprises using? #

In a private data point of about 15 enterprises, all 15 are AWS, GCP or Azure, zero Oracle.

Most enterprises will stick with their existing cloud. Desperate startups will go wherever the supply is.

What about DGX Cloud, and who is Nvidia working with? #

"NVIDIA is working with leading cloud service providers to host DGX cloud infrastructure, starting with Oracle Cloud Infrastructure (OCI)" - You handle Nvidia's sales, but you lease it through your existing cloud provider (start with Oracle , then Azure, then Google Cloud instead of launching with AWS) 32 33

"The ideal mix is ​​10% Nvidia DGX cloud and 90% CSP cloud," Jensen said on the last earnings call.

When did the big clouds launch their H100 previews? #

CoreWeave was the first. 34 Nvidia gave them an earlier allocation, presumably to help increase competition among the big clouds (since Nvidia is an investor).

Azure announced on March 13 that the H100 was available for preview. 35

Oracle announced on March 21 that the H100 was available in limited quantities. 36

Lambda Labs announced on March 21 that H100s would be added in early April. 37

AWS announced on March 21 that H100s would be available in preview in a few weeks. 38

Google Cloud announced on May 10 that it had begun a private preview of the H100. 39

Which companies use which clouds? #

  • OpenAI: Azure.

  • Inflection: Azure and CoreWeave.

  • Anthropic: AWS and Google Cloud.

  • Cohere: AWS and Google Cloud.

  • Hugging Face: AWS.

  • Stability AI: CoreWeave and AWS.

  • Character.ai: Google Cloud.

  • X.ai: Oracle.

  • Nvidia: Azure. 35

How can a company or cloud service provider get more GPUs? #

The ultimate bottleneck is getting allocations from Nvidia.

How does Nvidia allocation work? #

They have a quota allocated to each customer. But, for example, Azure saying "hey, we'd like 10,000 H100s for Inflection to use" is different from Azure saying "hey, we'd like 10,000 H100s for Azure's cloud" — Nvidia cares who the end customer is. So if Nvidia is excited about a particular end customer, the cloud may be able to get additional allocation for that customer. Nvidia also wants to know as much as possible about who the end customer is, and they prefer customers with strong brands or startups with strong pedigrees.

Yes, that seems to be the case. NVIDIA likes to guarantee GPU access to emerging AI companies (many of which have close ties to them). See Inflection, an AI company they invested in, testing a huge H100 cluster on CoreWeave, which they also invested in. – Private cloud exec

If a cloud brings Nvidia an end customer and says they're ready to buy xxxx H100s, and Nvidia is excited about that end customer, Nvidia will generally give an allocation for it, which effectively increases the total capacity Nvidia allocates to that cloud — because it doesn't count against the cloud's original allocation.

This is a unique situation because Nvidia is providing a large allocation for private clouds: CoreWeave has more H100s than GCP.

Nvidia is reluctant to give large allocations to companies trying to compete with it directly (AWS Inferentia and Trainium, Google TPUs, Azure Project Athena).

But in the end, if you put purchase orders and money in front of Nvidia, promise bigger deals and more money, and show that you have a low-risk profile, you're going to get more allocations than everyone else.

Epilogue #

For now, we're GPU-supply-bound — even though we're at what Sam Altman calls "the end of the era" of these giant models.

It looks like a bubble or not, depending on where you look. Some companies, like OpenAI, have products with real product-market fit (like ChatGPT) and can't get enough GPUs. Other companies are buying or reserving GPU capacity to have access in the future, or to train LLMs that are unlikely to find product-market fit.

Nvidia is now the green king of the castle.

Tracking the Journey of GPU Supply and Demand #

The LLM product with the strongest product-market fit is ChatGPT. Here is the story of GPU requirements related to ChatGPT:

  1. Users love ChatGPT. It could generate recurring revenue of $500mm++ per year.

  2. ChatGPT runs on GPT-4 and GPT-3.5 API.

  3. The GPT-4 and GPT-3.5 APIs require GPUs to run. Lots of them. OpenAI would like to release more features for ChatGPT and its APIs, but they can't, because they don't have access to enough GPUs.

  4. They buy a lot of Nvidia GPUs through Microsoft/Azure. Specifically, their most wanted GPU is the Nvidia H100 GPU.

  5. To manufacture the H100 SXM GPUs, Nvidia uses TSMC for fabrication, TSMC's CoWoS packaging technology, and HBM3 mainly from SK Hynix.

OpenAI isn't the only company wanting a GPU (but they're the one with the strongest product-market fit). Other companies also want to train large AI models. Some of these use cases make sense, but some are more hype-driven and less likely to result in product-market fit. This drives up demand. Also, some companies are concerned about not having access to GPUs in the future, so they place orders now even if they don't need them yet. So "the expectation of supply shortages creating more supply shortages" is happening.

Another major contributor to GPU demand comes from companies wanting to create new LLMs. Here's a story about the need for GPUs from companies that want to build new LLMs:

  1. Company executives or founders know that there are big opportunities in the field of artificial intelligence. Maybe they're a business that wants to train an LLM on their own data and use it externally or sell access, or maybe they're a startup that wants to build an LLM and sell access.

  2. They knew they needed GPUs to train large models.

  3. They talked to some guys from the big clouds (Azure, Google Cloud, AWS) trying to get many H100's.

  4. They found that they couldn't get a lot of allocation from the big clouds, and some of the big clouds didn't have good network setups. So they talked to other providers like CoreWeave, Oracle, Lambda, FluidStack. If they want to buy the GPUs themselves and own them, maybe they talk to OEMs and Nvidia too.

  5. In the end, they got a lot of GPUs.

  6. Now, they try to achieve product market fit.

  7. If it wasn't obvious, this path isn't as good - remember, OpenAI achieved product market fit on smaller models, then scaled them up. However, to get product market fit now, you have to fit the user's use case better than OpenAI's model, so first of all, you're going to need more GPUs than OpenAI started with.

H100s are expected to be in short supply for deployments of hundreds or thousands of units through at least the end of 2023. The picture will be clearer by the end of 2023, but for now it seems likely that shortages will persist into sometime in 2024 as well.

A tour of GPU supply and demand. big version

Get in touch #

Author: Clay Pascal. Questions and notes can be emailed.

New posts: get notified of new posts by email.

Help: look here.

The natural next question - what about Nvidia alternatives? #

The natural next question is "well, what about competition and alternatives?" I'm exploring both hardware alternatives and software approaches. Submit things I should explore as alternatives via this form. For example, TPUs, Inferentia, and LLM ASICs on the hardware side, and Mojo, Triton, and others on the software side, plus what using AMD hardware and software looks like. I'm exploring everything, though focusing on what's usable today. If you're a freelancer and want to help get Llama 2 running on different hardware, please email me. So far we've gotten runs going on AMD, Gaudi, TPUs, and Inferentia, with help from people at AWS Silicon, Rain, Groq, Cerebras, and others.

Acknowledgements #

This article contains a substantial amount of proprietary and previously unpublished information. When you see people wondering about GPU production capacity, please point them to this article.

Thanks to a handful of executives and founders of private GPU cloud companies, some AI founders, ML engineers, deep learning researchers, some other industry experts, and some non-industry readers who provided helpful comments. Thanks to Hamid for the illustration.
