Lao Huang wins! Nvidia H100 orders are booked into 2024, and even Musk can't sit still...


Mengchen and Kelexi, from Aofeisi
Reprinted from: Qubit (QbitAI)

The Nvidia H100, the GPU of choice for training large models, is completely sold out!

Order one now, and it won't be delivered until Q1, or even Q2, of 2024.

That's the latest news revealed to the Wall Street Journal by CoreWeave, a cloud vendor closely tied to Nvidia.

Supply has been extremely tight since early April. In just one week , expected delivery times jumped from reasonable levels to the end of the year .


Amazon AWS, the world's largest cloud vendor, also confirmed the news. CEO Adam Selipsky recently said:

A100 and H100 are state of the art... hard to get even for AWS .

Earlier, Musk also quipped on a talk show that GPUs are now considerably harder to get than drugs.


Buy through a scalper, and the premium runs as high as 25%.

On eBay, for example, prices have climbed from the roughly $36,000 list price to $45,000, and supply is still scarce.


Against this backdrop, major Chinese tech companies such as Baidu, ByteDance, Alibaba, and Tencent have placed orders with Nvidia for the A800 and other chips totaling US$5 billion.

Of that, only US$1 billion worth can be delivered this year; the other 80% will have to wait until 2024.

So who is getting the high-end GPUs that do exist? And where exactly is production capacity stuck?

Who gets the H100? Lao Huang has the final say

Since ChatGPT took off, Nvidia's A100 and H100, both well suited to training large models, have been in hot demand.

H100s can even serve as collateral: startups have pledged the cards to investment funds to secure loans.

Demand is enormous across the board: AI companies such as OpenAI and Meta, cloud computing companies such as Amazon and Microsoft, private clouds CoreWeave and Lambda, and every technology company that wants to train its own large model.

But who actually gets to buy is largely decided by Nvidia CEO Jensen Huang, a.k.a. Lao Huang, himself.


According to The Information, the H100 is so scarce that Nvidia has allocated large numbers of new cards to CoreWeave while limiting supply to established cloud companies such as Amazon and Microsoft.

(Nvidia has also invested directly in CoreWeave.)

Outside analysts attribute this to the fact that these established players are all developing their own AI accelerator chips in hopes of reducing their dependence on Nvidia, so Lao Huang is in no hurry to help them.

Inside Nvidia, Lao Huang also keeps a grip on every aspect of day-to-day operations, down to "reviewing what sales reps are going to say to small potential customers."

About 40 executives report directly to him, more than the direct reports of Meta's Zuckerberg and Microsoft's Nadella combined.

A former Nvidia manager put it this way: "At Nvidia, Jensen Huang is effectively the chief product officer of every product."


A while back, Lao Huang reportedly went a step further: he asked some small cloud computing companies for their customer lists, wanting to know who the GPUs' end users actually were.

Analysts say the move helps Nvidia better understand customer demand for its products, but it has also raised concerns that Nvidia could use the information to its own advantage.

Others suspect a second motive: Lao Huang wants to know who is actually using the cards and who is merely hoarding them unused.


Why do Nvidia and Lao Huang hold so much sway right now?

The main reason is the extreme imbalance between supply and demand for high-end GPUs. By the GPU Utils website's reckoning, the H100 shortfall runs as high as 430,000 units.

Its author, Clay Pascal, estimated the near-term H100 needs of the AI industry's various players from known information and rumors.

For AI companies:

  • OpenAI may need 50,000 H100s to train GPT-5

  • Meta is said to need 100,000

  • Inflection AI has announced a 22,000-card compute cluster

  • Major AI startups such as Anthropic, Character.ai, Mistral AI, and Europe's Helsing each need on the order of 10,000

For cloud computing companies:

  • Among the big public clouds, Amazon, Microsoft, Google, and Oracle are figured at 30,000 each, 120,000 in total

  • Private clouds such as CoreWeave and Lambda need about 100,000 combined

It adds up to 432,000.
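For anyone who wants to check the arithmetic, here is the tally above as a minimal Python sketch (the per-player figures are the article's estimates, not official numbers):

```python
# Estimated near-term H100 demand, per the figures quoted above.
demand = {
    "OpenAI (GPT-5 training)": 50_000,
    "Meta": 100_000,
    "Inflection AI cluster": 22_000,
    "Anthropic, Character.ai, Mistral AI, Helsing (~10k each)": 4 * 10_000,
    "Public clouds: Amazon, Microsoft, Google, Oracle (30k each)": 4 * 30_000,
    "Private clouds: CoreWeave, Lambda, etc.": 100_000,
}
print(f"Total: {sum(demand.values()):,}")  # Total: 432,000
```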

And that's before counting financial firms and other industry players, such as JPMorgan Chase and Two Sigma, that have also begun deploying their own compute clusters.

So the question is: with a gap this large, why not simply make more?

Lao Huang would love to, but production capacity is stuck.

Where is the production capacity stuck this time?

In fact, TSMC has already adjusted its production plan for Nvidia.

But even that hasn't been enough to fill such a huge gap.

Charlie Boyle, vice president and general manager of Nvidia's DGX systems, said the bottleneck this time is not wafer supply but the production capacity of TSMC's CoWoS packaging technology.

And competing with Nvidia for TSMC's capacity is Apple, which needs A17 chips for the next-generation iPhone ahead of its September event.

TSMC recently said it expects the packaging backlog to take about a year and a half to return to normal.

CoWoS is one of TSMC's signature technologies, and it is a key reason TSMC beat out Samsung to become Apple's exclusive chip foundry.

Parts packaged with it offer high performance and strong reliability, which is how the H100 can achieve a memory bandwidth of 3TB/s (or even higher).


CoWoS stands for Chip-on-Wafer-on-Substrate, a wafer-level chip integration technology.

It can package multiple dies onto a silicon interposer only 100 μm thick.

According to reports, the next-generation interposer will reach six times the reticle limit in area, about 5,000 mm².
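As a rough sanity check on that figure (assuming the standard single-exposure reticle limit of about 26 mm × 33 mm, which the article does not state):

```python
# Standard lithography reticle limit (an assumption, not from the article).
reticle_mm2 = 26 * 33              # ~858 mm^2 per exposure
interposer_mm2 = 6 * reticle_mm2   # six reticles' worth of interposer area
print(interposer_mm2)              # 5148 -> consistent with "about 5,000 mm^2"
```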

So far, apart from TSMC, no manufacturer has this level of packaging capability.


CoWoS is certainly powerful, but is it indispensable? Could another manufacturer step in?

Set aside the fact that Lao Huang has already said "we will not consider adding a second H100 foundry."

In practice, it may simply not be possible.

Nvidia has worked with Samsung before, but Samsung has never produced H100-series parts for it, nor indeed any 5nm-class chips for Nvidia at all.

This has led some to speculate that Samsung's process may not yet meet Nvidia's requirements for cutting-edge GPUs.

As for Intel... its 5nm-class products don't seem to have arrived yet.


If changing foundries isn't feasible for Lao Huang, what about users switching straight to AMD?

AMD, yes?

In terms of performance alone, AMD is indeed slowly catching up.

AMD's latest MI300X packs 192GB of HBM3 memory with 5.2TB/s of bandwidth, and can run models of up to 80 billion parameters.

By comparison, the GH200 that Nvidia just announced carries 141GB of HBM3e memory and 5TB/s of bandwidth.
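A rough back-of-the-envelope check on that 80-billion-parameter claim (assuming half-precision weights at 2 bytes per parameter; the accounting is ours, not AMD's):

```python
# Memory needed just to hold the weights of an 80B-parameter model.
params = 80e9
bytes_per_param = 2                 # fp16/bf16 weights
weights_gb = params * bytes_per_param / 1e9
print(weights_gb)                   # 160.0 GB
# 160GB fits in the MI300X's 192GB with room left for activations and
# KV cache -- but not in a single 80GB H100.
```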

But that doesn't mean AMD can immediately fill the gap left by Nvidia cards:

Nvidia's real "moat" lies in the CUDA platform.


CUDA has built up a complete development ecosystem, which means that users who buy AMD hardware face a longer bring-up and debugging cycle.
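As a minimal sketch of what that lock-in looks like in practice (assuming PyTorch; note that ROCm builds of PyTorch reuse the torch.cuda namespace, so high-level code often ports cleanly, and the debugging time goes into everything underneath):

```python
import torch

# High-level framework code is largely portable: on ROCm builds of
# PyTorch, AMD GPUs are also exposed through the torch.cuda API.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(1024, 1024, device=device)
y = x @ x  # runs on Nvidia (CUDA) or AMD (ROCm/HIP) alike

# The moat is below this level: hand-written CUDA kernels, libraries
# tuned for Nvidia hardware, profilers, and years of debugged tooling
# that all have to be re-validated on a new platform.
```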

One private-cloud executive said no one would dare gamble $300 million on an experimental deployment of 10,000 AMD GPUs.

He believes the development and debugging cycle could take at least two months.

With AI products being replaced this quickly, a two-month gap could be fatal for any vendor.


However, Microsoft extended an olive branch to AMD.

Previously, there were rumors that Microsoft was preparing to jointly develop an AI chip code-named "Athena" with AMD.

Earlier, when the MI200 launched, Microsoft was the first to announce a purchase and deployed it on its Azure cloud platform.

For example, MSRA's new large-model architecture RetNet was recently trained on 512 AMD MI200s.


With Nvidia holding nearly the entire AI market, someone has to lead the charge: a full-scale AMD compute cluster needs to be proven out before anyone else dares to follow.

For now, though, Nvidia's H100 and A100 remain the mainstream choices.

One More Thing

When Apple released the new M2 Ultra chip with support for up to 192GB of memory a while back, many practitioners were eager to use it to fine-tune large models.

After all, Apple's M-series chips use unified memory: 192GB of RAM is also 192GB of graphics memory, 2.4 times an 80GB H100 and 8 times a 24GB RTX 4090.
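A minimal sketch of how such experiments typically target Apple silicon from PyTorch, via the MPS backend (our illustration; the article doesn't specify the tooling):

```python
import torch

# On Apple silicon, PyTorch exposes the GPU through the MPS backend,
# and tensors can draw on the full pool of unified memory rather than
# a discrete GPU's fixed VRAM.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
weights = torch.randn(8192, 8192, device=device, dtype=torch.float16)
print(weights.device)  # mps
```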


But once people actually bought the machine and benchmarked it, training speed turned out to be worse than an Nvidia RTX 3080 Ti: fine-tuning isn't cost-effective, let alone full training.

After all, the M-series chips' compute isn't specifically optimized for AI workloads, and a large memory pool alone doesn't help.

It seems training large models still mainly depends on the H100, and the H100 remains hard to come by.

Faced with all this, a magical "GPU Song" has even been making the rounds online.

It's a serious earworm; click with caution.

The GPU Song
https://www.youtube.com/watch?v=YGpnXANXGUg

Reference links:
[1] https://www.barrons.com/articles/nvidia-ai-chips-coreweave-cloud-6db44825
[2] https://www.ft.com/content/9dfee156-4870-4ca4-b67d-bb5a285d855c
[3] https://www.theinformation.com/articles/in-an-unusual-move-nvidia-wants-to-know-its-customers-customers
[4] https://www.theinformation.com/articles/ceo-jensen-huang-runs-nvidia-with-a-strong-hand
[5] https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/#which-gpus-do-people-need
[6] https://3dfabric.tsmc.com/english/dedicatedFoundry/technology/cowos.htm
[7] https://developer.nvidia.com/blog/cuda-10-features-revealed/
[8] https://www.theverge.com/2023/5/5/23712242/microsoft-amd-ai-processor-chip-nvidia-gpu-athena-mi300
[9] https://www.amd.com/en/press-releases/2022-05-26-amd-instinct-mi200-adopted-for-large-scale-ai-training-microsoft-azure

 
  
