Devouring the world's computing power! Google Gemini revealed to have five times GPT-4's compute, with TPUs as the trump card to crush OpenAI

Source | Xinzhiyuan

Today, the well-known SemiAnalysis analysts Dylan Patel and Daniel Nishball have once again broken open an industry inside story.

And the entire AI community was once again shocked by the news: compared with Google, OpenAI's computing power looks like child's play.

Google's next-generation large model Gemini has 5 times the computing power used to train GPT-4!


According to Patel and Nishball, Google Gemini, repeatedly rumored to be the GPT-4 killer, has begun training on the new TPUv5 pods, with total compute of roughly 1e26 FLOPS, about five times the compute used to train GPT-4.

With TPUv5, Google has become the king of computing power: it holds more TPUv5 chips than OpenAI, Meta, CoreWeave, Oracle, and Amazon own GPUs combined!

Although TPUv5 does not match Nvidia's H100 in single-chip performance, Google's most formidable advantage is its efficient, enormous infrastructure.

Unexpectedly, the revelation drew a comment from Sam Altman himself: "It's unbelievable that Google let that SemiAnalysis guy publish their internal marketing/recruiting charts, it's so funny."

Some netizens noted that this is only a commentary piece, not actual news, but pure speculation.

However, both of the earlier pieces Dylan Patel contributed to were confirmed without exception, and each caused an uproar in the industry. One was the leak of Google's internal document ("We have no moat, and neither does OpenAI"):

Google DeepMind CEO Demis Hassabis confirmed the authenticity of the leaked document in an interview

The other was the major leak of inside information such as GPT-4's architecture and parameters.

Let's take a closer look at how much heavyweight inside information this new exposé brings.

The sleeping giant Google has woken up

Noam Shazeer, one of the authors of the pioneering Transformer paper "Attention is all you need" and a key contributor to LaMDA and PaLM, wrote an article inspired by the MEENA model.

In that article, he accurately predicted the changes that the birth of ChatGPT would bring to the world: LLMs would become ever more integrated into our lives and devour global computing power.

The article was well ahead of its time, but it was ignored by decision-makers at Google.

Paper address:
https://arxiv.org/pdf/2001.09977.pdf

Now Google holds all the keys to the kingdom of computing power. The sleeping giant has woken up, and its pace of iteration is unstoppable. By the end of 2023, Google's computing power will reach five times the pre-training FLOPS of GPT-4.

And given Google's current infrastructure buildout, that number may soar to 100 times by the end of next year.

Will Google stay this course without neutering its creativity or changing its existing business model? No one knows yet.

"GPU rich" and "GPU poor"

Right now, companies holding Nvidia GPUs hold what amounts to the hardest currency.

Giants and star startups such as OpenAI, Google, Anthropic, Inflection, X, and Meta have more than 200,000 A100/H100 chips on hand, so each researcher is allocated ample computing resources on average.

Individual researchers there can get roughly 100 to 1,000 GPUs for the smaller projects on their plate.

CoreWeave has even mortgaged its Nvidia H100s to buy more GPUs.

By the end of 2024, its total GPU count may reach 100,000.

In Silicon Valley right now, the favorite boast of top machine-learning researchers is how many GPUs they have or will soon have.

Over the past four months the trend has only intensified, to the point that the competition is now out in the open: top researchers go wherever the GPUs are.

Meta has even used "having the second-most H100 GPUs in the world" as a recruiting pitch.

Meanwhile, countless small startups and open source researchers are struggling with a shortage of GPUs.

Lacking GPUs with enough VRAM, they pour large amounts of time and energy into work that ultimately does not matter.

They can only fine-tune small models on the outputs of larger models for leaderboard-style benchmarks, and the evaluation methods behind those leaderboards are fragmented, emphasizing style over accuracy and usefulness.

They also fail to recognize that small open-source models only improve on real workloads when pre-training datasets and IFT (instruction fine-tuning) data grow larger and higher quality.

"Who will get how much H100 and when will get H100 are the top gossip in Silicon Valley now." OpenAI co-founder Andrej Karpathy once said with emotion

Indeed, efficient use of GPUs matters enormously, and many of the GPU-poor ignore it: they do not care about efficiency at scale, and they do not use their own time effectively.

By next year, the world will be flooded with 3.5 million H100s, and the GPU-poor will be cut off from commercialization entirely; they will only be able to learn and run experiments on the gaming GPUs they have at hand.

Most of the GPU-poor are still using dense models, because that is what Meta's Llama series provides.

Without Zuckerberg's generosity, most open-source projects would be even worse off.

If they really cared about efficiency, especially client-side efficiency, they would choose a sparse model architecture such as MoE, train on larger datasets, and, like the frontier LLM labs (OpenAI, Anthropic, Google DeepMind), use speculative decoding.
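
As a concrete illustration of the last technique, here is a minimal sketch of the greedy variant of speculative decoding: a cheap draft model proposes a few tokens and the expensive target model verifies them, keeping the longest agreeing prefix. The "models" below are stand-in callables, not any lab's actual implementation, and the full algorithm in the literature uses rejection sampling rather than exact-match verification.

```python
from typing import Callable, List

Token = int
NextTokenFn = Callable[[List[Token]], Token]  # greedy next-token prediction

def speculative_decode_greedy(
    target_next: NextTokenFn,   # expensive, high-quality model
    draft_next: NextTokenFn,    # cheap, fast model
    prompt: List[Token],
    max_new_tokens: int = 32,
    k: int = 4,                 # tokens drafted per verification step
) -> List[Token]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        # 1) Draft k tokens autoregressively with the cheap model.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify with the target model. In a real system this is one batched
        #    forward pass over all k positions; we loop here for clarity.
        accepted, correction = 0, None
        for i, t in enumerate(draft):
            target_t = target_next(out + draft[:i])
            if target_t == t:
                accepted += 1
            else:
                correction = target_t
                break
        out.extend(draft[:accepted])
        # Either take the target's correction, or (if everything was accepted)
        # append the target's next prediction, which a batched pass yields anyway.
        out.append(correction if correction is not None else target_next(out))
    return out[: len(prompt) + max_new_tokens]

if __name__ == "__main__":
    # Toy demo: both "models" predict (last token + 1) % 100, so every draft is
    # accepted and decoding advances k + 1 tokens per verification step.
    toy: NextTokenFn = lambda ctx: (ctx[-1] + 1) % 100
    print(speculative_decode_greedy(toy, toy, prompt=[0], max_new_tokens=10, k=4))
```

In a real batched implementation, when draft and target agree often, the expensive model runs roughly once per k + 1 generated tokens instead of once per token, which is exactly the memory-bandwidth saving that makes the technique attractive.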

The chart assumes inefficiencies from being unable to fuse every operation, from the memory bandwidth the attention mechanism requires, and from hardware overhead equivalent to the parameter reads. In practice, even with an optimized library such as Nvidia's FasterTransformer, the total overhead is larger still.

Compute-disadvantaged companies should focus on techniques that improve model quality and token-to-token latency by trading higher compute and memory-capacity requirements for lower memory-bandwidth requirements, which is exactly what edge inference needs.
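
To see why memory bandwidth is the binding constraint, here is a back-of-envelope calculation with assumed, illustrative numbers (a 70B-parameter dense model in 16-bit weights and roughly H100-class HBM bandwidth): at batch size 1, every generated token has to stream all of the weights from memory at least once.

```python
# Back-of-envelope: batch-1 decoding is memory-bandwidth-bound, not FLOPS-bound.
# All numbers below are illustrative assumptions, not measurements.
params = 70e9             # assumed dense model size (parameters)
bytes_per_param = 2       # fp16 / bf16 weights
hbm_bandwidth = 3.35e12   # bytes/s, roughly an H100 SXM's HBM3 bandwidth

bytes_per_token = params * bytes_per_param          # ~140 GB of weight reads per token
ceiling_tokens_per_s = hbm_bandwidth / bytes_per_token

print(f"weight traffic per token: {bytes_per_token / 1e9:.0f} GB")
print(f"bandwidth-limited ceiling: {ceiling_tokens_per_s:.1f} tokens/s per GPU")
# KV-cache and attention traffic only make this worse. Sparse (MoE) models,
# larger batches, and speculative decoding all reduce bytes moved per useful token.
```

The ceiling works out to a couple of dozen tokens per second per GPU regardless of how much raw FLOPS the chip has, which is why the techniques above target bytes moved rather than arithmetic.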

They should also focus on efficiently serving multiple fine-tuned models on shared infrastructure, without paying the terrible cost penalty of serving each model at a small batch size.

The reality, however, is the opposite: they obsess over memory-capacity constraints and over-quantization while turning a blind eye to the real degradation in model quality.

In general, the current large-model leaderboards are a complete mess.

Although there are still people working hard to improve them, these open benchmarks remain largely meaningless.

For some reason, people have a morbid obsession with LLM leaderboards and a bunch of stupid names for useless models like Platypus and the like.

Going forward, I hope open-source work shifts toward evaluation, speculative decoding, MoE, open IFT data, and cleaning pre-training datasets of more than 10 trillion tokens. Otherwise, the open-source community simply will not be able to compete with the commercial giants.

The world map of the large-model battle is already clear: the US and China will continue to lead, while Europe has fallen significantly behind due to a lack of large investment and a GPU shortage; even the government-backed Jules Verne supercomputer will not help. Many Middle Eastern countries are also ramping up investment in large-scale AI infrastructure.

Of course, it's not just a few small start-ups that lack GPUs.

Even the most well-known AI companies like HuggingFace, Databricks (MosaicML), and Together still belong to the "GPU poor group".

In fact, measured by GPUs per world-class researcher, or GPUs per potential customer, they may be the most GPU-starved group in the world.

Despite having world-class researchers, they can only work on systems that are orders of magnitude less capable.

Although they have raised plenty of financing and bought thousands of H100s, it is not nearly enough for them to grab a majority of the market.

All of their computing power is bought from competitors

Counting its various in-house supercomputers, Nvidia itself holds several times more GPUs than any of them.

Nvidia's DGX Cloud offers pre-trained models, data-processing frameworks, vector databases and personalization, optimized inference engines, APIs, and support from Nvidia experts, helping enterprises customize models for their use cases.

The service has already attracted several large players from verticals such as SaaS, insurance, manufacturing, pharmaceuticals, productivity software, and automotive.

Even without counting undisclosed partners, the publicly announced list alone (Amgen, Adobe, CCC, ServiceNow, Accenture, AstraZeneca, Getty Images, Shutterstock, Morningstar, Evozyne, Insilico Medicine, Quantiphi, InstaDeep, Oxford Nanopore, Peptone, Relation Therapeutics, ALCHEMAB Therapeutics, and Runway) is far longer than any competitor's, and shocking enough on its own.

Judging by cloud spending and the scale of its in-house supercomputer buildout, companies appear to be buying more from Nvidia than from HuggingFace, Together, and Databricks combined.

As one of the most influential companies in the industry, HuggingFace needs to leverage that position to raise huge amounts of capital and build out more models, customization, and inference capability. But in its most recent funding round, the high valuation kept it from raising the amount it needs.

Databricks could, at best, catch up on the strength of its data and enterprise relationships. The problem is that serving its more than 7,000 customers would require multiplying its spending many times over.

Unfortunately, Databricks cannot buy GPUs with stock. It will need to raise massive amounts of cash through its upcoming private placement or IPO and use that money to double down on hardware.

It is a bit odd economically: they have to build before they can win customers, even as Nvidia also spends heavily on services of its own. But that is the price of admission to this race.

The key point is that Databricks, HuggingFace, and Together are significantly behind their main competitors, who also happen to be the source of nearly all of their computing resources.

In other words, virtually everyone, from Meta to Microsoft to startups, is simply filling Nvidia's bank account.

So, can someone save us from Nvidia slavery?

Yes, there is a potential savior — Google.

Google tops the computing-power rankings; OpenAI has less than half

While Google also uses GPUs internally, it holds other trump cards as well.

Chief among them: Gemini, Google's next-generation large model, and the next iteration already in training are both blessed with Google's unmatched and efficient infrastructure.

As early as 2006, Google floated the idea of building AI-specific infrastructure, and in 2013 the plan came to a head.

They realized that deploying AI at scale would require doubling the number of their data centers.

So Google set out to build a TPU chip that could reach production within three years.

The best-known effort of this kind, Amazon's Nitro program, also launched in 2013, focused on chips that optimize general-purpose CPU compute and storage. Google's goal was different: rethinking server silicon to better suit its AI computing workloads.

Since 2016, Google has built six different AI chips, TPU, TPUv2, TPUv3, TPUv4i, TPUv4, and TPUv5.

Google designs these chips largely in-house, with varying degrees of middle- and back-end collaboration with Broadcom, and they are then manufactured by TSMC.

From TPUv2 onward, the chips also use HBM memory from Samsung and SK Hynix.

Before getting to Gemini and Google's cloud business, the authors first shared some data on Google's frenzied compute expansion: the total number of advanced chips added each quarter.

For OpenAI, the total number of GPUs on hand will grow fourfold over two years.

As for Google, which everyone overlooks: it has TPUv4 (Pufferfish), TPUv4 lite, and a whole family of internally used GPUs.

Also, TPUv5 lite is not counted here, although it may be the workhorse for inference on smaller language models.

In the growth chart below, only TPUv5 (Viperfish) is shown.

Even on that partial accounting, Google's computing power is enough to leave everyone dumbfounded.

In fact, Google has more TPUv5s than OpenAI, Meta, CoreWeave, Oracle, and Amazon have GPUs combined.

And Google can rent out a significant portion of this capacity to various startups.

Of course, in terms of the performance of each chip, there is a significant gap between TPUv5 and H100.

Even so, OpenAI's computing power is only a fraction of Google's, and the TPUv5 buildout will greatly expand Google's training and inference capacity.

Meanwhile, Gemini, Google's new-architecture multimodal large model, has been iterating at an incredible speed.

Gemini reportedly has access to multiple TPU pod clusters, and is specifically trained on 7+7 pods.

The authors say the first-generation Gemini was likely trained on TPUv4, and that those pods did not use the maximum of 4,096 chips; a somewhat smaller number was used to preserve reliability and allow hot-swapping of failed chips.

If all 14 pods were used for about 100 days at a reasonable model FLOPS utilization (MFU), the hardware FLOPS for training Gemini would exceed 1e26.

For reference, the authors detailed in their earlier "GPT-4 Architecture" article that GPT-4's training compute was slightly above 2e25 FLOPS.
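
As a rough, hedged sanity check of those figures (assuming the full 4,096 chips per pod, which the article notes slightly overstates the real configuration, and the publicly documented ~275 TFLOPS bf16 peak of a TPUv4 chip):

```python
# Back-of-envelope check of the "~1e26 FLOPS" and "5x GPT-4" figures quoted above.
# Assumptions: 14 TPUv4 pods at the full 4096 chips each, ~275 TFLOPS peak bf16
# per chip (Google's published TPUv4 spec), ~100 days of training.
chips = 14 * 4096                  # 57,344 chips
peak_flops = 275e12                # per-chip bf16 peak, FLOP/s
seconds = 100 * 24 * 3600          # ~100 days

hardware_flops = chips * peak_flops * seconds
print(f"hardware FLOPS: {hardware_flops:.2e}")   # ~1.4e26, consistent with ">1e26"

# Delivered model FLOPS would be lower by the MFU factor; the "5x GPT-4" claim is
# simply the quoted totals divided out:
print(f"1e26 / 2e25 = {1e26 / 2e25:.0f}x")
```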

Google's FLOPS utilization on TPUv4 is very good, even in large-scale training, so the first iteration of Gemini already commands considerably more compute than GPT-4.

That advantage is all the greater given a superior model architecture, such as enhanced multimodality.

What is really astounding is the next iteration of Gemini, which has begun training on TPUv5-based pods with up to ~1e26 FLOPS, roughly 5 times the compute used to train GPT-4.

Allegedly, the first Gemini trained on TPUv5 had some data issues, so it is unclear whether Google will release it.

This ~1e26 model is probably what is known publicly as Gemini.

Looking back at the chart above, this is not Google's final form. The race is on, and Google has a huge advantage.

If Google can focus its efforts and execute, then at least in compute scaling and pre-training experimentation speed, it will come out ahead.

They can field multiple clusters more powerful than OpenAI's most powerful cluster. Google has fumbled once; will it fumble again?

Google's infrastructure now serves more than internal needs: frontier model companies such as Anthropic, along with some of the world's largest enterprises, will also get access to TPUv5 for their own model training and inference.

Google has moved TPUs into its cloud business unit and rebuilt its commercial instincts, which has won it decisive favor with some large companies.

Over the next few months, you will see Google winning, as some of these courted companies start paying for TPUs.
