Transformers are taking over the world: perhaps the most comprehensive inventory of large language models!

As ChatGPT has swept the world over the past six months, large language models (LLMs) built on the Transformer architecture have gradually come into public view. It is fair to say that the Transformer's influence in AI is no less than that of the Transformers in science fiction.

The core idea of the Transformer is to use the self-attention mechanism to model the dependencies between the elements of a sequence. Only a few years ago, most models were built on long short-term memory (LSTM) networks and other variants of recurrent neural networks (RNNs); today's large language models are instead built on the Transformer's attention mechanism. The AI field has moved rapidly from traditional machine learning, to neural networks, to today's Transformer.

Artificial intelligence development direction
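To make the self-attention idea mentioned above concrete, here is a minimal single-head, scaled dot-product attention sketch in NumPy. It is a toy illustration under simplified assumptions (one head, no masking, random weights), not the implementation of any particular model:

```python
# Minimal sketch of scaled dot-product self-attention (the core of the Transformer).
# Single head, no masking; all weights are random for illustration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v        # project tokens to queries/keys/values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # pairwise token-to-token affinities
    weights = softmax(scores, axis=-1)         # each token attends over all tokens
    return weights @ V                         # weighted mix of value vectors

# Toy usage: 4 tokens, model width 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

Because every token attends to every other token in a single step, dependencies between distant positions no longer have to pass through a recurrent chain, which is the key advantage over LSTM-style models.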

At present, the large language model market is both blooming with new entrants and locked in cut-throat competition, so we have compiled what may be the most comprehensive inventory of large language models on the web, in the hope that everyone can keep a finger on the pulse of the AIGC era.

After reading (and bookmarking) this article, you will learn:

The global development timeline and family-tree matrix of large language models

The iteration process of large language models in the two camps of Google and Microsoft

Inventory of major global and domestic language models

Development history of global large language models

The figure below shows the release timeline of large language models with tens of billions of parameters or more since 2019; the models marked in yellow have been open-sourced. New models have appeared in rapid succession since 2022, and OpenAI and Google have iterated their large models noticeably faster than other vendors.

Development trends of large language models

Global large language model family tree matrix

The chart below shows the family tree of the main large language models, with different colors representing different technical lineages. The horizontal axis is the timeline, and the vertical axis is the parameter scale of model training. Since 2018, the training scale of large language models has continued to expand, and 2022 was also an explosive year in terms of parameter scale.

Large language model parameter scale quadrant

Large language model technology roadmap genealogy relationship

Giants Confrontation: The Race between Google and Microsoft Continues to Escalate

In November 2022, OpenAI released ChatGPT, a new conversational AI model based on the GPT-3.5 series — an iterative upgrade of epoch-making significance. In February this year, Microsoft integrated ChatGPT into Bing to redefine the search engine; in March, OpenAI released GPT-4, a multimodal large language model showing stronger capabilities in both understanding and creation.

Faced with the GPT series launched by OpenAI, Google is following closely. In February and March this year, it launched Bard, its answer to ChatGPT, and PaLM-E, the largest multimodal embodied visual language model to date. On May 11, Google officially launched its counterattack, releasing the large language model PaLM 2 to directly address the pain points of GPT-4, while integrating AI into more than 25 of its applications.

Microsoft vs. Google releases upgrades

Large language model training data sources

Looking at the training data sources of large language models, we can see that most models are trained primarily on crawled web data. GPT-3 adds book data on top of web pages. Interestingly, the training data for DeepMind's AlphaCode consists entirely of code, from which one can infer strong programming capabilities. Reportedly, AlphaCode entered 10 programming competitions hosted by Codeforces in 2022, ranking within the top 54.3% and beating 46% of contestants, with an Elo rating of 1238.

Different large language model training data sources
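As a rough illustration of how such a data mixture might be expressed, the sketch below samples training documents from several corpora according to fixed weights. The source names and proportions are hypothetical, chosen only to illustrate the idea, not taken from any specific model's recipe:

```python
# Hypothetical training-data mixture: sample a corpus according to assumed weights.
# Corpus names and weights are illustrative only.
import random

data_mixture = {
    "web_crawl": 0.60,   # Common Crawl-style web pages
    "books":     0.15,
    "wikipedia": 0.05,
    "code":      0.20,   # an AlphaCode-style model would push this toward 1.0
}

def sample_source(mixture):
    """Pick a corpus name with probability proportional to its weight."""
    names, weights = zip(*mixture.items())
    return random.choices(names, weights=weights, k=1)[0]

counts = {name: 0 for name in data_mixture}
for _ in range(10_000):
    counts[sample_source(data_mixture)] += 1
print(counts)  # roughly proportional to the weights above
```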

Large language model training hardware resources

Training a large language model consumes enormous hardware resources. Whereas the earliest models were trained almost exclusively on GPUs, many large language models now use TPUs as their main training chips. On the one hand, rapid hardware progress has undoubtedly improved the iteration speed of large language models; on the other hand, the fierce competition around them has driven up the prices of chips and servers. According to Jiemian News, the price of NVIDIA's flagship AI chip, the H100, has been bid up to US$40,000 through various channels, a significant increase over the roughly US$36,000 previously quoted by retailers. Ten thousand NVIDIA A100 chips are widely regarded as the computing-power threshold for developing large language models.

Comparison of hardware resources for large language model training
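To get a feel for the scale involved, the back-of-envelope sketch below uses the common approximation that training a dense Transformer costs roughly 6 × (parameters) × (training tokens) floating-point operations. The model size, token count, chip count, throughput, and utilization are assumed example values, not figures from any specific vendor:

```python
# Back-of-envelope training-compute estimate using the common ~6*N*D FLOPs
# rule of thumb for dense Transformers. All inputs below are assumed examples.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense Transformer."""
    return 6.0 * n_params * n_tokens

def cluster_days(total_flops: float, flops_per_chip: float,
                 n_chips: int, utilization: float = 0.4) -> float:
    """Wall-clock days on a cluster, given per-chip peak throughput and sustained utilization."""
    effective = flops_per_chip * n_chips * utilization
    return total_flops / effective / 86_400  # seconds per day

# Example: a 70B-parameter model trained on 1.4T tokens,
# on 1,000 accelerators assumed to peak at 300 TFLOP/s each.
flops = training_flops(70e9, 1.4e12)
print(f"total compute: {flops:.2e} FLOPs")
print(f"~{cluster_days(flops, 300e12, 1_000):.0f} days on 1,000 chips at 40% utilization")
```

Even under these optimistic assumptions the run takes on the order of weeks, which is why chip supply and pricing have become such a bottleneck.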

Inventory of major global language models

From a global perspective, the main publishers of large language models include Google, OpenAI, Facebook, Microsoft, DeepMind, and EleutherAI. Model parameter counts are mostly in the tens to hundreds of billions, and the dominant technical architecture is the Transformer encoder-decoder family. The table below lists close to 100 models, though the actual number is certainly much higher.

Comparison of major global language models

Inventory of major global language models

Inventory of domestic large language models

Of course, the fire of large language models has also ignited the enthusiasm of China's technology companies. Building on earlier in-house research or open-source models, many domestic institutions have launched large language models of their own — more than 20 companies, by incomplete count.


Source: blog.csdn.net/mockuai_com/article/details/131660688