Summary of popular large language models (LLMs) in 2023

Large language models (LLMs) are artificial intelligence models designed to understand and generate human language.

By training on large amounts of text data, they can perform a wide range of tasks, including text summarization, translation, sentiment analysis, and more. These models are often based on deep learning architectures such as transformers, which allow them to demonstrate impressive capabilities on a variety of natural language processing tasks.
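
These capabilities are easy to try out in practice. Below is a minimal, illustrative sketch using the Hugging Face `transformers` pipeline API; the default models each pipeline downloads are chosen by the library and may change between versions:

```python
# Minimal sketch: common LLM tasks via Hugging Face pipelines.
# pip install transformers torch
from transformers import pipeline

# Each pipeline downloads a default pretrained transformer for its task.
summarizer = pipeline("summarization")
sentiment = pipeline("sentiment-analysis")
translator = pipeline("translation_en_to_fr")

text = ("Large language models are trained on massive text corpora and can "
        "summarize, translate, and analyze sentiment without task-specific code.")

print(summarizer(text, max_length=30, min_length=10)[0]["summary_text"])
print(sentiment("This survey of large models is remarkably useful!")[0])
print(translator("Large models are transforming NLP.")[0]["translation_text"])
```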

Remarkable achievements have been made in the large model field both in China and abroad. Enterprises, institutions, and academia across countries and regions are actively investing resources and effort to advance large model technology.

Abroad, for example, OpenAI launched ChatGPT, a large language model based on GPT-3.5. Thanks to its excellent performance, ChatGPT and the large language models behind it quickly became a hot topic in artificial intelligence, drawing the attention and participation of a large number of researchers and developers.

In China, as of August 31, 2023, a number of large model companies and institutions had officially announced that their services were online and open to the public. The large models of eight companies and institutions, including Baidu, Zhipu, Baichuan, ByteDance, SenseTime, and the Chinese Academy of Sciences (Zidong Taichu), made the first batch of approved registrations and can now launch officially and provide services to the public.

To give everyone a more intuitive view of developments in the large model field, we have compiled the leading large models in China and abroad for your reference.

Summary of foreign large models

1. OpenAI

ChatGPT

ChatGPT is a chatbot powered by the GPT-3.5 language model. It communicates with users in natural-language conversations. ChatGPT is trained on a wide range of topics and can help with tasks such as answering questions, providing information, and generating creative content. It is designed to be friendly and helpful and can adapt to different conversation styles and contexts. With ChatGPT you can have interesting and informative conversations on a variety of topics, including news, current affairs, hobbies, and personal interests.

Paper: https://www.aminer.cn/pub/5ed0e04291e011915d9e43ee

GPT-4

In March 2023, OpenAI released GPT-4, a multimodal pre-trained large model that accepts image and text input and produces text output. Experiments show that GPT-4 performs at human level on a variety of professional and academic benchmarks. For example, it passed a simulated bar exam with a score in the top 10% of test takers; by comparison, GPT-3.5 scored around the bottom 10%.

Paper: https://www.aminer.cn/pub/641130e378d68457a4a2986f
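
Both ChatGPT and GPT-4 are accessed through the same chat-style API. Here is a minimal sketch using the 2023-era `openai` Python package (pre-1.0 interface); the API key and prompt are placeholders:

```python
# Minimal sketch of the OpenAI chat completion API (openai<1.0 interface).
# pip install "openai<1.0"
import openai

openai.api_key = "sk-..."  # placeholder; use your own key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # or "gpt-4" if your account has access
    messages=[
        {"role": "system", "content": "You are a concise, helpful assistant."},
        {"role": "user", "content": "Explain in one sentence what a transformer is."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```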

2. Google

LaMDA

LaMDA is a family of Transformer-based models specialized for dialogue. These models have up to 137 billion parameters and were pre-trained on 1.56 trillion words of public dialogue data and web text. LaMDA enables free-flowing conversation on a wide variety of topics. Unlike traditional chatbots, it is not restricted to predefined paths and can adapt to the direction the conversation takes.

Paper: https://www.aminer.cn/pub/61ea249b5244ab9dcbabc7ac

PaLM

PaLM is a 540-billion-parameter language model capable of handling a variety of tasks, including complex learning and reasoning. It outperforms prior state-of-the-art language models on many language-understanding and reasoning benchmarks, in some cases exceeding average human performance. PaLM uses few-shot learning: it generalizes from small amounts of data, roughly approximating the way humans learn and apply knowledge to solve new problems.

Paper: https://www.aminer.cn/pub/624d050e5aee126c0f4a7920
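
Few-shot learning in this setting usually means few-shot *prompting*: a handful of worked examples are placed in the prompt, and the model generalizes the pattern to a new query. A model-agnostic sketch of how such a prompt is assembled (the examples and labels here are invented for illustration):

```python
# Sketch: assembling a few-shot prompt for a completion-style LLM.
def build_few_shot_prompt(examples, query):
    """Concatenate labeled examples, then the unlabeled query."""
    parts = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    parts.append(f"Review: {query}\nSentiment:")  # the model completes the label
    return "\n\n".join(parts)

examples = [
    ("The battery lasts all day, fantastic.", "positive"),
    ("Screen cracked within a week.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Setup was painless and quick.")
print(prompt)  # send this string to any completion-style LLM
```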

mT5

Multilingual T5 (mT5) is a text-to-text Transformer model whose largest variant has 13 billion parameters. It is trained on the mC4 corpus, covering 101 languages such as Amharic, Basque, Xhosa, and Zulu. mT5 achieves state-of-the-art performance on many cross-lingual natural language processing tasks.

Paper: https://www.aminer.cn/pub/5f92ba5191e011edb3573ba5
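
The mT5 checkpoints are published on the Hugging Face Hub in several sizes (the 13-billion-parameter model is `google/mt5-xxl`). Note that they are pre-trained only, with no supervised fine-tuning, so they must be fine-tuned on a downstream task before real use; the sketch below just shows loading a small checkpoint and the T5-style span-corruption input format:

```python
# Sketch: loading mT5 (smallest size, for illustration) with transformers.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Pre-training objective: predict spans hidden behind sentinel tokens.
inputs = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```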

3. DeepMind

Gopher

DeepMind's language model Gopher is more accurate than prior large language models at tasks such as answering questions about specialized subjects in the sciences and humanities, and is comparable to them at tasks such as logical reasoning and mathematics. Gopher has 280 billion parameters, making it larger than OpenAI's GPT-3, which has 175 billion.

Paper: https://www.aminer.cn/pub/61b2c0246750f848a14300ff

Chinchilla

Chinchilla uses the same compute budget as Gopher but only 70 billion parameters and four times as much training data. It outperforms models such as Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG on many downstream evaluation tasks, and it requires far less compute for fine-tuning and inference, which greatly simplifies use in downstream applications.

Paper: https://www.aminer.cn/pub/63a413f690e50fcafd6d190a
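
The point of Chinchilla is the compute trade-off: training cost is roughly 6·N·D FLOPs for N parameters and D tokens, so a smaller model trained on more tokens can consume a similar budget. A back-of-the-envelope comparison, with token counts taken from the respective papers:

```python
# Back-of-the-envelope training-compute comparison (FLOPs ≈ 6 * params * tokens).
def train_flops(params, tokens):
    return 6 * params * tokens

gopher = train_flops(280e9, 300e9)      # 280B params, ~300B tokens
chinchilla = train_flops(70e9, 1.4e12)  # 70B params, ~1.4T tokens (4x the data)

print(f"Gopher:     {gopher:.2e} FLOPs")      # ~5.0e23
print(f"Chinchilla: {chinchilla:.2e} FLOPs")  # ~5.9e23, a comparable budget
```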

Sparrow

Sparrow is a chatbot developed by DeepMind designed to answer users' questions correctly while reducing the risk of unsafe or inappropriate responses. It addresses the tendency of language models to produce incorrect, biased, or potentially harmful output. Sparrow is trained with human feedback, making it more helpful, correct, and harmless than baseline pre-trained language models.

Paper: https://www.aminer.cn/pub/63365e7c90e50fcafd1a2bdd

4. Anthropic

Claude

Claude is a conversational AI assistant based on advanced natural language processing. It aims to be a helpful, harmless, and honest assistant, and is trained with a technique called Constitutional AI: during training, model self-supervision and other AI safety methods constrain and reward the model so that it exhibits these behavioral characteristics.

Paper: https://www.aminer.cn/pub/63a1750c90e50fcafd1f38d7

5. Meta

OPT-IML

OPT-IML is a pre-trained language model based on Meta's OPT model, with 175 billion parameters. It is instruction-fine-tuned on approximately 2,000 natural language tasks for better performance on tasks such as question answering, text summarization, and translation. It is more efficient to train and has lower CO₂ emissions than OpenAI's GPT-3.

Paper: https://www.aminer.cn/pub/63a910a290e50fcafd2a84fd
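
Meta released smaller OPT-IML variants openly on the Hugging Face Hub (the 175B version is gated). A minimal sketch, with the checkpoint name given here as an assumption to verify on the Hub:

```python
# Sketch: prompting an instruction-tuned OPT-IML checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "facebook/opt-iml-max-1.3b"  # assumed Hub checkpoint name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```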

BlenderBot-3

BlenderBot 3 is a conversational agent that can interact with people and receive feedback to improve conversational abilities. BlenderBot 3 is built on Meta AI’s publicly available OPT-175B language model, which is approximately 58 times the size of its predecessor, BlenderBot 2. The model incorporates conversational skills such as personality, empathy, and knowledge and conducts meaningful conversations by leveraging long-term memory and searching the Internet.

Paper: https://www.aminer.cn/pub/62f07ec290e50fcafde5ac5e

6. AI21 Labs

Jurassic

Jurassic-1 is a developer platform launched by AI21 Labs, providing state-of-the-art language models for building applications and services. It offers two models, including the Jumbo version (178 billion parameters), which at release was the largest and most sophisticated general-purpose language model to date. These models are flexible, capable of generating human-like text and solving complex tasks such as question answering and text classification.

Paper: https://www.aminer.cn/pub/62620f1c5aee126c0f686cf5

7. NVIDIA

Megatron-Turing NLG

The Megatron-Turing Natural Language Generation model (MT-NLG) is a Transformer-based language model with 530 billion parameters, making it the largest and most powerful model of its kind at the time. It surpasses previous state-of-the-art models in zero-, one-, and few-shot settings and demonstrates state-of-the-art accuracy on natural language tasks such as completion prediction, commonsense reasoning, reading comprehension, natural language inference, and word-sense disambiguation.

Paper: https://www.aminer.cn/pub/61f753205aee126c0f9c2149

Summary of domestic large models

Baidu

Ernie 3.0 Titan

Jointly released by Baidu and Peng Cheng Laboratory, Ernie 3.0 Titan has 260 billion parameters and excels at natural language understanding and generation. It was trained on massive unstructured data and achieved top results on more than 60 NLP tasks such as machine reading comprehension, text classification, and semantic similarity. Titan also performs well on 30 few-shot and zero-shot benchmarks, demonstrating its ability to generalize to a variety of downstream tasks from small amounts of labeled data.

Paper: https://www.aminer.cn/pub/61c53a815244ab9dcbcaf3b5

Ernie Bot

Internal testing of Baidu's "Ernie Bot" (Wenxin Yiyan) project was completed in March 2023. Ernie Bot is an artificial intelligence language model, similar to OpenAI's ChatGPT, capable of language understanding, language generation, and text-to-image generation. The technology is part of a global race to develop generative artificial intelligence.

Paper: https://www.aminer.cn/pub/60e441e0dfae54001623c105

Zhipu AI

GLM

GLM is a general pre-training framework based on autoregressive blank infilling. By learning bidirectional and unidirectional attention mechanisms within a unified framework, the model learns both contextual representation and autoregressive generation during pre-training. In the fine-tuning stage, different types of downstream tasks are unified through a cloze format, yielding a single pre-training approach for all natural language processing tasks.

Paper: https://www.aminer.cn/pub/622819cdd18a2b26c7ab496a
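
The objective is easiest to see on a concrete example. The toy sketch below mimics, in highly simplified form, how GLM corrupts a sentence: a sampled span is replaced by a [MASK] token in "Part A" (processed with bidirectional attention), and the model autoregressively generates the span in "Part B"; the exact special tokens and span sampling here are simplified assumptions, not the paper's implementation:

```python
import random

def glm_style_corruption(tokens, span=(1, 3), seed=0):
    """Toy version of GLM's autoregressive blank infilling setup:
    Part A keeps the context with a [MASK] placeholder (bidirectional attention);
    Part B holds the removed span as the generation target (unidirectional)."""
    rng = random.Random(seed)
    start = rng.randrange(len(tokens) - span[1])
    length = rng.randint(*span)
    part_a = tokens[:start] + ["[MASK]"] + tokens[start + length:]
    part_b = ["[START]"] + tokens[start:start + length] + ["[END]"]
    return part_a, part_b

a, b = glm_style_corruption("GLM unifies understanding and generation tasks".split())
print("Part A (input):  ", " ".join(a))
print("Part B (targets):", " ".join(b))
```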

GLM-130B

GLM-130B is an open-source, open-access bilingual (Chinese and English) bidirectional dense model with 130 billion parameters, built on the General Language Model (GLM) architecture. It is designed to support inference of a hundred-billion-parameter model on a single A100 (40G × 8) or V100 (32G × 8) server. With INT4 quantization, GLM-130B can perform efficient inference on a server with RTX 3090 (24G × 4) or GTX 1080 Ti (11G × 8) GPUs with almost no loss in model performance.

Paper: https://www.aminer.cn/pub/633e476890e50fcafde59595
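
These hardware claims follow directly from weight-memory arithmetic (activations and runtime overhead ignored for simplicity):

```python
# Why INT4 lets GLM-130B fit on 4x RTX 3090: weight memory only.
params = 130e9
fp16_gb = params * 2 / 1e9    # 2 bytes/param  -> 260 GB, needs 8x A100-40G (320 GB)
int4_gb = params * 0.5 / 1e9  # 0.5 bytes/param -> 65 GB, fits in 4x 24 GB (96 GB)
print(f"FP16 weights: {fp16_gb:.0f} GB, INT4 weights: {int4_gb:.0f} GB")
```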

ChatGLM-6B

ChatGLM-6B is an open-source conversational language model that supports bilingual Chinese-English question answering and is optimized for Chinese. The model is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. Combined with model quantization, it can be deployed locally on consumer-grade graphics cards (a minimum of 6 GB of video memory at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGLM and is optimized for Chinese Q&A and dialogue. After bilingual Chinese-English pre-training on about 1T tokens, supplemented by supervised fine-tuning, feedback bootstrapping, reinforcement learning from human feedback, and other techniques, the 6.2-billion-parameter ChatGLM-6B, although smaller than the hundred-billion-parameter models, greatly reduces inference cost, improves efficiency, and can already generate answers quite consistent with human preferences.
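
Local deployment follows the model's published `transformers` usage, sketched below from memory; treat the checkpoint names and the `chat` helper as assumptions to verify against the official repository. Note that `trust_remote_code=True` runs the model's custom code:

```python
# Sketch of local ChatGLM-6B inference, per the model's published usage pattern.
from transformers import AutoModel, AutoTokenizer

name = "THUDM/chatglm-6b"  # "THUDM/chatglm-6b-int4" targets ~6 GB VRAM cards
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(name, trust_remote_code=True).half().cuda().eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```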

Huawei

PanGu-Alpha

Huawei has developed PanGu-Alpha, a Chinese model comparable to OpenAI's GPT-3. The model was trained on 1.1 TB of Chinese-language resources, including books, news, social media, and web pages, and contains more than 200 billion parameters, about 25 billion more than GPT-3. PanGu-Alpha can efficiently complete a variety of language tasks, such as text summarization, question answering, and dialogue generation.

Paper: https://www.aminer.cn/pub/6087f2ff91e011e25a316d31

Alibaba

M6

In June 2021, Alibaba and Tsinghua University published a new study proposing M6, a Chinese multimodal pre-trained model with 100 billion parameters, the largest Chinese multimodal pre-trained model at the time. M6 is applicable to a wide range of tasks, including product description generation, visual question answering, question answering, and Chinese poetry generation. Experimental results show that M6 outperforms a series of strong baselines. The researchers also designed a text-guided image generation task and showed that the fine-tuned M6 can create high-quality, high-resolution images with rich detail.

Paper: https://www.aminer.cn/pub/60c320b19e795e9243fd1672

Tongyi Qianwen

In April 2023, Alibaba released "Tongyi Qianwen", a very large-scale language model with capabilities such as multi-turn dialogue, copywriting, logical reasoning, multimodal understanding, and multilingual support.

Just a few days ago, Alibaba also launched Qwen-VL, a vision-language model built on Qwen-7B, the 7-billion-parameter Tongyi Qianwen model. Qwen-VL accepts image and text input and can understand multimodal information. In addition to basic image and text recognition, description, question answering, and dialogue, it adds new capabilities such as visual grounding and reading text in images.

Paper: https://www.aminer.cn/pub/64e826d63fda6d7f06c3150c
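
Qwen-VL is likewise distributed through the Hugging Face Hub. The sketch below follows the usage pattern published on the Qwen-VL-Chat model card as best I recall it; the `from_list_format` helper and checkpoint name should be verified there, and the image URL is a placeholder:

```python
# Sketch: image + text chat with Qwen-VL-Chat (custom code loaded from the Hub).
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen-VL-Chat"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name, device_map="auto", trust_remote_code=True
).eval()

# Interleave an image reference with a text question.
query = tokenizer.from_list_format([
    {"image": "https://example.com/demo.jpg"},  # placeholder URL
    {"text": "What is shown in this picture?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```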

SenseTime

RiRixin (SenseNova)

In April 2023, SenseTime launched the "RiRixin" (SenseNova) family of large models, including the natural language model "SenseChat", the text-to-image model "Miaohua", and the digital human video generation platform "Ruying". This is another ChatGPT-like product from a major domestic vendor, following Baidu's Wenxin Yiyan (Ernie Bot) and Alibaba's Tongyi Qianwen.

Recently, the SenseTime large model team also proposed RAPHAEL, a text-to-image large model. See the paper for details.

Paper: https://www.aminer.cn/pub/647572e0d68f896efa7b79ab

In addition to the models above, domestic offerings include Baichuan Intelligence's Baichuan models, Douyin's Skylark large model, the Chinese Academy of Sciences' "Zidong Taichu" model, Shanghai AI Laboratory's Scholar (InternLM) large model, and MiniMax's ABAB large model, among others.

In 2023, new models have continued to emerge in China and abroad, and we have witnessed explosive growth in large models. As large models continue to evolve and be optimized, we can expect their performance in fields such as natural language processing, image recognition, and speech recognition to keep improving, perhaps even surpassing human levels on some tasks.

This will promote the widespread application of artificial intelligence technology in various industries, from medical to finance, from transportation to education, and large models will become the core of smart devices and services. Our lives will become more intelligent, convenient and personalized.

Of course, the future development of large models also faces some challenges and issues, such as privacy and security. However, with the advancement of technology and the expansion of applications, these problems will gradually be solved and overcome.

All in all, time will tell!

How to use ChatPaper?

Using ChatPaper is very simple: open the AMiner homepage and enter the ChatPaper page from the navigation bar at the top of the page or from the lower-right corner.

On the ChatPaper page, you can choose to chat about a single paper or about an entire library (your personal paper collection). You can upload a local PDF or search for papers directly on AMiner.
