Strongly recommended! This large language model "treasure handbook" sorts out all the major large models!


Originally published by Xi Xiaoyao Technology
Author | Wang Siruo

Recently, large language models have undoubtedly been the focus of the AI community. The large models released by major technology companies and research institutions keep appearing, as numerous as carp swarming across the river, emerging one after another and dazzling the eye.

It makes the author feel transported back to 2020, the first year of the domestic large-model "arms race". Back then, however, the massive computing power that large models demand limited most players to doing instruction fine-tuning and human-feedback alignment for some vertical field. The LLM field is now showing an almost unreasonable prosperity: the model and dataset repository Hugging Face hosts nearly 16,000 text-generation models, the community releases hundreds of new models every week, and Hugging Face added 100,000 models in the six months from 2022/12 to 2023/6. Research institutions, afraid of missing the trend, are all trying to stake out their own position in the large-model field.

In any case, the spotlight is already fixed on large models. On this stage where one act follows hard on another, the author will take a closer look at the history and context of large language models. The selection is inevitably somewhat biased; friends are welcome to add to it in the comments section~

GitHub address:
https://github.com/WangHuiNEU/llm

Large models can be divided into base models and fine-tuned models obtained by instruction tuning (e.g., instruction fine-tuning and human-feedback alignment) on top of a base model. But in fact, as the Allen Institute paper "How Far Can Camels Go?" points out: different instruction fine-tuning datasets can unlock or enhance specific capabilities, but no single dataset or combination delivers the best performance across all evaluations; for that, we still need a larger and more powerful base model.


In fact, it can be understood even more simply: instruction fine-tuning does not add new capabilities to a model. The base model itself determines the range of what the model can do; instruction fine-tuning merely uses a very small amount of data to quickly elicit an ability in some domain, and whether that ability ends up strong or weak depends on the base model. Friends who have actually fine-tuned large models will probably feel this deeply, so a more reasonable storyline for large models revolves around the base models. The following discusses in detail the base models of Google, Meta, OpenAI and other technology companies, as well as some fine-tuned models built on top of them.
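To make this concrete, here is a minimal sketch of what instruction fine-tuning looks like in code. It uses Hugging Face transformers with the small t5-small checkpoint and two toy instruction-response pairs; the model choice, data, and hyperparameters are illustrative placeholders, not the recipe used by any model discussed in this article.

```python
# A minimal sketch of instruction fine-tuning: a handful of (instruction, response)
# pairs nudge a small pre-trained base model (t5-small here, as a toy stand-in for
# a real base model) toward following instructions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # illustrative; real instruction tuning uses far larger base models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy instruction data; real datasets (FLAN, P3, etc.) contain millions of examples.
examples = [
    ("Translate to French: I love open source models.", "J'adore les modèles open source."),
    ("Summarize: Large base models set the ceiling; tuning only elicits abilities.",
     "Base models set the ceiling."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for instruction, response in examples:
        inputs = tokenizer(instruction, return_tensors="pt", truncation=True)
        labels = tokenizer(response, return_tensors="pt", truncation=True).input_ids
        loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```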

[Figure: the large model evolutionary tree]

1. Google's models

Google has always been the most closely watched player on the large-model track, but embarrassingly, in the face of ChatGPT's onslaught, Google, which dominates the search market, cannot make a sharp turn in the new generative-search race without hurting its core search-advertising business. In the large-model field itself, however, Google has the deepest accumulation: it proposed the Transformer architecture in 2017, and the Pathways architecture it proposed in 2021 has, in my opinion, pushed engineering optimization to the extreme.

Google originally had two research groups, Google Brain and DeepMind. They were merged into Google DeepMind in April this year, but they are still described separately below.

1. Base model

Google Brain

| Model | Time | Open source? | Parameters |
| --- | --- | --- | --- |
| T5 | 2019-10 | Yes | 13B |
| LaMDA | 2021-05 | No | 137B |
| PaLM | 2022-04 | No | 540B |

Interestingly, LaMDA is a conversational language model that Google had developed as early as 2020, but Google, citing safety concerns, refused to open it to the public. The lead researchers, Daniel De Freitas and Noam Shazeer, left the company in frustration (last September the two founded Character.AI, a chatbot website built on LLMs, and have stuck to that goal ever since). In dialogue generation, Google got up early but arrived at the market late. (doge~)

PaLM is a super-large language model built on Pathways, the next-generation AI architecture Google proposed for efficient model training. With 540 billion parameters, it is currently the largest dense Transformer model and also the base language model with the most comprehensive performance.

DeepMind

| Model | Time | Open source? | Parameters |
| --- | --- | --- | --- |
| Gopher | 2021-12 | No | 280B |
| Chinchilla | 2022-04 | No | 70B |

Chinchilla is the result of DeepMind rethinking the scaling laws of large models. Its empirical analysis shows that the amount of training data is just as important as the number of parameters: the resulting Chinchilla uses only a quarter of Gopher's parameters yet significantly outperforms Gopher. Training data size matters as much as parameter count!
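As a back-of-the-envelope illustration of that finding, the sketch below applies the commonly cited Chinchilla rule of thumb (roughly 20 training tokens per parameter, with training compute approximated as C ≈ 6·N·D) to the published sizes of Gopher (280B parameters, ~300B tokens) and Chinchilla (70B parameters, ~1.4T tokens); the helper functions are my own illustration, not DeepMind's code.

```python
# Back-of-the-envelope check of the Chinchilla rule of thumb:
# compute-optimal training uses roughly 20 tokens per parameter,
# with training compute approximated as C ≈ 6 * N * D FLOPs.

def optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Roughly compute-optimal number of training tokens for n_params parameters."""
    return n_params * tokens_per_param

def train_flops(n_params: float, n_tokens: float) -> float:
    """Standard approximation of training compute: C ≈ 6 * N * D."""
    return 6 * n_params * n_tokens

# Gopher: 280B parameters on ~300B tokens (under-trained by this rule)
# Chinchilla: 70B parameters on ~1.4T tokens (close to 20 tokens per parameter)
for name, n, d in [("Gopher", 280e9, 300e9), ("Chinchilla", 70e9, 1.4e12)]:
    print(f"{name}: {train_flops(n, d):.2e} FLOPs, "
          f"tokens/param = {d / n:.1f}, "
          f"compute-optimal tokens ≈ {optimal_tokens(n):.2e}")
```

Under roughly the same training compute, Chinchilla simply rebalances it toward more data and fewer parameters, which is why it can beat the much larger Gopher.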

Google DeepMind

| Model | Time | Open source? | Parameters |
| --- | --- | --- | --- |
| PaLM 2 | 2023-05 | No | 340B (rumored, unconfirmed~) |

In April, Google decided to pool its strength, and Google Brain and DeepMind merged into Google DeepMind. In May, at the Google I/O 2023 conference, Google unveiled the more powerful PaLM 2, built on sensible large-model scaling rules and a more diverse dataset. PaLM 2 is undoubtedly the foundation on which Google intends to keep its leading position in this wave.

2. Instruction fine-tuning model

| Unit | Model | Base model | Open source? |
| --- | --- | --- | --- |
| Hugging Face | T0 | T5 | Yes |
| Google | FLAN | T5 | No |
| Google | Flan-T5 / Flan-PaLM | T5 / PaLM | No |
| Google | Bard (generative AI chatbot) | LaMDA initially, later PaLM 2 | No |

Building on a base model, instruction tuning can endow the model with strong alignment capabilities. What is interesting here is that on February 6, Google announced Bard, a conversational AI chatbot powered by LaMDA. At launch, however, its poor performance relative to ChatGPT sent Google's stock price falling. Google then improved Bard using a more powerful version of LaMDA, but doubts persisted both inside and outside the company. In May, at the Google I/O conference, Google announced that Bard had been switched over to the PaLM 2 model.

Bard is undoubtedly Google's answer to ChatGPT, but search ads account for 60% of Google's total revenue, so it cannot simply inject generated answers into browser search results the way Bing does. Google has chosen a different path here, positioning search and Bard as two complementary products, and Bard has so far stayed in the small corner of bard.google.com. Google is now planning to combine the strengths of the two teams to develop an even more powerful model, Gemini.

2. Meta's models

Meta is the technology company that embraces open source most warmly among all the giants. Yann LeCun, chief AI scientist of Meta's Fundamental AI Research (FAIR) team, put it this way: the only way to make AI platforms safe, good, and practical is to open-source them. Of course, Meta's open-source models have also benefited the vast majority of domestic large-model players. (doge~)

1. Base model

| Model | Time | Open source? | Parameters |
| --- | --- | --- | --- |
| OPT | 2022-05 | Yes | 125M-175B |
| LLaMA | 2023-02 | Yes | 7B-65B |

When OpenAI chose to keep GPT-3 closed source, open-source pioneer Meta benchmarked against it and directly open-sourced OPT, a model scaling up to 175 billion parameters, although OPT performs somewhat worse than GPT-3. Later, inspired by the scaling law discovered by DeepMind, Meta shrank the parameter count and trained LLaMA (Large Language Model Meta AI) on a larger dataset: the 13-billion-parameter model roughly matches GPT-3, while the 65-billion-parameter model is comparable to Chinchilla-70B and PaLM-540B. With LLaMA, large models entered the era of the camel family~

2. Instruction fine-tuning model

| Unit | Model | Base model | Open source? |
| --- | --- | --- | --- |
| Meta | OPT-IML | OPT-175B | Yes |
| Stanford | Alpaca | LLaMA | Yes |
| LMSYS (UC Berkeley et al.) | Vicuna | LLaMA | Yes |

LLaMA is undoubtedly the base model most commonly used for instruction fine-tuning and for adaptation to professional fields such as law and medicine. In particular, on July 19, Meta AI released LLaMA 2, a free, commercially usable open-source model in three sizes: 7B, 13B, and 70B. The quick-fingered have already run instruction fine-tuning on it with Chinese data; for example, the Llama2-chinese project gained 1.7k stars within just a few days. Now it really is a matter of hand speed~
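For friends who want to try it, below is a minimal sketch of loading the LLaMA 2 chat model with Hugging Face transformers. It assumes you have been granted access to the gated meta-llama repository, have accelerate installed for device_map="auto", and have a GPU with enough memory for the 7B model in fp16; the prompt and generation settings are arbitrary examples.

```python
# A minimal sketch of loading and prompting LLaMA 2 via Hugging Face transformers.
# Assumes: access granted to the gated meta-llama repo, `accelerate` installed,
# and roughly 14 GB of GPU memory for the 7B model in fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated; request access on Hugging Face first

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Briefly explain the difference between a base model and an instruction-tuned model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```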

[Figure: LLaMA variants, from "A Survey of Large Language Models"]

LLaMA's disciples and grandchildren include the camelids (Alpaca, Vicuna), a whole zoo (Koala, Goat, Panda), and figures from myth and legend (Ziya, Baize), gradually moving from the zoo into mythology~

3. OpenAI's models

1. Base model


If we trace the GPT series along its timeline, we find a technical exploration spanning five years. From GPT-2 to GPT-3, without changing the model framework, the parameter count was simply scaled from 1.5 billion to 175 billion. Unlike Google, whose internal horse-race mechanism produced a whole series of large models such as T5, Switch Transformer, and PaLM, OpenAI has "unswervingly" stuck to the GPT route.

2. Instruction fine-tuning model


In 2017, OpenAI proposed RLHF (Reinforcement Learning from Human Feedback); in 2022, OpenAI applied RLHF to GPT-3 and built InstructGPT, which follows user intent better than GPT-3 even though it has only 1.3B parameters, more than 100 times fewer than the 175B GPT-3, and its fine-tuning cost is only 2% of GPT-3's.
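To give a feel for how the reward-modeling stage of RLHF works, here is a toy, self-contained sketch (emphatically not OpenAI's implementation): a tiny scoring model is trained so that human-preferred responses receive higher scalar rewards than rejected ones, using the pairwise loss -log σ(r_chosen − r_rejected) described in the InstructGPT paper. The model, data, and sizes below are placeholders.

```python
# Toy sketch of RLHF's reward-modeling step (not OpenAI's code):
# train a scalar reward head so preferred responses score higher than rejected ones,
# using the pairwise loss -log(sigmoid(r_chosen - r_rejected)).
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    def __init__(self, vocab_size: int = 1000, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)  # stand-in for a transformer
        self.reward_head = nn.Linear(hidden, 1)  # final hidden state -> scalar reward

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.encoder(self.embed(token_ids))
        return self.reward_head(h[:, -1]).squeeze(-1)  # one scalar reward per sequence

model = TinyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy "preference pairs": each row is (prompt + response) as token ids.
chosen = torch.randint(0, 1000, (8, 16))    # human-preferred responses
rejected = torch.randint(0, 1000, (8, 16))  # dispreferred responses

for step in range(100):
    r_chosen, r_rejected = model(chosen), model(rejected)
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# The trained reward model is then used to score samples during PPO fine-tuning.
```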

On March 14, GPT-4 was released; OpenAI provided a technical report and a 3-minute trailer. GPT-4 supports multimodality: it can recognize images, write lyrics, and build websites, and it has topped exams across many fields of human society, reaching the level of top universities such as Harvard and Stanford. It is now integrated into Microsoft's new Bing and ChatGPT Plus.

Microsoft has introduced the generative AI assistant Copilot across Microsoft 365, integrating GPT-4 into applications such as Word, Excel, PowerPoint, Outlook, and Teams. Users can ask questions and prompt the AI to write drafts, build presentations, edit emails, summarize meetings, and more.

4. Large models from open-source communities, research institutes, and other technology companies

1. Base model

To break the monopoly of OpenAI and Microsoft on natural-language-processing AI models, former OpenAI VP of Research Dario Amodei led a group of departing OpenAI employees to found Anthropic, an AI safety and research company dedicated to improving the safety and explainability of AI.

Connor Leahy, Leo Gao, and Sid Black founded EleutherAI, an organization focused on AI alignment, scaling, and open source AI research.

Later, the Hugging Face community took the lead in establishing the BigScience project, an inclusive, open, collaborative, and shared large language model (LLM) community: an open collaborative workshop around the research and creation of very large language models, initiated by Hugging Face, GENCI, and IDRIS, bringing together more than 1,000 researchers from around the world.

Domestic institutions and companies, including the Beijing Academy of Artificial Intelligence (BAAI), Tsinghua University, and Baidu, have also created their own base models.

| Organization | Model | Time | Open source? | Parameters |
| --- | --- | --- | --- | --- |
| Anthropic | Anthropic-LM v4-s3 | 2021-12 | No | 52B |
| BAAI | Aquila | 2023-06 | Yes | 7B / 33B |
| Baidu | ERNIE 3.0 | 2021-12 | No | 260B |
| Tsinghua University | GLM | 2022-08 | Yes | 130B |
| EleutherAI | GPT-Neo | 2021-03 | Yes | 2.7B |
| EleutherAI | GPT-J | 2021-06 | Yes | 6B |
| EleutherAI | GPT-NeoX | 2022-04 | Yes | 20B |
| BigScience | BLOOM | 2022-11 | Yes | 176B |

2. Instruction fine-tuning model

| Unit | Model | Base model | Open source? |
| --- | --- | --- | --- |
| BAAI | AquilaChat-7B | Aquila-7B | Yes |
| BAAI | AquilaChat-33B | Aquila-33B | Yes |
| BigScience | BLOOMZ | BLOOM | Yes |
| EleutherAI | GPT-NeoX | GPT-Neo | Yes |
| Baidu | Wenxin Yiyan (ERNIE Bot) | ERNIE 3.0 | No |
| Anthropic | Claude 2 | Anthropic-LM v4-s3 | No |

Summary

This article has summarized the mainstream base models and their corresponding instruction fine-tuned models. I hope friends in the community will keep the discussion going and work together to build more powerful language models for the Chinese community~

 


Source: blog.csdn.net/xixiaoyaoww/article/details/132017299