GPT at your fingertips - LLaMA

Producer: Towhee Technical Team

The emergence of ChatGPT in recent months has attracted widespread attention and discussion: in many tasks its output rivals or even surpasses what humans produce. It generates human-level language and adapts to many different tasks, giving people great hope for the future of artificial intelligence.

ChatGPT performs so well largely because of the underlying GPT series of models it is built on. Back when GPT-3 appeared, its few-shot ability already stunned people: in-context learning alone could deliver surprisingly strong results. There is plenty of evidence that GPT has already learned most of what it needs; what was missing was simply a suitable way to prompt that knowledge out.

However, OpenAI has not open-sourced any model after GPT-2. Without a large-model foundation that is strong enough, efficient enough, and widely recognized, the NLP research community could only watch this new wave of large models from the sidelines.

There are some open-source large models (such as OPT and BLOOM), but truly large models cannot be run by everyone; what people really want is a model that is small in size yet large in capability. Fortunately Meta, long preoccupied with the Metaverse, has released the LLaMA [1] family of models to fill this gap. The series comes in four sizes (7B, 13B, 33B, 65B), all trained on more than a trillion tokens of publicly available data, and you could say it was born for benchmarks.

(Figure: the datasets used to train LLaMA)

In terms of architecture, LLaMA adopts several improvements developed for large models in recent years, each sketched in code after the list:

1) Pre-normalization (from GPT-3) to improve training stability
2) The SwiGLU activation function in place of ReLU (from PaLM)
3) Rotary embeddings in place of absolute position embeddings (from GPT-Neo)
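
To make these three changes concrete, here is a minimal PyTorch sketch of each component. It is an illustration rather than Meta's implementation; the module names and dimensions are chosen for readability.

```python
# Minimal sketches of LLaMA's three architectural changes (illustrative only):
# RMSNorm-style pre-normalization, a SwiGLU feed-forward, and rotary
# position embeddings applied to attention queries/keys.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Pre-normalization: normalize the *input* of each sub-layer."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: a SiLU-gated linear unit instead of ReLU."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

def rotary_embedding(x, base=10000):
    """Rotate channel pairs by a position-dependent angle (RoPE).

    x: (batch, seq_len, heads, head_dim), head_dim must be even.
    """
    _, t, _, d = x.shape
    pos = torch.arange(t, device=x.device).float()
    freqs = base ** (-torch.arange(0, d, 2, device=x.device).float() / d)
    angles = pos[:, None] * freqs[None, :]          # (seq_len, head_dim/2)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```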

With this architecture, Meta trained the 65B version of the model on 1.4T tokens using 2048 A100 GPUs, which took about 21 days.
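
As a rough sanity check on that figure, the common "6 × parameters × tokens" rule of thumb for training FLOPs lands in the same ballpark. The 50% hardware utilization below is an assumption for illustration, not a number from the paper.

```python
# Back-of-the-envelope training-time estimate for LLaMA-65B.
params = 65e9          # 65B parameters
tokens = 1.4e12        # 1.4T training tokens
gpus = 2048            # A100 GPUs
peak_flops = 312e12    # A100 BF16 peak, FLOP/s
utilization = 0.5      # assumed hardware utilization

total_flops = 6 * params * tokens                          # ~5.5e23 FLOPs
seconds = total_flops / (gpus * peak_flops * utilization)
print(f"~{seconds / 86400:.0f} days")                      # roughly 20 days
```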

(Figure: comparison between LLaMA and other large models of the same class on benchmark datasets)

The paper then spends considerable space comparing models across various tasks, but as the table above shows, LLaMA as a large language model sits on the same level as the various closed-source large models. The 7B and 13B models in particular offer remarkable cost-effectiveness, and they will undoubtedly become shining stars for all kinds of downstream experiments in academia.

Stanford's tatsu-lab has now used the open-source LLaMA to provide a complete downstream finetuning recipe: Stanford Alpaca [2]. The project used ChatGPT to generate 52k training examples and finetuned only the 7B model, yet achieved ChatGPT-like behavior; with high-quality question-answer supervision provided by ChatGPT, the whole workflow is very straightforward. Combined with a project that uses bitsandbytes for int8 acceleration of LLaMA [3], you can put together a personal ChatGPT end to end.
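
Here is a minimal sketch of what running such a finetuned 7B model in int8 can look like, using the Hugging Face transformers + bitsandbytes route rather than the llama-int8 repo itself; the checkpoint path is a placeholder for wherever your converted LLaMA/Alpaca weights live.

```python
# Illustrative only: load a 7B model with int8 weights and generate a reply.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "path/to/llama-7b-hf"   # placeholder: converted LLaMA weights

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,     # bitsandbytes int8 weights, fits on one consumer GPU
    device_map="auto",
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what LoRA is in one sentence.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```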

Another line of work is a more economical finetuning recipe: alpaca-lora [4], built on LoRA, which has also become wildly popular recently for finetuning Stable Diffusion. With this project, a few hours of finetuning on a single RTX 4090 yields a model comparable to Stanford Alpaca; what was once the preserve of the privileged few has truly flown into ordinary homes. Versions of alpaca-lora for multiple languages have already been shared in the community, and LoRA's strong performance on large models with small datasets makes this approach look very promising.
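
A minimal sketch of what the LoRA setup looks like with the PEFT library, in the spirit of alpaca-lora; the rank, alpha, and target modules below are common choices and assumptions, not necessarily the repo's exact settings.

```python
# Illustrative only: wrap a frozen int8 LLaMA-7B with small LoRA adapters.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-7b-hf",           # placeholder for converted weights
    load_in_8bit=True,
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only a few million trainable params
```

Only the small adapter matrices are trained while the 7B base model stays frozen in int8, which is why a single 24 GB consumer GPU is enough.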

(Figure: Iron Man and an alpaca)

LLaMA makes GPT-comparable capability broadly accessible, and the community has already recognized the potential of building on it, so academia, whose budgets are far less generous than industry's, can fully take part in this major opportunity in AI. As the training and inference efficiency of LLaMA keeps improving, perhaps everyone will be able to own a customized AI assistant like JARVIS in Iron Man.

[1] https://github.com/facebookresearch/llama
[2] https://github.com/tatsu-lab/stanford_alpaca
[3] https://github.com/tloen/llama-int8
[4] https://github.com/tloen/alpaca-lora
