[AI Hot Topic] Open-Source Alternatives to ChatGPT: The "Alpaca Family" of the LLaMA Series


If you ask what the hottest topic in tech is right now, many people's first reaction will be ChatGPT. Indeed, since the beginning of 2023, AIGC in general and ChatGPT in particular have dominated the conversation. But beyond ChatGPT, are there other comparable large language models? This article starts with an accidental LLaMA leak and introduces the biggest spark of innovation in the field of open-source LLMs.

LLaMA
In response to OpenAI's launch of ChatGPT, Meta AI (formerly Facebook AI) released its own large language model, LLaMA. It comes in several sizes, with 7B, 13B, 33B, and 65B parameters, and although it is smaller than GPT-3, it matches GPT-3's performance on many tasks. An accidental leak of its weights then set off the biggest spark of innovation the open-source LLM field has seen.

In a short period of time, a series of innovative projects built on LLaMA appeared, including Alpaca, Vicuna, Koala, ChatLLaMA, FreedomGPT, and ColossalChat. They are collectively known as the "alpaca family".

1. Alpaca

Alpaca
Alpaca is a new model fine-tuned from Meta's LLaMA 7B. It uses only 52K instruction-following examples, yet its performance is roughly comparable to GPT-3.5. More importantly, the training cost was extremely low: less than 600 US dollars.
Address: https://crfm.stanford.edu/2023/03/13/alpaca.html

Introduction
Instruction-following models such as GPT-3.5 (text-davinci-003), ChatGPT, Claude, and Bing Chat are getting more and more powerful. Many users now interact with these models regularly and even use them at work. However, despite their widespread deployment, instruction-following models still have many shortcomings: they can generate disinformation, propagate social stereotypes, and produce toxic language.

To make the most progress in addressing these pressing issues, the involvement of academia is critical. Unfortunately, researching instruction-following models in academia has been difficult because none of the easily accessible models come close in functionality to closed-source models like OpenAI's text-davinci-003.

Stanford University published Alpaca, an instruction-following language model fine-tuned from Meta's LLaMA 7B model. The team trained Alpaca on 52K instruction-following demonstrations generated with text-davinci-003 in the style of self-instruct. On the self-instruct evaluation set, Alpaca exhibits many behaviors similar to OpenAI's text-davinci-003, yet it is surprisingly small and easy and cheap to reproduce.

The team released their training recipe and data, and intends to release the model weights in the future. They also hosted an interactive demo to let the research community better understand Alpaca's behavior. Interactions can expose unexpected capabilities and failures, which will guide the team's later evaluation of these models. Users are also encouraged to report any concerning behaviors in the web demo so they can be better understood and mitigated. Since any release carries risks, the team's reasoning for this open release is discussed later in this article.

The authors emphasize that Alpaca is for academic research only; any commercial use is prohibited. Three factors drove this decision. First, Alpaca is based on LLaMA, which has a non-commercial license, so the restriction is inherited. Second, the instruction data is generated with OpenAI's text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI. Third, not enough safety measures have been designed in, so Alpaca is not ready for general deployment.

The figure below illustrates how the team obtained the Alpaca model. For the data, they used the self-instruct method to generate instruction-following demonstrations: starting from the 175 human-written instruction-output pairs in the self-instruct seed set, they prompted text-davinci-003 with examples from the seed set to generate more instructions. They improved on self-instruct by simplifying the generation pipeline (see details on GitHub) and greatly reducing cost. This process produced 52K unique instructions and corresponding outputs, at a cost of less than $500 via the OpenAI API.
The Birth Process of the Alpaca Model
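The self-instruct generation loop described above can be sketched as follows. This is a minimal, self-contained illustration, not Stanford's actual pipeline: `query_model` is a stub standing in for a text-davinci-003 API call, and the prompt format and `parse_pairs` helper are simplified assumptions.

```python
import random

def query_model(prompt):
    # Stub: in the real pipeline this would call the OpenAI completion API
    # with the few-shot prompt and return new instruction/output pairs.
    return "Instruction: Summarize the text.\nOutput: <summary>"

def parse_pairs(completion):
    """Parse 'Instruction: ... Output: ...' blocks into (instruction, output) tuples."""
    pairs = []
    for block in completion.split("Instruction:")[1:]:
        if "Output:" in block:
            instr, out = block.split("Output:", 1)
            pairs.append((instr.strip(), out.strip()))
    return pairs

def self_instruct(seed_pairs, target_size, examples_per_prompt=3):
    # Grow the pool by repeatedly prompting the model with sampled examples,
    # mirroring the seed-set bootstrapping described in the text.
    pool = list(seed_pairs)
    while len(pool) < target_size:
        examples = random.sample(pool, min(examples_per_prompt, len(pool)))
        prompt = "\n\n".join(f"Instruction: {i}\nOutput: {o}" for i, o in examples)
        prompt += "\n\nWrite more instruction/output pairs in the same style."
        pool.extend(parse_pairs(query_model(prompt)))
    return pool[:target_size]

seeds = [("Translate 'hello' to French.", "bonjour")]
data = self_instruct(seeds, target_size=5)
print(len(data))  # 5
```

The real pipeline additionally deduplicates and filters low-quality generations before the pairs enter the training set.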

2. Vicuna

In late March 2023, researchers from UC Berkeley, Carnegie Mellon, Stanford, and UC San Diego open-sourced Vicuna, a fine-tuned version of LLaMA that, by their own GPT-4-judged evaluation, reaches roughly 90% of ChatGPT's quality.
Vicuna
Introduction
The authors present Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluations using GPT-4 as a judge show that Vicuna-13B achieves more than 90% of the quality of OpenAI's ChatGPT and Google Bard, and outperforms other models such as LLaMA and Stanford Alpaca in more than 90% of cases. Training Vicuna-13B cost about $300. The code and weights, along with an online demo, are released for non-commercial use.
Chat with Open Large Language Models
Demo
Figure 2: Workflow Overview
The diagram above outlines the team's workflow. First, about 70,000 conversations were collected from ShareGPT.com, a website where users share their ChatGPT conversations. Next, the training script provided by Alpaca was enhanced to better handle multi-turn dialogue and long sequences. Training took one day on 8 A100 GPUs using PyTorch FSDP. To serve the demo, the team implemented a lightweight distributed serving system. An initial assessment of model quality was performed by creating a set of 80 diverse questions and using GPT-4 to judge the models' outputs: to compare two models, their outputs for each question are combined into a single prompt, which is then sent to GPT-4 to evaluate which model gave the better response.
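The pairwise judging step just described can be sketched as follows. Note the prompt wording is an assumption rather than Vicuna's exact template, and `ask_judge` is a stub standing in for the GPT-4 call:

```python
def build_judge_prompt(question, answer_a, answer_b):
    # Combine both models' answers to one question into a single judge prompt.
    return (
        "You are a helpful and precise assistant for checking the quality "
        "of two answers.\n"
        f"[Question]\n{question}\n\n"
        f"[Assistant A]\n{answer_a}\n\n"
        f"[Assistant B]\n{answer_b}\n\n"
        "Compare the two answers and reply with 'A' or 'B' for the better one."
    )

def ask_judge(prompt):
    # Stub judge: the real judge is GPT-4; this placeholder just prefers
    # the longer answer so the sketch runs end to end.
    a = prompt.split("[Assistant A]\n")[1].split("\n\n[Assistant B]")[0]
    b = prompt.split("[Assistant B]\n")[1].split("\n\nCompare")[0]
    return "A" if len(a) >= len(b) else "B"

prompt = build_judge_prompt("What is 2+2?", "4", "It is 4, because 2+2=4.")
print(ask_judge(prompt))  # B
```

In the actual evaluation, GPT-4 is also asked to explain its choice, which makes the judgments easier to audit.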

3. Koala

UC Berkeley's AI research lab (BAIR) released a new model, "Koala". Unlike earlier work that fine-tunes on instruction data generated by OpenAI's GPT models, Koala's distinguishing feature is that it is trained on high-quality data collected from the web.
Koala
In this work, the authors introduce Koala, a chatbot trained by fine-tuning Meta's LLaMA on dialogue data collected from the web. They describe their dataset curation and training process, and present the results of a user study comparing the model with ChatGPT and Stanford's Alpaca. The findings show that Koala can respond effectively to a variety of user queries, generating responses that are generally preferred over Alpaca's, and that it at least ties with ChatGPT in more than half of the cases.
Koala
The authors hope these results contribute to the discussion around the relative performance of large closed-source models versus smaller public ones. In particular, they show that sufficiently small models, if trained on carefully collected data, can run locally while capturing much of the performance of their larger cousins. This may mean, for example, that the community should devote more effort to curating high-quality datasets, which may do more for safer, more factual, and more capable models than simply scaling up existing systems. The authors stress that Koala is a research prototype: while they hope its release provides a valuable community resource, it still has significant flaws in content, safety, and reliability, and should not be used outside of research.
Online interactive demo

4. ChatLLaMA

Nebuly has open-sourced ChatLLaMA, a framework that lets you create conversational assistants using your own data.
ChatLLaMA
ChatLLaMA is a library that allows you to create hyper-personalized ChatGPT-like assistants using your own data and as little computation as possible. Instead of relying on one big assistant that “rules us all,” we envision a future where each of us can create our own personalized version of a ChatGPT-like assistant. Imagine a future where many ChatLLaMAs at the "edge" will support various human needs. However, creating a personalized assistant at the "edge" requires a huge optimization effort in multiple aspects: dataset creation, efficient training of RLHF, and inference optimization.
ChatLLaMA
This library aims to simplify the development of hyper-personalized ChatLLaMA assistants. Its purpose is to give developers peace of mind by abstracting away the work required for compute optimization and large-scale data collection. ChatLLaMA is designed to help developers tackle various use cases, all related to RLHF training and inference optimization. Here are some reference use cases:

  • Creating ChatGPT-like personalized assistants for vertical-specific tasks (legal, medical, gaming, academic research, etc.);
  • Training an efficient ChatGPT-like assistant on local hardware infrastructure using limited data;
  • Creating your own personalized version of a ChatGPT-like assistant while avoiding runaway costs;
  • Understanding which model architecture (LLaMA, OPT, GPT-J, etc.) best meets your requirements in terms of hardware, compute budget, and performance;
  • Aligning the assistant with your personal or company values, culture, brand, and manifesto.

5. FreedomGPT

Built with Electron and React, FreedomGPT is a desktop application that allows users to run LLaMA on their local machines.
FreedomGPT
The defining characteristic of FreedomGPT is right in its name: the answers it gives are not subject to any censorship or safety filtering. The program was developed by Age of AI, an AI venture capital firm. FreedomGPT is built on top of Alpaca, chosen because Alpaca is relatively more accessible and customizable than other models.

ChatGPT follows OpenAI's usage policies, which restrict hate, self-harm, threats, violence, and sexual content. Unlike ChatGPT, FreedomGPT answers questions without such filtering and does not hesitate to address controversial or sensitive topics.

FreedomGPT will even answer "how to make a bomb at home", a capability OpenAI specifically removed from GPT-4. FreedomGPT is unusual in that it bypasses censorship restrictions and engages with controversial topics without any safeguards. Its logo is the Statue of Liberty, symbolizing the freedom this bold language model claims to stand for.

FreedomGPT can even run locally on a computer without the need for an internet connection.

6. ColossalChat

Colossal-AI was developed based on the expertise of Professor James Demmel, Distinguished Professor at UC Berkeley, and Professor Yang You, Presidential Young Professor at the National University of Singapore. Since its open-source release, Colossal-AI has ranked first on GitHub Trending several times, has about 20,000 GitHub stars, and has been accepted as an official tutorial at top international AI and HPC conferences such as SC, AAAI, PPoPP, CVPR, and ISC.

ColossalChat needs fewer than 10 billion parameters to achieve bilingual Chinese-English capability, with results comparable to ChatGPT and GPT-3.5. Moreover, ColossalChat, built on the LLaMA model, reproduces the complete RLHF pipeline, making it currently the open-source project closest to ChatGPT's original technical route.
https://chat.colossalai.org/

Complete ChatGPT Cloning Solution

ColossalChat is the first open-source implementation of a complete RLHF pipeline based on a LLaMA pre-trained model, covering supervised data collection, supervised fine-tuning, reward-model training, and reinforcement-learning fine-tuning. The ChatGPT-style training process can be replicated with as little as 1.6 GB of GPU memory, with a 7.73x training speedup. The release includes the following:

  • Demo: An interactive demo to try online, with no registration or waiting list.
  • Training code: Complete open-source RLHF training code, covering the 7B and 13B models.
  • Dataset: An open-source 104K bilingual Chinese-English dataset.
  • Inference: 4-bit quantized inference for the 7-billion-parameter model, requiring only 4 GB of GPU memory.
  • Model weights: Rapid reproduction achievable on a single server with only modest compute.
  • Larger models, datasets, and other optimizations will be added quickly.
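The 4 GB figure for 4-bit inference above can be sanity-checked with back-of-the-envelope arithmetic (weights only; activations and quantization metadata such as scales add some overhead in practice):

```python
# 7 billion weights at 4 bits each.
params = 7e9
bits_per_param = 4
weight_bytes = params * bits_per_param / 8  # 3.5e9 bytes
print(f"{weight_bytes / 1024**3:.2f} GiB")  # 3.26 GiB, consistent with "only 4GB"
```

The remaining headroom up to 4 GB covers the KV cache and quantization bookkeeping.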

While models in the GPT family, such as ChatGPT and GPT-4, are very powerful, they are unlikely to be fully open-sourced. Fortunately, the open-source community has been working hard to fill the gap.
For example, Meta has open-sourced the LLaMA models, with parameter counts ranging from 7 billion to 65 billion. The 13-billion-parameter model can outperform the 175-billion-parameter GPT-3 on most benchmarks. However, since LLaMA has no instruction-tuning stage, the actual quality of its generations is unsatisfactory.
Stanford's Alpaca generates training data in a self-instruct fashion by calling OpenAI's API. With only 7 billion parameters, this lightweight model can be fine-tuned at a fraction of the cost to achieve conversational performance similar to very large language models such as the 175-billion-parameter GPT-3.5.
However, existing open-source solutions only cover the supervised fine-tuning of the first stage of RLHF (Reinforcement Learning from Human Feedback), without the subsequent alignment and fine-tuning stages. In addition, Alpaca's training dataset is limited to English, which limits the model's performance to some extent.
Yet the impressive performance of ChatGPT and GPT-4 owes much to the introduction of RLHF during training, which improves the alignment of generated content with human values.
RLHF

Chinese-English bilingual training data set

ColossalChat released a bilingual dataset containing about 100,000 Chinese-English question-answer pairs. The seed dataset was collected and cleaned from real question scenarios on social media platforms, then augmented with self-instruct, at a labeling cost of about $900. Compared with datasets generated by other self-instruct methods, this dataset contains more realistic and diverse seed data and covers a wider range of topics. It is suitable for both fine-tuning and RLHF training. With this high-quality data, ColossalChat achieves better conversational interactions, and supports Chinese as well.
Bilingual training dataset

Complete RLHF pipeline

The RLHF algorithm replication involves three stages:

  • In RLHF-Stage1, the dataset described above is used for supervised instruction fine-tuning of the model.
  • In RLHF-Stage2, the reward model is trained: humans rank different outputs for the same prompt and assign corresponding scores, which then supervise the reward model's training.
  • In RLHF-Stage3, the reinforcement-learning algorithm is applied; this is the most complex part of the training process:
    RLHF-Stage3
    In the PPO part, ColossalChat follows a two-stage process: first comes the make-experience stage, which uses the SFT (Supervised Fine-Tuning) model, actor, RM (Reward Model), and critic to compute generated experience and store it in the buffer; then comes the parameter-update stage, which uses that experience to compute the policy loss and value loss.
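The reward-model training of RLHF-Stage2 is commonly implemented as a pairwise ranking objective, -log sigmoid(r_chosen - r_rejected), which pushes the preferred response's score above the rejected one's. A minimal sketch (an illustration of the standard objective, not ColossalChat's actual code):

```python
import numpy as np

def reward_ranking_loss(r_chosen, r_rejected):
    # -log sigmoid(r_chosen - r_rejected), written as log1p(exp(-diff))
    # for numerical stability.
    diff = np.asarray(r_chosen) - np.asarray(r_rejected)
    return float(np.mean(np.log1p(np.exp(-diff))))

# The loss shrinks as the reward model separates chosen from rejected responses.
print(round(reward_ranking_loss([2.0], [0.0]), 3))  # 0.127
print(round(reward_ranking_loss([0.5], [0.0]), 3))  # 0.474
```

With human rankings over several outputs per prompt, every ordered pair in the ranking contributes one such term.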

In the PTX part, ColossalChat computes the cross-entropy loss between the actor's output response and the response portion of the input corpus. This loss adds pre-training gradients to the PPO gradient so that the language model retains its original capabilities and avoids forgetting. Finally, the policy loss, value loss, and PTX loss are summed for backpropagation and parameter updates.
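The loss combination described above can be sketched numerically. This is a toy illustration with made-up scalar inputs, not ColossalChat's implementation: a clipped PPO policy loss, a squared-error value loss, and a PTX cross-entropy term, summed into one objective.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ppo_policy_loss(logp_new, logp_old, advantage, clip=0.2):
    # Clipped surrogate objective from PPO.
    ratio = np.exp(logp_new - logp_old)
    return -np.mean(np.minimum(ratio * advantage,
                               np.clip(ratio, 1 - clip, 1 + clip) * advantage))

def value_loss(values, returns):
    # Critic regression target: squared error against the returns.
    return np.mean((values - returns) ** 2)

def ptx_loss(logits, target_ids):
    # Cross-entropy between the actor's logits and pre-training corpus tokens.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(target_ids)), target_ids]))

policy = ppo_policy_loss(np.array([-1.0]), np.array([-1.1]), np.array([0.5]))
value = value_loss(np.array([0.9]), np.array([1.0]))
ptx = ptx_loss(np.array([[2.0, 0.0]]), np.array([0]))
total = policy + value + ptx  # summed, then backpropagated in the real pipeline
print(f"{total:.3f}")
```

In practice the PTX term is usually weighted by a coefficient to balance it against the RL objective.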

Related Links

  1. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
  2. Koala: A Dialogue Model for Academic Research
  3. ColossalChat: an open-source solution for cloning ChatGPT and the full RLHF pipeline

Origin: blog.csdn.net/ARPOSPF/article/details/130222883