A brief history of open-source large models: a key driver in catching up with ChatGPT

Large language models (LLMs) have revolutionized the field of artificial intelligence, and their long-term impact continues to grow. OpenAI's ChatGPT, a highly capable conversational AI, has made major breakthroughs in recent months, sparking fierce competition among companies and researchers racing to build conversational systems that can rival it.

Google contributed Bard, a conversational AI initially powered by its LaMDA model and later by PaLM, while OpenAI developed GPT-4, a large-scale language model with multimodal capabilities. Meta, in turn, developed its own LLM, called LLaMa, partly as a response to the push for open-source LLMs. A great deal of information about state-of-the-art LLMs has emerged recently, not least because Meta chose to share LLaMa only with the research community, for non-commercial purposes.

Interestingly, LLaMa's weights were eventually leaked, enabling anyone, not just experts or commercial entities, to try out these high-performance models for themselves.

Meta released LLaMa on February 24, 2023, with the main goal of giving the academic research community access to a top-tier LLM. The team presented four versions with different parameter counts: 7B, 13B, 33B and 65B. Like other large language models, LLaMa generates text autoregressively: it takes a sequence of words as input and repeatedly predicts the next word. According to the paper, LLaMa-13B surpasses GPT-3 (175B) on most benchmarks, while LLaMa-65B is comparable to the best models, such as Chinchilla-70B (DeepMind) and PaLM-540B (Google).
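For readers unfamiliar with this next-word loop, the sketch below shows what greedy autoregressive decoding looks like with the Hugging Face transformers library. The local checkpoint path is a hypothetical placeholder for LLaMa weights already converted to the Hugging Face format; this is an illustration of the idea, not Meta's own code.

```python
# Minimal sketch of autoregressive (next-token) generation.
# Assumes a LLaMa checkpoint already converted to Hugging Face format
# at the hypothetical path "./llama-7b-hf"; not an official example.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("./llama-7b-hf")
model = AutoModelForCausalLM.from_pretrained("./llama-7b-hf", torch_dtype=torch.float16)
model.eval()

prompt = "Open-source language models are"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding: repeatedly predict the most likely next token
# and append it to the sequence, exactly the loop described above.
with torch.no_grad():
    for _ in range(30):
        logits = model(input_ids).logits      # [1, seq_len, vocab_size]
        next_id = logits[0, -1].argmax()      # most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```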

The LLaMa code is publicly available through the Facebook Research GitHub for non-commercial use by the research community. However, only the model code is public; the trained weights had to be requested separately via a Google Form for research purposes. It is worth noting that training LLaMa at this scale required 2048 A100 GPUs, each costing about $15,000, which shows the enormous resources needed to create such a model.

Besides the compute cost, a large and clean dataset is crucial for training LLaMa. These models require trillions of tokens of training data: 1.4 trillion tokens for LLaMa-65B and LLaMa-33B, and 1 trillion tokens for LLaMa-7B. Starting from these pre-trained LLMs, fine-tuning can then produce dialogue models capable of human interaction, much like ChatGPT.

An important challenge, however, is obtaining the fine-tuning data without spending millions of dollars on manual annotation, which is what OpenAI did to train InstructGPT (the model behind ChatGPT).

Researchers at Stanford University found an inexpensive way to fine-tune LLaMa. They presented Alpaca-7B, a model fine-tuned from LLaMa-7B on 52,000 instruction-following demonstrations. A key problem with instruction-following models such as ChatGPT is that they can generate disinformation, propagate social stereotypes, and produce harmful language.

To address these problems, OpenAI created InstructGPT, spending millions of dollars on human feedback to rate model answers (reinforcement learning from human feedback, RLHF). However, OpenAI does not publicly disclose the dataset used to train InstructGPT, which makes replicating such models a challenge. The Stanford researchers worked around this by using text-davinci-003, a model built on top of InstructGPT, to generate 52,000 instruction-following examples from 175 human-written self-instruct seed tasks.
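To make the idea concrete, here is a rough sketch of prompting text-davinci-003 to produce a new instruction-response pair, using the legacy OpenAI completions API (the openai Python package before version 1.0). The prompt and seed task below are illustrative placeholders, not the Stanford team's actual self-instruct prompt.

```python
# Rough sketch of self-instruct-style data generation with text-davinci-003.
# Uses the legacy OpenAI Python API (openai < 1.0); the prompt and seed task
# are illustrative placeholders, not the Stanford team's actual pipeline.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

seed_task = "Give three tips for staying healthy."  # illustrative seed task

prompt = (
    "You are asked to come up with a new instruction-following example.\n"
    f"Here is an existing task for inspiration: {seed_task}\n"
    "Write a new instruction and a high-quality response in the form:\n"
    "Instruction: ...\nResponse: ..."
)

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=256,
    temperature=1.0,
)
print(resp["choices"][0]["text"])
```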

According to the Stanford team, it cost about $500 to generate the 52,000 instruction-following examples and about $100 to train the model on eight 80GB A100 GPUs for just three hours. Despite the much smaller model size, Alpaca and text-davinci-003 perform similarly in human evaluations of answer quality.
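As a rough idea of what that fine-tuning step involves, here is a simplified supervised fine-tuning sketch using the Hugging Face Trainer. It is not the Stanford team's released training script; the checkpoint path, data file and hyperparameters are placeholders.

```python
# Simplified sketch of supervised fine-tuning on instruction data.
# NOT the Stanford team's actual training script; the checkpoint path,
# data file name and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_path = "./llama-7b-hf"                  # hypothetical local checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token     # LLaMa ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_path)

# Instruction/response pairs in a JSON file, e.g. released Alpaca-style data.
data = load_dataset("json", data_files="alpaca_data.json")["train"]

def to_text(example):
    # Concatenate instruction and answer into one training string (simplified).
    return {"text": f"### Instruction:\n{example['instruction']}\n\n"
                    f"### Response:\n{example['output']}"}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

data = data.map(to_text)
data = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="alpaca-sft", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-5),
    train_dataset=data,
    # Causal LM collator copies input_ids to labels for next-token prediction.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```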

Additionally, Vicuna is built on top of the original LLaMa model and is reported to perform nearly as well as OpenAI's ChatGPT or Google's Bard on instruction-following tasks, at a total training cost of just $300. Two versions of Vicuna have been released for non-commercial use, with 7B and 13B parameters. A major upgrade in Vicuna over previous models is the increase in maximum context length, from 512 tokens in Alpaca to 2048 tokens.

However, a limitation of these models remains their size and memory requirements: deploying them carries high energy and financial costs. This has led some developers to believe that only enterprises with large-scale infrastructure can really benefit from these models. The work of Georgi Gerganov on llama.cpp has changed that.

Gerganov's llama.cpp takes LLMs to a new level by reimplementing LLM inference, originally written in Python, in C/C++. C/C++ compiles directly to native machine code and needs no interpreter, so it executes faster. In addition, the code supports 4-bit quantization, a process of mapping 32-bit floating-point values such as weights and activations to the nearest of a small set of 4-bit integer levels, which makes the model smaller and inference faster.
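To see what quantization does in principle, here is a toy, symmetric per-tensor 4-bit scheme in Python. llama.cpp's real formats work block-wise with per-block scales, so this only illustrates the core idea.

```python
# Toy illustration of 4-bit quantization: map float32 weights onto 16 integer
# levels and back. llama.cpp's real formats are block-wise with per-block
# scales; this simplified per-tensor version only shows the basic idea.
import numpy as np

def quantize_4bit(weights: np.ndarray):
    scale = np.abs(weights).max() / 7.0          # 4-bit signed range: -8..7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(8).astype(np.float32)        # pretend these are model weights
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)

print("original:     ", np.round(w, 3))
print("4-bit levels: ", q)
print("reconstructed:", np.round(w_hat, 3))
# Each weight now needs only 4 bits (plus a shared scale), roughly an 8x
# reduction versus float32, at the cost of a small rounding error.
```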

Thanks to the contributions of Gerganov and others, plus the leaked LLaMa weights, it is now possible to run instruction-following models such as Alpaca or Vicuna directly on a laptop. Multiple projects show how to use llama.cpp to run Vicuna on personal devices, paving the way for accessible, open-source AI progress without significant resource constraints.
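From Python, one convenient route is the llama-cpp-python bindings for llama.cpp. The sketch below assumes you have already converted and quantized a model yourself with the tools in the llama.cpp repository; the file name and prompt format are placeholders.

```python
# Sketch of running a 4-bit quantized model locally via the llama-cpp-python
# bindings for llama.cpp. The model path is a placeholder for weights you
# have converted and quantized yourself.
from llama_cpp import Llama

llm = Llama(model_path="./models/vicuna-7b-q4_0.bin", n_ctx=2048)

output = llm(
    "### Human: Explain what quantization does to a language model.\n### Assistant:",
    max_tokens=128,
    stop=["### Human:"],
)
print(output["choices"][0]["text"])
```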

Origin blog.csdn.net/robot_learner/article/details/131201824