Introducing LLaMA: A foundational, 65-billion-parameter large language model

February 24, 2023

UPDATE: We just launched Llama 2 - see our blog post on Llama 2 for more information on the latest release.

As part of Meta's commitment to open science, today we are publicly releasing LLaMA (Large Language Model Meta AI), a state-of-the-art foundational large language model designed to help researchers advance work in this subfield of artificial intelligence. Smaller, higher-performing models such as LLaMA enable others in the research community who do not have access to extensive infrastructure to study these models, further democratizing access to this important and rapidly changing field.

Training smaller foundation models like LLaMA is desirable in the large language model space because it requires far less computing power and resources to test new approaches, validate others' work, and explore new use cases. Foundation models are trained on large amounts of unlabeled data, which makes them ideal for fine-tuning on a variety of tasks. We are making LLaMA available in multiple sizes (7B, 13B, 33B, and 65B parameters) and sharing a LLaMA model card detailing how we built the model in keeping with our approach to responsible AI practices.

Over the last year, large language models (natural language processing systems with billions of parameters) have shown new capabilities for generating creative text, solving mathematical theorems, predicting protein structures, answering reading comprehension questions, and more. They are one of the clearest examples of the enormous potential benefits AI can deliver to billions of people at scale.

Despite all the recent progress in large language models, comprehensive research access to them remains limited due to the resources required to train and run such large models. This restricted access limits researchers' ability to understand how and why these large language models work, hampering progress in efforts to improve their robustness and mitigate known issues such as bias, toxicity, and the potential for misinformation.

Smaller models trained on more tokens (i.e., word fragments) are easier to retrain and fine-tune for specific potential product use cases. We trained LLaMA 65B and LLaMA 33B on 1.4 trillion tokens. Our smallest model, LLaMA 7B, was trained on one trillion tokens.
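To make the idea of "word fragments" concrete, here is a toy sketch of subword tokenization using a hypothetical six-piece vocabulary and a greedy longest-match rule. This is an illustration only: LLaMA's actual tokenizer uses byte-pair encoding (via SentencePiece) with a much larger vocabulary.

```python
# Toy illustration of subword tokenization: splitting words into fragments.
# The vocabulary and the greedy longest-match scheme are simplified stand-ins,
# not LLaMA's actual BPE tokenizer.
from typing import List, Set

TOY_VOCAB: Set[str] = {"token", "iza", "tion", "fine", "tun", "ing"}

def tokenize(word: str, vocab: Set[str]) -> List[str]:
    """Greedily split a word into the longest known subword pieces."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])         # unknown: fall back to one character
            i += 1
    return pieces

print(tokenize("tokenization", TOY_VOCAB))  # -> ['token', 'iza', 'tion']
print(tokenize("finetuning", TOY_VOCAB))    # -> ['fine', 'tun', 'ing']
```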

Like other large language models, LLaMA works by taking a sequence of words as input and predicting the next word, recursively generating text. To train the model, we chose text from the 20 languages with the most speakers, focusing on those with Latin and Cyrillic scripts.
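The sketch below shows this autoregressive decoding loop in miniature: predict one token, append it to the input, and repeat. The tiny next-token table is a hypothetical stand-in for the trained network; only the loop itself mirrors how models like LLaMA generate text.

```python
# Minimal sketch of autoregressive generation: predict the next token,
# feed it back as input, repeat. The lookup table below is a hypothetical
# stand-in for a real model's forward pass.
from typing import Dict, List

NEXT_TOKEN: Dict[str, str] = {
    "<s>": "the",
    "the": "model",
    "model": "predicts",
    "predicts": "the",
}

def predict_next(tokens: List[str]) -> str:
    """Stand-in for a forward pass: return the most likely next token."""
    return NEXT_TOKEN.get(tokens[-1], "</s>")  # greedy decoding

def generate(prompt: List[str], max_new_tokens: int = 6) -> List[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)  # predict one token from the context so far
        if nxt == "</s>":           # stop at the end-of-sequence token
            break
        tokens.append(nxt)          # feed the prediction back as new input
    return tokens

print(" ".join(generate(["<s>"])))  # -> "<s> the model predicts the model predicts"
```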

More research is needed to address the risks of bias, toxic comments, and hallucinations in large language models. Like other models, LLaMA faces these challenges. As a foundation model, LLaMA is designed to be versatile and applicable to many different use cases, rather than a fine-tuned model built for a specific task. By sharing the code for LLaMA, other researchers can more easily test new approaches to limiting or eliminating these problems in large language models. We also provide in the paper a set of benchmark evaluations assessing model bias and toxicity, to show the model's limitations and to support further research in this critical area.

To maintain integrity and prevent misuse, we will release our model under a non-commercial license focused on research use cases. Access to the model will be granted on a case-by-case basis to academic researchers; those affiliated with government, civil society, and academic organizations; and industry research laboratories around the world. Those interested in applying for access can find a link to the application in our research paper.

We believe that the entire AI community—academic researchers, civil society, policymakers, and industry—must work together to develop clear guidelines around responsible AI, and in particular responsible large language models. We look forward to seeing what the community can learn and ultimately build using LLaMA.
