[Meta AI] Open-source models and tools released by Meta AI in 2023

Meta AI

Meta CEO Mark Zuckerberg has said that sharing the models Meta develops with other researchers helps the company promote innovation, discover security vulnerabilities, and reduce costs. "For us, if the industry standardizes on the basic tools that we are using, then we can benefit from the improvements of others," he told investors in April.

Llama

2023.02.24
LLaMA: Open and Efficient Foundation Language Models
A collection of foundation language models ranging from 7B to 65B parameters. The models are trained on trillions of tokens, and the paper shows that state-of-the-art models can be trained exclusively on publicly available datasets, without resorting to proprietary and inaccessible data. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, while LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B.
Is Meta's open-source LLaMA easy to use? The most complete evaluation results are here - Xixiaoyao Technology says
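The released weights are research-licensed, but once converted to the Hugging Face Transformers format they load like any causal language model. A minimal sketch, assuming converted weights at a hypothetical local path:

```python
# Minimal sketch: run a converted LLaMA checkpoint with Hugging Face Transformers.
# "path/to/llama-13b-hf" is a hypothetical local path to weights already converted
# to the Transformers format; it is not part of the official release.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "path/to/llama-13b-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```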

Segment Anything

2023.04.05
Segment Anything Model (SAM) is a general-purpose segmentation model
https://arxiv.org/abs/2304.02643
[segment-anything] - Meta open-sources an AI model that can segment anything (an earlier blog post)
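The segment-anything repository exposes a simple predictor API. A minimal sketch, assuming the ViT-H checkpoint from the repo has been downloaded and a local image file exists (the image path is a placeholder):

```python
# Minimal sketch of prompt-based segmentation with the segment-anything package.
# "sam_vit_h_4b8939.pth" is the ViT-H checkpoint name from the repo; "example.jpg"
# is a placeholder image path.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground point (x, y); label 1 marks it as foreground.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
print(masks.shape, scores)  # three candidate masks with quality scores
```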

DINOv2

2023.04.18
State-of-the-art computer vision models trained with self-supervised learning

  • Meta AI has built DINOv2, a new method for training high-performance computer vision models.
  • DINOv2 delivers strong performance and requires no fine-tuning, which makes it suitable as a backbone for many different computer vision tasks.
  • Because it uses self-supervision, DINOv2 can learn from any collection of images. It can also learn features that current standard approaches cannot, such as depth estimation.
  • We are open-sourcing our model and sharing interactive demos (a minimal loading sketch follows).
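Following the facebookresearch/dinov2 README, the pretrained backbones can be pulled via torch.hub and used as frozen feature extractors; the image path below is a placeholder:

```python
# Minimal sketch: load a pretrained DINOv2 backbone via torch.hub and extract
# image features without any fine-tuning. "example.jpg" is a placeholder path.
import torch
from PIL import Image
from torchvision import transforms

dinov2_vits14 = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
dinov2_vits14.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is divisible by the ViT-S/14 patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    features = dinov2_vits14(image)  # one embedding vector per image
print(features.shape)
```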

ImageBind

2023.05.09
Article address
GitHub repository
ImageBind lets models work across six different modalities (image, text, audio, depth, thermal/infrared, and IMU data). Based on this project, developers can implement a variety of emerging applications "out of the box", including cross-modal retrieval, composing modalities with arithmetic, and cross-modal detection and generation.
ImageBind is a multi-modal AI model that embeds text, audio, visual, thermal (infrared), and IMU data into a single vector space.
The demo shows image-to-audio and audio-to-image retrieval, text-to-image-and-audio retrieval, retrieval from an image and audio clip combined, and feeding audio embeddings into other generative models to produce images.
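A minimal sketch following the example in the ImageBind repository: embed text, an image, and an audio clip into the shared space and compare them with dot products (the file paths are placeholders):

```python
# Minimal sketch: joint text/image/audio embeddings with ImageBind.
# "dog.jpg" and "bark.wav" are placeholder file paths.
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"
model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

inputs = {
    ModalityType.TEXT: data.load_and_transform_text(["a dog", "a car"], device),
    ModalityType.VISION: data.load_and_transform_vision_data(["dog.jpg"], device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(["bark.wav"], device),
}

with torch.no_grad():
    embeddings = model(inputs)

# Cross-modal similarity: image vs. text, audio vs. text
print(torch.softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.TEXT].T, dim=-1))
print(torch.softmax(embeddings[ModalityType.AUDIO] @ embeddings[ModalityType.TEXT].T, dim=-1))
```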

MMS

2023.05.23
GitHub repository address
The open-source MMS model can recognize 1,100+ languages - Xinzhiyuan
Massively Multilingual Speech:
Built on self-supervised learning with wav2vec 2.0, MMS expands speech technology to more than 1,100 languages for speech-to-text and text-to-speech, and can identify over 4,000 spoken languages.

  • Text-to-speech
  • Speech-to-text
  • It can speak more than 1,100 languages and identify over 4,000.
    The most widely used model before this was probably OpenAI's Whisper; Meta's paper reports roughly half of Whisper's word error rate (a usage sketch follows).
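A minimal ASR sketch, assuming the MMS checkpoints Meta published on the Hugging Face Hub (facebook/mms-1b-all) and a 16 kHz mono recording at a placeholder path:

```python
# Minimal sketch: MMS speech-to-text via Transformers, switching the adapter to French.
# "speech.wav" is a placeholder path to a 16 kHz mono recording.
import torch
import torchaudio
from transformers import AutoProcessor, Wav2Vec2ForCTC

processor = AutoProcessor.from_pretrained("facebook/mms-1b-all")
model = Wav2Vec2ForCTC.from_pretrained("facebook/mms-1b-all")

# Select the target language by its ISO code (here French).
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

waveform, sample_rate = torchaudio.load("speech.wav")
inputs = processor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```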

LIMA

2023.05.23
Paper address
No RLHF, yet still comparable to GPT-4 and Bard: Meta releases the 65-billion-parameter language model LIMA - Heart of the Machine

LIMA builds on LLaMA. The core idea behind LIMA is that pre-training is already strong enough; a small amount of supervised fine-tuning (SFT) on a few examples of your task is all that is needed to activate that capability for your task.

LIMA is Meta's new large language model (LLM). It is based on the 65B LLaMA, fine-tuned on only 1,000 samples, and performs on par with current state-of-the-art LLMs. The implication is that an LLM does not need many fine-tuning examples, and the fine-tuning effort does not have to be "big".

LIMA, a fine-tuned version of the large LLaMA model, reportedly achieves very good results using only 1,000 carefully curated prompts and responses for fine-tuning.

From the paper: we measure the relative importance of these two stages (pre-training and instruction tuning) by training LIMA, a 65-billion-parameter LLaMA language model fine-tuned with only a standard supervised loss on 1,000 curated prompts and responses, without any reinforcement learning or human preference modeling.

LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning travel itineraries to speculating about alternate history.

Furthermore, the model tends to generalize well to new tasks that do not appear in the training data. In a controlled human study, LIMA's responses were equivalent to or strictly preferred over GPT-4's in 43% of cases; the figure rises to 58% against Bard and 65% against DaVinci003, which was trained with human feedback.

Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pre-training, and only a limited amount of instruction-tuning data is needed to teach the model to produce high-quality output.
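LIMA itself was not released as code or weights, so the following is only a hypothetical sketch of the recipe the paper describes: plain supervised fine-tuning of a pretrained causal LM on a small, curated set of prompt/response pairs, with no RLHF. The base checkpoint name and the single example pair are placeholders:

```python
# Hypothetical sketch of the LIMA-style recipe: standard supervised fine-tuning
# on ~1,000 curated prompt/response pairs, with no reinforcement learning.
# "huggyllama/llama-7b" and the single example below are placeholders.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

pairs = [  # in practice: ~1,000 carefully curated examples
    {"prompt": "Plan a two-day trip to Kyoto.", "response": "Day 1: ..."},
]

def tokenize(example):
    text = example["prompt"] + "\n" + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

dataset = Dataset.from_list(pairs).map(tokenize, remove_columns=["prompt", "response"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lima-sft", num_train_epochs=15,
                           per_device_train_batch_size=1, learning_rate=1e-5),
    train_dataset=dataset,
    # mlm=False makes the collator copy inputs into labels: the standard LM loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```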

Voicebox

2023.06.16
Article address

Meta AI has developed a speech generation model that is advanced in every respect: Voicebox.
Unlike other speech generation AIs, which require task-specific training on carefully prepared data, Voicebox uses a new approach that learns from raw audio and its accompanying transcriptions alone. This makes the model more flexible and better able to adapt to a variety of tasks.

MusicGen

2023.06.19
Official website
Experience address
Demo address

Simple and controllable music generation model

MusicGen is a single-stage autoregressive Transformer model trained over a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.

  • A single language model (LM) for conditional music generation
  • Operates on compressed discrete music tokens, with no need to cascade multiple models
  • Generates high-quality samples guided by text or melody
  • Extensive evaluation shows that MusicGen outperforms baseline models
  • Ablations highlight the importance of each of MusicGen's components (a usage sketch follows)
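A minimal sketch following the audiocraft README: generate a few seconds of music from a text description with a pretrained MusicGen checkpoint (the description is a placeholder):

```python
# Minimal sketch: text-to-music with a pretrained MusicGen via the audiocraft package.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio to generate

descriptions = ["lo-fi hip hop with a mellow piano melody"]
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

for idx, one_wav in enumerate(wav):
    # Writes musicgen_0.wav, loudness-normalized as in the README example.
    audio_write(f"musicgen_{idx}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```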

Llama 2

2023.07.18
Article address

Meta releases Llama 2, free for commercial use, and the large-model landscape shifts dramatically again

  1. Comes in three sizes: 70 billion, 13 billion, and 7 billion parameters (Llama 2-70B, -13B, and -7B), all using the Transformer architecture.
  2. Compared with Llama 1, the training data is increased by 40% and the context length is doubled. Performance is significantly improved, nearly rivaling the proprietary GPT-3.5.
  3. Llama 2-Chat is a dialogue-optimized version. Through supervised fine-tuning and RLHF, it outperforms other open-source models in the naturalness and coherence of single-turn and multi-turn conversations, and is comparable to ChatGPT (a chat sketch follows this list).
  4. Model safety has been strengthened: a range of techniques is used to reduce harmful output, and safety evaluation results are better than those of other open-source models.
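A minimal sketch of querying Llama 2-Chat through Transformers, assuming access to the gated meta-llama weights on the Hugging Face Hub has been granted; the prompt text is a placeholder, but the [INST]/<<SYS>> template is the one Llama 2-Chat expects:

```python
# Minimal sketch: chat with Llama 2-Chat via Hugging Face Transformers.
# Requires access to the gated meta-llama/Llama-2-7b-chat-hf repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Llama 2-Chat prompt template: [INST] ... [/INST], with an optional <<SYS>> block.
prompt = (
    "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n"
    "Explain the difference between Llama 1 and Llama 2 in two sentences. [/INST]"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```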

AudioCraft

2023.08.02
Article address

AudioCraft is a simple framework that generates high-quality, realistic audio and music from text-based user input, trained on raw audio signals (rather than MIDI or piano rolls).

AudioCraft contains three models: MusicGen, AudioGen, and EnCodec. MusicGen, trained on music owned by or specifically licensed to Meta, generates music from text-based user input, while AudioGen, trained on public sound effects, generates audio from text-based user input. Alongside the framework, Meta released an improved version of the EnCodec decoder that produces higher-quality music with fewer artifacts; a pre-trained AudioGen model that can generate ambient sounds and sound effects such as a barking dog, a car horn, or footsteps on a wooden floor; and all AudioCraft model weights and code. The models can be used for research purposes and to further understanding of the technology (an AudioGen sketch follows).
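Mirroring the MusicGen example above, AudioCraft's AudioGen model handles text-to-sound-effect generation. A minimal sketch, assuming the audiogen-medium checkpoint published with AudioCraft (the text prompts are placeholders):

```python
# Minimal sketch: text-to-sound-effect generation with AudioCraft's AudioGen.
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=5)  # seconds of audio to generate

wav = model.generate(["a dog barking", "footsteps on a wooden floor"])
for idx, one_wav in enumerate(wav):
    # Writes audiogen_0.wav, audiogen_1.wav, loudness-normalized.
    audio_write(f"audiogen_{idx}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```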

SeamlessM4T

2023.08.22
Article address

This is a foundational multilingual and multitask model that can seamlessly translate and transcribe across speech and text. SeamlessM4T supports:

  • Automatic speech recognition for nearly 100 languages
  • Speech-to-text translation for nearly 100 input and output languages
  • Speech-to-speech translation, supporting nearly 100 input languages and 35 (+ English) output languages
  • Text-to-text translation for nearly 100 languages
  • Text-to-speech translation, supporting nearly 100 input languages and 35 (+ English) output languages (a usage sketch follows)
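A minimal sketch, assuming the seamless_communication package from the GitHub release and its Translator API; the language codes and the audio path are placeholders, and the task codes ("t2tt", "s2tt", "s2st", "t2st", "asr") follow the release README:

```python
# Minimal sketch: text and speech translation with the seamless_communication package.
# "speech.wav" is a placeholder path to a 16 kHz recording.
import torch
from seamless_communication.models.inference import Translator

translator = Translator(
    "seamlessM4T_large",               # model card name from the release
    vocoder_name_or_card="vocoder_36langs",
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

# Text-to-text translation: English -> French
text, _, _ = translator.predict(
    "Hello, how are you today?", "t2tt", tgt_lang="fra", src_lang="eng"
)
print(text)

# Speech-to-text translation from an audio file into English
transcript, _, _ = translator.predict("speech.wav", "s2tt", tgt_lang="eng")
print(transcript)
```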
