Meta open-sources AI audio and music generation models

Over the past few years, we have seen huge advances in AI for image, video, and text generation. However, progress in audio generation has lagged behind. Now Meta AI has contributed another major project to open source: AudioCraft, an audio generation framework that supports multiple audio generation models.


AudioCraft open source address

Open source address: https://github.com/facebookresearch/audiocraft

Note that while the framework's code is open source, the three models are released under a non-commercial license and cannot be used commercially.

AudioGen model address:

https://www.datalearner.com/ai-models/pretrained-models/AudioGen


MusicGen model address:

https://www.datalearner.com/ai-models/pretrained-models/MusicGen

Introduction to AudioCraft

Producing high-fidelity audio of any kind requires modeling complex signals and patterns at different scales. Music is perhaps the most challenging type of audio, as it involves both local and long-range patterns, from a sequence of notes to global musical structure across multiple instruments. Generating coherent music with AI has often been approached through symbolic representations such as MIDI or piano rolls, but these methods cannot fully capture the performance nuances and stylistic elements of music.

To this end, Meta AI has open-sourced AudioCraft, a framework for generating audio. It supports a range of models, produces high-quality audio with long-term consistency, and lets users interact with it easily through a natural interface.

AudioCraft covers music generation, sound generation, and compression on a single platform. Because it is easy to build on and reuse, people who want to build better sound generators, compression algorithms, or music generators can do it all in the same code base and build on what others have already done.

Models Supported by AudioCraft

AudioCraft consists of three models: MusicGen, AudioGen, and EnCodec. MusicGen, trained on Meta-owned and specially licensed music, generates music from text input, while AudioGen, trained on publicly available sound effects, generates audio from text input. In addition, an improved version of the EnCodec decoder produces higher-quality music with fewer artifacts.

Simply put, MusicGen is a model that generates music from text:

https://www.datalearner.com/ai-models/pretrained-models/MusicGen
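
As a quick taste, here is a minimal sketch of text-to-music generation with MusicGen, mirroring the usage pattern of the official repository (the 'facebook/musicgen-small' checkpoint name comes from the released model sizes; the prompt is illustrative):

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained checkpoint ('facebook/musicgen-small' is one of the released sizes).
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # generate 8 seconds of music

# One sample is generated per text description.
wav = model.generate(['upbeat acoustic folk with guitar'])
audio_write('musicgen_sample', wav[0].cpu(), model.sample_rate, strategy="loudness")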


AudioGen is a model for generating arbitrary audio from text:

https://www.datalearner.com/ai-models/pretrained-models/AudioGen


EnCodec, meanwhile, is a real-time, high-fidelity audio codec built on neural networks.
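
To give a feel for what a neural codec does, below is a minimal compression sketch using the standalone encodec package (install it separately with pip install encodec; 'input.wav' is a placeholder for your own file):

import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Load the pretrained 24 kHz codec and pick a target bitrate.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # 6 kbps

# Load and resample the input to the codec's expected format.
wav, sr = torchaudio.load('input.wav')  # placeholder file
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

# Encode to discrete codes; decoding reconstructs the waveform.
with torch.no_grad():
    encoded_frames = model.encode(wav.unsqueeze(0))
    codes = torch.cat([codes for codes, _ in encoded_frames], dim=-1)  # [B, n_codebooks, T]
    reconstructed = model.decode(encoded_frames)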

The official demo includes sample outputs from both AudioGen and MusicGen.


As you can see, AudioGen only needs a short text prompt to generate audio. The first example asks the model to produce whistling with the wind blowing, and the result is very good. (Note that I can't embed the audio samples here; visit the official site to hear the actual results.)

MusicGen, in turn, generates music from a text description. I'm no music expert, but I think the results sound pretty good.

Using AudioCraft

AudioCraft requires Python 3.9 and PyTorch 2.0, so make sure your environment meets these requirements. You can install or upgrade with the following commands:

# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
pip install 'torch>=2.0'
# Then proceed to one of the following
pip install -U audiocraft  # stable release
pip install -U git+https://[email protected]/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge
pip install -e .  # or if you cloned the repo locally (mandatory if you want to train)

The project also recommends installing ffmpeg at the system level:

sudo apt-get install ffmpeg

If you have anaconda, you can also install it with the following command:

conda install 'ffmpeg<5' -c conda-forge
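
As a quick sanity check (my own habit, not from the official docs), you can confirm the package imports cleanly; the __version__ attribute is assumed to be exposed:

python -c "import audiocraft; print(audiocraft.__version__)"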

Once installed, it is easy to use:

import torchaudio
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5)  # generate 5 seconds.
wav = model.generate_unconditional(4)  # generates 4 unconditional audio samples
descriptions = ['dog barking', 'siren of an emergency vehicle', 'footsteps in a corridor']
wav = model.generate(descriptions)  # generates 3 samples.

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
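
The same pattern applies to MusicGen, which additionally supports conditioning on a reference melody. The sketch below mirrors the melody example from the official README (the './assets/bach.mp3' path refers to a sample file shipped with the repository):

import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8)  # generate 8 seconds.

# Load a reference melody and generate one sample per description, each following the melody.
melody, sr = torchaudio.load('./assets/bach.mp3')
descriptions = ['happy rock', 'energetic EDM']
wav = model.generate_with_chroma(descriptions, melody[None].expand(2, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    audio_write(f'melody_{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")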
