3 amazing open source AI projects!

From the outside, AI progress seems to have slowed over the past week, but those working in the field know that it has not stopped evolving.

Below are several practical open source AI projects that appeared on GitHub in the past week.

1. Meta open sources an AI music generation model

Meta today open sourced Audiocraft on GitHub, a Python library that generates music directly with AI.

GitHub:https://github.com/facebookresearch/audiocraft

The library is built around a music generation model called MusicGen. MusicGen is a single-stage autoregressive Transformer model trained on top of a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.

Unlike existing methods such as MusicLM, MusicGen does not require a self-supervised semantic representation, and it generates all 4 codebooks in a single pass.
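To make those numbers concrete, here is a small arithmetic sketch of what they imply. It uses only the three figures quoted above (32 kHz audio, 50 Hz frame rate, 4 codebooks); the function names are made up for illustration and are not part of Audiocraft.

```python
from typing import Tuple

SAMPLE_RATE_HZ = 32_000  # audio sample rate quoted in the article
FRAME_RATE_HZ = 50       # codebook frames per second quoted in the article
NUM_CODEBOOKS = 4        # codebooks per frame quoted in the article


def samples_per_frame() -> int:
    """Number of raw audio samples summarized by one token frame."""
    return SAMPLE_RATE_HZ // FRAME_RATE_HZ


def token_budget(seconds: float) -> Tuple[int, int]:
    """Return (frames, total tokens) needed to represent a clip of this length."""
    frames = int(seconds * FRAME_RATE_HZ)
    return frames, frames * NUM_CODEBOOKS


if __name__ == "__main__":
    frames, tokens = token_budget(10.0)
    print(f"samples per frame: {samples_per_frame()}")
    print(f"10 s clip: {frames} frames, {tokens} tokens")
```

Because the 4 codebooks of a frame are produced together, the number of decoding steps grows roughly with the number of frames rather than with the total token count.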

Now that the race in text-to-text and text-to-image generation has played out, the next thing to watch is how text will be turned into music.

2. Diffusers releases a major update

Diffusers v0.17.0 has been officially released, bringing improved LoRA support, Kandinsky 2.1, Torch compile speed-ups, and other features.

Diffusers is a well-known go-to library for pretrained diffusion models on GitHub that can be used to generate images, audio, and even 3D structures of molecules.

GitHub:https://github.com/huggingface/diffusers

Whether you're looking for simple inference solutions or training your own diffusion models, Diffusers provides support as a modular toolbox.

The library design focuses on usability and customizability, and mainly provides the following three core components:

  • State-of-the-art diffusion pipelines that run inference with just a few lines of code;

  • Interchangeable noise schedulers that trade off generation speed against output quality;

  • Pretrained models that can be used as building blocks and combined with schedulers to create your own end-to-end diffusion systems.
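The way these three components fit together can be sketched with a toy example. The code below is not the Diffusers API: it is a minimal, pure-Python illustration in which a "model" predicts noise, a "scheduler" decides how much of it to remove per step, and a "pipeline" loops the two. All class and function names are hypothetical, and a single float stands in for an image.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FixedRateScheduler:
    """Toy scheduler: removes a fixed fraction of the predicted noise per step."""
    num_steps: int
    rate: float

    @property
    def timesteps(self) -> List[int]:
        # Count down from the noisiest step to step 0.
        return list(range(self.num_steps - 1, -1, -1))

    def step(self, noise_pred: float, t: int, sample: float) -> float:
        # Move the sample a fraction `rate` of the way along the predicted noise.
        return sample - self.rate * noise_pred


def make_toy_model(target: float) -> Callable[[float, int], float]:
    """Stand-in for a pretrained model: 'noise' is the distance from a fixed target."""
    def model(sample: float, t: int) -> float:
        return sample - target
    return model


def run_pipeline(model: Callable[[float, int], float],
                 scheduler: FixedRateScheduler,
                 start: float) -> float:
    """Toy pipeline: alternate model prediction and scheduler update."""
    sample = start
    for t in scheduler.timesteps:
        noise_pred = model(sample, t)
        sample = scheduler.step(noise_pred, t, sample)
    return sample


if __name__ == "__main__":
    model = make_toy_model(target=1.0)
    # Same model, two interchangeable schedulers: few coarse steps vs. many fine steps.
    fast = run_pipeline(model, FixedRateScheduler(num_steps=5, rate=0.5), start=9.0)
    slow = run_pipeline(model, FixedRateScheduler(num_steps=50, rate=0.2), start=9.0)
    print(f"fast scheduler error: {abs(fast - 1.0):.6f}")
    print(f"slow scheduler error: {abs(slow - 1.0):.6f}")
```

Swapping the scheduler changes the speed and quality trade-off without touching the model, which is the point of the modular design described above.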

The project is released as free open source software by Hugging Face, and you can use it to quickly train ControlNet models and further improve the quality of AI-generated images.

3. Recognize anything

Meta previously open sourced the Segment Anything Model (SAM) on GitHub, which performs image segmentation automatically.

However, while that model is strong at localizing objects in an image, its ability to recognize what they are is mediocre.

To address this, researchers from Fudan University, OPPO, and the International Digital Economy Academy (IDEA) have open sourced a powerful foundation model for image tagging on GitHub: the Recognize Anything Model (RAM).

The model introduces a new image tagging paradigm that recognizes any common category with high accuracy, and it is trained on large-scale image-text pairs rather than manual annotations.

GitHub:https://github.com/xinyu1205/Recognize_Anything-Tag2Text

The development of RAM consists of four key steps:

  1. Obtain annotation-free image tags at scale through automatic text semantic parsing;

  2. Train a preliminary model for automatic annotation by unifying the captioning and tagging tasks, supervised by the original texts and the parsed tags, respectively;

  3. Use a data engine to generate additional annotations and clean up incorrect ones;

  4. Retrain the model on the processed data and fine-tune it on a smaller but higher-quality dataset.
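The first step, turning captions into tags without manual labeling, can be illustrated with a toy sketch. The real project uses a text semantic parser; the version below substitutes plain keyword matching against a small, made-up vocabulary, so the vocabulary, function names, and captions are all assumptions for illustration only.

```python
import re
from typing import Dict, List, Set

# Hypothetical tag vocabulary; the real model covers far more categories.
TAG_VOCAB: Set[str] = {"dog", "cat", "ball", "grass", "sofa"}


def parse_tags(caption: str, vocab: Set[str] = TAG_VOCAB) -> List[str]:
    """Extract vocabulary tags from a caption, in order of first appearance."""
    words = re.findall(r"[a-z]+", caption.lower())
    tags: List[str] = []
    for word in words:
        # Crude plural handling: also try the word without a trailing "s".
        candidates = (word, word[:-1] if word.endswith("s") else word)
        for candidate in candidates:
            if candidate in vocab and candidate not in tags:
                tags.append(candidate)
                break
    return tags


def build_tag_dataset(pairs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each image id to the tags parsed from its caption."""
    return {image_id: parse_tags(caption) for image_id, caption in pairs.items()}


if __name__ == "__main__":
    image_text_pairs = {
        "img_001": "A dog chases a ball on the grass",
        "img_002": "Two cats sleeping on a sofa",
    }
    for image_id, tags in build_tag_dataset(image_text_pairs).items():
        print(image_id, tags)
```

Tags produced this way are noisy, which is why the later steps add a data engine to fill in missing annotations and remove wrong ones.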

Across numerous benchmark evaluations, RAM shows strong tagging ability and performs significantly better than CLIP and BLIP. Remarkably, RAM even outperforms fully supervised approaches and is comparable to Google's tagging API.

The project also includes Tag2Text, a tool that can generate tags for specified objects in images in batches.

Combined with Meta's open source SAM model, it can be used to remove specified objects from images in batches, further improving image processing efficiency.

Those are the open source AI projects recommended in this issue.

Origin blog.csdn.net/sinat_33224091/article/details/131148551