Training efficiency up 40x! Stable Zero123, an open-source image-to-3D model, is here

On December 14, Stability.ai, the well-known open-source generative AI company, released Stable Zero123 on its official website: an open-source model that generates high-quality 3D models from images.

Stable Zero123 is built on the Zero123 model jointly open sourced by the Toyota Research Institute and Columbia University in March of this year. It has been heavily optimized, mainly through an improved rendering dataset and score distillation: not only do the generated 3D models outperform Zero123's, but training efficiency is also increased by 40x.

It is worth mentioning that Stable Zero123 can be used together with Stability.ai's latest open-source high-resolution image model SDXL, effectively serving as a 3D-model extension plug-in for it.

Stable Zero123 open source address: https://huggingface.co/stabilityai/stable-zero123

zero123 open source address: https://github.com/cvlab-columbia/zero123

zero123 paper: https://arxiv.org/abs/2303.11328


High-quality dataset

High-quality datasets have become one of the most important factors in pre-training large models, arguably more important than simply adding more parameters.

Therefore, Stability.ai filtered Objaverse-XL, a dataset of more than 10 million 3D models, retaining only models of high quality with accurate annotations.
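Stability.ai has not published its exact filtering criteria, so the following is only a minimal sketch of what such a filtering pass might look like, using a hypothetical metadata table with assumed quality and caption columns:

```python
import pandas as pd

# Hypothetical metadata table for Objaverse-XL objects. The real dataset ships
# per-object annotations, but the quality/caption columns below are purely
# illustrative assumptions, not the actual fields Stability.ai filtered on.
metadata = pd.DataFrame({
    "uid":         ["obj_001", "obj_002", "obj_003"],
    "quality":     [0.92, 0.41, 0.88],        # assumed aesthetic/geometry score
    "has_texture": [True, False, True],
    "caption":     ["a red sports car", "", "a wooden chair"],
})

# Keep only well-formed, well-annotated objects, mirroring the article's
# description of retaining high-quality, accurately labeled models.
filtered = metadata[
    (metadata["quality"] >= 0.8)
    & metadata["has_texture"]
    & (metadata["caption"].str.len() > 0)
]

print(f"kept {len(filtered)} of {len(metadata)} objects")
```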


This allows Stable Zero123 to better understand and generate 3D models. Zero123's latest version, Zero123-XL, was also trained on this dataset.

Objaverse-XL address: https://github.com/allenai/objaverse-xl


A brief introduction to Stable Zero123

Since no paper for Stable Zero123 has been published yet, we interpret it here through the Zero123 paper.

In fact, what is interesting about these two models is how they build on each other. Zero123 is based on Stability.ai's open-source text-to-image model Stable Diffusion, adapted and fine-tuned so that it learns to control relative camera viewpoint changes, then denoises with a view-conditioned diffusion method, and finally rebuilds the object with a 3D reconstruction method.

Learning to control the camera viewpoint: Zero123 fine-tunes the pre-trained Stable Diffusion model on a synthetic dataset, allowing it to learn to control relative camera viewpoint changes without destroying the model's other representations.
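As a rough illustration of how training pairs for this kind of viewpoint control can be assembled (two renders of the same object plus their relative camera offset, encoded as elevation, sin/cos of azimuth, and radius, as in the Zero123 paper), here is a sketch; the renderer and function names are assumptions, not the released training code:

```python
import math
import random
import torch

def sample_viewpoint():
    """Sample a spherical camera pose (elevation, azimuth, radius)."""
    theta = random.uniform(-math.pi / 2, math.pi / 2)   # elevation
    phi = random.uniform(0.0, 2.0 * math.pi)            # azimuth
    radius = random.uniform(1.5, 2.5)                    # distance to object
    return theta, phi, radius

def make_training_pair(render_fn, obj):
    """Build one (condition image, target image, relative pose) tuple.

    `render_fn(obj, pose)` is a hypothetical renderer returning an image
    tensor; Zero123 trained on Objaverse renders from random viewpoints.
    """
    pose_in, pose_out = sample_viewpoint(), sample_viewpoint()
    x_in = render_fn(obj, pose_in)    # conditioning view
    x_out = render_fn(obj, pose_out)  # target view the model should produce
    # Relative transform the model is asked to apply to the input view.
    d_theta = pose_out[0] - pose_in[0]
    d_phi = pose_out[1] - pose_in[1]
    d_radius = pose_out[2] - pose_in[2]
    rel_pose = torch.tensor([d_theta, math.sin(d_phi), math.cos(d_phi), d_radius])
    return x_in, x_out, rel_pose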

View-conditioned diffusion: the CLIP embedding of the input image and the relative viewpoint transformation are concatenated as conditioning information to guide the denoising process. At the same time, the input image is concatenated channel-wise with the image being denoised, which helps preserve the identity and details of the target object.
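A minimal sketch of how these two conditioning signals might be assembled, following the description above; the module and tensor names are assumptions rather than the actual implementation:

```python
import torch
import torch.nn as nn

class ViewConditioning(nn.Module):
    """Builds the two conditioning signals described above (illustrative only)."""

    def __init__(self, clip_dim: int = 768, pose_dim: int = 4):
        super().__init__()
        # Projects [CLIP image embedding, relative pose] back to the
        # cross-attention width expected by the diffusion U-Net.
        self.proj = nn.Linear(clip_dim + pose_dim, clip_dim)

    def forward(self, clip_embed, rel_pose, noisy_latent, input_latent):
        # (1) Cross-attention conditioning: CLIP embedding + relative pose.
        cond_tokens = self.proj(torch.cat([clip_embed, rel_pose], dim=-1))
        # (2) Channel-wise concat of the input view with the latent being
        #     denoised, to keep the object's identity and details.
        unet_input = torch.cat([noisy_latent, input_latent], dim=1)
        return unet_input, cond_tokens

# Toy shapes: batch of 2, 4-channel 32x32 latents, one CLIP vector per image.
cond = ViewConditioning()
unet_in, tokens = cond(
    clip_embed=torch.randn(2, 768),
    rel_pose=torch.randn(2, 4),
    noisy_latent=torch.randn(2, 4, 32, 32),
    input_latent=torch.randn(2, 4, 32, 32),
)
print(unet_in.shape, tokens.shape)  # (2, 8, 32, 32) and (2, 768)
```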

[Figure: detailed comparison of the generation results of the two models]

3D model reconstruction: a voxel radiance field representation is optimized with SJC (Score Jacobian Chaining) and supervised by the view-conditioned diffusion model. Viewpoints are randomly sampled, Zero123 synthesizes the image for each sampled viewpoint, a score is computed between that image and the voxel rendering, and the voxel field is updated accordingly.

This allows the rich 2D textures and shapes learned by the Stable Diffusion model to be injected into the 3D modeling process, yielding the final 3D model.
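A heavily simplified sketch of such a score-distillation / SJC-style optimization loop is shown below. The voxel field, renderer, and the `zero123.add_noise` / `zero123.denoise` helpers are hypothetical stand-ins for illustration, not the actual implementation:

```python
import math
import random
import torch

def reconstruct_3d(voxel_field, renderer, zero123, input_image, steps=1000, lr=1e-2):
    """Optimize a voxel radiance field so its renders agree with the
    view-conditioned diffusion model. All objects here are stand-ins."""
    optimizer = torch.optim.Adam(voxel_field.parameters(), lr=lr)
    for _ in range(steps):
        # 1. Sample a random camera (elevation, azimuth, radius) and render
        #    the current voxel field from that viewpoint (differentiably).
        pose = (random.uniform(-math.pi / 2, math.pi / 2),
                random.uniform(0.0, 2.0 * math.pi),
                random.uniform(1.5, 2.5))
        rendering = renderer(voxel_field, pose)

        # 2. Perturb the rendering with noise and ask the view-conditioned
        #    model which noise it predicts, given the input image and pose.
        t = torch.randint(0, zero123.num_timesteps, (1,))
        noise = torch.randn_like(rendering)
        noisy = zero123.add_noise(rendering, noise, t)
        with torch.no_grad():
            pred_noise = zero123.denoise(noisy, t, cond_image=input_image, pose=pose)

        # 3. Score-distillation-style gradient: push the rendering toward what
        #    the 2D model considers a plausible view, and update the voxel field.
        grad = pred_noise - noise
        loss = (grad.detach() * rendering).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return voxel_field
```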

Currently, the Stable Zero123 model is intended mainly for research use; commercial use is expected to be opened up in the future.

The material in this article comes from the Stability.ai official website and the Zero123 paper. If there is any infringement, please contact us for deletion.
