Yun Zhong, reporting from Aofei Temple
Qubit | Public Account QbitAI
With the development of computer graphics, 3D generation technology is gradually becoming a research hotspot. However, there are still many challenges in generating 3D models from text or images.
Recently, companies such as Google, NVIDIA, and Microsoft have introduced 3D generation methods based on Neural Radiance Fields (NeRF), but these methods have compatibility issues with traditional 3D rendering software (such as Unity, Unreal Engine, and Maya), which limits their use in practical applications.
To this end, a research team from Yingmu Technology and ShanghaiTech University has proposed a text-guided progressive 3D generation framework aimed at solving these problems.
Generate 3D assets based on text description
The text-guided progressive 3D generation framework (DreamFace for short) proposed by the research team combines vision-language models, latent diffusion models, and physically-based material diffusion techniques to generate 3D assets that meet computer-graphics production standards.
The framework's innovation lies in its three modules: geometry generation, physically-based material diffusion generation, and animatability generation.
This work has been accepted by the top journal ACM Transactions on Graphics and will be presented at SIGGRAPH 2023, the top international conference on computer graphics.
Project website: https://sites.google.com/view/dreamface
Preprint paper: https://arxiv.org/abs/2304.03117
Web Demo: https://hyperhuman.top
HuggingFace Space: https://huggingface.co/spaces/DEEMOSTECH/ChatAvatar
How DreamFace realizes its three core capabilities
DreamFace comprises three modules: geometry generation, physically-based material diffusion, and animatability generation. Compared with previous 3D generation work, the main contributions of this work include:
We propose DreamFace, a novel generative scheme that combines state-of-the-art vision-language models with animatable, physically-based facial assets, using progressive learning to disentangle geometry, appearance, and animatability.
A dual-path appearance generation design is introduced, combining a novel material diffusion model with a pre-trained model, together with two-stage optimization in both latent space and image space.
Facial assets using BlendShapes, or generated Personalized BlendShapes, support animation, and the use of DreamFace for natural character design is further demonstrated.
Geometry generation: This module generates a geometric model matching the text prompt via a CLIP (Contrastive Language-Image Pre-Training) selection framework.
Candidates are first randomly sampled from the facial geometric parameter space, and then the coarse geometric model with the highest matching score against the text prompt is selected.
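The selection step above amounts to scoring each sampled candidate against the prompt and keeping the best match. Here is a minimal numpy sketch, assuming CLIP embeddings for the prompt and for renders of each candidate have already been computed (the 4-dimensional toy embeddings below are hypothetical stand-ins for real 512-dimensional CLIP features):

```python
import numpy as np

def select_best_candidate(text_emb: np.ndarray, cand_embs: np.ndarray) -> int:
    """Pick the candidate geometry whose (hypothetical) CLIP image embedding
    best matches the text-prompt embedding, by cosine similarity."""
    t = text_emb / np.linalg.norm(text_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    scores = c @ t                    # cosine similarity per candidate
    return int(np.argmax(scores))

# Toy example with random stand-in embeddings.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=4)
cand_embs = rng.normal(size=(8, 4))  # 8 sampled candidate geometries
best = select_best_candidate(text_emb, cand_embs)
```

In the actual pipeline the candidate embeddings would come from encoding renders of each sampled coarse geometry with a frozen CLIP image encoder.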
Next, a latent diffusion model (LDM) with Score Distillation Sampling (SDS) is used to add facial details and a detailed normal map to the coarse geometry, yielding high-precision geometry.
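The core of SDS is to noise a render of the current asset, ask a frozen text-conditioned diffusion model to predict that noise, and treat the weighted residual as a gradient on the render. A simplified numpy sketch (the `denoiser` stub stands in for the frozen LDM, and the weighting `w(t) = 1 - alpha_bar_t` is one common choice, not necessarily the paper's):

```python
import numpy as np

def sds_grad(rendered, t, alpha_bar_t, denoiser, rng):
    """Score Distillation Sampling gradient (simplified sketch):
    perturb the render with noise at timestep t, query the frozen
    diffusion model, and return w(t) * (predicted noise - true noise)."""
    eps = rng.normal(size=rendered.shape)
    noisy = np.sqrt(alpha_bar_t) * rendered + np.sqrt(1.0 - alpha_bar_t) * eps
    eps_pred = denoiser(noisy, t)     # frozen, text-conditioned LDM (stub)
    w_t = 1.0 - alpha_bar_t           # one common weighting choice
    return w_t * (eps_pred - eps)

# Toy call with a denoiser stub that predicts zero noise.
rng = np.random.default_rng(0)
g = sds_grad(np.zeros((8, 8, 3)), 10, 0.5, lambda x, t: np.zeros_like(x), rng)
```

In practice this gradient is backpropagated through the differentiable renderer into the detail displacement and normal-map parameters being optimized.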
Physically-based material diffusion generation: This module generates facial textures conditioned on the predicted geometry and the text prompt. First, the LDM is fine-tuned to obtain two diffusion models.
These two models are then coordinated through a joint training scheme: one directly denoises UV texture maps, while the other supervises rendered images. In addition, a prompt-learning strategy and non-face-region masks are employed to ensure the quality of the generated diffuse maps.
Finally, a super-resolution module is applied to generate 4K physically-based textures for high-quality rendering.
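The joint training idea above can be sketched as a combined objective: a denoising loss in UV space plus a reconstruction loss in rendered image space. A minimal numpy sketch, where all arrays and the weighting `lam` are hypothetical stand-ins for network outputs and the paper's actual loss terms:

```python
import numpy as np

def joint_texture_loss(eps_pred_uv, eps_true, render_pred, render_ref, lam=0.1):
    """Toy joint objective for the two diffusion models:
    UV-space denoising loss + image-space rendering supervision.
    (lam is an illustrative balancing weight, not from the paper.)"""
    uv_loss = float(np.mean((eps_pred_uv - eps_true) ** 2))
    img_loss = float(np.mean((render_pred - render_ref) ** 2))
    return uv_loss + lam * img_loss

# Sanity check: identical predictions give zero loss.
a = np.ones((16, 16))
loss_zero = joint_texture_loss(a, a, a, a)
```

A real implementation would compute `render_pred` by passing the denoised UV map through a differentiable renderer before comparing against the reference image.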
Animatability generation: Models generated by DreamFace are animatable. Unlike traditional BlendShapes-based methods, this framework animates the neutral (resting) model by predicting unique deformations, thereby generating personalized animations.
A geometry generator is first trained to learn the expression latent space, and then an expression encoder is trained to extract expression features from RGB images. Finally, personalized animations are driven from monocular RGB input.
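The driving pipeline described above can be sketched as: an encoder maps an RGB frame to an expression latent, and a generator maps that latent to per-vertex offsets applied to the neutral mesh. A minimal numpy sketch, where the linear generator weights, the 16-dim latent, the 100-vertex mesh, and the random encoder are all hypothetical stubs:

```python
import numpy as np

rng = np.random.default_rng(42)

LATENT_DIM, N_VERTS = 16, 100                              # illustrative sizes
W = rng.normal(scale=0.01, size=(N_VERTS * 3, LATENT_DIM)) # generator weights (stub)

def expression_encoder(image: np.ndarray) -> np.ndarray:
    """Stub: a real encoder would regress the expression latent
    from a monocular RGB frame."""
    return rng.normal(size=LATENT_DIM)

def animate(neutral_verts: np.ndarray, latent: np.ndarray) -> np.ndarray:
    """Apply predicted per-vertex offsets to the neutral mesh."""
    offsets = (W @ latent).reshape(N_VERTS, 3)
    return neutral_verts + offsets

neutral = rng.normal(size=(N_VERTS, 3))
frame = rng.random((64, 64, 3))              # stand-in monocular RGB frame
posed = animate(neutral, expression_encoder(frame))
```

Note that a zero latent leaves the mesh at its neutral pose, which is the property that lets the framework animate the resting model by deformation prediction rather than fixed BlendShapes.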
Generate specified 3D assets in 5 minutes
The DreamFace framework achieves promising results on tasks such as celebrity generation, description generation and character generation, and outperforms previous work in user evaluation.
At the same time, compared with existing methods, it has obvious advantages in running time.
In addition, DreamFace supports texture editing using prompts and sketches, enabling both global edits (such as aging or makeup) and local edits (such as tattoos, beards, or birthmarks).
Can be used in film and television, games and other industries
As a text-guided progressive 3D generation framework, DreamFace combines vision-language models, latent diffusion models, and physically-based material diffusion techniques to achieve 3D generation with high precision, high efficiency, and good compatibility.
This framework provides an effective solution to complex 3D generation tasks and is expected to promote more similar research and technological development.
In addition, physically-based material diffusion generation and animatability generation will promote the application of 3D generation technology in film and television production, game development, and other related industries.