Create a Marvel-style 3D digital human in 5 minutes! Captain America, Spider-Man, and the Joker are all doable, with facial details restored in high definition丨SIGGRAPH 2023

Yun Zhong, reporting from Aofei Temple
Qubit | Official account QbitAI

With the development of computer graphics, 3D generation technology is gradually becoming a research hotspot. However, there are still many challenges in generating 3D models from text or images.

Recently, companies such as Google, NVIDIA, and Microsoft have introduced 3D generation methods based on Neural Radiance Fields (NeRF), but these methods are poorly compatible with traditional 3D rendering software (such as Unity, Unreal Engine, and Maya), which limits their use in practical applications.

To address this, a research team from Deemos Technology and ShanghaiTech University has proposed a text-guided, progressive 3D generation framework aimed at solving these problems.

Generate 3D assets based on text description

The text-guided, progressive 3D generation framework proposed by the team, DreamFace, combines vision-language models, latent diffusion models, and physically based material diffusion to generate 3D assets that meet computer-graphics production standards.

The innovation of the framework lies in its three modules: geometry generation, physically based material diffusion generation, and animation capability generation.

The work has been accepted by ACM Transactions on Graphics, the top journal in the field, and will be presented at SIGGRAPH 2023, the top international conference on computer graphics.


Project website: https://sites.google.com/view/dreamface
Preprint paper: https://arxiv.org/abs/2304.03117
Web Demo: https://hyperhuman.top
HuggingFace Space: https://huggingface.co/spaces/DEEMOSTECH/ChatAvatar

How DreamFace realizes its three main functions

DreamFace consists of three modules: geometry generation, physically based material diffusion, and animation capability generation. Compared with previous 3D generation work, the main contributions of this work include:


  1. We propose DreamFace, a novel generative scheme that combines state-of-the-art vision-language models with animatable, physically based facial assets, using progressive learning to disentangle geometry, appearance, and animation capability.

  2. A dual-path appearance generation design is introduced, combining a novel material diffusion model with a pre-trained one, together with a two-stage optimization in both latent space and image space.

  3. Facial assets with BlendShapes or generated personalized BlendShapes are animation-ready, and we further demonstrate the use of DreamFace for natural character design.


Geometry generation: this module generates a geometric model matching the text prompt via a CLIP (Contrastive Language-Image Pre-Training) based selection framework.

Candidates are first randomly sampled from the facial geometry parameter space, and the coarse geometric model with the highest matching score against the text prompt is then selected.
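As an illustration of this selection step, here is a minimal sketch of how candidate renders could be scored against the prompt with CLIP. The sampling of the face parameter space and the rendering of candidates are outside the sketch, and `pick_coarse_geometry`, `candidate_renders`, and the ViT-B/32 backbone are illustrative assumptions rather than the paper's exact setup.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def pick_coarse_geometry(prompt, candidate_renders):
    """Score rendered candidates against a text prompt and return the best index.

    `candidate_renders` is a list of PIL images, each rendered from one geometry
    sampled from the face parameter space (sampling and rendering are not shown).
    """
    text = clip.tokenize([prompt]).to(device)
    images = torch.stack([preprocess(im) for im in candidate_renders]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(images)
        txt_feat = model.encode_text(text)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        scores = (img_feat @ txt_feat.T).squeeze(-1)  # cosine similarity per candidate
    return int(scores.argmax())
```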

Next, a latent diffusion model (LDM) and score distillation sampling (SDS) are used to add facial details and a detailed normal map to the coarse geometry, yielding high-precision geometry.
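The detail-carving step relies on score distillation sampling. Below is a minimal, generic SDS gradient in the spirit of DreamFusion, assuming a frozen noise-prediction network `unet` and its noise schedule `alpha_bars`; it illustrates the technique rather than DreamFace's exact formulation.

```python
import torch

def sds_grad(x, text_emb, unet, alpha_bars, t):
    """Score distillation sampling: return a gradient that pulls the rendered
    detail/normal map `x` toward the text prompt using a frozen diffusion model.

    `unet(noisy_x, t, text_emb)` is assumed to predict the added noise;
    `alpha_bars` holds the cumulative alpha-bar values of its noise schedule.
    """
    alpha_bar = alpha_bars[t]
    noise = torch.randn_like(x)
    noisy_x = alpha_bar.sqrt() * x + (1.0 - alpha_bar).sqrt() * noise
    with torch.no_grad():                    # no backprop through the diffusion model
        eps_pred = unet(noisy_x, t, text_emb)
    w = 1.0 - alpha_bar                      # a common timestep weighting
    return w * (eps_pred - noise)

# Usage: inject the SDS gradient into the rendered map so it flows back to the
# geometry/detail parameters that produced it, e.g.
# rendered_map.backward(gradient=sds_grad(rendered_map, text_emb, unet, alpha_bars, t))
```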


Physically based material diffusion generation: this module generates facial textures consistent with the predicted geometry and the text prompt. First, the LDM is fine-tuned to obtain two diffusion models.

These two models are then coordinated through a joint training scheme: one directly denoises UV texture maps, while the other supervises the rendered images. In addition, a prompt learning strategy and non-facial region masks are employed to ensure the quality of the generated diffuse maps.

Finally, a super-resolution module is applied to generate 4K physically-based textures for high-quality rendering.
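How the two diffusion models could interact at generation time can be sketched roughly as follows. Every name here (`uv_diffusion`, `img_diffusion`, `decode_texture`, `render_view`, `sds_grad`) is a placeholder standing in for the two fine-tuned LDMs, the latent decoder, and a differentiable renderer; the paper's actual joint training and two-stage latent/image-space optimization are more involved.

```python
import torch

def dual_path_step(tex_latent, text_emb,
                   uv_diffusion, img_diffusion,
                   decode_texture, render_view, face_mask):
    """One illustrative optimization step combining the two paths: a UV-space
    model scores the texture latent directly, while an image-space model scores
    a rendering of the decoded, masked diffuse map."""
    # Path 1: UV space -- SDS-style gradient on the texture latent itself.
    grad_uv = uv_diffusion.sds_grad(tex_latent, text_emb)

    # Path 2: image space -- decode to a diffuse map, keep only facial UV regions,
    # render a view, and get a gradient from the image-space diffusion model.
    diffuse = decode_texture(tex_latent) * face_mask
    rendered = render_view(diffuse)
    grad_img = img_diffusion.sds_grad(rendered, text_emb)

    # Back-propagate the image-space gradient through rendering and decoding,
    # then add the UV-space gradient on top.
    rendered.backward(gradient=grad_img)
    if tex_latent.grad is None:
        tex_latent.grad = grad_uv
    else:
        tex_latent.grad = tex_latent.grad + grad_uv
    return tex_latent.grad
```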


Animation capability generation: the models produced by DreamFace are animation-ready. Unlike traditional BlendShapes-based methods, the framework animates a neutral (resting) model by predicting personalized deformations, thereby generating personalized animations.

A geometry generator is first trained to learn an expression latent space, and an expression encoder is then trained to extract expression features from RGB images. Finally, personalized animations are generated from monocular RGB images.
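To make the idea concrete, here is a toy sketch of the drive-by-expression pipeline: an encoder maps one RGB frame to an expression code, and a generator maps that code to per-vertex offsets added to the neutral mesh. The architectures, latent size, and the FLAME-sized vertex count are illustrative assumptions, not the networks used in the paper.

```python
import torch
import torch.nn as nn

class ExpressionEncoder(nn.Module):
    """Placeholder encoder: maps an RGB frame to an expression latent code."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
    def forward(self, img):
        return self.net(img)

class DeformationGenerator(nn.Module):
    """Placeholder generator: maps an expression code to per-vertex offsets."""
    def __init__(self, n_vertices, latent_dim=64):
        super().__init__()
        self.n_vertices = n_vertices
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, n_vertices * 3),
        )
    def forward(self, code):
        return self.net(code).view(-1, self.n_vertices, 3)

# Driving a neutral mesh from a single monocular RGB frame (shapes are illustrative).
n_vertices = 5023                                  # e.g. a FLAME-sized face mesh
neutral_vertices = torch.zeros(1, n_vertices, 3)   # stands in for the generated neutral model
frame = torch.rand(1, 3, 256, 256)                 # one monocular RGB frame

encoder = ExpressionEncoder()
generator = DeformationGenerator(n_vertices)
expression_code = encoder(frame)
animated_vertices = neutral_vertices + generator(expression_code)
```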

Generate specified 3D assets in 5 minutes

The DreamFace framework achieves promising results on tasks such as celebrity generation, generation from textual descriptions, and character generation, and outperforms previous work in user evaluations.


It also has a clear advantage in running time over existing methods.


In addition, DreamFace supports texture editing with prompts and sketches, enabling global edits (such as aging or makeup) and local edits (such as tattoos, beards, or birthmarks).
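A simple way to picture the local-editing case: regenerate the texture with the edit prompt and blend it back into the original only inside a user-sketched region. This is a hedged illustration of the idea, not the paper's editing mechanism; the function and its inputs are hypothetical.

```python
import torch

def local_texture_edit(original_tex, edited_tex, region_mask):
    """Blend a regenerated texture into the original only inside a user-drawn
    region, leaving the rest of the face untouched.

    `edited_tex` would come from rerunning the text-conditioned texture
    generation with the edit prompt; `region_mask` comes from the user's sketch.
    All inputs are assumed to be (3, H, W) tensors in [0, 1].
    """
    return region_mask * edited_tex + (1 - region_mask) * original_tex
```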


Applicable to film, television, games, and other industries

As a text-guided, progressive 3D generation framework, DreamFace combines vision-language models, latent diffusion models, and physically based material diffusion to achieve 3D generation with high precision, high efficiency, and good compatibility.

The framework provides an effective solution for complex 3D generation tasks and is expected to drive further research and technological development in this direction.

In addition, physically based material diffusion generation and animation capability generation are expected to promote the application of 3D generation technology in film and television production, game development, and other related industries.

Source: blog.csdn.net/QbitAI/article/details/130479152