Exploration and Practice of AIGC-Generated 3D Models




This article focuses on the exploration and practice of 3D models generated by AIGC. Combined with the application requirements of e-commerce platforms, it discusses how AIGC technology can be used to achieve personalized generation, large-scale production, and rapid rollout of 3D models, and the role this plays in advancing the e-commerce industry.



Background

With the rise of e-commerce platforms, 3D models have become an important means of showcasing products. Mobile Tmall is actively exploring 3D delivery scenarios. 3D product pages let users browse 3D product models in real time in the app and understand a product's appearance, color, shape and structure, physical materials, and other information more intuitively, bringing users a better shopping experience, opening more possibilities for combining the virtual and the real, and laying groundwork for future AR/VR content production. However, traditional 3D model creation suffers from low efficiency, insufficient precision, and difficulty in meeting individualized needs, which makes it hard for merchants to achieve the desired effect when showcasing products. The emergence of AIGC (AI-Generated Content) technology provides a new approach to 3D model generation: by combining artificial intelligence with computer graphics, AIGC enables more efficient, accurate, and flexible 3D model generation.

Technical Models


NeRF (Neural Radiance Fields)


In 2020, researchers at UC Berkeley and Google proposed NeRF, a neural radiance field method for 3D modeling that renders higher-quality images, requires no manual model retouching, and lowers the cost of 3D modeling, bringing new ideas to large-scale 3D model production. Initially, the Taobao Meta team focused mainly on improving NeRF reconstruction quality (sharper fabric details, clearer product text, and so on) so that rendering clarity could meet the standards for industrial deployment.


A Neural Radiance Field (NeRF) is a simple fully connected network (about 5 MB of weights) trained to reproduce the input views of a single scene using a rendering loss. The network maps directly from spatial position and viewing direction (a 5D input) to color and volume density (a 4D output), acting as a "volume", so new views can be produced with volume rendering. In other words, a neural radiance field is a deep learning model for implicit 3D scene representation, and the model itself is just a fully connected neural network (a multilayer perceptron). The task NeRF addresses is novel view synthesis: given a set of captures of a scene from known viewpoints (the captured images together with the intrinsic and extrinsic camera parameters of each image), synthesize images from new viewpoints directly from the poses and images, without an explicit intermediate 3D reconstruction step. Under the NeRF representation, 3D space is modeled as a learnable, continuous radiance field; after training on the input views, querying a position and viewing direction yields a density and a color.
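To make the 5D-to-4D mapping and the volume rendering step concrete, here is a minimal PyTorch sketch of the idea (a toy illustration, not the original implementation; the positional encoding and hierarchical sampling used in the paper are omitted):

```python
# Minimal sketch of the core NeRF idea: an MLP maps position + viewing
# direction to color + density, and volume rendering composites samples
# along each ray into a pixel color.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        # 5D input: 3D position (x, y, z) + viewing direction (theta, phi)
        self.mlp = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 4D output: RGB color + volume density
        )

    def forward(self, pos, view_dir):
        out = self.mlp(torch.cat([pos, view_dir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])   # color in [0, 1]
        sigma = torch.relu(out[..., 3])     # non-negative density
        return rgb, sigma

def volume_render(rgb, sigma, deltas):
    """Composite per-sample color/density along one ray into a pixel color.

    rgb:    (num_samples, 3) colors
    sigma:  (num_samples,)   densities
    deltas: (num_samples,)   distances between adjacent samples
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)            # opacity per sample
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)   # transmittance
    trans = torch.cat([torch.ones(1), trans[:-1]])      # light reaching sample i
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)          # expected ray color

# Training then minimizes the photometric (rendering) loss between pixels
# rendered this way and the captured images taken from known camera poses.
```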



  Point-E


Although NeRF-based generative models have made great progress on text-to-3D tasks, most of these methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in seconds or minutes. In 2022, OpenAI proposed Point-E, a distinctive 3D point cloud generation method that can generate a 3D model in only 1-2 minutes on a single GPU. Point-E first generates a single synthetic view with a text-to-image diffusion model, and then uses a second diffusion model, conditioned on that generated image, to produce a 3D point cloud. While Point-E is still inferior to the state of the art in sample quality, it is one to two orders of magnitude faster at sample generation.


Code address: https://github.com/openai/point-e
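The two-stage pipeline (text-to-image diffusion, then image-conditioned point cloud diffusion) can be driven in a few lines. The sketch below is adapted from the repository's text-to-point-cloud example notebook; the module paths, model names, and sampler arguments follow that example and should be checked against the current version of the repo:

```python
import torch
from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Text-conditioned base model (coarse 1024-point cloud) plus an upsampler.
base_name = 'base40M-textvec'
base_model = model_from_config(MODEL_CONFIGS[base_name], device).eval()
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS[base_name])
upsampler_model = model_from_config(MODEL_CONFIGS['upsample'], device).eval()
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['upsample'])

base_model.load_state_dict(load_checkpoint(base_name, device))
upsampler_model.load_state_dict(load_checkpoint('upsample', device))

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],          # coarse cloud, then upsampled points
    aux_channels=['R', 'G', 'B'],
    guidance_scale=[3.0, 0.0],
    model_kwargs_key_filter=('texts', ''),   # only the base model sees the prompt
)

samples = None
for x in sampler.sample_batch_progressive(batch_size=1,
                                          model_kwargs=dict(texts=['a shark'])):
    samples = x
point_cloud = sampler.output_to_point_clouds(samples)[0]
```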


  Shap-E


OpenAI later released Shap-E, an upgraded model. Shap-E is a conditional generative model for 3D assets. Unlike recent 3D generative models that produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. Shap-E is trained in two stages: first, an encoder is trained to deterministically map 3D assets into the parameters of an implicit function; second, a conditional diffusion model is trained on the encoder's outputs. Trained on a large dataset of paired 3D assets and text, the resulting model can generate complex and diverse 3D assets in seconds. Compared with Point-E, an explicit generative model over point clouds, Shap-E converges faster and achieves better sample quality despite modeling a higher-dimensional, multi-representation output space.


git: https://github.com/openai/shap-e/tree/main
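As a rough usage sketch, adapted from the repository's text-to-3D sample notebook (the model names 'transmitter' and 'text300M' and the sampling arguments are taken from that example and may change between versions), sampling latents from a prompt and rendering them looks like this:

```python
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

xm = load_model('transmitter', device=device)   # decodes latents into implicit functions
model = load_model('text300M', device=device)   # text-conditioned diffusion model
diffusion = diffusion_from_config(load_config('diffusion'))

# Inference for stage two: diffuse in the latent space of implicit-function parameters.
latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=['a shark']),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Render the decoded radiance field from a ring of cameras.
cameras = create_pan_cameras(64, device)
images = decode_latent_images(xm, latents[0], cameras, rendering_mode='nerf')
```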



  DreamFusion



DreamFusion is a text-to-3D model proposed by Google. The general idea is to use a 2D generative model (such as Imagen) to produce views from multiple perspectives and then reconstruct them with NeRF. There is a "chicken and egg" problem: without a well-trained NeRF, the images Imagen produces are not consistent across viewpoints; and without consistent multi-view images, a good NeRF cannot be trained. So the authors adopted an approach reminiscent of GANs, iterating back and forth between NeRF and Imagen. The advantage is strong diversity; the drawback is equally obvious: it requires roughly 15,000 of these back-and-forth iterations, and generating a single model takes about 1.5 hours of training on 4 TPUv4 chips.


DreamFusion: iterative back-and-forth optimization between a 3D NeRF and a 2D generative model
https://github.com/ashawkey/stable-dreamfusion
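The back-and-forth iteration is usually implemented as score distillation sampling (SDS). The sketch below is purely conceptual: the NeRF renderer and the frozen 2D diffusion model are replaced by stand-in stubs (ToyNeRF and frozen_diffusion_eps are hypothetical placeholders), and the noise schedule and loss weighting are simplified, but the gradient flow matches DreamFusion-style training:

```python
import torch

# --- stand-ins, for illustration only ------------------------------------
class ToyNeRF(torch.nn.Module):
    """Stand-in for a differentiable NeRF renderer: parameters -> image."""
    def __init__(self):
        super().__init__()
        self.image = torch.nn.Parameter(torch.rand(1, 3, 64, 64))  # the "scene"

    def render(self, camera_pose):
        # A real NeRF would ray-march the radiance field from this pose.
        return torch.sigmoid(self.image)

def frozen_diffusion_eps(noisy_image, t, text_prompt):
    """Stand-in for the frozen 2D diffusion model's noise prediction."""
    return torch.randn_like(noisy_image)  # pretend epsilon-prediction

# --- score distillation loop ----------------------------------------------
nerf = ToyNeRF()
opt = torch.optim.Adam(nerf.parameters(), lr=1e-2)
prompt = "a DSLR photo of a corgi"

for step in range(100):                     # DreamFusion runs ~15,000 iterations
    camera_pose = torch.randn(3)            # sample a random viewpoint
    image = nerf.render(camera_pose)        # differentiable render

    t = torch.rand(1)                       # random diffusion timestep
    noise = torch.randn_like(image)
    noisy = image + t * noise               # (schematic) forward diffusion

    with torch.no_grad():                   # the 2D diffusion model stays frozen
        eps_pred = frozen_diffusion_eps(noisy, t, prompt)

    # SDS gradient: push the rendered image toward what the 2D model considers
    # a plausible image for the prompt, from every sampled viewpoint.
    # (The timestep weighting w(t) from the paper is omitted here.)
    grad = eps_pred - noise
    opt.zero_grad()
    image.backward(gradient=grad)           # inject the gradient directly
    opt.step()
```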

  Magic3D


In November 2022, NVIDIA's Magic3D model introduced a two-stage optimization strategy built on DreamFusion: first, a low-resolution, coarsely rendered 3D model represented as a hash grid is generated using a diffusion prior similar to DreamFusion; then the model is refined and rendered at higher quality using methods closer to traditional computer graphics.


Compared with DreamFusion, the 3D models generated by Magic3D have higher resolution and better rendering quality, and generation efficiency is also significantly improved. Because Magic3D's rendering approach is closely related to traditional computer graphics, and its results can be viewed directly in standard graphics software, it connects better with traditional 3D production pipelines. Given these advantages, Magic3D already has the capability foundation for industrial application.


Since Magic3D, many more 3D generative models have been proposed by academia and industry, exploring generation quality, generation efficiency, hardware requirements, and application scenarios in greater depth; each still has obvious strengths and weaknesses.



Model Practice


The local machine configuration is as follows:

  1. GPU: NVIDIA GeForce RTX 3060 (12 GB)

  2. CPU: Intel Core i9-13900KF

  3. Memory: 64 GB


  Shap-E


We deployed the Shap-E model locally with CUDA and tested it in a Jupyter Notebook. In our tests, generating a 3D model took about 5 minutes on average, but the detail and quality of the generated models were poor.
git: https://github.com/openai/shap-e/tree/main
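For e-commerce use, the sampled latents also need to be exported as meshes. Continuing from the sampling sketch in the Shap-E section above (reusing the `xm` transmitter and the sampled `latents`), export follows the repository's example notebook; the helper names are taken from that notebook and should be verified against the current repo, and the output filenames are hypothetical:

```python
# Continues the earlier Shap-E sketch: `xm` and `latents` come from that code.
from shap_e.util.notebooks import decode_latent_mesh

for i, latent in enumerate(latents):
    mesh = decode_latent_mesh(xm, latent).tri_mesh()
    with open(f'shark_{i}.ply', 'wb') as f:   # hypothetical output path
        mesh.write_ply(f)
    with open(f'shark_{i}.obj', 'w') as f:
        mesh.write_obj(f)
```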

DEMO

Prompt: A shark



Prompt: "A beautiful girl in a long dress"



  AvatarCLIP


Based on the top-conference paper, we deployed AvatarCLIP locally. AvatarCLIP is a zero-shot, text-driven generator of 3D digital human models and motions. Training a fine model takes at least 10 hours. The generated model contains a basic human skeleton, and decent animation results can be obtained by rigging it on the mixamo platform, but viewed up close, details such as the face and hands are relatively poor. Project page: https://hongfz16.github.io/projects/AvatarCLIP.html



Prompt: a 3d rendering of a XX in unreal engine


The first 1/5 of the process of generating an astronaut:

Animation of the generated Messi model after rigging it on the mixamo platform:


Application Scenarios in the Industry


  Generating 2D texture maps from text


  1. barium.ai (link: https://unity.com/cn)

  2. spline.design (link: https://spline.design/)

  3. Maliang (link: https://www.bilibili.com/video/BV1A24y1x7vD/)


  Generating textures in UV space from geometry (mesh) ("AI-painted textures")


  1. Meshy.ai (link: https://www.meshy.ai/)

  2. Polyhive.ai (link: https://polyhive.ai/)


  Generating 3D models directly from text


No product in this category has entered a genuine public beta yet.


Current Problems with AIGC-Generated 3D Models


  1. Data quality: inaccurate or missing captured data can introduce defects, misalignment, or other problems into the 3D model.

  2. Computing performance: generating complex 3D models requires substantial computation and storage; insufficient computing power can lead to poor results.

  3. Texture mapping: generated 3D models need texture maps, but perfectly matching texture images are hard to find in the real world, which can lead to unnatural textures or visible seams.

  4. Model interpretability: generated 3D models need to be understandable and recognizable to people, but ambiguous or hard-to-interpret parts can prevent users from making full use of them.


References

  1. https://www.zhihu.com/search?type=content&q=DreamFusion
  2. Taichi NeRF (Part 2): A pragmatic discussion of 3D AIGC (link: https://zhuanlan.zhihu.com/p/613679756)
  3. Taichi NeRF (Part 1): Developing and deploying Instant NGP without writing CUDA
  4. A detailed explanation of the neural rendering algorithm NeRF and its development (link: https://zhuanlan.zhihu.com/p/612102573)
  5. https://github.com/awesome-NeRF/awesome-NeRF

About the Team

We are the Marketing & Shopping Guide team of Mobile Tmall Technology, part of Taobao & Tmall Group technology. As a team focused on exploring innovative commercialization and shopping-guide scenarios for Mobile Tmall, we rely on the strong internet foundation of Taobao & Tmall Group and are committed to bringing more efficient and more innovative technical support and commercialized shopping-guide scenarios to Mobile Tmall.

Our team members come from a range of technical and marketing backgrounds and bring rich engineering and marketing experience. We continuously explore and practice new technologies, create innovative commercialized shopping-guide scenarios, and apply these innovations to the Mobile Tmall business, improving the user experience and the platform's operational efficiency.

As a team dedicated to technical innovation and commercialization, we strive to bring broader commercial opportunities and more efficient technical support to Mobile Tmall, and have earned high praise and recognition from users and clients.

Our team has always upheld the philosophy of "technology leadership, users first", continuously exploring, innovating, and improving our technical capabilities to make important contributions to Mobile Tmall's shopping-guide scenarios and commercial development.



