(2023, 3D scene generator Infinigen) Infinite photorealistic worlds using procedural generation

Infinite Photorealistic Worlds using Procedural Generation

Official account: EDPJ

Table of contents

0. Summary

1. Introduction

2. Related Work

3. Method

4. Experiments

References

S. Summary

S.1 Main Idea

S.2 Method

S.3 Scene Generation


0. Summary

We introduce Infinigen, a procedural generator of realistic 3D scenes of the natural world. Infinigen is completely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, with no external sources, allowing infinite variations and combinations. Infinigen provides extensive coverage of natural world objects and scenes, including plants, animals, terrain, and natural phenomena such as fire, clouds, rain, and snow. Infinigen can be used to generate an infinite variety of training data for a wide range of computer vision tasks, including object detection, semantic segmentation, optical flow, and 3D reconstruction. We expect Infinigen to be a useful resource for computer vision research and beyond. Visit infinigen.org for videos, code, and pre-generated data.

1. Introduction

Data, especially labeled data at scale, has been a key driver of advances in computer vision. At the same time, data is also a major challenge, as high-quality data is still lacking for many important vision tasks. This is especially true for 3D vision, since it is difficult to obtain accurate 3D ground truth labels for real images.

Synthetic data from computer graphics is a promising solution to this data challenge. Synthetic data can be generated in unlimited quantities with high-quality labels. It has been used for a wide range of tasks, with notable success in 3D vision, where models trained purely on synthetic data can perform well on real images zero-shot.

Despite its promise, synthetic data is still much less used in computer vision than real images. We hypothesize that a key reason is the limited diversity of 3D assets: for synthetic data to be most useful, it needs to capture the variety and complexity of the real world, but existing freely available synthetic datasets are mostly limited to a fairly narrow set of objects and shapes, typically man-made objects in driving scenes or indoor environments.

In this work, we seek to substantially expand the coverage of synthetic data, especially for objects and scenes from the natural world. We introduce Infinigen, a procedural generator of realistic 3D scenes of the natural world. Infinigen is unique compared to existing synthetic data sources in that it combines the following features:

  • Procedural: Infinigen is not a finite collection of 3D assets or rendered images; rather, it is a generator that can create an infinite number of different combinations of shapes, textures, materials, and scenes. Every asset, from shape to texture, is fully procedural, generated from scratch with randomized mathematical rules that allow infinite variations and combinations. This distinguishes it from datasets or dataset generators that rely on external assets.
  • Diversity: Infinigen has extensive coverage of objects and scenes in the natural world, including plants, animals, terrain, and natural phenomena such as fire, clouds, rain, and snow.
  • Realism: Infinigen creates highly realistic 3D scenes. It procedurally generates not only coarse structures, but also fine details of geometry and textures, resulting in a high degree of realism.
  • Realistic Geometry: Unlike video game assets, which often use texture maps to fake geometric details (for example, surfaces that look bumpy but are actually flat), all geometric details in Infinigen are real. This ensures accurate geometric ground truth for 3D reconstruction tasks.
  • Free and Open Source: Infinigen is built on the free and open source graphics tool Blender. Infinigen's code is freely distributed under the GPL license, the same as Blender. Anyone is free to use Infinigen to generate unlimited assets and renders.

Infinigen focuses on the natural world for two reasons.

  • First, many applications require accurate perception of natural objects, including geological surveys, drone navigation, ecological monitoring, rescue robots, and agricultural automation, but existing synthetic datasets have limited coverage of the natural world.
  • Second, we hypothesize that the natural world alone is sufficient to pre-train a strong "base model"—the human visual system evolved entirely in the natural world; exposure to man-made objects may be unnecessary.

Infinigen is useful in many ways.

  • It can serve as an unlimited training data generator for a variety of computer vision tasks, including object detection, semantic segmentation, pose estimation, 3D reconstruction, view synthesis, and video generation.
  • Since users have access to all procedural rules and parameters behind each 3D scene, Infinigen can be easily customized to generate a wide variety of task-specific ground truth.
  • Infinigen can also be used as a 3D asset generator for building simulated environments to train physical robots and virtual embodied agents.
  • The same 3D assets can also be used for 3D printing, game development, virtual reality, filmmaking and general content creation.

We built Infinigen on top of Blender, a graphics system that provides many useful primitives for procedural generation. Using these primitives, we design and implement a library of procedural rules covering a wide range of natural objects and scenes. In addition, we developed utilities that facilitate the creation of procedural rules and enable all Blender users (including non-programmers) to contribute; these include a transpiler that automatically converts Blender node graphs (an intuitive visual representation of procedural rules often used by Blender artists) into Python code. We also developed utilities to render synthetic images and extract common ground-truth labels, including depth, occlusion boundaries, surface normals, optical flow, object categories, bounding boxes, and instance segmentation. Building Infinigen involved substantial software engineering: the latest main branch of the Infinigen codebase consists of 40,485 lines of code.

In this paper, we describe our procedural system in detail. We also conduct experiments to verify the quality of the generated synthetic data; our experiments show that data from Infinigen is indeed useful, especially for bridging the gap in coverage of natural objects. Finally, we provide an analysis of computational costs, including a detailed analysis of the generation process.

We expect Infinigen to be a useful resource for computer vision research and beyond. In future work, we intend to make Infinigen an active project that will expand to cover almost everything in the visual world through open source collaboration with the entire community. 

2. Related Work

Synthetic data from computer graphics has been used in computer vision for a wide range of tasks. We refer readers to [65] for a comprehensive survey. Below we categorize existing work according to application domain, generation method, and accessibility. Table 1 provides a detailed comparison.

Application domain. Datasets or dataset generators have been developed to cover various domains.

  • The built environment has been covered by the greatest amount of existing work, especially indoor and urban scenes. An important source of synthetic data for the built environment is embodied-AI simulation platforms, for example, AI2-THOR, Habitat, BEHAVIOR, SAPIEN, RLBench, and CARLA.
  • Some datasets, such as TartanAir and Sintel, contain a mixture of built and natural environments.
  • There also exist datasets, e.g., FlyingThings, FallingThings [86], and Kubric [24], which do not attempt realistic scenes but instead scatter (mostly man-made) objects against simple backgrounds.
  • Synthetic humans are another important application area, with high-quality synthetic data generated for understanding faces, poses, and activities.
  • Some datasets focus on objects, rather than entire scenes, to serve object-centric tasks such as non-rigid reconstruction, view synthesis, and 6D pose.

We focus on natural objects and natural scenes, which have limited coverage in existing work. Although natural objects do appear in many existing datasets (e.g. urban driving), they mostly appear in the background and have limited diversity.

Generation method. Most synthetic datasets are constructed from static libraries of 3D assets, either externally sourced or produced in-house. The downside of a static library is that the resulting synthetic data is more prone to overfitting.

  • Procedural generation is involved in some existing datasets or generators, but only to a limited extent: it is applied only to object arrangement, or to a subset of objects (for example, buildings and roads but not cars).
  • Infinigen, by contrast, is completely procedural, from shape to texture, from macro structure to micro detail, without relying on any external resources.

Accessibility. A synthetic dataset or generator is most useful if it has maximum accessibility, i.e. it provides free access to assets and code with minimal usage restrictions. However, few existing works are maximally accessible. Rendered images are often provided, but the underlying 3D assets are unavailable, non-free, or carry significant usage restrictions. Moreover, the procedural generation code (if any) is usually unavailable.

Infinigen is designed for maximum accessibility. Its code is available under the GPL license. Anyone is free to use Infinigen to generate unlimited assets.

3. Method

Procedural generation. Procedural generation refers to the creation of data through general rules and simulators. An artist might manually model the structure of a single tree by eye, while a procedural system creates endless trees by encoding their structure and growth rules. Developing procedural rules is a form of world modeling using a compact mathematical language.
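As a toy illustration of this idea (not Infinigen's actual tree system), a few lines of Python can encode a growth rule that yields a different tree skeleton on every call; all names and parameter ranges here are invented:

```python
import random

def grow_tree(depth=4, rng=None):
    """Recursively generate a tree skeleton as nested branch records.

    The recursion is the 'growth rule': each branch gets a randomized
    length and 2-3 child branches, so every call yields a new tree.
    """
    rng = rng or random.Random()
    if depth == 0:
        return {"length": rng.uniform(0.1, 0.3), "children": []}  # twig
    return {
        "length": rng.uniform(0.5, 1.5) * depth,
        "children": [grow_tree(depth - 1, rng) for _ in range(rng.randint(2, 3))],
    }

def count_branches(tree):
    return 1 + sum(count_branches(c) for c in tree["children"])

tree = grow_tree(depth=3, rng=random.Random(0))  # seeded for reproducibility
```

Seeding the random generator makes a particular tree reproducible while the rule itself supports unlimited variations.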

Blender Basics. We primarily develop procedural rules using Blender, an open-source 3D modeling software that provides various primitives and utilities. Blender represents a scene as a hierarchy of posed objects. Users modify this representation by transforming objects, adding primitives, and editing meshes. Blender offers import/export for most common 3D file formats. Finally, all operations in Blender can be automated through its Python API or by examining its open-source code.

For more complex operations, Blender provides an intuitive node-graph interface. Rather than editing shader code directly to define materials, artists edit shader nodes, combining primitives into photorealistic materials. Likewise, geometry nodes define a mesh through nodes representing operations such as Poisson disk sampling, mesh boolean, and extrusion. The resulting geometry node tree is a generalized parametric CAD model that produces a unique 3D object for each combination of its input parameters. These tools are intuitive and widely used by 3D artists.

Although we use Blender heavily, not all procedural modeling is done using node graphs; a significant portion of our procedural generation is done outside of, and only loosely interacts with, Blender.

Node Transpiler. As part of Infinigen, we developed a new set of tools to speed up our procedural modeling. A notable example is our Node Transpiler, which automates the process of converting node graphs into Python code, as shown in Figure 3. The generated code is more general and allows us to randomize the graph structure rather than just the input parameters. This tool makes node graphs more expressive and allows easy integration with other procedural rules developed directly in Python or C++. It also allows non-programmers to contribute Python code to Infinigen by making node graphs. See Appendix E for details.
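To make the idea concrete, here is a heavily simplified sketch of what such a transpiler does; the dict-based graph encoding, the `noise_texture`/`displace` operation names, and the emitted variable naming are all invented for illustration and do not match Infinigen's actual transpiler:

```python
# Toy node-graph -> Python transpiler. Each node has an operation name and
# named inputs that are either literals or links to other nodes; code
# emission is a post-order walk so dependencies are emitted first.

def transpile(graph, output_node):
    lines, emitted = [], {}

    def emit(name):
        if name in emitted:
            return emitted[name]
        node = graph[name]
        args = []
        for key, value in node["inputs"].items():
            if isinstance(value, str) and value in graph:  # link to another node
                args.append(f"{key}={emit(value)}")
            else:                                          # literal parameter
                args.append(f"{key}={value!r}")
        var = f"v_{name}"
        lines.append(f"{var} = {node['op']}({', '.join(args)})")
        emitted[name] = var
        return var

    emit(output_node)
    return "\n".join(lines)

graph = {
    "noise": {"op": "noise_texture", "inputs": {"scale": 5.0}},
    "displace": {"op": "displace", "inputs": {"geometry": "noise", "strength": 0.2}},
}
source = transpile(graph, "displace")
```

Because the output is ordinary Python source, the graph structure itself (not just the literals) can then be randomized or edited programmatically.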

Generator subsystems. Infinigen is organized into generators: probabilistic programs, each dedicated to generating a subclass of assets (for example, mountains or fish). Each has a set of high-level parameters (for example, the overall height of a mountain), reflecting user-controllable external degrees of freedom. By default, we randomly sample these parameters from distributions tuned to reflect the natural world, requiring no user input. However, users can also override any parameter using our Python API for fine-grained control over data generation.

Every generator also involves many internal low-level degrees of freedom (e.g. the height of each point on a mountain). Randomizing both internal and external degrees of freedom yields a distribution of assets from which we can sample endlessly. Table 2 summarizes the number of human-interpretable degrees of freedom in Infinigen, with the caveat that these numbers may be overestimates, since not all parameters are perfectly independent. Internal degrees of freedom are difficult to quantify, so the external counts serve as a lower bound on the total degrees of freedom of our system.
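The external/internal split can be sketched as follows; the `MountainGenerator` class, its parameter names, and its distributions are hypothetical, meant only to illustrate sampled-but-overridable external degrees of freedom:

```python
import random

# Sketch of a generator with user-overridable external degrees of freedom
# (names and distributions invented; Infinigen's real generators differ).
class MountainGenerator:
    def __init__(self, seed=None, **overrides):
        rng = random.Random(seed)
        # External DOFs: sampled from tuned distributions unless overridden.
        self.params = {
            "height": rng.uniform(50.0, 400.0),
            "roughness": rng.uniform(0.3, 0.9),
            "snow_line": rng.uniform(0.6, 1.0),
        }
        self.params.update(overrides)
        # Internal DOFs (e.g. per-point noise values) would draw from the
        # same seeded rng during asset construction.
        self.rng = rng

    def describe(self):
        return {k: round(v, 2) for k, v in self.params.items()}

gen = MountainGenerator(seed=1, height=120.0)  # fine-grained control via override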

Material generator. We provide 50 procedural material generators (Fig. 5). Each consists of a randomized shader (specifying color and reflectance) and a local geometry generator (producing corresponding fine geometric details).

The ability to produce accurate geometric ground truth is a key feature of our system. This precludes many common graphics techniques, such as bump mapping and Phong interpolation: both manipulate face normals to give the illusion of detailed geometry, in a way that cannot be represented as a mesh. Likewise, artists often use image textures or alpha-channel masking to fake detail that is absent from the actual mesh. All such shortcuts are excluded from our system. See Figure 4 for an illustrative example of this distinction.

Terrain generator. We generate terrain using signed distance function (SDF) elements built from fractal noise and simulators (Fig. 6), and evaluate them into meshes using marching cubes. We generate boulders by repeated extrusion, and use a built-in Blender add-on to generate small stones. We use FLIP to simulate dynamic fluids (Fig. 7), the Nishita sky model to simulate sun/sky light, and Blender's particle system to simulate weather.
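A minimal sketch of the SDF idea, assuming a toy value-noise implementation rather than Infinigen's actual noise and simulators: sum a few octaves of smoothed noise into a heightfield, then define the SDF as signed height above ground. Marching cubes over grid samples of such an SDF would yield the terrain mesh.

```python
import random

# Toy fractal-noise heightfield SDF (illustrative only).
_rng = random.Random(0)
_grid = [[_rng.random() for _ in range(16)] for _ in range(16)]  # lattice noise

def _value_noise(x, y):
    """Bilinearly interpolated lattice noise with smoothstep easing."""
    xi, yi = int(x) % 15, int(y) % 15
    fx, fy = x - int(x), y - int(y)
    sx, sy = fx * fx * (3 - 2 * fx), fy * fy * (3 - 2 * fy)
    top = _grid[yi][xi] * (1 - sx) + _grid[yi][xi + 1] * sx
    bot = _grid[yi + 1][xi] * (1 - sx) + _grid[yi + 1][xi + 1] * sx
    return top * (1 - sy) + bot * sy

def height(x, y, octaves=4):
    h, amp, freq = 0.0, 1.0, 1.0
    for _ in range(octaves):
        h += amp * _value_noise(x * freq, y * freq)
        amp *= 0.5   # each octave adds finer, fainter detail
        freq *= 2.0
    return h

def sdf(x, y, z):
    return z - height(x, y)  # negative below the surface, positive above
```

Sampling `sdf` on a 3D grid and running marching cubes over the samples (e.g. with a mesh library) extracts the zero-level surface as a terrain mesh.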

Plant and underwater object generators.

  • We simulate tree growth through random walks and space colonization, resulting in a system that covers a variety of trees, shrubs and even some cacti (Fig. 9).
  • We provide generators for a variety of corals using Differential Growth, Laplacian Growth, and Reaction-Diffusion (Figure 10).
  • We use a geometric node graph to generate leaves (Figure 8), flowers, seaweed, kelp, molluscs, and jellyfish. 
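The space-colonization idea behind the tree generator can be sketched in 2D as follows; this is a toy version with invented parameters, not Infinigen's implementation. Attraction points pull the nearest branch node toward them, and points a branch reaches are removed ("colonized"):

```python
import math, random

def space_colonization(n_points=80, steps=60, step_len=0.05,
                       influence=2.0, kill=0.08, seed=0):
    rng = random.Random(seed)
    # Attraction points scattered in the crown region above the trunk base.
    attractors = [(rng.uniform(-1, 1), rng.uniform(0.2, 1.5)) for _ in range(n_points)]
    nodes = [(0.0, 0.0)]  # trunk base
    for _ in range(steps):
        pulls = {}  # node index -> summed unit direction toward its attractors
        for ax, ay in attractors:
            i, d = min(((i, math.hypot(ax - nx, ay - ny))
                        for i, (nx, ny) in enumerate(nodes)), key=lambda t: t[1])
            if d < influence:
                sx, sy = pulls.get(i, (0.0, 0.0))
                pulls[i] = (sx + (ax - nodes[i][0]) / d, sy + (ay - nodes[i][1]) / d)
        if not pulls:
            break
        for i, (sx, sy) in pulls.items():  # grow a segment toward the attractors
            norm = math.hypot(sx, sy) or 1.0
            nodes.append((nodes[i][0] + step_len * sx / norm,
                          nodes[i][1] + step_len * sy / norm))
        # Remove attractors that a branch has reached.
        attractors = [a for a in attractors
                      if all(math.hypot(a[0] - x, a[1] - y) >= kill for x, y in nodes)]
    return nodes

branches = space_colonization()
```

Varying the attractor distribution changes the crown shape, which is one way a single rule can cover trees, shrubs, and cactus-like forms.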

Surface scatter generators. Some natural environments are characterized by dense coverage of smaller objects. To this end, we provide several scatter generators that combine one or more existing assets into a dense layer (Fig. 11). In the forest-floor example, we generate fallen logs by procedurally fracturing whole trees from our tree system.

Due to space limits, implementation details for all of the above are deferred to Appendix G.

Creature generator. The genome of each creature is represented as a tree data structure (Fig. 12a). This mirrors the topology of real creatures, whose limbs do not form closed loops. Nodes contain part parameters, and edges specify part attachments. We provide generators for five classes of realistic creature genomes, as shown in Figure 12. We can also randomly combine creature parts, or interpolate between similar genomes. See Appendix G.6 for details.
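A toy version of such a genome, with invented part kinds and parameters, might look like the following; `swap_heads` illustrates recombining parts between two genomes:

```python
import random

# Toy creature "genome" as a tree: nodes hold part parameters, edges record
# where a child part attaches along its parent (all names illustrative).

def make_part(kind, rng, depth=0):
    part = {
        "kind": kind,
        "params": {"length": rng.uniform(0.2, 2.0), "radius": rng.uniform(0.05, 0.4)},
        "children": [],  # list of (attachment_point, child_part) pairs
    }
    if kind == "body" and depth == 0:
        for attach in (0.2, 0.8):  # front / back attachment points on the body
            part["children"].append((attach, make_part("leg", rng, depth + 1)))
        part["children"].append((1.0, make_part("head", rng, depth + 1)))
    return part

def swap_heads(genome_a, genome_b):
    """Recombine two genomes by exchanging their head subtrees."""
    ia = next(i for i, (_, c) in enumerate(genome_a["children"]) if c["kind"] == "head")
    ib = next(i for i, (_, c) in enumerate(genome_b["children"]) if c["kind"] == "head")
    genome_a["children"][ia], genome_b["children"][ib] = \
        genome_b["children"][ib], genome_a["children"][ia]

rng = random.Random(0)
a, b = make_part("body", rng), make_part("body", rng)
swap_heads(a, b)
```

Because the genome is an acyclic tree, subtree swaps like this always yield another valid genome, which is what makes random part recombination straightforward.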

Each part generator is either a transpiled node graph or a non-uniform rational basis spline (NURBS). The NURBS parameter space is high-dimensional, so we randomize NURBS parameters under a lofting-inspired decomposition: deviations from a central curve. To tune the random distribution, we modeled 30 example heads and bodies and ensured that our distribution supports them.

Dynamic resolution scaling. Given a fixed camera position, we evaluate procedural assets at a level of detail such that each face is less than one pixel in size when rendered. The process is shown in Figure 14. For most assets, this means evaluating parametric curves at the given pixel size, or using Blender's built-in subdivision or remeshing. For terrain, we perform marching cubes on SDF points in spherical coordinates. For densely scattered assets (including all assets in Figure 11), we use instancing: we generate a fixed number of assets of each type and reuse them across the scene with random transformations. Despite this effort, full scenes still average 16 million polygons.
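The "each face < 1 pixel" criterion can be approximated with simple pinhole-camera geometry; the functions and numbers below are a back-of-the-envelope sketch, not Infinigen's code:

```python
import math

def max_face_size(distance, fov_deg=60.0, image_width=1920):
    """Largest face edge (in world units) that still projects under one pixel."""
    # One pixel subtends roughly fov/width radians horizontally.
    pixel_angle = math.radians(fov_deg) / image_width
    return distance * math.tan(pixel_angle)

def subdivision_levels(base_face_size, distance):
    """Halving face size each level, count levels until faces fit in one pixel."""
    target = max_face_size(distance)
    levels, size = 0, base_face_size
    while size > target:
        size /= 2.0
        levels += 1
    return levels

# A 1 m face viewed from 10 m needs 8 halvings to drop below one pixel here.
levels = subdivision_levels(1.0, 10.0)
```

The same calculation run per asset explains why nearby objects get deep subdivision while distant ones stay coarse, keeping polygon counts bounded.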

Image rendering and ground truth extraction. We render images using Cycles, Blender's physically based path-tracing renderer. We provide code to extract ground truth for common tasks, as shown in Figure 2.

Cycles individually traces photons of light to accurately simulate diffuse and specular reflection, transparent refraction, and volumetric effects. We render at 1920 × 1080 resolution using 10,000 random samples per pixel, which is standard among Blender artists and ensures almost no sampling noise in the final image.

Previous datasets rely on Blender's built-in render passes for dense ground truth. However, these render passes are byproducts of the rendering process and are not intended for training neural networks. In particular, they are often incorrect in the presence of translucent surfaces, volumetric effects, motion blur, focus blur, or sampling noise. See Appendix C.2 for examples of these issues.

Instead, we provide OpenGL code to extract surface normals, depth, and occlusion boundaries directly from the mesh, without relying on Blender. Besides accuracy, this solution has other advantages. Users can exclude objects irrelevant to their task (such as water or clouds) from the ground truth, whether or not those objects are rendered. Blender also does not natively support many annotations, such as occlusion boundaries. Finally, our implementation is modular, and we anticipate that users will generate task-specific ground truth not covered above through simple extensions to our codebase.
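For intuition, one common simple recipe for occlusion boundaries (not necessarily the one used by Infinigen's mesh-based OpenGL code) marks pixels where depth jumps sharply between neighbors:

```python
# Mark a pixel as an occlusion boundary when its depth differs sharply from a
# right or bottom neighbor (toy recipe on a nested-list depth map).

def occlusion_boundaries(depth, threshold=0.5):
    h, w = len(depth), len(depth[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and abs(depth[y][x] - depth[ny][nx]) > threshold:
                    mask[y][x] = True
    return mask

# Foreground square (depth 2) in front of a background plane (depth 10).
depth = [[10.0] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        depth[y][x] = 2.0
mask = occlusion_boundaries(depth)
```

Depth-based recipes like this inherit the flaws of the depth map itself, which is why extracting boundaries directly from mesh visibility, as the authors do, is more reliable.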

Runtime. We benchmarked Infinigen over 1,000 independent trials on 2 Intel(R) Xeon(R) Silver 4114 @ 2.20GHz CPUs and 1 NVIDIA GPU. The wall time to generate a pair of 1080p images is 3.5 hours. Statistics are shown in Figure 15.

4. Experiments

To evaluate Infinigen, we generated 30K image pairs with ground truth for rectified stereo matching. We trained RAFT-Stereo on these images from scratch and report results on the Middlebury validation set (Table 3) and test set (Figure 16). See the appendix for qualitative results on in-the-wild nature photos.

References

Raistrick A, Lipson L, Ma Z, et al. Infinite Photorealistic Worlds using Procedural Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 12630-12641.

[65] Sergey I Nikolenko. Synthetic data for deep learning, volume 174. Springer, 2021.

S. Summary

S.1 Main Idea

The authors propose Infinigen, a procedural generator of realistic 3D scenes of the natural world: a generator that can create an infinite variety of shape, texture, material, and scene combinations, where every asset, from shape to texture, is generated from scratch by random mathematical rules, using no external sources and allowing infinite variations and combinations.

S.2 Method

Procedural Generation: creating data through general rules and simulators. An artist might manually model the structure of a tree by eye, while a procedural system creates endless trees by encoding their structure and growth rules.

Infinigen is built on top of Blender.

Blender : A graphics system that provides many useful primitives for procedural generation. Using these primitives, the authors design and implement a library of procedural rules to cover a wide range of natural objects and scenes.

Blender represents a scene as a hierarchy of pose objects. Users modify this representation by transforming objects, adding primitives, and editing meshes.

For more complex operations, Blender provides an intuitive node graph interface. Instead of editing shader code directly to define materials, artists edit shader nodes to combine primitives into photorealistic materials.

Node Transpiler: The authors also developed utilities that facilitate the creation of procedural rules and enable all Blender users (including non-programmers) to contribute. These include a transpiler that automatically converts Blender node graphs (an intuitive visual representation of procedural rules often used by Blender artists) into Python code. The generated code is more general and allows randomizing the graph structure rather than just the input parameters. This tool makes node graphs more expressive and allows easy integration with other procedural rules developed directly in Python or C++.

S.3 Scene Generation

The scene generation process (shown in the figure above) consists of four steps:

  • Compose the scene layout (mainly the background of the scene)
  • Generate all necessary assets (visualized with a distinct color per mesh face)
  • Apply procedural materials
  • Render into photorealistic images


Origin: blog.csdn.net/qq_44681809/article/details/131346207