Nvdiffrast high-performance differentiable rendering development kit


Nvdiffrast is a PyTorch/TensorFlow library that provides high-performance primitive operations for rasterization-based differentiable rendering. Compared to previous libraries such as redner, SoftRas, or PyTorch3D, it is a lower-level library: nvdiffrast has no built-in camera model, lighting/material model, and so on. Instead, the provided operations encapsulate only the most graphics-centric steps of the modern hardware graphics pipeline: rasterization, interpolation, texturing, and antialiasing. All of these operations (and their gradients) are GPU-accelerated, either via CUDA or via the hardware graphics pipeline.

Example using nvdiffrast

1. Install nvdiffrast

The minimum requirements for installing nvdiffrast are as follows:

  • Linux or Windows operating system.
  • 64-bit Python 3.6.
  • PyTorch (recommended) 1.6 or TensorFlow 1.14. TensorFlow 2.x is not currently supported.
  • High-end NVIDIA GPUs, NVIDIA drivers, CUDA 10.2 toolkit.

To download nvdiffrast, go to the repository and download the .zip file, or clone the repository with git:

git clone https://github.com/NVlabs/nvdiffrast

1.1 Linux installation nvdiffrast

We recommend running nvdiffrast in Docker. To build a Docker image with nvdiffrast and PyTorch 1.6 installed, run:

./run_sample.sh --build-container

We recommend using Ubuntu, as some Linux distributions may not provide all of the required packages. Installation issues have been reported on CentOS, although there is at least one report of a successful installation.

To try some of the provided code examples, run:

./run_sample.sh ./samples/torch/cube.py --resolution 32

Alternatively, if you have taken care of all dependencies (see the attached Dockerfile for reference), you can install nvdiffrast in your local Python site-packages by running the following command in the project root:

pip install .

Alternatively, you can add the repository root to PYTHONPATH.

1.2 Windows installation nvdiffrast

On Windows, nvdiffrast requires an external compiler to compile the CUDA kernels. Development was done using Microsoft Visual Studio 2017 Professional Edition, which works with both the PyTorch and TensorFlow versions of nvdiffrast. VS 2019 Professional Edition has also been confirmed to work with the PyTorch version of nvdiffrast. Editions other than Professional, including the Community Edition, should work but have not been tested.

If the compiler binary (cl.exe) is not found in PATH, nvdiffrast will attempt to locate it heuristically. If that fails, you may need to manually run the appropriate vcvars batch file, for example:

"C:\Program Files (x86)\Microsoft Visual Studio\...\...\VC\Auxiliary\Build\vcvars64.bat"

The exact path depends on the version and edition of Visual Studio you have installed.

To install nvdiffrast in local site-packages, run:

# Ninja is required at run time to build PyTorch extensions
pip install ninja

# Run at the root of the repository to install nvdiffrast
pip install .

Likewise, you can add the repository root to PYTHONPATH.

2. Primitive operations

Nvdiffrast provides four differentiable rendering primitives: rasterization, interpolation, texturing, and antialiasing. The operation of these primitives is described here in a platform-independent manner; platform-specific documentation can be found in the API reference section.

In this section, for clarity, we ignore the mini-batch axis and assume a mini-batch size of 1. However, all operations support mini-batches, as discussed in more detail later.

2.1 Rasterization

The rasterization operation takes as input a tensor of vertex positions and a tensor of vertex index triplets specifying the triangles. Vertex positions are specified in clip space, i.e., after the model-view and projection transformations; it is the user's responsibility to perform these transformations. In clip space, the view frustum is a cube in homogeneous coordinates where x/w, y/w, and z/w all lie between -1 and +1.
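
As a minimal sketch of getting vertex positions into clip space, the snippet below builds an OpenGL-style perspective matrix and applies a combined model-view-projection transform in PyTorch. The perspective() helper, the hard-coded camera offset, and the random placeholder vertices are illustrative assumptions, not part of nvdiffrast; the bundled samples use their own utility function for the same purpose.

import math
import torch

def perspective(fov_y=0.7854, aspect=1.0, near=0.1, far=100.0):
    # Illustrative OpenGL-style perspective projection matrix (not an nvdiffrast API).
    f = 1.0 / math.tan(fov_y * 0.5)
    return torch.tensor([
        [f / aspect, 0.0,  0.0,                          0.0],
        [0.0,        f,    0.0,                          0.0],
        [0.0,        0.0,  (far + near) / (near - far),  2.0 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                          0.0]], dtype=torch.float32)

pos_obj = torch.rand(100, 3) * 2.0 - 1.0      # placeholder object-space vertices
mv = torch.eye(4)
mv[2, 3] = -3.5                               # move the object in front of the camera
mvp = perspective() @ mv                      # combined model-view-projection matrix

# Append w = 1 and transform; nvdiffrast expects clip-space [x, y, z, w] per vertex.
pos_h = torch.cat([pos_obj, torch.ones_like(pos_obj[:, :1])], dim=1)
pos_clip = (pos_h @ mvp.t()).cuda()           # [num_vertices, 4], float32, on the GPU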

The output of the rasterization operation is a 4-channel float32 image containing the tuple (u, v, z/w, triangle_id) for each pixel. The values u and v are the barycentric coordinates within the triangle: the first vertex in the vertex index triplet has (u, v) = (1, 0), the second vertex (u, v) = (0, 1), and the third vertex (u, v) = (0, 0). The normalized depth value z/w is used later by the antialiasing operation to infer occlusion relationships between triangles, and it does not propagate gradients to the vertex position input. The triangle_id field is the triangle index, offset by one. Pixels where no triangle was rasterized receive zeros in all channels.
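
Here is a minimal rasterization sketch following the documented nvdiffrast.torch API. pos_clip comes from the snippet above, and tri is a placeholder int32 triangle-index tensor; in instanced mode (explained later) the positions get a leading mini-batch axis.

import torch
import nvdiffrast.torch as dr

glctx = dr.RasterizeCudaContext()                    # or dr.RasterizeGLContext()
pos = pos_clip[None, ...]                            # [1, num_vertices, 4]
tri = torch.randint(0, pos.shape[1], (64, 3), dtype=torch.int32, device='cuda')  # placeholder triangles

rast, rast_db = dr.rasterize(glctx, pos, tri, resolution=[256, 256])

bary_uv     = rast[..., 0:2]     # barycentric coordinates (u, v)
depth       = rast[..., 2:3]     # z/w, consumed later by the antialiasing operation
triangle_id = rast[..., 3:4]     # triangle index + 1; zero means no triangle was hit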

The rasterizer is point sampled, i.e. the geometry is not smoothed, blurred or partially transparent in any way, in contrast to some previous differentiable rasterizers. The contents of a pixel always represent a single surface point located on the nearest surface visible along a ray passing through the center of the pixel.

Point sampling coverage does not produce vertex position gradients associated with occlusion and visibility effects. This is because movement of vertices does not change coverage in a continuous way - triangles are either rasterized into pixels or not. In nvdiffrast, occlusion/visibility related gradients are generated during an anti-aliasing operation, which typically occurs at the end of the rendering pipeline:

Left: [..., 0:2] = barycentrics (u, v). Right: [..., 3] = triangle_id.

The image above shows the output of the rasterizer. The image on the left shows the contents of channels 0 and 1, the barycentric coordinates, shown in red and green respectively. The image on the right shows channel 3, the triangle ID, using a random color for each triangle.

2.2 Interpolation

Depending on the shading and lighting model, meshes typically have a number of properties specified at their vertices. These can include, for example, texture coordinates, vertex normals, reflection vectors, and material parameters. The purpose of the interpolation operation is to transfer these properties specified by the vertices to image space. In the hardware graphics pipeline, this happens automatically between the vertex shader and the pixel shader. Interpolation operations in nvdiffrast support any number of attributes.

Specifically, the interpolation operation takes as input the buffer produced by the rasterizer and a buffer of vertex attributes. The output is an image-sized buffer with as many channels as there are attributes. Pixels that are not covered by any triangle will contain all zeros in the output:

Texture coordinates (s,t)

Above is an example of interpolated texture coordinates visualized in the red and green channels. The image is created using the output of the rasterizer from the previous step and an attribute buffer containing texture coordinates.
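
A minimal interpolation sketch using the documented nvdiffrast.torch.interpolate call. Here uv is assumed to be a per-vertex attribute tensor of texture coordinates, shape [1, num_vertices, 2] in instanced mode; rast and tri are the rasterizer output and triangle tensor from the previous step.

import nvdiffrast.torch as dr

# Interpolate per-vertex texture coordinates into image space.
texc, _ = dr.interpolate(uv, rast, tri)    # texc: [minibatch, height, width, 2]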

2.3 Texturing

Texture sampling is a fundamental operation in the hardware graphics pipeline, and the same is true in nvdiffrast. The basic principle is simple: given a per-pixel vector of texture coordinates, fetch a value from the texture and place it in the output. In nvdiffrast, textures may have any number of channels, which is useful when you want to learn an abstract field (for example, one that serves as input to a neural network further down the pipeline).

When sampling textures, you usually need to use some form of filtering. Most previous differentiable rasterizers support at most bilinear filtering, where samples at texture coordinates between texel centers are linearly interpolated from the four closest texels. While this works well when viewing the texture up close, it produces severely aliased results when viewing the texture from a distance. To avoid this, the texture needs to be pre-filtered before sampling to remove frequencies that are too high compared to the sampling density.

Nvdiffrast supports prefiltered texture sampling based on mipmapping. The required mipmap levels can be generated internally in the texture operation, so the user only needs to supply the highest-resolution (base level) texture. Currently the highest-quality filtering mode is isotropic trilinear filtering. The lack of anisotropic filtering means that a texture viewed at a steep angle will not alias in any direction, but it may appear blurry across the non-squished direction.

In addition to standard 2D textures, texture sampling operations also support cube maps. Cubemaps are addressed using 3D texture coordinates, and transitions between cubemap faces are properly filtered so there are no visible seams. Cubemaps support trilinear filtering similar to 2D textures. There is no explicit support for 1D textures, but they can be simulated efficiently using 1×n textures. All filtering, texture mapping, etc. apply to such textures just like they do to real 1D textures. 3D volume textures are not currently supported.
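
Below is a minimal texture sampling sketch. The tensors tex (shape [1, tex_height, tex_width, channels]) and texc (the interpolated texture coordinates from the previous step) are assumed from context. Bilinear sampling needs nothing else; the prefiltered, mipmapped modes additionally need the image-space derivatives of the texture coordinates, which are covered in the image-space differentiation section below.

import nvdiffrast.torch as dr

# Plain bilinear sampling, no prefiltering.
color = dr.texture(tex, texc, filter_mode='linear')

# Prefiltered (trilinear, mipmapped) sampling would also take the derivative tensor:
# color = dr.texture(tex, texc, texd, filter_mode='linear-mipmap-linear')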

Left: Spot texture. Center: Texture sampling operation output. Right: Replace the background with white.

The middle image above shows the result of texture sampling using the interpolated texture coordinates from the previous step. Why is the background pink? The texture coordinates (s, t) read zero at those pixels, but that is a perfectly valid point at which to sample the texture. Spot's texture (left) has a pink color at its (0, 0) corner, so all background pixels receive that color from the texture sampling operation. On the right, we replace the color of the empty pixels with white. Here's one way to do that in PyTorch:

img_right = torch.where(rast_out[..., 3:] > 0, img_left, torch.tensor(1.0).cuda())

where rast_out is the output of the rasterization operation. We simply test whether the triangle_id field (i.e. channel 3 of the rasterizer output) is greater than zero, indicating that a triangle was rendered in that pixel. If so, we get the color from the texture image, otherwise we take the constant 1.0.

2.4 Anti-aliasing

The last primitive operation in nvdiffrast is antialiasing. Based on the geometric input (vertex positions and triangles), it will smooth the discontinuities of the contour edges in the given image. Smoothing is based on a local approximation of coverage - the approximate integral over a pixel is calculated based on the precise location of the associated edge and the point sample color at the center of the pixel.

In this context, a silhouette is any edge that is connected to only one triangle, or that connects two triangles such that one folds behind the other. Notably, this includes both silhouettes against the background and silhouettes against another surface, unlike some previous methods (e.g., DIB-R) that support only the former.

It's worth discussing why we go through this trouble to improve the image only slightly. For example, if we are trying to match a real-world photo, slightly smoother edges will hardly match the captured image better than jagged ones. But that is not the point of the antialiasing operation: the real goal is to obtain gradients of the vertex positions with respect to occlusion, visibility, and coverage.

Remember, everything so far in the rendering pipeline is point sampled. In particular, the coverage (i.e. which triangle is rasterized to which pixel) varies discontinuously across rasterization operations.

This is why previous differentiable rasterizers applied non-standard image synthesis models with blur and transparency: there must be something that makes coverage continuous with respect to vertex positions if we wish to optimize vertex positions, camera parameters, and so on based on an image-space loss. In nvdiffrast we point-sample everything, so that each pixel corresponds to a well-defined surface point. This lets us perform arbitrary shading computations without worrying about, say, accidentally blurring texture coordinates across silhouettes, or attributes mysteriously drifting toward the background color near object edges. Only at the end of the pipeline does the antialiasing operation ensure that moving the vertex positions changes the silhouettes continuously.

The antialiasing operation supports any number of channels in the image to be antialiased. So if your rendering pipeline produces an abstract representation that is fed to a neural network for further processing, that is not a problem.
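
A minimal antialiasing sketch using the documented nvdiffrast.torch.antialias call. Here color is assumed to be the shaded image computed so far (any number of channels), and pos and tri are the same clip-space positions and triangles that were fed to the rasterizer, which the operation needs in order to locate silhouette edges.

import nvdiffrast.torch as dr

# Smooth silhouette edges and obtain coverage-related vertex position gradients.
color_aa = dr.antialias(color, rast, pos, tri)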

Left: Anti-aliased image. Center: Close-up view before anti-aliasing. Right: Close-up view after anti-aliasing

The image on the left above shows the resulting image after performing anti-aliasing. The effect is very small—some border pixels become less jagged, as shown in the close-up.

It's worth noting that not all border pixels are antialiased, as shown in the left image below. This is because the accuracy of the antialiasing operation in nvdiffrast depends on the rendered size of the triangles: since we only store knowledge of one surface point per pixel, a silhouette edge can only be antialiased if the triangle that contains it is actually visible in the image. The example image is rendered at a very low resolution and the triangles are small compared to the pixels, so triangles can easily get lost between the pixels.

This causes the anti-aliasing to look patchy, and the gradients provided by the anti-aliasing become noisier when edge triangles are lost. Therefore, it is recommended to render the image at a resolution where the triangles are large enough to be visible in the image at least most of the time:

Left: Anti-aliased pixels, native resolution. Right: Rendered at 4×4 higher resolution and downsampled

The left image above shows which pixels in the example were modified by the antialiasing operation. On the right, we perform the rendering at 4×4 higher resolution and downsample the final image back to the original size. This yields more accurate position gradients related to the silhouettes, so if you suspect your position gradients are too noisy, you may want to try simply increasing the resolution at which rasterization and antialiasing are done.

For shape optimization purposes, it might be perfectly fine for the left side to look sparse. Even if the gradients are somewhat sparse, they will still point in the right direction, and you'll need to use some kind of shape regularization anyway, which will greatly increase the tolerance to noisy shape gradients.

3. Beyond primitive operations

Rendering images with nvdiffrast is easy, but there are some practical things you need to consider. The topics in this section explain the operation and use of nvdiffrast in more detail, hopefully helping you avoid any potential misunderstandings and pitfalls.

3.1 Coordinate system

Nvdiffrast follows OpenGL's coordinate systems and other conventions. This is partly because we support OpenGL for accelerating the rasterization operation, but mostly because it gives us a standard to follow.

  • In the OpenGL convention, the perspective projection matrix (such as the one implemented in utils.projection() in our examples and glFrustum() in OpenGL) assumes a view space in which z increases toward the viewer. However, after multiplication by the perspective projection matrix, the homogeneous clip-space coordinate z/w increases away from the viewer. Therefore, larger depth values in the rasterizer output tensor also correspond to surfaces farther away from the viewer.
  • The memory ordering of image data in OpenGL, and consequently in nvdiffrast, is bottom-up. This means that row 0 of a tensor containing an image is the bottom row of the texture/image, as opposed to the more common top-down scanline order. If you want to keep image data in the conventional top-down order in your own code but have it logically right side up in nvdiffrast, you will need to flip the images vertically when crossing that boundary.
  • For 2D textures, the texture coordinate origin (s, t) = (0, 0) is in the bottom-left corner, with s increasing to the right and t increasing upward. When specifying the faces of a cube map texture, the orientation varies between faces, but here too nvdiffrast follows the OpenGL convention.

As a suggestion, it is best to have a grasp of the coordinate system and orientation used in the program. When a problem occurs, it's much better to identify and fix the root cause than randomly flipping coordinates, images, buffers, and matrices until the immediate problem goes away.
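
As a small sketch of handling the bottom-up convention mentioned above: img is assumed to be an nvdiffrast-style image tensor of shape [minibatch, height, width, channels], and flipping its height axis converts between bottom-up and the usual top-down row order, for example when saving results to disk or when loading reference images and textures.

import torch

# Flip the height axis (dim 1) to convert between bottom-up and top-down row order.
img_topdown = torch.flip(img, dims=[1])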

3.2 Geometry and Mini-Batches: Range Mode vs. Instance Mode

As mentioned previously, all operations in nvdifrast effectively support mini-batch axes. Related to this, we support two ways of representing geometry: range mode and instanced mode. If you want to render different meshes in each mini-batch index, you need to use range mode. However, if you are rendering the same mesh in each mini-batch index, but with potentially different viewpoints, vertex positions, attributes, textures, etc., then instanced mode will be more convenient.

In range mode, you specify the triangle index triplets as a 2D tensor of shape [num_triangles, 3] and the vertex positions as a 2D tensor of shape [num_vertices, 4]. In addition, the rasterization operation requires an extra 2D range tensor of shape [minibatch_size, 2], where each row specifies a start index and a count into the triangle tensor. The rasterizer then renders the triangles within the specified range into each mini-batch index of the output tensor. If you have multiple meshes, you should place them all into the same vertex and triangle tensors and select which mesh to rasterize into each mini-batch index via the contents of the range tensor. The attribute tensor in the interpolation operation is treated the same way as the positions: in range mode it must have shape [num_vertices, num_attributes].

In instanced mode, the topology of the mesh is shared across mini-batch indices. The triangle tensor is still a 2D tensor of shape [num_triangles, 3], but the vertex positions are specified as a 3D tensor of shape [minibatch_size, num_vertices, 4]. With a 3D vertex position tensor, the rasterizer does not need a range tensor input but takes the mini-batch size from the first dimension of the vertex position tensor. The same triangles are rendered into every mini-batch index, with the vertex positions taken from the corresponding slice of the vertex position tensor. In this mode, the attribute tensor in the interpolation operation must be a 3D tensor like the positions, i.e., of shape [minibatch_size, num_vertices, num_attributes]. However, you can also provide an attribute tensor with a mini-batch size of 1, and it will be broadcast across the mini-batch.
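
The sketch below contrasts the two geometry modes; all tensors in it (glctx, pos_inst, tri, pos_all, tri_all) are assumed placeholders with the shapes described above.

import torch
import nvdiffrast.torch as dr

# Instanced mode: shared topology, per-instance vertex positions.
#   pos_inst: [minibatch_size, num_vertices, 4], tri: [num_triangles, 3]
rast_inst, _ = dr.rasterize(glctx, pos_inst, tri, resolution=[256, 256])

# Range mode: all meshes concatenated into single vertex/triangle tensors.
#   pos_all: [num_vertices, 4], tri_all: [num_triangles, 3]
#   ranges[i] = [first_triangle_index, triangle_count] for mini-batch index i (kept on the CPU).
ranges = torch.tensor([[0, 100], [100, 150]], dtype=torch.int32)
rast_range, _ = dr.rasterize(glctx, pos_all, tri_all, resolution=[256, 256], ranges=ranges)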

3.3 Image space differentiation

We sidestepped a very basic issue in our description of texture operations above. In order to determine the appropriate pre-filtering of texture samples, we need to know the density of the samples. But how can we know the sampling density when only one surface point is known per pixel?

The solution is to track the image-space derivatives of everything that feeds into the texture sampling operation. These are distinct from the gradients used in the backward pass, even though both involve differentiation. Consider the barycentrics (u, v) produced by the rasterization operation. They change by some amount when moving horizontally or vertically in the image plane. If we denote the image-space coordinates by (X, Y), the image-space derivatives of the barycentrics are ∂u/∂X, ∂u/∂Y, ∂v/∂X, and ∂v/∂Y. We can organize them into a 2×2 Jacobian matrix describing the local relationship between (u, v) and (X, Y); this matrix is generally different at every pixel. For the purposes of image-space derivatives, the units of X and Y are pixels. Thus, ∂u/∂X is a local approximation of how much u changes when moving horizontally by a distance of one pixel, and so on.

Once we know how the barycentrics change with respect to pixel position, the interpolation operation can use this to determine how the attributes change with pixel position. When the attributes are used as texture coordinates, we therefore know how the texture sampling position (in texture space) changes when moving within the pixel (up to a locally linear approximation). This texture footprint tells us the scale at which the texture should be prefiltered; in more practical terms, it tells us which mipmap level to use when sampling the texture.

In nvdiffrast, the rasterization operation outputs the image-space derivatives of the barycentrics in an auxiliary 4-channel output tensor, in the order (∂u/∂X, ∂u/∂Y, ∂v/∂X, ∂v/∂Y) from channel 0 to 3. The interpolation operation can take this auxiliary tensor as input and compute the image-space derivatives of any set of interpolated attributes. Finally, the texture sampling operation can use the image-space derivatives of the texture coordinates to determine the amount of prefiltering.
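
A sketch of passing image-space derivatives through the pipeline, mirroring how the bundled samples use the texture operation. rast and rast_db come from rasterize(); uv and tex are assumed per-vertex texture coordinates and a texture tensor as in the earlier steps.

import nvdiffrast.torch as dr

# Also compute the image-space derivatives of the interpolated attributes.
texc, texd = dr.interpolate(uv, rast, tri, rast_db=rast_db, diff_attrs='all')

# Prefiltered (trilinear) texture sampling uses those derivatives to pick the mip level.
color = dr.texture(tex, texc, texd, filter_mode='linear-mipmap-linear')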

There is nothing magical about these image-space derivatives. They are tensors just like the texture coordinates themselves, they propagate gradients backward, and so on. For example, if you want to artificially blur or sharpen the texture during sampling, you can simply multiply the tensor carrying the image-space derivatives of the texture coordinates, ∂{s, t}/∂{X, Y}, by a scalar value before feeding it into the texture sampling operation. This scales the texture footprint and thereby adjusts the amount of prefiltering. If your loss function prefers a different level of sharpness, that multiplier will receive a nonzero gradient.

Update: Starting with version 0.2.1, texture sampling operations also support a separate mip-level bias input, which is better suited for this specific task, but the gist is the same.

One might wonder, wouldn't it be easier to determine the texture footprint based solely on texture coordinates in neighboring pixels, and skip all this differential crap? In simple cases the answer is yes, but contouring, occlusion and discontinuous texture parameterization will make this approach rather unreliable in practice. Computing image spatial derivatives analytically keeps everything point-like, local, and well-behaved.

It should be noted that computing the gradients related to the image-space derivatives is somewhat complex and requires extra computation. At the same time, they are usually not crucial for the convergence of training/optimization. Therefore, the primitive operations in nvdiffrast offer options to disable these gradient computations. We are talking about things like ∂Loss/∂(∂{u, v}/∂{X, Y}), which may look second-order but are not.

3.4 Mipmap and texture size

Prefiltered texture sampling mode requires mipmaps, which are downsampled versions of textures. The texture sampling operation can build these internally, or you can provide your own mipmap stack, but there are texture size constraints to consider.

When a mipmap is constructed internally, each mipmap level is constructed by averaging a 2×2 block of pixels from the previous level (or the texture itself for the first mipmap level). Therefore, the buffer size to be averaged must be divisible by 2 in both directions. There is one exception: an edge length of 1 is valid and will remain 1 during downsampling operations.

For example, a 32×32 texture produces the following mipmap stack: 32×32 → 16×16 → 8×8 → 4×4 → 2×2 → 1×1.

A 32×8 texture, whose sides are both powers of two but not equal, results in: 32×8 → 16×4 → 8×2 → 4×1 → 2×1 → 1×1.

For texture sizes like these, everything works automatically and mipmaps are constructed down to a 1×1 pixel size. Therefore, if you wish to use prefiltered texture sampling, you should scale your textures to power-of-two dimensions, though the two dimensions need not be equal.

What about texture atlases? You might have an object whose texture is composed of individual patches, or a collection of meshes, each with its own texture. Suppose we have a texture atlas consisting of five 32×32 sub-images, i.e., 160×32 pixels in total. Now we cannot compute mipmap levels all the way down to 1×1, because we reach a 5×1 mipmap that cannot be downsampled (5 is not an even number): 160×32 → 80×16 → 40×8 → 20×4 → 10×2 → 5×1.

Scaling the atlas to 256×32 pixels would feel silly, since the sub-images are already perfectly sized, and downsampling different sub-images together (which is what would happen beyond the 5×1 resolution) would not make sense anyway. Therefore, the texture sampling operation lets the user specify the maximum number of mipmap levels to construct and use. In this case, setting max_mip_level=5 stops at the 5×1 mipmap and avoids the problem.
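
A sketch of the atlas scenario above: tex_atlas is assumed to have shape [1, 32, 160, channels] (five 32×32 sub-images side by side), and texc/texd are the interpolated texture coordinates and their image-space derivatives from the earlier steps. Limiting the stack to five levels stops at the 5×1 mipmap instead of raising an error.

import nvdiffrast.torch as dr

# Limit the internally built mipmap stack to five levels for the 160x32 atlas.
color = dr.texture(tex_atlas, texc, texd,
                   filter_mode='linear-mipmap-linear', max_mip_level=5)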

This is a deliberate design choice: nvdiffrast does not simply stop automatically at a mipmap size that cannot be downsampled, but instead requires the user to specify a limit whenever the texture size is not a power of two. The goal is to avoid bugs where prefiltered texture sampling mysteriously fails to work because of odd texture dimensions. It would be confusing if 256×256 textures produced nicely prefiltered texture samples, 255×255 textures suddenly had no prefiltering at all, and 254×254 textures had just a bit of prefiltering (one level) but nothing more.

If you compute your own mipmaps, their sizes must follow the scheme above. The stack does not have to extend all the way to 1×1 resolution; it can end at any point, and it works equivalently to an internally constructed mipmap stack limited by max_mip_level. Importantly, the gradients of user-supplied mipmaps are not automatically propagated to the base texture; this is natural, since nvdiffrast knows nothing about their relationship. Instead, the tensors at the specified mip levels of a user-supplied mipmap stack receive gradients of their own.

3.5 Rasterization using CUDA and OpenGL

Starting with version 0.3.0, nvdiffrast on PyTorch supports executing the rasterization operation using either CUDA or OpenGL. Earlier versions and the TensorFlow bindings support OpenGL only.

When performing rasterization on OpenGL, we use the GPU's graphics pipeline to determine which triangles fall on which pixels. GPUs have very efficient hardware for this task—it's their raison d'être in the first place—so it makes sense to take advantage of it. Unfortunately, some computing environments are not designed with this in mind, making it difficult to get OpenGL to work correctly and interoperate cleanly with CUDA. On Windows, compatibility is generally good because the GPU drivers required to run CUDA also include OpenGL support. Linux is more complex because various drivers can be installed separately and there is no standardized way to access the hardware graphics pipeline.

Rasterization in CUDA almost reverses these considerations. Compatibility is obviously not an issue on any platform that supports CUDA. On the other hand, it is not trivial to implement the rasterization process correctly and efficiently on large-scale data-parallel programming models. The CUDA rasterizer in nvdiffrast follows the approach described in Laine and Karras' HPG 2011 research paper "High-Performance Software Rasterization on GPUs." Our code is based on the paper's publicly released CUDA kernel, with extensive modifications to support current hardware architectures and match the needs of nvdiffrast.

The CUDA rasterizer does not support output resolutions larger than 2048×2048, and both dimensions must be multiples of 8. In addition, the number of triangles that can be rendered in one batch is limited to around 16 million. Subpixel precision is limited to 4 bits, and depth peeling is less accurate than with OpenGL. Memory consumption depends on many factors.

It is difficult to predict which rasterizer offers better performance. For complex meshes and high resolutions, OpenGL will likely outperform the CUDA rasterizer, although it has certain overheads that the CUDA rasterizer does not. For simple meshes and low resolutions, the CUDA rasterizer may be faster, but it, too, has its own overheads. Measuring with actual data, on the target platform, and in the context of the whole program is the only way to know for sure.

To run rasterization in CUDA, create a RasterizeCudaContext and provide it to the rasterize() operation. For OpenGL, use RasterizeGLContext instead. Simple!
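
A minimal context-creation sketch; pos and tri are assumed to be the clip-space positions and triangles from the earlier examples. Both context constructors also take an optional device argument in the PyTorch API, which is relevant for the multi-GPU notes below.

import nvdiffrast.torch as dr

glctx = dr.RasterizeCudaContext()        # CUDA rasterizer
# glctx = dr.RasterizeGLContext()        # or: OpenGL rasterizer

rast, rast_db = dr.rasterize(glctx, pos, tri, resolution=[512, 512])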

3.6 Running on multiple GPUs

Nvdiffrast supports computation on multiple GPUs in both PyTorch and TensorFlow. Following PyTorch convention, operations are always executed on the device where the input tensors reside; all GPU input tensors must be on the same device, and unsurprisingly the output tensors end up on that same device, too. In addition, the rasterization operation requires that its context has been created for the correct device. In TensorFlow, the rasterizer context is created automatically on the device of the rasterization operation the first time it is executed.

The remainder of this section applies only to the OpenGL rasterizer context. CUDA rasterizer contexts require no special considerations other than ensuring they are on the correct device.

On Windows, nvdiffrast implements OpenGL device selection in a way that allows it to happen only once per process: once a context has been created, all future contexts will be created on the same GPU. Therefore, you cannot expect to run rasterization operations with OpenGL contexts on multiple GPUs within the same process. Attempting to do so will result in either a crash or a severe performance penalty. However, with PyTorch, computation is usually distributed across GPUs by launching a separate process for each GPU, so this is not a big issue. Note that any OpenGL context created in the same process (even something like a GUI window) will prevent changing the device later. Therefore, if you want to run rasterization operations on a GPU other than the default one, be sure to create its OpenGL context before initializing any other library that may use OpenGL.

On Linux, everything works fine and you can create OpenGL rasterizer contexts on multiple devices within the same process.

Notes on torch.nn.DataParallel:

PyTorch provides the torch.nn.DataParallel wrapper class for splitting the execution of mini-batches across multiple threads. Unfortunately, this class is fundamentally incompatible with OpenGL-based rasterization, because it spawns a new set of threads on each call (at least as of PyTorch 1.9.0). Using previously created OpenGL contexts in these new threads, even if care is taken not to use the same context in multiple threads, causes them to be migrated around, resulting in growing GPU memory usage and extremely low GPU utilization. We therefore recommend against using torch.nn.DataParallel with rasterization operations that rely on OpenGL contexts.

It is worth noting that the subprocesses spawned by torch.nn.DistributedDataParallel are more persistent. They must create their own OpenGL contexts as part of their initialization, and therefore do not suffer from this problem.

3.7 Rendering multiple depth layers

Sometimes it is necessary to render a scene with partially transparent surfaces. In this case, it's not enough to just find the surfaces closest to the camera, since you might also need to know what's behind them. To this end, nvdiffrast supports depth peeling, which allows you to extract multiple closest surfaces for each pixel.

For depth peeling, we first rasterize the nearest surface as usual. We then perform a second rasterization using the same geometry, but this time we cull all previously rendered surface points at each pixel, effectively extracting the second closest depth layer. This can be repeated as many times as needed so that we can extract any number of depth layers. See the image below for an example of depth peeling results with each depth layer shaded and anti-aliased:

Left: First depth layer. Center: Second depth layer. Right: Third depth layer.

The depth peeling API is based on the DepthPeeler object, which acts as a context manager, and its rasterize_next_layer method. The first call to rasterize_next_layer is equivalent to calling the traditional rasterize function, and subsequent calls report further depth layers. The rasterization parameters are specified when instantiating the DepthPeeler object. In code, this might look as follows:

import nvdiffrast.torch

with nvdiffrast.torch.DepthPeeler(glctx, pos, tri, resolution) as peeler:
  for i in range(num_layers):
    rast, rast_db = peeler.rasterize_next_layer()
    # process or store the results here

If you end up extracting only the first depth layer, there is no performance penalty compared to the basic rasterization operation. In other words, the code above with num_layers=1 runs just as fast as a single call to rasterize.

Depth peeling is currently supported only in the PyTorch version of nvdiffrast. For implementation reasons, depth peeling reserves the rasterizer context, so no other rasterization operations can be performed while peeling is in progress (i.e., inside the with block). Therefore, you cannot start a nested depth peeling operation or call rasterize inside the with block unless you use a different context.

For the sake of completeness, let's note the following small caveat:

Depth peeling relies on depth values to distinguish surface points, so culling the "previously rendered surface points" actually means culling all surface points at the same depth as, or closer than, the depth rendered into the pixel in previous passes. This only matters if you have multiple layers of geometry at exactly matching depths: if your geometry consists of just two exactly overlapping triangles, you will see one of them in the first pass but never the other, because it lies at precisely the depth that is already considered done.

3.8 Differences between PyTorch and TensorFlow

Nvdiffrast is available for PyTorch and TensorFlow 1.x; the latter may be migrated to TensorFlow 2.x if demand arises. These frameworks operate somewhat differently, which is reflected in their respective APIs. To simplify a bit, in TensorFlow 1.x you build a persistent graph with persistent nodes and run multiple batches of data through it. In PyTorch there are no persistent graphs or nodes; instead, a new temporary graph is built for each batch of data and destroyed immediately afterwards, so operations have no persistent state either. There is the torch.nn.Module abstraction for wrapping operations with persistent state, but we do not use it.

Therefore, anything that is part of an nvdiffrast operation's persistent state in TensorFlow must, in PyTorch, be stored by the user and supplied to the operation as needed. In practice this is a very small difference, amounting to just a few lines of code in most cases.

As an example, consider the OpenGL context used by rasterization operations. In order to use hardware accelerated rendering, an OpenGL context must be created and switched to before issuing OpenGL commands internally. Creating a context is an expensive operation, so we don't want to create and destroy the context every time the rasterization operation is called. In TensorFlow, a rasterization operation creates a context the first time it is executed and stores it in a persistent state for later reuse. In PyTorch, the user must create the context using a separate function call and provide it as a parameter to the rasterization operation.

Likewise, if you have a constant texture and want to use the prefiltered texture sampling modes, the mipmap stack only needs to be computed once. In TensorFlow, you can declare the texture as constant, in which case the texture sampling operation computes the mipmap stack on its first execution and stores it internally. In PyTorch, you can compute the mipmap stack once with a separate function call and then feed it to the texture sampling operation every time. If you do not do this, the operation computes the mipmap stack internally and discards it afterwards; that is exactly what you want if the texture changes at every iteration, but for a constant texture it is merely a bit inefficient, not incorrect.
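
A sketch of the PyTorch approach for a constant texture, using the separate mipmap construction call so that texture() does not rebuild and discard the stack on every iteration. tex, texc, and texd are assumed to come from the earlier steps.

import nvdiffrast.torch as dr

# Build the mipmap stack once up front...
tex_mip = dr.texture_construct_mip(tex)

# ...and reuse it on every texture sampling call.
color = dr.texture(tex, texc, texd, mip=tex_mip, filter_mode='linear-mipmap-linear')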

Finally, the same applies to something called the topology hash, which the antialiasing operation uses to identify potential silhouette edges. Its contents depend only on the triangle tensor, not on the vertex positions, so if the topology is constant this auxiliary structure only needs to be constructed once. As before, TensorFlow handles this internally, while PyTorch offers a separate function for constructing it in advance.
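
A corresponding sketch for a constant topology: precompute the topology hash once and pass it to the antialiasing operation on every call. tri, color, rast, and pos are the tensors from the earlier examples.

import nvdiffrast.torch as dr

# Build the topology hash once for the constant triangle tensor...
topology_hash = dr.antialias_construct_topology_hash(tri)

# ...and supply it on every antialiasing call.
color_aa = dr.antialias(color, rast, pos, tri, topology_hash=topology_hash)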

3.9 Manual OpenGL context in PyTorch

First, note that manually handling the OpenGL context is a very minor optimization. Unless you have profiled and optimized your code and are on a mission to squeeze out every last bit of performance, it will almost certainly not be relevant.

In TensorFlow, the only option is to let nvdiffrast handle OpenGL context management internally. This is because TensorFlow uses multiple CPU threads under the hood, and the active OpenGL context is a thread-local resource.

PyTorch is not as unpredictable and stays in the same CPU thread by default (although things like torch.utils.data.DataLoader do spawn additional CPU threads). Therefore, nvdiffrast lets the user choose between handling OpenGL context switches in automatic or manual mode. The default is automatic mode, where the rasterization operation sets/releases the context at the beginning/end of every execution, just as we do in TensorFlow. This guarantees that the rasterizer always uses the context you provide, and that the context is not left active for other code to clobber.

In manual mode, the user assumes responsibility for setting and releasing the OpenGL context. Most of the time, if you don't have any other libraries using OpenGL, just set the context once after you create it, and keep it set until the program exits. However, keep in mind that the active OpenGL context is a thread-local resource, so it needs to be set in the same CPU thread that uses it, and cannot be set in multiple CPU threads simultaneously.

4. nvdiffrast example

Nvdiffrast comes with a set of examples designed to support the research paper. Each example exists in both a PyTorch and a TensorFlow version. Details such as command-line parameters, logging formats, and so on may differ between the versions, and the PyTorch versions should generally be considered the definitive ones. The command-line examples below are for the PyTorch versions.

All PyTorch examples support choosing between the CUDA and OpenGL rasterizer contexts. The default is to rasterize in CUDA; switch to OpenGL by specifying the command-line option --opengl.

Enabling interactive display using the --display-interval parameter on Linux may fail when using OpenGL rasterization. This is because the interactive display window is displayed using OpenGL, and on Linux this conflicts with the internal OpenGL rasterization in nvdiffrast. Assuming OpenGL (which is used to display the window) is properly installed on the system, using a CUDA context should work. Our Dockerfile is set up to only support headless rendering, so the interactive results window cannot be displayed.

4.1 triangle.py

This is a minimal example that renders a triangle and saves the resulting image to a file (tri.png) in the current directory. Running this command should be the first step to verify that all settings are correct. Rendering is done using rasterization and interpolation operations, so getting the correct output image means that both OpenGL (if specified on the command line) and CUDA are working as expected behind the scenes.

This is the only example where you must specify --cuda or --opengl on the command line. Other examples use CUDA rasterization by default and only provide the --opengl option.

Command line example:

python triangle.py --cuda
python triangle.py --opengl

The desired output is as follows:

4.2 cube.py

In this example, we optimize the vertex positions and colors of a cube mesh, starting from a semi-random initialization. The optimization is based on an image-space loss at very low resolutions (e.g., 4×4, 8×8, or 16×16 pixels). The goal of this example is to examine how the geometry converges when the triangles are only a few pixels in size. It illustrates that the antialiasing operation, although approximate, yields position gradients good enough to guide the optimization to the target even at 4×4 resolution.

Command line example:

python cube.py --resolution 16 --display-interval 10

The desired result is as follows:

Left: Interactive view. Right: Rendering pipeline.

The image above shows a live view of the example. The top row shows the low-resolution rendered image and the reference image from which the image space loss is calculated. The bottom row shows the current mesh (and color) and the reference mesh in high resolution so that convergence can be more easily seen visually.

In the pipeline diagram, green boxes represent nvdiffrast operations, while blue boxes are other computations. Red boxes are learned tensors, and gray boxes are non-learned tensors or other data.

4.3 earth.py

The goal of this example is to compare texture convergence with and without prefiltered texture sampling. The texture is learned based on an image-space loss against high-quality reference renderings at random orientations and random distances. With prefiltering disabled, the texture is not learned correctly because aliasing causes unstable gradient updates, which shows up as a worse texture PSNR compared to learning with prefiltering enabled. See the paper for further discussion.

Command line example:

# No prefiltering, bilinear interpolation.
python earth.py --display-interval 10

# Prefiltering enabled, trilinear interpolation.
python earth.py --display-interval 10 --mip

Left: Interactive view, pre-filtering disabled. Right: Rendering pipeline.

The interactive view shows the current texture mapped onto the mesh, with or without pre-filtered texture samples specified via command line arguments. In this example, no anti-aliasing is performed because we are not learning the vertex positions and therefore do not need the gradients associated with them.

4.4 envphong.py

This example uses a more complex shading model than the previous vertex colors or plain texture. Here, given a known mesh, we learn a reflective environment map and the parameters of a Phong BRDF. The optimization is based on an image-space loss against reference renderings in random orientations. The combination of a mirror reflection and a Phong BRDF is not physically sensible, but it serves as a reasonably simple strawman that would not have been possible with previous differentiable rasterizers that bundle rasterization, shading, lighting, and texturing together. The example also shows how a cube map can be used to represent a learned texture in a spherical domain.

Command line example:

python envphong.py --display-interval 10

Left: Interactive view. Right: Rendering pipeline.

In the interactive view we see the rendering using the current environment map and Phong BRDF parameters, both of which were gradually improved during the optimization process.

4.5 pose.py

Pose fitting based on image space loss is a classic task in differentiable rendering. In this example, we use a simple cube with different colored sides to solve a pose optimization problem. We describe the optimization method in detail in the paper, but in short, it combines gradient-free greedy optimization in the initialization phase and gradient-based optimization in the fine-tuning phase.

Command line example:

python pose.py --display-interval 10

The expected result is as follows:

interactive view

The interactive view shows from left to right: target pose, best found pose, and current pose. When viewed in real time, the two stages of optimization are clearly visible. In the first phase, the best pose is updated intermittently when a better initialization is found. In the second stage, the solution converges smoothly to the target through gradient-based optimization.


Original link: Nvdiffrast differentiable rendering library - BimAnt


Origin blog.csdn.net/shebao3333/article/details/134917920