Neural Point-Based Graphics

Abstract: This paper proposes a point-based approach to modeling the appearance of real scenes. The method uses a raw point cloud as the geometric representation of the scene and augments each point with a learnable neural descriptor that encodes local geometry and appearance. A deep rendering network is learned in parallel with the descriptors, making it possible to render new viewpoints of a scene by passing a rasterization of the point cloud from that viewpoint through the network. The input rasterization uses the learned descriptors as point pseudo-colors.

Rasterization (background reading):

https://zhuanlan.zhihu.com/p/544088415?utm_id=0
https://www.jianshu.com/p/54fe91a946e2

Introduction:
The paper combines ideas from image-based rendering, point-based graphics, and neural rendering into one simple approach. The method uses the raw point cloud as the scene geometry representation, eliminating the need for surface estimation and meshing. As in other neural rendering methods, a deep convolutional network generates photorealistic renderings from new viewpoints. Rendering realism is achieved by estimating latent vectors (neural descriptors) that describe the geometric and photometric properties of the data. These descriptors are learned directly from the data, jointly with the rendering network, so that the system as a whole learns neural descriptors of surface elements together with how to render them.

1、METHOD

The system pipeline is shown in the figure below. Given a point cloud P with neural descriptors D and camera parameters C, the descriptors are used as pseudo-colors and the points are rasterized with a z-buffer at several resolutions. The rasterizations are then passed through a U-Net-like rendering network to obtain the final image. The model adapts to new scenes by optimizing the parameters of the rendering network and the neural descriptors through backpropagation of a perceptual loss function.
[Figure: overall system pipeline]
In simple terms, the input is a point cloud together with the camera pose and initially embedded neural descriptors; after training, the system produces images of the scene from new viewpoints.

2、Rendering

This part of the paper explains how rendering of new views is performed given a point cloud with learned neural descriptors and a learned rendering network.
Let the point cloud be P = {p_1, p_2, …, p_N}, where each point has an M-dimensional neural descriptor; the set of descriptors is written D = {d_1, d_2, …, d_N}. The new view is given by camera parameters C (extrinsic and intrinsic). Suppose the target image has a pixel grid of size W × H and its viewpoint is located at point p_0.
The rendering process first projects the points onto the target view, using the descriptors as pseudo-colors, and then uses the rendering network to convert the pseudo-color image into a photorealistic RGB image. An M-channel raw image S(P, D, C) of size W × H is created, and for each point p_i projected to (x, y), we set S(P, D, C)[[x], [y]] = d_i (where [a] denotes the nearest integer of a ∈ R). Since many points may project onto the same pixel, a z-buffer (depth buffer) is used to remove occluded points: the buffer records a depth per pixel, and only the closest point is kept while the points behind it are discarded.
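To make the projection step concrete, here is a minimal NumPy sketch of z-buffer rasterization with descriptors as pseudo-colors. It is not the authors' code; the pinhole camera convention (K, R, t), the nearest-pixel rounding, and the simple per-point loop are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): z-buffer rasterization of a
# point cloud whose per-point "colors" are M-dimensional neural descriptors.
# Assumed inputs: points (N, 3) in world coordinates, descriptors (N, M),
# intrinsics K (3, 3), extrinsics R (3, 3) and t (3,), canvas size W x H.
import numpy as np

def rasterize_points(points, descriptors, K, R, t, W, H):
    M = descriptors.shape[1]
    raw = np.zeros((H, W, M), dtype=np.float32)       # M-channel raw image S(P, D, C)
    zbuf = np.full((H, W), np.inf, dtype=np.float32)  # depth buffer

    cam = points @ R.T + t                     # world -> camera coordinates
    z = cam[:, 2]                              # depth along the optical axis
    proj = cam @ K.T                           # camera -> homogeneous image coordinates
    safe_z = np.where(z > 0, z, 1.0)           # avoid division warnings for points behind the camera
    x = np.rint(proj[:, 0] / safe_z).astype(int)   # [x]: nearest integer pixel column
    y = np.rint(proj[:, 1] / safe_z).astype(int)   # [y]: nearest integer pixel row

    for i in range(points.shape[0]):
        if z[i] <= 0 or not (0 <= x[i] < W and 0 <= y[i] < H):
            continue                           # behind the camera or outside the canvas
        if z[i] < zbuf[y[i], x[i]]:            # keep only the closest point per pixel
            zbuf[y[i], x[i]] = z[i]
            raw[y[i], x[i]] = descriptors[i]   # descriptor as pseudo-color
    return raw
```

Only the point that wins the depth test writes its descriptor into a pixel; every pixel that no point reaches stays empty, which is exactly where the hole and bleeding problems discussed next come from.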
However, the lack of topological information in point clouds makes this representation prone to holes: points from occluded surfaces and from the background can be seen through the front surface (bleeding). This problem is traditionally solved by splatting, i.e., rendering each point as a small disc so that a neighborhood of points covers a continuous surface in the output. This paper proposes an alternative rendering scheme that does not depend on the choice of a disc radius.

Progressive rendering

The paper proposes multi-scale (progressive) rendering: the point cloud is rasterized T times onto a pyramid of canvases with different spatial resolutions. Performing the simple point-cloud projection described above yields a sequence of raw images S[1], S[2], …, S[T], where the t-th image has size W/2^t × H/2^t. The highest-resolution raw image S[1] contains the most detail but also the most severe bleeding; the lowest-resolution image S[T] has coarse geometric detail but the least bleeding, while the intermediate raw images S[2], …, S[T−1] achieve different detail/bleeding trade-offs. Finally, a rendering network R_θ with learnable parameters θ maps all of the raw images to a three-channel RGB image I:
I(P, D, C) = R_θ( S[1](P, D, C), S[2](P, D, C), …, S[T](P, D, C) )
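A sketch of how such a pyramid of raw images could be produced, reusing the hypothetical rasterize_points helper from the previous snippet; the choice of T and the way the intrinsics are rescaled per level are assumptions made for illustration.

```python
# Sketch of progressive (multi-resolution) rasterization. Assumption: level t
# uses a W/2^t x H/2^t canvas and the intrinsics are rescaled to match;
# rasterize_points is the hypothetical helper sketched above.
import numpy as np

def rasterize_pyramid(points, descriptors, K, R, t, W, H, T=4):
    raw_images = []
    for level in range(1, T + 1):
        scale = 1.0 / (2 ** level)                  # level t -> W/2^t x H/2^t canvas
        W_t, H_t = max(1, int(W * scale)), max(1, int(H * scale))
        K_t = K.copy()
        K_t[:2] *= scale                            # rescale focal lengths and principal point
        raw_images.append(rasterize_points(points, descriptors, K_t, R, t, W_t, H_t))
    return raw_images                               # [S[1], ..., S[T]]: detail and bleeding both decrease with level
```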
The rendering network is based on the convolutional U-Net architecture and uses gated convolutions to better handle the potentially sparse rasterized input. The encoder part of the U-Net contains several downsampling layers interleaved with convolutions and non-linearities, and each raw image S[t] is concatenated to the encoder features at the corresponding resolution. This coarse-to-fine mechanism is reminiscent of texture mipmapping and of many other coarse-to-fine / varying level-of-detail rendering algorithms in computer graphics; the rendering network thus provides a mechanism for implicit level-of-detail selection.
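The paper is summarized only at a high level here, so the following PyTorch sketch is a rough illustration of the idea rather than the published architecture: a small U-Net-style network with gated convolutions whose encoder concatenates the raw image at the matching resolution. Channel widths, depth, the ELU/sigmoid gating, and the bilinear upsampling are all assumptions, and W and H are assumed divisible by 2^(levels).

```python
# Illustrative U-Net-like renderer with gated convolutions (not the paper's exact network).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.feat = nn.Conv2d(cin, cout, 3, padding=1)
        self.gate = nn.Conv2d(cin, cout, 3, padding=1)
    def forward(self, x):
        # gated convolution: features modulated by a learned soft mask,
        # which helps with the sparse "holes" in the rasterized input
        return F.elu(self.feat(x)) * torch.sigmoid(self.gate(x))

class RenderNet(nn.Module):
    def __init__(self, desc_dim=8, width=32, levels=3):
        super().__init__()
        self.enc = nn.ModuleList()
        cin = desc_dim
        for t in range(levels):
            extra = desc_dim if t > 0 else 0       # deeper levels also receive the raw image of that scale
            self.enc.append(GatedConv(cin + extra, width * (t + 1)))
            cin = width * (t + 1)
        self.dec = nn.ModuleList()
        for t in reversed(range(levels - 1)):
            self.dec.append(GatedConv(width * (t + 2) + width * (t + 1), width * (t + 1)))
        self.to_rgb = nn.Conv2d(width, 3, 1)

    def forward(self, raws):                       # raws: [S[1], ..., S[T]] as (1, M, H_t, W_t) tensors, finest first
        skips = []
        x = raws[0]
        for t, block in enumerate(self.enc):
            if t > 0:
                x = F.avg_pool2d(x, 2)             # downsample features to the next pyramid level
                x = torch.cat([x, raws[t]], dim=1) # inject the raw image of this resolution
            x = block(x)
            skips.append(x)
        for i, block in enumerate(self.dec):
            t = len(self.enc) - 2 - i
            x = F.interpolate(x, size=skips[t].shape[-2:], mode="bilinear", align_corners=False)
            x = block(torch.cat([x, skips[t]], dim=1))
        return torch.sigmoid(self.to_rgb(x))       # three-channel RGB image
```

With the rasterize_pyramid sketch above, each raw image would be converted to a (1, M, H_t, W_t) float tensor before being passed to the network.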

3、Model creation

Suppose K different scenes are available during fitting. For the k-th scene, the point cloud P_k, a set of L_k training ground-truth RGB images I_k = {I_{k,1}, I_{k,2}, …, I_{k,L_k}}, and the corresponding known camera parameters {C_{k,1}, C_{k,2}, …, C_{k,L_k}} are given. The fitting objective L then corresponds to the loss between rendered and ground-truth RGB images:

L(θ, D_1, …, D_K) = Σ_{k=1..K} Σ_{l=1..L_k} Δ( R_θ( S(P_k, D_k, C_{k,l}) ), I_{k,l} )    (1)
Here D_k denotes the set of neural descriptors for the point cloud of the k-th scene, and Δ denotes the mismatch (loss) between the two images (ground truth and rendered). During the optimization of (1), the neural descriptors are updated by backpropagating the derivatives of the loss with respect to S(P, D, C) onto the d_i.
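As a concrete illustration of how the descriptors and the network weights can be optimized together, here is a hedged PyTorch sketch. The data layout (precomputed per-view visibility index pyramids), the L1 stand-in for the perceptual loss Δ, the random descriptor initialization, and all hyper-parameters are assumptions, not the authors' training code.

```python
# Sketch of jointly fitting rendering-network weights θ and per-scene neural
# descriptors D_k. Assumed data layout: each scene provides, for every training
# view, a precomputed "index pyramid" (per-pixel id of the z-buffer-visible
# point, -1 for empty pixels) plus the ground-truth image as a (1, 3, H, W) tensor.
import torch
import torch.nn as nn

def build_raw_images(desc, index_pyramid):
    # Gather descriptors into raw images; differentiable with respect to desc,
    # which is how gradients reach the d_i through S(P, D, C).
    empty = torch.zeros(1, desc.shape[1], device=desc.device)
    padded = torch.cat([desc, empty], dim=0)           # index -1 -> all-zero "no point" row
    raws = []
    for idx in index_pyramid:                          # idx: (H_t, W_t) long tensor
        raw = padded[idx]                              # (H_t, W_t, M)
        raws.append(raw.permute(2, 0, 1).unsqueeze(0)) # -> (1, M, H_t, W_t)
    return raws

def fit(render_net, scenes, desc_dim=8, epochs=100, lr=1e-4):
    descriptors = [nn.Parameter(0.01 * torch.randn(s["num_points"], desc_dim))
                   for s in scenes]                    # small random init is an assumption
    opt = torch.optim.Adam(list(render_net.parameters()) + descriptors, lr=lr)
    loss_fn = nn.L1Loss()                              # stand-in for the perceptual loss Δ

    for _ in range(epochs):
        for k, scene in enumerate(scenes):
            for view in scene["views"]:
                raws = build_raw_images(descriptors[k], view["index_pyramid"])
                pred = render_net(raws)                # rendered RGB image
                loss = loss_fn(pred, view["image"])    # Δ(rendered, ground truth)
                opt.zero_grad()
                loss.backward()                        # gradients flow into θ and, via S(P, D, C), into D_k
                opt.step()
    return descriptors
```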
Thus, while fitting can be performed on a single scene, the results for new viewpoints tend to be better when the rendering network is fitted to multiple scenes of a similar type. In the experimental validation, unless otherwise stated, the rendering network is therefore fitted in a two-stage process: first, the rendering network is pre-trained on a family of scenes of a certain type; second, it is adapted (fine-tuned) to the new scene. At this stage, the learning process starts from zero descriptor values for the new scene and from the weights of the pre-trained rendering network.
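A short sketch of this second (fine-tuning) stage under the same hypothetical helpers as above: the new scene's descriptors start from zeros and the rendering network starts from its pre-trained weights. The point count and checkpoint path are placeholders.

```python
# Fine-tuning sketch: zero-initialized descriptors for the new scene,
# pre-trained weights for the rendering network (sizes/paths are placeholders).
import torch
import torch.nn as nn

num_points_new = 1_000_000                                          # size of the new scene's point cloud (placeholder)
new_desc = nn.Parameter(torch.zeros(num_points_new, 8))             # zero descriptor init
render_net = RenderNet(desc_dim=8)                                  # hypothetical network from the sketch above
render_net.load_state_dict(torch.load("pretrained_rendernet.pt"))   # pre-trained θ (placeholder path)

opt = torch.optim.Adam(list(render_net.parameters()) + [new_desc], lr=1e-4)
# ...then run the same optimization loop as in the previous sketch, on the new scene only.
```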

In summary, a neural point-based approach for complex scene modeling is proposed. As in classical point-based methods, 3D points are used as modeling primitives, and each point is associated with a local descriptor containing information about local geometry and appearance. A rendering network translates rasterizations of the points into realistic views, taking the learned descriptors as input point pseudo-colors. This demonstrates that point clouds can be successfully used as geometric proxies for neural rendering, while the deep rendering network gracefully handles missing connectivity information as well as geometric noise and holes.

Reference:
https://zhuanlan.zhihu.com/p/158945862
