[Repost] DynamicFusion: Reconstruction and Tracking of Non-Rigid Scenes in Real-Time


Detailed interpretations of DynamicFusion are rare online, so I decided to translate the paper. I am a SLAM beginner; I hope readers will point out any errors and omissions.

Abstract

We present the first dense SLAM system capable of reconstructing non-rigidly deforming scenes in real time, by fusing RGBD depth images captured with consumer cameras. DynamicFusion reconstructs scene geometry while simultaneously estimating a dense, volumetric 6D motion field that warps the estimated geometry into a live frame. Like KinectFusion, our system produces increasingly denoised, detailed, and complete reconstructions as more measurements are fused, while displaying the updated model in real time. Because no template or other prior scene model is required, the approach applies to a wide range of moving objects and scenes.

1. Introduction

Traditional 3D scanning consists of separate capture and offline processing stages, and requires very careful planning of the scan to ensure that all surfaces are covered. In practice, avoiding holes is difficult and requires multiple iterations of capture, reconstruction, hole identification, and rescanning of missed regions to guarantee a complete model. Real-time 3D reconstruction systems such as KinectFusion, which let the user watch the ongoing reconstruction and identify regions still to be scanned, were a major advance. KinectFusion spawned a series of studies on improving tracking robustness and scaling up the size of the reconstructable space.

However, as with all traditional SLAM and dense reconstruction systems, the most basic assumption behind KinectFusion is that the observed scene is largely static. The core question we address in this paper is: how can we generalize KinectFusion to reconstruct and track dynamic, non-rigid scenes in real time? To that end, we propose DynamicFusion, an approach based on solving for a volumetric flow field that transforms the state of the scene at each time instant into a fixed, canonical frame. In the case of a moving person, for example, the motion of each body pose is undone, warping it back onto the first frame. After this warp, the scene is effectively rigid, and standard KinectFusion updates can be used to obtain a high-quality, low-noise reconstruction. This progressively denoised reconstruction can then be transformed back into the live frame using the inverse map; each canonical point is transformed to its corresponding position in the live frame.

Defining a canonical "rigid" space for a dynamically moving scene is not straightforward. A key contribution of our work is that, despite the non-rigid warping and integration, our approach retains the optimality properties of the volumetric scan fusion originally proposed for rigid scenes. The starting point is to undo the scene motion so that all observations are integrated into a single fixed frame, which can be implemented efficiently by computing the inverse warp alone. Under this transformation, each canonical point projects along a line of sight of the live camera frame. Because the optimality argument (developed for rigid scenes) depends only on lines of sight, the optimality results generalize to the non-rigid case.

Our second key contribution is an efficient representation of the volumetric warp that can be computed in real time. Indeed, even at a relatively low resolution of \(256^3\) voxels, a dense parameterization of the warp would require roughly one hundred million transformation parameters to be computed at frame rate. Our solution relies on a combination of adaptive, sparse, hierarchical volumetric basis functions and innovative algorithmic work to enable real-time computation on consumer-level hardware. In sum, DynamicFusion is the first system capable of real-time dense reconstruction of dynamic scenes using a single depth camera.

The remainder of the paper is structured as follows: after discussing related work, we give an overview of DynamicFusion in Section 2 and present its technical details in Section 3. Section 4 presents experimental results, and Section 5 concludes.

Related Work

While no previous work reconstructs non-rigid scenes in real time without templates, there are two closely related lines of research: 1) real-time non-rigid tracking algorithms, and 2) offline dynamic reconstruction techniques.

Real-time non-rigid tracking with templates. Much non-rigid tracking research concentrates on specific parts of the body, using templates whose shape and motion priors are either learned in advance or designed by hand for the purpose. Several state-of-the-art systems achieve high-accuracy real-time capture of faces, hands, bodies, or certain complete articulated objects.

Other techniques directly track and deform more general mesh models. [12] showed that a low-resolution shape template can be tracked while being continuously updated with high-frequency geometric details not present in the original model. Recently, [37] used GPU-accelerated optimization to achieve an impressive real-time version of a similar technique. In these systems, a dense surface model of the object is captured while it is static, and that model then serves as the template for the real-time tracking phase. Separating template generation from tracking means the reconstruction phase can only handle objects and scenes that stay completely stationary; it cannot reconstruct subjects such as a child or a pet that will not hold still.

Offline simultaneous tracking and reconstruction of dynamic scenes. There is a larger body of offline work on non-rigid tracking and reconstruction. Several researchers have extended the ICP algorithm to handle small non-rigid deformations. Practical alignment and scanning of pairs of 3D shapes undergoing large deformations makes use of reduced deformable model parameterizations. In particular, embedded deformation graphs transform space using a sparsely sampled set of basis functions that can be interpolated densely and efficiently through space. Work on piecewise-rigid reconstruction is also relevant. Non-rigid reconstruction has also been achieved by combining known kinematic structures with non-rigid structure from motion. Other work couples non-rigid mesh template tracking with temporal denoising and completion, but does not obtain a single consistent representation of the scene.

Closest to our work are template-free techniques. One interesting approach treats non-rigid, unaligned scans as observations of a rigid geometry in 4D and performs 4D shape reconstruction. [30, 29] reconstruct geometry of fixed topology by pairwise alignment of scans. [24] uses an incompressible flow over time, which leads to watertight reconstructions and handles noisy input point-cloud data efficiently. [28] introduces animation cartography, which estimates the shape and the per-frame deformation by developing a dense matching scheme seeded with sparse landmark matches. Recent work using multiple fixed Kinect cameras proposed a new direction for larger-scale non-rigid reconstruction, through dense tracking and fusion of all depth-map data into a signed distance function representation.

All of these techniques, however, require processing times three to four orders of magnitude longer than real time.

2. DynamicFusion Overview

DynamicFusion decomposes a non-rigidly deforming scene into a latent geometric surface, reconstructed in a rigid canonical space \({\cal{S}} \subseteq {\Bbb{R}}^3\), and a per-frame volumetric warp field that transforms that surface into the live frame. The system performs three core algorithmic steps as each new depth frame arrives (a pseudocode sketch follows the list):

1. Estimation of the volumetric model-to-frame warp field parameters (Section 3.3)

2. Fusion of the live-frame depth map into the canonical space via the estimated warp field (Section 3.2)

3. Adaptation of the warp-field structure to capture newly added geometry (Section 3.4)
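
As a concrete, purely illustrative sketch of this per-frame loop, the following pseudocode uses hypothetical names (`warp_field`, `canonical_tsdf`, and their methods are stand-ins, not the paper's actual API):

```python
# Illustrative per-frame pipeline for the three steps above; the real
# system implements each stage as GPU kernels.
def process_frame(depth_frame, warp_field, canonical_tsdf):
    # 1. Estimate the model-to-frame warp field parameters (Sec. 3.3),
    #    aligning the warped canonical surface with the new depth map.
    surface = canonical_tsdf.extract_surface()
    warp_field.solve(surface, depth_frame)
    # 2. Fuse the live depth map into the canonical space through the
    #    estimated warp field (Sec. 3.2).
    canonical_tsdf.fuse(depth_frame, warp_field)
    # 3. Adapt the warp-field structure: insert new deformation nodes
    #    where newly observed geometry is not yet covered (Sec. 3.4).
    warp_field.insert_nodes(canonical_tsdf.extract_surface())
```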

3. Technical Details

We now describe each component of DynamicFusion in detail. First, we introduce the dense volumetric warp field, which allows us to model the per-frame deformation of the scene. The warp field is the key extension over the static spatial representations used in traditional SLAM and dense reconstruction systems, and estimating it is what makes non-rigid tracking and scene reconstruction possible.

3.1. Dense Non-Rigid Warp Field

We model the warp of a dynamic scene with a motion field that assigns a 6D transformation to each point: \({\cal{W}}: {\cal{S}} \mapsto {\bf{SE}}(3)\). Although a dense 3D translation field would be sufficient to describe the geometry over time, we found that using both translation and rotation leads to a significantly better representation of real-world object motion and to better tracking and reconstruction results. For each canonical point \(v_c \in {\cal{S}}\), the transform \({\bf{T}}_{lc} = {\cal{W}}(v_c)\) maps that point from canonical space into the live, non-rigidly deformed frame of reference.

Since a new warp field \({\cal{W}}_t\) must be estimated for each frame, the choice of representation must allow efficient optimization. One possible solution is to sample the \({\bf{SE}}(3)\) field densely over the voxel grid, e.g., at the same resolution as the truncated signed distance function (TSDF) used to represent the geometry. However, even for a relatively low-resolution TSDF of \(256^3\) voxels, this would require solving for \(6 \times 256^3 \approx 10^8\) parameters per frame, vastly more than the single rigid transform estimated by the original KinectFusion algorithm. Clearly, a fully dense parameterization of the warp function is infeasible. In reality, surfaces tend to move smoothly through space, so we can instead define the dense volumetric warp function through interpolation of a sparse set of transformations. For both interpolation quality and computational efficiency, we define the warp function using dual-quaternion blending (DQB):

\({\cal{W}}(x_c)\equiv SE3({\bf{DQB}}(x_c))\)

where the weighted average over unit dual quaternion transformations is

\({\bf{DQB}}(x_c) \equiv \frac{\sum_{k \in N(x_c)} {\bf{w}}_k(x_c)\,{\hat{\bf{q}}}_{kc}}{\| \sum_{k \in N(x_c)} {\bf{w}}_k(x_c)\,{\hat{\bf{q}}}_{kc} \|}\),

with each unit dual quaternion \({\hat{\bf{q}}}_{kc} \in {\Bbb{R}}^8\). Here \(N(x)\) is the set of \(k\) nearest transformation nodes to the point \(x\), \({\bf{w}}_k: {\Bbb{R}}^3 \mapsto {\Bbb{R}}\) is a weight function describing each node's radial influence, and \(SE3(\cdot)\) converts a dual quaternion back into an \({\bf{SE}}(3)\) transformation matrix. The state of the warp field at time \(t\) is given by the values of a set of \(n\) deformation nodes, \({\cal{N}}^t_{warp} = \{{\bf{dg}}_v, {\bf{dg}}_w, {\bf{dg}}_{se3}\}_t\). The \(i\)-th node (\(i = 1...n\)) has a position in canonical space \({\bf{dg}}^i_v \in {\Bbb{R}}^3\), an associated transformation \({\bf{T}}_{ic} = {\bf{dg}}^i_{se3}\), and a radial basis weight \({\bf{dg}}^i_w\) controlling its range of influence:

\({\bf{w}}_i(x_c) = \exp\left(\frac{-\|{\bf{dg}}^i_v - x_c\|^2}{2({\bf{dg}}^i_w)^2}\right)\).

The radial weight \({\bf{dg}}^i_w\) is set so that each node's influence overlaps with its neighbors, in accordance with the sampling sparsity of the nodes. Since the warp function defines a rigid body transformation at every point of the supported space, both positions and any associated orientations are transformed; for example, a canonical surface point \(v_c\) with normal \(n_c\) is mapped into the live frame as \((v_t, 1)^T = {\cal{W}}_t(v_c)(v^T_c, 1)^T\) and \((n_t, 0)^T = {\cal{W}}_t(v_c)(n^T_c, 0)^T\). Note that scaling of space can also be represented by such a warp, since expansion and compression are expressed by neighboring points moving along diverging and converging directions. Finally, we can factor out a rigid body transformation common to all points in the volume, e.g., due to camera motion. We therefore introduce an explicit transformation \({\bf{T}}_{lw}\) from the canonical warp space into the live camera frame, and the resulting complete warp field is:

\({\cal{W}}_t(x_c)={\bf{T}}_{lw}SE3({\bf{DQB}}(x_c))\)
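
To make the blending concrete, here is a minimal NumPy sketch of DQB under the definitions above. It is illustrative only: the node representation and function names are hypothetical, quaternions are stored as (w, x, y, z), and the actual system evaluates this on the GPU.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def dq_to_se3(dq):
    """Convert a dual quaternion (8-vector: real part r, dual part d,
    with d = 0.5 * t * r) into a 4x4 SE(3) matrix."""
    n = np.linalg.norm(dq[:4])
    r, d = dq[:4] / n, dq[4:] / n      # normalize by the real part
    w, x, y, z = r
    R = np.array([[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
                  [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                  [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])
    conj_r = r * np.array([1.0, -1.0, -1.0, -1.0])
    t = 2.0 * quat_mul(d, conj_r)[1:]  # recover translation: t = 2 d r*
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def warp(x_c, nodes):
    """W(x_c): blend the k nearest deformation nodes N(x_c) and return a
    4x4 transform. Each node is a tuple (dg_v, dg_w, dg_dq)."""
    blend = np.zeros(8)
    for dg_v, dg_w, dg_dq in nodes:
        w_i = np.exp(-np.linalg.norm(dg_v - x_c)**2 / (2.0 * dg_w**2))
        blend += w_i * dg_dq
    return dq_to_se3(blend)            # dq_to_se3 normalizes the blend
```

A point and its normal are then warped exactly as in the text: `v_t = (warp(v_c, nodes) @ np.append(v_c, 1.0))[:3]` and `n_t = (warp(v_c, nodes) @ np.append(n_c, 0.0))[:3]`. The sketch omits the factored camera transform \({\bf{T}}_{lw}\), which would simply left-multiply the result.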

3.2. Dense Non-Rigid Surface Fusion


Source: www.cnblogs.com/parachutes/p/11698356.html