Interpretation of PHORHUM (CVPR 2022), a 3D reconstruction paper


Paper: "Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing"
GitHub: not yet open-sourced

Innovation

This paper proposes PHORHUM, which reconstructs a 3D human body from a single RGB image and, for the first time, also estimates per-point surface color;
3D supervision alone is not enough for high-quality color reconstruction, so the authors introduce patch-based rendering losses: visible parts are reconstructed with faithful colors, and invisible parts receive realistic color estimates;
previous work entangles geometry, albedo, and lighting effects, while the end-to-end method in this paper effectively decouples these factors;
the authors use separate evaluation metrics to validate the geometry and the color reconstruction.

The authors propose an end-to-end solution that predicts both appearance and geometry. Appearance is modeled as surface albedo without scene-specific lighting effects; the method additionally predicts the scene illumination, which can be used to re-shade the estimated scan so that placing the reconstructed person into an existing scene looks more realistic. The authors found that sparse 3D supervision alone does not produce satisfactory results, so rendering losses are introduced to improve appearance quality. Their contributions are summarized as follows:

  1. An end-to-end training system for human body digitization;
  2. The first method to jointly estimate albedo and shading;
  3. Rendering losses that improve visual quality;
  4. Results that are more accurate and detailed than prior work;

Algorithm

The PHORHUM pipeline is shown in Figure 2: a single image $I$ is used to reconstruct a 3D surface $S$ (Equation 1).
A network $f$ computes a signed distance function (SDF), and a feature network $G$ produces, for each 3D point $x$, a pixel-aligned feature $z_x$ from the input image (Equation 2).
From $x$ and $z_x$, $f$ predicts the signed distance $d$ (the distance between the query point and the estimated surface) and the albedo $a$ (Equation 3).
To decouple shading from intrinsic surface color, a shading network $s$ estimates per-point shading (Equation 4) from $n_x$, the gradient of the estimated distance (i.e. the surface normal), and $l$, a scene-illumination code.
The final color is $c = s \circ a$, where $\circ$ denotes element-wise multiplication.
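To make the composition $c = s \circ a$ concrete, here is a minimal sketch in numpy (my own simplification with toy dimensions and a single linear layer standing in for the shading MLP; not the authors' code):

```python
import numpy as np

def shading(n_x, light_code, W):
    """Toy stand-in for the shading network s(n_x, l): one linear layer
    followed by a softplus to keep the shading value non-negative."""
    h = np.concatenate([n_x, light_code])
    return np.log1p(np.exp(W @ h))  # softplus

def shaded_color(albedo, n_x, light_code, W):
    # c = s ∘ a : element-wise multiplication of shading and albedo
    return shading(n_x, light_code, W) * albedo

rng = np.random.default_rng(0)
albedo = np.array([0.8, 0.5, 0.3])   # predicted surface albedo (RGB)
n_x = np.array([0.0, 0.0, 1.0])      # unit normal (gradient of the SDF)
light = rng.normal(size=16)          # assumed size of the illumination code l
W = rng.normal(size=(3, 19)) * 0.1   # toy weights of the shading net

c = shaded_color(albedo, n_x, light, W)
print(c.shape)  # (3,): one shaded RGB color per surface point
```

The key design point survives even in this toy version: shading depends only on the normal and the illumination code, never on the albedo, which is what lets the method re-shade the scan under new lighting.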

Loss functions

Geometry and color losses

$L_g$ requires the signed distance to be 0 at ground-truth surface points and the distance gradient to agree with the ground-truth surface normal; the sample set $O$ is drawn from the ground-truth mesh $M$ (Equation 5).
$L_l$ supervises the sign of the distance for additional samples $F$ drawn around the mesh surface: $l$ labels a sample as inside or outside the surface, $\phi$ is the sigmoid function, and the sharpness $k$ is learnable (Equation 6).
$L_e$ is the Eikonal geometric regularization term: it pushes the norm of the distance gradient at sampled points towards 1 (Equation 7).
$L_a$ supervises the predicted albedo $a$ against the albedo obtained from the mesh texture, for samples both on and around the surface; for samples around the surface, the ground truth is taken from the nearest point on the surface (Equation 8).
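The geometric losses above can be sketched with an analytic toy SDF (a sphere). This is my own illustration, not the paper's code: gradients are taken by finite differences here, whereas the paper uses automatic differentiation, and the inside/outside convention (d > 0 outside) is an assumption.

```python
import numpy as np

def sdf_sphere(x, radius=1.0):
    return np.linalg.norm(x, axis=-1) - radius

def grad_fd(f, x, eps=1e-4):
    """Finite-difference gradient of a scalar field, for illustration only."""
    g = np.zeros_like(x)
    for i in range(x.shape[-1]):
        dx = np.zeros_like(x)
        dx[..., i] = eps
        g[..., i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return g

# L_g: at on-surface samples O, push d -> 0 and grad d -> surface normal.
surf = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
normals = surf / np.linalg.norm(surf, axis=-1, keepdims=True)
d = sdf_sphere(surf)
g = grad_fd(sdf_sphere, surf)
L_g = np.abs(d).mean() + np.abs(g - normals).sum(-1).mean()

# L_e: Eikonal regularizer, |grad d| should be 1 at off-surface samples.
off = np.array([[0.5, 0.2, 0.1], [1.5, 0.0, 0.5]])
g_off = grad_fd(sdf_sphere, off)
L_e = ((np.linalg.norm(g_off, axis=-1) - 1.0) ** 2).mean()

# L_l: cross-entropy between sigmoid(k * d) and the inside/outside label l
# (here 0 = inside, 1 = outside); the sharpness k is learnable in the paper.
k = 10.0
labels = (sdf_sphere(off) > 0).astype(float)
p = 1.0 / (1.0 + np.exp(-k * sdf_sphere(off)))
L_l = -(labels * np.log(p) + (1 - labels) * np.log(1 - p)).mean()

print(round(float(L_g), 4), round(float(L_e), 4))  # both ≈ 0 for an exact SDF
```

For an exact SDF like the sphere, $L_g$ and $L_e$ vanish; during training they act as penalties that shape the learned network $f$ into a valid distance field.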

Rendering losses

Starting from the camera position $o$, the sign of the minimum distance value along a ray $r$ decides whether the ray hits the surface (Equation 9).
For the subset $R_s$ of samples that fall on the surface ($\sigma < 0.5$ and $l = 0$), sphere tracing is used to locate the surface point; the intersection $\hat x$ is expressed so that it remains differentiable with respect to the network parameters (Equation 10).
$\hat x^f$ denotes the front (camera-facing) intersection and $\hat x^b$ the back intersection; $L_r$ is used to refine the predicted surface color at these points (Equation 11).
$L_c$ supervises the rendered color: the shaded color at $\hat x^f$ is compared with $p$, the corresponding pixel value in image $I$ (Equation 12).
$L_s$: the authors find that additionally supervising the shading of all pixels of $I$ using the ground-truth normals $\bar n$ and albedo $\bar a$ also helps (Equation 13).
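Sphere tracing itself is simple to sketch. Below is a toy version against an analytic sphere SDF (my own illustration; the paper traces the learned network and additionally makes the intersection differentiable, which is omitted here):

```python
import numpy as np

def sdf_sphere(x, center=np.array([0.0, 0.0, 3.0]), radius=1.0):
    return np.linalg.norm(x - center) - radius

def sphere_trace(o, r, max_steps=64, eps=1e-5):
    """March from origin o along unit direction r, stepping by the current
    distance value; the SDF guarantees each step cannot overshoot."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf_sphere(o + t * r)
        if d < eps:
            return o + t * r  # front intersection, x̂^f
        t += d
    return None  # ray never reached the surface

o = np.zeros(3)                # camera position
r = np.array([0.0, 0.0, 1.0])  # ray through a pixel, unit length
hit = sphere_trace(o, r)
print(hit)  # ≈ [0, 0, 2]: first intersection with the sphere
```

The back intersection $\hat x^b$ can be found analogously by tracing the same ray from the far side, which is how both sides of the person receive color supervision.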

Dataset

As shown in Figure 3, the authors use 217 scans. With color augmentation applied to 100 scans and pose augmentation to 38 scans, the final dataset contains roughly 190k images. Each image shows a scan rendered at a random position in front of a random HDRI (high-dynamic-range image) background.

Additional details

1. The feature-extraction network $G$ is a 13-layer U-Net;
2. the geometry network $f$ has 8 fully connected layers of width 512;
3. the shading network $s$ consists of three 256-dimensional fully connected layers.
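The layer counts above can be made concrete with a shapes-only sketch. The widths and depths follow the list, but the input/output dimensions (feature size, illumination-code size) are my own assumptions, and plain ReLU matmuls stand in for the real layers:

```python
import numpy as np

def mlp(dims, x, rng):
    """Forward pass through len(dims)-1 fully connected layers with ReLU on
    hidden layers; random weights, since only the shapes matter here."""
    n_layers = len(dims) - 1
    for i, (a, b) in enumerate(zip(dims[:-1], dims[1:])):
        W = rng.normal(size=(b, a)) * np.sqrt(1.0 / a)
        x = W @ x
        if i < n_layers - 1:
            x = np.maximum(x, 0.0)  # ReLU
    return x

rng = np.random.default_rng(0)

feat_dim = 256                        # assumed size of the U-Net feature z_x
x_in = rng.normal(size=3 + feat_dim)  # point x concatenated with z_x

# f: 8 fully connected layers of width 512 -> signed distance d + albedo a
out = mlp([3 + feat_dim] + [512] * 7 + [1 + 3], x_in, rng)
d, albedo = out[0], out[1:]

# s: three 256-dimensional layers mapping (n_x, l) to per-channel shading
s_in = rng.normal(size=3 + 16)        # normal n_x + illumination code l
shade = mlp([3 + 16, 256, 256, 3], s_in, rng)

print(albedo.shape, shade.shape)  # (3,) (3,)
```

Note that $f$ consumes a point together with its pixel-aligned feature, so the same small MLP is queried at every 3D location while the U-Net runs only once per image.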

Experiments

Table 2 reports the IS scores of the front and back sides of the 3D reconstructions.
Table 3 compares against other single-view reconstruction methods and ablates the rendering losses and the shading estimation.
Figure 6 shows that the rendering losses improve albedo estimation; with sparse 3D supervision alone, the predicted colors look unnatural.
Figure 4 shows a qualitative comparison between PHORHUM and state-of-the-art methods.
Figure 5 compares PHORHUM and state-of-the-art methods against the ground truth.
Figure 7 shows composited images in which the estimated illumination is applied to the reconstructed subject.

Conclusion

Limitations

Figure 8 shows the limitations of PHORHUM: when the clothing or pose in the input deviates too far from the distribution of the training dataset, results degrade noticeably, so inputs should match the training distribution.

Applications

Virtual fitting, AR, VR, human-computer interaction, etc.

Summary

PHORHUM reconstructs a clothed 3D human body from a single photo. It is the first end-to-end-trained method that jointly computes 3D geometry, surface albedo, and shading, and the rendering losses prove crucial for surface-color quality.
As future work, the authors plan to study semi-supervised rendering-based training on diverse human datasets for which 3D ground truth is unavailable.


Origin blog.csdn.net/qq_41994006/article/details/126395370