Paper: "Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing"
GitHub: not yet open-sourced
Innovation
This paper proposes PHORHUM, which reconstructs a 3D human body from a single RGB image and, for the first time, also estimates surface color. The authors observe that 3D supervision alone is not enough to produce high-quality color reconstruction, so they introduce a patch-based rendering loss that yields accurate color for visible regions and realistic color estimates for invisible regions. Previous work entangled geometry, reflectance, and lighting effects; the end-to-end method in this paper effectively disentangles these factors. For geometry and color reconstruction, the authors use separate evaluation metrics to validate the method.

The authors propose an end-to-end solution that predicts both appearance and geometry. Appearance is modeled as surface albedo, free of scene-specific lighting; the method additionally predicts scene illumination, which can be used to re-shade the estimated scan, so that inserting reconstructed people into existing scenes looks more realistic. Since sparse 3D supervision alone does not produce satisfactory results, a rendering loss is introduced to improve appearance quality. The contributions are summarized as follows:
- An end-to-end trainable system for human body digitization;
- The first method to jointly estimate albedo and shading information;
- A rendering loss that improves visual quality;
- More accurate and more detailed results.
Algorithm
The PHORHUM pipeline is shown in Figure 2. Given a single image $I$, the method reconstructs the 3D surface $S$ (Eq. 1). $f$ is a neural network computing a signed distance function (SDF); the feature network $G$ produces, for a 3D point $x$ projected into the input image, a pixel-aligned feature $z_x$ (Eq. 2). $f$ outputs the signed distance $d$ (the distance between the point and the estimated surface) and the albedo $a$ (Eq. 3). To decouple shading from surface color, a shading network $s$ estimates surface shading (Eq. 4), where $n_x$, the gradient of the estimated distance field, is the surface normal, and $l$ is a scene illumination code. The final color is $c = s \circ a$, where $\circ$ denotes element-wise multiplication.
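As a concrete illustration of the composition step $c = s \circ a$, the sketch below (not the authors' code; the Lambertian-style shading and the toy light direction are assumptions for illustration only) shows how a per-point albedo and a shading factor combine element-wise into the final color:

```python
import numpy as np

def shading(normal, light_dir, ambient=0.2):
    """Toy shading model standing in for the shading network s:
    brightness from the angle between surface normal and light direction."""
    diffuse = max(np.dot(normal, light_dir), 0.0)
    return np.clip(ambient + diffuse, 0.0, 1.0)

# Albedo a predicted at a surface point (RGB in [0, 1]).
a = np.array([0.8, 0.4, 0.2])

# Surface normal n_x (gradient of the SDF) and a light direction.
n_x = np.array([0.0, 0.0, 1.0])
light = np.array([0.0, 0.0, 1.0])

s = shading(n_x, light)   # scalar shading factor
c = s * a                 # c = s ∘ a, element-wise
print(c)                  # fully lit here, so c equals the albedo a
```

Because shading is predicted separately from albedo, the same reconstruction can be re-shaded under a different illumination code without re-estimating geometry or color.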
Loss functions
Geometry and color losses
$L_g$ enforces that the signed distance is 0 at ground-truth surface points and that the distance gradient agrees with the ground-truth surface normal; the point set $O$ is sampled from the ground-truth mesh $M$.
$L_l$ supervises the sign of additional samples $F$ drawn around the mesh surface, where the label $l$ indicates inside or outside the surface and $\phi$ is the sigmoid activation (Eq. 6), with a learnable sharpness $k$.
$L_e$ is the Eikonal geometric regularizer, which constrains the norm of the predicted distance gradient to 1 (Eq. 7).
$L_a$ supervises albedo: it measures the distance between the predicted albedo $a$ and the mesh texture, for samples both on and around the surface; for samples around the surface, the ground-truth value is taken from the nearest point on the surface.
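To make the on-surface distance term and the Eikonal regularizer concrete, the sketch below evaluates both on an analytic sphere SDF, whose gradient is available in closed form (a toy stand-in for the learned network $f$; the sampling scheme and loss weights are assumptions, not the paper's values):

```python
import numpy as np

def sdf_sphere(x, r=1.0):
    """Signed distance of points x with shape (N, 3) to a sphere of radius r."""
    return np.linalg.norm(x, axis=-1) - r

def sdf_grad(x):
    """Closed-form gradient of the sphere SDF: the outward unit direction."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# On-surface samples O: distance should be 0 there (distance term of L_g).
dirs = rng.normal(size=(128, 3))
on_surface = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
L_g_dist = np.mean(np.abs(sdf_sphere(on_surface)))

# Eikonal term L_e: the gradient norm should be 1 everywhere, not only on the surface.
free_space = rng.uniform(-2, 2, size=(256, 3))
grad_norms = np.linalg.norm(sdf_grad(free_space), axis=-1)
L_e = np.mean((grad_norms - 1.0) ** 2)

print(L_g_dist, L_e)   # both ~0 for an exact SDF
```

For an exact SDF both terms vanish; during training they act as penalties that push the learned field toward a valid signed distance function.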
Rendering loss
Starting from the camera position, the sign of the minimum distance value along each ray is computed (Eq. 9), where $r$ is a ray and $o$ is the camera origin.
The rays $R_s$ that hit the surface ($\sigma < 0.5$ and $l = 0$) are collected; for this subset $R_s$, sphere tracing locates the surface point, and the intersection $\hat x$ at ray depth $t$ is re-parameterized so that it is differentiable with respect to the network parameters (Eq. 10).
$\hat x^f$ denotes the front intersection and $\hat x^b$ the back intersection; $L_r$ is used to supervise and refine the surface color (Eq. 11).
$L_c$ supervises the shaded color (Eq. 12), where $p$ is the corresponding pixel value in image $I$.
$L_s$: the authors find that supervising the shading of all pixels of $I$ using the ground-truth normals $\bar n$ and albedo $\bar a$ also helps (Eq. 13).
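The surface localization step above can be sketched with plain sphere tracing against an analytic SDF (a toy stand-in for the learned distance field; the step limit and convergence threshold are assumptions). The returned point $\hat x = o + t\,r$ is where the differentiable re-parameterization of Eq. 10 would be applied:

```python
import numpy as np

def sdf_sphere(x, r=1.0):
    return np.linalg.norm(x) - r

def sphere_trace(o, ray, sdf, max_steps=64, eps=1e-6):
    """March from camera origin o along a unit ray, stepping by the SDF value.
    Returns the front intersection point, or None if the ray misses."""
    t = 0.0
    for _ in range(max_steps):
        x = o + t * ray
        d = sdf(x)
        if d < eps:
            return x      # converged onto the surface
        t += d            # the SDF value is a safe step size
    return None

o = np.array([0.0, 0.0, -3.0])    # camera origin
ray = np.array([0.0, 0.0, 1.0])   # unit viewing direction
x_hat = sphere_trace(o, ray, sdf_sphere)
print(x_hat)                      # front intersection near (0, 0, -1)
```

The back intersection $\hat x^b$ can be found analogously by continuing past the first hit or tracing the same ray from the far side.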
Dataset
As shown in Figure 3, the authors use 217 scans. With color augmentation of 100 scans and pose augmentation of 38 scans, the final dataset contains roughly 190K images; each image renders a scan at a random position in front of a random HDRI (high dynamic range image) background.
Additional details
1. The feature extraction network $G$ is a 13-layer U-Net;
2. The geometry network $f$ has 8 fully connected layers of width 512;
3. The shading network $s$ consists of three 256-dimensional fully connected layers.
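Under the stated sizes, the geometry network $f$ can be sketched as below. This is a minimal illustration only: the feature dimension, the activation choice, and the absence of skip connections are assumptions not specified in the text above.

```python
import numpy as np

def make_mlp(dims, rng):
    """Random weights and biases for a fully connected stack with the given widths."""
    return [(rng.normal(scale=0.05, size=(i, o)), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)   # ReLU on hidden layers (assumed)
    return x

rng = np.random.default_rng(0)
feat_dim = 256                       # size of the pixel-aligned feature z_x (assumed)

# Geometry network f: 8 fully connected layers of width 512;
# input = 3D point + feature z_x, output = signed distance d + RGB albedo a.
dims = [3 + feat_dim] + [512] * 8 + [1 + 3]
f = make_mlp(dims, rng)

z_x = rng.normal(size=feat_dim)
x = np.array([0.1, -0.2, 0.3])
out = forward(f, np.concatenate([x, z_x]))
d, a = out[0], out[1:]
print(d, a.shape)                    # scalar distance and 3-channel albedo
```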
Experiments
Table 2 reports front- and back-view IS scores for the 3D reconstructions.
Table 3 compares against other single-view reconstruction methods and ablates whether the rendering loss and the shading estimation are used.
Figure 6 shows that the rendering loss improves albedo estimation; with only sparse 3D supervision, the colors look unnatural.
Figure 4 shows a qualitative comparison between PHORHUM and SOTA methods.
Figure 5 compares PHORHUM and SOTA methods against the ground truth.
Figure 7 shows composited images in which the estimated scene illumination is applied to the reconstructed subject.
Conclusion
Limitations
Figure 8 shows the limitations of PHORHUM: when the clothing or pose in the input deviates too far from the distribution of the training dataset, results degrade noticeably, so the training distribution should match the expected inputs.
Applications
Virtual fitting, AR, VR, human-computer interaction, etc.
Summary
PHORHUM reconstructs a clothed 3D human body from a single photograph. It is the first end-to-end trained method that jointly computes 3D geometry, surface albedo, and shading, and the rendering loss is crucial to surface color quality.
The authors plan to study semi-supervised rendering approaches that can leverage diverse human datasets for which 3D ground truth is unavailable.