[Computer Vision|Face Modeling] Survey Report on 3D Face Reconstruction in the Deep Learning Era

This series of blog posts contains notes on deep learning/computer vision papers. Please cite the source when reprinting.

Title: 3D Face Reconstruction in Deep Learning Era: A Survey

Link: 3D Face Reconstruction in Deep Learning Era: A Survey - PubMed (nih.gov)

Summary

With the advent of deep learning and the wide application of graphics processing units, 3D face reconstruction has become the most fascinating subject of biometric identification. This paper explores various aspects of 3D face reconstruction techniques. Five techniques are discussed in the paper, namely

  • deep learning (DL)
  • epipolar geometry (EG)
  • one-shot learning (OSL)
  • 3D morphable model (3DMM)
  • shape from shading (SFS), which recovers depth from grayscale shading

This paper provides an in-depth analysis of 3D face reconstruction using deep learning techniques. The performance analysis of different face reconstruction techniques is discussed from the perspective of software, hardware, advantages and disadvantages. The challenges and future development directions of 3D face reconstruction technology are also discussed.

1 Introduction

3D face reconstruction is a problem in biometric identification whose development has been accelerated by the advent of deep learning models. Many contributors to 3D face research have made progress over the past five years (see Figure 1). Various applications such as re-enactment and voice-driven animation, facial manipulation, video dubbing, virtual makeup, projection mapping, facial aging, and face replacement have been developed [1].

Figure 1: Number of research papers published in 3D face reconstruction, 2016-2021

3D face reconstruction faces many challenges, such as occlusion removal, makeup removal, expression transfer, and age prediction.

Occlusions can be internal or external. Well-known internal occlusions include hair, beards, mustaches, and side (profile) poses. External occlusions occur when other objects or people obscure part of the face, such as glasses, hands, bottles, paper, and masks [2].

The main drivers of growth in 3D face reconstruction research are the availability of multi-core central processing units (CPUs), smartphones, graphics processing units (GPUs), and cloud platforms such as Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure [3-5].

3D data can be represented as

  • voxels (volume elements, the 3D analogue of pixels)
  • point clouds
  • 3D meshes that GPUs can process

(see Figure 2). Recently, researchers have started research on 4D face recognition [6, 7]. Figure 3 shows the classification of 3D face reconstruction.

Figure 2: 3D face image: a RGB image, b depth image, c mesh image, d point cloud image, e voxel image

Figure 3: Classification of 3D face reconstruction
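
As a minimal sketch of how the representations in Figure 2 relate to each other, the snippet below turns a point cloud into a binary voxel occupancy grid. The function name `voxelize`, the grid resolution, and the toy points are illustrative assumptions and do not come from the surveyed paper.

```python
import numpy as np

def voxelize(points, resolution=4):
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    Points are rescaled into the unit cube, and each point marks the
    voxel it falls into as occupied.
    """
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    extent = points.max(axis=0) - mins
    extent[extent == 0] = 1.0  # avoid division by zero on flat axes
    normalized = (points - mins) / extent  # values in [0, 1]
    # Points on the upper boundary are clamped into the last voxel.
    idx = np.minimum((normalized * resolution).astype(int), resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# Hypothetical toy point cloud with four points.
cloud = np.array([[0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0],
                  [0.5, 0.5, 0.5],
                  [1.0, 0.0, 0.0]])
grid = voxelize(cloud, resolution=4)
print(grid.shape, int(grid.sum()))
```

A voxel grid has a fixed memory cost that grows with the cube of the resolution, whereas a point cloud grows with the number of points, which is one reason different methods prefer different representations.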

1.1 General Framework for 3D Face Reconstruction

The 3D reconstruction based facial recognition framework involves preprocessing, deep learning, and prediction. Figure 4 shows the stages involved in 3D face reconstruction techniques. 3D images can be acquired in various forms, each of which needs different preprocessing steps.

Figure 4: A general framework for the 3D face reconstruction problem [9]

Face alignment may or may not be done before sending to the reconstruction stage. Sharma and Kumar [2, 8, 9] do not use face alignment in their reconstruction techniques.

Facial reconstruction can be performed using various techniques, such as 3DMM-based, EG-based, OSL-based, DL-based, and SFS-based reconstruction. A prediction stage then consumes the reconstructed face. Predictions can be used in facial recognition, emotion recognition, gender recognition, or age estimation applications.

1.2 Word cloud

The word cloud shows the top 100 keywords for 3D face reconstruction (see Figure 5).

Figure 5: Word Cloud of 3D Face Reconstruction Literature

The word cloud shows that keywords related to facial reconstruction algorithms, such as "3D face", "pixel", "image" and "reconstruction", are widely used. The keyword "3D face reconstruction" attracts researchers as a problem domain for facial recognition technology.

Facial reconstruction involves completing images of occluded faces. Most 3D face reconstruction techniques use 2D images in the reconstruction process [10-12]. Recently, researchers have started to study mesh and voxel images [2, 8]. Generative Adversarial Networks (GANs) are used for face swapping and facial feature modification for 2D faces [13]. These are yet to be explored using deep learning techniques.

This paper is motivated by detailed research surveys on deep learning for 3D point clouds [14] and person re-identification [15]. As shown in Figure 1, 3D face research has grown over the past five years. Most reconstruction studies prefer GAN-based deep learning techniques. This paper aims to investigate the use of deep learning techniques for 3D face reconstruction and its application in real-world scenarios.

The contributions of this paper include:

  1. The advantages and disadvantages of various 3D facial reconstruction techniques are discussed.
  2. The hardware and software requirements for 3D facial reconstruction techniques are presented.
  3. Datasets, performance evaluation metrics, and applicability for 3D face reconstruction are investigated.
  4. The challenges of current and future 3D facial reconstruction techniques are explored.

The rest of the paper is organized as follows: Section 2 introduces variants of the 3D face reconstruction technique. Section 3 discusses the performance evaluation metrics, and Section 4 presents the datasets used for the reconstruction techniques. Section 5 discusses tools and techniques for the reconstruction process. Section 6 explores potential applications of 3D face reconstruction. Section 7 summarizes current research challenges and future research directions. Section 8 provides concluding comments.

2 3D face reconstruction techniques

3D face reconstruction techniques are broadly classified into five main categories: 3D morphable model (3DMM)-based reconstruction, deep learning (DL)-based reconstruction, epipolar geometry (EG)-based reconstruction, one-shot learning (OSL)-based reconstruction, and shape from shading (SFS)-based reconstruction. Figure 6 shows these 3D facial reconstruction techniques. Most researchers work on hybrid facial reconstruction techniques, which are considered a sixth category.

Figure 6: 3D face reconstruction techniques

2.1 3DMM-based reconstruction

3D morphable models (3DMMs) are generative models for facial appearance and shape [16]. All faces to be generated are in dense point-to-point correspondence, which can be achieved through a face registration process. Morphs are generated through these dense correspondences. The technique focuses on separating facial color and shape from other factors such as lighting, brightness, and contrast [17].

3DMM was introduced by Blanz and Vetter [18]. Variants of 3DMM are available in the literature [19-23]. These models use low-dimensional representations for facial expressions, textures, and identities. The Basel Face Model (BFM) is one of the publicly available 3DMMs. The model is constructed by registering a template mesh to the scanned faces using iterative closest point (ICP) and then applying principal component analysis (PCA) [24].
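
To make the low-dimensional representation concrete, here is a minimal sketch of a PCA-based linear shape model, where a face is generated as mean + sum_i c_i * sigma_i * b_i. The function names, the random "scans", and the number of components are assumptions for illustration only. Real models such as BFM are built from registered 3D scans, which this toy example does not have.

```python
import numpy as np

def fit_pca_basis(shapes, n_components):
    """Fit a PCA basis to registered shapes of shape (n_faces, 3 * n_vertices)."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Rows of vt are the principal directions, ordered by variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    stddev = singular_values[:n_components] / np.sqrt(max(len(shapes) - 1, 1))
    return mean, basis, stddev

def synthesize(mean, basis, stddev, coeffs):
    """Generate a new shape: mean + sum_i coeffs[i] * stddev[i] * basis[i]."""
    return mean + (coeffs * stddev) @ basis

# Hypothetical data: 10 "scans" with 5 vertices each, already in correspondence.
rng = np.random.default_rng(0)
toy_scans = rng.normal(size=(10, 3 * 5))
mean, basis, stddev = fit_pca_basis(toy_scans, n_components=3)
new_face = synthesize(mean, basis, stddev, np.array([1.0, -0.5, 0.0]))
print(new_face.shape)
```

Setting all coefficients to zero returns the mean face, and fitting a 3DMM to an image amounts to searching for the coefficients whose rendered face best matches the image.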

Figure 7 shows the gradual improvement of 3DMM over the past 20 years [18, 25–28]. The figure presents results from the original paper by Blanz and Vetter in 1999 [18], the first publicly available morphable model in 2009 [25], state-of-the-art facial re-enactment results [28] and GAN models [27].

Figure 7: Stepwise improvements in 3DMM over the past two decades [17]

Maninchedda et al. [29] proposed an automatic reconstruction method based on 3D epipolar geometry for faces occluded by glasses. They propose a variational segmentation model that can represent a wide variety of glasses.

Zhang et al. [30] proposed a method for reconstructing dense 3D face point clouds from a single data frame captured by an RGB-D sensor. An initial point cloud of the facial region was captured using the K-Means clustering algorithm. The neighborhood of the point cloud is then estimated using an Artificial Neural Network (ANN). Furthermore, Radial Basis Function (RBF) interpolation is used to achieve the final approximation of the 3D face centered on the point cloud.
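
As a rough illustration of the RBF interpolation step, the sketch below fits a Gaussian RBF interpolant to scattered depth samples and evaluates it at new positions. The Gaussian kernel, the `epsilon` value, and the function names are assumptions here. The survey does not say which kernel Zhang et al. use.

```python
import numpy as np

def fit_rbf(centers, values, epsilon=1.0):
    """Solve for weights so that the interpolant passes through every sample.

    centers: (N, D) sample positions; values: (N,) sampled depths.
    """
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-epsilon * d2)  # Gaussian kernel matrix
    return np.linalg.solve(phi, values)

def evaluate_rbf(centers, weights, query, epsilon=1.0):
    """Evaluate the fitted interpolant at (M, D) query positions."""
    d2 = ((query[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-epsilon * d2) @ weights

# Hypothetical sparse depth samples on the corners of a unit square.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
depths = np.array([0.0, 1.0, 1.0, 2.0])
weights = fit_rbf(centers, depths)
recovered = evaluate_rbf(centers, weights, centers)
midpoint = evaluate_rbf(centers, weights, np.array([[0.5, 0.5]]))
```

The interpolant reproduces the known samples exactly and fills the gaps between them smoothly, which is what makes it useful for densifying a sparse facial point cloud.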

Jiang et al. [31] proposed a pose-invariant 3D face reconstruction (PIFR) algorithm based on 3DMM. The input images are normalized to obtain more information about the visibility of facial landmarks. The advantage of this method is the pose-invariant facial reconstruction capability. However, reconstruction needs improvement at large poses.

In computer vision, a large pose usually refers to a face or object whose orientation or rotation angle in the image deviates strongly from the frontal view. In face reconstruction, it typically means the face is in a non-frontal orientation, so that parts of it are self-occluded. These situations increase the difficulty of facial recognition and reconstruction.

Wu et al. [32] proposed a technique for 3D facial expression reconstruction using a single image. The parameters of the 3DMM were calculated using the cascaded regression framework. In the feature extraction stage, Histogram of Oriented Gradients (HOG) and keypoint offsets are used.

Kollias et al. [33] proposed a new technique for synthesizing facial expressions and valence (positive/negative emotion) levels. Based on the valence-arousal (VA) technique, 600K frames are annotated from the 4DFAB dataset [34]. The technique works on in-the-wild datasets. However, the 4DFAB dataset is not publicly available.

Lyu et al. [35] proposed Pixel-Face, a dataset of high-resolution images built from 2D images. Pixel-3DM is proposed for 3D face reconstruction. However, this study did not consider external occlusions.

2.2 DL-based reconstruction

3D Generative Adversarial Network (3DGAN) and 3D Convolutional Neural Network (3DCNN) are deep learning techniques for 3D face reconstruction [27]. The main advantages of these methods are high fidelity and better performance in terms of accuracy and mean absolute error (MAE). However, training a GAN takes a long time. Face reconstruction in the canonical view can be performed with the face identity-preserving (FIP) method [36].

Tang et al. [37] introduced a multi-layer generative deep learning model for generating images in novel lighting situations. In face recognition, the training corpus is responsible for providing labels for multi-view perceptrons. Synthetic data were augmented from a single image using facial geometry [38].

Richardson et al. [39] proposed an unsupervised version of the above reconstruction. The facial animation task [40] is implemented using a supervised CNN. Deep convolutional neural networks (DCNNs) are used to recover 3D texture and shape. In [41], facial texture restoration provides better details than 3DMM [42].

Figure 8 illustrates the different stages of 3D face recognition using the recovery of occluded regions.

Figure 8: Different stages of 3D facial recognition using restoration techniques [9]

Kim et al. [26] proposed a 3D face recognition algorithm based on a deep convolutional neural network. Using 3D face augmentation, various facial expressions can be synthesized from a single scan of the 3D face. Model training based on transfer learning is faster. However, when 3D point cloud images are converted to 2.5D images, 3D data is lost.

2.5D usually means that the depth information is limited to a single plane (such as a 2D image), and each pixel on this plane has a depth value associated with it. In 3D face recognition, the process of converting 3D facial data into a 2.5D image is to map the depth value of each 3D point to the corresponding pixel on the 2D image, thereby obtaining the depth information of each pixel. This method can reduce the dimensionality of data, simplify calculations and reduce storage space requirements, but part of the 3D information will be lost, which may affect the accuracy of facial recognition.
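
The mapping described above can be sketched as a simple orthographic projection. This is an assumed, simplified setup (x and y already scaled to [0, 1), z used directly as depth) rather than the preprocessing of any specific paper.

```python
import numpy as np

def to_depth_image(points, height, width):
    """Orthographically project (N, 3) points onto an H x W depth image.

    x and y are assumed to lie in [0, 1); z is the depth. When several
    points land on the same pixel only the nearest one (smallest z) is
    kept, which is exactly where 3D information is lost.
    """
    depth = np.full((height, width), np.inf)
    cols = np.clip((points[:, 0] * width).astype(int), 0, width - 1)
    rows = np.clip((points[:, 1] * height).astype(int), 0, height - 1)
    # Unbuffered minimum so repeated pixels keep the smallest depth.
    np.minimum.at(depth, (rows, cols), points[:, 2])
    return depth

# Hypothetical points: the first two share a pixel, so one of them is dropped.
points = np.array([[0.1, 0.1, 2.0],
                   [0.1, 0.1, 5.0],
                   [0.9, 0.6, 3.0]])
depth = to_depth_image(points, height=2, width=2)
print(depth)
```

Pixels that receive no point stay at infinity here. A real pipeline would typically fill such holes by interpolation.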

Gilani et al. [43] proposed a technique for developing a large corpus of annotated 3D faces. They trained a 3D convolutional neural network for face recognition (FR3DNet) on 3.1 million 3D faces of 100K people. Testing was done on 31,860 images of 1853 people.

Thies et al. [44] proposed a neural voice puppetry technique for generating realistic output video from source input audio. This is based on a DeepSpeech recurrent neural network using the latent 3D model space. Audio2ExpressionNet is responsible for converting input audio to specific facial expressions.

Li et al. [45] proposed SymmFCNet, a symmetry-consistent convolutional neural network that reconstructs missing pixels using the other half of the face. SymmFCNet consists of an illumination-reweighted warping subnetwork and a generative reconstruction subnetwork. Reliance on multiple networks is a significant disadvantage.
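
The core idea of borrowing pixels from the other half of the face can be shown with a deliberately naive sketch. It assumes a perfectly aligned, frontal, evenly lit face, so a plain horizontal flip is enough. SymmFCNet itself learns the warping and the illumination reweighting, which this toy function (a hypothetical `fill_from_mirror`) does not attempt.

```python
import numpy as np

def fill_from_mirror(image, missing_mask):
    """Fill missing pixels with their horizontally mirrored counterparts.

    image: (H, W) array; missing_mask: (H, W) boolean, True where unknown.
    Returns the filled image and a mask of pixels that are still missing
    because their mirrored counterpart is missing too.
    """
    mirrored = image[:, ::-1]
    mirrored_valid = ~missing_mask[:, ::-1]
    filled = image.copy()
    usable = missing_mask & mirrored_valid
    filled[usable] = mirrored[usable]
    return filled, missing_mask & ~mirrored_valid

# Hypothetical 2 x 4 "face" whose top-right pixels are occluded.
image = np.array([[1.0, 2.0, 0.0, 0.0],
                  [5.0, 6.0, 7.0, 8.0]])
missing = np.array([[False, False, True, True],
                    [False, False, False, False]])
filled, still_missing = fill_from_mirror(image, missing)
print(filled)
```

Pixels that are occluded on both sides remain unknown, which is where a generative reconstruction step is needed.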

Han et al. [46] proposed a sketching system to create 3D caricature photos by modifying facial features. An unconventional deep learning approach is devised to obtain a vertex-wise exaggeration map. They use the FaceWarehouse dataset [20] for training and testing. The advantage is converting 2D images into 3D facial caricature models. However, the caricature quality suffers when glasses are present. Furthermore, the reconstruction is affected by different lighting conditions.

Moschoglou et al. [47] implemented 3DFaceGAN, an autoencoder-like network for modeling the distribution of 3D facial surfaces. Reconstruction loss and adversarial loss are used for the generator and discriminator. The disadvantage is that GANs are difficult to train and cannot be applied to real-time 3D face solutions.

2.3 EG-based reconstruction

Face reconstruction methods based on epipolar geometry use multiple non-synthetic perspective images of the same subject to generate a single 3D image [48]. The main advantage of these techniques is good geometric fidelity. Camera calibration and orthographic images are the two major challenges faced by these techniques. Figure 9 shows the horizontal and vertical epipolar plane images (EPIs) obtained from the central view and subaperture images.

Figure 9: a Epipolar plane image corresponding to 3D facial curve, b horizontal EPI, c vertical EPI [48]
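
For readers unfamiliar with epipolar geometry, the constraint that ties two views together can be checked in a few lines. This is a textbook two-camera sketch with an assumed rotation and translation. It is not the light-field EPI pipeline of [48].

```python
import numpy as np

def skew(t):
    """Cross-product matrix, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_matrix(R, t):
    """Essential matrix E = [t]_x R for a second camera with X2 = R @ X1 + t."""
    return skew(t) @ R

# Hypothetical setup: second camera rotated 10 degrees about y and shifted.
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.2, 0.0, 0.05])
E = essential_matrix(R, t)

X1 = np.array([0.1, -0.2, 2.0])  # 3D point in camera-1 coordinates
X2 = R @ X1 + t                  # the same point in camera-2 coordinates
x1 = X1 / X1[2]                  # normalized image coordinates
x2 = X2 / X2[2]
residual = x2 @ E @ x1           # close to zero for a true correspondence
mismatch = (x2 + np.array([0.0, 0.1, 0.0])) @ E @ x1  # clearly non-zero
```

A correct match satisfies x2^T E x1 = 0, so the search for a corresponding point in the second image collapses from the whole image to a single line. This is also why accurate camera calibration matters so much for these methods.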

Anbarjafari et al. [49] proposed a novel technique for generating 3D faces captured by mobile phone cameras. A total of 68 face landmarks are used to divide the face into four regions. Different stages are used during texture creation, weighted area creation, model morphing and compositing. The main advantage of this technique is the good generalization ability obtained from feature points. However, it relies on datasets with good head shapes, which affects the overall quality.

2.4 OSL-based reconstruction

One-shot learning-based reconstruction methods use a single image of an individual to recreate a 3D recognition model [50]. The technique utilizes a single image of each subject to train the model. Consequently, these techniques train faster while also yielding promising results [51]. However, this approach cannot be generalized to videos. 3D reconstruction based on one-shot learning is currently an active research area.

In order to train a mapping model from 2D to 3D images, ground-truth 3D models are required. Some researchers use depth prediction to reconstruct 3D structures [52, 53], while other techniques directly predict the 3D shape [54, 55]. Few studies have performed 3D face reconstruction from a single 2D image [38, 39].

Optimal parameter values for 3D faces can be obtained by using deep neural networks and model parameter vectors. Major improvements have been achieved over [56, 57]. However, this approach cannot adequately handle pose variations. The main disadvantages of this technique are the creation of multi-view 3D faces and reconstruction degradation. Figure 10 shows the general framework of one-shot learning-based facial reconstruction techniques.

Figure 10: Overall framework of OSL-based 3D face reconstruction

Xing et al. [58] proposed a technique for 3D face reconstruction from a single image without ground-truth 3D shapes. Facial model rendering is used in the reconstruction process. A fine-tuning guided method sends feedback to further improve the rendering quality. This technique provides a way to reconstruct 3D shapes from 2D images. However, the downside is the use of rigid body transformations for preprocessing.

2.5 SFS-based reconstruction

Shape from shading (SFS) methods recover 3D shape from shading and lighting cues [59, 60]. They use an image that yields a good shape model. However, occlusion cannot be handled when the shape estimate is disturbed by an object's shadow. SFS works well with lighting in non-frontal face views (see Figure 11).

Figure 11: 3D facial shape recovery a 2D image, b 3D depth image, c texture projection, d albedo histogram [59]
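
SFS inverts an image formation model, so it helps to see the forward direction first. The sketch below renders shading from a depth map under a Lambertian assumption, I = albedo * max(0, n · l). The Lambertian model, the finite-difference normals, and the function names are assumptions for illustration. The surveyed SFS methods may use richer reflectance models.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate unit surface normals from a depth map using finite differences."""
    dz_dy, dz_dx = np.gradient(depth)
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    return normals / np.linalg.norm(normals, axis=2, keepdims=True)

def lambertian_shading(normals, light_dir, albedo=1.0):
    """Render intensity I = albedo * max(0, n . l) for unit normals (H, W, 3)."""
    light = np.asarray(light_dir, dtype=float)
    light = light / np.linalg.norm(light)
    return albedo * np.clip(normals @ light, 0.0, None)

# A flat surface facing the light is fully lit.
flat = np.zeros((4, 4))
flat_image = lambertian_shading(normals_from_depth(flat), [0.0, 0.0, 1.0])

# A plane tilted at 45 degrees is darker under the same light.
tilted = np.tile(np.arange(4.0), (4, 1))
tilted_image = lambertian_shading(normals_from_depth(tilted), [0.0, 0.0, 1.0])
```

SFS goes the other way: given the intensities and some prior on lighting and albedo, it searches for the depth map whose rendered shading matches the image. Many different surfaces can produce the same shading, so extra priors are needed.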

The method of Jiang et al. [61] is inspired by facial animation using RGB-D and monocular video. A coarse estimate of the target 3D face is computed by fitting a parametric model to the input image. The main disadvantage of this technique is the reconstruction of a 3D image from a single 2D image. In contrast, SFS techniques rely on predefined knowledge about facial geometry, such as facial symmetry.

2.6 Reconstruction based on hybrid learning

Richardson et al. [38] proposed a technique to generate a database with realistic facial images using geometric shapes. The proposed network is constructed using the ResNet model [62]. This technique cannot recover images with different facial attributes. It fails to generalize the training process to new face generation.

Liu et al. [63] proposed a technique for 3D face reconstruction using a hybrid of 3DMM and shape from shading. The mean absolute error (MAE) was plotted to show convergence of the reconstruction error.

Richardson et al. [39] proposed a one-shot learning model for extracting coarse-to-fine facial shapes. CoarseNet and FineNet are used to recover coarse and fine facial features, respectively. The model achieves highly detailed facial reconstruction, including wrinkles, from a single image. However, it cannot generalize beyond the facial features available in the training data. Reliance on synthetic data is another drawback.

Jackson et al. [51] proposed a CNN-based model for reconstructing 3D facial geometry using a single 2D facial image. This method does not require any kind of face alignment. It works well with all types of expressions and poses.

Tewari et al. [64] proposed a generative model based on a convolutional autoencoder network for face reconstruction. They used AlexNet [65] and VGGFace [66] models. However, the method fails in occlusions such as beards or external objects.

Dou et al. [67] proposed a deep neural network (DNN) based technique for end-to-end 3D face reconstruction using a single 2D image. A multi-task loss function and a fusion CNN are combined for face recognition. The main advantage of this approach is a simplified framework with an end-to-end model. However, this method has the disadvantage of relying on synthetic data.

Han et al. [68] proposed a CNN-based sketching system for 3D face and caricature modeling. Usually, rich facial expressions are generated with MAYA and ZBrush. Their system, however, includes gesture-based user interaction. The shape-level input is combined with the output of the fully connected layer to generate a bilinear output.

Hsu et al. [69] proposed two different cross-pose face recognition methods. One technique is based on 3D reconstruction and the other is built using deep CNNs. The face components are built from the 2D face library. The 3D surfaces are reconstructed using 2D facial components. CNN-based models can easily handle in-the-wild features. 3D component based methods do not generalize well.

Feng et al. [48] developed FaceLFnet to recover 3D faces using Epipolar Plane Images (EPI). They recover vertical and horizontal 3D facial curves using CNN. Realistic light field images were synthesized using 3D faces. 14K face scans of 80 different people were used during training, totaling 11 million facial curves/EPI. This model is a preferred choice for medical applications. However, this technique requires a large number of epipolar plane image curves.

Zhang et al. [70] proposed a 3D face reconstruction technique using a combination of morphable faces and sparse photometric stereo. An optimization technique estimates the per-pixel lighting direction and the illumination with high precision. Semantic segmentation is performed on input images and geometric proxies to reconstruct details such as wrinkles, eyebrows, moles, and pores. The mean geometric error was used to verify the reconstruction quality. This technique relies on light falling on the face.

Tran et al. [71] proposed a technique for 3D face reconstruction based on bump mapping. Bump maps are estimated using a convolutional encoder-decoder approach. Max pooling and Rectified Linear Units (ReLU) are used with the convolutional layers. The main disadvantage of this technique is that the unoptimized soft-symmetry implementation is slow.

Feng et al. [72] proposed a benchmark dataset consisting of 2K facial images of 135 individuals. Five different 3D face reconstruction methods are evaluated on the proposed dataset.

Feng et al. [73] proposed a 3D face reconstruction technique based on UV position maps of texture coordinates, called Position Map Regression Network (PRN). A CNN regresses the 3D shape from a single 2D image. The weighted loss function applies different weights to different regions in the form of a weight mask. UV position maps can also be generalized. However, the method is difficult to apply in practical scenarios.

The authors of [74] proposed an encoder-decoder based network for regressing 3D face shape from 2D images. The joint loss is calculated based on 3D face reconstruction and recognition errors. However, the joint loss function affects the quality of the face shape.

Chinaev et al. [75] developed a CNN-based model for 3D face reconstruction using mobile devices. MobileFace CNN was used during the testing phase. This approach is fast to train on mobile devices and can be applied in real time. However, annotating 3D faces with morphable models in the preprocessing stage is expensive.
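
A weight-masked loss of this kind can be sketched as follows. The mask values and the tiny 2 x 2 position map are made up for illustration and are not the region weights used in the PRN paper.

```python
import numpy as np

def weighted_position_map_loss(pred, target, weight_mask):
    """Mean of per-pixel squared errors, each scaled by the weight mask.

    pred, target: (H, W, 3) UV position maps holding x, y, z per pixel.
    weight_mask: (H, W) non-negative weights, larger for important regions.
    """
    squared_error = np.sum((pred - target) ** 2, axis=2)
    return float(np.mean(squared_error * weight_mask))

# Hypothetical 2 x 2 position maps that differ by 1 in every coordinate.
target = np.zeros((2, 2, 3))
pred = np.ones((2, 2, 3))
# Illustrative mask: one heavily weighted pixel and one ignored pixel.
mask = np.array([[4.0, 0.0],
                 [1.0, 1.0]])
loss = weighted_position_map_loss(pred, target, mask)
print(loss)  # prints 4.5
```

A zero weight removes a region from training entirely, while a large weight forces the network to be most accurate where it matters, for example around landmarks.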

Gecer et al. [27] proposed a 3D face reconstruction technique based on DCNN and GAN. In UV space, GAN is used to train the generator to generate facial textures. A non-traditional 3DMM fitting strategy is formulated on a differentiable renderer and GAN.

Deng et al. [76] proposed a CNN-based single-image face reconstruction method with weakly supervised learning. Perceptual-level and image-level losses are combined. The advantages of this technique are large pose and occlusion invariance. However, in the prediction stage, the confidence of the model is low under occlusion.

Yuan et al. [77] proposed a 3D face reconstruction technique using 3DMM and GAN for occluded faces. A local discriminator and a global discriminator are used to verify the quality of the 3D faces. Semantic mapping of facial landmarks leads to the generation of synthetic faces under occlusion. However, the multiple discriminators increase the time complexity.

Luo et al. [78] implemented a Siamese CNN approach for 3D face reconstruction. They validate the quality of the reconstruction method using weighted parameter distance cost (WPDC) and contrastive cost functions. However, face recognition was not tested in the wild and the number of training images is small.

The authors of [79] proposed a GAN-based method for synthesizing high-quality 3D faces. A conditional GAN is used for expression augmentation. 10K new individual identities were randomly synthesized from the 300W-LP dataset. This technique produces high-quality 3D faces with fine details. However, GANs are difficult to train and cannot be applied to real-time solutions.

Chen et al. [80] proposed a self-supervised 3D face reconstruction technique using a 3DMM and a trainable VGG encoder. A two-stage framework regresses the 3DMM parameters to reconstruct facial details, which are captured in UV space. The method generates good-quality faces under normal occlusion. However, the model fails on extreme occlusions, expressions and large poses. The CelebA [81] dataset is used for training, and CelebA together with the LFW [82] dataset is used for testing.

Ren et al. [83] developed an encoder-decoder framework for video deblurring of 3D face points. Identity knowledge and facial structure are predicted by the rendering branch and 3D face reconstruction. Face deblurring is a challenge when dealing with pose-varying videos. The main disadvantage of this technique is the high computational cost.

Tu et al. [10] developed a 2D-assisted self-supervised learning (2DASL) technique for 2D facial images. Noisy keypoint information is used to improve the quality of 3D facial models. Self-critic learning is developed to improve the 3D facial models. Two datasets, namely AFLW-LFPA [84] and AFLW2000-3D [85], are used for 3D face reconstruction and face alignment. This approach works well for 2D faces in the wild as well as noisy keypoints. However, it relies on 2D to 3D keypoint annotations.

Liu et al. [86] proposed an automatic method for generating Pose and Expression Normalization (PEN) 3D faces. The advantage of this technique is that reconstruction from a single 2D image and 3D facial recognition are pose and expression invariant. However, it is not occlusion invariant.

Lin et al. [24] implemented a technique for 3D face reconstruction from single images in the wild. Graph convolutional networks are used to generate high-density facial textures. The FaceWarehouse [20] and CelebA [81] databases are used for training.

Ye et al. [87] proposed a large-scale 3D caricature dataset. They generated a PCA-based linear 3D morphable model for caricature shapes. 6.1K portrait caricature images are collected from pinterest.com and the WebCaricature dataset [88]. High-quality 3D caricatures have been synthesized. However, for occluded input face images, the quality of the caricature is not good.

Lattas et al. [89] proposed a technique for generating high-quality 3D facial reconstructions from arbitrary images. A large-scale database of geometry and reflectance was collected from 200 different subjects. An image translation network is trained to estimate specular and diffuse albedo. The technique uses GANs to generate high-resolution avatars. However, it cannot generate avatars for dark-skinned subjects.

Zhang et al. [90] proposed an automatic keypoint detection and 3D face reconstruction technique for caricatures. The 2D caricature image is used to regress the orientation and shape of the 3D caricature. The ResNet model is used to encode the input image into a latent space. The decoder is used together with fully connected layers to generate 3D keypoints on the caricature.

Deng et al. [91] proposed a DISentangled precisely-COntrollable (DiscoFaceGAN) latent embedding for representing virtual people with various poses, expressions, and lighting. Contrastive learning is employed to facilitate disentanglement by comparing rendered faces with real faces. Facial generation is accurate in expression, pose and lighting. However, the quality of generated models is lower in low light and extreme poses.

Li et al. [92] proposed a 3D face reconstruction technique for estimating the pose of a 3D face, using coarse-to-fine estimation. They generate 3D models using an adaptive weighting method. The advantage of this technique is its robustness to partial occlusions and extreme poses. However, the model fails when 2D and 3D keypoints are misestimated when occluded.

Chaudhuri et al. [93] proposed a deep learning method for training personalized dynamic albedo maps and expressive blend shapes. It generates photo-realistic 3D face reconstructions. The face parsing loss and blend shape gradient loss capture the semantic meaning of the reconstructed blend shapes. This technique was trained on videos in the wild and generated high-quality 3D faces as well as the transfer of facial motion from one person to another. However, it does not perform well under external occlusion.

Shang et al. [94] proposed a self-supervised learning technique for occlusion-aware view synthesis. Multidimensional consistency is performed using three different loss functions, namely depth consistency loss, pixel consistency loss, and keypoint-based epipolar loss. Reconstruction is done by an occlusion-aware method. It does not perform well with external occlusions (hands, glasses, etc.).

Cai et al. [95] proposed Attention Guided GAN (AGGAN), capable of 3D face reconstruction using 2.5D images. AGGAN uses an autoencoder technique to generate 3D voxel images from depth images. The 2.5D to 3D face mapping is learned with an attention-based GAN. The technique handles a wide range of head poses and expressions. However, in the case of a wide mouth opening, facial expressions cannot be fully reconstructed.

Xu et al. [96] proposed a method for training head geometry models without using 3D ground-truth data. A CNN is trained on deep synthetic images with head geometry, without optimization. Head pose manipulation is performed using GANs and 3D warping.

Table 1 presents a comparative analysis of 3D facial reconstruction techniques.

Table 1: Comparative Analysis of 3D Facial Reconstruction Technologies

Table 2 summarizes the advantages and disadvantages of 3D facial reconstruction techniques.

Table 2: Comparison of advantages and disadvantages of 3D facial reconstruction technologies

3 Performance Evaluation Standards

Performance evaluation measures are important for understanding the quality of a trained model. There are a variety of evaluation metrics, including mean absolute error (MAE), mean square error (MSE), normalized mean error (NME), root mean square error (RMSE), cross entropy loss (CE), area under the curve (AUC), intersection over union (IoU), peak signal-to-noise ratio (PSNR), receiver operating characteristic curve (ROC) and structural similarity index (SSIM).
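
Several of these metrics are short enough to write out directly. The following is a minimal sketch using their standard definitions. The function names, the toy inputs, and the choice of normalizing distance for NME are assumptions, since different papers normalize by different lengths (for example inter-ocular distance or bounding-box size).

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error."""
    return float(np.mean(np.abs(pred - target)))

def mse(pred, target):
    """Mean squared error."""
    return float(np.mean((pred - target) ** 2))

def rmse(pred, target):
    """Root mean squared error."""
    return float(np.sqrt(mse(pred, target)))

def nme(pred_landmarks, true_landmarks, norm_distance):
    """Normalized mean error: mean landmark distance divided by a
    normalizing length chosen by the caller."""
    distances = np.linalg.norm(pred_landmarks - true_landmarks, axis=1)
    return float(np.mean(distances) / norm_distance)

def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in decibels (infinite for identical inputs)."""
    error = mse(pred, target)
    if error == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / error))

def iou(pred_mask, true_mask):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        return 1.0  # two empty masks are treated as a perfect match
    return float(np.logical_and(pred_mask, true_mask).sum() / union)

pred = np.array([1.0, 2.0, 3.0, 4.0])
target = np.array([1.0, 2.0, 3.0, 8.0])
print(mae(pred, target), mse(pred, target), rmse(pred, target))  # prints 1.0 4.0 2.0
```

Note how the single large error dominates MSE and RMSE much more than MAE, which is why papers often report more than one of them.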

Table 3 summarizes the performance evaluation measures for 3D face reconstruction techniques.

Table 3: Evaluation of 3D face reconstruction techniques from performance metrics

For face reconstruction, the five most important and most widely used performance evaluation measures are MAE, MSE, NME, RMSE and adversarial loss. Adversarial losses have been used since 2019 with the advent of GANs on 3D images.

4 Datasets for Face Recognition

Table 4 presents a detailed description of the datasets used for 3D face reconstruction techniques.

Table 4: Detailed description of the datasets used

The analysis of different datasets highlights the fact that most 3D facial datasets are publicly available datasets. Compared to public datasets of 2D faces, they do not have a sufficient number of images to train the models. This makes the study of 3D faces more interesting as the scalability factor has not yet been tested and has become an active research area. It is worth mentioning that only three datasets, Bosphorus, Kinect-FaceDB and UMBDB datasets, have occluded images for occlusion removal.

5 Tools and techniques for 3D face reconstruction

Table 5 presents the technologies used in terms of graphics processing unit (GPU) hardware, random access memory (RAM) size, central processing unit (CPU), and brief applications. The comparison highlights the importance of deep learning in 3D facial reconstruction. GPUs play a vital role in deep learning based models. With the advent of Google Colaboratory, GPUs are now freely available.

Table 5: Comparative Analysis of 3D Face Reconstruction Technologies, Hardware and Applications

6 Applications

Based on AI+X technology [128], where X is expertise in the field of facial recognition, a large number of applications are affected by 3D facial reconstruction. Facial manipulation, voice-driven animation and reenactment, video dubbing, virtual makeup, projection mapping, face replacement, facial aging, and 3D printing in medicine are some of the well-known applications. These applications are discussed in the following subsections.

6.1 Facial manipulation

The gaming and film industries use facial cloning or manipulation in video-based facial animation. Expressions and emotions are transmitted from the user to the target character through video streams. When an artist voices an animated character in a film, 3D facial reconstruction can help transfer expressions from the artist to the character. Figure 12 shows an example of manipulation in a real-time demonstration of a digitized avatar [129, 130].

Figure 12: Real-time facial puppetry [129]

6.2 Voice-driven animation and reenactment

Zollhöfer et al. [1] discuss various video-based face reenactment works. Most methods rely on reconstructing the source and target faces with parametric face models. Figure 13 shows the pipeline architecture of Neural Voice Puppetry [44]. Audio features are extracted from the input with DeepSpeech, which is based on recurrent neural networks. The autoencoder-based expression features are then passed along with the 3D model to a neural renderer to produce speech-driven animations.

Figure 13: Neural Voice Puppetry [44]
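As a rough illustration of why parametric face models suit this task, the sketch below uses a linear model in which a face is a mean shape plus weighted identity and expression components. Driving a target face then amounts to keeping the target's identity coefficients and swapping in the source's expression coefficients. The bases, coefficient values, and function names are toy assumptions, not the formulation of any specific cited paper.

```python
def reconstruct(mean_shape, id_basis, exp_basis, id_coeffs, exp_coeffs):
    """Linear parametric face: mean + identity offsets + expression offsets."""
    shape = list(mean_shape)
    for basis, coeffs in ((id_basis, id_coeffs), (exp_basis, exp_coeffs)):
        for component, weight in zip(basis, coeffs):
            for i, delta in enumerate(component):
                shape[i] += weight * delta
    return shape

def transfer_expression(source_params, target_params):
    """Keep the target identity, take the expression from the source."""
    return {"id": target_params["id"], "exp": source_params["exp"]}

# Toy model (hypothetical): two vertices flattened as x, y, z, x, y, z.
mean_shape = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
id_basis = [[0.0, 1.0, 0.0, 0.0, 1.0, 0.0]]   # one identity component
exp_basis = [[0.0, 0.0, 1.0, 0.0, 0.0, 1.0]]  # one expression component

source = {"id": [2.0], "exp": [0.5]}   # the driving performer
target = {"id": [-1.0], "exp": [0.0]}  # the face being animated

driven = transfer_expression(source, target)
print(reconstruct(mean_shape, id_basis, exp_basis, driven["id"], driven["exp"]))
```

Because identity and expression live in separate coefficient vectors, the transfer itself is a simple swap; the hard part in practice is estimating those coefficients from video or audio.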

6.3 Video Dubbing

Dubbing is an important part of filmmaking in which audio tracks are added or replaced in the original scene. The original actor's voice needs to be replaced with the voice actor's voice. This requires considerable training so that voice actors can synchronize their audio with the original actor's lip movements [131]. To minimize discrepancies in visual dubbing, the lip motion must be dynamically reconstructed to match the dialogue spoken by the voice actor. This involves mapping the voice actor's mouth movements onto the original actor's mouth [132]. Image swapping or parameter transfer techniques are therefore used.

Figure 14 shows the visual dubbing of VDub [131] and Face2Face with live dubbing enabled. It also shows a DeepFake example from 6.S191 [133], in which a course instructor uses deep learning to dub his own voice into that of a famous person.

Figure 14: DeepFake example in 6.S191 [133]

6.4 Virtual makeup

Virtual makeup is very common on online platforms for meetings and video chats, where a presentable appearance matters. It covers digital changes to the image such as applying a suitable lipstick or a face mask. This is valuable for beauty product companies, as they can run digital ads in which consumers experience the effect of a product on their own image in real time. It is achieved using different reconstruction algorithms.

Synthetic virtual tattoos have been shown to adjust to facial expressions [134] (see Figure 15a).

Viswanathan et al. [135] proposed a system in which two facial images were taken as input, one with eyes open and the other with eyes closed. An augmented reality face is proposed for adding one or more makeup shapes, layers, colors and textures to the face.

Nam et al. [136] proposed an augmented reality-based lip makeup method that applies makeup pixel by pixel rather than polygon by polygon, as shown in Figure 15b.

Figure 15: (a) Synthetic virtual tattoo [134] and (b) pixel-by-pixel lipstick makeup based on augmented reality [136]
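The idea of pixel-level makeup can be sketched as per-pixel alpha blending of a lipstick color into the image, weighted by a soft lip mask. This is only an illustrative sketch under assumed inputs (nested lists of RGB tuples and a mask with values in [0, 1]); it is not the method of [136].

```python
def apply_lip_color(image, lip_mask, color, opacity=0.6):
    """Blend `color` into each pixel, weighted by the per-pixel mask value.

    image: rows of (r, g, b) tuples with values 0-255
    lip_mask: rows of floats in [0, 1], soft lip coverage per pixel
    """
    result = []
    for img_row, mask_row in zip(image, lip_mask):
        out_row = []
        for pixel, coverage in zip(img_row, mask_row):
            alpha = opacity * coverage  # 0 leaves the pixel untouched
            blended = tuple(
                round((1.0 - alpha) * channel + alpha * target)
                for channel, target in zip(pixel, color)
            )
            out_row.append(blended)
        result.append(out_row)
    return result

# Toy 1x2 image (hypothetical): the first pixel lies on the lips, the second does not.
image = [[(200, 150, 150), (100, 100, 100)]]
lip_mask = [[1.0, 0.0]]
print(apply_lip_color(image, lip_mask, color=(180, 0, 40), opacity=0.5))
```

Working per pixel lets the mask follow the lip contour smoothly, whereas a polygon-level approach colors whole mesh faces at once.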

6.5 Projection Mapping

Projection mapping uses a projector to modify the character or expression of a real-world image. This technique is used to bring still images to life and give them a visual presentation. Projection mapping uses different methods on 2D and 3D images to change a person's appearance. Figure 16 demonstrates a real-time projection mapping system named FaceForge [137].

Figure 16: Real-time projection mapping based on FaceForge [137]

Lin et al. [24] proposed a 3D face projection technique by passing the input image through CNN and combining the information with 3DMM to obtain the fine texture of the face (see Figure 17).

Figure 17: 2D surface projection mapping combined with 3DMM model [24]

6.6 Face Replacement

Face replacement is commonly used in the entertainment industry, where a source face is replaced by a target face. This technique is based on parameters such as identity, facial features and expressions of the two faces (source and target). The source face needs to be rendered so that it matches the conditions of the target face. Adobe After Effects, a widely used tool in the film and animation industry, can help with face replacement [138] (see Figure 18).

Figure 18: Face replacement system with unchanged expression [138]

6.7 Facial Aging

Facial aging is an effective technique for converting 3D facial images to 4D. If an aging GAN could be used to synthesize aged versions of a single 3D image, it would help create 4D datasets. Facial aging is also known as age progression or age synthesis because it "resurrects" the face by changing its features. Various techniques are used to enhance facial features while preserving the original image. Figure 19 shows the process of face transformation using an age-conditional GAN (ACGAN) [139].

Figure 19: Face transformation using ACGAN [139]
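A common way to condition a generator on age, shown here only as a hedged sketch, is to append a one-hot code for the target age group to the latent vector that encodes the face. The age bins and helper names below are hypothetical; the survey does not detail the exact conditioning scheme used in [139].

```python
AGE_GROUPS = ["0-18", "19-29", "30-39", "40-49", "50-59", "60+"]  # hypothetical bins

def one_hot(index, size):
    """Vector of zeros with a single 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def conditioned_input(latent, age_group):
    """Concatenate a face latent vector with a one-hot target-age code."""
    age_code = one_hot(AGE_GROUPS.index(age_group), len(AGE_GROUPS))
    return list(latent) + age_code

# The same latent vector can be paired with different target ages.
face_latent = [0.1, -0.3]
print(conditioned_input(face_latent, "30-39"))
print(conditioned_input(face_latent, "60+"))
```

Keeping the latent part fixed while changing only the age code is what allows the generator to render the same identity at different ages.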

Shi et al. [140] used a GAN for face aging, observing that different facial parts age at different speeds. They therefore use an attention-based conditional GAN with normalization to handle piecewise facial aging.

Fang et al. [141] proposed a progressive face aging method using a triplet loss function at the GAN generator level. The complex translation loss helps them handle face aging effectively.

Huang et al. [142] used a progressive GAN to address three aspects of facial aging, namely identity preservation, high fidelity, and aging accuracy. [143] proposed a controllable GAN for manipulating the latent space of input face images to control facial aging.

Yadav et al. [144] proposed a method for face recognition under various age gaps using two different images of the same person.

Sharma et al. [145] proposed a fusion-based GAN that pipelines CycleGAN for age progression with an enhanced super-resolution GAN for high fidelity.

[146] proposed a face aging method for young faces that models both facial appearance and geometric transformations.

As shown in Table 6, face reconstruction can be used in three different types of settings. Facial manipulation, voice-driven animation, and facial reenactment are all examples of animation-based facial reconstruction. Face replacement and video dubbing are two examples of video-based applications. Facial aging, virtual makeup, and projection mapping are some of the most common 3D facial applications.

Table 6: Application of 3D face reconstruction technology

7 Challenges and Future Research Directions

This section discusses the main challenges faced during 3D face reconstruction, followed by directions for future research.

7.1 Current challenges

Current challenges in 3D face reconstruction include occlusion removal, makeup removal, expression transfer, and age prediction. These will be discussed in the next subsections.

7.1.1 Occlusion removal

Occlusion removal is a challenging task for 3D face reconstruction. Researchers are using voxels and 3D landmarks to handle 3D facial occlusions [2, 8, 9].
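For context, a voxel representation discretizes 3D space into a regular grid and marks the cells that contain surface points. The sketch below voxelizes a small point cloud into an occupancy set; the grid size, bounds, and function name are assumptions for illustration and do not reproduce the pipelines of [2, 8, 9].

```python
def voxelize(points, grid_size, bounds_min, bounds_max):
    """Return the set of occupied (i, j, k) cells for a 3D point cloud."""
    occupied = set()
    for point in points:
        cell = []
        for value, lo, hi in zip(point, bounds_min, bounds_max):
            # Map the coordinate to a cell index and clamp it into the grid.
            idx = int((value - lo) / (hi - lo) * grid_size)
            cell.append(min(max(idx, 0), grid_size - 1))
        occupied.add(tuple(cell))
    return occupied

# Toy point cloud (hypothetical) inside the unit cube, on a 2x2x2 grid.
points = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (1.0, 1.0, 1.0)]
print(voxelize(points, grid_size=2, bounds_min=(0.0, 0.0, 0.0), bounds_max=(1.0, 1.0, 1.0)))
```

An occluded region shows up as a block of unoccupied cells where the face surface should be, which is the missing information a reconstruction model then has to fill in.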

Sharma and Kumar [2] developed a voxel-based facial reconstruction technique. After the reconstruction process, they use a pipeline trained with variational autoencoders, bidirectional LSTMs, and triplet loss to achieve 3D face recognition.
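The triplet loss mentioned here pulls an anchor embedding toward a positive sample (same person) and pushes it away from a negative sample (different person) by at least a margin. Below is a minimal sketch using plain Euclidean distance; the margin value and the use of unsquared distances are assumptions, and the cited work may differ in both.

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2D embeddings (hypothetical).
anchor = (0.0, 0.0)
same_person = (0.0, 1.0)
other_person = (3.0, 4.0)

print(triplet_loss(anchor, same_person, other_person))  # well separated
print(triplet_loss(anchor, other_person, same_person))  # roles swapped, penalized
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training effort concentrates on triplets that are still confusable.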

Sharma and Kumar [8] proposed a voxel-based face reconstruction and recognition method. They use a game theory based generator and discriminator to generate triplets. Once the missing information has been reconstructed, the occlusions are removed. Sharma and Kumar [9] built a one-shot learning 3D face reconstruction technique using 3D facial landmarks (see Figure 20).

Figure 20: 3D face reconstruction based on facial landmarks [9]

7.1.2 Applying cosmetics and removing them

Applying and removing makeup during virtual meetings became a challenge during the COVID-19 pandemic [154-156].

MakeupBag [154] proposes an automatic makeup style transfer technique that solves the makeup separation and facial makeup problems. Its main advantage is that it takes skin tone and color into account when transferring makeup (as shown in Figure 21).

Figure 21: Output of MakeupBag, with makeup transferred from a reference face to a target face [154].

Li et al. [155] proposed a makeup-invariant face verification system. They use a Semantic-Aware Makeup Cleaner (SAMC) to remove facial makeup across a variety of expressions and poses. The technique works unsupervised while locating areas of face makeup, and uses an attention map between 0 and 1, representing the degree of makeup.

Horita and Aizawa [156] proposed a style- and latent-guided generative adversarial network (SLGAN). They use a controllable GAN that lets users adjust the shading effect of the makeup (see Figure 22).

Figure 22: GAN-based cosmetic transfer and removal [156]

7.1.3 Expression transfer

Expression transfer is an active problem, especially with the advent of GANs.

Wu et al. [157] proposed ReenactGAN, a method capable of transferring facial expressions from a source video to a target video. They employ an encoder-decoder based model for face translation from source to target. The transformer is trained with three loss functions, namely cycle loss, adversarial loss, and shape-constrained loss. Figure 23 shows images of Donald Trump reenacting expressions.

Figure 23: Expression transfer using ReenactGAN [157]
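Source-to-target translators of this kind are often trained with a cycle-consistency term, which checks that mapping a sample to the other domain and back returns the original. The sketch below uses toy mappings and an L1 distance; the function names and the stand-in mappings are assumptions and do not reproduce ReenactGAN's networks.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_loss(x, forward, backward):
    """L1 gap between x and its round trip backward(forward(x))."""
    return l1_distance(x, backward(forward(x)))

# Toy stand-ins (hypothetical) for the source-to-target and target-to-source mappings.
def to_target(v):
    return [2.0 * e for e in v]

def to_source_exact(v):
    return [e / 2.0 for e in v]

def to_source_biased(v):
    return [e / 2.0 + 0.1 for e in v]

landmarks = [1.0, -2.0, 0.5]
print(cycle_loss(landmarks, to_target, to_source_exact))   # perfect round trip
print(cycle_loss(landmarks, to_target, to_source_biased))  # penalized round trip
```

A low cycle loss indicates that the translation preserves enough information to recover the input, which discourages the model from discarding the source expression.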

Deepfakes, in which the facial expression and its context differ, are a matter of concern.

Nirkin et al. [158] proposed a deepfake detection method for detecting identity manipulation and face replacement. In a deepfake image, the face region is manipulated by contextually changing the face to be altered.

[159] surveyed four deepfake methods, including full synthesis, identity swap, facial attribute manipulation, and expression swap.

7.1.4 Age Prediction

Thanks to deepfakes and generative adversarial networks [140, 142], faces can be transformed to other ages, as shown in Figure 24. Predicting a person's true age is therefore extremely challenging, especially for ID cards or fake faces on social networking platforms.

Figure 24: Results of GAN for progressive facial aging [142]

Fang et al. [141] proposed a GAN-based facial age simulation technique. The proposed Triple-GAN model uses a triplet translation loss to model the interrelationships between age patterns. They use an encoder-decoder based generator and discriminator for age classification.

Kumar et al. [160] employ reinforcement learning on the latent space based on the GAN model [161]. They use a Markov decision process (MDP) for semantic manipulation.

[162] proposed a semi-supervised GAN technique to generate realistic facial images. They synthesized face images using real data and target ages when training the network.

Zhu et al. [163] used an attention-based conditional GAN technique to synthesize facial images with targeted high fidelity.

7.2 Future challenges

Unsupervised learning is still an open problem in 3D face reconstruction. Recently, [164] proposed a solution for 3D symmetric deformable objects. In this paper, some future 3D facial reconstruction possibilities are discussed in detail, such as lip reconstruction, teeth and tongue capture, eye and eyelid capture, hairstyle reconstruction and full head reconstruction. These challenges present tasks for researchers working in the field of 3D facial reconstruction.

7.2.1 Lip reconstruction

The lips are one of the most critical components of the oral region. Various celebrities undergo lip surgery, including lip lift, lip reduction, and lip augmentation [165, 166].

Heidekrueger et al. [165] investigated the lip proportions preferred by women. It was concluded that gender, age, occupation, and country may influence the preference for lower lip proportions.

Upper lip aesthetics were reviewed by Baudoin et al. [166]. Different treatment options ranging from fillers to dermabrasion and surgical excision are investigated.

Zollhöfer et al. [1] show lip reconstruction as an application of 3D face reconstruction in Figure 25. In [167], the rolling, stretching, and bending of the lips are reconstructed from video.

Figure 25: High-quality lip reconstruction [1]

7.2.2 Teeth and Tongue Capture

In the literature, few research works focus on capturing the interior of the oral cavity. Reconstructing the teeth and tongue in GAN-based 2D face reconstruction is a difficult task, and beards or mustaches can make them harder to capture. In [168], a statistical model is discussed. Reconstructing the dental region has various applications, for example, creating content for digital avatars and restoring teeth based on facial geometry (see Figure 26).

Figure 26: Dental reconstruction and its application [168]

7.2.3 Eye and eyelid capture

[170] demonstrated 3D eye gaze estimation and face reconstruction from RGB videos.

Wen et al. [169] proposed a technique for real-time tracking and reconstruction of 3D eyelids (see Fig. 27). This approach is combined with face and eye tracking systems to achieve full faces with detailed eye regions. In [171], a bidirectional LSTM was used for eyelid tracking.

Figure 27: Eyelid tracking based on semantic edges [169]

7.2.4 Hairstyle reconstruction

Hairstyle reconstruction is a challenging task on 3D faces. 3D hair synthesis based on volumetric variational autoencoders [172] is shown in Fig. 28.

Figure 28: 3D hair synthesis using volumetric variational autoencoders [172]

Ye et al. [173] proposed a hair reconstruction model based on an encoder-decoder technique. It generates a volumetric vector field from an orientation map of the hairstyle. The encoder-decoder architecture combines CNN layers, skip connections, fully connected layers, and deconvolutional layers. During training, structural and content losses are used as evaluation metrics.

7.2.5 Complete head reconstruction

3D head reconstruction is an active research area.

He et al. [174] proposed a full head-driven 3D face reconstruction. The input image and reconstruction results were generated with side-view textures (see Figure 29). They employed an albedo parametric model to complement the head texture map. Convolutional networks are used for segmentation of face and hair regions. Human head reconstruction has various applications in virtual reality and avatar generation.

Figure 29: Complete head reconstruction [174]

Table 7 presents the challenges and future directions, as well as their target problems.

Table 7: Challenges and Future Research Directions for 3D Face Reconstruction

8 Conclusion

This paper provides a detailed survey and in-depth study of 3D facial reconstruction techniques.

Six reconstruction techniques are initially discussed. The observation is that scalability is the biggest challenge for the 3D face problem, as there are no sufficiently large publicly available datasets for 3D faces. Most researchers have worked on RGB-D images.

With the development of deep learning, there are hardware constraints for working with mesh or voxel images.

Current and future challenges related to 3D face reconstruction in the real world are discussed. This is an open research field with many challenges, especially those related to the capabilities of generative adversarial networks (GANs) and deepfakes. The following areas have not yet been fully explored:

  • lip reconstruction
  • reconstruction of the oral interior
  • eyelid reconstruction
  • reconstruction of various hairstyles
  • complete head reconstruction

Declaration

Conflict of interest: On behalf of all authors, the corresponding author declares no conflict of interest.

References

  1. Zollhöfer M, Thies J, Garrido P et al (2018) State of the art on monocular 3D face reconstruction, tracking, and applications. Computer Graphics Forum 37(2):523–550. https://doi.org/10.1111/cgf.13382
  2. Sharma S, Kumar V (2020) Voxel-based 3D face reconstruction using sequential deep learning and its application to face recognition. Multimedia Tools and Applications 79:17303–17330. https://doi.org/10.1007/s11042-020-08688-x
  3. Cloud Vision API | Google Cloud. https://cloud.google.com/vision/docs/face-tutorial. Accessed: January 12, 2021
  4. AWS Marketplace: Deep Vision API. https://aws.amazon.com/marketplace/pp/Deep-Vision-AI-Inc-Deep-Vision-API/B07JHXVZ4M. Accessed January 12, 2021
  5. Computer Vision | Microsoft Azure. https://azure.microsoft.com/en-in/services/cognitive-services/computer-vision/. Accessed January 12, 2021
  6. Koujan MR, Dochev N, Roussos A (2020) Real-time monocular 4D face reconstruction using LSFM models. Preprint arXiv:2006.10499.
  7. Behzad M, Vo N, Li X, Zhao G (2021) Towards sparse-aware 4D emotion recognition beyond face reading. Neurocomputing 458:297–307
  8. Sharma S, Kumar V (2020) Voxel-based 3D occlusion-invariant face recognition using game theory and simulated annealing. Multimedia Tools and Applications 79(35):26517–26547
  9. Sharma S, Kumar V (2021) 3D Landmark-Based Face Recovery for Recognition Using Variational Autoencoders and Triplet Loss. IET Biometrics 10(1):87–98. https://doi.org/10.1049/bme2.12005
  10. Tu X, Zhao J, Xie M et al. (2020) Single-image 3D face reconstruction assisted by 2D face images in the Wild. IEEE Trans Multimed 23:1160–1172. https://doi.org/10.1109/TMM.2020.2993962
  11. Bulat A, Tzimiropoulos G How close are we to solving the problem of 2D and 3D face alignment? (and a dataset of 230,000 3D facial landmarks). In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1021–1030
  12. Zhu X, Lei Z, Liu X et al. (2016) Face alignment across large poses: a 3D solution. Computer Vision and Pattern Recognition (CVPR), pp. 146–155
  13. Gu S, Bao J, Yang H et al. (2019) Mask-guided portrait editing with conditional GANs. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2019-June:3431–3440. https://doi.org/10.1109/CVPR.2019.00355
  14. Guo Y, Wang H, Hu Q et al. (2020) Deep Learning for 3D Point Clouds: A Survey. IEEE Trans Pattern Anal Mach Intell 43(12):4338–4364. https://doi.org/10.1109/tpami.2020.3005434
  15. Ye M, Shen J, Lin G, et al. (2021) Deep Learning for Person Re-ID: Survey and Prospects. IEEE Trans Pattern Anal Mach Intell 8828:1–1. https://doi.org/10.1109/tpami.2021.3054775
  16. Tran L, Liu X Nonlinear 3D face morphable model. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 7346–7355
  17. Egger B, Smith WAP, Tewari A et al. (2020) 3D Morphable Face Models—Past, Present, and Future. ACM Trans Graph 39(5):1–38. https://doi.org/10.1145/3395208
  18. Blanz V, Vetter T (1999) Face recognition based on fitting a 3D morphable model. IEEE Trans Pattern Anal Mach Intell 25(9):1063–1074
  19. Booth J, Roussos A, Ponniah A, et al. (2018) Large-Scale 3D Morphable Models. Int J Comput Vis 126:233–254. https://doi.org/10.1007/s11263-017-1009-7
  20. Cao C, Weng Y, Zhou S et al. (2014) FaceWarehouse: A 3D Facial Expression Database for Visual Computing. IEEE Trans Vis Comput Graph 20:413–425. https://doi.org/10.1109/TVCG.2013.249
  21. Gerig T, Morel-Forster A, Blumer C et al. (2018) Morphable face models - an open framework. In: Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018, pp. 75–82. https://doi.org/10.1109/FG.2018.00021
  22. Huber P, Hu G, Tena R et al (2016) A multiresolution 3D morphable face model and fitting framework. In: Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pp. 79–86. SciTePress.
  23. Li T, Bolkart T et al (2017) Learning models of facial shape and expressions from 4D scans. ACM Trans Graphics 36(6):1–17. https://doi.org/10.1145/3130800.3130813
  24. Lin J, Yuan Y, Shao T, Zhou K (2020) High-Fidelity 3D Face Reconstruction Using Graph Convolutional Networks. Computer Vision Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr42600.2020.00593
  25. Paysan P, Knothe R, Amberg B et al (2009) A 3D face model for pose- and illumination-invariant face recognition. In: 6th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2009, pp. 296–301
  26. Kim D, Hernandez M, Choi J, Medioni G (2018) Deep 3D Face Identification. IEEE International Joint Conference on Biometrics (IJCB), IJCB 2017 2018-January:133–142. https://doi.org/10.1109/BTAS.2017.8272691
  27. Gecer B, Ploumpis S, Kotsia I, Zafeiriou S (2019) GANFIT: High-fidelity 3D face reconstruction using generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1155–1164. https://doi.org/10.1109/CVPR.2019.00125
  28. Kim H, Garrido P, Tewari A, et al. (2018) Deep video portraits. ACM Trans Graphics 37:1–14. https://doi.org/10.1145/3197517.3201283
  29. Maninchedda F, Oswald MR, Pollefeys M (2017) Fast reconstruction of 3D models of faces with glasses. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR.2017.490
  30. Zhang S, Yu H, Wang T, et al. (2018) Dense 3D face reconstruction from a single depth image in an unconstrained environment. Virtual Reality 22(1):37–46. https://doi.org/10.1007/s10055-017-0311-6
  31. Jiang L, Wu X, Kittler J (2018) Pose-invariant 3D facial reconstruction. 1–8. arXiv preprint arXiv:1811.05295
  32. Wu F, Li S, Zhao T, et al (2019) 3D face reconstruction using cascaded regression with landmark displacement. Pattern Recognition Letters 125:766–772. https://doi.org/10.1016/j.patrec.2019.07.017
  33. Kollias D, Cheng S, Ververas E et al. (2020) Deep Neural Network Augmentation: Generating Faces for Sentiment Analysis. International Journal of Computer Vision 128:1455–1484. https://doi.org/10.1007/s11263-020-01304-3
  34. 4DFAB: A Large Scale 4D Facial Expression Database for Biometric Applications | DeepAI. https://deepai.org/publication/4dfab-a-large-scale-4d-facial-expression-database-for-biometric-applications. Accessed 2020 October 14
  35. Lyu J, Li X, Zhu X, Cheng C (2020) Pixel-Face: A Large-Scale, High-Resolution Benchmark for 3D Face Reconstruction. arXiv preprint arXiv:2008.12444
  36. Zhu Z, Luo P, Wang X, Tang X (2013) Deep Learning for Identity Preserving Face Space. In: Proceedings of IEEE International Conference on Computer Vision. Institute of Electrical and Electronics Engineers, pp. 113–120
  37. Tang Y, Salakhutdinov R, Hinton G (2012) Deep Lambertian Networks. arXiv preprint arXiv:1206.6445
  38. Richardson E, Sela M, Kimmel R (2016) 3D face reconstruction by learning from synthetic data. In: Proceedings of 4th International Conference on 3D Vision 2016, 3DV 2016. Institute of Electrical and Electronics Engineers, pp. 460–467
  39. Richardson E, Sela M, Or-El R, Kimmel R (2017) Learning detailed face reconstruction from a single image. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1259–1268
  40. Laine S, Karras T, Aila T, et al (2016) Facial Representation Capture Using Deep Neural Networks. arXiv preprint arXiv:1609.06536, 3
  41. Nair V, Susskind J, Hinton GE (2008) Analysis-by-synthesis by learning to invert generative black boxes. In: International Conference on Artificial Neural Networks, pp. 971–981
  42. Peng X, Feris RS, Wang X, Metaxas DN (2016) A Recurrent Encoder-Decoder Network for Continuous Face Alignment. In: European Conference on Computer Vision, pp. 38–56.
  43. Zulqarnain Gilani S, Mian A (2018) Learning from millions of 3D scans for large-scale 3D facial recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1896–1905. https://doi.org/10.1109/CVPR.2018.00203
  44. Thies J, Elgharib M, Tewari A, et al (2019) Neural voice puppetry: Audio-driven facial reenactment. In: European Conference on Computer Vision, pp. 716–731
  45. Li X, Hu G, Zhu J et al. (2020) Learning Symmetrically Consistent Deep CNNs for Face Completion. IEEE Transactions on Image Processing 29:7641–7655. https://doi.org/10.1109/TIP.2020.3005241
  46. Han X, Hou K, Du D, et al. (2020) CaricatureShop: Personalized and Photo-Level Caricature Sketching. IEEE Transactions on Vision and Computer Graphics 26:2349–2361. https://doi.org/10.1109/TVCG.2018.2886007
  47. Moschoglou S, Ploumpis S, Nicolaou MA, et al (2020) 3DFaceGAN: Adversarial Networks for 3D Face Representation, Generation and Transformation. International Journal of Computer Vision 128(10):2534–2551. https://doi.org/10.1007/s11263-020-01329-8
  48. Feng M, Zulqarnain Gilani S, Wang Y, et al. (2018) 3D Face Reconstruction from Light Field Images: A Model-Free Approach. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11214 LNCS:508–526. https://doi.org/10.1007/978-3-030-01249-6_31
  49. Anbarjafari G, Haamer RE, Lüsi I et al (2019) 3D facial reconstruction with region-based best-fit fusion using mobile phones for virtual reality-based social media. Bulletin of the Polish Academy of Sciences: Technical Sciences 67:125–132. https://doi.org/10.24425/bpas.2019.127341
  50. Kim H, Zollhöfer M, Tewari A, et al. (2018) InverseFaceNet: Deep Monocular Inverse Rendering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4625–4634.
  51. Jackson AS, Bulat A, Argyriou V, Tzimiropoulos G (2017) Reconstruction of large pose 3D faces from a single image via direct volumetric CNN regression. In: Proceedings of the IEEE International Conference on Computer Vision 2017-Octob:1031–1039. https://doi.org/10.1109/ICCV.2017.117
  52. Eigen D, Puhrsch C, Fergus R (2014) Predicting depth maps from single images using multi-scale deep networks. Preprint arXiv:1406.2283.
  53. Saxena A, Chung SH, Ng AY (2008) 3-D depth reconstruction from a single still image. International Journal of Computer Vision 76:53–69. https://doi.org/10.1007/s11263-007-0071-y
  54. Tulsiani S, Zhou T, Efros AA, Malik J (2017) Multi-view supervision for single-view reconstruction via differentiable ray consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2626–2634.
  55. Tatarchenko M, Dosovitskiy A, Brox T (2017) Octree Generative Networks: An Efficient Convolutional Architecture for High-Resolution 3D Output. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2088–2096.
  56. Roth J, Tong Y, Liu X (2016) Adaptive reconstruction of 3D faces from an unconstrained collection of photos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4197–4206.
  57. Kemelmacher-Shlizerman I, Seitz SM (2011) Facial reconstruction in the wild. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1746–1753.
  58. Xing Y, Tewari R, Mendonça PRS (2019) A Self-Supervised Guided Method for Single-Image 3D Face Reconstruction. In: Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019:1014–1023. https://doi.org/10.1109/WACV.2019.00113
  59. Kemelmacher-Shlizerman I, Basri R (2011) 3D face reconstruction from a single image using a single reference face shape. IEEE Transactions on Pattern Analysis and Machine Intelligence 33:394–405. https://doi.org/10.1109/TPAMI.2010.63
  60. Sengupta S, Lichy D, Kanazawa A, et al. (2020) SfSNet: Learning Face Shape, Albedo, and Illumination in the Wild. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2020.3046915
  61. Jiang L, Zhang J, Deng B, et al. (2018) Reconstruction of 3D faces with geometric details from a single image. IEEE Transactions on Image Processing 27:4756–4770. https://doi.org/10.1109/TIP.2018.2845697
  62. He K, Zhang X, Ren S, Sun J (2016) Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
  63. Liu F, Zeng D, Li J, Zhao Q-J (2017) 3D face reconstruction via cascaded regression in shape space. Frontiers in Information Technology and Electronic Engineering 18:1978–1990. https://doi.org/10.1631/FITEE.1700253
  64. Tewari A, Zollhöfer M, Kim H et al (2017) MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction. In: Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017 2018-Janua:1274-1283. https://doi.org/10.1109/ICCVW.2017.153
  65. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25: 1097-1105
  66. Oxford Visual Geometry Group. http://www.robots.ox.ac.uk/~vgg/data/vgg_face/. Accessed on October 13, 2020
  67. Dou P, Shah SK, Kakadiaris IA (2017) End-to-end 3D face reconstruction with deep neural networks. In: 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 1503-1512. https://doi.org/10.1109/CVPR.2017.164
  68. Han X, Gao C, Yu Y (2017) DeepSketch2Face: A deep learning-based sketching system for 3D facial and caricature models. ACM Transactions in Graphics 36: 1-12. https://doi.org/10.1145/3072959.3073629
  69. Hsu GS, Shie HC, Hsieh CH, Chan JS (2018) Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition. IEEE Transactions on Circuits and Systems for Video Technology 28: 3194-3207. https://doi.org/10.1109/TCSVT.2017.2748379
  70. Cao X, Chen Z, Chen A et al. (2018) Sparse photometric 3D face reconstruction guided by morphable models. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2018.00487
  71. Tran AT, Hassner T, Masi I, et al. (2018) Extreme 3D face reconstruction: Seeing through occlusions. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2018.00414
  72. Feng ZH, Huber P, Kittler J, et al. (2018) Evaluation of dense 3D reconstruction from 2D face images in the wild. In: 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018 780-786. https://doi.org/10.1109/FG.2018.00123
  73. Feng Y, Wu F, Shao X et al. (2018) Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11218 LNCS:557-574. https://doi.org/10.1007/978-3-030-01264-9_33
  74. Liu F, Zhu R, Zeng D, et al. (2018) Disentangling features in 3D face shapes for joint face reconstruction and recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2018.00547
  75. Chinaev N, Chigorin A, Laptev I (2019) MobileFace: 3D Face Reconstruction via Efficient CNN Regression. In: Leal-Taixé Laura, Roth Stefan (eds) Computer Vision - ECCV 2018 Symposium: Munich, Germany, September 8-14, 2018, Proceedings, Part IV. Springer International Publishing, Cham, pp 15-30. https://doi.org/10.1007/978-3-030-11018-5_3
  76. Deng Y, Yang J, Xu S, et al. (2019) Accurate 3D face reconstruction using weakly supervised learning: from single image to image collection. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 2019-June: 285-295. https://doi.org/10.1109/CVPRW.2019.00038
  77. Yuan X, Park IK (2019) Face de-occlusion using 3D morphable model and generative adversarial network. In: Proceedings of the IEEE International Conference on Computer Vision 2019-Octob:10061-10070. https://doi.org/10.1109/ICCV.2019.01016
  78. Luo Y, Tu X, Xie M (2019) Learning Robust 3D Face Reconstruction and Discriminative Identity Representation. 2019 2nd IEEE International Conference on Information Communication and Signal Processing, ICICSP 2019, pp 317-321. https://doi.org/10.1109/ICICSP48821.2019.8958506
  79. Gecer B, Lattas A, Ploumpis S, et al. (2019) Synthesizing coupled 3D face modalities by trunk-branch generative adversarial networks. European Conference on Computer Vision. Springer, Cham, pp 415-433
  80. Chen Y, Wu F, Wang Z, et al. (2019) Self-supervised learning of detailed 3D face reconstruction. IEEE Transactions on Image Processing 29:8696-8705
  81. Large-scale CelebFaces Attributes (CelebA) Dataset. http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html. Accessed 13 October 2020
  82. Labelled Faces in the Wild (LFW) Dataset | Kaggle. https://www.kaggle.com/jessicali9530/lfw-dataset. Accessed 13 October 2020
  83. Ren W, Yang J, Deng S, et al. (2019) Face video deblurring using 3D facial priors. In: Proceedings of the IEEE International Conference on Computer Vision 2019-Octob:9387-9396. https://doi.org/10.1109/ICCV.2019.00948
  84. Jourabloo A, Liu X (2015) Pose-invariant 3D face alignment. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3694-3702
  85. Cheng S, Kotsia I, Pantic M, et al. (2018) 4DFAB: A Large Scale 4D Facial Expression Database for Biometric Applications. https://arxiv.org/pdf/1712.01443v2.pdf. Accessed 14 October 2020
  86. Liu F, Zhao Q, Liu X, Zeng D (2020) Joint Face Alignment and 3D Face Reconstruction with Application to Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 42:664-678. https://doi.org/10.1109/TPAMI.2018.2885995
  87. Ye Z, Yi R, Yu M, et al. (2020) 3D-CariGAN: An end-to-end solution to 3D caricature generation from face photos. 1-17. arXiv preprint arXiv:2003.06841
  88. Huo J, Li W, Shi Y, et al. (2017) WebCaricature: A benchmark for caricature recognition. arXiv preprint arXiv:1703.03230
  89. Lattas A, Moschoglou S, Gecer B, et al. (2020) AvatarMe: Realistically Renderable 3D Facial Reconstruction "In-the-Wild". 757-766. https://doi.org/10.1109/cvpr42600.2020.00084
  90. Cai H, Guo Y, Peng Z, Zhang J (2021) Landmark detection and 3D face reconstruction for caricature using a nonlinear parametric model. Graphical Models 115:101103. https://doi.org/10.1016/j.gmod.2021.101103
  91. Deng Y, Yang J, Chen D, et al. (2020) Disentangled and controllable face image generation via 3D imitative-contrastive learning. https://doi.org/10.1109/cvpr42600.2020.00520
  92. Li K, Yang J, Jiao N, et al. (2020) Adaptive 3D face reconstruction from a single image. 1-11. arXiv preprint arXiv:2007.03979
  93. Chaudhuri B, Vesdapunt N, Shapiro L, Wang B (2020) Personalized face modeling for improved face reconstruction and motion retargeting. In: Vedaldi A, Bischof H, Brox T, Frahm JM (eds) Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, 23-28 August 2020, Proceedings, Part V. Springer International Publishing, Cham, pp 142-160. https://doi.org/10.1007/978-3-030-58558-7_9
  94. Shang J, Shen T, Li S, et al. (2020) Self-supervised monocular 3D face reconstruction by occlusion-aware multi-view geometry consistency. In: Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, 23-28 August 2020, Proceedings, Part XV. Springer International Publishing, pp 53-70
  95. Cai X, Yu H, Lou J, et al. (2020) 3D facial geometry recovery from a depth view with attention guided generative adversarial network. arXiv preprint arXiv:2009.00938
  96. Xu S, Yang J, Chen D, et al. (2020) Deep 3D portrait from a single image. 7707-7717. https://doi.org/10.1109/cvpr42600.2020.00773
  97. Zhang J, Lin L, Zhu J, Hoi SCH (2021) Weakly-supervised multi-face 3D reconstruction. 1-9. arXiv preprint arXiv:2101.02000
  98. Köstinger M, Wohlhart P, Roth PM, Bischof H (2011) Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. Proceedings of the IEEE International Conference on Computer Vision Workshops. https://doi.org/10.1109/ICCVW.2011.6130513
  99. ICG - AFLW. https://www.tugraz.at/institute/icg/research/teambischof/lrs/downloads/afw/. Accessed 14 October 2020
  100. Tu X, Zhao J, Jiang Z, et al. (2019) 3D face reconstruction from a single image assisted by 2D face images in the wild. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2020.2993962
  101. Moschoglou S, Papaioannou A, Sagonas C, et al. (2017) AgeDB: The first manually collected, in-the-wild age database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 51-59
  102. Morphace. https://faces.dmi.unibas.ch/bfm/main.php?nav=1-1-0&id=details. Accessed 14 October 2020
  103. Savran A, Alyüz N, Dibeklioğlu H, et al. (2008) Bosphorus database for 3D face analysis. European Workshop on Biometrics and Identity Management. Springer, Berlin, Heidelberg, pp 47-56
  104. 3D Facial Expression Database - Binghamton University. http://www.cs.binghamton.edu/~lijun/Research/3DFE/3DFE_Analysis.html. Accessed 13 October 2020
  105. Center for Biometrics and Security Research. http://www.cbsr.ia.ac.cn/english/3DFace Databases.asp. Accessed 14 October 2020
  106. Yi D, Lei Z, Liao S, Li SZ (2014) Learning Face Representation from Scratch. arXiv preprint arXiv:1411.7923
  107. Celebrities in Frontal-Profile in the Wild. http://www.cfpw.io/. Accessed 14 October 2020
  108. Yang H, Zhu H, Wang Y, et al. (2020) FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction. 598-607
  109. FaceWarehouse. http://kunzhou.net/zjugaps/facewarehouse/. Accessed 13 October 2020
  110. Phillips PJ, Flynn PJ, Scruggs T, et al. (2005) Overview of the face recognition grand challenge. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol 1, pp 947-954
  111. Moreno A (2004) GavabDB: A 3D Face Database. In: Proceedings of the 2nd COST275 Workshop on Biometrics on the Internet, 2004, pp 75-80
  112. Le V, Brandt J, Lin Z, et al. (2012) Interactive facial feature localization. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7574 LNCS:679-692. https://doi.org/10.1007/978-3-642-33712-3_49
  113. IJB-A Dataset Request Form | NIST. https://www.nist.gov/itl/iad/image-group/ijb-dataset-request-form. Accessed 14 October 2020
  114. Min R, Kose N, Dugelay JL (2014) KinectFaceDB: A Kinect Database for Face Recognition. IEEE Trans Syst Man, Cybern Syst 44:1534-1548. https://doi.org/10.1109/TSMC.2014.2331215
  115. Belhumeur PN, Jacobs DW, Kriegman DJ, Kumar N (2011) Localizing parts of faces using a consensus of exemplars. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2011.5995602
  116. Bagdanov AD, Del Bimbo A, Masi I (2011) The Florence 2D/3D Hybrid Face Dataset. In: Proceedings of the 2011 Joint ACM Workshop on Human Gesture and Behavior Understanding - J-HGBU '11. ACM Press, New York, USA, p 79
  117. Notre Dame CVRL. https://cvrl.nd.edu/projects/data/#nd-2006-data-set. Accessed 13 October 2020
  118. Laboratory for Image and Video Engineering, The University of Texas at Austin. http://live.ece.utexas.edu/research/texas3dfr/. Accessed 14 October 2020
  119. Le HA, Kakadiaris IA (2017) UHDB31: A Dataset for Better Understanding Face Recognition Across Pose and Illumination Variation. In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). IEEE, pp 2555-2563
  120. Colombo A, Cusano C, Schettini R (2011) UMB-DB: A Database of Partially Occluded 3D Faces. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2113-2119
  121. Parkhi OM, Vedaldi A, Zisserman A (2015) Deep Face Recognition. pp 1-12
  122. Sanderson C (2002) The VidTIMIT database. (No. REP_WORK). IDIAP
  123. Chung JS, Nagrani A, Zisserman A (2018) VoxCeleb2: Deep Speaker Recognition. arXiv preprint arXiv:1806.05622
  124. YouTube Faces Database: Homepage. https://www.cs.tau.ac.il/~wolf/ytfaces/. Accessed 14 October 2020
  125. 300-VW | Computer Vision Online. https://computervisiononline.com/dataset/1105138793. Accessed 13 October 2020
  126. i·bug - Resources - 300 Faces In-the-Wild Challenge (300-W), ICCV 2013. https://ibug.doc.ic.ac.uk/resources/300-W/. Accessed 14 October 2020
  127. Vijayan V, Bowyer K, Flynn P (2011) 3D Twins and Expression Challenge. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2100-2105
  128. AI + X: Don't change careers, join AI - YouTube. http://www.youtube.com/watch?v=4Ai7wmUGFNA. Accessed 5 February 2021
  129. Cao C, Hou Q, Zhou K (2014) Displaced dynamic expression regression for real-time facial tracking and animation. In: ACM Transactions on Graphics. Association for Computing Machinery, pp 1-10
  130. Bouaziz S, Wang Y, Pauly M (2013) Online modeling for realtime facial animation. ACM Trans Graph 32:1-10. https://doi.org/10.1145/2461912.2461976
  131. Garrido P, Valgaerts L, Sarmadi H, et al. (2015) VDub: Modifying Face Video of Actors for Plausible Visual Alignment to a Dubbed Audio Track. Computer Graphics Forum 34:193-204. https://doi.org/10.1111/cgf.12552
  132. Thies J, Zollhöfer M, Stamminger M, et al. Face2Face: Real-time face capture and reenactment of RGB videos
  133. MIT Introduction to Deep Learning | 6.S191 - YouTube. https://www.youtube.com/watch?v=5tvmMX8r_OM. Accessed 8 February 2021
  134. Garrido P, Valgaerts L, Wu C, Theobalt C (2013) Reconstructing detailed dynamic face geometry from monocular video. ACM Trans Graph 32:1-10. https://doi.org/10.1145/2508363.2508380
  135. Viswanathan S, Heisters IES, Evangelista BP, et al. (2021) Systems and methods for generating augmented reality makeup effects. US Patent 10,885,697
  136. Nam H, Lee J, Park JI (2020) Interactive pixel-by-pixel AR lip makeup system using RGB camera. Journal of Broadcast Engineering 25(7):1042–51
  137. Siegl C, Lange V, Stamminger M, et al. FaceForge: Markerless non-rigid face multi-projection mapping
  138. Replace Faces in Video Using Still Image and Face Tools - After Effects Tutorials - YouTube. https://www.youtube.com/watch?v=x7T5jiUpUiE. Accessed 6 February 2021
  139. Antipov G, Baccouche M, Dugelay JL (2017) Face aging with conditional generative adversarial networks. In: IEEE International Conference on Image Processing (ICIP), pp 2089-2093
  140. Shi C, Zhang J, Yao Y, et al. (2020) CAN-GAN: Conditioned-attention normalized GAN for face age synthesis. Pattern Recognition Letters 138:520-526. https://doi.org/10.1016/j.patrec.2020.08.021
  141. Fang H, Deng W, Zhong Y, Hu J (2020) Triple-GAN: Progressive Face Aging with Triple Translation Loss. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 2020-June:3500-3509. https://doi.org/10.1109/CVPRW50498.2020.00410
  142. Huang Z, Chen S, Zhang J, Shan H (2020) PFA-GAN: Progressive Face Aging with Generative Adversarial Network. IEEE Transactions on Information Forensics and Security. https://doi.org/10.1109/TIFS.2020.3047753
  143. Liu S, Li D, Cao T, et al. (2020) GAN-based face attribute editing. IEEE Access 8:34854-34867. https://doi.org/10.1109/ACCESS.2020.2974043
  144. Yadav D, Kohli N, Vatsa M, et al. (2020) Age gap reducer-GAN for recognizing age-separated faces. In: 25th International Conference on Pattern Recognition (ICPR), pp 10090-10097
  145. Sharma N, Sharma R, Jindal N (2020) An improved technique for face age progression and enhanced super-resolution with generative adversarial networks. Wireless Personal Communications 114:2215-2233. https://doi.org/10.1007/s11277-020-07473-1
  146. Liu L, Yu H, Wang S, et al. (2021) Learning shape and texture progression for young child face aging. Signal Processing: Image Communication 93:116127. https://doi.org/10.1016/j.image.2020.116127
  147. Nirkin Y, Keller Y, Hassner T (2019) FSGAN: Subject agnostic face swapping and reenactment. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 7184-7193
  148. Tripathy S, Kannala J, Rahtu E (2020) ICface: Interpretable and controllable face reenactment using GANs. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 3385-3394
  149. Ha S, Kersner M, Kim B, et al. (2019) MarioNETte: Few-shot face reenactment preserving identity of unseen targets. arXiv 34:10893-10900
  150. Zhang J, Zeng X, Wang M, et al. (2020) FReeNet: Multi-identity face reenactment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5326-5335
  151. Zeng X, Pan Y, Wang M, et al. (2020) Realistic face reenactment via self-supervised disentangling of identity and pose. arXiv 34:12757-12764
  152. Ding X, Raziei Z, Larson EC, et al. (2020) Swapped face detection using deep learning and subjective assessment. EURASIP Journal on Information Security, pp 1-12
  153. Zukerman J, Paglia M, Sager C, et al. (2019) Video Manipulation and Face Replacement. US Patent 10,446,189
  154. Hoshen D (2020) MakeupBag: Disentangling Makeup Extraction and Application. arXiv preprint arXiv:2012.02157
  155. Li Y, Huang H, Yu J, et al. (2020) Cosmetic-aware makeup cleanser. arXiv preprint arXiv:2004.09147
  156. Horita D, Aizawa K (2020) SLGAN: Style- and Latent-Guided Generative Adversarial Network for Desirable Makeup Transfer and Removal. arXiv preprint arXiv:2009.07557
  157. Wu W, Zhang Y, Li C, et al. (2018) ReenactGAN: Learning to Reenact Faces via Boundary Transfer. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 603-619
  158. Nirkin Y, Wolf L, Keller Y, Hassner T (2020) DeepFake detection based on discrepancies between faces and their context. arXiv preprint arXiv:2008.12262
  159. Tolosana R, Vera-Rodriguez R, Fierrez J, et al. (2020) DeepFakes and beyond: A survey of face manipulation and fake detection. Information Fusion 64:131-148
  160. Shubham K, Venkatesh G, Sachdev R, et al. (2020) Learning a Deep Reinforcement Learning Policy over the Latent Space of a Pre-trained GAN for Semantic Age Manipulation. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp 1-8. IEEE
  161. Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196
  162. Pham QTM, Yang J, Shin J (2020) Semi-supervised FaceGAN for face-age progression and regression with synthesized paired images. Electronics 9:1-16. https://doi.org/10.3390/electronics9040603
  163. Zhu H, Huang Z, Shan H, Zhang J (2020) Look Globally, Age Locally: Face Aging with an Attention Mechanism. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, pp 1963-1967
  164. Wu S, Rupprecht C, Vedaldi A (2021) Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2021.3076536
  165. Heidekrueger PI, Juran S, Szpalski C, et al. (2017) The current preferred female lip ratio. J Cranio-Maxillofacial Surg 45:655-660. https://doi.org/10.1016/j.jcms.2017.01.038
  166. Baudoin J, Meuli JN, di Summa PG, et al. (2019) A comprehensive guide to upper lip aesthetic rejuvenation. J Cosmet Dermatol 18:444-450
  167. Garrido P, Zollhöfer M, Wu C, et al. (2016) Corrective 3D reconstruction of lips from monocular video. ACM Trans Graph 35:1-11. https://doi.org/10.1145/2980179.2982419
  168. Wu C, Bradley D, Garrido P, et al. (2016) Model-based teeth reconstruction. ACM Trans Graph 35(6):220-221. https://doi.org/10.1145/2980179.2980233
  169. Wen Q, Xu F, Lu M, Yong JH (2017) Real-time 3D eyelids tracking from semantic edges. ACM Trans Graph 36:1-11. https://doi.org/10.1145/3130800.3130837
  170. Wang C, Shi F, Xia S, Chai J (2016) Realtime 3D eye gaze animation using a single RGB camera. ACM Trans Graph 35:1-14. https://doi.org/10.1145/2897824.2925947
  171. Zhou X, Lin J, Jiang J, Chen S (2019) Learning a 3D gaze estimator with improved iTracker combined with bidirectional LSTM. In: Proceedings - IEEE International Conference on Multimedia and Expo. IEEE Computer Society, pp 850-855
  172. Li H, Hu L, Saito S (2020) 3D hair synthesis using volumetric variational autoencoders. ACM Transactions on Graphics (TOG) 37(6):1-12
  173. Ye Z, Li G, Yao B, Xian C (2020) HAO-CNN: Filament-aware hair reconstruction based on volumetric vector fields. Comput Animat Virtual Worlds 31:e1945. https://doi.org/10.1002/cav.1945
  174. He H, Li G, Ye Z, et al. (2019) Data-driven 3D human head reconstruction. Comput Graph 80:85-96. https://doi.org/10.1016/j.cag.2019.03.008

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
