CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks

https://github.com/dvl-tum/ciagan

ABSTRACT

The unprecedented increase in the use of computer vision technology in society has coincided with growing concerns about data privacy. In many real-world scenarios, such as people tracking or action recognition, it is important to be able to process data while carefully protecting people's identities. We propose and develop CIAGAN, a conditional generative adversarial network-based image and video anonymization model. Our model is able to de-identify faces and bodies while generating high-quality images and videos that can be used for any computer vision task, such as detection or tracking. Unlike previous approaches, we have full control over the de-identification (anonymization) process, ensuring both anonymity and diversity. We compare our method with several baselines and obtain state-of-the-art results. To facilitate further research, we provide the code and model at https://github.com/dvl-tum/ciagan.

1. Introduction

The widespread use of computer vision technology in society implies the automatic processing of large-scale visual data that contains personal information. While we are eager to leverage technology for applications such as home surveillance and video conferencing, we are not willing to do so at the cost of our personal privacy. In fact, data privacy is gaining more and more attention, and entities such as the European Union have passed laws such as the General Data Protection Regulation (GDPR) [1] to guarantee it. For computer vision researchers, creating high-quality datasets that include people has become very challenging, because everyone in the dataset needs to consent to the use of their image data. Recently, the DukeMTMC dataset [10] was taken offline for privacy reasons.
Our key observation is that many computer vision tasks, such as person detection, multi-person tracking, or action recognition, do not need to recognize the people in a video; they only need to detect them. Classical anonymization techniques, such as face blurring, alter images significantly and lead to a large drop in detection performance.
We propose a model to anonymize (or de-identify) images and videos by removing person-identifying features, while retaining the features necessary for face and body detectors to work. Importantly, the images should still appear realistic to a human observer, but the people in them should not be identifiable. Our proposed method can be used to anonymize computer vision datasets while preserving the information necessary for tasks such as detection, recognition, or tracking. We exploit the generative capabilities of Conditional Generative Adversarial Networks (CGANs) [25, 15] to generate realistic-looking anonymized images and videos. In existing GAN-based methods, the image generation process is usually steered by a random noise vector to produce different outputs. Such a random process is not suitable for anonymization, because we need to guarantee that the identity has actually changed from input to output. To address this issue, we propose a new identity-controlled discriminator. Our CIAGAN model satisfies the following important properties that an anonymization system should have:

  1. Anonymization: The generated output must hide the identity of the person in the original image. Essentially, we are generating a new fake identity from the input image.
  2. Control: The fake identity of the generated image is controlled by a control vector, so we have full control over the true-fake identity mapping.
  3. New identities: The generated images must contain only new identities that do not exist in the training set.
  4. Realistic: Output images must look realistic in order to be used by state-of-the-art detection and recognition systems.
  5. Temporal consistency: For tasks such as person tracking or action recognition, it is necessary to ensure temporal consistency and pose preservation in videos.

By satisfying the above five properties, we guarantee the anonymization of images and videos and the protection of data privacy. At the same time, our method ensures that detectors can still operate on the anonymized data, as our experiments demonstrate.
Our contribution in this area is fourfold:

  • We propose a general framework applicable to person anonymization in images and video streams.
  • We demonstrate that images anonymized with our method can be used by existing detection and recognition systems.
  • We demonstrate state-of-the-art results on several datasets while qualitatively demonstrating diversity and control over the generated images.
  • We perform a comprehensive ablation study showing the importance of each building block in our model.

2. Related work

Face generation. Generating realistic faces has been an active research area since the emergence of Generative Adversarial Networks [8, 28], with rapid progress in recent years [20, 16, 17]. The current state-of-the-art model [17] is able to generate high-resolution face images by progressively training a large convolutional neural network. Diversity in appearance, ethnicity, hair and eye color is achieved by adaptive instance normalization [13]. Despite the impressive quality of these methods, they offer no control over the pose of the generated faces, since generation is conditioned on random noise rather than on information from an original face. Consequently, blending the generated face with the rest of the body is challenging and remains an open research problem, which limits the usability of these methods for anonymization applications.
Image-to-image and video-to-video translation. The Pix2Pix network [15] and its unsupervised variant [41] show impressive results on cross-domain image translation (e.g., from winter to summer). However, it is unclear whether they are suitable for making small but important changes to images within the same domain, such as faces or bodies. Closely related is recent work on ensuring temporal consistency in videos for the task of face translation [39]. To ensure temporal consistency, [39] conditions the generator on previous real and generated frames as well as on the estimated optical flow between frames. While this work demonstrates smooth temporal consistency, the generated faces are often very similar to the original identities, making it unsuitable for the anonymization task.
Face anonymization. Until recently, face anonymization has been achieved by pixelating, blurring, or masking faces. Alternatively, [32] proposes a segmentation-based approach. Since these operations are based on heuristics rather than learning, there is no guarantee that they are optimal for the de-identification task. Crucially, these methods often render faces undetectable and thus unusable in standard computer vision pipelines. We advocate the use of machine learning to achieve anonymization while preserving the features necessary for computer vision tasks such as detection and tracking. This has been studied in [29, 34, 14, 35, 7]. However, all of these works have important limitations. In general, the faces generated by [14] can still be recognized by humans. [29] has a similar problem; moreover, the method has no control over the generation process, and each identity is mapped to the same fake identity. The work of [34] focuses on altering facial landmarks, which can lead to unnatural results, and the method has no explicit control over the generated appearance. The results of [35] are visually appealing, but the method is not computationally efficient due to its optimization procedure for face alignment. Furthermore, since the method is based on parametric facial models, it is designed to deal only with faces and cannot be directly extended to other domains, such as full human bodies.
The state-of-the-art method is [7], where the authors demonstrate good qualitative results and unprecedented de-identification rates. However, while the generated images can fool recognition systems, humans can in general still identify the presented faces. More critically, except for [14] and [7], none of these methods attempt to process videos. [14] showed limited video experiments, but temporal consistency is not well preserved. [7] shows very good temporal consistency, but, as in the image case, some identities are clearly not anonymized and are easily spotted by the human eye. Furthermore, these methods lack the control and diversity needed to produce different anonymized outputs for the same input face. Our CIAGAN model provides a general framework for the anonymization of images and videos. We can directly control the de-identification process by providing the labels of the identities whose characteristics we want to generate and by mixing the styles of different identities. This not only produces high-quality images, but also yields higher variability between anonymized versions of the same identity (see Figure 1).
Figure 1: Given an image of a face, our network anonymizes the face according to the desired identity. In the figure, the variability of the generated faces controlled by the given labels can be seen. In each triplet, the first image is the real image, while the other two images are different anonymous versions of the real image.

3. CIAGAN

In this section, we detail our method for anonymizing images and videos. Our proposed Conditional Identity Anonymization Generative Adversarial Network (CIAGAN) leverages the power of generative adversarial networks to produce realistic images. To control the identity generation process and guarantee anonymity, we propose a new identity discriminator to train CIAGAN. In the remainder of this section, we refer specifically to face anonymization, although the method is directly applicable to full bodies.

3.1 Method overview

We present a complete schematic of CIAGAN in Figure 2. The main components of our method are as follows:
Pose preservation and temporal consistency. We propose to use a landmark-based representation of the input face (or body). This has two advantages: it ensures pose preservation, which is especially useful for tasks such as tracking, and it provides a simple but effective way to maintain temporal consistency when processing video.
Conditional GAN. We leverage the generative power of GANs to produce realistic results. It is important that standard detection and tracking systems can be applied to the generated images without loss of accuracy; naturally, realistic faces are easily detected. We achieve pose preservation by conditioning on the landmark representation. We train the conditional GAN in an adversarial, game-theoretic fashion, where the discriminator judges the realism of the images produced by the generator.
Identity guidance discriminator. We propose a new module that controls which identity characteristics the generator injects when creating the new image. This discriminator and the generator play a cooperative game, working together towards the common goal of generating realistic, anonymized images. We now describe the three modules of the method in more detail.
Figure 2: Our CIAGAN model takes as input an image, its landmarks, a masked face, and a desired identity. The generator is an encoder-decoder model, where the encoder embeds image information into a low-dimensional space. The identities, given as one-hot labels, are encoded through a transposed convolutional neural network and fed at the bottleneck of the generator. The decoder then decodes the combination of source image and identity information into a generated image. The generator plays an adversarial game against the discriminator in a standard GAN setting. Finally, we introduce an identity discriminator network whose purpose is to provide the generator with a guiding signal about the desired identity of the generated faces.

3.2 Pose preservation and temporal consistency

Several de-identification methods [29, 7] take RGB images of faces as input to be anonymized. In the generated images, there is usually some leakage of facial information, which is not surprising. Therefore, while these methods produce high-quality images, the generated facial identities are not fully anonymized and can often be identified by people.
Landmark image. To ensure that our generated faces are not tied to the original identities, we abstract the faces. More precisely, we use facial landmark images. This has two advantages: (i) a landmark image is a sparse representation of the face that carries little identity information, avoiding identity leakage; (ii) the generator is conditioned on the face shape, allowing us to preserve the pose of the input in the output. This is especially important since we intend to use the generated images and videos as input to computer vision algorithms: in many vision applications, such as tracking, methods often exploit facial or body pose, so the anonymization should not change the pose of the face or body. To hide as much identity information as possible while still maintaining the pose, instead of using all 68 landmarks [18] we only use the face contour, the mouth, and the bridge of the nose (see Figure 2). This gives the network the freedom to choose some facial features, such as eye distance or eye shape, while preserving expressions that depend on the mouth region, such as smiling or laughing; the global placement of the face is determined by the position of the nose. Landmarks are represented as a binary image, which is fed as input to the generator.
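As a rough sketch, the landmark subset described above (face contour, nose bridge, mouth) could be extracted and rendered with dlib as follows. The 68-point indexing follows the standard iBUG convention; the predictor file name and the drawing style are assumptions, not the authors' released code.

```python
# Sketch: select face contour (0-16), nose bridge (27-30), mouth (48-67) landmarks
# and render them as a binary image. Assumes the standard dlib 68-point predictor.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

KEEP = list(range(0, 17)) + list(range(27, 31)) + list(range(48, 68))

def landmark_image(img_bgr):
    """Render the selected landmarks of the first detected face as a binary image."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    canvas = np.zeros(gray.shape, dtype=np.uint8)
    for i in KEEP:
        p = shape.part(i)
        cv2.circle(canvas, (p.x, p.y), 1, 255, -1)  # draw each kept landmark
    return canvas
```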
Masked background image. Our goal is to generate only the face region of the image and embed it into the original background. This allows our algorithm to focus its learning capacity on face generation (rather than background generation), while ensuring that there are no background changes that could interfere with detection or tracking algorithms. To this end, we provide the generator with a masked background image in addition to the landmark image. The masked background image still contains the forehead region of the head. Given this information, the generator can learn to match the skin appearance of the generated face to the skin color of the forehead, resulting in better overall visual quality. In the case of multiple faces in the same image, we detect each face and apply our anonymization framework sequentially.
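A minimal sketch of how such a masked background could be produced from the contour landmarks; the exact masking used by the authors may differ. Because the kept landmarks only cover the jaw, nose bridge, and mouth, their convex hull leaves the forehead visible, as described above.

```python
# Sketch: black out the face interior using the convex hull of the kept landmarks,
# keeping the background and the forehead region intact.
import cv2
import numpy as np

def masked_background(img_bgr, landmarks_xy):
    """landmarks_xy: (N, 2) array of the selected landmark coordinates."""
    mask = np.ones(img_bgr.shape[:2], dtype=np.uint8)
    hull = cv2.convexHull(landmarks_xy.astype(np.int32))
    cv2.fillConvexPoly(mask, hull, 0)           # zero-out the face interior
    return img_bgr * mask[:, :, None]           # keep background + forehead
```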
Our pipeline can also be used for full-body anonymization by simply replacing the mask image with a segmentation mask representing the body contour. In this case, we do not use body joints as a surrogate for the landmark image, since a person's silhouette is a sufficient pose prior.
Temporal consistency. To process video, any anonymization pipeline must ensure temporal consistency of the generated images over the sequence. State-of-the-art video translation models [39] ensure temporal consistency by using a discriminator conditioned on the optical flow between corresponding frames. The optical flow is computed by an external neural network [6], which makes the framework both complex and computationally expensive. In our work, we get temporal consistency essentially for free thanks to the nature of the input representation: the landmarks of each frame are smoothed over adjacent frames using spline interpolation. We therefore use the same framework for images and videos, the only difference being a computationally cheap interpolation at inference time.
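A simple sketch of the smoothing step, treating each landmark coordinate as a 1-D signal over time and fitting a smoothing spline; the smoothing factor is an illustrative assumption.

```python
# Sketch: smooth landmark trajectories over a video with splines.
import numpy as np
from scipy.interpolate import UnivariateSpline

def smooth_landmark_tracks(tracks, s=5.0):
    """tracks: (T, N, 2) array of N landmarks over T frames; returns a smoothed copy."""
    T = tracks.shape[0]
    t = np.arange(T)
    out = tracks.astype(np.float64).copy()
    for n in range(tracks.shape[1]):
        for d in range(2):                      # x and y coordinates separately
            spline = UnivariateSpline(t, tracks[:, n, d], s=s)
            out[:, n, d] = spline(t)
    return out
```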

3.3 Conditional generative adversarial networks

GAN framework. Simply put, GANs combine two neural networks: a generator G, whose goal is to generate samples that look real, and a discriminator D, whose goal is to distinguish real samples from generated ones. The networks are trained in an adversarial manner: D is trained to maximize the probability of assigning the correct label to training and generated samples, while G is trained to minimize the probability of D predicting the correct label for the generated samples. In other words, D learns to separate real samples from generated samples, and G learns to fool D into classifying generated samples as real. GAN training is notoriously difficult and requires many tricks [23, 9, 4]. In this work, we train CIAGAN with the LSGAN loss [23]. The idea of using a least squares loss for GAN training is simple yet powerful: the least squares loss moves fake samples towards the decision boundary, because it also penalizes correctly classified samples that are still far from the real data. This is in contrast to the cross-entropy loss, which primarily penalizes misclassified samples. Thanks to this property, LSGAN generates samples that are closer to the real data.
Under the LSGAN setting, the objective function of the discriminator is defined as follows:
$$\min_D \; \frac{1}{2}\,\mathbb{E}_{x\sim p_{\text{data}}(x)}\big[(D(x)-b)^2\big] \;+\; \frac{1}{2}\,\mathbb{E}_{z\sim p_z(z)}\big[(D(G(z))-a)^2\big]$$
where a and b are the labels of fake and real data.
The loss of the generator is defined as:
$$\min_G \; \frac{1}{2}\,\mathbb{E}_{z\sim p_z(z)}\big[(D(G(z))-b)^2\big]$$
LSGAN can be replaced by any other common loss function used for GAN training without loss of generality [9, 4].
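A minimal PyTorch sketch of these LSGAN objectives, assuming the common choice a = 0 for fake and b = 1 for real labels; this is not the authors' released training code.

```python
# Sketch: least-squares GAN losses for discriminator and generator outputs.
import torch
import torch.nn.functional as F

def d_loss_lsgan(d_real, d_fake, a=0.0, b=1.0):
    # D pushes outputs on real samples towards b and outputs on fake samples towards a
    return 0.5 * F.mse_loss(d_real, torch.full_like(d_real, b)) + \
           0.5 * F.mse_loss(d_fake, torch.full_like(d_fake, a))

def g_loss_lsgan(d_fake, b=1.0):
    # G pushes D's outputs on fake samples towards the real label b
    return 0.5 * F.mse_loss(d_fake, torch.full_like(d_fake, b))
```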
Conditional GANs. In a classic GAN training setup, a random noise vector is given as input to the generator in order to provide variability in the generated images. In our case, to maintain pose and temporal consistency, the generated faces must be aligned with the landmarks of the input image, and the generated face must blend seamlessly with the background. For this, we use the conditional GAN framework [15], where we condition the generator on the landmark image and the masked background image, as described in Section 3.2. The generator uses an encoder-decoder architecture [22]: the encoder converts the landmark and mask images into a low-dimensional representation (the bottleneck) and combines it with the identity representation, while the decoder upsamples the combined representation to produce the anonymized RGB image.
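The wiring described above could look roughly as follows in PyTorch. The channel sizes, depths, and the 4-channel input (1-channel landmark image plus 3-channel masked background) are illustrative assumptions, not the released CIAGAN architecture, and U-Net skip connections are omitted for brevity.

```python
# Sketch: encoder-decoder generator with an identity embedding injected at the bottleneck.
import torch
import torch.nn as nn

class IdentityEncoder(nn.Module):
    """Maps a one-hot identity label to a spatial embedding via transposed convolutions."""
    def __init__(self, num_ids, ident_ch=64):
        super().__init__()
        self.ident_ch = ident_ch
        self.fc = nn.Linear(num_ids, 4 * 4 * ident_ch)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(ident_ch, ident_ch, 4, stride=2, padding=1),  # 4x4 -> 8x8
            nn.ReLU(inplace=True),
        )

    def forward(self, one_hot):
        x = self.fc(one_hot).view(-1, self.ident_ch, 4, 4)
        return self.up(x)                                   # (B, ident_ch, 8, 8)

class Generator(nn.Module):
    def __init__(self, num_ids, in_ch=4):                   # landmark (1) + masked background (3)
        super().__init__()
        self.enc = nn.Sequential(                            # 128x128 -> 8x8 bottleneck
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 4, 2, 1), nn.ReLU(inplace=True),
        )
        self.ident = IdentityEncoder(num_ids)
        self.dec = nn.Sequential(                            # 8x8 -> 128x128 RGB output
            nn.ConvTranspose2d(256 + 64, 256, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, landmark_and_bg, id_one_hot):
        z = self.enc(landmark_and_bg)                        # (B, 256, 8, 8) for 128x128 input
        z = torch.cat([z, self.ident(id_one_hot)], dim=1)    # inject identity at the bottleneck
        return self.dec(z)
```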

3.4 Identity guidance

With the two modules explained above, our model learns to generate realistic-looking faces that maintain the pose of the original image. However, if all the variability in image generation were provided by the landmark input, the network would quickly overfit on the training set, effectively performing image reconstruction. It would then generate faces very similar to those in the training dataset, defeating the goal of anonymization. To address this issue, we introduce a new identity-guidance discriminator. More precisely, for each given real image, we randomly select the desired identity of its corresponding generated image. This identity, represented as a one-hot vector, is fed to a transposed convolutional neural network that produces a parameterized version of the identity, which is injected into the generator's bottleneck. In this way, the generator learns to generate faces with some of the characteristics of the desired identity while maintaining the pose of the real image. In other words, the resulting image is a combination of the identity underlying the landmarks and the desired identity. The identity of the generated image must not be identical to any real identity, so that the generated face cannot be recognized.
The identity discriminator is designed as a Siamese neural network, pre-trained with the proxy-NCA loss [26]. Pre-training is done on real images, where the discriminator is trained to cluster features of images belonging to the same identity. During GAN training, we fine-tune the Siamese network using a contrastive loss [2]; in this fine-tuning step, we let the Siamese network bring together the identity representations of fake and real images. The identity discriminator and the generator are trained jointly in a collaborative manner: the goal of the identity discriminator is to provide a guiding signal to the generator so that it creates images close to the representation of the desired identity.
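A minimal sketch of the contrastive identity-guidance term, assuming a Siamese embedding network S and Euclidean distances; the margin and weighting are illustrative assumptions rather than the authors' exact formulation.

```python
# Sketch: contrastive loss between embeddings; for identity guidance, the generated
# face is pulled towards a real sample of the desired identity (same_identity = 1).
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same_identity, margin=2.0):
    d = F.pairwise_distance(emb_a, emb_b)                          # (B,)
    pos = same_identity * d.pow(2)                                 # pull same-identity pairs together
    neg = (1 - same_identity) * torch.clamp(margin - d, min=0).pow(2)  # push others apart
    return 0.5 * (pos + neg).mean()

# Generator-side guidance term (cooperative game), assuming S is the Siamese network:
# id_loss = contrastive_loss(S(generated), S(real_of_desired_id), torch.ones(batch_size))
```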
Example: multi-object tracking. Of particular importance is the control over the generation of fake identities. We need to be able to keep the same real-to-fake identity mapping across a sequence captured by one camera, e.g., for multi-object tracking, while at the same time changing the mapping for different cameras to avoid long-term tracking and potential data misuse. To do this, when a person moves from one camera to another, we assign a new control vector, which in turn gives the person a new fake identity. This is a simple and powerful way to enable multi-object tracking within the frames captured by a single camera without the undesirable consequences of long-term tracking.
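A toy sketch of this per-camera control: each (camera, person) pair is deterministically mapped to a fake identity label, so the mapping is consistent within one camera but changes across cameras. The class name and random assignment are purely illustrative.

```python
# Sketch: per-camera mapping from real person IDs to fake identity labels.
import random

class ControlVectorAssigner:
    def __init__(self, num_fake_ids, seed=0):
        self.num_fake_ids = num_fake_ids
        self.rng = random.Random(seed)
        self.mapping = {}  # (camera_id, person_id) -> fake identity label

    def fake_id(self, camera_id, person_id):
        key = (camera_id, person_id)
        if key not in self.mapping:
            self.mapping[key] = self.rng.randrange(self.num_fake_ids)
        return self.mapping[key]
```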

4. Experiment

In this section, we compare CIAGAN with several classical and learning-based methods commonly used for identity anonymization. Our method achieves state-of-the-art qualitative and quantitative results on diverse image and video datasets. We also present a comprehensive set of ablation experiments to demonstrate the effect of our design choices. We first introduce the datasets, evaluation metrics, and baselines used in this section.
Datasets. We conduct experiments on three public datasets:

  • CelebA [21] This dataset contains 202,599 face images of 10,177 unique identities. We use the aligned version, where each image is centered at a point between the person's eyes, then padded and resized to a resolution of 178 × 218 while maintaining the original facial proportions. Each identity has up to 35 photos. We construct facial landmarks for each face using HOG [5].
  • MOTS [38] Our method can also be applied to other domains, such as full-body anonymization. Instead of facial landmarks, we use body segmentation masks. The dataset contains 3425 videos from 1595 different people.
  • Labeled Faces in the Wild (LFW) [12] This dataset consists of 6000 pairs of images divided into 10 splits, where half of the pairs show the same identity and the other half show different identities.

Baselines. We compare our method with standard anonymization techniques as well as learning-based methods.

  • Simple Anonymization methods We compare our method with pixelated, blurred, and masked faces.
  • Image Translation methods We use the popular pix2pix [15] and CycleGAN [41] methods. We use the official code given by the authors and present the results in the supplementary material.
  • Face Replacement methods We compare the results of de-identification with the state-of-the-art results given by [7].

4.1 Implementation Details

We use the Dlib-ml library [18] to generate landmarks and masks. We train our network on 128 × 128 images and use a U-Net [31] encoder-decoder architecture as the generator. The one-hot identity vectors are parameterized by a transposed convolutional neural network consisting of a fully connected layer followed by multiple transposed convolutional layers. The landmark-feature and identity branches are concatenated in the bottleneck of the generator. For the discriminator, we use a standard convolutional neural network with the same architecture as the identity-guidance network. We train our model for 60 epochs using the ADAM optimizer [19] with a learning rate of 1e−5, and set the β hyperparameters β1, β2 to 0.5 and 0.9. On a single GPU, the total training time of the model is one day. The remaining implementation details and the network architecture are given in the supplementary material.
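As a sketch, the stated optimizer settings (ADAM, learning rate 1e−5, β1 = 0.5, β2 = 0.9) could be set up as follows; the variable names and the use of separate optimizers per network are assumptions.

```python
# Sketch: optimizer setup with the hyperparameters stated above.
import torch

# generator, discriminator, identity_discriminator are assumed nn.Module instances
opt_G   = torch.optim.Adam(generator.parameters(), lr=1e-5, betas=(0.5, 0.9))
opt_D   = torch.optim.Adam(discriminator.parameters(), lr=1e-5, betas=(0.5, 0.9))
opt_Did = torch.optim.Adam(identity_discriminator.parameters(), lr=1e-5, betas=(0.5, 0.9))
```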

4.2 Evaluation metrics

We evaluate all models with face detection and re-identification metrics. We perform detection using HOG [5] and the SSH detector [27]. To evaluate detector performance, we report the percentage of detected faces. For re-identification, we train a Siamese neural network with Proxy-NCA [26]. In addition, we use a pretrained FaceNet model [33] based on the Inception-ResNet backbone [36]. We use the standard Recall@1 metric for re-identification: it measures the proportion of samples whose nearest neighbour belongs to the same class. The metric ranges from 0 to 100, with 0 meaning no recognition at all and 100 meaning perfect recognition. Note that on a balanced dataset, a random classifier yields (on average) a Recall@1 of 1/|C|, where |C| is the number of classes. Finally, we quantitatively evaluate the visual quality of the generated images using the Fréchet Inception Distance (FID) [11], a metric that compares the statistics of generated samples to those of real samples. The lower the FID, the more similar the generated samples are to the real ones.
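A small sketch of how Recall@1 could be computed from embeddings and labels, using brute-force nearest-neighbour search with Euclidean distance (an assumption; any metric learning distance could be substituted).

```python
# Sketch: Recall@1 — fraction of samples whose nearest neighbour (excluding itself)
# shares the same label, reported in percent.
import numpy as np

def recall_at_1(embeddings, labels):
    """embeddings: (N, D) array, labels: (N,) array."""
    d = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    nn = d.argmin(axis=1)
    return 100.0 * (labels[nn] == labels).mean()
```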

4.3 Ablation study

In this section, we conduct ablation experiments on our method to demonstrate the value of our design choices. In Table 1, we show several variants of the model. Siamese denotes our full model with a Siamese identity discriminator and landmarks as input. Classification refers to replacing the Siamese identity discriminator with a classification network. We can see that the detection results drop by more than 35 percentage points (pp). We also show what happens if the input is not the landmarks but the entire face image: in this case, the detection rate drops by 1.6 pp and the FID score increases, indicating that the faces are harder to detect and the visual quality is lower.
Table 1: Ablation study of our model. The first row shows the results of our full model, the second row the model with a classification network instead of the Siamese identity-guidance network, and the third row the model whose generator receives full face images instead of landmarks.

4.4 Quantitative results

Detection and recognition. The first experiment evaluates two important properties that an anonymization method should have: a high detection rate and a low recognition rate. That is, we do not want a trained system to be able to find the identities of the newly generated faces, but at the same time we still want face detectors to achieve a high detection rate.
In Table 2, we compare the detection and recognition results of our method with those of other methods on the CelebA dataset [21]. The classical HOG [5] and the deep-learning-based SSH [27] detectors achieve almost 100% detection rates on our anonymized images. Blurring methods have much lower detection rates, while on pixelated images faces cannot be detected at all.
Recognition performance drops from over 70% on the original dataset to 1 - 1.5% on our anonymized images. Images generated by CIAGAN were almost unrecognizable by both recognition systems. Note that the pixelation approach achieves a recall of 0.3%, which is equivalent to random guessing, but at the cost of removing everything in the image, making both detection and recognition impossible. In a setting where we want to go further and use computer vision algorithms on anonymized data, neither pixelation nor blurring is an option.
Table 2: Detection and recognition results of commonly used pretrained models. Lower (↓) results mean better anonymization; higher (↑) results mean better detection.

Recognition based on landmarks. Since the input to our generator is a landmark image rather than the actual image we want to anonymize, one could argue that recognition methods operating on image pixels are easily fooled by our method, as shown in Table 2. What happens if CIAGAN is attacked by a recognition method trained only on landmarks? Will it still provide anonymity? We perform this experiment by training the same recognition method as before [26], but with landmarks as the only input. Using only landmarks, the recognizer reaches up to 30.5% Recall@1, compared to 70.7% when using full images. However, when the same recognizer is applied to landmarks extracted from our anonymized faces, its performance drops to 1.9%. Even though raw landmarks are given as input to the generative model, CIAGAN only uses them as prior information and fuses them with the information coming from the identity embedding network.
Are we just swapping faces? In Section 3.3, we introduced a new identity-guidance network that steers the generator to produce images with characteristics similar to those of a given identity. One could argue that by doing so the generator only learns face swapping, replacing the face of the chosen identity according to the landmarks of the source image. We show that this is not the case by evaluating the recognition rate of our generated images against the training set of real images, setting the label of each generated image to the label of the desired identity. If the generator only learned face swapping, the recognizer would be able to correctly recognize all generated images. However, neither FaceNet [33] nor our model trained with Proxy-NCA [26] achieves a recognition rate higher than random guessing. Furthermore, in Figure 3 we present a qualitative experiment where the first image of each row is the source image, the first image of each column is an image randomly selected from the desired identity, and the remaining images are generated. We see that the generated images share high-level characteristics of the given identity (such as race or gender), but are very different from the real images of those identities.

4.4.1 De-identification comparison with the state of the art

In this section, we compare the de-identification (anonymization) ability of our model with the state of the art [7]. We follow their evaluation protocol on the LFW dataset [12]. The dataset consists of 10 different splits, each containing 600 pairs. A pair is positive if its two elements share the same identity, and negative otherwise. In each split, the first 300 pairs are positive and the remaining 300 pairs are negative. As in [7], we anonymize the second image of each pair.
We use the FaceNet [33] recognition model, pre-trained on two public datasets: VGGFace2 [3] and CASIA-WebFace [40]. The main evaluation metric is the true acceptance rate: the proportion of true positive pairs that are accepted at a false acceptance rate of at most 0.001. We show the results in Table 3. The network evaluated on real faces scores close to 0.99, i.e., almost perfect recognition. [7] achieved impressive anonymization performance, scoring below 0.04 with the networks trained on both datasets. CIAGAN improves on this result, reducing the recognition rate to 0.034 with the network trained on [3] and to 0.019 with the network trained on [40], thus improving anonymization. On average, CIAGAN improves the de-identification rate by 10.5% on the first dataset and 45.7% on the second, while maintaining a high detection rate of 99.13%. The average true acceptance rate of 2.65% shows that even a near-perfect recognition system is essentially unable to find the real identities in our CIAGAN-processed data, demonstrating the strength of our method for image anonymization.
Table 3: Comparison with the state of the art on the LFW dataset. A lower (↓) recognition rate means better anonymization.
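For reference, a sketch of how the true acceptance rate at a fixed false acceptance rate could be computed from pair distances, assuming smaller distance means same identity; this is an illustration of the metric, not the evaluation code used for Table 3.

```python
# Sketch: true acceptance rate at the threshold where the false acceptance rate
# on different-identity pairs is 0.001.
import numpy as np

def tar_at_far(pos_dists, neg_dists, far=1e-3):
    """pos_dists / neg_dists: distances of same-identity / different-identity pairs."""
    thresh = np.quantile(neg_dists, far)        # accept pairs with distance below thresh
    return (pos_dists < thresh).mean()
```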

4.5 Results of Visual Quality

As can be seen from Table 1, our method achieves an FID score of 2.08. Simple baselines such as blurring, as well as image translation methods, achieve significantly higher (worse) FID scores. The FID comparison with the baselines and further qualitative results can be found in the supplementary material.
We show a range of qualitative results. In Figure 1, we show the diversity of generated images when the control vector of the identity discriminator is varied. We see that the generated images have high-level features of the desired identity (such as eye shape, race, or gender) while generating realistic images.
In Fig. 4, we qualitatively compare our results with those of [7]. Not only does our method produce images that are less similar to the source images, but, by varying the control vector, our network is also able to provide more diverse images than [7], where the authors gradually vary their control parameter λ (one can still recognize N. Cage).
Figure 4: Qualitative comparison with [7]. The images in the first column are the source images. In the first row we show images generated by the framework of [7], while in the second row we show images generated by CIAGAN.

In Fig. 5, we demonstrate the temporal consistency of our method and compare the results with those of [7]. We see that pose is preserved in all cases, yielding excellent temporal consistency. At the same time, we see that the CIAGAN version working with landmarks produces better images than the version trained on full faces.
Figure 5: Qualitative comparison with [7] in terms of temporal consistency. From left to right: original frames; faces generated by our model using faces as input; faces generated by our model using landmarks as input; faces generated by [7].

Finally, in Figure 6, we show an experiment on full-body anonymization. The first image in each row is the source image, while the other images are generated anonymized versions. In each case, the generated image maintains the pose of the corresponding source image, while clothing, colors, and other body characteristics change. To the best of our knowledge, this is the first time face and body de-identification has been successfully performed within the same framework.
Figure 6: Full-body anonymization results of our framework on the MOTS dataset.

5. Conclusions and future work

Data privacy in images and videos is a serious concern. As computer vision researchers, our goal is to provide technical solutions to this problem. In this paper, we proposed a framework for face and body anonymization in images and videos. Our CIAGAN model is based on conditional generative adversarial networks, and the anonymization is guided by an identity signal provided by a Siamese network. We showed that our method outperforms the state of the art in de-identification while exhibiting substantial diversity in the generated images.
A shortcoming of all current de-identification methods [30, 34, 7] is that the original face must first be detected before it can be anonymized; any face that is not detected cannot be anonymized. Therefore, these methods cannot yet be deployed in systems where anonymity must be guaranteed. Our method suffers from a similar problem, since it relies on landmark detection. As future work, we plan to work on full-image anonymization and to further remove the need for landmark detection in order to handle extreme poses.

Origin blog.csdn.net/weixin_45184581/article/details/127243336