Literature Review | Adversarial Example Attacks on Image Captioning Models

Foreword: Adversarial attacks on image captioning add perturbations to a normal input image to produce adversarial examples that cause an otherwise well-behaved captioning model to output target sentences or target keywords. The related work to date is roughly summarized below. This review was originally written on 29 August 2022.


Related Work

Shekhar et al. [1] pointed out that image captioning models do not capture the relationship between the two modalities well. The authors constructed the FOIL-COCO dataset by adding erroneous captions to MSCOCO images and ran experiments from three angles: caption classification, foil word detection, and foil word correction. The results confirmed the defects of image captioning models and laid the groundwork for subsequent attack work on image captioning. The paper uses a multimodal bidirectional LSTM (Bi-LSTM) model in its experiments.

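To make the dataset-construction idea concrete, here is a minimal sketch of how a "foil" caption could be produced by swapping one word in a correct caption for a related but wrong word. The confusion pairs and replacement strategy below are illustrative assumptions, not the exact procedure of [1].

```python
import random

# Illustrative noun confusion pairs; FOIL-COCO derives its pairs from
# MSCOCO supercategories, so treat these as placeholders.
CONFUSION_PAIRS = {"dog": "cat", "bicycle": "motorcycle", "apple": "orange"}

def make_foil_caption(caption):
    """Replace one target word in a correct caption with a related but wrong
    ("foil") word, yielding a caption that no longer matches the image."""
    words = caption.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in CONFUSION_PAIRS]
    if not candidates:
        return None  # nothing replaceable in this caption
    idx = random.choice(candidates)
    foil = CONFUSION_PAIRS[words[idx].lower()]
    words[idx] = foil
    return " ".join(words), foil

# Example: a correct MSCOCO-style caption becomes a foil caption.
print(make_foil_caption("a dog is sleeping on the sofa"))
# -> ('a cat is sleeping on the sofa', 'cat')
```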

Chen et al. [2] proposed the Show-and-Fool method to study the robustness of visual-language models against perturbations. By constructing adversarial examples, they mislead the model into generating a randomly chosen description or keywords. The attacked model is Show-and-Tell, and adversarial images are constructed for two scenarios: targeted captions and targeted keywords.

See https://github.com/huanzhang12/ImageCaptioningAttack for the source code.
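The targeted-caption scenario in [2] is posed as an optimization over an additive image perturbation. Below is a heavily simplified PyTorch sketch of that idea, assuming a hypothetical `caption_log_prob(image, target_ids)` helper that returns the log-probability the captioning model assigns to the target sentence; the actual Show-and-Fool objective and constraints differ in detail.

```python
import torch

def targeted_caption_attack(image, target_ids, caption_log_prob,
                            steps=1000, lr=1e-2, c=1.0):
    """Optimize an additive perturbation delta so the captioning model assigns
    high probability to the target caption while keeping delta small
    (a simplified stand-in for the Show-and-Fool targeted-caption objective)."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (image + delta).clamp(0.0, 1.0)     # keep a valid image
        nll = -caption_log_prob(adv, target_ids)  # push toward the target caption
        loss = c * nll + (delta ** 2).sum()       # L2 penalty on the perturbation
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (image + delta).clamp(0.0, 1.0).detach()
```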

Ji et al. [5] constructed adversarial examples that remove target words from normal image captions while preserving the quality of the residual description. The designed loss function is as follows:
[Equation image: overall attack loss combining $L_{rem}$, $L_{acc}$, $L_{fil}$, and a perturbation-magnitude term]

where $L_{rem}$ ensures that the target words occur with sufficiently low frequency in the generated caption, $L_{acc}$ guarantees the quality of the residual description, and $L_{fil}$ ensures that the caption generated after adding the perturbation does not introduce unwanted words related to the target object. The last term controls the magnitude of the generated perturbation so that the adversarial example remains visually close to the original.


The authors also propose a metric for attack quality that requires the attack success rate to be high while the residual description remains as good as the original. It is defined as follows, where $AR$ is computed from BLEU, CIDEr, and other caption-evaluation metrics, and $SR$ is the attack success rate; an attack is considered successful only if none of the target words appear in the generated description.

[Equation image: attack-quality metric combining $AR$ and $SR$]
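As a concrete reading of this success criterion, the sketch below computes the attack success rate $SR$ over a batch of captions generated from adversarial images; the data structures are assumptions made for illustration.

```python
def attack_success_rate(adv_captions, target_words_per_image):
    """An attack on one image counts as successful only if none of that
    image's target words appear in the caption generated for its
    adversarial version."""
    successes = 0
    for caption, targets in zip(adv_captions, target_words_per_image):
        tokens = set(caption.lower().split())
        if not any(t.lower() in tokens for t in targets):
            successes += 1
    return successes / len(adv_captions)

# Example usage: the second attack fails because "food" still appears.
captions = ["a man riding a horse", "a plate of food on a table"]
targets = [["dog"], ["food"]]
print(attack_success_rate(captions, targets))  # 0.5
```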

Zhang et al. [7] designed a loss function in the complex domain (shown below) and added perturbations via word embeddings to generate adversarial examples. The semantic vector of the adversarial example is used as the imaginary part of the loss, and the semantic vector of the original image as the real part. The designed loss function is as follows:

[Equation images: complex-domain loss with the original image's semantic vector as the real part, the adversarial example's semantic vector as the imaginary part, and a regularization term $L_b$]

where the $L_b$ term ensures that the adversarial example stays as similar as possible to the original image. The paper attacks the Show-and-Tell model and successfully mounts white-box and black-box attacks at both the word level and the sentence level, outperforming the Show-and-Fool method [2]; the transferability of the generated adversarial examples is also verified.

Figure 2: Schematic diagram of the complex-domain adversarial perturbation
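To make the complex-domain formulation concrete, the minimal sketch below packs the original image's semantic vector into the real part and the adversarial image's semantic vector into the imaginary part of a complex tensor, then computes a loss on that representation together with an $L_b$-style regularizer. The magnitude-based semantic term is a placeholder assumption, not the exact objective of [7].

```python
import torch

def complex_domain_loss(sem_orig, sem_adv, delta, beta=1.0):
    """Illustrative complex-domain objective: real part = semantic vector of the
    original image, imaginary part = semantic vector of the adversarial image;
    the L_b-style term keeps the perturbation, and thus the adversarial image,
    visually close to the original."""
    z = torch.complex(sem_orig, sem_adv)   # combine both views in one complex vector
    semantic_term = torch.abs(z).mean()    # placeholder for the paper's semantic objective
    l_b = (delta ** 2).mean()              # visual-similarity regularizer (L_b-style)
    return semantic_term + beta * l_b

# Example with random 512-d semantic vectors and a small image perturbation.
s_orig, s_adv = torch.randn(512), torch.randn(512)
delta = 0.01 * torch.randn(3, 224, 224)
print(complex_domain_loss(s_orig, s_adv, delta))
```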

Chen et al. [10] took a different approach: they were the first to target generation efficiency, designing the NICGSlowDown method to make the model generate sentences that are as long as possible and thereby degrade generation efficiency.
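The core intuition, delaying the end-of-sequence token so that decoding runs for as many steps as possible, can be sketched as a simple loss term like the one below; the decoder interface and this exact penalty are assumptions for illustration, not the actual NICGSlowDown objective.

```python
import torch
import torch.nn.functional as F

def eos_delay_loss(step_logits, eos_id):
    """Encourage longer captions by penalizing the probability assigned to the
    end-of-sequence token at every decoding step (one simple way to attack
    generation efficiency).

    step_logits: tensor of shape (num_steps, vocab_size) produced while
    decoding the adversarial image."""
    probs = F.softmax(step_logits, dim=-1)
    eos_prob = probs[:, eos_id]   # P(EOS) at each decoding step
    return eos_prob.sum()         # minimize this to push EOS later and later

# Example: random logits for a 5-step decode over a 100-word vocabulary.
logits = torch.randn(5, 100)
print(eos_delay_loss(logits, eos_id=2))
```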


Summary and Outlook

To sum up, with respect to the accuracy and relevance of generation, the difficulty of aligning semantic information across modalities makes the output quality of cross-modal generative models hard to guarantee; at the same time, the particular nature of the generation task has brought generation efficiency into focus. Existing work therefore mainly targets two aspects: the relevance of the generated captions and the efficiency of generation. Research on the security of multimodal tasks more broadly is also under way, for example on hallucination in cross-modal models and on text steganography in cross-modal models.

Postscript: Owing to a change in my research direction, I will not follow up on later work in this field, and the references cited in this review are only updated through 2022.


References

  1. Ravi Shekhar, et al. FOIL it! Find One mismatch between Image and Language caption. ACL, 2017.
  2. Hongge Chen, et al. Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning. ACL, 2018.
  3. Xiaojun Xu, et al. Fooling Vision and Language Models Despite Localization and Attention Mechanism. CVPR, 2018.
  4. Yan Xu, et al. Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables. CVPR, 2019.
  5. Jiayi Ji, et al. Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal. ACM MM, 2020.
  6. Malhar Jere, et al. Scratch that! An Evolution-based Adversarial Attack against Neural Networks. arXiv, 2020.
  7. Shaofeng Zhang, et al. Fooled by Imagination: Adversarial Attack to Image Captioning via Perturbation in Complex Domain. ICME, 2020.
  8. Akshay Chaturvedi and Utpal Garain. Mimic and Fool: A Task-Agnostic Adversarial Attack. TNNLS, 2021.
  9. Nayyer Aafaq, et al. Controlled Caption Generation for Images Through Adversarial Attacks. arXiv, 2021.
  10. Simin Chen, et al. NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models. CVPR, 2022.
  11. Mirazul Haque, et al. CorrGAN: Input Transformation Technique Against Natural Corruptions. CVPR Workshops, 2022.
  12. Hanjie Wu, et al. Learning Transferable Perturbations for Image Captioning. TOMCCAP, 2022.


Reprinted from: blog.csdn.net/qq_36332660/article/details/132277893