Paper Study 1: Adversarial Example Review (2018 Edition)

Paper: Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey

Part of the content draws on Paper Study 2: Adversarial Example Review.
The write-up by 机器之心 (Synced) is also a good summary of this survey: "Review paper: 12 attack methods and 15 defense methods against adversarial attacks".

1. Main content

The paper acknowledges the contributions of deep learning in fields such as computer vision, but notes that deep learning models are highly vulnerable to adversarial attacks. It presents a comprehensive survey of adversarial attacks: attack methods for image classification and other tasks, attacks carried out under real-world conditions, and a summary of how such attacks can be defended against.

2. Definition of terms

  • Adversarial example/image: a modified version of a clean image, intentionally perturbed (e.g., by adding noise) to confuse/fool a machine learning model (e.g., a deep neural network)
  • Adversarial perturbation: the noise added to a clean image to turn it into an adversarial example
  • Adversarial training: training the model on adversarial examples/images in addition to the original clean images
  • Adversary: the agent that generates adversarial examples; sometimes the adversarial example itself is also called the adversary
  • Black-box attacks: adversarial examples are generated without knowledge of the target model. In some cases the adversary is assumed to have limited knowledge of the model (e.g., its training procedure and/or its architecture), but never its parameters. In other cases, an attack that uses any information about the target model is called a "semi-black-box" attack.
  • Detector: a mechanism that (only) detects whether an image is an adversarial example
  • Fooling ratio/rate: the percentage of images for which a trained model changes its originally predicted class after the images are perturbed
  • One-shot/one-step methods: generate the adversarial perturbation with a single computation, e.g., one evaluation of the gradient of the model's loss. The counterpart is iterative methods, which repeat the computation several times to obtain a single perturbation and are therefore usually much more computationally expensive
  • Quasi-imperceptible: the perturbation is so small that it cannot be perceived by humans
  • Rectifier: corrects an adversarial example so that the target model's prediction on it matches its prediction on the original clean example
  • Targeted attacks: make the model misclassify the adversarial example as a specific class. The counterpart is the untargeted attack, whose goal is simpler: it only seeks to make the model's prediction wrong, without aiming at any particular class
  • Threat model: the types of potential attack considered by a method, e.g., black-box attacks
  • Transferability: the property that an adversarial example remains effective against models other than the one used to generate it
  • Universal perturbation: a single perturbation able to fool a model on any image with high probability. Universality refers to the perturbation being image-agnostic (computed without knowledge of the specific image), which is distinct from the transferability mentioned above
  • White-box attacks: assume complete knowledge of the target model, including its parameter values, architecture, training method, and in some cases its training data

3. Adversarial attacks

Main content: a review of adversarial attack methods for fooling deep neural networks, primarily in a "laboratory setting", organized roughly chronologically, with technical details of the most popular methods and some representative techniques from emerging directions in the field. It is divided into two parts: first, attacks on deep neural networks performing the most common computer vision task, i.e., classification/recognition; second, attacks on deep learning methods used for other tasks.

3.1 Attacks on classification

Box-constrained L-BFGS

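The figure originally shown here contained the formulation of the attack. For reference (in notation similar to the survey's, with clean image $I_c$, perturbation $\rho$, target label $\ell$, classifier $C$, and training loss $J$), Szegedy et al. look for the smallest perturbation that changes the predicted label,

$$\min_{\rho} \|\rho\|_2 \quad \text{s.t.} \quad C(I_c + \rho) = \ell, \qquad I_c + \rho \in [0,1]^m,$$

and, since this problem is hard to solve exactly, relax it to

$$\min_{\rho}\ c\,\|\rho\|_2 + J(I_c + \rho, \ell) \quad \text{s.t.} \quad I_c + \rho \in [0,1]^m,$$

which is solved with a box-constrained L-BFGS optimizer.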

Fast Gradient Sign Method (FGSM)

This direction covers the work of three teams: Goodfellow et al., Kurakin et al., and Miyato et al. Collectively, these methods are considered "one-step" or "one-shot" methods.

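The figure originally shown here contained the FGSM formula. In the same notation, Goodfellow et al. compute the perturbation in a single gradient step,

$$\rho = \epsilon\,\mathrm{sign}\!\left(\nabla J(\theta, I_c, \ell)\right),$$

where $\theta$ are the model parameters and $\epsilon$ controls the perturbation size. A minimal PyTorch sketch of this one-step attack is given below (the model, input batch, and labels are assumed to be supplied by the reader; this is an illustration, not the authors' code):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One-step FGSM: move each pixel by eps in the direction of the sign
    of the loss gradient.

    x: input images, shape (N, C, H, W), values in [0, 1]
    y: ground-truth labels, shape (N,)
    eps: maximum per-pixel perturbation (L-infinity budget)
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + eps * grad.sign()      # single gradient-sign step
    return x_adv.clamp(0, 1).detach()  # keep pixels in the valid range
```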

Basic & Least-Likely-Class Iterative Methods (BIM / ILCM)

The "one-step" method perturbs the image by taking a large step (i.e., one step of gradient ascent) in the direction of increasing the classifier's loss. An intuitive extension of this idea is to iteratively take multiple small steps in adjusting the direction after each step, equivalent to an ∞ version of projected gradient descent (PGD), a standard convex optimization method

Jacobian-based Saliency Map Attack (JSMA)

This method creates an adversarial example by limiting the ℓ0 norm of the perturbation (most other attacks use the ℓ∞ or ℓ2 norm), i.e., it only needs to modify a few pixels of the image, rather than perturbing the entire image, to fool the classifier. The algorithm modifies a clean image one pixel at a time and monitors the effect of each change on the resulting classification.
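A minimal sketch of the one-pixel-at-a-time idea is given below. This is a simplified greedy ℓ0-style attack for illustration only, not Papernot et al.'s exact saliency-map computation; the model, a single image tensor, and its label are assumed to be supplied by the reader:

```python
import torch
import torch.nn.functional as F

def greedy_pixel_attack(model, x, y, max_pixels=20, step=1.0):
    """Greedy L0-style sketch: repeatedly change the single input coordinate
    with the largest loss-gradient magnitude, until the prediction flips.

    x: one image, shape (C, H, W), values in [0, 1]
    y: its ground-truth label, a 0-dim long tensor
    """
    x_adv = x.clone().detach()
    for _ in range(max_pixels):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv.unsqueeze(0)), y.unsqueeze(0))
        grad = torch.autograd.grad(loss, x_adv)[0]
        idx = grad.abs().flatten().argmax()          # most influential coordinate
        with torch.no_grad():
            x_adv = x_adv.detach()
            flat = x_adv.view(-1)
            flat[idx] = (flat[idx] + step * grad.view(-1)[idx].sign()).clamp(0, 1)
        if model(x_adv.unsqueeze(0)).argmax(dim=1).item() != y.item():
            break                                    # classifier fooled
    return x_adv
```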

Carlini and Wagner Attacks (C&W)

Carlini and Wagner proposed three adversarial attack methods in the wake of defensive distillation. Their work shows that defensive distillation of the target network is almost completely ineffective against these attacks. It also shows that adversarial examples generated on an unsecured (undistilled) network transfer well to a secured (distilled) network, which makes the computed perturbations usable for black-box attacks.
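For reference, the ℓ2 variant of the C&W attack is commonly stated as the following optimization problem (a standard formulation; $Z(\cdot)$ denotes the pre-softmax logits, $t$ the target class, and $\kappa$ a confidence margin):

$$\min_{\delta}\ \|\delta\|_2^2 + c \cdot f(x + \delta), \qquad f(x') = \max\!\left(\max_{i \neq t} Z(x')_i - Z(x')_t,\ -\kappa\right),$$

subject to $x + \delta \in [0,1]^n$. The box constraint is handled with a change of variables, and the constant $c$ is chosen by binary search.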
In addition, the paper mentions other attack methods such as One Pixel Attack, UPSET and ANGRI, Houdini, and Adversarial Transformation Networks (ATNs), along with a section on miscellaneous attacks.

The attack methods covered above are summarized in a comparison table in the original paper.

3.2 Attacks beyond classification

Attacks on autoencoders and generative models, attacks on deep reinforcement learning, and attacks on semantic segmentation and object detection.

4. Attacks in the real world

Examples include attacks on facial attributes, cell-phone camera attacks, road-sign attacks, and attacks on 3D objects (the emphasis here is on physical, real-world objects).

5. The Existence of Universal Perturbations

Moosavi-Dezfooli et al. originally argued that universal adversarial perturbations exploit geometric correlations between the decision boundaries induced by a classifier, and that their existence is owed in part to a subspace containing the normals of the decision boundaries. They later confirmed the existence of shared directions (common across data points) along which the decision boundary of a classifier can be highly positively curved. In addition, Fawzi et al. and Tramèr et al. also relate the curvature of the decision boundary near data points to the vulnerability of classifiers to attacks. This line of work also motivated later GAN-based approaches to generating adversarial perturbations.
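Formally, Moosavi-Dezfooli et al. look for a single perturbation $\rho$ with a bounded norm that fools the classifier $\hat{k}(\cdot)$ on most images drawn from the data distribution $\mu$:

$$\mathbb{P}_{x \sim \mu}\!\left(\hat{k}(x + \rho) \neq \hat{k}(x)\right) \geq 1 - \delta \quad \text{s.t.} \quad \|\rho\|_p \leq \xi,$$

where $\xi$ bounds the size of the perturbation and $1 - \delta$ is the desired fooling rate over the distribution.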

6. Adversarial defenses

The current defense methods can be roughly divided into the following three categories:

| Defense category | Description | Methodology |
| --- | --- | --- |
| Modified training/input | Use modified training data during learning, or modified inputs during testing | Resist perturbations by manipulating the input data or the training process |
| Modified networks | Modify the network itself, e.g., by adding more layers or sub-networks, or by changing the loss or activation function | Address the causes of network vulnerability, e.g., by adding penalty terms to the loss function |
| Network add-on | Attach an external model as an add-on network when classifying unseen examples | Add e.g. "pre-input" layers that detect perturbations and rectify the input, or use GAN-based ideas |

Summary: the first category does not deal directly with the learned model, whereas the other two categories are concerned with the neural network itself. These methods can be further divided into two types: (a) complete defense and (b) detection only. The goal of complete defense methods is to make the network classify adversarial examples as the correct class. Detection-only methods, on the other hand, merely raise an alarm on adversarial examples so that any further processing of them can be refused.

These two types can be described as follows:

| Category | Description |
| --- | --- |
| Complete defense | Improves the network's robustness so that it classifies adversarial examples correctly |
| Detection only | Detects potentially adversarial inputs and rejects them before any further processing |

7. Summary

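The figure originally shown here contained the min-max objective described below. In its standard saddle-point form, adversarial training solves

$$\min_{\theta}\ \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\ \max_{\|\rho\|_\infty \leq \epsilon} L(\theta, x + \rho, y)\ \right],$$

where $\theta$ are the network parameters, $\mathcal{D}$ the data distribution, $L$ the training loss, and $\rho$ the perturbation constrained to an $\ell_\infty$ ball of radius $\epsilon$.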
The inner max problem simply means that the added perturbation should confuse the network as much as possible. The outer min is the usual minimization objective for training the neural network: with the perturbation fixed, the network is trained to minimize the loss on the (perturbed) training data, thereby improving the robustness of the model.
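A minimal PyTorch sketch of this idea is shown below, using a single FGSM step to approximate the inner maximization (a simplified illustration under the assumption that a model, optimizer, and data loader already exist; in practice stronger inner solvers such as multi-step PGD are common):

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8 / 255, device="cpu"):
    """One epoch of adversarial training: approximate the inner max with a
    single FGSM step, then take an outer minimization step on the perturbed batch."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)

        # Inner maximization (approximate): one gradient-sign step on the loss.
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()

        # Outer minimization: update the parameters on the adversarial batch.
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```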


Source: https://blog.csdn.net/weixin_45845039/article/details/127616829