Adversarial Attack Type I: Cheat Classifiers by Significant Changes

Reading Record

 

Abstract

  Although deep neural networks have achieved great success, adversarial attacks can cheat well-trained classifiers with small perturbations. In this paper, we propose another type of adversarial attack that can fool classifiers through significant changes. For example, we can significantly change a face, yet a well-trained neural network still identifies the adversarial and the original example as the same person. Statistically, existing adversarial attacks increase the Type II error, while the proposed attack targets the Type I error; they are therefore named Type II and Type I adversarial attacks, respectively. The two types of attacks are equally important but essentially different, which is explained intuitively and evaluated numerically. To implement the proposed attack, a supervised variational autoencoder is designed, and the classifier is then attacked by updating the latent variables with gradient information. In addition, with pre-trained generative models, the Type I attack in the latent space is also investigated. Experimental results show that the generated Type I adversarial examples are practical and effective on large-scale image datasets. Most of these examples can pass detectors designed to defend against Type II attacks, and the strengthening strategy is effective only against the specific attack type, both of which imply that the underlying causes of Type I and Type II attacks are different.
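To make the gradient-based latent update concrete, below is a minimal PyTorch sketch of a latent-space Type I attack. The generator `G` (e.g., a VAE decoder), classifier `f`, latent code `z0`, label `y0`, and the loss weighting are illustrative assumptions, not the paper's exact objective: the idea is to push the decoded image far from the original while keeping the classifier's prediction unchanged.

```python
# Minimal sketch of a latent-space Type I attack (assumed setup, not the
# paper's exact formulation). G: pre-trained generator/decoder, f: classifier,
# z0: latent code of the original image, y0: original label tensor.
import torch
import torch.nn.functional as F

def type1_attack_latent(G, f, z0, y0, steps=200, lr=0.05, lam=1.0):
    """Update a latent code so the decoded image changes significantly
    while the classifier still predicts the original label y0."""
    x0 = G(z0).detach()                          # original reconstruction
    z = z0.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x = G(z)
        # Encourage a large change in image space ...
        change = -F.mse_loss(x, x0)
        # ... while keeping the classifier's prediction fixed at y0.
        keep = F.cross_entropy(f(x), y0)
        loss = change + lam * keep
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G(z).detach()
```

The weight `lam` trades off how much the image is allowed to change against how strongly the original prediction is preserved; in practice it would need tuning per model and dataset.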
