[Paper Notes] Towards Evaluating the Robustness of Neural Networks

Defensive distillation was previously proposed as a defense that was claimed to give the target network model strong robustness, reducing the success rate of existing attacks from 95% to 0.5%. The authors of this paper propose new attack methods and successfully break distilled networks. (C&W)

Article introduction:

  • Shows that defensive distillation does not significantly improve the robustness of the model
  • Introduces three new attack algorithms (tailored to the L0, L2, and L∞ distance metrics) that achieve a 100% attack success rate on both distilled and undistilled neural networks; the L2 objective is sketched after this list
  • The attacks in this paper are generally more effective than previous attacks
  • The adversarial examples in this paper transfer from an undefended network model to a distilled ("secure") network model
  • Datasets: MNIST, CIFAR-10, ImageNet
  • The paper mainly studies targeted attacks
  • Since defensive distillation does not actually prevent adversarial examples, the authors suggest the reason may be that adversarial examples arise from the locally linear behavior of neural networks
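
For reference, a sketch of the targeted L2 attack from the paper (as I understand it): Z(·) denotes the logits, t the target class, κ a confidence margin, and c a constant chosen by binary search.

```latex
\min_{\delta}\; \|\delta\|_2^2 + c \cdot f(x + \delta)
\quad\text{where}\quad
f(x') = \max\Bigl(\max_{i \neq t} Z(x')_i - Z(x')_t,\; -\kappa\Bigr)
```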

Distillation network

In essence, knowledge distillation is a form of model compression.
We have a complex, powerful Teacher Model (abbreviated Net-T) and a simpler, weaker Student Model (abbreviated Net-S). Net-S learns from Net-T by fitting both the soft targets (the class-probability distribution output by the teacher's softmax, typically at a raised temperature) and the hard targets (the ground-truth labels) at the same time. In the end Net-S is used as the application model, while Net-T is not deployed.

So~ knowledge distillation is a kind of "pseudo" compression, or a compression method in a generalized sense.

Why learn the soft target?
For example: take a handwritten-digit recognition task over the digits 0 to 9. A 7 is written very similarly to a 1, but very differently from a 5. The hard target only tells us that this picture is a 7, but the logits also tell us that this picture is very likely a 7, slightly resembles a 1, and looks hardly anything like the other digits. That extra information is exactly the knowledge we hope Net-S will learn! ヾ(๑╹◡╹)ノ"
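
A minimal numerical illustration of that difference (the soft-target numbers below are made up for this 0-9 example, not taken from the paper):

```python
import numpy as np

# Hard target for the digit 7: a one-hot vector, no inter-class information.
hard_target = np.zeros(10)
hard_target[7] = 1.0

# A plausible soft target from a teacher network: the image is almost certainly
# a 7, slightly resembles a 1, and looks nothing like the other digits.
soft_target = np.array([0.001, 0.08, 0.002, 0.002, 0.002,
                        0.003, 0.002, 0.90, 0.004, 0.004])

print(hard_target)        # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
print(soft_target.sum())  # 1.0 -- still a probability distribution
```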

What is this softmax-T?

softmax-T: q_i = exp(z_i / T) / Σ_j exp(z_j / T)

The above formula is just softmax with one change: the T in softmax-T stands for Temperature, a parameter that the logits are divided by before the softmax operation. This parameter has the following properties:

  • If T is set to 1, the formula is the ordinary softmax, and the probability of each category is output according to its logit;

  • If T is close to 0, the largest value is pushed toward 1 and the other values toward 0, similar to one-hot encoding;

  • If T is larger, the output distribution is smoother; this acts as a smoothing function and plays the role of retaining the similarity information between classes.

What is this parameter for? The reason is this:

If we use the original softmax, we know its behavior: after the natural-exponential operation, the category with the largest value looks "even larger" in probability. For example, in 0-9 handwritten-digit recognition, if the pre-softmax predictions for the three classes 7, 1, and 5 are [7, 2, 1], then after softmax they become roughly [0.99, 0.007, 0.002], which is practically indistinguishable from [1, 0, 0]. That defeats our original purpose of learning from the logits (๑ó﹏ò๑)...
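
A minimal sketch of softmax with temperature, using the [7, 2, 1] logits from the example above (the printed probabilities are approximate):

```python
import numpy as np

def softmax_t(logits, T=1.0):
    """Softmax with temperature: divide the logits by T before exponentiating."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [7.0, 2.0, 1.0]       # pre-softmax scores for the classes 7, 1, 5
print(softmax_t(logits, T=1))    # ~[0.991, 0.007, 0.002] -> nearly one-hot
print(softmax_t(logits, T=20))   # ~[0.397, 0.309, 0.294] -> much smoother
print(softmax_t(logits, T=0.1))  # ~[1.000, 0.000, 0.000] -> essentially one-hot
```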

Steps

  1. Based on the original (teacher) models (N1, N2, ..., Nt), design a simple network N0.

  2. Collect training data for the simple model. This data can be the labeled data used to train the original networks, or additional unlabeled data.

  3. Feed the samples collected in step 2 into the original models (N1, N2, ..., Nt), with the temperature parameter T in each original model's softmax layer set to a large value such as T=20. For each sample, each original model then outputs a classification probability vector (whose maximum entry is that model's prediction for the sample), so the t original models give t probability vectors. Take the mean of these t probability vectors as the final probability output for the current sample, record it as soft_target, and save it. (A code sketch covering steps 3-6 follows this list.)

  4. Label fusion: define the label of the data collected in step 2 as hard_target; for labeled data the hard_target is its one-hot label, and for unlabeled data the hard_target weight is 0. Target = a*hard_target + b*soft_target (a+b=1). This Target is ultimately used as the training label for the simplified model. The parameters a and b control the weight of the label fusion; a recommended empirical value is (a=0.1, b=0.9).

  5. Set the temperature parameter of the simplified model's softmax layer to the same temperature that the original complex models used when generating the soft_target, and train the simplified network model with the conventional training procedure.

  6. At deployment time, reset the softmax temperature parameter of the simplified model to 1, i.e. use the original softmax.
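
A minimal PyTorch-style sketch of steps 3-6, assuming a single teacher for simplicity; the `teacher`, `student`, `optimizer`, `x`, and `y_onehot` names are hypothetical placeholders, and this is an illustration of the recipe above, not the paper's code:

```python
import torch
import torch.nn.functional as F

T = 20.0          # temperature used both to generate soft targets and to train the student
a, b = 0.1, 0.9   # weights for fusing hard and soft targets (a + b = 1)

def soft_targets(teacher, x, T):
    """Step 3: teacher probabilities at high temperature (averaged if several teachers)."""
    with torch.no_grad():
        return F.softmax(teacher(x) / T, dim=1)

def distill_step(student, teacher, optimizer, x, y_onehot):
    """Steps 4-5: fuse hard and soft targets, train the student at the same temperature T."""
    target = a * y_onehot + b * soft_targets(teacher, x, T)   # Target = a*hard + b*soft
    log_p = F.log_softmax(student(x) / T, dim=1)              # student softmax also uses T
    loss = -(target * log_p).sum(dim=1).mean()                # cross-entropy with soft labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Step 6: at deployment time the student uses the plain softmax (T = 1), i.e.
# predictions are simply student(x).argmax(dim=1).
```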

Source: blog.csdn.net/weixin_45019830/article/details/108305219