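Contents

Introduction to FGSM

Adversarial Example
How FGSM Works

Code Implementation

1. Import the required libraries
2. Load the MobileNetV2 model
3. Preprocess the image
4. Feed the image to the model and get the most probable class
5. Compute the gradient
6. Visualize the noise to be added
7. Define a function to display images
8. Add the noise and classify the image again
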
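Introduction to FGSM

Adversarial Example

An adversarial example is a specially crafted input designed to confuse a neural network, causing the model to misclassify it. To the human eye such an input is virtually indistinguishable from the original, yet the network fails to recognize the image's content. FGSM (Fast Gradient Sign Method) is a white-box attack whose goal is to cause a misclassification.
Attacks can be classified in many ways. By how much the attacker knows about the target, they are divided into black-box, white-box, and gray-box attacks:

Black-box attack: the attacker knows nothing about the target model's internal structure, trained parameters, or defenses (if any are in place), and can interact with the model only through its outputs.
White-box attack: the opposite of the black-box setting; the attacker has full knowledge of the model. Most current attack algorithms are white-box attacks.
Gray-box attack: between the two; the attacker knows only part of the model (for example, only the model's output probabilities, or the architecture but not the parameters).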

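How FGSM Works

[Figure: a panda image plus a small perturbation yields an image the model classifies as a gibbon]
Here, starting from an image of a panda, the attacker adds a small perturbation to the original image, which causes the model to label the image as a gibbon with high confidence.
FGSM works by using the gradients of the neural network to create an adversarial example. For an input image, the method uses the gradient of the loss with respect to the input image to create a new image that maximizes the loss. This new image is called the adversarial image. It can be summarized by the following expression:
$$adv\_x = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x J(\theta, x, y)\big)$$
where adv_x is the adversarial image, x is the original input image, y is the original input label, ε is a multiplier that keeps the perturbation small, θ denotes the model parameters, and J is the loss.
Here the gradient is taken with respect to the input image, because the goal is to create an image that maximizes the loss. This is done by finding how much each pixel in the image contributes to the loss value and adding a perturbation accordingly (the chain rule makes it easy to compute each input pixel's contribution). Moreover, because the model is no longer being trained, the gradient is not taken with respect to the trainable variables (the model parameters), so the model parameters stay fixed. The sole aim is to make an already trained model misclassify.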
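
One way to see why the perturbation uses the sign of the gradient rather than the gradient itself is a first-order argument. The sketch below assumes the perturbation, written η here purely for illustration, is limited to ‖η‖∞ ≤ ε, and that the loss J is approximately linear around x:

```latex
\begin{aligned}
J(\theta, x + \eta, y) &\approx J(\theta, x, y) + \eta^{\top} \nabla_x J(\theta, x, y), \\
\max_{\|\eta\|_\infty \le \epsilon} \eta^{\top} \nabla_x J(\theta, x, y)
  &= \epsilon \, \big\|\nabla_x J(\theta, x, y)\big\|_1,
  \quad \text{attained at } \eta = \epsilon \cdot \operatorname{sign}\big(\nabla_x J(\theta, x, y)\big).
\end{aligned}
```

Under these assumptions, moving every pixel by the full budget ε in the direction of its gradient's sign gives the largest first-order increase in the loss.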
The model used in this article is MobileNetV2, pretrained on ImageNet.
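
Before applying this to MobileNetV2, the update can be tried on a model small enough to differentiate by hand. The following is a minimal sketch in plain Python, assuming a logistic-regression classifier with made-up weights (w, b, x, and all function names are hypothetical and not part of the original code). For this model the gradient of the cross-entropy loss with respect to the input is (p - y) · w.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    """Binary cross-entropy for one example with label y in {0, 1}."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the loss w.r.t. the input x (not the weights): (p - y) * w."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: x + eps * sign(gradient of the loss w.r.t. x)."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical fixed ("pretrained") parameters and one input with true label 1.
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.1, 0.4], 1

x_adv = fgsm(w, b, x, y, eps=0.3)
print("clean p(class 1):      ", round(predict(w, b, x), 3))
print("adversarial p(class 1):", round(predict(w, b, x_adv), 3))
```

The weights never change; only the input moves, by at most eps per coordinate, and the predicted probability of the true class drops below 0.5.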

Code Implementation

1. Import the required libraries

import tensorflow as tf
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.rcParams['figure.figsize'] = (8, 8)
mpl.rcParams['axes.grid'] = False

2. Load the MobileNetV2 model

pretrained_model = tf.keras.applications.MobileNetV2(include_top=True,
                                                     weights='imagenet')
pretrained_model.trainable = False

For more on transfer learning, see the article "Transfer learning with tf.keras.applications in TensorFlow 2.0".

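3. Preprocess the image

def preprocess(image):
    # Scale 8-bit pixel values to [0, 1]. Note that Keras's
    # mobilenet_v2.preprocess_input scales to [-1, 1] instead.
    image = tf.cast(image, tf.float32)
    image = image/255
    # Resize to the 224x224 input size MobileNetV2 expects.
    image = tf.image.resize(image, (224, 224))
    # Add a leading batch dimension.
    image = image[None, ...]
    return image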

image_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
image_raw = tf.io.read_file(image_path)
image = tf.image.decode_image(image_raw)

image = preprocess(image)
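
The preprocessing step above scales pixels to [0, 1], whereas `tf.keras.applications.mobilenet_v2.preprocess_input` maps them to [-1, 1]. As a rough sketch of how the two ranges relate, the plain-Python helpers below (hypothetical names, operating on single pixel values rather than tensors) show both scalings and the conversion back to [0, 1] that plotting needs:

```python
def scale_unit(pixel):
    """Map an 8-bit pixel value in [0, 255] to [0, 1], as image/255 does."""
    return pixel / 255.0

def scale_symmetric(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1] (x / 127.5 - 1)."""
    return pixel / 127.5 - 1.0

def symmetric_to_unit(value):
    """Convert a value in [-1, 1] back to [0, 1], e.g. for plotting."""
    return value * 0.5 + 0.5

for p in (0, 128, 255):
    print(p, scale_unit(p), scale_symmetric(p), symmetric_to_unit(scale_symmetric(p)))
```

Whichever range is chosen, the later clipping and display steps have to use the same one.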

4. Feed the image to the model and get the most probable class

# Helper function to extract labels from probability vector
def get_imagenet_label(probs):
    return tf.keras.applications.mobilenet_v2.decode_predictions(probs, top=1)[0][0]

image_probs = pretrained_model.predict(image)
plt.figure()
plt.imshow(image[0])
_, image_class, class_confidence = get_imagenet_label(image_probs)
plt.title('{} : {:.2f}% Confidence'.format(image_class, class_confidence*100))
plt.show()

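[Figure: the input image with its predicted class and confidence]
As shown, MobileNetV2 correctly identifies the image as a Labrador retriever.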
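
Conceptually, asking `decode_predictions` for `top=1` amounts to taking the argmax of the probability vector and looking up its label. The sketch below illustrates that idea in plain Python with a hypothetical four-class label table and made-up probabilities (the real ImageNet output has 1000 classes, and the Keras helper returns a WordNet ID rather than an index as the first element):

```python
def top1(probs, labels):
    """Return (index, label, probability) of the most likely class."""
    index = max(range(len(probs)), key=lambda i: probs[i])
    return index, labels[index], probs[index]

# Hypothetical labels and probabilities, for illustration only.
labels = ["tabby", "Labrador_retriever", "golden_retriever", "beagle"]
probs = [0.05, 0.70, 0.20, 0.05]

index, label, confidence = top1(probs, labels)
print("{} : {:.2f}% Confidence".format(label, confidence * 100))
```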

5. Compute the gradient

loss_object = tf.keras.losses.CategoricalCrossentropy()

def create_adversarial_pattern(input_image, input_label):
    with tf.GradientTape() as tape:
        tape.watch(input_image)
        prediction = pretrained_model(input_image)
        loss = loss_object(input_label, prediction)

    # Get the gradients of the loss w.r.t. the input image.
    gradient = tape.gradient(loss, input_image)
    # Get the sign of the gradients to create the perturbation
    signed_grad = tf.sign(gradient)
    return signed_grad
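
To make "gradient of the loss with respect to the input" concrete without TensorFlow, the sketch below assumes a tiny linear softmax classifier with made-up weights (W, x, y, and the function names are hypothetical). For that model the input gradient has the closed form Wᵀ(p − y), which the code compares against central finite differences before taking the sign:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def cross_entropy(W, x, y_onehot):
    p = softmax(logits(W, x))
    return -sum(t * math.log(pi) for t, pi in zip(y_onehot, p))

def analytic_input_grad(W, x, y_onehot):
    """dL/dx = W^T (p - y) for softmax cross-entropy on a linear model."""
    p = softmax(logits(W, x))
    diff = [pi - t for pi, t in zip(p, y_onehot)]
    return [sum(W[k][j] * diff[k] for k in range(len(W))) for j in range(len(x))]

def numeric_input_grad(W, x, y_onehot, h=1e-6):
    """Central finite differences, perturbing one input coordinate at a time."""
    grad = []
    for j in range(len(x)):
        hi = x[:j] + [x[j] + h] + x[j + 1:]
        lo = x[:j] + [x[j] - h] + x[j + 1:]
        grad.append((cross_entropy(W, hi, y_onehot) - cross_entropy(W, lo, y_onehot)) / (2 * h))
    return grad

# Hypothetical 3-class model on a 2-feature input; the weights stay fixed.
W = [[1.0, -2.0], [0.5, 0.3], [-1.5, 1.0]]
x = [0.2, 0.7]
y = [0.0, 1.0, 0.0]

g_analytic = analytic_input_grad(W, x, y)
g_numeric = numeric_input_grad(W, x, y)
signed = [(g > 0) - (g < 0) for g in g_analytic]
print("analytic:", g_analytic)
print("numeric: ", g_numeric)
print("sign:    ", signed)
```

The weights are treated as constants throughout, mirroring how the tape above watches the input image rather than the model's variables.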

6. Visualize the noise to be added

# Get the input label of the image.
labrador_retriever_index = 208
label = tf.one_hot(labrador_retriever_index, image_probs.shape[-1])
label = tf.reshape(label, (1, image_probs.shape[-1]))

perturbations = create_adversarial_pattern(image, label)
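# perturbations holds values in {-1, 0, 1}; map them to [0, 1] so imshow can display them.
plt.imshow(perturbations[0] * 0.5 + 0.5)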

[Figure: the perturbation pattern plotted by the code above]
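
Two small details in this step are easy to miss: the label passed to the loss is a one-hot vector, and the sign pattern takes values in {-1, 0, 1}, while Matplotlib's `imshow` clips floating-point RGB data to [0, 1]. The plain-Python sketch below (hypothetical helper names) shows what a one-hot vector looks like and how sign values can be mapped into a displayable range:

```python
def one_hot(index, depth):
    """Return a list of length depth with 1.0 at index and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]

def to_display_range(signed_values):
    """Map sign values in {-1, 0, 1} to {0.0, 0.5, 1.0} for plotting."""
    return [v * 0.5 + 0.5 for v in signed_values]

label = one_hot(208, 1000)  # 208 is the class index used in the code above
print(len(label), label.index(1.0), sum(label))
print(to_display_range([-1, 0, 1]))
```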

7. Define a function to display images

def display_images(image, description):
    _, label, confidence = get_imagenet_label(pretrained_model.predict(image))
    plt.figure()
    plt.imshow(image[0])
    plt.title('{} \n {} : {:.2f}% Confidence'.format(description,
                                                   label, confidence*100))
    plt.show()

8. Add the noise and classify the image again

epsilons = [0, 0.01, 0.1, 0.15]
descriptions = [('Epsilon = {:0.3f}'.format(eps) if eps else 'Input')
                for eps in epsilons]

for i, eps in enumerate(epsilons):
    adv_x = image + eps*perturbations
    adv_x = tf.clip_by_value(adv_x, 0, 1)
    display_images(adv_x, descriptions[i])

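Here epsilons lists the perturbation strengths to try: each value scales the sign pattern before it is added to the image.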
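
Because the perturbation is eps times a sign pattern, no pixel can move by more than eps, and clipping can only shrink that change. The sketch below checks this on a handful of made-up pixel values in plain Python (pixels, signs, and the helper names are hypothetical):

```python
def clip(value, low, high):
    return max(low, min(high, value))

def perturb(pixels, signs, eps):
    """Apply x + eps * sign and clip to the valid [0, 1] pixel range."""
    return [clip(p + eps * s, 0.0, 1.0) for p, s in zip(pixels, signs)]

# Hypothetical pixel values and sign pattern, for illustration only.
pixels = [0.0, 0.25, 0.5, 0.98]
signs = [-1, 1, -1, 1]

for eps in [0, 0.01, 0.1, 0.15]:
    adv = perturb(pixels, signs, eps)
    max_change = max(abs(a - p) for a, p in zip(adv, pixels))
    print("eps = {:.3f}, largest pixel change = {:.3f}".format(eps, max_change))
```

A larger eps makes misclassification more likely but also makes the perturbation easier to see.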

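[Figure: the input image and its adversarial versions for each epsilon, with predicted class and confidence]
As the results show, once the perturbation is added the trained model can no longer classify this photo correctly.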

cofisher


FGSM in TensorFlow 2.0