Using gradient ascent to trick a neural network into misclassifying its input

In this tutorial, I will show how to use gradient ascent to make a network misclassify an input.

Figuring out how to use gradient ascent to change an input's classification

Neural networks are often treated as black boxes. Understanding their decisions takes some creativity, but they are not as opaque as they seem.

In this tutorial, I will show you how to use backpropagation to change the input so that it is classified the way you want.

Humans as black boxes

Let us first take humans as an example. If I show you the following input:

Chances are you can't tell whether this is a 5 or a 6. In fact, I believe I could convince you that it might also be an 8.

Now, if you ask a person what they would need to do to turn this into a 5, they might do something like this visually:

And if I asked you to change it into an 8, you might do this:

The answer to this question is not easy to express with a few if statements or by looking at a few coefficients. For certain types of input (images, sound, video, etc.), interpretability undoubtedly becomes harder, but not impossible.

How a neural network handles it

How does a neural network answer the same question I asked above? To answer this question, we can use gradient ascent.

This is how the neural network thinks we need to modify the input to make it closer to other classifications.

This produces two interesting results. First, the black areas are where the network thinks we need to remove pixel density. Second, the yellow areas are where it thinks we need to increase pixel density.

We can take a step in this gradient direction and add the gradient to the original image. Of course, we can repeat this process over and over again, finally changing the input into the prediction we want.

You can see that the black spot in the lower left corner of the picture is very similar to what a human would have removed.

How about making the input look more like 8? This is how the network thinks you must change the input.

It is worth noting that there is a black mass in the lower left corner and a bright mass in the middle. If we add this to the input, we get the following result:

In this case, I am not fully convinced that we have changed this 5 into an 8. However, we have reduced the probability of 5, and it would certainly be easier to convince you that the digit is an 8 using the picture on the right than the picture on the left.
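
The step-and-repeat process described in this section can be sketched roughly as follows. This is only an illustration under stated assumptions: the classifier is a stand-in untrained linear layer over flattened 28x28 inputs, the "image" is random noise, and the names `ascend_input`, `step_size`, and `n_steps` are hypothetical rather than taken from the original post.

```python
import torch


def ascend_input(model, x, target, step_size=0.1, n_steps=10):
    """Repeatedly nudge x along the gradient of the target logit (hypothetical helper)."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        logits = model(x)
        # scalar objective: the logit of the class we want to make more likely
        objective = logits[:, target].sum()
        (grad,) = torch.autograd.grad(objective, x)
        with torch.no_grad():
            x += step_size * grad  # gradient ascent: add the gradient, do not subtract it
    return x.detach()


torch.manual_seed(0)
model = torch.nn.Linear(28 * 28, 10)  # stand-in classifier (assumption)
x0 = torch.rand(1, 28 * 28)           # stand-in flattened "image" (assumption)
x1 = ascend_input(model, x0, target=8)

before = model(x0)[0, 8].item()       # logit for "8" on the original input
after = model(x1)[0, 8].item()        # logit for "8" after the ascent steps
```

With a trained network the same loop would be applied to a real digit. The step size and number of steps control how far the input drifts from the original.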

Gradients

In regression analysis, we look at the coefficients to understand what the model has learned. In a random forest, we can inspect the decision nodes.

In neural networks, it comes down to how creatively we use gradients. To classify this digit, we generate a distribution over the possible predictions.

This is what we call forward propagation.

During the forward pass, we compute the probability distribution of the output.

The code looks like this:
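
A minimal sketch of such a forward pass, assuming a stand-in untrained linear classifier and a random tensor in place of a real MNIST image (both are illustrative choices, not the author's code):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(28 * 28, 10)   # stand-in, untrained classifier (assumption)
x = torch.rand(1, 1, 28, 28)           # stand-in for one MNIST-sized image (assumption)

logits = model(x.view(x.size(0), -1))  # forward propagation: one score per digit
probs = F.softmax(logits, dim=1)       # turn the scores into a probability distribution
prediction = probs.argmax(dim=1)       # most likely label
```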

Now suppose we want to trick the network into predicting "5" for the input x. To do so, we give it an image (x), compute the prediction for that image, and then maximize the probability of the label "5".

For this, we can use gradient ascent: we calculate the gradient of the prediction at the 6th position (index 5, i.e. label = 5) with respect to the input x.

To do this in code, we pass x to the neural network as a parameter and select the 6th prediction (the labels are 0, 1, 2, 3, 4, 5, ..., so the 6th entry corresponds to the label "5").

Visually this looks like:

The code is shown below:
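
As a rough sketch of this step, assuming the same kind of stand-in linear classifier and a random input (the variable names are illustrative and not taken from the original post):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(28 * 28, 10)   # stand-in classifier (assumption)
image = torch.rand(1, 1, 28, 28)       # stand-in image (assumption)

# treat the input, not the weights, as the thing we differentiate with respect to
x = torch.nn.Parameter(image.clone())

logits = model(x.view(x.size(0), -1))
logits[:, 5].sum().backward()          # index 5 is the 6th entry, i.e. the label "5"

grad = x.grad                          # gradient of the "5" logit, same shape as the image
```

For a single linear layer this gradient is simply the weight row for class 5. For a deeper network it depends on the input itself.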

What happens when we call .backward() is what the previous animation visualizes.

Now that we have calculated the gradients, we can visualize and plot them:
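
One possible way to plot such a gradient with matplotlib is sketched below. It assumes a stand-in untrained linear classifier and a random input, uses the headless "Agg" backend so it runs without a display, and writes to a hypothetical file name `grad_logit5.png`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import torch

torch.manual_seed(0)
model = torch.nn.Linear(28 * 28, 10)              # stand-in classifier (assumption)
x = torch.nn.Parameter(torch.rand(1, 1, 28, 28))  # stand-in image (assumption)
model(x.view(1, -1))[:, 5].sum().backward()       # gradient of the "5" logit w.r.t. x

grad_img = x.grad[0, 0]  # drop the batch and channel dimensions -> (28, 28)

fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(grad_img.numpy())                  # one pixel per gradient entry
fig.colorbar(im, ax=ax)
ax.set_title("d(logit 5)/dx")
fig.savefig("grad_logit5.png")                    # hypothetical output file name
plt.close(fig)
```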

Since the network has not been trained, the above gradient looks like random noise... However, once we train the network, the gradient information will be richer:

Automation through callbacks

This is a very useful tool for understanding what happens while your network trains. Here, we want to automate the process so that the plots are generated during training.

For this, we will use PyTorch Lightning to implement our neural network (the code uses the TrainResult/EvalResult API that Lightning offered at the time of writing):

import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.TrainResult(loss)

        # stash the batch and logits so ConfusedLogitCallback can read them
        self.last_batch = batch
        self.last_logits = y_hat.detach()

        result.log('train_loss', loss, on_epoch=True)
        return result
        
    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.EvalResult(checkpoint_on=loss)
        result.log('val_loss', loss)
        return result

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.005)

The fairly involved code that draws the plots described here can be wrapped in a Lightning Callback. A callback is a small piece of code that Lightning calls at various points during training.

In this example, at the end of a training batch we want to generate these images for any inputs the network is confused about.

import torch
from pytorch_lightning import Callback
from torch import nn


class ConfusedLogitCallback(Callback):

    def __init__(
            self,
            top_k,
            projection_factor=3,
            min_logit_value=5.0,
            logging_batch_interval=20,
            max_logit_difference=0.1
    ):
        super().__init__()
        self.top_k = top_k
        self.projection_factor = projection_factor
        self.max_logit_difference = max_logit_difference
        self.logging_batch_interval = logging_batch_interval
        self.min_logit_value = min_logit_value

    def on_train_batch_end(self, trainer, pl_module, batch, batch_idx, dataloader_idx):
        # only plot every `logging_batch_interval` batches (20 by default)
        if (batch_idx + 1) % self.logging_batch_interval != 0:
            return

        # use the current batch and the logits stored by training_step
        x, y = batch
        try:
            logits = pl_module.last_logits
        except AttributeError as e:
            m = """please track the last_logits in the training_step like so:
                def training_step(...):
                    self.last_logits = your_logits
            """
            raise AttributeError(m) from e

        # only check when the network has a strong opinion (some logit > min_logit_value)
        if logits.max() > self.min_logit_value:
            # pick the two highest logits per sample
            (values, idxs) = torch.topk(logits, k=2, dim=1)

            # care about only the ones that are at most eps close to each other
            eps = self.max_logit_difference
            mask = (values[:, 0] - values[:, 1]).abs() < eps

            if mask.sum() > 0:
                # pull out the ones we care about
                confusing_x = x[mask, ...]
                confusing_y = y[mask]

                mask_idxs = idxs[mask]

                pl_module.eval()
                self._plot(confusing_x, confusing_y, trainer, pl_module, mask_idxs)
                pl_module.train()

    def _plot(self, confusing_x, confusing_y, trainer, model, mask_idxs):
        from matplotlib import pyplot as plt

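        confusing_x = confusing_x[:self.top_k]
        confusing_y = confusing_y[:self.top_k]
        # keep the top-2 class indices aligned with the truncated samples
        mask_idxs = mask_idxs[:self.top_k]

        # two independent leaf copies so each logit gets its own input gradient
        x_param_a = nn.Parameter(confusing_x)
        x_param_b = nn.Parameter(confusing_x)

        batch_size, c, w, h = confusing_x.size()
        for logit_i, x_param in enumerate((x_param_a, x_param_b)):
            x_param = x_param.to(model.device)
            logits = model(x_param.view(batch_size, -1))
            # for each sample, select the logit of its own logit_i-th ranked class
            class_idx = mask_idxs[:, logit_i:logit_i + 1].to(logits.device)
            logits.gather(1, class_idx).sum().backward()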

        # reshape grads
        grad_a = x_param_a.grad.view(batch_size, w, h)
        grad_b = x_param_b.grad.view(batch_size, w, h)

        for img_i in range(len(confusing_x)):
            x = confusing_x[img_i].squeeze(0).cpu()
            y = confusing_y[img_i].cpu()
            ga = grad_a[img_i].cpu()
            gb = grad_b[img_i].cpu()

            mask_idx = mask_idxs[img_i].cpu()

            fig, axarr = plt.subplots(nrows=2, ncols=3, figsize=(15, 10))
            self.__draw_sample(fig, axarr, 0, 0, x, f'True: {y}')
            self.__draw_sample(fig, axarr, 0, 1, ga, f'd{mask_idx[0]}-logit/dx')
            self.__draw_sample(fig, axarr, 0, 2, gb, f'd{mask_idx[1]}-logit/dx')
            self.__draw_sample(fig, axarr, 1, 1, ga * 2 + x, f'd{mask_idx[0]}-logit/dx')
            self.__draw_sample(fig, axarr, 1, 2, gb * 2 + x, f'd{mask_idx[1]}-logit/dx')

            trainer.logger.experiment.add_figure('confusing_imgs', fig, global_step=trainer.global_step)

    @staticmethod
    def __draw_sample(fig, axarr, row_idx, col_idx, img, title):
        im = axarr[row_idx, col_idx].imshow(img)
        fig.colorbar(im, ax=axarr[row_idx, col_idx])
        axarr[row_idx, col_idx].set_title(title, fontsize=20)
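
The "confused" test inside the callback can be seen in isolation on a toy example. The logits below are made-up numbers, and the 0.1 threshold mirrors the default `max_logit_difference`. This is only a sketch of the selection step, not of the plotting.

```python
import torch

# toy logits for 3 samples and 4 classes (made-up numbers)
logits = torch.tensor([
    [6.00, 5.95, 0.10, 0.00],   # top two are 0.05 apart -> confused
    [7.00, 1.00, 0.50, 0.20],   # clear winner -> not confused
    [0.30, 5.50, 5.45, 0.10],   # top two are 0.05 apart -> confused
])

# the two highest logits per row, and the classes they belong to
values, idxs = torch.topk(logits, k=2, dim=1)

# a sample is "confused" when its two best logits are nearly tied
mask = (values[:, 0] - values[:, 1]).abs() < 0.1

confused_rows = mask.nonzero().flatten().tolist()   # [0, 2]
confused_classes = idxs[mask].tolist()              # [[0, 1], [1, 2]]
```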

You do not have to write this callback yourself: pytorch-lightning-bolts ships it, so you can install the package and import it:

!pip install pytorch-lightning-bolts
from pytorch_lightning import Trainer
from pl_bolts.callbacks.vision import ConfusedLogitCallback

trainer = Trainer(callbacks=[ConfusedLogitCallback(1)])

Putting it all together

Finally, we can train our model and automatically generate the images whenever the logits are confused.

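import os

from pytorch_lightning import Trainer
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST

# data
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train, val = random_split(dataset, [55000, 5000])

# model
model = LitClassifier()

# attach callback (top_k=1: plot at most one confused image per logged batch)
trainer = Trainer(callbacks=[ConfusedLogitCallback(1)])

# train!
trainer.fit(model, DataLoader(train, batch_size=64), DataLoader(val, batch_size=64))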

TensorBoard will then display the automatically generated figures:

See if this is different

Author: William Falcon

Full code: https://colab.research.google.com/drive/16HVAJHdCkyj7W43Q3ZChnxZ7DOwx6K5i?usp=sharing

deephub translation team

Origin blog.csdn.net/m0_46510245/article/details/108702227