Saliency maps

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Problem

This paper, like ZFNet, studies the problem of visualising what a network has learned: it works backwards from the final classification score vector to the most representative input image. Suppose the input is \(I\), the label corresponding to the input image is \(c\), and the classification score is \(S_c(I)\) (the \(c\)-th component of the score vector). We then want to find an \(I\) that makes \(S_c(I)\) as large as possible; such an input is the most representative explanation of that class:
\[
\mathrm{argmax}_I \quad S_c(I).
\]
However, the paper actually studies the following problem:
\[
\mathrm{argmax}_I \quad S_c(I) - \lambda \|I\|_2^2,
\]
i.e. a regularisation term is added. I think this is needed from a practical point of view: images are usually normalised during preprocessing, so an \(I\) that grows too "huge" is certainly not appropriate; at the very least it could no longer be called an image.
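
In practice this can be optimised by gradient ascent on the regularised score. A minimal sketch of the update rule (assuming a plain step size \(\eta\); the code below adds momentum on top of this):
\[
I \leftarrow I + \eta \left( \frac{\partial S_c(I)}{\partial I} - 2\lambda I \right).
\]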

Details

Variable

It should be noted that the optimisation above is over \(I\), i.e. over the image itself. If there are \(k\) classes, then in theory there should be \(k\) such images (each with its own \(\lambda\)).

The results shown in the paper look like this:

[figure: class-model visualisations from the paper]

My result looks like this (CIFAR10):

[figure: my CIFAR10 class-model visualisation]

Quite far off. Is \(\lambda = 0.1\) perhaps inappropriate?

\(S_c(I)\)

It should be mentioned that the \(S_c(I)\) used here is not the soft-max output but the raw score before it. The author explains it as follows: with the soft-max
\[
P_c = \frac{\exp(S_c)}{\sum_c \exp(S_c)},
\]
our aim is to increase \(S_c\); if we optimised \(P_c\) instead, we could also decrease the other scores to indirectly increase \(P_c\), rather than actually increasing \(S_c\). That makes some sense. I tried it anyway: with the original parameters it hardly learns anything...
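
A toy numerical check of this point (a minimal sketch, not from the paper): keeping \(S_0\) fixed while pushing the other scores down already drives \(P_0\) towards 1.

import torch

# scores for three classes; class 0 is the target class
s1 = torch.tensor([2.0, 1.0, 1.0])
s2 = torch.tensor([2.0, -5.0, -5.0])   # same S_0, the other scores pushed down

print(torch.softmax(s1, dim=0)[0])     # ~0.58
print(torch.softmax(s2, dim=0)[0])     # ~0.998: P_0 rose although S_0 did not change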

Extension

The author mentions that this scheme can also be used for localisation. First note that with this method we can "locate" (although perhaps only in an imagined sense) the sensitive regions.

For an input image, compute
\[
\frac{\partial S_c(I)}{\partial I}.
\]
The result is a "matrix" (a tensor, really) whose elements' absolute values measure how important each position is for the class decision: the larger the value, the more sensitive the region.

[figure: saliency-map example from the paper]
That simple example did not really convince me. If the network were a linear classifier, the same reasoning would say the sensitivity is just the weights, so intuitively it seems right, but it also feels as if the data itself were being set aside... Still, it certainly makes some sense. Another question: for a misclassified picture, should the label be the true one, or the class \(c\) that the network predicted?
In my experiments the two did not seem to make much difference.

Back to the topic of localisation. After computing the gradient matrix, if there are \(C\) channels, take as the sensitivity of each position the maximum absolute value over the \(C\) channels; thus, if the image is \((C, H, W)\), we end up with a \((1, H, W)\) map whose elements reflect the sensitivity.
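
A minimal sketch of these two steps, using a stand-in linear classifier rather than the trained CNN (the real version is the local method in the Code section below):

import torch
import torch.nn as nn

net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in classifier
x = torch.randn(1, 3, 32, 32, requires_grad=True)               # a (C, H, W) "image"
net(x)[0, 3].backward()                                         # fills x.grad with dS_c/dI for c = 3
saliency = x.grad[0].abs().max(dim=0)[0]                        # per-pixel max over channels -> (H, W)
print(saliency.shape)                                           # torch.Size([32, 32])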

However, the sensitivity map only indicates the approximate location of the object. In the paper the author delimits the extent more carefully with a colour-continuity-based segmentation technique, which I am not familiar with, so I simply ran an experiment:

[figures: saliency maps from my experiment]

Looking at these, I think there is something to it.

Code

When searching for \(I\) I did not know how to reuse PyTorch's built-in optimisers on the input image, so I wrote a simple momentum step myself. The test network below reaches about 60% accuracy on CIFAR10; I used a relatively simple network, since a large one would be hard to get started with.




import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt









class Net(nn.Module):

    def __init__(self, num):
        super(Net, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 4, 2), # 3x32x32 --> 16x15x15
            nn.ReLU(),
            nn.MaxPool2d(2, 2), # 15 --> 7
            nn.Conv2d(16, 64, 3, 1, 1), #16x7x7 --> 64x7x7
            nn.ReLU(),
            nn.MaxPool2d(2, 1) #7-->6
         )
        self.dense = nn.Sequential(
            nn.Linear(64 * 6 * 6, 256),
            nn.ReLU(),
            nn.Linear(256, num)
        )

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        out = self.dense(x)
        return out




# A hand-rolled momentum update; note the "+" sign in step(), i.e. gradient
# *ascent*, which is what we want when maximising S_c(I) - lambda * ||I||_2^2.
class SGD:
    def __init__(self, lr=1e-3, momentum=0.9):
        self.v = 0
        self.lr = lr
        self.momentum = momentum

    def step(self, x, grad):
        self.v = self.momentum * self.v + grad
        return x + self.lr * self.v  # ascent step on the input image



class Train:

    def __init__(self, trainset, num=10, lr=1e-4, momentum=0.9, loss_function=nn.CrossEntropyLoss()):
        self.net = Net(num)
        self.trainset = trainset
        self.criterion = loss_function
        self.opti = torch.optim.SGD(self.net.parameters(), lr=lr, momentum=momentum)

    def trainnet(self, iterations, path):
        running_loss = 0.0
        for epoch in range(iterations):
            for i, data in enumerate(self.trainset):
                imgs, labels = data
                output = self.net(imgs)
                loss = self.criterion(output, labels)
                self.opti.zero_grad()
                loss.backward()
                self.opti.step()
                running_loss += loss.item()  # .item() so the graph is not kept around
                if i % 10 == 9:
                    print("[epoch: {} loss: {:.7f}]".format(
                        epoch,
                        running_loss / 10
                    ))
                    running_loss = 0.0
        torch.save(self.net.state_dict(), path)

    def loading(self, path):
        self.net.load_state_dict(torch.load(path))
        self.net.eval()

    def visual(self, iterations=100, digit=0, gamma=0.1, lr=1e-3, momentum=0.9):
        # maximise S_c(I) - gamma * ||I||_2^2 over the input image I by gradient ascent
        def criterion(out, x, digit, gamma=0.1):
            return out[0][digit] - gamma * torch.norm(x, 2) ** 2
        opti = SGD(lr, momentum)
        # start the ascent from a black (all-zero) image
        x = torch.zeros((1, 3, 32, 32), requires_grad=True, dtype=torch.float)
        for i in range(iterations):
            output = self.net(x)
            loss = criterion(output, x, digit, gamma)
            print(loss.item())
            loss.backward()
            # take an ascent step, then rebuild x as a fresh leaf tensor
            x = opti.step(x, x.grad).detach().requires_grad_(True)
        img = x[0].detach()
        img = img / 2 + 0.5                       # undo the (0.5, 0.5) normalisation
        img = img / torch.max(img.abs())          # rescale so the values are displayable
        img = np.transpose(img.numpy(), (1, 2, 0))
        print(img[0])
        plt.imshow(img)
        plt.title(classes[digit])
        plt.show()
        return x

    def local(self, img, label):
        # saliency map: per-pixel max over the channels of |dS_c/dI|
        cimg = img.view(1, 3, 32, 32).detach()
        cimg.requires_grad = True
        output = self.net(cimg)
        print(output)
        print(label)
        s = output[0][label]      # the class score S_c (before the soft-max)
        s.backward()              # cimg.grad now holds dS_c/dI
        with torch.no_grad():
            grad = cimg.grad.data[0]
            graph = torch.max(torch.abs(grad), 0)[0]  # (H, W): max of |grad| over channels
            saliency = graph.detach().numpy()
        print(np.max(saliency))
        img = img.detach().numpy()
        img = img / 2 + 0.5       # undo normalisation for display
        img = np.transpose(img, (1, 2, 0))
        fig, ax = plt.subplots(1, 2)
        ax[0].set_title(classes[label])
        ax[0].imshow(img)
        ax[1].imshow(saliency, cmap=plt.cm.hot)
        plt.show()

    def testing(self, testloader):
        correct = 0
        total = 0
        with torch.no_grad():
            for data in testloader:
                images, labels = data
                outputs = self.net(images)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        print('Accuracy of the network on the 10000 test images: %d %%' % (
                100 * correct / total))

root = "C:/Users/pkavs/1jupiterdata/data"

# prepare the training set


trainset = torchvision.datasets.CIFAR10(root=root, train=True,
                                        download=False,
                                       transform=transforms.Compose(
                                           [transforms.ToTensor(),
                                            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
                                       ))

train_loader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=0)


testset = torchvision.datasets.CIFAR10(root=root, train=False,
                                       download=False,
                                       transform=transforms.Compose(
                                           [transforms.ToTensor(),
                                            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
                                       ))
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False, num_workers=0)



classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

path = root + "/visual1.pt"


test = Train(train_loader, lr=1e-4)
test.loading(path)
#test.testing(testloader) 60%


data = next(iter(train_loader))
imgs, labels = data
img = imgs[0]
label = labels[0]
test.local(img, label)


#test.visual(1000, digit=3)
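
# A sketch of how one could generate a class-model image for every class with the
# helpers above (assuming the trained weights in visual1.pt are present); left
# commented out like the calls above, since each call runs its own ascent loop.
#for digit in range(len(classes)):
#    test.visual(1000, digit=digit)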



Origin www.cnblogs.com/MTandHJ/p/11355180.html