Deep Dream: "Going Deeper into Neural Networks"

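Deep Dream was a completely different experience from my usual routine of reading a paper and then running its examples. I stumbled on the program in the Keras examples while running the "style transfer" example, and ran it along the way. The result looked completely absurd to me, so I read the program to see what on earth it was doing. The program's style is unusual too: there is no iteration scheme like in an ordinary training process. I got curious about what the point of it all was, so I read the paper. After reading it, I was in awe of the authors. Over the whole process, my mood traced a candlestick chart that "opened low and closed high".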

So this post goes in this order: results -> impressions -> code -> paper.

1. Results

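I again used the "early spring" photo I took at Qionghai Lake in Xichang, Sichuan, during this year's Spring Festival. The original image: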

The result:

2. Impressions

My first reaction was: just how many pictures of doggos are there in ImageNet? The trees are crawling with dogs. This must be a nightmare.

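So I searched online. Apparently, if you run it on photos of people, there is a good chance they "turn into dogs". Well, good thing I didn't try it on a photo of myself.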

3. Code

What on earth is this program doing to produce such strange dreams?

Here is the program:

'''Deep Dreaming in Keras.
Run the script with:
```
python deep_dream.py path_to_your_base_image.jpg prefix_for_results
```
e.g.:
```
python deep_dream.py img/mypic.jpg results/dream
```
'''
from __future__ import print_function


from keras.preprocessing.image import load_img, img_to_array
import numpy as np
import scipy.ndimage  # resize_img uses scipy.ndimage.zoom; a bare "import scipy" may not load it
import argparse


from keras.applications import inception_v3
from keras import backend as K


parser = argparse.ArgumentParser(description='Deep Dreams with Keras.')
parser.add_argument('base_image_path', metavar='base', type=str,
                    help='Path to the image to transform.')
parser.add_argument('result_prefix', metavar='res', type=str,
                    help='Prefix for the saved results.')


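# Arguments are hard-coded here so the script can run inside a notebook;
# from the command line, call parser.parse_args() with no arguments instead.
args = parser.parse_args(['image/spring.jpg', 'deep_spring'])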
base_image_path = args.base_image_path
result_prefix = args.result_prefix


# These are the names of the layers
# for which we try to maximize activation,
# as well as their weight in the final loss
# we try to maximize.
# You can tweak these settings to obtain new visual effects.


# The blocks to extract features from, and the weight of each block,
# used later when the loss is assembled.
settings = {
    'features': {
        'mixed2': 0.2,
        'mixed3': 0.5,
        'mixed4': 2.,
        'mixed5': 1.5,
    },
}




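# Image preprocessing.
# Same as in the "style transfer" example, except that it uses inception_v3's
# preprocessing function instead of vgg19's, because our network is Inception v3.
# inception_v3.preprocess_input scales pixels from [0, 255] to [-1, 1].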
def preprocess_image(image_path):
    # Util function to open, resize and format pictures
    # into appropriate tensors.
    img = load_img(image_path)
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = inception_v3.preprocess_input(img)
    return img


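# Postprocessing: undo inception_v3.preprocess_input by mapping [-1, 1] back to [0, 255].
# (Unlike vgg19's preprocessing, there is no RGB->BGR channel swap to reverse here.)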
def deprocess_image(x):
    # Util function to convert a tensor into a valid image.
    if K.image_data_format() == 'channels_first':
        x = x.reshape((3, x.shape[2], x.shape[3]))
        x = x.transpose((1, 2, 0))
    else:
        x = x.reshape((x.shape[1], x.shape[2], 3))
    x /= 2.
    x += 0.5
    x *= 255.
    x = np.clip(x, 0, 255).astype('uint8')
    return x


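# Set the learning phase to 0 (test/inference mode; 1 would mean training),
# so layers such as BatchNormalization behave as they do at inference time.
K.set_learning_phase(0)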


# Build the InceptionV3 network with our placeholder.
# The model will be loaded with pre-trained ImageNet weights.
model = inception_v3.InceptionV3(weights='imagenet',
                                 include_top=False)
dream = model.input
print('Model loaded.')


# Map each layer name to its layer object (the "key" layers have unique names),
# so their symbolic outputs can be looked up by name.
layer_dict = dict([(layer.name, layer) for layer in model.layers])


# Define the loss.
loss = K.variable(0.)
# For each layer listed in settings, add its weighted contribution to the loss.
for layer_name in settings['features']:
    # Add the weighted, size-normalized squared L2 norm of the layer's features to the loss.
    assert layer_name in layer_dict.keys(), 'Layer ' + layer_name + ' not found in model.'
    coeff = settings['features'][layer_name]
    x = layer_dict[layer_name].output
    # We avoid border artifacts by only involving non-border pixels in the loss.
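    # scaling is the total number of elements in the feature tensor
    # (batch * height * width * channels).
    scaling = K.prod(K.cast(K.shape(x), 'float32'))
    # Layer loss: sum of squared activations (2-pixel border excluded) divided by
    # the tensor size, i.e. roughly the mean squared activation. Maximizing it
    # amplifies whatever the layer already responds to.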
    if K.image_data_format() == 'channels_first':
        loss += coeff * K.sum(K.square(x[:, :, 2: -2, 2: -2])) / scaling
    else:
        loss += coeff * K.sum(K.square(x[:, 2: -2, 2: -2, :])) / scaling


# Compute the gradient of the loss with respect to the dream (the model input).
grads = K.gradients(loss, dream)[0]
# Normalize the gradients by their mean absolute value (K.epsilon() avoids division by zero).
grads /= K.maximum(K.mean(K.abs(grads)), K.epsilon())


# Set up a backend function that returns the value
# of the loss and the gradients for a given input image.
outputs = [loss, grads]
fetch_loss_and_grads = K.function([dream], outputs)




# Evaluate the loss and the gradients for an input image.
def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values




def resize_img(img, size):
    img = np.copy(img)
    if K.image_data_format() == 'channels_first':
        factors = (1, 1,
                   float(size[0]) / img.shape[2],
                   float(size[1]) / img.shape[3])
    else:
        factors = (1,
                   float(size[0]) / img.shape[1],
                   float(size[1]) / img.shape[2],
                   1)
    # order=1: linear spline interpolation (bilinear over the two spatial axes).
    return scipy.ndimage.zoom(img, factors, order=1)


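# Optimization: run `iterations` steps, each adding step * gradient to x
# (gradient ascent, since the loss is maximized); stop early once the loss
# exceeds max_loss.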
def gradient_ascent(x, iterations, step, max_loss=None):
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        print('..Loss value at', i, ':', loss_value)
        x += step * grad_values
    return x




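def save_img(img, fname):
    # scipy.misc.imsave was removed in SciPy 1.2, so save with Pillow instead
    # (Pillow is already required by keras' load_img).
    from PIL import Image
    pil_img = deprocess_image(np.copy(img))
    Image.fromarray(pil_img).save(fname)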




"""Process:
- Load the original image.
- Define a number of processing scales (i.e. image shapes),
    from smallest to largest.
- Resize the original image to the smallest scale.
- For every scale, starting with the smallest (i.e. current one):
    - Run gradient ascent
    - Upscale image to the next scale
    - Reinject the detail that was lost at upscaling time
- Stop when we are back to the original size.
To obtain the detail lost during upscaling, we simply
take the original image, shrink it down, upscale it,
and compare the result to the (resized) original image.
"""




# The main loop starts here.


# Playing with these hyperparameters will also allow you to achieve new effects
step = 0.01  # Gradient ascent step size
num_octave = 3  # Number of scales at which to run gradient ascent
octave_scale = 1.4  # Size ratio between scales
iterations = 20  # Number of ascent steps per scale
max_loss = 10.


# Preprocess the original image.
img = preprocess_image(base_image_path)
if K.image_data_format() == 'channels_first':
    original_shape = img.shape[2:]
else:
    original_shape = img.shape[1:3]
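# Define the sequence of shapes ("octaves") the image goes through.
# Each shape is octave_scale times smaller than the previous one. This is an
# image pyramid, so the dream is computed at multiple scales.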
successive_shapes = [original_shape]
for i in range(1, num_octave):
    # Shrink the original shape proportionally by octave_scale ** i.
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)
print(successive_shapes)
# Reverse the order: the image is processed from small to large, so the last shape equals the original size.
successive_shapes = successive_shapes[::-1]
original_img = np.copy(img)
# Before the loop starts, shrink the image to the smallest shape.
shrunk_original_img = resize_img(img, successive_shapes[0])


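# This loop differs from a usual training loop, where one iteration is a pass
# over all the data. Here all `iterations` ascent steps run at one shape, and
# then the image moves on to the next (larger) shape.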
for shape in successive_shapes:
    print('Processing image shape', shape)
    # Resize the image to the current shape.
    img = resize_img(img, shape)
    # Optimize the image at this shape.
    img = gradient_ascent(img,
                          iterations=iterations,
                          step=step,
                          max_loss=max_loss)
    upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)
    same_size_original = resize_img(original_img, shape)
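    # lost_detail = (original resized to this shape) - (original shrunk to the
    # previous shape, then upscaled to this shape): the detail upscaling cannot recover.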
    lost_detail = same_size_original - upscaled_shrunk_original_img
    
    # Reinject the detail lost by the shrink/upscale round trip into the dreamed image.
    img += lost_detail
    shrunk_original_img = resize_img(original_img, shape)


# Save the result.
save_img(img, fname=result_prefix + '.png')

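Here is what the program does. It runs the image through the Inception v3 network and takes the output feature maps of the blocks mixed2, mixed3, mixed4 and mixed5. The loss is a weighted sum over these blocks. Each block contributes the sum of its squared activations divided by the size of its feature tensor, which is roughly the mean squared activation, with a 2-pixel border left out. The program then computes the gradient of this loss with respect to the dream image and adds it to the image. This is gradient ascent, so the loss grows rather than shrinks. The iteration is controlled along two dimensions, iterations and successive_shapes: the former sets the number of gradient-ascent steps, the latter the sequence of shapes the dream image goes through. In the example, the image is processed progressively: it is first shrunk, then enlarged step by step, and at each size the full set of iterations is run. At first I was puzzled whether processing from small to large is the same idea as multi-scale processing. The docstring in the script suggests it is: each shape is one "octave" of an image pyramid, and after every upscaling the detail lost by shrinking (lost_detail) is added back.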
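
The octave schedule and the lost_detail trick in the main loop can be sketched with NumPy alone. This is a minimal sketch, not the Keras example itself, under these assumptions: images are single-channel 2-D arrays, resize_nearest (nearest-neighbour) stands in for the script's scipy.ndimage.zoom(order=1), and dream_step is a hypothetical callback standing in for gradient_ascent.

```python
import numpy as np


def octave_shapes(original_shape, num_octave=3, octave_scale=1.4):
    """Same arithmetic as the script: shrink by octave_scale ** i, then order small -> large."""
    shapes = [tuple(original_shape)]
    for i in range(1, num_octave):
        shapes.append(tuple(int(dim / (octave_scale ** i)) for dim in original_shape))
    return shapes[::-1]


def resize_nearest(img, shape):
    """Nearest-neighbour resize of a 2-D array (a stand-in for scipy.ndimage.zoom(order=1))."""
    rows = (np.arange(shape[0]) * img.shape[0] / shape[0]).astype(int)
    cols = (np.arange(shape[1]) * img.shape[1] / shape[1]).astype(int)
    return img[rows][:, cols]


def dream_octaves(original, dream_step, num_octave=3, octave_scale=1.4):
    """Dream at each octave, small to large, reinjecting the detail lost by upscaling."""
    shapes = octave_shapes(original.shape, num_octave, octave_scale)
    img = resize_nearest(original, shapes[0])
    shrunk_original = img.copy()
    for shape in shapes:
        img = resize_nearest(img, shape)
        img = dream_step(img)  # hypothetical placeholder for gradient_ascent
        # Original at this size, minus the smaller copy blown up to this size.
        lost_detail = resize_nearest(original, shape) - resize_nearest(shrunk_original, shape)
        img = img + lost_detail
        shrunk_original = resize_nearest(original, shape)
    return img
```

With a do-nothing dream_step, the pyramid returns the original image (up to rounding). This shows that lost_detail puts back exactly what the shrink/upscale round trip removed. Whatever a real dream step adds at a small octave is carried up through the larger ones.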

4. Translation of the paper

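While reading the paper, I found the whole thing worthwhile, so I translated all of it. The paper only describes the basic idea and mentions the implementation only in passing.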

Link to download the paper:

####################################Translation starts here########################################

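Artificial neural networks have recently driven remarkable progress in image classification and speech recognition. But even though these are very useful tools based on well-known mathematical methods, we actually understand surprisingly little about why certain models work and others don't. So let's take a look at some simple techniques for peeking inside these networks.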

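We train an artificial neural network by showing it millions of training examples and gradually adjusting the network parameters until it gives the classifications we want. The network typically consists of 10 to 30 stacked layers of artificial neurons. Each image is fed into the input layer, which then talks to the next layer, until eventually the "output" layer is reached. The network's "answer" comes from this final output layer.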

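One of the challenges of neural networks is understanding what exactly goes on at each layer. We know that after training, each layer progressively extracts higher-level features of the image, until the final layer essentially decides what the image shows. For example, the first layer may look for edges or corners. Intermediate layers interpret the basic features to look for overall shapes or components, such as a door or a leaf. The final few layers assemble those into complete interpretations: these neurons activate in response to very complex things such as entire buildings or trees.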

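One way to visualize what goes on is to turn the network upside down and ask it to enhance an input image so as to elicit a particular interpretation. Say you want to know what sort of image would result in "banana". Start with an image full of random noise, then gradually tweak the image towards what the neural net considers a banana. By itself this doesn't work very well, but it does if we impose a prior constraint that the image should have statistics similar to natural images, such as neighboring pixels needing to be correlated.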
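
The "start from noise and tweak it toward banana" idea, with a neighbouring-pixel prior, can be illustrated on a toy problem. This is a minimal sketch under two assumptions of my own. First, a hypothetical linear "banana" score (a fixed template dotted with the image) replaces a real network's class output. Second, the prior is a penalty on squared differences between adjacent pixels, which is just one simple way to encode "neighbouring pixels are correlated".

```python
import numpy as np


def neighbor_penalty(x):
    """Sum of squared differences between horizontally and vertically adjacent pixels."""
    dh = x[:, 1:] - x[:, :-1]
    dv = x[1:, :] - x[:-1, :]
    return np.sum(dh ** 2) + np.sum(dv ** 2)


def neighbor_penalty_grad(x):
    """Gradient of neighbor_penalty with respect to every pixel of x."""
    g = np.zeros_like(x)
    dh = x[:, 1:] - x[:, :-1]
    dv = x[1:, :] - x[:-1, :]
    g[:, 1:] += 2 * dh
    g[:, :-1] -= 2 * dh
    g[1:, :] += 2 * dv
    g[:-1, :] -= 2 * dv
    return g


def visualize_class(class_template, steps=200, lr=0.1, prior_weight=0.5, seed=0):
    """Start from random noise and ascend score(x) - prior_weight * neighbor_penalty(x)."""
    rng = np.random.default_rng(seed)
    x = rng.random(class_template.shape)  # an image of pure random noise
    history = []
    for _ in range(steps):
        score = np.sum(class_template * x)  # toy linear "banana" score (hypothetical)
        history.append(score - prior_weight * neighbor_penalty(x))
        grad = class_template - prior_weight * neighbor_penalty_grad(x)
        x += lr * grad
    return x, history
```

Raising prior_weight trades class score for smoother images. With prior_weight=0, the noise is never smoothed away, and the image just drifts along the template.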

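So here is one surprise: neural networks that were trained to classify images also contain a lot of the information needed to generate images. Here are more examples across different classes: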

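Why is this important? We train networks by simply showing them many examples of what we want them to learn. We hope they extract the essence of the thing at hand (e.g., a fork needs a handle and 2-4 tines) and learn to ignore what doesn't matter (a fork can be any shape, size, color or orientation). But how do you check that the network has correctly learned the right features? Visualizing the network's layers can show this.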

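In fact, in some cases this reveals that the neural net isn't looking for quite what we thought it was. For example, here is what one network we designed thought dumbbells look like: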

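There are dumbbells in these pictures, but none of them appears on its own: each comes with a muscular arm attached. The network failed to fully distill the essence of a dumbbell; perhaps it was never shown a dumbbell without an arm holding it.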

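Instead of prescribing exactly which feature we want the network to amplify, we can also let the network make that decision. In this case, we simply feed the network an arbitrary image or photo and let it analyze the picture. We then pick a layer and ask the network to enhance whatever it detected. Each layer of the network deals with features at a different level of abstraction, so the complexity of the features we generate depends on which layer we choose. For example, lower layers tend to produce strokes or simple ornament-like patterns, because those layers are sensitive to basic features such as edges and their orientations.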

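If we choose higher-level layers, which identify more sophisticated features in images, complex features or even whole objects tend to emerge. Again, we start with an existing image and give it to our neural net. We ask the network: "Whatever you see there, I want more of it!" This creates a feedback loop. If a cloud looks a little like a bird, the network will make it look more like a bird. This in turn makes the network recognize the bird even more strongly on the next pass, and so on, until a highly detailed bird appears, seemingly out of nowhere.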
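
The "whatever you see there, I want more of it" loop is what the Keras script's loss does: it maximizes the squared activations of chosen layers by gradient ascent on the input. Below is a toy sketch. It assumes a single hypothetical ReLU layer with random weights in place of Inception v3's mixed blocks, and it reuses the script's gradient normalization.

```python
import numpy as np


def layer(x, weights):
    """A toy one-layer "feature detector", ReLU(W x); a hypothetical stand-in for mixed2..mixed5."""
    return np.maximum(weights @ x, 0.0)


def dream_loss_and_grad(x, weights):
    """Mean squared activation of the layer, and its gradient with respect to the input x."""
    a = layer(x, weights)
    loss = np.sum(a ** 2) / a.size
    grad = 2.0 * weights.T @ a / a.size  # d/dt max(t, 0)**2 = 2 * max(t, 0)
    return loss, grad


def amplify(x, weights, iterations=20, step=0.01, eps=1e-7):
    """Gradient *ascent*: nudge the input so whatever the layer detects gets stronger."""
    x = x.copy()
    losses = []
    for _ in range(iterations):
        loss, grad = dream_loss_and_grad(x, weights)
        losses.append(loss)
        grad /= max(np.mean(np.abs(grad)), eps)  # same normalization as the Keras script
        x += step * grad
    return x, losses
```

The squared-ReLU loss is convex in the input, so each ascent step can only raise it. Whatever the layer responded to a little, it now responds to more strongly, which is the feedback loop described above.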

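The results are intriguing: even a relatively simple neural network can be used to over-interpret an image. It is much like when, as children, we enjoyed watching clouds and interpreting their random shapes as things.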

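This network was trained mostly on images of animals, so it naturally tends to interpret shapes as animals. But because the data is stored at such a high level of abstraction, the results are an interesting remix of these learned features.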

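Of course, we can do more with this technique. We can apply it to any kind of image. The results vary quite a bit with the kind of image, because the features present in the input bias the network towards certain interpretations. For example, horizon lines fill up with towers, rocks and trees turn into buildings, and birds and insects appear in images of leaves.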

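This technique gives us a qualitative sense of the level of abstraction a particular layer has reached in its understanding of images. We call this technique "Inceptionism", in reference to the neural network architecture used.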

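If we apply the algorithm iteratively to its own outputs and zoom in a little after each iteration, we get an endless stream of new impressions, exploring the set of things the network knows about. We can even start this process from a random-noise image, so that the result is purely the product of the neural network, as in the images below.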
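
Feeding the output back in, with a little zoom after each pass, can be sketched as follows. The zoom here is a simple center crop followed by nearest-neighbour upscaling. dream_step is a hypothetical placeholder for one dreaming pass, such as the script's gradient_ascent. Both are assumptions of this sketch, not the article's exact procedure.

```python
import numpy as np


def zoom_center(img, scale=1.05):
    """Crop the central 1/scale of the image and blow it back up (nearest neighbour)."""
    h, w = img.shape[:2]
    ch, cw = max(1, int(round(h / scale))), max(1, int(round(w / scale)))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h  # map each output row to a row of the crop
    cols = np.arange(w) * cw // w
    return crop[rows][:, cols]


def dream_zoom_sequence(img, dream_step, frames=10, scale=1.05):
    """Feed each output back in as the next input, zooming slightly in between."""
    sequence = [img]
    for _ in range(frames):
        img = zoom_center(dream_step(img), scale)
        sequence.append(img)
    return sequence
```

Starting from random noise (for example, np.random.default_rng().random((H, W))) gives the "purely neural" variant described above. The frame size stays fixed, so the sequence can be played back as a zooming animation.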

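The techniques presented here help us understand and visualize how neural networks carry out difficult classification tasks, improve network architectures, and check what a network has learned during training. They also make us wonder whether neural networks could become a tool for artists, a new way to remix visual concepts, or perhaps even shed a little light on the roots of the creative process in general.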

#########################################End of translation###################################

Only now did I finally understand what the inventors of this idea were after. What an imaginative idea!

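Thinking back on my own dreams, they are often absurd and hard to put into words. Who knows, maybe our dreams are formed by neurons in the cerebral cortex extracting features from daytime scenes and blending them together.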

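Moreover, extracting features this way, with supervised learning as the means and feature extraction as the goal, is a great substitute for manual labor.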
