Paper: Generating Images with Perceptual Similarity Metrics based on Deep Networks

Copyright notice: reposting permitted. Source: https://blog.csdn.net/Z609834342/article/details/83422078

Summary: The paper starts from the observation that images generated by conventional deep networks are blurry, and attributes this to the fact that an image's fine details are not fully preserved in its features. A loss in image space therefore tends to average over all likely locations of each detail, so the generated image ends up blurry. But the exact locations of the details are not important; what matters is the distribution of these details. The goal is thus a model that measures distances in a suitable feature space, achieving invariance to irrelevant transformations together with sensitivity to local image statistics. However, feature representations are contractive: many images, including fake images, can map to the same feature vector. The overall solution is to add a natural image prior and additional losses. (Many papers in semi-supervised learning and image generation follow this pattern: first present an intuitive phenomenon or problem, then give one's own analysis; even when that analysis identifies essentially the same cause as the works it references, it is phrased a bit differently, so the analysis offers something new, while the fix in the end still comes down to modifying or adding losses.)

Key sentences:

1. Instead of computing distances in the image space, we compute distances between image features extracted by deep neural networks. This metric reflects perceptual similarity of images much better and, thus, leads to better results.

2. We demonstrate two examples of use cases of the proposed loss: (1) networks that invert the AlexNet convolutional network; (2) a modified version of a variational autoencoder that generates realistic high-resolution random images.

3. The precise location of all details is not preserved in the features. A loss in image space leads to averaging all likely locations of details, hence the reconstruction looks blurry. (analysis of the phenomenon) However, exact locations of all fine details are not important for perceptual similarity of images. What is important is the distribution of these details.
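The averaging argument can be made concrete with a toy example of my own (not from the paper): a 1-D "image" whose single sharp detail may sit at either of two equally likely positions. The reconstruction that minimizes expected pixel-wise L2 loss is the mean of both possibilities, which smears the detail across both locations — exactly the blur the paper describes.

```python
# Toy illustration: why an L2 loss in image space produces blur.
# A sharp detail (value 1.0) appears at position 1 in one plausible image
# and at position 3 in another, equally likely one.
img_a = [0.0, 1.0, 0.0, 0.0, 0.0]
img_b = [0.0, 0.0, 0.0, 1.0, 0.0]

def l2(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# The reconstruction minimizing expected L2 loss over both images is their
# pixel-wise mean: the detail is spread over both candidate locations.
mean_img = [(a + b) / 2 for a, b in zip(img_a, img_b)]
print(mean_img)  # [0.0, 0.5, 0.0, 0.5, 0.0]

# The blurry mean beats either sharp guess in expected L2 loss:
expected_loss_mean = (l2(mean_img, img_a) + l2(mean_img, img_b)) / 2   # 0.5
expected_loss_sharp = (l2(img_a, img_a) + l2(img_a, img_b)) / 2        # 1.0
print(expected_loss_mean, expected_loss_sharp)
```

So under an image-space L2 objective, the blurry average is strictly optimal, even though it looks like neither plausible image.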

4. Our main insight is that invariance to irrelevant transformations and sensitivity to local image statistics can be achieved by measuring distances in a suitable feature space. (the core idea of the solution) The newer semi-supervised papers I have read so far all tend to work in the latent space; does that correspond to feature representation? In the end, finding effective features still seems to play a crucial role in vision, and CNNs are as effective as they are largely because of their effective feature selection.

5. In fact, convolutional networks provide a feature representation with desirable properties. They are invariant to small, smooth deformations but sensitive to perceptually important image properties, like salient edges and textures. (Downsampling itself contributes invariance to small local shifts and deformations.)
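A minimal sketch (my own toy example, not the paper's) of why downsampling tolerates small shifts: max-pooling over non-overlapping windows yields the same feature for an edge and its one-pixel translate, while still responding differently to a perceptually different signal.

```python
# Toy example: max-pooling (a downsampling step) absorbs a 1-pixel shift.
def max_pool(x, k=2):
    """Max over non-overlapping windows of size k."""
    return [max(x[i:i + k]) for i in range(0, len(x), k)]

edge         = [0, 1, 0, 0, 0, 0, 0, 0]
edge_shifted = [1, 0, 0, 0, 0, 0, 0, 0]   # same edge, shifted by one pixel
different    = [0, 0, 0, 0, 0, 1, 0, 0]   # perceptually different signal

print(max_pool(edge))          # [1, 0, 0, 0]
print(max_pool(edge_shifted))  # [1, 0, 0, 0]  -> identical feature
print(max_pool(different))     # [0, 0, 1, 0]  -> different feature
```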

6. Since feature representations are typically contractive, feature similarity does not automatically mean image similarity. (This explanation resembles the usual account of missing modes in GANs: while G is still weak, most generated images may be steered toward regions where D assigns high value, so many inputs collapse onto similar modes, and the resulting lack of diversity makes the generated images blurry.) In practice this leads to high-frequency artifacts. To force the network to generate realistic images, we introduce a natural image prior based on adversarial training, as proposed by Goodfellow et al. [1]. We train a discriminator network to distinguish the output of the generator from real images based on local image statistics. A combination of similarity in an appropriate feature space with adversarial training yields the best results.
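The combined objective described above can be sketched as a weighted sum of the three terms (names and weights here are illustrative placeholders, not the paper's actual hyperparameters):

```python
# Hedged sketch of the combined objective: image-space loss + feature-space
# loss + adversarial loss. loss_img / loss_feat / loss_adv stand in for the
# already-computed values of the three terms.
def combined_loss(loss_img, loss_feat, loss_adv,
                  w_img=1.0, w_feat=1.0, w_adv=1.0):
    """L = w_img * L_img + w_feat * L_feat + w_adv * L_adv."""
    return w_img * loss_img + w_feat * loss_feat + w_adv * loss_adv

print(combined_loss(0.25, 0.5, 0.25))  # 1.0
```

The weights let the prior (adversarial term) rein in the high-frequency artifacts that the feature term alone would permit.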

7. These go beyond simple distances in image space and can capture complex and perceptually important properties of images.

8. Loss in feature space. Given a differentiable comparator C, the feature loss compares C applied to the generated and target images. C may be fixed or may be trained; for example, it can be a part of the generator or the discriminator. L_feat alone does not provide a good loss for training. It is known (Mahendran & Vedaldi, 2015) that optimizing just for similarity in the feature space typically leads to high-frequency artifacts. This is because for each natural image there are many non-natural images mapped to the same feature vector (unless the feature representation is specifically designed to map natural and non-natural images far apart, such as the one extracted from the discriminator of a GAN). Therefore, a natural image prior is necessary to constrain the generated images to the manifold of natural images.
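A minimal sketch of the feature-space loss, with a hypothetical toy comparator of my own standing in for a deep feature extractor (in the paper, C can be a fixed network, part of the generator, or the discriminator):

```python
# Sketch: L_feat = || C(x) - C(y) ||^2 for a toy comparator C.
def comparator(img):
    # Hypothetical stand-in for a deep feature extractor: horizontal
    # differences, a crude "edge" feature (invariant to brightness offset).
    return [b - a for a, b in zip(img, img[1:])]

def l_feat(x, y):
    cx, cy = comparator(x), comparator(y)
    return sum((a - b) ** 2 for a, b in zip(cx, cy))

x = [0.0, 0.25, 0.75, 0.75]
y = [0.125, 0.375, 0.875, 0.875]  # same structure, uniform brightness offset
print(l_feat(x, y))  # 0.0 -> identical features despite different pixels
```

Note the sketch also exhibits the contractive property the passage warns about: x and y are different images with identical features, which is exactly why L_feat alone cannot pin the generator to natural images and a prior is needed.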
