Image inpainting with deep learning

This article is reprinted from https://blog.csdn.net/stdcoutzyx/article/details/63686825, thanks to the original author!

The problem of image inpainting is to restore the missing parts of an image based on the information that remains.

Intuitively, whether this problem can be solved depends on the image. The key to restoration lies in exploiting the remaining information: if the remaining image contains patches similar to the missing region, the problem reduces to deciding which of those patches the missing part should be filled with. This is the basic idea behind the popular PatchMatch algorithm.

Since the emergence of CNN, there have been several important developments:

  • The higher layers of a CNN have been shown to capture abstract, semantic information about an image.
  • The emergence of perceptual loss showed that the feature maps of a trained CNN can serve as a powerful auxiliary loss for image generation.
  • GANs can enhance generative networks by training them, in a supervised fashion, against a discriminator. Why this works is not yet well interpretable, but it can be understood as the generative network learning the regularities of the data in an indirect way.

Based on the above three advances, reference [1] proposes a CNN-based image restoration method.

CNN network structure

The algorithm uses two networks: a content generation network and a texture generation network. The content network directly generates an image, inferring plausible content for the missing region. The texture network is then used to refine the texture of the content network's output: the completed image and the original (non-missing) image are both fed into the texture network, and a loss, denoted Loss NN, is computed on the feature maps of a chosen layer.

The content generation network must be trained on its own data, while the texture generation network reuses an already trained VGG Net. Generating an image can then be divided into the following steps (sketched in code after the list):

Let x0 denote an image with a missing region. Then:

  • Feed x0 into the content generation network to obtain a generated image x.
  • Use x as the initial value of the final image.
  • Keeping the parameters of the texture generation network fixed, run gradient descent on x against Loss NN to obtain the final result.
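
A minimal PyTorch-style sketch of this inference loop, assuming a pretrained `content_net` and a `loss_nn` function combining the three terms defined later; all names here are illustrative, not the paper's code:

```python
import torch

def inpaint(x0, mask, content_net, loss_nn, steps=1000, lr=0.01):
    """Optimize the image itself; all network weights stay frozen."""
    with torch.no_grad():
        x_init = content_net(x0)                 # step 1: infer plausible content
    x = x_init.clone().requires_grad_(True)      # step 2: x is the optimization variable
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):                       # step 3: gradient descent on x
        optimizer.zero_grad()
        loss = loss_nn(x, x0, mask)              # Loss NN, defined in the sections below
        loss.backward()
        optimizer.step()
    return x.detach()
```

The key point is that the optimization variable is the image x itself, not any network parameter.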

The training of the content generation network and the definition of Loss NN are explained below, one by one.

Content Generation Network

The structure of the generation network is shown above. Its loss function combines an L2 loss with an adversarial loss; the latter originates from generative adversarial networks (GANs), as sketched below.
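
As a rough sketch of this combined objective: `d_fake` is assumed to be a discriminator's logits on the generated image, and `adv_weight` is an illustrative weighting, not a value from the paper:

```python
import torch
import torch.nn.functional as F

def content_loss(x_pred, x_gt, d_fake, adv_weight=0.001):
    """L2 reconstruction loss plus an adversarial term for the generator."""
    l2 = F.mse_loss(x_pred, x_gt)
    # the generator tries to make the discriminator output "real" (label 1)
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    return l2 + adv_weight * adv
```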

In this generative network, two changes were made to stabilize training (a skeleton illustrating both appears after the list):

  • All ReLU/leaky-ReLU activations are replaced with ELU layers.
  • A fully-connected layer is used instead of a channel-wise fully-connected layer.
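
A minimal skeleton reflecting both choices, with placeholder layer sizes rather than the paper's exact configuration:

```python
import torch.nn as nn

class ContentNet(nn.Module):
    """Encoder-bottleneck-decoder generator; sizes are placeholders."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                     # 3x64x64 -> 256x4x4
            nn.Conv2d(3, 64, 4, 2, 1), nn.ELU(),          # ELU instead of (leaky-)ReLU
            nn.Conv2d(64, 128, 4, 2, 1), nn.ELU(),
            nn.Conv2d(128, 256, 4, 2, 1), nn.ELU(),
            nn.Conv2d(256, 256, 4, 2, 1), nn.ELU(),
        )
        # plain fully-connected bottleneck instead of a channel-wise FC layer
        self.fc = nn.Linear(256 * 4 * 4, 256 * 4 * 4)
        self.decoder = nn.Sequential(                     # 256x4x4 -> 3x64x64
            nn.ConvTranspose2d(256, 256, 4, 2, 1), nn.ELU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ELU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ELU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, x):
        h = self.encoder(x)
        h = self.fc(h.flatten(1)).view_as(h)              # FC bottleneck
        return self.decoder(h)
```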

Texture Generation Network

The Loss NN of the texture generation network is divided into three parts: a pixel-wise Euclidean distance, a perceptual loss computed on a feature layer of the pre-trained texture network, and a TV loss for smoothing:

Loss_NN(x) = L_pixel(x) + α·L_perceptual(x) + β·L_TV(x)

where α and β are both set to 5e-6.

The pixel-wise Euclidean distance keeps the optimized image close to the content network's prediction x_c:

L_pixel(x) = ||x − x_c||²

The TV (total variation) loss penalizes differences between neighboring pixels:

L_TV(x) = Σ_{i,j} ( (x_{i,j+1} − x_{i,j})² + (x_{i+1,j} − x_{i,j})² )
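
A small sketch of the pixel and TV terms (the perceptual term is discussed next); `x_content`, the content network's output, is an assumed name:

```python
def pixel_loss(x, x_content):
    """Pixel-wise Euclidean distance to the content network's prediction;
    both arguments are torch tensors of shape (N, C, H, W)."""
    return ((x - x_content) ** 2).sum()

def tv_loss(x):
    """Total-variation smoothness over an image tensor of shape (N, C, H, W)."""
    dh = (x[:, :, 1:, :] - x[:, :, :-1, :]) ** 2   # vertical neighbor differences
    dw = (x[:, :, :, 1:] - x[:, :, :, :-1]) ** 2   # horizontal neighbor differences
    return dh.sum() + dw.sum()
```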

The calculation of the perceptual loss is more involved. It borrows the idea of PatchMatch: find the patch most similar to each part of the missing region. To do so, the missing region is divided into fixed-size patches that serve as queries, and the known region is divided into patches of the same size, forming a dataset PATCHES. When matching a query against PATCHES, the distance is computed on the activations of a layer of the texture generation network rather than on raw pixel values.

However, the nearest-neighbor patch search does not look differentiable, so how can gradients flow through it? Similarly to MRF+CNN, the feature map of every patch in PATCHES is first extracted and assembled into the filters of a new convolutional layer; the query's feature map is then fed through this layer, so the most similar patch yields the maximum activation, and a max-pooling layer picks out that maximum. In this way, backpropagation becomes possible, as sketched below.
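
A hedged sketch of this trick, assuming `query_feat` holds the features of the missing region and `patch_feats` the feature patches of the known region (illustrative names); filters are L2-normalized so that the convolution response acts as a similarity score:

```python
import torch.nn.functional as F

def nearest_patch_similarity(query_feat, patch_feats):
    """query_feat: (1, C, H, W) features of the missing region.
    patch_feats: (P, C, k, k) feature patches cut from the known region,
    used as convolution filters so the matching stays differentiable."""
    # L2-normalize each filter so a larger response means a more similar patch
    norms = patch_feats.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    filters = patch_feats / (norms + 1e-8)
    scores = F.conv2d(query_feat, filters)   # (1, P, H', W'): similarity to every patch
    # max over the patch dimension selects the nearest neighbor per location
    return scores.max(dim=1).values.sum()
```

In the full loss one would minimize the distance to (or the negative similarity with) the selected patches; the scalar here only shows how the gradient path through the max is obtained.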

Application on HD images

Applying this algorithm directly to high-resolution images does not work well, so a stacked, iterative scheme is used to obtain a better initialization. The high-resolution image is first down-scaled into several levels [1, 2, 3, …, S], where level S is the original image itself. At level 1, the missing region is initialized with the image mean and inpainted; that result is then used to initialize the input at the next level, and so on up to level S, as sketched below.
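
A minimal sketch of this coarse-to-fine loop, reusing the `inpaint` routine sketched earlier; interpolation modes and level count are illustrative:

```python
import torch.nn.functional as F

def multiscale_inpaint(x0, mask, content_net, loss_nn, num_levels=3):
    """Coarse-to-fine: inpaint the coarsest scale first, then use each
    result to initialize the next, finer scale (level S = full resolution)."""
    sizes = [x0.shape[-1] // 2 ** (num_levels - 1 - s) for s in range(num_levels)]
    x = None
    for size in sizes:
        x0_s = F.interpolate(x0, size=(size, size), mode='bilinear', align_corners=False)
        mask_s = F.interpolate(mask, size=(size, size), mode='nearest')
        if x is None:
            # level 1: fill the hole with the mean of the known pixels
            known = 1 - mask_s
            mean = (x0_s * known).sum(dim=(2, 3), keepdim=True)
            mean = mean / known.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
            init = x0_s * known + mean * mask_s
        else:
            # later levels: initialize from the up-scaled previous result
            init = F.interpolate(x, size=(size, size), mode='bilinear', align_corners=False)
        x = inpaint(init, mask_s, content_net, loss_nn)   # the loop sketched earlier
    return x
```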

Results

The figure above shows, from top to bottom: the input with missing regions, PatchMatch, Context Encoder (GAN + L2), and this algorithm.

The role of the content generation network

It acts as a content constraint. The figure above compares results with and without the content generation network: with it, the completed region agrees better with the content of the original image.

Applications

Semantic image editing. From left to right: the original image, the image with a region removed, the PatchMatch result, and the result of this algorithm.

Clearly, although the method cannot recover the true original image, it can complete the picture into a plausible whole. So when an unwanted object or person wanders into the frame while shooting, the photo can still be repaired into a complete picture.

Summary

With the rapid development of CNNs, images can be handled in an increasingly semantic way. With image restoration like the above as a foundation, one can give free rein to the imagination. For example: if something is added to an image but its lighting and color clearly do not match, the texture network could be used to blend it in.

The drawbacks of this method are also obvious:

  • Performance and memory cost.
  • Only patches from within the image itself are used; data from the rest of the dataset is not exploited.

References

[1] Yang C, Lu X, Lin Z, et al. High-Resolution Image Inpainting using Multi-Scale Neural Patch Synthesis. arXiv preprint arXiv:1611.09969, 2016.
