Double-DIP: an AI technique for unsupervised image layer decomposition

Lei Feng Network AI Technology Review editor's note: Every month, Computer Vision News selects one paper from the field of computer vision to review. In March of this year, they chose the paper on the Double-DIP model by Yossi Gandelsman, Assaf Shocher and Michal Irani (referred to below as "the authors"), which details an unsupervised layer decomposition technique for a single image, based on coupled Deep Image Prior networks.

Overview

Many seemingly unrelated computer vision tasks can be viewed as special cases of decomposing an image into separate layers. Two prominent examples: image segmentation, which separates an image into a foreground layer and a background layer; and image dehazing, which separates it into a haze layer and a clear image layer. In this paper, the authors propose a unified framework for unsupervised layer decomposition of a single image, based on coupled "Deep Image Prior" (DIP) networks.

The Deep Image Prior (DIP) network, presented at CVPR 2018, showed that the structure of a generator network is enough to capture the low-level statistics of a single image, and that it only needs to be trained on that one image. In the paper, the authors show that coupling several DIP networks yields a powerful tool for decomposing an image into its basic components, which makes it suitable for all kinds of tasks. This versatility stems from the fact that the internal statistics of a mixture of layers are more complex than the statistics of each of its individual components. The authors argue that the model can handle such a variety of tasks because the internal statistics of each separate layer are simpler, and therefore easier to characterize robustly, than those of the mixed image.

The authors demonstrate the method on a variety of computer vision tasks, such as watermark removal, foreground/background segmentation, image dehazing, and transparency separation in video. All of these tasks are completed without any additional data; training uses only the single input image.

A unified framework for image decomposition

The authors recast three different tasks as decompositions into a simple mixture of basic layers. As shown in Figure 1, image segmentation, image dehazing, and transparency separation can each be viewed as first decomposing the original image into basic layers and then re-mixing those layers.

The method decomposes an image into several basic layers and provides a unified framework for a large number of seemingly distinct and independent computer vision tasks. What all of these decompositions have in common is that the distribution of small patches within each individual layer is "simpler" (more uniform) than in the "mixed" image (i.e., the original image), which leads to strong internal similarity within each layer. It has been shown that the statistics (distribution) of small patches (e.g., 5 × 5 or 7 × 7) recur strongly within a natural image, and this strong internal recurrence is very useful for solving a variety of computer vision tasks.

Figure 1: A unified framework for image decomposition

The authors' approach combines the power of internal patch recurrence (the recurrence of small patches within an image, which makes it possible to solve tasks without supervision) with the power of deep learning, in an unsupervised framework built on DIP networks. Given random noise as input, a DIP network learns to reconstruct a single image (that image being its only training input), and a single DIP network has been shown to capture the low-level statistics of a single natural image well. The network has also been shown to solve problems such as denoising, inpainting, and super-resolution without supervision.

The basic principle of image decomposition

Figure 2: The basic principle of image decomposition

Figure 2 illustrates the basic principle of the method. It shows how two patterns, X and Y, are mixed to produce a new, more complex image Z. The distribution of small patches in each "pure" pattern (X and Y) is simpler than the distribution of small patches in the mixture Z. It is well known that if X and Y are two independent random variables, then the entropy of their sum Z = X + Y is larger than the entropy of each of them individually.
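The entropy statement can be checked numerically on a toy case. The sketch below uses two made-up discrete distributions as stand-ins for the patterns X and Y (the values and probabilities are hypothetical, not taken from the paper) and computes the entropy of each and of their sum. It only illustrates the inequality; it is not the patch-distribution analysis the authors performed.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a discrete distribution {value: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def sum_distribution(pmf_x, pmf_y):
    """Distribution of Z = X + Y for independent X and Y (discrete convolution)."""
    pmf_z = defaultdict(float)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            pmf_z[x + y] += px * py
    return dict(pmf_z)

# Hypothetical toy "patterns": each takes only a few intensity values.
pmf_x = {0: 0.5, 1: 0.5}
pmf_y = {0: 0.25, 2: 0.5, 4: 0.25}
pmf_z = sum_distribution(pmf_x, pmf_y)

h_x, h_y, h_z = entropy(pmf_x), entropy(pmf_y), entropy(pmf_z)
print(f"H(X)={h_x:.3f}  H(Y)={h_y:.3f}  H(X+Y)={h_z:.3f}")
```

In this example the mixture has more distinct outcomes than either component, so its entropy exceeds both individual entropies.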

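The loss plot in Figure 2 also shows the MSE reconstruction loss of a single DIP network as a function of time (training iterations). There are three curves: (i) orange is the MSE loss when training to reconstruct texture image X; (ii) blue is the MSE loss when training to reconstruct texture Y; (iii) green is the MSE loss when training to reconstruct the mixed image X+Y. The larger the MSE loss, the longer convergence takes. Moreover, the MSE loss of the mixed image is not only larger than the loss of each individual image; it is in fact larger than the sum of the two individual losses.

To show that this is not a coincidence, the authors repeated the experiment on 100 randomly chosen pairs of natural images from the BSD100 dataset (to rule out any difference between natural images and regular patterns). The results show that the gap in MSE loss between the mixed images and their component images is even larger.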

How the decomposition model works

Figure 3: The image decomposition model

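Figure 3 details how Double-DIP decomposes an image. Two Deep Image Prior networks (DIP1 and DIP2) decompose the input image into its layers (y1 and y2), which are then recombined according to a binary mask m(x) to form a reconstructed image Î that is as close as possible to the input image I.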

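What makes a good image decomposition? There are many ways to decompose an image into basic layers, but the authors propose that a meaningful decomposition should satisfy the following criteria:

When recombined, the recovered layers should reconstruct the input image

Each layer should be as "simple" as possible, i.e., it should have strong internal self-similarity of its image patches

The recovered layers should be independent of each other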

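These three criteria also guide the concrete design of the Double-DIP network. The first criterion is enforced by minimizing a reconstruction loss (which measures the error between the reconstructed image and the input image); the second by employing multiple DIPs (one per layer); the third by an "exclusion loss" between the outputs of the different DIPs (which minimizes their correlation).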

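Each DIP network reconstructs a different layer yi of the input image I, and the input to each DIPi is randomly sampled uniform noise zi. The DIP outputs yi = DIPi(zi) are mixed using a weight mask m(x) to produce the reconstructed image: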

$$\hat{I}(x) = m(x)\,y_1(x) + (1 - m(x))\,y_2(x)$$

which should be as close as possible to the input image I.
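A minimal sketch of this mixing step follows, assuming the reconstruction is the per-pixel blend m(x)·y1(x) + (1 − m(x))·y2(x), where each layer would in practice be the output of a DIP network. The function name `blend_layers` and the nested-list image representation are illustrative choices, not part of the authors' code.

```python
def blend_layers(y1, y2, mask):
    """Per-pixel mix of two layers: out = m * y1 + (1 - m) * y2.

    y1, y2 and mask are 2D lists of floats with identical shapes;
    mask values are expected to lie in [0, 1].
    """
    return [
        [m * a + (1.0 - m) * b for a, b, m in zip(row1, row2, row_m)]
        for row1, row2, row_m in zip(y1, y2, mask)
    ]

# Toy 2x2 layers standing in for the two DIP outputs (hypothetical values).
y1 = [[1.0, 1.0], [1.0, 1.0]]
y2 = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1.0, 0.0], [0.5, 0.0]]

recon = blend_layers(y1, y2, mask)
print(recon)
```

Where the mask is 1 the output copies y1, where it is 0 the output copies y2, and intermediate values give a weighted average of the two.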

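For some tasks the weight mask m is very simple, while for others it has to be learned (using an additional DIP network). The learned mask m may be uniform or spatially varying, continuous or binary. The constraints on m are task-dependent and are enforced with a task-specific "regularization loss". The optimization loss is therefore: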

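$$\mathrm{Loss} = \mathrm{Loss}_{\mathrm{Reconst}} + \alpha \cdot \mathrm{Loss}_{\mathrm{Excl}} + \beta \cdot \mathrm{Loss}_{\mathrm{Reg}}$$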
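The combination of the three terms can be sketched as follows, assuming the total objective is a weighted sum of the reconstruction, exclusion, and regularization losses. The weight names `alpha` and `beta` and their values are assumptions for illustration. The exclusion and regularization terms are passed in as precomputed numbers, because their exact form is task-dependent and is not spelled out in this article.

```python
def mse(a, b):
    """Mean squared error between two 2D lists of equal shape."""
    diffs = [(p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def total_loss(recon, target, excl, reg, alpha=1.0, beta=1.0):
    """Weighted sum: reconstruction MSE + alpha * exclusion + beta * regularization.

    excl and reg are scalar values computed elsewhere; alpha and beta
    are hypothetical weights, not values reported by the authors.
    """
    return mse(recon, target) + alpha * excl + beta * reg

target = [[0.2, 0.8], [0.5, 0.1]]
recon = [[0.2, 0.6], [0.5, 0.1]]  # differs from target in one pixel by 0.2

loss = total_loss(recon, target, excl=0.2, reg=0.1, alpha=0.5, beta=2.0)
print(round(loss, 4))
```

Keeping the three terms separate makes it easy to swap in a different regularizer per task while leaving the reconstruction and exclusion parts unchanged.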

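Training and optimization of the Double-DIP network are similar to those of the basic DIP. Adding extra non-constant noise perturbations to the input noise increases the stability of the reconstruction. The training set can be further enriched by transforming the input image I and the corresponding random noise inputs of all the DIPs with 8 transformations (4 rotations by 90° and 2 mirror reflections, vertical and horizontal).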
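One common way to obtain eight such variants is to take the four 90° rotations, each with and without a horizontal mirror (a vertical mirror is then the 180° rotation followed by the horizontal mirror). The sketch below does this for a 2D grid and applies the same transformation to the image and to a noise input. Whether this matches the authors' exact enumeration is an assumption, and the helper names are illustrative.

```python
def rot90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_h(grid):
    """Mirror a 2D list left-right."""
    return [row[::-1] for row in grid]

def eight_transforms(grid):
    """Return the 8 rotation/reflection variants of a 2D grid."""
    variants = []
    current = [list(row) for row in grid]
    for _ in range(4):
        variants.append(current)          # rotation only
        variants.append(flip_h(current))  # rotation followed by mirror
        current = rot90(current)
    return variants

def augment_pair(image, noise):
    """Pair each transformed image with the identically transformed noise."""
    return list(zip(eight_transforms(image), eight_transforms(noise)))

image = [[1, 2], [3, 4]]
noise = [[0.1, 0.2], [0.3, 0.4]]
pairs = augment_pair(image, noise)
print(len(pairs))
```

Transforming the noise together with the image keeps the spatial correspondence between each network input and its target.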

Optimization uses the ADAM optimizer and takes only a few minutes per image on a Tesla V100 GPU.

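Results

Of the several results presented in the paper, we focus on the following:

1) Foreground/background segmentation

2) Watermark removal

Foreground/background segmentation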

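Image segmentation can be viewed as decomposing the image into a foreground layer y1 and a background layer y2, combined at every pixel according to a binary mask m(x), which gives: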

$$I(x) = m(x)\,y_1(x) + (1 - m(x))\,y_2(x)$$

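This formulation fits the framework described in the paper well: it defines a "good image segment" as one that is easy to compose from its own parts but hard to compose from other parts of the image. To push the segmentation mask m(x) toward binary values, the following regularization loss is used: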

$$\mathrm{Loss}_{\mathrm{Reg}}(m) = \Big(\sum_x |m(x) - 0.5|\Big)^{-1}$$
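A regularizer of this kind can be sketched as below, assuming it takes the form of the inverse of the summed distance of the mask values from 0.5, so that masks whose values sit near 0 or 1 are penalized less than masks hovering around 0.5. The small `eps` term is an addition for numerical safety when every value is exactly 0.5, and the function name is hypothetical.

```python
def mask_reg_loss(mask, eps=1e-8):
    """Inverse of the total distance of mask values from 0.5.

    A near-binary mask has a large total distance and thus a small loss;
    eps (an added safeguard, not from the paper) avoids division by zero.
    """
    total = sum(abs(m - 0.5) for row in mask for m in row)
    return 1.0 / (total + eps)

binary_mask = [[0.0, 1.0], [1.0, 0.0]]
soft_mask = [[0.4, 0.6], [0.5, 0.55]]

print(mask_reg_loss(binary_mask) < mask_reg_loss(soft_mask))
```

Minimizing this term during training therefore nudges each mask value away from 0.5 and toward one of the two extremes.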

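Double-DIP obtains high-quality segmentations from unsupervised layer decomposition, as shown in Figure 4; more segmentation results can be viewed on the project website. Although many other segmentation methods (including semantic segmentation) perform even better than Double-DIP, they all share one drawback: they need large amounts of training data.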

Figure 4: Image segmentation examples

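Watermark removal

Watermarks are widely used to protect copyrighted images and videos. Double-DIP removes a watermark by treating it as a special case of image reflection, where layers y1 and y2 are the cleaned image and the watermark, respectively.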

Unlike in image segmentation, the mask is not explicitly provided in this case; instead, one of two practical solutions is used to resolve the ambiguity inherent in the transparent layers. If only a single watermarked image is available, the user supplies a bounding box around the watermarked area; if a few images sharing the same watermark are available (typically 2-3 images), the ambiguity is resolved during training. Figure 5 shows some examples of watermark removal:

Figure 5: Watermark removal examples

Conclusion

"Double-DIP" provides a unified framework for unsupervised layer decomposition that can be applied to a wide variety of tasks. It requires no training data other than the input image or video. Although it is a general-purpose approach, on certain tasks (such as dehazing) its results are comparable to, or even better than, those of state-of-the-art methods specialized for that task. The authors believe that augmenting Double-DIP with semantic/perceptual cues could lead to progress in semantic segmentation and other high-level computer vision tasks, and they intend to pursue this in future work.

Related links compiled by Lei Feng Network AI Technology Review:

Original paper:

https://arxiv.org/abs/1812.00467

Original magazine article:

https://www.rsipvision.com/ComputerVisionNews-2019May/4/

Reproduced from: https://www.jianshu.com/p/9821e44897a9

Origin blog.csdn.net/weixin_33862041/article/details/91170278