Disentangled Image Matting

Paper link: https://arxiv.org/abs/1909.04686
Published at: ICCV 2019
1. Background
Most previous image matting methods require a roughly specified trimap as input and estimate alpha values for all pixels in the trimap's unknown region.
Estimating the alpha matte directly from such a rough trimap is the main limitation of these methods: it ignores the imprecision of the input trimap.
2. Viewpoints of the article
This article argues that trimap-based image matting contains a classification problem that has not been fully addressed.
Looking at a trimap carefully, the pixels in its unknown region fall into three sets: opaque foreground, opaque background, and a semi-transparent area. The first two categories are called opaque pixels; the last is called mixed pixels.
The desired behavior of a matting method is to produce exact 0s and 1s for opaque pixels, while accurately estimating the fractional opacity (strictly between 0 and 1) of the mixed pixels.
Matting therefore comprises two related but distinct tasks. The first is to classify the pixels in the unknown region in order to identify the mixed pixels, which this article calls trimap adaptation; the second is to accurately compute the opacity of the mixed pixels, which this article calls alpha estimation.
In response, the article argues that matting cannot be treated as a single regression task, as previous methods did. Since matting decomposes into trimap adaptation and alpha estimation, the article proposes a simple but powerful matting framework called AdaMatting (Adaptation and Matting), which handles the two tasks in two different decoder branches via multi-task learning.
3. Trimap adaptation
Let us first look at the proposed trimap adaptation, then move on to the article's overall framework.
Trimap adaptation aims to predict the optimal trimap T_opt; visually, it separates the semi-transparent area from the opaque foreground and background. The article defines the optimal trimap with the following formula:
T_opt(i) = F (foreground) if α_i = 1; B (background) if α_i = 0; U (unknown) if 0 < α_i < 1
According to the definition of T_opt, matting naturally divides into two steps:
1) determine whether α is exactly 0, exactly 1, or neither;
2) if the pixel lies in the semi-transparent area, compute α accurately.
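The definition of T_opt above can be made concrete with a short sketch that derives the optimal trimap from a ground-truth alpha matte (the class encoding 0 = background, 1 = unknown, 2 = foreground is an assumption for illustration, not taken from the paper):

```python
import numpy as np

def optimal_trimap(alpha):
    """Derive the optimal trimap T_opt from an alpha matte.

    Pixels with alpha exactly 1 are opaque foreground, pixels with
    alpha exactly 0 are opaque background, and everything strictly
    in between is the semi-transparent (unknown) region.
    """
    trimap = np.ones(alpha.shape, dtype=np.uint8)  # default: unknown (1)
    trimap[alpha == 0.0] = 0                       # opaque background
    trimap[alpha == 1.0] = 2                       # opaque foreground
    return trimap
```

For example, `optimal_trimap(np.array([[0.0, 0.3], [1.0, 0.9]]))` yields `[[0, 1], [2, 1]]`: only the pixels with fractional alpha remain unknown.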
(Figure: an input trimap and the adapted trimap predicted by the network.)
In the figure, the unknown region of the input trimap is overly wide and partly wrong, because the low-quality annotation does not cover all of the hair. After trimap adaptation, the unknown region of the output trimap is not only narrower but also corrected, yielding a more reliable trimap.
4. Network
This article designs a fully end-to-end CNN model called AdaMatting. The figure below depicts AdaMatting: an encoder generates a shared representation, followed by two related decoders that solve trimap adaptation and alpha estimation, respectively. The trimap adaptation result and the intermediate alpha matte are then fed into a propagation unit to form the final alpha matte.
(Figure: the AdaMatting architecture — shared encoder, trimap and alpha decoders, and the propagation unit.)
The multi-task autoencoder consists of an encoder (ResNet-50) that generates a shared representation, followed by two decoders (similar in structure to U-Net) that learn the mapping from the shared representation to the desired outputs.
The t-decoder and a-decoder denote the trimap decoder and the alpha decoder; each decoder is composed of 3×3 convolutional layers and upsampling modules.
Guided by a cross-entropy loss, the trimap decoder uses the high-level features of the shared representation to output a 3-channel classification result. The alpha decoder uses the low-level features of the shared representation to output a 1-channel alpha estimate, which is fed into the propagation unit for further refinement.
The propagation unit consists of two residual blocks and a convolutional LSTM cell, and takes the input image, the adapted trimap, and the alpha estimate as input.
The residual blocks extract features from the input, and the convolutional LSTM cell keeps memory across propagation steps to form the final alpha matte.
The propagation unit gradually refines the predicted alpha matte, producing a final result with more accurate edge details and noticeably fewer unwanted artifacts.
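The way the adapted trimap constrains the alpha estimate can be sketched at the pixel level: the trimap classes pin opaque pixels to exactly 0 or 1, and only mixed pixels keep the regressed alpha. This is a simplified illustration of the fusion idea, not the actual ConvLSTM propagation; the class encoding 0 = background, 1 = unknown, 2 = foreground is an assumption.

```python
import numpy as np

def fuse_trimap_and_alpha(trimap, alpha_est):
    """Combine an adapted trimap with a raw alpha estimate.

    Opaque pixels are clamped to exact 0 (background class) or 1
    (foreground class); only pixels in the unknown class keep the
    regressed alpha value.
    """
    fused = np.where(trimap == 0, 0.0,
                     np.where(trimap == 2, 1.0, alpha_est))
    return np.clip(fused, 0.0, 1.0)
```

This clamping is why a more accurate adapted trimap directly translates into fewer artifacts: misclassified opaque pixels can no longer leak fractional alpha values into the final matte.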
5. Loss function
Trimap adaptation can be modeled as a segmentation task that partitions the input image into solid foreground, solid background, and a semi-transparent area. Solving this segmentation problem produces rich semantic features, which in turn help the alpha matte regression, so the two tasks are trained jointly with a multi-task loss.
The article uses a task-uncertainty loss instead of a linear combination of the two losses. The loss can be expressed as:
L = L_t / (2σ₁²) + L_α / (2σ₂²) + log σ₁² + log σ₂²
Here T̃ and α̃ denote the outputs of trimap adaptation and alpha estimation, σ₁ and σ₂ are dynamically adjusted task weights, and L_t and L_α are the trimap adaptation loss and the alpha estimation loss, respectively. More specifically, L_t is a cross-entropy loss, and L_α is an L1 loss computed only on the unknown region (denoted U):
L_α = (1 / |U|) Σ_{i∈U} |α̃_i − α_i|
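A minimal sketch of this uncertainty weighting, following the Kendall et al. homoscedastic-uncertainty formulation (parameterizing the learnable weights as log σ² for numerical stability is an implementation choice, not taken from the paper):

```python
import numpy as np

def task_uncertainty_loss(loss_t, loss_a, log_var_t, log_var_a):
    """Combine two task losses with learned task uncertainty.

    log_var_* are the learnable log(sigma^2) parameters. The factor
    1 / (2 * sigma^2) down-weights a task as its uncertainty grows,
    while the log-variance terms keep sigma from growing unboundedly.
    """
    w_t = 0.5 * np.exp(-log_var_t)
    w_a = 0.5 * np.exp(-log_var_a)
    return w_t * loss_t + w_a * loss_a + log_var_t + log_var_a

def alpha_l1_loss(alpha_pred, alpha_gt, unknown_mask):
    """L1 alpha loss averaged over the unknown region U only."""
    diff = np.abs(alpha_pred - alpha_gt)
    return diff[unknown_mask].mean()
```

With both log-variances at 0 (σ = 1), the combined loss reduces to a plain average of the two task losses; during training the network trades the weights off against the penalty terms automatically instead of relying on hand-tuned coefficients.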
6. Results
(1) Test results on the alphamatting.com dataset
(Table: quantitative results on the alphamatting.com benchmark.)
(2) Multi-task learning vs. a two-stage pipeline
A more intuitive approach would be to use two cascaded networks that solve trimap adaptation and image matting sequentially, rather than a single network trained in a multi-task manner. The article calls such a model Seq-AdaMatting.
(Table: comparison between AdaMatting and Seq-AdaMatting.)
The results show that multi-task learning outperforms the two-stage approach.

Origin blog.csdn.net/balabalabiubiu/article/details/114839048