Fast Deep Matting for Portrait Animation on Mobile Phone


Link to the paper: https://arxiv.org/pdf/1707.08289.pdf
Source: 2017 ACM
1. Content
This paper proposes a real-time automatic deep matting method for mobile devices, built from a segmentation block and a feathering block. Using dense blocks and dilated convolution, a lightweight fully convolutional network is designed to predict a coarse binary mask of the portrait image. A feathering block, which is edge-preserving and matting adaptive, is then developed to learn a guided filter and convert the binary mask into an alpha matte. Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices; it requires no interaction and achieves real-time matting at 15 fps.
2. Network
The input is a color image I and the output is an alpha matte α. The network consists of two stages. The first stage is a portrait segmentation network, which takes the image as input and produces a coarse binary mask. The second stage is a feathering module, which refines the foreground/background mask into the final alpha matte. The first stage uses a light fully convolutional network to quickly provide a coarse binary mask, and the second stage uses a single filter to refine that mask, which greatly reduces the error.
(Figure: overall architecture of the two-stage network)
(1) Segmentation module
For fast foreground segmentation, the paper proposes a light dense network as the segmentation block. The following figure shows its architecture.
(Figure: architecture of the light dense network)
The network has 6 convolutional layers and 1 max-pooling layer.
The initial block is composed of 3 × 3 convolution and max-pooling, which is used to down-sample the input image.
The dilated dense block contains four convolutional layers with different dilation rates (so each layer has a different field of view), connected densely (so foregrounds of different sizes are captured).
The outputs of the four convolutional layers are concatenated and fed to a final convolution to obtain binary feature maps.
Finally, the feature maps are interpolated to score maps of the same size as the original image.
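To see how dilation gives each layer a different field of view, the small script below computes the receptive field of a stack of dilated 3 × 3 convolutions. The dilation rates (1, 2, 4, 8) are illustrative assumptions, not the paper's exact configuration.

```python
# Receptive-field growth of stacked dilated 3x3 convolutions (stride 1).
# The dilation rates used below are illustrative assumptions, not the
# paper's exact configuration.

def receptive_field(kernel_size, dilations):
    """Return the receptive field after each dilated conv layer.

    A k x k conv with dilation d enlarges the receptive field by
    (k - 1) * d when the stride is 1.
    """
    rf = 1
    history = []
    for d in dilations:
        rf += (kernel_size - 1) * d
        history.append(rf)
    return history

if __name__ == "__main__":
    # Four dilated conv layers of the dense block.
    print(receptive_field(3, [1, 2, 4, 8]))  # -> [3, 7, 15, 31]
```

With exponentially increasing dilation rates the receptive field doubles roughly every layer, which is how a 6-layer network can still see large portraits.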
(2) Feathering block
The paper uses a feathering block to refine the coarse binary mask produced by the segmentation block and to smooth, at the pixel level, the gradient drift caused by the convolution operations.
① Structure
The inputs of the feathering block are the image I, the corresponding coarse binary mask S, the square of the image, and the product of the image and its binary mask.
(Figure: structure of the feathering block)
These inputs are concatenated and fed to a convolutional network with two 3 × 3 convolutional layers, which outputs three maps corresponding to the weights and bias applied to the binary mask.
② Feathering layer
The feathering layer can be expressed as a linear transformation of the coarse binary mask in each sliding window centered at a pixel:

α_i = a_k I_i S_i^F + b_k I_i S_i^B + c_k I_i I_i, ∀ i ∈ ω_k

where α is the output of the feathering layer (the alpha matte), S^F is the foreground score from the coarse binary mask, S^B is the background score, i is the position of the pixel, and (a_k, b_k, c_k) are linear coefficients assumed to be constant in the k-th sliding window ω_k. Therefore:

α_i = a_k F_i + b_k B_i + c_k q_i

where q_i = I_i · I_i, F_i = I_i · S_i^F, B_i = I_i · S_i^B, and I is the input image. Taking the derivative gives

∇α_i = a_k ∇F_i + b_k ∇B_i + c_k ∇q_i
This form guarantees that the feathering block is edge-preserving and matting adaptive. The feathering block behaves like ensemble learning: F, B, and I act as classifiers, and the parameters a, b, and c act as their classification weights. The two score maps S^F and S^B respond strongly in edge regions because of the uncertainty there. Once a, b, and c are well trained, the foreground and background score maps are allowed to have inaccurate responses; in that case we want a and b to be as small as possible, so that the inaccurate responses are suppressed. In other words, as long as the absolute values of a and b are kept small in the edge region and c is dominant, the feathering block preserves the edge.
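The feathering transform itself is just a few element-wise operations. Below is a minimal single-channel NumPy sketch, assuming the per-pixel coefficient maps a, b, c have already been predicted by the sub-network; the clipping to [0, 1] is an assumption of this sketch.

```python
import numpy as np

def feathering(image, score_fg, score_bg, a, b, c):
    """Apply the per-pixel feathering transform
    alpha_i = a_i * I_i * S_i^F + b_i * I_i * S_i^B + c_i * I_i^2.

    All arguments are H x W arrays (single channel for simplicity);
    a, b, c are the already-predicted coefficient maps.
    """
    q = image * image      # q_i = I_i^2 (image term)
    f = image * score_fg   # F_i = I_i * S_i^F (foreground term)
    g = image * score_bg   # B_i = I_i * S_i^B (background term)
    alpha = a * f + b * g + c * q
    # Clamp to a valid matte range (an assumption of this sketch).
    return np.clip(alpha, 0.0, 1.0)
```

Because every term is a product with the input image, the gradients of α follow the gradients of I, which is the edge-preserving property discussed above.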
When the linear model is applied to all sliding windows in the entire image, the value of α_i differs across windows. After computing (a_k, b_k, c_k) for all sliding windows in the image, the article averages all possible values of α_i:

α_i = (1/|ω|) Σ_{k : i ∈ ω_k} (a_k F_i + b_k B_i + c_k q_i) = ā_i F_i + b̄_i B_i + c̄_i q_i

where ā_i, b̄_i, c̄_i are the coefficients averaged over all windows ω_k that contain pixel i.
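Averaging the per-window coefficients at each pixel amounts to running a box filter over the coefficient maps. A minimal NumPy sketch follows; the window radius and replicate edge padding are assumptions of this sketch.

```python
import numpy as np

def box_average(coeff, radius=1):
    """Average a coefficient map over all (2*radius+1)^2 sliding
    windows covering each pixel (i.e. compute an averaged map such
    as a-bar from a).

    Edges use replicate padding so every pixel is covered by the
    same number of windows. Implemented with two separable
    cumulative-sum passes for O(H*W) cost.
    """
    k = 2 * radius + 1
    padded = np.pad(coeff, radius, mode="edge")
    # Vertical pass: sums of k consecutive rows.
    csum = np.cumsum(padded, axis=0)
    csum = np.vstack([np.zeros((1, padded.shape[1])), csum])
    rows = csum[k:, :] - csum[:-k, :]
    # Horizontal pass: sums of k consecutive columns.
    csum2 = np.cumsum(rows, axis=1)
    csum2 = np.hstack([np.zeros((rows.shape[0], 1)), csum2])
    sums = csum2[:, k:] - csum2[:, :-k]
    return sums / (k * k)
```

A constant coefficient map is unchanged by the averaging, and smooth maps stay smooth, which is exactly the regularizing effect the window averaging is meant to have.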
To determine the linear coefficients, the article designs a sub-network. Its loss function consists of two parts: the first loss L_α measures the alpha matte, as the absolute difference between the ground-truth and the predicted alpha value at each pixel; the second is a compositional loss (preserving as much of the input image's information as possible), the L2 norm loss on the predicted RGB foreground. The article therefore minimizes the following cost function:

L = L_α + L_c = Σ_i |α_i^gt − α_i| + Σ_i ‖ α_i^gt I_i − α_i I_i ‖₂
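A NumPy sketch of this two-part cost follows; per-pixel averaging and equal weighting of the alpha term and the foreground compositional term are assumptions of this sketch.

```python
import numpy as np

def matting_loss(alpha_pred, alpha_gt, image):
    """Two-part matting loss: an L1 alpha term plus an L2 loss on the
    RGB foreground alpha * I. Mean reduction and equal weighting of
    the two terms are assumptions of this sketch.

    alpha_*: H x W mattes in [0, 1]; image: H x W x 3 RGB.
    """
    # Absolute difference between predicted and ground-truth alpha.
    l_alpha = np.mean(np.abs(alpha_gt - alpha_pred))
    # L2 norm between the composited foregrounds, per pixel.
    fg_pred = alpha_pred[..., None] * image
    fg_gt = alpha_gt[..., None] * image
    l_comp = np.mean(np.linalg.norm(fg_pred - fg_gt, axis=-1))
    return l_alpha + l_comp
```

The compositional term penalizes alpha errors more heavily where the image is bright or colorful, which is what "preserving the information of the input image" means in practice.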
In fact, the feathering block can be intuitively interpreted as a kind of attention mechanism that assigns different attention to different parts of the image through its factors. In particular, from the example in the figure below, we can infer that factor a pays more attention to the body of the subject, factor b to the background, and factor c to the head. It can therefore be inferred that factors a and b emphasize the matting problem locally, while factor c considers it globally.
(Figure: visualization of the feathering weights a, b, and c)
(a) Input image. (b) The foreground of the original image, computed by Eq. (1). (c) The weight a_k of the feathering block in Eqs. (2) and (3). (d) The weight b_k in Eqs. (2) and (3). (e) The weight c_k in Eqs. (2) and (3).
3. Results
Quantitative results:
(Table: quantitative comparison of speed, gradient error, and MSE)
This paper compares the components of the proposed system with the state-of-the-art semantic segmentation networks DeepLab and PSPNet. The light dense network (LDN) greatly improves speed, and the feathering block (FB) reduces both the gradient error (Grad) and the mean squared error (MSE). In addition, the feathering block performs better than the guided filter (GF).
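For reference, the two reported matte metrics can be computed as follows. The gradient error here uses plain finite differences, which is an assumption about the exact operator used by the benchmark.

```python
import numpy as np

def mse(alpha_pred, alpha_gt):
    """Mean squared error between predicted and ground-truth mattes."""
    return np.mean((alpha_pred - alpha_gt) ** 2)

def gradient_error(alpha_pred, alpha_gt):
    """Mean squared difference between the matte gradients.

    Plain finite differences (np.gradient) are used here; the exact
    gradient operator in the paper's benchmark may differ.
    """
    gy_p, gx_p = np.gradient(alpha_pred)
    gy_g, gx_g = np.gradient(alpha_gt)
    return np.mean((gy_p - gy_g) ** 2 + (gx_p - gx_g) ** 2)
```

MSE measures overall matte accuracy, while the gradient error is sensitive to exactly the edge regions the feathering block is designed to fix.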
Visual results:
(Figure: visual comparison of matting results)
(a) Original image. (b) Ground-truth foreground. (c) The foreground computed from the binary mask. (d) The foreground computed from the binary mask, refined with a guided filter. (e) The foreground computed from the binary mask, refined with the feathering block proposed in this paper.

Origin blog.csdn.net/balabalabiubiu/article/details/115069173