Understanding the Spectral Matting Method

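"Spectral Matting" [1] is a 2008 paper by Anat Levin, who published another paper that same year, "A Closed-Form Solution to Natural Image Matting" [2] (discussed in my previous blog post). "Semantic Soft Segmentation" [3], published in 2018, also comes from MIT CSAIL and follows the same line of work: [3] adopts the Spectral Matting method of [1] and combines traditional and deep-learning methods to produce soft pixel labels on complex images that are accurate down to individual hairs. This post focuses on the method in [1]; a later post will discuss [3].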
Matting algorithms typically assume:

$\mathbf I_i = \alpha_i \mathbf F_i +(1-\alpha_i)\mathbf B_i \qquad(1)$

In equation (1), $\mathbf I_i$ is the color (R, G, B) vector at pixel $i$ of image $\mathbf I$, $\mathbf F_i$ is the foreground color vector, $\mathbf B_i$ is the background color vector, and $\alpha_i$ is the foreground opacity at that pixel, with values in $[0,1]$. $\mathbf I_i$ is a linear combination of the foreground $\mathbf F_i$ and the background $\mathbf B_i$; this equation is called the compositing equation.
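To make the compositing equation concrete, here is a minimal NumPy sketch. The function name `composite` and the toy colors are assumptions made for illustration; they do not come from [1].

```python
import numpy as np

def composite(alpha, foreground, background):
    """Blend per-pixel colors: I = alpha * F + (1 - alpha) * B."""
    alpha = np.asarray(alpha, dtype=float)[..., None]  # broadcast over RGB
    foreground = np.asarray(foreground, dtype=float)
    background = np.asarray(background, dtype=float)
    return alpha * foreground + (1.0 - alpha) * background

# Three pixels: pure background, half mixed, pure foreground.
alpha = np.array([0.0, 0.5, 1.0])
F = np.array([[1.0, 0.0, 0.0]] * 3)  # red foreground (illustrative)
B = np.array([[0.0, 0.0, 1.0]] * 3)  # blue background (illustrative)
I = composite(alpha, F, B)
```

The middle pixel comes out as an even mix of red and blue, which is what equation (1) predicts for $\alpha_i = 0.5$.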
The opacity $\alpha$ in equation (1) is a scalar. [1] extends it to a vector:

$\vec {\alpha_i}=\{\alpha_i^1, \alpha_i^2,\cdots,\alpha_i^K\}, \quad \text{subject to }\sum_{k=1}^K \alpha_i^k = 1 \\ \mathbf I_i=\sum_{k=1}^K \alpha_i^k \mathbf F_i^k \qquad(2)$
In equation (2), the color vector of pixel $i$ is a linear combination of the color vectors $\mathbf F_i^k$ of multiple layers. Here is an example:

[Figure 1]

Figure 1(d) shows the $\alpha^k$ ($k=1,\cdots,8$) of 8 matting layers. The alpha matte in (c) is obtained by merging $\alpha^3,\alpha^5,\alpha^7$ from (d) (the $\alpha^k$ in (d) are numbered row by row: the first row is 1, 2, 3, 4 and the second row is 5, 6, 7, 8).

$\vec {\alpha_i}=\{\alpha_i^1, \alpha_i^2,\cdots,\alpha_i^K\}$ belongs to a single pixel. Collecting the same coordinate over all pixels gives one layer: the vector $\alpha^k$ is called the $k^{th}$ matting component. Figure 1(d) shows the matting components of the 8 matting layers. An ideal matting component is sparse, as Figure 1(d) shows (white is 1, black is 0). In the words of the paper:
Each component should be either completely opaque or completely transparent over as many image pixels as possible. This means that areas of transition between the different layers are limited to a small number of pixels, and each pixel is influenced by a small number of layers.
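The multi-layer model of equation (2), and the idea of merging several matting components into one alpha matte, can be sketched as follows. The pixel count, layer colors, and the choice of which layers count as foreground are all hypothetical toy values.

```python
import numpy as np

# Toy data: 4 pixels, K = 3 layers.
# Row i holds the vector (alpha_i^1, ..., alpha_i^K) of pixel i.
alphas = np.array([
    [1.0, 0.0, 0.0],
    [0.7, 0.3, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.2, 0.8],
])
row_sums = alphas.sum(axis=1)  # constraint of equation (2): every entry is 1

# One constant RGB color per layer (an assumption; F_i^k may vary per pixel).
layer_colors = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Equation (2): I_i = sum_k alpha_i^k F^k
image = alphas @ layer_colors

# Merging layers 1 and 2 (columns 0 and 1) gives a single foreground matte.
foreground_matte = alphas[:, [0, 1]].sum(axis=1)
```

Most entries of `alphas` are exactly 0 or 1, and only the two transition pixels mix layers, which is the sparsity the quote describes.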
The paper [1] mainly uses spectral analysis to study the relationship between the matting components $\alpha$ and the corresponding Laplacian matrix.

1. Spectral Analysis

Definition: the entries of the affinity matrix $\mathbf A$ measure the affinity between two pixels of the image, for example $\mathbf A_{ij}=e^{-d_{ij}/\sigma^2}$, where $d_{ij}$ is a distance between pixel $i$ and pixel $j$, such as a color distance or a geometric distance. From $\mathbf A$ we define the Laplacian matrix $\mathbf L = \mathbf D - \mathbf A$, where $\mathbf D$ is a diagonal matrix whose diagonal entries are the row sums of the affinity matrix: $\mathbf D_{ii} = \sum _j \mathbf A_{ij}$. It follows that $\mathbf L$ is a symmetric positive semidefinite matrix.
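A small sketch of this construction, assuming scalar pixel intensities as the feature and absolute intensity difference as the distance $d_{ij}$. The helper name `graph_laplacian`, the value of `sigma`, and the sample intensities are illustrative choices only.

```python
import numpy as np

def graph_laplacian(values, sigma=0.1):
    """Return (A, D, L) with A_ij = exp(-d_ij / sigma^2) and L = D - A."""
    values = np.asarray(values, dtype=float)
    d = np.abs(values[:, None] - values[None, :])  # pairwise intensity distance
    A = np.exp(-d / sigma**2)
    np.fill_diagonal(A, 0.0)  # dropping self-affinity leaves L unchanged
    D = np.diag(A.sum(axis=1))  # D_ii = sum_j A_ij
    return A, D, D - A

# Two groups of similar intensities: three dark pixels, two bright pixels.
A, D, L = graph_laplacian([0.10, 0.12, 0.11, 0.90, 0.88])
eigenvalues = np.linalg.eigvalsh(L)  # ascending order
```

Every row of `L` sums to zero and no eigenvalue is negative (up to rounding), matching the statement that $\mathbf L$ is symmetric positive semidefinite.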
Assumption: in the ideal case, $\mathbf A$ exactly captures the relationship between pixels of different connected components in the image. In other words, the affinity matrix $\mathbf A$ tells us whether two pixels belong to the same connected component: pixels in different components have zero affinity. We define an indicator vector $\mathbf m^C$ with entries:

$m_i^C = \left \{ \begin{array}{cc} 1&i\in C\\ 0&i\notin C \end{array} \right. \qquad(3)$

$\mathbf m^C$ corresponds to a connected component $C$. A connected component $C$ is a set of pixels that all lie in the same connected component, with $C \subset I$.
Since $\mathbf L = \mathbf D - \mathbf A$ and $\mathbf D_{ii} = \sum _j \mathbf A_{ij}$, we have $\mathbf L \cdot \mathbf m^C = \mathbf 0$. That is, $\mathbf m^C$ is an eigenvector of $\mathbf L$ with eigenvalue $e=0$. In other words, the number of linearly independent eigenvectors with eigenvalue 0 equals the number of connected components $C$ that can be separated. If the image $I$ consists of $K$ connected components $C^1,C^2,\cdots,C^K$, then the corresponding indicator vectors $\mathbf m^{C^k}$ are linearly independent and form a basis of the nullspace of $\mathbf L$.
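The nullspace property can be checked numerically on a toy graph. The sketch below assumes an ideal affinity matrix with two components and zero affinity between them; the sizes and affinity values are made up for illustration.

```python
import numpy as np

# Ideal affinity: pixels 0-2 form component C1, pixels 3-4 form C2,
# and there is no affinity across the two components.
A = np.zeros((5, 5))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
L = np.diag(A.sum(axis=1)) - A

# Indicator vectors of equation (3).
m_c1 = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
m_c2 = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

residual_c1 = L @ m_c1  # expected to be the zero vector
residual_c2 = L @ m_c2  # expected to be the zero vector

# Count the (numerically) zero eigenvalues of L.
eigenvalues = np.linalg.eigvalsh(L)
num_zero = int(np.sum(np.abs(eigenvalues) < 1e-9))
```

Here `num_zero` equals the number of connected components, and both indicator vectors are mapped to zero by `L`.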
Regarding these eigenvalues and eigenvectors, [1] says:
In real images, the affinity matrix A is rarely able to perfectly separate between the different pixel clusters. Therefore, the Laplacian L usually does not have multiple eigenvectors with zero eigenvalue. However, it has been observed that the smallest eigenvectors of L tend to be nearly constant within coherent image components. Extracting the different components from the smallest eigenvectors is known as spectral rounding.
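That is, for real images it is hard to obtain multiple eigenvectors with eigenvalue 0, but the eigenvectors of L with the smallest eigenvalues (the "smallest eigenvectors") still reflect the connected components: within a coherent image component they tend to be nearly constant. Extracting the different components from the smallest eigenvectors is called spectral rounding.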
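The following sketch illustrates the idea on a toy graph with two clusters joined by weak affinities, so that only one eigenvalue is exactly zero. Thresholding the sign of the second-smallest eigenvector is a simplified stand-in for spectral rounding, chosen here for brevity; it is not the procedure used in [1], and the leakage value 0.01 is an arbitrary assumption.

```python
import numpy as np

# Two clusters of 3 pixels each, with a small affinity leaking between them.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
A[:3, 3:] = 0.01
A[3:, :3] = 0.01
np.fill_diagonal(A, 0.0)
L = np.diag(A.sum(axis=1)) - A

eigenvalues, eigenvectors = np.linalg.eigh(L)  # ascending eigenvalues
num_zero = int(np.sum(np.abs(eigenvalues) < 1e-9))

# The second-smallest eigenvector is nearly constant inside each cluster,
# so its sign separates the two clusters.
second_smallest = eigenvectors[:, 1]
labels = (second_smallest > 0).astype(int)
```

Because the sign of an eigenvector is arbitrary, which cluster receives label 0 or 1 may differ between runs or platforms; only the grouping is meaningful.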
To characterize the connected components, [1] states the following claim:
Claim 1
Let $\alpha^1,\cdots, \alpha^K$ be the actual decomposition of the image I into k matting components. The vectors $\alpha^1,\cdots, \alpha^K$ lie in the nullspace of the matting Laplacian L (given by eq. 6 with ε = 0) if every local image window $w$ satisfies one of the following conditions:
1. A single component $\alpha^k$ is active within $w$.
2. Two components $\alpha^{k1}, \alpha^{k2}$ are active within $w$ and the colors of the corresponding layers $F^{k1}, F^{k2}$ within $w$ lie on two different lines in RGB space.
3. Three components $\alpha^{k1}, \alpha^{k2}, \alpha^{k3}$ are active within $w$, each layer $F^{k1}, F^{k2},F^{k3}$ has a constant color within $w$, and the three colors are linearly independent.
Every window of the image must satisfy one of these three conditions. If so, the matting components $\alpha^1,\cdots, \alpha^K$ are guaranteed to lie in the nullspace of the matting Laplacian L, that is, they are linear combinations of the eigenvectors of L with eigenvalue 0. Eq. 6 in [1] defines each entry $L(i,j)$ of L:

$L(i,j)=\sum_{q\vert(i,j)\in w_q}(\delta_{ij}-\frac{1}{\vert w_q \vert}(1+(I_i-\mu_q)^T(\Sigma_q + \frac{\epsilon}{\vert w_q \vert}I_3)^{-1}(I_j-\mu_q)))\qquad(4)$

Here $w_q$ is a local window containing $\vert w_q \vert$ pixels, $\mu_q$ and $\Sigma_q$ are the mean and covariance of the colors in $w_q$, $\delta_{ij}$ is the Kronecker delta, and $I_3$ is the $3\times 3$ identity matrix.
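Equation (4) can be evaluated directly on a tiny image. The sketch below is a dense, unoptimized illustration that assumes 3x3 windows lying fully inside the image; the function name `matting_laplacian`, the default `eps`, and the random test image are assumptions for this example, not the implementation from [1].

```python
import numpy as np

def matting_laplacian(image, eps=1e-7, radius=1):
    """Dense matting Laplacian of an H x W x 3 float image (toy sizes only)."""
    h, w, _ = image.shape
    n = h * w
    win_size = (2 * radius + 1) ** 2          # |w_q|
    index = np.arange(n).reshape(h, w)        # pixel (y, x) -> row of L
    L = np.zeros((n, n))
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            rows = slice(y - radius, y + radius + 1)
            cols = slice(x - radius, x + radius + 1)
            ids = index[rows, cols].ravel()
            colors = image[rows, cols].reshape(win_size, 3)
            centered = colors - colors.mean(axis=0)       # I_i - mu_q
            cov = centered.T @ centered / win_size        # Sigma_q
            inv = np.linalg.inv(cov + (eps / win_size) * np.eye(3))
            # delta_ij - (1 + (I_i - mu_q)^T inv (I_j - mu_q)) / |w_q|
            term = np.eye(win_size) - (1.0 + centered @ inv @ centered.T) / win_size
            L[np.ix_(ids, ids)] += term
    return L

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))
L = matting_laplacian(image)
```

The resulting matrix is symmetric and its rows sum to zero, so the constant vector always lies in its nullspace. A practical implementation would use sparse matrices, since each pixel only interacts with its neighbors.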
See [1] for the detailed proof of Claim 1. Real images do not always satisfy Claim 1, and [1] adds the following remark:
In most real images, the assumptions of claim 1 don’t hold exactly, and thus the matting Laplacian might not have multiple eigenvectors whose eigenvalue is 0. Yet if the layers are sufficiently distinct, they are generally captured by the smallest eigenvectors of L.
Next we continue with the ideal case, where the image satisfies Claim 1. By the conclusion above, the matting components lie in the nullspace of L, so recovering the matting components becomes the problem of finding a linear transformation of a basis of the nullspace of L:

$\alpha^k = \mathbf E \mathbf y^k \\ \mathbf E=[\mathbf e_1,\mathbf e_2,\cdots,\mathbf e_K],\mathbf y^k=[y_1,y_2,\cdots,y_K]^T \qquad(5)$

In equation (5), $\mathbf e_1,\mathbf e_2,\cdots,\mathbf e_K$ are the K linearly independent eigenvectors of L with eigenvalue 0, and $\mathbf y^k$ is the weight vector of their linear combination.
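A numerical illustration of equation (5): an eigen-solver returns some orthonormal basis of the nullspace, not the indicator vectors themselves, yet the indicator vectors can be written as $\mathbf E \mathbf y^k$. In this sketch the target components are known in advance purely for demonstration (in practice they are the unknowns), and the toy graph is an assumption.

```python
import numpy as np

# Ideal Laplacian with K = 2 connected components (pixels 0-2 and 3-5).
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
L = np.diag(A.sum(axis=1)) - A

K = 2
eigenvalues, eigenvectors = np.linalg.eigh(L)  # ascending eigenvalues
E = eigenvectors[:, :K]                        # a basis of the nullspace

# Known targets for this demo: the two indicator vectors as columns.
targets = np.zeros((6, K))
targets[:3, 0] = 1.0
targets[3:, 1] = 1.0

# Solve E Y = targets in the least-squares sense; column k of Y is y^k.
Y, *_ = np.linalg.lstsq(E, targets, rcond=None)
alphas = E @ Y
```

The least-squares fit reproduces the indicator vectors, confirming that they are linear combinations of the zero-eigenvalue eigenvectors.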
What should the matting components we are looking for look like?
Recall that the matting components should sum to 1 at each image pixel, and they should be near 0 or 1 for most image pixels, since the majority of image pixels are usually opaque. Thus, we are looking for a linear transformation of the eigenvectors that would yield a set of nearly binary vectors.
That is, $\alpha^k$ should be close to a binary vector, and the components must sum to 1 at every pixel (equation (2)). This gives the optimization objective:

$\min J(\alpha) = \min \sum_{i,k} \vert \alpha_i^k \vert^{\gamma} + \vert 1-\alpha_i^k \vert^{\gamma},\quad \text{where }\alpha^k=\mathbf E \mathbf y^k \\ \text{subject to } \sum_k \alpha_i^k = 1 \qquad(6)$
This works because, over $[0,1]$, $\vert \alpha_i^k \vert^{\gamma} + \vert 1-\alpha_i^k \vert^{\gamma}$ attains its minimum when $\alpha_i^k$ equals 0 or 1. The case $\gamma=0.9$ is plotted below.

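import numpy as np
import matplotlib.pyplot as plt

# Plot the per-entry sparsity term |a|^gamma + |1 - a|^gamma for gamma = 0.9
x = np.linspace(0, 1, 101)
y = x**0.9 + (1 - x)**0.9
plt.plot(x, y)
plt.show()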

Figure 2: The summand of equation (6) attains its minimum at 0 and 1.
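As a quick check of the shape of this curve, the summand can be worked out by hand for $\gamma = 0.9$. Writing $f(\alpha) = \alpha^{\gamma} + (1-\alpha)^{\gamma}$ on $[0,1]$ (the name $f$ is introduced here only for convenience):

$$
f(0) = f(1) = 1, \qquad f\left(\tfrac{1}{2}\right) = 2 \cdot \left(\tfrac{1}{2}\right)^{0.9} = 2^{0.1} \approx 1.0718
$$

$$
f''(\alpha) = \gamma(\gamma-1)\left[\alpha^{\gamma-2} + (1-\alpha)^{\gamma-2}\right] < 0 \quad \text{for } 0 < \gamma < 1,\ 0 < \alpha < 1
$$

Since $f$ is concave on $(0,1)$, its minimum over $[0,1]$ is reached at the endpoints $\alpha = 0$ and $\alpha = 1$, and its maximum at $\alpha = \tfrac{1}{2}$.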
From this analysis, minimizing equation (6) favors 0-1 vectors, which is exactly what we want. As for $\gamma$, here is the original wording from the paper:

If $0\lt \gamma \lt 1$ is used (in our implementation $\gamma =0.9$ ), then $\vert \alpha_i^k \vert^{\gamma} + \vert 1-\alpha_i^k \vert^{\gamma}$ is a robust score measuring the sparsity of a matting component. Without the requirement $\alpha^k=\mathbf E \mathbf y^k$ the sparsity term would be minimized by binary vectors, but as the vectors $\alpha^k$ are restricted to linear combinations of the eigenvectors they must maintain the fuzzy layer boundaries. Although we do not explicitly constrain the α values to be between 0 and 1, in practice the resulting values tend to lie in this range due to the sparsity penalty.

We now have a cost function in $\vec \alpha$, which can be minimized with an optimization method.
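Before choosing an optimizer, it helps to be able to evaluate the objective of equation (6) and its constraint. The sketch below only evaluates them; it does not perform the minimization. The helper names `sparsity_cost` and `constraint_violation` and the toy basis `E` are hypothetical.

```python
import numpy as np

def sparsity_cost(E, Y, gamma=0.9):
    """J = sum_{i,k} |alpha_i^k|^gamma + |1 - alpha_i^k|^gamma, alpha = E @ Y."""
    alphas = E @ Y
    return float(np.sum(np.abs(alphas) ** gamma + np.abs(1.0 - alphas) ** gamma))

def constraint_violation(E, Y):
    """Largest deviation of sum_k alpha_i^k from 1 over all pixels."""
    return float(np.max(np.abs((E @ Y).sum(axis=1) - 1.0)))

# Toy basis: 4 pixels, K = 2 basis vectors (here simply indicator vectors).
E = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

Y_binary = np.eye(2)             # alphas = E, every entry is 0 or 1
Y_fuzzy = np.full((2, 2), 0.5)   # every alpha entry equals 0.5

cost_binary = sparsity_cost(E, Y_binary)
cost_fuzzy = sparsity_cost(E, Y_fuzzy)
```

Both choices of `Y` satisfy the sum-to-one constraint, but the binary solution has the lower cost, which is the behavior the objective is designed to reward.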
[Question]: The discussion above uses "spectral analysis", but what is spectral analysis? See: http://www.lunarnai.cn/2018/02/04/spectral-graph/

References:
[1] "Spectral Matting" (2008), MIT CSAIL
[2] "A Closed-Form Solution to Natural Image Matting" (2008), MIT CSAIL
[3] "Semantic Soft Segmentation" (2018), MIT CSAIL
