Abstract

A real-time tracking algorithm that combines HOG gradient features with colour features. It runs at 80 FPS, i.e. 80 frames per second.

Introduction

Staple: Sum of Template And Pixel-wise LEarners
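Today's mainstream trackers follow the tracking-by-detection strategy: the position of the target is detected first. Taking HOG detection as an example, a single target may yield several candidate rectangles. Sometimes NMS (non-maximum suppression) is applied directly so that only one solution remains. Most trackers, though, prefer to err on the side of keeping too many candidates rather than risk missing the target. For HOG object detection, see "Histogram of Oriented Gradients and Object Detection".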
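As an illustration of the NMS step mentioned above, here is a minimal sketch of greedy non-maximum suppression in NumPy. The `(x1, y1, x2, y2)` box layout, the function names `iou` and `nms`, and the 0.5 overlap threshold are assumptions chosen for the example; they do not come from the Staple paper.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union of one box against an array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # survivors for the next round
    return keep

# Two heavily overlapping detections of one target plus a separate detection.
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

In this toy input the second box overlaps the first almost entirely and is suppressed, while the third box, which does not overlap, survives.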

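Online learning and Correlation Filters: learning the model online with correlation filters
Robustness to deformation: coping with changes in the target's shape
Schemes to reduce model drift: coping with drift
Combining multiple estimates: fusing several estimates
Long-term tracking with re-detection: long-term tracking and re-detecting the target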

Proposed Approach

Notation and meaning

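$t$: the frame index
$x_t$: the image of frame $t$; $x$ denotes an arbitrary frame
$p_t$: the rectangle containing the target in frame $t$ (the optimal one, of course); $p$ denotes an arbitrary rectangle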
$S_t$: the set of all candidate rectangles for the target in frame $t$, so that $p_t=\arg\max_{p\in S_t}f(T(x_t,p);\theta_{t-1})$
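$f(T(x,p);\theta)$: the score, under model parameters $\theta$, of rectangle $p$ in image $x$. Higher is better, so the rectangle with the maximum score is taken as the optimal rectangle $p_t$. For now, $T(x,p)$ can be loosely understood as the gradient and colour features extracted from the rectangle, and $\theta$, equally loosely, as the gradient and colour features predicted by the model. $f(T(x,p);\theta)$ then measures how well the predicted and extracted features match: the higher the matching score, the more likely the rectangle is the true target rectangle $p_t$.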
$\theta$: the model parameters, obtained by minimising a loss. Let the loss function be $L(\theta;X_t)$. Here $X_t=\{(x_i,p_i)\}_{i=1}^t$ is not the set of frames $\{x_1,x_2,...,x_t\}$ but $\{(x_1,p_1),(x_2,p_2),...,(x_t,p_t)\}$, so it also contains the target position in every previous frame. Adding a penalty on model complexity gives: $\theta_t=\arg\min_{\theta \in \Omega} \{L(\theta;X_t)+\lambda R(\theta)\}$
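$f(x)$: back to the function that scores $p$. As mentioned, gradient and colour features are to be combined; with real-time speed in mind, a linear combination is the fast choice, which gives $f(x)=c_{tmpl}f_{tmpl}(x)+c_{hist}f_{hist}(x)$
Apologies: the $x$ here really stands for $T(x,p)$ (with parameters $\theta$), but this is how the paper writes it. tmpl stands for template (gradient features) and hist for histogram (colour histogram features).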
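$f_{tmpl}(x;h)$: the scoring function for the gradient features. Here $\mathcal{T}$, the superscript $T$, and the earlier function $T(x_t,p)$ appear to be unrelated. $\mathcal{T} \subset \mathbb{Z}^2$ is a finite grid, which can be understood as the set of pixel coordinates (x,y) in the image, and the superscript $T$ denotes vector transpose. $h$ is the model parameter and $\phi _x$ is the gradient feature image. Summing over every grid position gives: $f_{tmpl}(x;h)=\sum_{u \in \mathcal{T}}h[u]^T \phi _x[u]$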
$f_{hist}(x;\beta)$: the scoring function for the colour features, which is slightly different. Here $\beta$ is again the model parameter, $\psi _x$ is the colour feature image, and $\mathcal{H}$ is again a finite grid: $f_{hist}(x;\beta)=\beta ^T(\frac 1 {\lvert \mathcal{H} \rvert}\sum _{u\in \mathcal{H}} \psi _x[u])$
$\theta$: the parameters, $\theta =(h,\beta)$
$L(\theta;X_T)$: the loss function, $L(\theta;X_T)=\sum _{t=1} ^T w_t l(x_t,p_t,\theta)$, where the per-frame loss is $l(x,p,\theta)=cost(p,\arg\max_{q\in S}f(T(x,q);\theta))$. Here $p$ is the correct rectangle.
The parameters are then obtained as: $h_t=\arg\min_h \{L_{tmpl}(h;X_t)+\frac 1 2 \lambda _{tmpl} \lVert h \rVert ^2 \}$ and $\beta _t=\arg\min_{\beta} \{L_{hist}(\beta;X_t)+\frac 1 2 \lambda _{hist} \lVert \beta \rVert ^2 \}$
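To make the notation above concrete, here is a minimal NumPy sketch of the two scoring functions, their linear combination, and the argmax over candidate rectangles. The array shapes, the helper names, and the weights `c_tmpl=0.7` and `c_hist=0.3` are placeholders assumed for illustration, and feature extraction is replaced by hand-built toy arrays.

```python
import numpy as np

def f_tmpl(phi, h):
    """Template score: sum over grid positions u of h[u]^T phi[u].
    phi and h both have shape (rows, cols, K)."""
    return float(np.sum(h * phi))

def f_hist(psi, beta):
    """Histogram score: beta^T times the mean of psi over the grid.
    psi has shape (rows, cols, M), beta has shape (M,)."""
    return float(beta @ psi.mean(axis=(0, 1)))

def score(phi, psi, h, beta, c_tmpl=0.7, c_hist=0.3):
    """Linear combination of the two scores (weights are assumed values)."""
    return c_tmpl * f_tmpl(phi, h) + c_hist * f_hist(psi, beta)

def best_candidate(candidates, h, beta):
    """Return the index of the highest-scoring candidate and all scores.
    Each candidate is a (phi, psi) pair extracted from one rectangle."""
    scores = [score(phi, psi, h, beta) for phi, psi in candidates]
    return int(np.argmax(scores)), scores

# Toy model: a 2x2 grid, one gradient channel, two colour bins.
h = np.ones((2, 2, 1))
beta = np.array([0.0, 1.0])                      # bin 1 looks like the target
one_hot_bin0 = np.zeros((2, 2, 2)); one_hot_bin0[..., 0] = 1.0
one_hot_bin1 = np.zeros((2, 2, 2)); one_hot_bin1[..., 1] = 1.0
candidates = [
    (np.zeros((2, 2, 1)), one_hot_bin0),         # matches neither cue
    (np.ones((2, 2, 1)), one_hot_bin1),          # matches both cues
]
best, scores = best_candidate(candidates, h, beta)
```

The second candidate agrees with both the template and the colour model, so it receives the higher combined score and is selected as $p_t$.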

Online least-squares optimisation

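The above covers only the first 10 equations of the paper, which has 26 in total, so I will not go through them one by one. What follows is only an outline.
This subsection minimises the loss function. Because the loss is least-squares and the regulariser is quadratic, the paper obtains the minimiser in closed form and updates it online, rather than by iterative gradient descent.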
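The idea of an online least-squares solution can be sketched with generic ridge regression: keep running averages of the quadratic terms and solve a small linear system whenever parameters are needed. This is an illustrative sketch, not the paper's exact equations; the class name `OnlineRidge`, the learning rate `eta`, and the synthetic data are assumptions.

```python
import numpy as np

class OnlineRidge:
    """Ridge regression whose statistics are updated frame by frame."""

    def __init__(self, dim, lam=1e-6, eta=0.1):
        self.A = np.zeros((dim, dim))   # running average of X^T X / n
        self.b = np.zeros(dim)          # running average of X^T y / n
        self.lam = lam                  # regularisation strength
        self.eta = eta                  # how quickly old frames are forgotten
        self.initialised = False

    def update(self, features, targets):
        """Blend the statistics of the new frame into the running averages."""
        n = len(targets)
        A_new = features.T @ features / n
        b_new = features.T @ targets / n
        if not self.initialised:
            self.A, self.b = A_new, b_new
            self.initialised = True
        else:
            self.A = (1 - self.eta) * self.A + self.eta * A_new
            self.b = (1 - self.eta) * self.b + self.eta * b_new

    def solve(self):
        """Closed-form minimiser: (A + lam * I)^{-1} b."""
        return np.linalg.solve(self.A + self.lam * np.eye(len(self.b)), self.b)

# Synthetic "frames": noiseless linear targets generated from known parameters.
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
model = OnlineRidge(dim=2)
for _ in range(20):
    features = rng.standard_normal((50, 2))
    model.update(features, features @ true_theta)
theta_hat = model.solve()
```

Memory stays constant regardless of how many frames have been seen, because only `A` and `b` are stored.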

Learning the template score

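Solving for $h$: the paper poses this as a least-squares correlation-filter problem and computes the solution in the Fourier domain.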
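A single-channel correlation filter shows the mechanics of solving for a template in the Fourier domain. The sketch below is a simplified, MOSSE-style filter trained on one random patch; the paper works with multi-channel HOG features and an online update, both omitted here. The function names, the regulariser `lam`, and the Gaussian width `sigma` are assumed values.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired response: a Gaussian peak at index (0, 0) with circular wrap-around."""
    rows = np.fft.fftfreq(shape[0]) * shape[0]   # signed offsets 0, 1, ..., -1
    cols = np.fft.fftfreq(shape[1]) * shape[1]
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.exp(-(rr ** 2 + cc ** 2) / (2 * sigma ** 2))

def train_filter(patch, target, lam=1e-2):
    """Closed-form ridge solution, computed independently per frequency."""
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(target)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def respond(filter_hat, patch):
    """Response map of the learned filter on a new patch."""
    return np.real(np.fft.ifft2(filter_hat * np.fft.fft2(patch)))

rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))            # stand-in for a feature channel
filter_hat = train_filter(patch, gaussian_response(patch.shape))

# Circularly shift the patch and check that the response peak follows the shift.
shifted = np.roll(patch, shift=(3, 5), axis=(0, 1))
response = respond(filter_hat, shifted)
peak = np.unravel_index(np.argmax(response), response.shape)
```

The location of the response peak gives the estimated translation of the target, here the `(3, 5)` shift applied to the patch.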

Learning the histogram score

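Solving for $\beta$: this is also a least-squares problem, a per-pixel ridge regression over object and background regions that decouples into one scalar problem per histogram bin.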
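As I read the paper, the per-bin solution takes the form $\beta^j = \rho^j(O) / (\rho^j(O) + \rho^j(B) + \lambda)$, where $\rho^j(O)$ and $\rho^j(B)$ are the proportions of object and background pixels falling in bin $j$. The sketch below applies that formula to grey-level pixels; the 8-bin grey histogram, the function names, and the toy regions are simplifying assumptions (the paper uses colour histograms).

```python
import numpy as np

def bin_indices(pixels, n_bins=8):
    """Map grey values in [0, 256) to histogram bin indices."""
    return (pixels.astype(np.int64) * n_bins) // 256

def bin_proportions(pixels, n_bins=8):
    """Fraction of the region's pixels that fall in each bin."""
    counts = np.bincount(bin_indices(pixels, n_bins).ravel(), minlength=n_bins)
    return counts / pixels.size

def learn_beta(object_pixels, background_pixels, n_bins=8, lam=1e-3):
    """Per-bin weight: high when a bin is common in the object and rare in the background."""
    rho_o = bin_proportions(object_pixels, n_bins)
    rho_b = bin_proportions(background_pixels, n_bins)
    return rho_o / (rho_o + rho_b + lam)

def hist_score(pixels, beta):
    """Average the per-pixel weights over the region, as in f_hist."""
    return float(beta[bin_indices(pixels, len(beta))].mean())

# Toy regions: a bright object on a dark background.
object_pixels = np.full((4, 4), 200)
background_pixels = np.full((8, 8), 30)
beta = learn_beta(object_pixels, background_pixels)
object_score = hist_score(object_pixels, beta)
background_score = hist_score(background_pixels, beta)
```

Because the score only depends on which bins the pixels fall in, not on where the pixels are, it is unaffected by changes in the target's shape.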

Search strategy

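The paper assumes that the rectangular window $p$ may translate and scale, but keeps its aspect ratio and orientation fixed (the target does not rotate).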
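A candidate set $S_t$ under these assumptions can be sketched by enumerating translations and uniform scalings of the previous rectangle. The `Rect` type, the shift grid, and the scale factors below are hypothetical values chosen for the example, and the exhaustive enumeration is only one possible way to search.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    cx: float   # centre x
    cy: float   # centre y
    w: float    # width
    h: float    # height

def candidate_rects(prev, shifts=(-4, 0, 4), scales=(0.95, 1.0, 1.05)):
    """Translate and uniformly scale the previous rectangle.
    Width and height share one scale factor, so the aspect ratio is preserved
    and no rotation is ever applied."""
    out = []
    for s in scales:
        for dx in shifts:
            for dy in shifts:
                out.append(Rect(prev.cx + dx, prev.cy + dy, prev.w * s, prev.h * s))
    return out

previous = Rect(cx=100.0, cy=80.0, w=40.0, h=20.0)
rects = candidate_rects(previous)
```

Each candidate would then be scored with $f$, and the best one becomes the rectangle for the current frame.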

Algorithm study -- Staple: Complementary Learners for Real-Time Tracking
