Face alignment tracking

1. Real-Time Shape Tracking of Facial Landmarks

Wettum performed a comparative experiment to determine the best
algorithm for real-time facial landmark tracking on a smartphone
[7]. Four algorithms were compared, namely the Lucas-Kanade (LK)
point tracker [8], Structured Output Tracking with Kernels
(Struck) [9], the Discriminative Scale Space Tracker (DSST) [10]
and Kernelized Correlation Filters (KCF) [11]. The Dlib Facial
Landmark Detector (DFLD), a facial landmark localization library
[12], and Deformable Shape Tracking (DEST) [13] served as
reference methods. The results indicate that the LK tracker and
DSST are the algorithms that can be used for actual facial
landmark tracking.

However, since DSST cannot run in real time, we can conclude
that LK is the most useful algorithm for facial landmark
tracking.

2. Facial Landmark Tracking on a Mobile Device

A widely used approach to the tracking problem is optical
flow estimation. Optical flow is used to study a large variety
of motions. One of the most popular methods is the Lucas-Kanade
algorithm, developed by Bruce D. Lucas and T. Kanade and
presented in [15]. The algorithm estimates the displacement of
the local neighbourhood of a pixel. A single pixel does not give
enough information to match it with another pixel, because it
yields only one brightness equation for two unknown motion
components. It is therefore better to use multiple pixels,
i.e. a neighbourhood around the pixel. For every tracked point,
a movement vector is obtained by comparing the pixel intensities
of two consecutive images. Many improvements have been made to
the Lucas-Kanade algorithm. J. Bouguet proposed a pyramidal
implementation of the classical Lucas-Kanade algorithm [16]. It
first reduces the image resolution and then applies the
Lucas-Kanade method from the coarsest level to the finest, so
that large motions become small at the coarse levels.
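The neighbourhood-based displacement estimate described above can be sketched as a small least-squares problem. The following is a minimal single-level version with no pyramid and no iterations. It assumes grayscale frames stored as lists of rows and a square window of radius `r`. The function name and the synthetic test pattern are hypothetical and are not taken from [15] or [16]. Each pixel in the window contributes one equation Ix*dx + Iy*dy + It = 0, and the window's equations are solved jointly.

```python
def lucas_kanade_point(prev, curr, x, y, r=2):
    """Estimate the displacement (dx, dy) of pixel (x, y) from frame prev to curr.

    Single-level Lucas-Kanade sketch: every pixel in the (2r+1) x (2r+1)
    window gives one equation Ix*dx + Iy*dy + It = 0, and the window's
    equations are solved in the least-squares sense via a 2x2 system.
    Frames are lists of rows (frame[row][col]); (x, y) must lie at least
    r + 1 pixels away from the border.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for j in range(y - r, y + r + 1):
        for i in range(x - r, x + r + 1):
            ix = (prev[j][i + 1] - prev[j][i - 1]) / 2.0  # horizontal gradient (central difference)
            iy = (prev[j + 1][i] - prev[j - 1][i]) / 2.0  # vertical gradient
            it = curr[j][i] - prev[j][i]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        raise ValueError("window lacks texture in two directions; flow is ill-posed")
    # Solve [[sxx, sxy], [sxy, syy]] @ [dx, dy] = -[sxt, syt] by Cramer's rule.
    dx = (sxy * syt - syy * sxt) / det
    dy = (sxy * sxt - sxx * syt) / det
    return dx, dy


if __name__ == "__main__":
    # Synthetic smooth pattern moved by (+0.1, +0.1) pixels between the frames.
    def pattern(x, y):
        return float(x * x + y * y)

    prev = [[pattern(x, y) for x in range(16)] for y in range(16)]
    curr = [[pattern(x - 0.1, y - 0.1) for x in range(16)] for y in range(16)]
    print(lucas_kanade_point(prev, curr, 10, 5))  # close to (0.1, 0.1)
```

A uniform window makes the 2x2 matrix singular, which is why the sketch raises an error instead of returning a meaningless vector.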

Correlation filters have recently attracted considerable
attention due to their computational efficiency. J. F. Henriques
et al. proposed a method named CSK [22] that uses correlation
filters in a kernel space. This method is able to process
hundreds of frames per second [5]. The KCF (Kernelized
Correlation Filter) method, proposed in [23], is an improvement
on CSK. KCF has been shown to outperform TLD and Struck while
running at hundreds of frames per second [23].
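The idea of correlation filters in a kernel space can be illustrated in one dimension. This sketch follows the Gaussian-kernel formulation of KCF [23] as we understand it. A ridge regressor is trained over all cyclic shifts of a base sample in the Fourier domain, and the peak of its response on a new sample gives the translation. Several assumptions keep it small:

* A 1-D signal stands in for an image patch.
* Raw samples stand in for HOG or color features.
* There is no cosine window.
* A naive DFT replaces the FFT.

The function names and the `sigma` and `lam` values are illustrative only.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform; a real tracker would use an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(xf):
    """Inverse DFT, keeping the real part (all signals here are real)."""
    n = len(xf)
    return [(sum(xf[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def gaussian_correlation(x, z, sigma):
    """Gaussian kernel between x and every cyclic shift of z, computed through the DFT."""
    n = len(x)
    xf, zf = dft(x), dft(z)
    # Circular cross-correlation <x, z shifted by i> for all shifts i at once.
    cross = idft([a.conjugate() * b for a, b in zip(xf, zf)])
    xx = sum(v * v for v in x)
    zz = sum(v * v for v in z)
    # Squared distances, clipped at 0 against round-off; dividing by n is this sketch's scaling.
    return [math.exp(-max(0.0, xx + zz - 2.0 * c) / (sigma * sigma * n)) for c in cross]


def train(x, y, sigma=0.5, lam=1e-4):
    """Kernel ridge regression over all cyclic shifts of x: alpha_hat = y_hat / (k_hat + lam)."""
    kf = dft(gaussian_correlation(x, x, sigma))
    return [yk / (kk + lam) for yk, kk in zip(dft(y), kf)]


def detect(alphaf, x, z, sigma=0.5):
    """Return the cyclic shift of z (relative to x) with the highest filter response."""
    kf = dft(gaussian_correlation(x, z, sigma))
    response = idft([kk * ak for kk, ak in zip(kf, alphaf)])
    return max(range(len(response)), key=response.__getitem__)


if __name__ == "__main__":
    n = 32
    x = [math.sin(0.7 * t) + 0.5 * math.cos(2.3 * t) + (t * 37 % 11) / 11 for t in range(n)]
    y = [math.exp(-min(i, n - i) ** 2 / 8.0) for i in range(n)]  # Gaussian label peaked at shift 0
    z = x[-5:] + x[:-5]  # the same signal moved 5 samples to the right (cyclically)
    print(detect(train(x, y), x, z))  # prints 5
```

Training and detection are element-wise divisions and products of DFTs. This is what lets this family of trackers evaluate every cyclic shift at once instead of sliding a window.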

B. Lucas-Kanade (LK) Point Tracker
Optical flow is considered suitable for landmark tracking
on a mobile device because it can be used to observe a
large variety of motions [26], namely a static observer and a
moving object, a moving observer and a static object, and a
moving observer and a moving object. The last scenario is the
most likely on a mobile device. A sparse optical flow method is
used because only specific points are of interest. The Open
Source Computer Vision (OpenCV) library [29] includes an
implementation of the Lucas-Kanade (LK) method. This
implementation is based on a sparse iterative version of the
Lucas-Kanade optical flow in pyramids [16].
The LK tracker from the OpenCV library is implemented in the
C++ framework because it is able to run in real time. The
real-time performance is facilitated by the use of a pyramidal
representation of the frames. The pyramidal representation
allows the tracker to handle large pixel motions, i.e. motions
larger than the region of interest (ROI) used. The ROI can
therefore be kept relatively small, which reduces the
computational load. The pyramid is built recursively, starting
from the original frame. Assuming the Full HD camera of the
mobile device is used, it would be useless to go above pyramid
level 5 (5 lower-resolution frame representations). For example,
the image resolution at level 0 is 1920 x 1080 pixels. The
resolutions of the subsequent levels are 960 x 540, 480 x 270,
240 x 135, 120 x 67 and 60 x 33 pixels, respectively. In the
framework, 5 pyramid levels are used for resolutions of
1280 x 720 pixels, and 4 pyramid levels are used for all
resolutions below 1280 x 720 pixels.
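The level sizes quoted above follow from halving the resolution at each level and rounding down. A minimal sketch that reproduces them is shown below. The level-selection rule in `levels_for` is a hypothetical reading of the text: it assumes "1280 x 720" means "1280 x 720 and above". Note that OpenCV's `cv::pyrDown` rounds up by default (`(cols + 1) / 2`), so its level sizes can differ from these figures by one pixel.

```python
def pyramid_sizes(width, height, levels):
    """Frame sizes from level 0 (full resolution) through `levels` halvings, rounding down."""
    sizes = [(width, height)]
    for _ in range(levels):
        width, height = width // 2, height // 2
        sizes.append((width, height))
    return sizes


def levels_for(width, height):
    """Hypothetical level rule: 5 levels from 1280 x 720 upwards, otherwise 4."""
    return 5 if width >= 1280 and height >= 720 else 4


if __name__ == "__main__":
    print(pyramid_sizes(1920, 1080, levels_for(1920, 1080)))
```

For a 1920 x 1080 frame this yields six images (levels 0 to 5), i.e. five lower-resolution representations.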
OpenCV uses a corner point detector to initialise the LK
tracker. The corner point detector is not used in the framework
because the framework uses an ideal landmark detector. Furthermore,
the corner point detector finds the most prominent
corners in the image. Landmarks such as the nose tip are often
not defined by prominent corners.
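By default, OpenCV's `cv::goodFeaturesToTrack` ranks candidate points with the Shi-Tomasi measure. That measure is the smaller eigenvalue of the same 2 x 2 gradient matrix that appears in the Lucas-Kanade equations. The minimal pure-Python sketch below shows why such a detector favours prominent corners: only a point with strong gradients in two directions scores high. The function name and the synthetic image are hypothetical.

```python
import math


def shi_tomasi_score(img, x, y, r=1):
    """Smaller eigenvalue of the gradient structure tensor in a window around (x, y).

    High values indicate a corner (strong gradients in two directions);
    straight edges and flat regions score low. img is a list of rows (img[row][col]).
    """
    sxx = sxy = syy = 0.0
    for j in range(y - r, y + r + 1):
        for i in range(x - r, x + r + 1):
            ix = (img[j][i + 1] - img[j][i - 1]) / 2.0
            iy = (img[j + 1][i] - img[j - 1][i]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    # Closed-form smaller eigenvalue of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2.0
    return mean - math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy * sxy)


if __name__ == "__main__":
    # Synthetic 12x12 image: a bright square occupying the lower-right part.
    img = [[1.0 if x >= 6 and y >= 6 else 0.0 for x in range(12)] for y in range(12)]
    print(shi_tomasi_score(img, 6, 6))  # corner of the square: 0.75
    print(shi_tomasi_score(img, 6, 9))  # straight vertical edge: 0.0
    print(shi_tomasi_score(img, 2, 2))  # flat background: 0.0
```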

C. Discriminative Scale Space Tracker (DSST)
In the face recognition system on a mobile device, the facial
image is captured with a hand-held device. This introduces a
varying scale of the face in the captured video. The
Discriminative Scale Space Tracker (DSST) [30] performs well on
image sequences with significant scale variations. Moreover, the
DSST was the best-performing tracker in the Visual Object
Tracking (VOT) challenge 2014 [31]. Therefore, the DSST tracker
might be a good solution to the landmark tracking problem.
The DSST implementation of the Dlib C++ software library
is used.
BACHELOR ASSIGNMENT SCS - DECEMBER 2016
The DSST is an extension of the Minimum Output Sum of Squared
Errors (MOSSE) tracker [32]. The MOSSE tracker is limited to
estimating the translation between frames; the DSST tracker
adds robust scale estimation. The MOSSE tracker is initialised
in the first frame.
The object is tracked by correlating the trained filter (which
models the appearance of the object) over a search window.
The location of the maximum value in the correlation output is
the new position of the object. The correlation filter is then
updated, i.e. the filter is trained at run time. The correlation
is computed in the Fourier domain because correlation becomes an
element-wise multiplication there, which is much cheaper than
correlating in the spatial domain. The DSST estimates the target
size by learning a one-dimensional discriminative scale filter.
The scale filter is trained by extracting sample patches of
different scales around the current position of the object.
Intensity features and HOG (histogram of oriented gradients)
features are used for the translation filter.
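The translation part of the MOSSE/DSST pipeline can be sketched in one dimension: train a filter, correlate it with a new search window in the Fourier domain, take the peak, and update the filter at run time. This is a minimal illustration, not the Dlib implementation. It assumes:

* a 1-D signal instead of an image patch;
* raw samples instead of intensity/HOG features;
* no cosine window and no scale filter;
* a naive DFT in place of an FFT.

The class name, `lam` and the learning rate are hypothetical. The filter is kept as a numerator and a denominator so that both can be updated with a running average, in the spirit of MOSSE [32].

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform; a real tracker would use an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(xf):
    """Inverse DFT, keeping the real part (all signals here are real)."""
    n = len(xf)
    return [(sum(xf[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


class Mosse1D:
    """Hypothetical 1-D MOSSE-style filter H* = A / (B + lam) with A = G.conj(F), B = F.conj(F)."""

    def __init__(self, patch, label, lam=1e-2):
        f = dft(patch)
        self.label_f = dft(label)  # desired correlation output, peaked at shift 0
        self.lam = lam             # regulariser that avoids division by ~0
        self.a = [gk * fk.conjugate() for gk, fk in zip(self.label_f, f)]
        self.b = [fk * fk.conjugate() for fk in f]

    def respond(self, patch):
        """Correlation output for a new patch: an element-wise product in the Fourier domain."""
        z = dft(patch)
        return idft([zk * ak / (bk + self.lam) for zk, ak, bk in zip(z, self.a, self.b)])

    def locate(self, patch):
        """Shift of the patch relative to the trained appearance (position of the maximum)."""
        r = self.respond(patch)
        return max(range(len(r)), key=r.__getitem__)

    def update(self, patch, rate=0.125):
        """Running-average update of A and B with a re-centred patch (training at run time)."""
        f = dft(patch)
        self.a = [(1 - rate) * ak + rate * gk * fk.conjugate()
                  for ak, gk, fk in zip(self.a, self.label_f, f)]
        self.b = [(1 - rate) * bk + rate * fk * fk.conjugate() for bk, fk in zip(self.b, f)]


if __name__ == "__main__":
    n = 32
    x = [math.sin(0.7 * t) + 0.5 * math.cos(2.3 * t) + (t * 37 % 11) / 11 for t in range(n)]
    g = [math.exp(-min(i, n - i) ** 2 / 8.0) for i in range(n)]
    tracker = Mosse1D(x, g)
    print(tracker.locate(x[-7:] + x[:-7]))  # prints 7
```

DSST applies the same machinery a second time, with a separate one-dimensional filter over samples taken at different scales.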
D. Kernelized Correlation Filters (KCF)
The high-speed tracking algorithm with Kernelized Correlation
Filters (KCF) [23] is also implemented in the framework.
This tracking algorithm is considered suitable for landmark
tracking because it runs in real time [31]. An implementation
of the KCF algorithm is available in the OpenCV library. That
implementation is extended with color features, which result
in superior performance for visual tracking [33].
E. Structured Output Tracking with Kernels (Struck)
The Structured Output Tracking with Kernels (Struck)
method [20] is based on structured output prediction. The
method uses a kernelized structured output SVM, which
is learned online. This allows adaptive tracking, which is
beneficial for facial landmark tracking because facial landmarks
deform with facial expressions. As mentioned in the previous
section, this method uses a budget mechanism in order to run
in real time.
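The structured output in Struck is, as we understand the original paper, the 2-D translation of the target box relative to its previous position. The margin the SVM demands for a wrong output grows with a loss Delta(y, y') = 1 - overlap(y, y'), where overlap is the intersection-over-union of the two boxes. A small sketch of that loss follows. The `(x, y, w, h)` box layout and the function names are assumptions, not taken from the authors' code.

```python
def box_overlap(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # width of the intersection
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # height of the intersection
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def struck_loss(box, shift):
    """Loss for predicting translation `shift` = (dx, dy) when the true translation is (0, 0)."""
    x, y, w, h = box
    dx, dy = shift
    return 1.0 - box_overlap(box, (x + dx, y + dy, w, h))


if __name__ == "__main__":
    face_box = (100, 80, 40, 40)  # hypothetical face bounding box
    for shift in [(0, 0), (10, 0), (20, 20), (40, 0)]:
        print(shift, round(struck_loss(face_box, shift), 3))
```

Because the loss is graded rather than binary, a sample that is almost correctly placed is penalised less than one that misses the target entirely.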
The Struck algorithm is implemented using the authors' code,
which is open source and available on GitHub [34]. The code is
modified in order to include it in the framework, but no changes
have been made to the operation of the algorithm. The default
settings are used, i.e. Haar features and a Gaussian kernel.

Reposted from blog.csdn.net/u011808673/article/details/81303618