2017-04-28 Saliency Detection via Dense and Sparse Reconstruction

Notes on "Saliency Detection via Dense and Sparse Reconstruction" (L.-H. Zhang, X. Ruan, M.-H. Yang. IEEE International Conference on Computer Vision, Dec. 2013, pp. 2976-2983)

The method uses superpixels along the image boundaries as background templates, from which dense and sparse appearance models are constructed. The resulting saliency maps are improved by error propagation and an object-biased Gaussian model, and Bayesian inference is then used to integrate them into the final saliency map.

Model overview
1. Generate superpixels using the simple linear iterative clustering (SLIC) algorithm, and extract the D-dimensional feature of each boundary segment and construct the background template set.
2. Use dense reconstruction errors to measure the saliency of each region via Principal Component Analysis.
3. Use sparse reconstruction error to measure the saliency of each region via sparse representation.
4. Apply the K-means algorithm to cluster the N image segments, then compare segment i with the other segments belonging to the same cluster k to smooth the reconstruction errors generated by the dense and sparse appearance models.
5. Utilize the similarity between pixel z and its corresponding segment n at scale s as the weight to average the multi-scale reconstruction errors, and get pixel-level saliency.
6. Use an object-biased Gaussian model to refine the saliency map.
7. Take one saliency map as the prior and use the other one instead of Lab color information to compute the likelihoods, which integrates more diverse information from different saliency maps.
8. Use these two posterior probabilities to compute an integrated saliency map.
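
Steps 7 and 8 can be illustrated with a minimal sketch. It assumes two saliency maps already scaled to [0, 1]; the mean threshold used to split foreground from background, the histogram-based likelihoods, the bin count, and the names `posterior` and `integrate` are illustrative assumptions, not details taken from these notes.

```python
import numpy as np

def posterior(prior, other, n_bins=10):
    """Posterior foreground probability using `prior` as the prior map
    and the values of `other` to estimate the likelihoods."""
    # Assumption: pixels at or above the mean of the prior count as foreground.
    fg = prior >= prior.mean()
    # Quantize the other map into histogram bins.
    bins = np.minimum((other * n_bins).astype(int), n_bins - 1)
    # Likelihoods: normalized histograms of `other` inside each region.
    fg_hist = np.bincount(bins[fg], minlength=n_bins) / max(fg.sum(), 1)
    bg_hist = np.bincount(bins[~fg], minlength=n_bins) / max((~fg).sum(), 1)
    like_fg, like_bg = fg_hist[bins], bg_hist[bins]
    # Bayes' rule, applied per pixel.
    num = prior * like_fg
    den = num + (1.0 - prior) * like_bg
    return np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)

def integrate(s1, s2, n_bins=10):
    """Sum the two posteriors, each map serving once as the prior."""
    return posterior(s1, s2, n_bins) + posterior(s2, s1, n_bins)
```

Because each map acts as the prior once and as the likelihood source once, neither map dominates the result. The sum can be rescaled to [0, 1] afterwards if a normalized map is needed.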
Specific description and supplementary knowledge
1. Dense Reconstruction Error

Background templates: $B = [b_1, b_2, \ldots, b_M]$, $B \in \mathbb{R}^{D \times M}$; segment $i$ ($i \in [1, N]$),

Principal Component Analysis (PCA):

A multivariate statistical analysis method that applies a linear transformation to a set of variables to obtain a smaller number of significant variables (the principal components).

In this paper, PCA is used to analyze the background templates. This gives a more concise representation of the background and, to some extent, filters out interference that does not belong to the background.
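
The idea can be sketched as follows: project each segment feature onto the principal directions of the background templates and measure how much of it the projection fails to explain. This is a minimal sketch, assuming the reconstruction error is the squared residual after projecting the mean-centered feature onto the leading components. The function name `dense_reconstruction_error`, the argument `X` (a D×N matrix of segment features), and the parameter `n_components` are hypothetical and not taken from the paper.

```python
import numpy as np

def dense_reconstruction_error(B, X, n_components):
    """B: (D, M) background templates; X: (D, N) segment features.
    Returns one reconstruction error per segment, shape (N,)."""
    # Center everything on the mean of the background templates.
    mean = B.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered templates are the PCA directions.
    U, _, _ = np.linalg.svd(B - mean, full_matrices=False)
    U = U[:, :n_components]
    # Project each segment onto the background subspace and reconstruct it.
    Xc = X - mean
    recon = U @ (U.T @ Xc)
    # Squared residual: large when a segment is poorly explained by the background.
    return np.sum((Xc - recon) ** 2, axis=0)
```

A segment that lies in the background subspace gets an error near zero, while a segment that differs from the boundary appearance gets a large error and is therefore treated as more salient.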