Reading a paper out of boredom: the visual attention model RARE2012

Riche, N., Mancas, M., Duvinage, M., Mibulumukini, M., Gosselin, B., & Dutoit, T. (2013). RARE2012: A multi-scale rarity-based saliency detection with its comparative statistical analysis. Signal Processing: Image Communication, 28(6), 642–658. https://doi.org/10.1016/j.image.2013.03.009
 
This is a fairly old paper by now. Today I came across an article that uses its method, so I went back and read it in detail.
Visual attention mechanisms interest us because they are useful. For example, they can help optimize human-machine interfaces so that users can operate interactive buttons more comfortably; evaluate advertising designs; compress video and image data while preserving more information in the regions people care about; support robot visual perception; and so on.
As for a common definition of human visual attention, I don't know whether biology has worked out how it really operates; in any case, there was no settled answer when this paper was published. Broadly speaking, though, human attention can be defined as the natural ability to prioritize incoming stimuli and selectively attend to part of them. OK, a preliminary definition is good enough. Applied to vision, it means the brain does not simply treat the incoming image as a flat sequence of signals.
In computer vision, most work on attention mechanisms builds on the concept of a "saliency map". Put simply, a saliency map is a mapping of the input signal produced by some model, in which the parts the model considers more important receive correspondingly stronger values.
For visual attention, the input is an image, and the places that attract the human eye are the more important signals. So explaining the visual attention mechanism comes down to finding a better saliency map: one that quickly tells us which parts of the input image are most attractive to our visual system.
Along these lines, a saliency map can involve two mechanisms. One is bottom-up attention, also called stimulus-driven or exogenous attention. The other is top-down attention, also called task-driven or endogenous attention, which integrates specific knowledge (the task the viewer has in a given situation, a model of the scene type, recognizable objects, and so on). RARE2012 is purely bottom-up, because bottom-up methods perform better: they depend entirely on the information in the input image and need no other decision-making mechanism, so of course they perform better.
The paper compares several methods that were popular that year and concludes that its own approach is very good. Ha ha ha.
Their method:
-------------------------------------------------------------------------------------------
The first stage algorithm:
Step one: PCA (principal component analysis) is first used to map the three-channel RGB image into a space of three linearly uncorrelated components. The result is again three channels: channel 1 mainly carries luminance information, while channel 2 and channel 3 carry chromaticity information, and the three channels are decorrelated from one another. It looks a bit like the three HSV channels (hue, saturation and value). How exactly the decomposition is done I don't know; the paper doesn't say, so that depends on the source code.
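Since the paper does not spell out the decomposition, here is only a minimal sketch of what "PCA on the RGB channels" usually means: treat every pixel as a 3-vector, diagonalize the 3 x 3 color covariance, and project onto the eigenvectors. The function name pca_color_channels and the random test image are made up for illustration; whether RARE2012 does exactly this is an assumption.

```python
import numpy as np

def pca_color_channels(rgb):
    """Project an H x W x 3 RGB image onto its three principal components."""
    h, w, _ = rgb.shape
    pixels = rgb.reshape(-1, 3).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    # 3 x 3 covariance between the R, G and B values of all pixels.
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order; reorder to largest first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    components = centered @ eigvecs[:, order]
    return components.reshape(h, w, 3)

# Illustrative input only: a random image stands in for a real photo.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
channels = pca_color_channels(image)
```

On natural photos the first component tends to follow overall brightness, which matches the description of channel 1 as mostly luminance. The three outputs are decorrelated, not statistically independent.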
Step two: compute the rarity of the three PCA channel images directly. Here, too, I would have to look at the source code. I can understand using PCA to reduce the dimensionality of an image, but how exactly is it used to split the channels above? Either way, this gives three rarity maps. The point of this step is to extract the low-level color features of the image, which of course also include the luminance distribution.
Step three: extract orientation features from the three channel images above using Gabor filters. Gabor filters are chosen because they resemble the processing done by simple cells in the primary visual cortex (V1).
Gabor is defined as: 
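The paper's own formula is not reproduced here. For reference, the textbook real-valued 2-D Gabor function is given below; the symbols (wavelength λ, orientation θ, phase ψ, Gaussian width σ, aspect ratio γ) follow the common convention and may not match the paper's notation.

```latex
g(x, y) = \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)
          \cos\!\left(2\pi\frac{x'}{\lambda} + \psi\right),
\qquad
x' = x\cos\theta + y\sin\theta,\quad
y' = -x\sin\theta + y\cos\theta
```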
The response of a Gabor filter is very similar to the response of simple cells in the human visual system to visual stimuli. It is good at extracting local information in both the spatial and the frequency domain. Gabor wavelets are sensitive to image edges, offer good orientation and scale selectivity, and are insensitive to illumination changes, so they adapt well to varying lighting.
A 2-D Gabor filter is built from a Gabor function and achieves optimal localization in the spatial and frequency domains at the same time, so it can describe local structure in terms of spatial frequency (scale), spatial position and orientation selectivity. Here Gabor filters are used to extract the orientation and texture features of the image.
The paper applies Gabor filters at eight orientations, so each input image yields eight results. These eight outputs then have to be merged into a single output image.
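To make the "eight orientations, eight results" part concrete, here is a rough sketch of a Gabor filter bank in NumPy. The kernel size, wavelength, sigma and aspect ratio are arbitrary values picked for the example, not the paper's settings, and the FFT-based filtering wraps around at the image border, which a real implementation may handle differently.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real-valued 2-D Gabor kernel (odd size) with orientation theta in radians."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + (gamma * y_rot)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + psi)
    return envelope * carrier

def filter_same(image, kernel):
    """Circular 2-D convolution via the FFT; output has the image's shape."""
    h, w = image.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w))
    padded[:kh, :kw] = kernel
    # Move the kernel centre to the origin so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

# Eight orientations spread evenly over 180 degrees.
thetas = [k * np.pi / 8 for k in range(8)]
bank = [gabor_kernel(15, wavelength=6.0, theta=t, sigma=3.0) for t in thetas]

# Toy input: a vertical step edge.
image = np.zeros((64, 64))
image[:, 32:] = 1.0
responses = [np.abs(filter_same(image, k)) for k in bank]
```

With this toy image, the kernel at theta = 0 (which varies along x) reacts strongly to the vertical edge, while the kernel at theta = pi/2 barely responds.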
Fusing the outputs of the different orientations:
An efficiency coefficient (EC) is computed for each of the eight maps according to equation (2):
The eight orientation maps are sorted by EC. Each map is multiplied by a weight i / N, where N = 8 and i is the map's rank in the EC ordering. The paper then sets a threshold T to filter out the maps with a small EC:
The authors consider T = 0.3 a reasonable value.
The eight orientation maps are then fused:
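The rank-weight-threshold-sum procedure can be sketched as follows. Two things here are my assumptions and not taken from the paper: the efficiency coefficient is replaced by a stand-in, (max - mean) squared, because equation (2) is not reproduced in this post, and the threshold T is applied to the rank weight i / N, which is only one possible reading of the description.

```python
import numpy as np

def efficiency_coefficient(feature_map):
    # Stand-in for the paper's equation (2), NOT the real formula:
    # a map with one strong peak over a flat background scores high.
    return (feature_map.max() - feature_map.mean()) ** 2

def fuse_maps(maps, threshold=0.3):
    """Rank maps by EC, weight them by rank / N, drop low weights, then sum."""
    n = len(maps)
    ecs = np.array([efficiency_coefficient(m) for m in maps])
    # Rank 1 = lowest EC, rank n = highest EC.
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(ecs)] = np.arange(1, n + 1)
    weights = ranks / n
    # Assumption: T filters on the rank weight i / N.
    weights[weights < threshold] = 0.0
    fused = sum(w * m for w, m in zip(weights, maps))
    return fused, weights

# Toy maps: map k has a single peak of height k + 1 at position (k, k).
maps = []
for k in range(8):
    m = np.zeros((16, 16))
    m[k, k] = k + 1
    maps.append(m)

fused, weights = fuse_maps(maps)
```

With N = 8 and T = 0.3, the two lowest-ranked maps (weights 1/8 and 2/8) are dropped under this reading, and the remaining six contribute in proportion to their rank.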
So at this point we have the rarity maps of the three PCA images channel 1, channel 2 and channel 3, plus the three texture orientation maps extracted with Gabor filters.
-------------------------------------------------------------------------------------------
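Second stage:
The rarity mechanism in this stage is the key to RARE2012; it is where the name comes from, after all.
The method: count how often each pixel value occurs, over a set of scales.
n_in_i is the probability (proportion) that the current pixel j has gray value i; it is obtained from the histogram of the rarity map. The formula is not very clear: what is S? It looks like the maximum gray value of the un-normalized rarity map. Either way, the idea is to count how often a given gray level occurs in the image and treat that as the probability of that gray level appearing in a local region. This gives the pixel's attention score, Attention().
Fig. 2 gives an example: for the input on the left, the blue region has a low probability of occurring in the whole image, so its value in the rarity map is high.
In the second stage, attention is computed for the six maps obtained in the first stage.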
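As a rough illustration of the histogram idea, the sketch below scores each pixel by -log(p), where p is the fraction of pixels falling into the same gray-level bin. Using -log(p) (self-information) is my assumption about how occurrence probability becomes a score; the paper's formula, including the S term, may be different. The sketch also works at a single scale with an arbitrary bin count, whereas the paper is multi-scale.

```python
import numpy as np

def rarity_map(feature_map, bins=16):
    """Score each pixel by the self-information of its quantised value."""
    lo, hi = feature_map.min(), feature_map.max()
    if hi == lo:
        # A constant map has no rare values.
        return np.zeros_like(feature_map, dtype=np.float64)
    # Quantise to integer bin indices 0 .. bins - 1.
    scaled = (feature_map - lo) / (hi - lo) * bins
    idx = np.minimum(scaled.astype(int), bins - 1)
    counts = np.bincount(idx.ravel(), minlength=bins)
    prob = counts / idx.size
    # Only occupied bins are looked up, so the log is always finite.
    return -np.log(prob[idx])

# Toy input: a small bright patch on a uniform background.
feature = np.zeros((10, 10))
feature[4:6, 4:6] = 1.0
rarity = rarity_map(feature)
```

The 2 x 2 patch covers 4% of the pixels and gets a score of -log(0.04), about 3.2, while the background gets -log(0.96), about 0.04. That is the same behavior as the blue region in the Fig. 2 example: rare values come out high.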
-------------------------------------------------------------------------------------------
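Third stage:
The six attention maps from the second stage are fused.
First comes intra-channel fusion: after attention has been computed for the color feature map and the texture orientation map derived from channel 1, the two are fused. The fusion method is:
It is just the EC multiplied element-wise with the map. An S shows up here again, but whether it is the same S as in equation (5) is something to check in the source code; the paper doesn't say. N = 2; why 2, and where the two maps come from, I don't understand either.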
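One way to read "EC times map" with N = 2 is a sum over the two attention maps of one channel (color and orientation), each scaled by its own efficiency coefficient. The sketch below follows that reading; the pairing of the two maps and the EC stand-in, (max - mean) squared, are both assumptions and not confirmed by the paper.

```python
import numpy as np

def efficiency_coefficient(feature_map):
    # Hypothetical stand-in for the paper's EC, not the real formula.
    return (feature_map.max() - feature_map.mean()) ** 2

def fuse_within_channel(color_attention, orientation_attention):
    """Sum the two attention maps of one channel, each scaled by its EC."""
    maps = [color_attention, orientation_attention]  # N = 2 under this reading
    return sum(efficiency_coefficient(m) * m for m in maps)

# Toy maps: one with a clear peak, one completely flat.
color_attention = np.array([[0.0, 2.0], [0.0, 0.0]])
orientation_attention = np.ones((2, 2))
fused_channel = fuse_within_channel(color_attention, orientation_attention)
```

In this toy case the flat map has an EC of zero and drops out entirely, so the fused result is just the peaked map scaled by its own EC.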
 
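Setting those points aside for now, let's look at how RARE2012 arrives at its final output:
After the fusion in the third stage, the images of the three channels yield three results.
These three results are fused once more to give the final output. The fusion method is the one from step three of the first stage, used there to fuse the eight Gabor maps: compute the efficiency coefficients, sort, multiply by the weights, and apply the threshold.
RARE2012 grew out of RARE2007 and RARE2011, and each revision brought some innovation: better performance and a more comprehensive set of features. So how well does RARE2012 do?
In the comparison, the top row shows the eye-tracking results, that is, where people's eyes actually focus. The bottom row shows the RARE2012 results. They look pretty good.
But RARE2012 is sometimes completely wrong. For the last three examples in Fig. 7, RARE2012 gets all of them wrong. It seems an attention mechanism still needs well-founded top-down reasoning.
Still, compared with similar models of its time, RARE2012 held a considerable advantage. The paper of course includes quantitative analyses of performance and accuracy.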
 
 

Origin www.cnblogs.com/isYiming/p/12158923.html