Concepts related to visual saliency detection

Visual saliency detection refers to simulating the characteristics of human vision with intelligent algorithms in order to extract the salient regions of an image (i.e., the regions that interest a human viewer).

The visual attention (VA) mechanism describes how humans, when faced with a scene, automatically process regions of interest and selectively ignore regions of no interest. These regions of interest are called salient regions. In the accompanying picture, for example, the four people are the most noticeable elements.

There are two strategies for the human visual attention mechanism:

1) Bottom-up, data-driven attention mechanism

Bottom-up attention is driven only by perceptual data and guides a person's gaze to the salient regions of a scene. Regions that contrast strongly with, or differ significantly from, their surroundings attract bottom-up attention. Computationally, image features such as color, brightness, and edges are used to measure the difference between a target region and its surrounding pixels, and the saliency of that region is computed from this difference. The image below shows bottom-up attention: the light gray bars in column 1 and the vertical bars in column 2 are immediately noticeable.
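The contrast computation described above can be sketched in a few lines. This is a minimal illustration, assuming a single brightness channel and simple box averages for the "target area" and its "surroundings"; the function names, window radii, and toy image are hypothetical choices and not taken from any specific published model.

```python
import numpy as np

def box_mean(img, radius):
    """Mean over a (2*radius+1) x (2*radius+1) window, with edge padding."""
    img = np.asarray(img, dtype=float)
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def contrast_saliency(img, center_radius=1, surround_radius=4):
    """Saliency as |local mean - surrounding mean|, scaled to [0, 1]."""
    center = box_mean(img, center_radius)      # small neighborhood
    surround = box_mean(img, surround_radius)  # larger neighborhood
    sal = np.abs(center - surround)
    peak = sal.max()
    return sal / peak if peak > 0 else sal

# Toy image (assumed): dark background with one bright 3x3 patch.
image = np.full((21, 21), 0.2)
image[9:12, 9:12] = 1.0

saliency = contrast_saliency(image)
peak_row, peak_col = np.unravel_index(np.argmax(saliency), saliency.shape)
print(peak_row, peak_col)  # the most salient pixel is the patch center
```

On this toy image the saliency peaks at the center of the bright patch and is close to zero in the uniform background, which is the behavior the bottom-up description predicts.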

2) Top-down, task-driven attention mechanism

Top-down attention is determined by human "cognitive factors" such as knowledge, expectations, and current goals. The saliency of image regions is computed with respect to specific, task-relevant features of the image. The figure below shows top-down attention: in a surveillance task, the human figures in the scene attract attention.

In robotics and computer vision, researchers are increasingly interested in visual attention mechanisms that pick out the most relevant parts of large amounts of visual data. In recent years, building on feature integration theory and psychological models of attention such as Guided Search, researchers have proposed a large number of computational models of attention selection that simulate the human visual attention mechanism. These include models based on cognition, Bayesian inference, decision theory, information theory, graphical models, frequency domain analysis, and pattern classification.

1. Cognitive Attention Model
Itti et al. proposed a saliency-based visual attention model in 1998 and refined the underlying theory in Nature Reviews Neuroscience in 2001. Itti's saliency model is the most representative of this family and has become the standard reference for bottom-up visual attention models. Its basic structure is shown in the figure below.

For an input image, the model extracts early visual features: color (RGBY), brightness, and orientation. A center-surround operation at multiple scales produces feature maps that express a measure of saliency, and these feature maps are merged into the final saliency map. A biologically inspired winner-take-all competition then selects the most salient spatial position in the image, which guides where attention is directed. Finally, inhibition of return suppresses the attended location so that the focus of attention can shift to the next one.
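The last stages of this pipeline (merging feature maps, winner-take-all selection, and inhibition of return) can be illustrated with a short sketch. It is a simplification under stated assumptions: the feature maps are taken as given, plain min-max normalization stands in for the model's own map normalization, and inhibition is a hard circular mask. The names `combine_maps`, `scan_fixations`, and `inhibit_radius` are hypothetical.

```python
import numpy as np

def combine_maps(feature_maps):
    """Average the feature maps after min-max normalizing each one."""
    total = np.zeros_like(np.asarray(feature_maps[0], dtype=float))
    for fmap in feature_maps:
        fmap = np.asarray(fmap, dtype=float)
        span = fmap.max() - fmap.min()
        if span > 0:
            total += (fmap - fmap.min()) / span
    return total / len(feature_maps)

def scan_fixations(saliency, n_fixations=3, inhibit_radius=2):
    """Winner-take-all selection followed by inhibition of return."""
    sal = np.array(saliency, dtype=float)  # work on a copy
    rows, cols = np.indices(sal.shape)
    fixations = []
    for _ in range(n_fixations):
        if not np.isfinite(sal.max()):
            break  # every location has already been inhibited
        # Winner-take-all: the global maximum wins the competition.
        r, c = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(r), int(c)))
        # Inhibition of return: suppress a disc around the winner.
        mask = (rows - r) ** 2 + (cols - c) ** 2 <= inhibit_radius ** 2
        sal[mask] = -np.inf
    return fixations

# Toy feature maps (assumed values) for color, brightness, orientation.
color = np.zeros((10, 10)); color[2, 2] = 1.0; color[7, 7] = 0.5
brightness = np.zeros((10, 10)); brightness[2, 2] = 2.0; brightness[2, 8] = 1.0
orientation = np.zeros((10, 10)); orientation[7, 7] = 4.0

saliency_map = combine_maps([color, brightness, orientation])
fixations = scan_fixations(saliency_map)
print(fixations)  # [(2, 2), (7, 7), (2, 8)]
```

Each fixation is the current maximum of the saliency map; masking its neighborhood is what lets the focus of attention move on instead of returning to the same peak.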

2. Decision Theory Attention Model
The decision-theoretic view holds that perceptual systems have evolved to produce decisions about the surrounding environment that are optimal in a decision-theoretic sense. The key point is that visual attention should be driven by optimality with respect to the current task. Decision-theoretic attention models can express both bottom-up and top-down attention, and they have been applied successfully in computer vision tasks such as classification and prediction of attention locations, with high accuracy.

3. Frequency Domain Analysis Attention Model
Saliency models based on spectral analysis are simple in form, easy to interpret and implement, and have been very successful in predicting the focus of attention and detecting salient regions. They also suit real-time requirements: they run nearly 10 times faster than comparable iNVT models. The drawback is that their biological plausibility is unclear.
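The text does not name a specific spectral method, so the sketch below uses the spectral residual approach as one well-known example of this family; treating it as the intended model is an assumption. The idea is to subtract a locally averaged log-amplitude spectrum from the log-amplitude spectrum itself, keep the original phase, and transform back. The final smoothing of the saliency map used in practice is omitted, and `avg_size` and the random test image are illustrative.

```python
import numpy as np

def spectral_residual_saliency(img, avg_size=3):
    """Spectral-residual style saliency for a 2D grayscale image."""
    img = np.asarray(img, dtype=float)
    spectrum = np.fft.fft2(img)
    log_amp = np.log(np.abs(spectrum) + 1e-8)  # small offset avoids log(0)
    phase = np.angle(spectrum)

    # Local average of the log amplitude (box filter with wrap-around).
    smooth = np.zeros_like(log_amp)
    half = avg_size // 2
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            smooth += np.roll(log_amp, (dy, dx), axis=(0, 1))
    smooth /= avg_size ** 2

    # The residual is the part of the spectrum the local average misses.
    residual = log_amp - smooth
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / sal.max()

# Illustrative input: a random 32x32 image with a fixed seed.
rng = np.random.default_rng(0)
test_image = rng.random((32, 32))
sr_map = spectral_residual_saliency(test_image)
print(sr_map.shape, float(sr_map.min()), float(sr_map.max()))
```

The whole computation is two FFTs plus a small filter, which is consistent with the speed advantage described above.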

4. Graphical Model Attention Model
A graphical model is a probabilistic framework that uses a graph to represent the conditional dependencies between random variables. Attention models of this type treat eye movements as a time series. Because a large number of latent variables influence how eye movements are generated, these models rely on methods such as hidden Markov models, dynamic Bayesian networks, and conditional random fields. Graphical models can capture complex attention mechanisms and therefore achieve better predictive power. Their disadvantage is high model complexity, which makes them harder to train and to interpret.
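To make the hidden Markov model idea concrete, the sketch below decodes a sequence of observed eye movements into hidden attention states with the Viterbi algorithm. Everything about the toy setup is hypothetical: two hidden states (0 = staying within a region, 1 = exploring a new region), two observation symbols (0 = short saccade, 1 = long saccade), and hand-picked probabilities. A real model would learn these parameters from eye-tracking data.

```python
import numpy as np

def viterbi(obs, start, trans, emit):
    """Most likely hidden state sequence for a discrete HMM (log-space)."""
    start, trans, emit = (np.log(np.asarray(m, dtype=float))
                          for m in (start, trans, emit))
    score = start + emit[:, obs[0]]
    back = []
    for o in obs[1:]:
        # cand[i, j]: best score ending in state i, then moving to state j.
        cand = score[:, None] + trans
        back.append(np.argmax(cand, axis=0))
        score = np.max(cand, axis=0) + emit[:, o]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score))]
    for pointers in reversed(back):
        path.append(int(pointers[path[-1]]))
    return path[::-1]

# Hypothetical parameters, chosen by hand for illustration only.
start_prob = [0.5, 0.5]
trans_prob = [[0.8, 0.2],   # states tend to persist
              [0.2, 0.8]]
emit_prob = [[0.9, 0.1],    # state 0 mostly emits short saccades
             [0.2, 0.8]]    # state 1 mostly emits long saccades

saccades = [0, 0, 1, 1, 0]  # short, short, long, long, short
states = viterbi(saccades, start_prob, trans_prob, emit_prob)
print(states)  # [0, 0, 1, 1, 0]
```

Even this two-state toy needs a start vector, a transition matrix, and an emission matrix; richer models add many more parameters, which is where the training and interpretability costs mentioned above come from.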

 
