[Paper] Learning Depth from Single Monocular Images

2005 NIPS


The paper uses a Markov Random Field (MRF) to estimate depth directly from a single monocular image.
Rather than RGB-D input, it works with images in the YCbCr color space paired with depth data.
The MRF is used to combine local and global information within a single image.

Feature extraction

Convolution kernels

To capture texture information, the authors apply 15 convolution kernels to the Y (intensity) channel of YCbCr and apply the first Laws' mask (a local averaging filter) to the two color channels, giving 17 filter outputs in total. For each output, both the sum absolute energy and the sum squared energy are computed, so each patch has 34 features.
Of the 15 kernels, 9 are Laws' masks and 6 are edge detectors.
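As a rough illustration of the texture-energy idea, the sketch below builds nine Laws' masks and computes the two energies for one patch. It assumes the 3x3 form of the masks (outer products of the vectors L3, E3 and S3) and leaves out the six edge detectors and the color channels. The helper names `laws_masks`, `filter_valid` and `patch_energies` are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# 3-tap Laws vectors (assumed variant): level, edge, spot.
L3 = np.array([1.0, 2.0, 1.0])
E3 = np.array([-1.0, 0.0, 1.0])
S3 = np.array([-1.0, 2.0, -1.0])


def laws_masks():
    """Return nine 3x3 masks as outer products of L3, E3 and S3."""
    vecs = [L3, E3, S3]
    return [np.outer(a, b) for a in vecs for b in vecs]


def filter_valid(image, kernel):
    """Cross-correlate image with kernel, keeping only fully covered pixels."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def patch_energies(response):
    """Return (sum absolute energy, sum squared energy) of a filter response."""
    mag = np.abs(response)
    return mag.sum(), (mag ** 2).sum()


masks = laws_masks()
patch = np.arange(25, dtype=float).reshape(5, 5)  # toy stand-in for a Y-channel patch
features = [e for m in masks for e in patch_energies(filter_valid(patch, m))]
```

With nine masks and two energies each, `features` holds 18 values. Adding the six edge detectors and the two color-channel outputs would bring this to the 34 features described above.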

Multi-scale feature extraction

To capture global context, the authors extract features at three scales, where scale 1x is the highest resolution and scale 9x the lowest. At each scale, the four neighbors of each patch are also considered. In addition, because objects such as trees have strong vertical structure, the column containing the patch is divided into four vertical patches. For each patch (C0), features from a total of 3*5+4=19 patches are combined.

With 34 features per patch, the feature vector for each patch has 19*34 = 646 dimensions.
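The dimension bookkeeping can be checked with a small sketch. The array layout and the function name `absolute_feature_vector` are assumptions for illustration, and random numbers stand in for real filter energies.

```python
import numpy as np

N_FILTER_FEATURES = 34  # 17 filter outputs x 2 energy types
N_SCALES = 3
N_LOCAL = 5             # the patch itself plus its four neighbors
N_COLUMN = 4            # vertical segments of the column containing the patch

rng = np.random.default_rng(0)
# Placeholder values; a real pipeline would fill these from filter responses.
local = rng.random((N_SCALES, N_LOCAL, N_FILTER_FEATURES))
column = rng.random((N_COLUMN, N_FILTER_FEATURES))


def absolute_feature_vector(local, column):
    """Concatenate multi-scale neighborhood features with column features."""
    return np.concatenate([local.ravel(), column.ravel()])


x = absolute_feature_vector(local, column)
print(x.shape)
```

The concatenated vector has (3*5 + 4) * 34 entries, which matches the count given in the text.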

Relative depth features

Consider two adjacent patches, x and y. The goal is to judge whether they belong to the same object or to different objects. The absolute output of each of the 17 filters is quantized into a 10-bin histogram, and the resulting 170 bins are used to make this judgment.
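A minimal sketch of the histogram features is shown below. It assumes the filter magnitudes are scaled to [0, 1], that each histogram is normalized, and that two patches are compared by subtracting their histogram vectors. None of these details are stated in the summary above, and the function names are hypothetical.

```python
import numpy as np

N_FILTERS = 17
N_BINS = 10


def patch_histograms(responses, value_range=(0.0, 1.0)):
    """Build one normalized 10-bin histogram per filter and flatten them.

    responses: array of shape (N_FILTERS, n_pixels) with absolute filter outputs.
    value_range is an assumed scaling of those outputs.
    """
    hists = []
    for r in responses:
        counts, _ = np.histogram(r, bins=N_BINS, range=value_range)
        hists.append(counts / max(counts.sum(), 1))
    return np.concatenate(hists)


def relative_features(resp_x, resp_y):
    """Difference of the two patches' histogram vectors (assumed comparison)."""
    return patch_histograms(resp_x) - patch_histograms(resp_y)


rng = np.random.default_rng(0)
patch_x = rng.random((N_FILTERS, 64))  # toy absolute filter outputs
patch_y = rng.random((N_FILTERS, 64))
y_xy = relative_features(patch_x, patch_y)
```

Two patches with identical texture statistics give a zero vector, so a large difference hints at an object boundary between them.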

Model

The authors use two models: a Gaussian MRF (1) and a Laplacian MRF (2).
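For reference, the two densities can be sketched as follows. The notation is assumed here and is not taken from this summary: d_i(s) is the depth of patch i at scale s, x_i is the absolute feature vector of patch i, \theta_r are the parameters for image row r, N_s(i) is the set of neighbors of patch i at scale s, M is the number of patches, and Z_G and Z_L are normalizing constants.

```latex
P_G(d \mid X; \theta, \sigma) = \frac{1}{Z_G} \exp\left(
  - \sum_{i=1}^{M} \frac{\left(d_i(1) - x_i^{T}\theta_r\right)^2}{2\sigma_{1r}^2}
  - \sum_{s=1}^{3} \sum_{i=1}^{M} \sum_{j \in N_s(i)}
      \frac{\left(d_i(s) - d_j(s)\right)^2}{2\sigma_{2rs}^2}
\right) \tag{1}

P_L(d \mid X; \theta, \lambda) = \frac{1}{Z_L} \exp\left(
  - \sum_{i=1}^{M} \frac{\left|d_i(1) - x_i^{T}\theta_r\right|}{\lambda_{1r}}
  - \sum_{s=1}^{3} \sum_{i=1}^{M} \sum_{j \in N_s(i)}
      \frac{\left|d_i(s) - d_j(s)\right|}{\lambda_{2rs}}
\right) \tag{2}
```

In both models the first term ties each depth to the local features, and the second term encourages neighboring patches at each scale to have similar depths. The only difference is the squared penalty in (1) versus the absolute penalty in (2).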
Comparison of the Gaussian and Laplacian distributions:

  1. The histogram of relative depths is naturally close to a Laplacian distribution.
  2. The Laplacian has heavier tails and is therefore more robust to outliers in depth estimation.
  3. The results also show that depth estimated with the Laplacian model has sharper edges.
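The robustness point can be illustrated with a toy one-dimensional case. This is not the paper's inference procedure. It only uses the standard fact that the maximum-likelihood location is the mean under a Gaussian and the median under a Laplacian. The depth values are invented.

```python
import numpy as np


def gaussian_mle_location(samples):
    """Location maximizing a Gaussian likelihood: the mean (least squares)."""
    return float(np.mean(samples))


def laplacian_mle_location(samples):
    """Location maximizing a Laplacian likelihood: the median (least absolute error)."""
    return float(np.median(samples))


depths = np.array([10.0, 10.5, 9.5, 10.2, 9.8])  # toy depths in meters
with_outlier = np.append(depths, 80.0)           # one gross outlier

print("Gaussian estimate: ", gaussian_mle_location(with_outlier))
print("Laplacian estimate:", laplacian_mle_location(with_outlier))
```

The single outlier drags the Gaussian estimate far from the cluster around 10 m, while the Laplacian estimate barely moves.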

Conclusion

Feature extraction

  1. Using multiscale and column features significantly improves the algorithm’s performance.

Errors caused by the data set

  1. Some of the errors can be attributed to errors or limitations of the training set. For example, the training set images and depth maps are slightly misaligned, and therefore the edges in the learned depth map are not very sharp.
  2. Further, the maximum depth in the training set is 81 m; therefore, far-away objects are all mapped to a single distance of 81 m.
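The range limit in the second point amounts to clipping the ground truth. In the small sketch below, the function name `saturate_depths` and the scene values are made up for illustration.

```python
import numpy as np

MAX_RANGE_M = 81.0  # maximum depth in the training set, as stated above


def saturate_depths(true_depths_m, max_range_m=MAX_RANGE_M):
    """Clip depths to the largest value the training data can represent."""
    return np.minimum(true_depths_m, max_range_m)


scene = np.array([5.0, 40.0, 81.0, 120.0, 300.0])  # toy true depths in meters
print(saturate_depths(scene))
```

Every object beyond 81 m ends up with the same label, so a model trained on such data cannot tell a building at 120 m from a mountain at 300 m.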


Origin blog.csdn.net/yaoyao_chen/article/details/130489721