Paper Study Notes (1): Modeling the World from Internet Photo Collections


If you need it, the PPT I am finishing can be downloaded from my profile page.

Abstract

There is an enormous number of photos on the Internet, forming the largest and most diverse photo collection available. How can computer vision researchers make use of these images? This paper explores the question from the perspective of 3D scene modeling and visualization. We present structure-from-motion and image-based rendering algorithms that operate on images retrieved through keyword searches. We call this approach **Photo Tourism**; it has enabled reconstructions of many famous world sites. The paper presents these algorithms and results as a first step toward 3D modeling of the world's well-photographed sites, cities, and landscapes from Internet imagery. Finally, we discuss key open problems and challenges for the research community.

Introduction

Most of the world's notable sites have been photographed and posted online, from many viewpoints and at many different times; the street-level imagery of cities in Google Maps is one example.

Internet photos are a rich source of information for modeling the world's sites (shape modeling). Their wealth of viewpoints and diversity call for robust algorithms that can adapt to changing conditions.

Because Internet photos are unordered, uncalibrated, and vary widely and uncontrollably in illumination, resolution, and quality, they are difficult to handle with conventional computer vision techniques. A major challenge in applying computer vision to these photos is correspondence: matching points in two images that correspond to the same 3D location.

The paper is organized as follows: it first reviews the state of the art, then presents a first step toward solving this problem, a visualization front end we call Photo Tourism. We then propose a set of open research questions, including the creation of more efficient correspondence and reconstruction techniques for very large image collections. This work evolved from a 2006 paper and adds new algorithms; for more details, see http://phototour.cs.washington.edu.

2 Previous Work

Over the past 20 years, the performance of 3D computer vision algorithms has improved rapidly. The relevant work covers feature correspondence, sparse reconstruction, image-based modeling, image-based rendering, and image browsing and retrieval. The following subsections introduce each area.

2.1 Feature Matching

This subsection surveys roughly 20 years of feature matching techniques; the paper ultimately uses SIFT (Scale-Invariant Feature Transform) features.

2.2 Sparse Reconstruction

**Sparse reconstruction:** recovering the 3D structure of a scene, together with each camera's position and orientation, from a set of feature matches. There has been a great deal of work on this over the past 20 years, and our approach is similar to earlier methods; the main difference from previous contributions is applying SfM to real-world images from the Internet. In applying SfM, we made four modifications: (1) new cameras are initialized using pose estimation; (2) a heuristic rule selects the initial pair of images; (3) each reconstructed point is checked to be well conditioned before it is added to the scene; (4) the camera focal length is estimated from the image's EXIF information.
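
Modification (3) can be made concrete. Below is a minimal sketch, assuming that "well conditioned" means the point is viewed from sufficiently different directions: it computes the largest angle between any two viewing rays to a tentative 3D position and accepts the point only if that angle exceeds a threshold. The function names and the 2.0-degree default are assumptions for illustration, not values given in these notes.

```python
import math
from itertools import combinations


def ray_direction(camera_center, point):
    """Unit vector pointing from a camera center toward a 3D point."""
    d = [p - c for p, c in zip(point, camera_center)]
    norm = math.sqrt(sum(x * x for x in d))
    return [x / norm for x in d]


def max_ray_angle_deg(point, camera_centers):
    """Largest angle, in degrees, between any two viewing rays to `point`."""
    rays = [ray_direction(c, point) for c in camera_centers]
    best = 0.0
    for a, b in combinations(rays, 2):
        # Clamp the dot product to [-1, 1] to guard acos against rounding error.
        cos_ab = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
        best = max(best, math.degrees(math.acos(cos_ab)))
    return best


def is_well_conditioned(point, camera_centers, min_angle_deg=2.0):
    """Hypothetical test: keep the point only if two rays are far enough apart.

    The 2.0-degree default is an assumption for illustration.
    """
    if len(camera_centers) < 2:
        return False
    return max_ray_angle_deg(point, camera_centers) > min_angle_deg


# A point 10 units away, seen from a short and a long baseline.
print(is_well_conditioned((0, 0, 10), [(0, 0, 0), (0.01, 0, 0)]))  # False
print(is_well_conditioned((0, 0, 10), [(0, 0, 0), (2, 0, 0)]))     # True
```

Nearly parallel rays (a short baseline) make the triangulated depth very sensitive to small image errors, which is why such points are held back.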

2.3 Image-Based Modeling

In recent years, computer vision techniques for building models, such as sparse reconstruction, have attracted great interest in computer graphics; these are known as image-based modeling methods. Many researchers have done excellent work in this area. Compared with that work, our emphasis is on creating smooth transitions between the photos and the 3D model rather than on interactive visualization of the 3D model itself, since earlier work already covers the latter.

2.4 Image-Based Rendering

The pioneering work in image-based rendering (IBR) is the Aspen MovieMap project (Lippman 1980). The project captured tens of thousands of photos of Aspen, Colorado from a moving car, registered them to a map of the city, and provided an interactive user interface. Our work is similar to that project but requires far less manual effort. Our reconstructed building surfaces are not as realistic as those in related IBR work, but this is not a problem because our goal is not reconstruction fidelity. We therefore avoid some of the hardest problems in IBR: complete surface model reconstruction, lighting, and accurate pixel interpolation. This frees us from many restrictions of conventional image-based modeling (IBM) and IBR methods and gives us more freedom in handling the input photos.

2.5 Image Browsing, Retrieval, and Annotation

Browsing photos by their location has recently become increasingly popular. In existing systems, location information is set via GPS or entered manually. Our method obtains photos from existing databases and online image search, and additionally uses the sparse 3D geometry and image feature matches to derive the structure used for navigation.
For retrieval, we use a 3D extension of the Video Google technique.
Our annotation technique lets annotations of specific objects or regions be transferred between different images. Annotations can be created within the system, imported directly from existing Flickr annotations, or transferred automatically from other photos.
A 2002 system returned the camera position of a given photo; our system can do the same and also adds visualization, navigation, annotation, and other features.
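
One way to transfer an annotation using reconstructed geometry is sketched below. It is a minimal sketch, assuming a simple pinhole camera `(R, t, f)` with the principal point at the image origin: the 3D points whose projections fall inside the annotation box in the source photo are reprojected into the target photo, and the bounding box of those reprojections becomes the transferred annotation. The camera model and the function names are hypothetical, and occlusion is ignored.

```python
def project(camera, X):
    """Hypothetical pinhole model: camera = (R, t, f), principal point at the origin."""
    R, t, f = camera
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None  # point lies behind the camera
    return (f * xc[0] / xc[2], f * xc[1] / xc[2])


def transfer_box(box, src_cam, dst_cam, points3d):
    """Transfer an annotation box (umin, vmin, umax, vmax) from src to dst image."""
    umin, vmin, umax, vmax = box
    reprojected = []
    for X in points3d:
        p = project(src_cam, X)
        # Keep only the points that land inside the annotated region of the source photo.
        if p is None or not (umin <= p[0] <= umax and vmin <= p[1] <= vmax):
            continue
        q = project(dst_cam, X)
        if q is not None:
            reprojected.append(q)
    if not reprojected:
        return None  # nothing to transfer
    us = [q[0] for q in reprojected]
    vs = [q[1] for q in reprojected]
    return (min(us), min(vs), max(us), max(vs))


I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
src = (I, [0, 0, 0], 1.0)
dst = (I, [1, 0, 0], 1.0)  # same orientation, shifted by one unit along x
points = [(0, 0, 5), (0.5, 0.5, 5), (10, 10, 5)]
print(transfer_box((-0.2, -0.2, 0.2, 0.2), src, dst, points))  # (0.2, 0.0, 0.3, 0.1)
```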

3 Overview

(This section outlines the structure of the paper.)
Main objective: "Our objective is to geometrically register large photo collections from the Internet and other sources."
Main difficulty: Internet photos are unfriendly to modeling because of their quality problems.
Main solution: feature matching and sparse reconstruction.
Section 4: the method in detail.
Section 5: how the photo explorer renders the reconstructed scene attractively.
Section 6: the photo explorer interface through which users navigate the reconstructed scene.
Section 7: transferring annotations across multiple photos.
Section 8: results on 11 reconstructed scenes.
Section 9: challenging open problems for the research community.

4 Reconstructing Cameras and Sparse Geometry (camera calibration and sparse reconstruction)

Sparse reconstruction requires each camera's intrinsic parameters, position, and orientation, or absolute coordinate information. Internet photos do not come with this information. An initial focal length can be read from the EXIF tags and then refined by optimization; the remaining parameters must be computed with camera calibration techniques.
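
EXIF tags store the focal length in millimeters, while SfM needs it in pixels. Below is a minimal sketch of the standard conversion, assuming the sensor (CCD) width in millimeters is known from some other source, such as a per-camera lookup table, since EXIF does not always record it directly. The function name is hypothetical, and the image width and sensor width must refer to the same side of the image.

```python
def focal_length_pixels(focal_mm, image_width_px, sensor_width_mm):
    """Convert an EXIF focal length (mm) to pixels.

    focal_px = focal_mm * image_width_px / sensor_width_mm
    `sensor_width_mm` is assumed to come from outside EXIF (e.g. a camera database).
    """
    if focal_mm <= 0 or image_width_px <= 0 or sensor_width_mm <= 0:
        raise ValueError("focal length, image width and sensor width must be positive")
    return focal_mm * image_width_px / sensor_width_mm


# A 35 mm lens on a full-frame (36 mm wide) sensor, image 3000 px wide.
print(focal_length_pixels(35.0, 3000, 36.0))  # roughly 2916.7
```

The result only initializes the focal length; it is then refined during the SfM optimization.
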
**Procedure:** detect feature points in each image -> match feature points between image pairs -> run an iterative SfM procedure that optimizes the camera parameters. Finally, we use interactive techniques to register the recovered cameras and structure to a map.
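
The iterative SfM optimization (bundle adjustment) adjusts cameras and points to minimize the total squared reprojection error. Below is a minimal sketch of that objective, assuming a pinhole camera that looks down the +z axis, has its principal point at the image origin, and applies a two-term radial distortion `(k1, k2)`; the camera tuple layout and the function names are assumptions. A nonlinear least-squares solver such as Levenberg-Marquardt would minimize this quantity over all camera and point parameters.

```python
def project_point(camera, X):
    """Project 3D point X with an assumed camera (R, t, f, k1, k2).

    Pinhole model looking down +z, principal point at the image origin,
    followed by radial distortion 1 + k1*r^2 + k2*r^4.
    """
    R, t, f, k1, k2 = camera
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    x, y = xc[0] / xc[2], xc[1] / xc[2]
    r2 = x * x + y * y
    distortion = 1.0 + k1 * r2 + k2 * r2 * r2
    return (f * distortion * x, f * distortion * y)


def total_reprojection_error(cameras, points, observations):
    """Sum of squared reprojection errors, the objective bundle adjustment minimizes.

    observations: list of (camera_index, point_index, (u, v)) measured keypoints.
    """
    total = 0.0
    for ci, pi, (u, v) in observations:
        pu, pv = project_point(cameras[ci], points[pi])
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cameras = [(I, [0, 0, 0], 100.0, 0.0, 0.0)]
points = [(1.0, 2.0, 10.0)]
print(total_reprojection_error(cameras, points, [(0, 0, (13.0, 24.0))]))  # 25.0
```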

4.1 Keypoint Detection and Matching

We use SIFT features to detect and describe the feature points in each image. SIFT is chosen for its good scale invariance, and it assigns a local descriptor to each feature point. A single image may contain thousands of SIFT feature points.
For each image pair (I, J), feature points are matched as follows: build a kd-tree from the descriptors in J. Instead of accepting nearest neighbors under a fixed distance threshold, for each feature in I we find its two nearest neighbors in J, at distances $d_1$ and $d_2$, and accept the nearest one as a match only if $\frac{d_1}{d_2} < 0.6$. (If a feature ends up matched to more than one feature in the other image, all of those matches are removed, since some of them must be wrong.)
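
The ratio test can be sketched as follows. It is a minimal sketch, assuming descriptors are plain tuples of floats; a brute-force nearest-neighbor search stands in for the kd-tree, which finds the same neighbors more slowly. The function name is hypothetical.

```python
import math
from collections import Counter


def ratio_test_matches(desc_i, desc_j, ratio=0.6):
    """Match descriptors of image I to image J with the nearest-neighbor ratio test.

    Brute-force search stands in for the kd-tree; it returns the same neighbors.
    Returns a list of (index_in_I, index_in_J) pairs.
    """
    candidates = {}
    for a, q in enumerate(desc_i):
        # Distances from this feature to every feature in J, closest first.
        nearest = sorted((math.dist(q, d), b) for b, d in enumerate(desc_j))
        if len(nearest) < 2:
            continue
        (d1, b1), (d2, _) = nearest[0], nearest[1]
        if d2 > 0 and d1 / d2 < ratio:
            candidates[a] = b1
    # Each feature in I has at most one candidate, so a conflict means several
    # features in I chose the same feature in J; drop all such matches.
    uses = Counter(candidates.values())
    return [(a, b) for a, b in candidates.items() if uses[b] == 1]


desc_j = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
print(ratio_test_matches([(0.5, 0.0), (9.5, 0.2)], desc_j))  # [(0, 0), (1, 1)]
print(ratio_test_matches([(0.5, 0.0), (0.4, 0.1)], desc_j))  # []
```

A feature equidistant from two candidates (ratio near 1) is ambiguous and is rejected, which is the point of comparing $d_1$ with $d_2$ rather than thresholding $d_1$ alone.
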
After matching, we robustly estimate a fundamental matrix for each image pair (two-view geometry) with RANSAC. In each RANSAC iteration, a candidate fundamental matrix is computed with the eight-point algorithm. The RANSAC outlier threshold is 0.6% of the maximum image dimension. The F-matrix returned by RANSAC is then refined by running the Levenberg-Marquardt algorithm on its eight parameters. Matches that are outliers to the refined F are removed, and if fewer than 20 matches remain, all matches between the two images are discarded.
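
The final filtering step can be sketched as follows, assuming F has already been estimated (for example by RANSAC plus Levenberg-Marquardt) and maps a point in image I to its epipolar line in image J. The distance used here, the one-sided point-to-epipolar-line distance, is an assumption, since the notes do not specify the error measure; the function names are hypothetical.

```python
import math


def epipolar_distance(F, p, q):
    """Distance in image J from q to the epipolar line F @ [p, 1] of point p in image I."""
    x = (p[0], p[1], 1.0)
    line = [sum(F[r][c] * x[c] for c in range(3)) for r in range(3)]
    norm = math.hypot(line[0], line[1])
    if norm == 0:
        return math.inf  # degenerate line: treat as an outlier
    return abs(line[0] * q[0] + line[1] * q[1] + line[2]) / norm


def filter_pair(F, matches, image_width, image_height, min_matches=20):
    """Keep matches consistent with F; drop the whole pair if too few remain."""
    threshold = 0.006 * max(image_width, image_height)  # 0.6% of the larger dimension
    inliers = [(p, q) for p, q in matches if epipolar_distance(F, p, q) <= threshold]
    return inliers if len(inliers) >= min_matches else []


F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]  # horizontal epipolar lines (y' = y)
good = [((i, i), (i + 5, i + 1)) for i in range(25)]  # 1 px off the epipolar line
bad = [((i, i), (i, i + 50)) for i in range(5)]       # 50 px off
print(len(filter_pair(F, good + bad, 1000, 800)))       # 25
print(len(filter_pair(F, good[:15] + bad, 1000, 800)))  # 0
```
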
After finding geometrically consistent matches for each image pair, we link matching feature points across multiple images into tracks: each track is a connected set of matching keypoints. If a track contains two keypoints from the same image, it is inconsistent and is removed. A track must contain keypoints from at least two images.
The tracks, in turn, define an image connectivity graph: each image is a node, and two images are connected if they share matched features.
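
Track construction amounts to finding connected components of a graph whose nodes are (image, keypoint) pairs and whose edges are pairwise matches. Below is a minimal sketch using union-find, assuming pairwise matches are given as a dict from an image pair to a list of keypoint-index pairs; the data layout and the function names are hypothetical. The graph helper connects two images whenever they share a consistent track.

```python
from collections import defaultdict
from itertools import combinations


def build_tracks(pair_matches):
    """Link pairwise matches into tracks of (image, keypoint) nodes via union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for (img_a, img_b), matches in pair_matches.items():
        for kp_a, kp_b in matches:
            union((img_a, kp_a), (img_b, kp_b))

    components = defaultdict(list)
    for node in list(parent):
        components[find(node)].append(node)

    tracks = []
    for nodes in components.values():
        images = [img for img, _ in nodes]
        # Consistent: at least two keypoints, never two from the same image.
        if len(nodes) >= 2 and len(set(images)) == len(images):
            tracks.append(sorted(nodes))
    return tracks


def connectivity_graph(tracks):
    """Edges (i, j) between images that appear together in some track."""
    edges = set()
    for track in tracks:
        images = sorted({img for img, _ in track})
        edges.update(combinations(images, 2))
    return edges


pair_matches = {
    (0, 1): [(5, 7), (6, 8)],
    (1, 2): [(7, 9), (8, 3)],
    (0, 2): [(5, 9)],
    (2, 0): [(3, 4)],  # closes a loop that puts keypoints 6 and 4 of image 0 in one track
}
tracks = build_tracks(pair_matches)
print(tracks)                              # [[(0, 5), (1, 7), (2, 9)]]
print(sorted(connectivity_graph(tracks)))  # [(0, 1), (0, 2), (1, 2)]
```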

4.2 Structure from Motion (sparse reconstruction)

4.3 Geo-Registration

4.4 Scene Representation

5 Photo Explorer Rendering

5.1 User Interface Layout

5.2 Rendering the Scene

5.3 Transitions between Photographs

6 Photo Explorer Navigation

7 Enhancing Scenes

8 Results

9 Research Challenges

Full translation on Baidu Wenku: https://wenku.baidu.com/view/0736a232866fb84ae45c8d6d.html

Source: blog.csdn.net/sinat_40624829/article/details/89857217