[New UAV dataset] From person re-identification to drone target localization

Paper title: University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization
Paper address: https://arxiv.org/abs/2002.12186
Code address: https://github.com/layumi/University1652-Baseline
Dataset download: fill in the request form and send it to [email protected]

Introduction (relation to person re-identification):

  • With the rapid development of drones, localizing a target from the drone's viewpoint has become a basic task: by matching drone-view images against satellite images, the location of a target building can be determined.
  • The core difficulty is the same as in person re-identification: image matching across viewpoints. Person re-identification matches across cameras, while drone geo-localization matches across viewpoints at different heights (street view <-> drone <-> satellite).
  • Person re-identification is by now a relatively mature area with widely recognized datasets, whereas geo-localization has only just started; the matching is harder, and there is still plenty of room for new work.
  • Person re-identification also raises privacy concerns, since biometric information about people is collected; drone-based building localization has comparatively fewer research-ethics/privacy issues.

Main task description:

  • Task 1 - Drone-view target localization (Drone -> Satellite): given a drone-view image or video, find the most similar satellite image. Satellite images usually carry GPS tags, so the target seen by the drone can then be localized.
  • Task 2 - Drone navigation (Satellite -> Drone): given a satellite-view image, the drone tries to find the place it has flown over (i.e. the matching drone-view image). Once found, it can fly back along its flight history, completing a navigation operation. Both tasks boil down to cross-view image retrieval; a minimal retrieval sketch follows this list.
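Both tasks are essentially nearest-neighbor retrieval with learned features. Below is a minimal, illustrative sketch (not the released baseline code; the feature tensors and the `retrieve` helper are hypothetical) of how a query feature could be matched against a gallery of the opposite view:

```python
import torch
import torch.nn.functional as F

def retrieve(query_feat, gallery_feats, topk=5):
    """Rank gallery images of the other view by cosine similarity."""
    q = F.normalize(query_feat, dim=-1)      # (D,) query feature
    g = F.normalize(gallery_feats, dim=-1)   # (N, D) gallery features
    scores = g @ q                           # cosine similarities, shape (N,)
    return scores.topk(topk).indices         # indices of the best matches

# Task 1 (Drone -> Satellite): the query feature comes from a drone image and the
# gallery from satellite images; the GPS tag of the top match locates the target.
# Task 2 (Satellite -> Drone): swap the roles of query and gallery.
example = retrieve(torch.randn(512), torch.randn(1000, 512))
```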

Data collection:

  • We used Wikipedia to collect the building names of 72 universities, removing squares, campuses, and places that could not be found on Google Maps. The figure below shows the top 100 building names (https://en.wikipedia.org/wiki/Category:Buildings_and_structures_by_university_or_college).
  • We used Google Earth to simulate drone-view images, as in the video below: the virtual camera approaches the building along a spiral path (a rough sketch of such a path follows this list).

  • For each building, we also collected a satellite image and Google Street View images.
  • Previous datasets usually collect only ground-satellite image pairs. We additionally provide drone-view images as an intermediate medium; the drone viewpoint also suffers less from tree occlusion, making it easier to match against satellite images. (The table below compares training sets.)
  • The statistics of our dataset are as follows (33 universities for training and 39 for testing, 72 in total, with no overlap):
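For intuition only, here is a rough sketch of how such a spiral fly-around could be parameterized. The radii, heights, and number of turns are made-up values; the dataset itself was rendered in Google Earth (with 54 drone views per building), not with this script.

```python
import numpy as np

def spiral_viewpoints(num_views=54, r_start=60.0, r_end=20.0,
                      h_start=50.0, h_end=15.0, turns=3):
    """Illustrative camera positions spiraling in toward a building at the origin.

    All distances are placeholder values, not the actual Google Earth settings.
    """
    t = np.linspace(0.0, 1.0, num_views)
    angle = 2 * np.pi * turns * t                 # sweep several full circles
    radius = r_start + (r_end - r_start) * t      # move closer to the target
    height = h_start + (h_end - h_start) * t      # descend while circling
    x, y = radius * np.cos(angle), radius * np.sin(angle)
    return np.stack([x, y, height], axis=1)       # (num_views, 3) camera positions

print(spiral_viewpoints()[:3])  # the first few camera positions
```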

Data license:

  • We follow Google's official guidelines (https://www.google.com/permissions/geoguidelines/) when releasing the data for research.
  • Following earlier projects such as Tokyo 24/7 and CVUSA, the data are distributed through requests from academic email addresses.

Benchmarks:

  • The baseline mainly uses the instance loss from my earlier article. That article was written in November 2017 and was recently published in ACM TOMM 2020 for image-text retrieval; if you are interested, it can also be used to classify tens of thousands of categories - see "Using CNNs to classify 100,000 images" (https://zhuanlan.zhihu.com/p/33163432).
  • The main idea is to share the weights of the final classification layer while keeping the earlier feature-extraction networks separate.
    The PyTorch code is at https://github.com/layumi/University1652-Baseline/blob/master/model.py#L230-L253: the front-end backbones can differ, but every branch uses the same classifier (see the sketch after this list).
  • This provides a baseline, on the one hand to verify the validity of the dataset, and on the other hand to give everyone basic code to build on.
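A minimal sketch of this two-branch structure, simplified from the idea described above (the backbone choice, feature dimension, and training snippet are illustrative, not the exact code at the link): each view has its own backbone, all branches share one classification layer, and the instance loss is simply a cross-entropy loss that treats every building as its own class.

```python
import torch
import torch.nn as nn
from torchvision import models

class TwoBranchBaseline(nn.Module):
    """Separate backbones per view, one shared classifier (instance loss)."""
    def __init__(self, num_classes=701, feat_dim=512):
        super().__init__()
        # independent feature extractors for the two views
        self.drone_net = models.resnet18(weights=None)
        self.satellite_net = models.resnet18(weights=None)
        self.drone_net.fc = nn.Linear(512, feat_dim)
        self.satellite_net.fc = nn.Linear(512, feat_dim)
        # the classification layer is shared by all views
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, drone_img, satellite_img):
        f_drone = self.drone_net(drone_img)
        f_sat = self.satellite_net(satellite_img)
        return self.classifier(f_drone), self.classifier(f_sat)

# instance loss: every building is one class, optimized with cross-entropy
model = TwoBranchBaseline()
criterion = nn.CrossEntropyLoss()
drone = torch.randn(4, 3, 224, 224)
sat = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 701, (4,))   # building IDs, shared across views
logits_drone, logits_sat = model(drone, sat)
loss = criterion(logits_drone, labels) + criterion(logits_sat, labels)
```

At test time the shared classifier is discarded and the per-view features are compared directly, as in the retrieval sketch earlier in this post.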

Experimental results:

The experiments verify several points:

  1. Is the drone viewpoint better for localization than street view? It suffers less from occlusion, and another advantage is that the drone can also capture the roof of a building. The experiments confirm this.

  2. Are the features learned on our dataset better than generic features learned from large-scale datasets?

  3. Qualitative results (drone-view target localization on the left; drone navigation on the right):

  4. Can our model be used on real drone videos?
    We ran two experiments. First, retrieving our simulated drone images with real drone images:

Second, retrieving satellite images with real drone images:

As can be seen, the model still works quite well.

  1. Comparison with several commonly used baselines, including contrastive loss and triplet loss (a minimal triplet-loss sketch follows this list):

  2. Instance loss on other datasets (both use VGG16):

  3. Transfer to traditional small-scale image retrieval datasets:

    Here Fs is the subnetwork for satellite + drone images and Fg is the subnetwork for ground images. We conjecture that Fs learns viewpoint changes in the vertical direction while Fg learns changes in the horizontal direction. Hence, on traditional building datasets, which are mostly shot from the ground, the ground-view subnetwork Fg performs better.
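As a reference for point 1 above, a minimal sketch of a triplet-loss baseline (using PyTorch's built-in TripletMarginLoss; the features and margin here are placeholders): instead of classifying building IDs, it pulls a drone/satellite pair of the same building together and pushes a different building's view away.

```python
import torch
import torch.nn as nn

# Triplet baseline: pull together features of the same building seen from
# different views, push apart features of different buildings.
triplet = nn.TripletMarginLoss(margin=0.3)

anchor   = torch.randn(8, 512)  # drone-view features of 8 buildings (placeholder)
positive = torch.randn(8, 512)  # satellite views of the same 8 buildings
negative = torch.randn(8, 512)  # satellite views of other buildings

loss = triplet(anchor, positive, negative)
```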

Finally, we provide some samples from the dataset; feel free to click through and have a look~
Code address: https://github.com/layumi/University1652-Baseline

【Explore drone images】

【Explore satellite images】

【Explore street view images】

Thank you for reading, and feel free to discuss~

Original post: blog.csdn.net/Layumi1993/article/details/104679167