Global Google-Microsoft Open Buildings Dataset

This dataset merges Google's V3 Open Buildings with Microsoft's latest Building Footprints. With 2,534,595,270 footprints as of September 2023, it is, to the maintainers' knowledge, the most complete openly available buildings dataset. It covers 92% of level 0 administrative boundaries and is divided into 182 partitions. Each footprint is labeled with its source, either Google or Microsoft. The dataset is accessible in cloud-native geospatial formats such as GeoParquet, FlatGeobuf, and PMTiles. More information about the dataset and its methodology is available from Source Cooperative.

Dataset schema
In the country-level datasets, each row describes the footprint of one building, and each column holds one attribute of that footprint.

The original Google V3 Open Buildings dataset can be downloaded as compressed CSV files from this link. Here are some key details about the original dataset:

The dataset contains 1.8 billion building detections across an inference area of 58 million square kilometers in Africa, South Asia, Southeast Asia, Latin America, and the Caribbean.

Each building in the dataset has a polygon describing its footprint on the ground, a confidence score indicating how certain the model is that the detection is a building, and a Plus Code corresponding to the center of the building. No building type, street address, or other detail beyond the geometry is included.

The latest version of Microsoft Building Footprints can be downloaded from the Microsoft Planetary Computer as gzip-compressed, partitioned files.

Microsoft's global building footprints were derived from Bing Maps imagery, with 1.24 billion buildings detected in total. The imagery was collected between 2014 and 2023 and includes sources such as Maxar, Airbus, and France's IGN.

boundary_id (INTEGER): Unique integer ID derived from the CGAZ level 0 boundary ISO code, used to partition the dataset in BigQuery.
confidence (FLOAT): The model's confidence in the building footprint. This column is empty for Microsoft-provided footprints, because the original Microsoft dataset does not include this attribute.
bf_source (STRING): The source of the footprint, either Google or Microsoft.
area_in_meters (FLOAT): The area of the polygon in square meters.
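
To make the column semantics concrete, here is a minimal sketch in plain JavaScript that filters a few rows shaped like the schema above. The sample rows, the lower-case source values, and the helper names selectFootprints and totalArea are assumptions made for illustration only. The point it shows is that Microsoft rows carry no confidence value, so any confidence threshold has to decide explicitly whether to keep them.

```javascript
// Made-up rows following the schema above (all values are illustrative only;
// the exact spelling of bf_source values in the real data is an assumption).
var rows = [
  {boundary_id: 1, bf_source: 'google', confidence: 0.92, area_in_meters: 120.5},
  {boundary_id: 1, bf_source: 'google', confidence: 0.61, area_in_meters: 35.0},
  {boundary_id: 1, bf_source: 'microsoft', confidence: null, area_in_meters: 210.0}
];

// Hypothetical helper: keep rows whose confidence meets the threshold.
// Rows without a confidence value (Microsoft footprints) are kept only
// when keepUnscored is true.
function selectFootprints(rows, minConfidence, keepUnscored) {
  return rows.filter(function (row) {
    if (row.confidence === null || row.confidence === undefined) {
      return Boolean(keepUnscored);
    }
    return row.confidence >= minConfidence;
  });
}

// Hypothetical helper: total footprint area, in square meters, of the given rows.
function totalArea(rows) {
  return rows.reduce(function (sum, row) {
    return sum + row.area_in_meters;
  }, 0);
}

var strict = selectFootprints(rows, 0.75, false);  // scored, high-confidence rows only
var lenient = selectFootprints(rows, 0.75, true);  // also keeps unscored rows

console.log(strict.length, totalArea(strict));
console.log(lenient.length, totalArea(lenient));
```

With the strict setting, every Microsoft footprint is dropped along with the low-confidence Google ones, which is rarely what is wanted when the goal is coverage; the lenient setting treats a missing score as "unknown" rather than "low".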

Dataset Citation

Please cite the original source datasets, and include the date on which you accessed the combined dataset. A sample citation:

Google-Microsoft Open Buildings - combined by VIDA, Date Accessed: [Insert the date you accessed the webpage in the format YYYY-MM-DD]


Earth Engine Snippet

These datasets were ingested from country-level GeoParquet files. Only a subset is mentioned below; running earthengine ls on the parent folder should list the collections for all countries. All feature collections follow the path format

projects/sat-io/open-datasets/VIDA_COMBINED/"3 letter country ISO code"

for example India would be

var ind = ee.FeatureCollection("projects/sat-io/open-datasets/VIDA_COMBINED/IND");

// Draw building outlines only: fully transparent fill, brown stroke.
Map.addLayer(ind.style({
    fillColor: '00000000',
    color: '#964B00',
  }),{},'Combined Buildings India');
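
The path format above can be wrapped in a small helper. This is a minimal sketch: the name vidaAssetId is hypothetical and not part of the Earth Engine API, the upper-casing is an assumption based on the IND example, and the function only checks that the code has three letters, not that an asset for that country actually exists.

```javascript
// Hypothetical helper (not part of the Earth Engine API): builds the asset ID
// for one country from its 3-letter ISO code, following the path format above.
var VIDA_ROOT = 'projects/sat-io/open-datasets/VIDA_COMBINED/';

function vidaAssetId(iso3) {
  if (typeof iso3 !== 'string' || !/^[A-Za-z]{3}$/.test(iso3.trim())) {
    throw new Error('Expected a 3-letter country ISO code, got: ' + iso3);
  }
  // Assumption: assets use upper-case codes, as the IND example does.
  return VIDA_ROOT + iso3.trim().toUpperCase();
}

console.log(vidaAssetId('ind'));

// In the Earth Engine Code Editor, where `ee` and `Map` are predefined:
// var fc = ee.FeatureCollection(vidaAssetId('ind'));
// Map.addLayer(fc.style({fillColor: '00000000', color: '#964B00'}), {}, 'Combined Buildings');
```

Validating the code before building the path gives a clear error for inputs such as a full country name, instead of a less obvious "asset not found" failure later.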

Sample code:

Earth Engine App: global-buildings


The data is shared under the Creative Commons Attribution (CC BY-4.0) license and the Open Data Commons Open Database License (ODbL) v1.0 license. As the user, you can pick which of the two licenses you prefer and use the data under the terms of that license.

Contact information: for more information about the dataset or the processing steps, VIDA can be reached by email at [email protected].

Provided by: VIDA, Google, Microsoft

Curated in GEE by: Samapriya Roy

Last updated in GEE: 2023-11-28

