Getting started with PostGIS in five minutes [spatial database]

In this article, we'll cover some basics of PostGIS and its capabilities, as well as some hints and tricks you can use to simplify your solution or improve performance.

In short, PostGIS is a Postgres extension that adds support for storing and manipulating spatial data types. When we build software applications that store, manipulate, and visualize data on maps, we often need a spatial data store. Google Maps is a good example of the functionality of typical geospatial visualization software.

Before we can use PostGIS functionality, we need to install the extension in Postgres:

CREATE EXTENSION IF NOT EXISTS postgis;
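
To confirm that the extension is active, one option is to ask PostGIS for its version. This is only a quick check; the output depends on the installed build, so none is shown here.

```sql
-- Reports the PostGIS version and the libraries it was built against (GEOS, PROJ, ...)
SELECT PostGIS_Full_Version();
```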

1. Spatial data types

PostGIS supports several different types of (geo)spatial data types. Most of them have "cousins" in the field of graphic design. But unlike graphics software, where object coordinates are relative to a screen or a piece of paper, geospatial coordinates refer to points on the Earth's surface. This makes it possible to present such objects on a map, but also to analyze their interactions. This will come in handy when we start using spatial objects and manipulations to solve real world problems.

1.1 Vectors

Similar to graphic design software, spatial vector data supports basic geometric shapes such as points, linestrings, and polygons. In addition to basic geometries, PostGIS also supports some more advanced geometries:

  • Multi-geometries - homogeneous collections of points, linestrings, or polygons
  • 3D versions of the basic geometries - the same geometries with an added Z coordinate
  • Geometry collections - collections of arbitrary geometries, homogeneous or heterogeneous
  • Polyhedral surfaces - complex 3D surfaces
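
As a rough illustration of the geometries listed above, the following sketch builds a few of them from WKT (well-known text) literals and asks PostGIS what it parsed. The coordinates are arbitrary sample values, and the query reads no tables.

```sql
-- Each row is a geometry built from a WKT literal.
-- GeometryType reports the parsed type, ST_NDims the number of coordinate dimensions.
SELECT GeometryType(g) AS geometry_type,
       ST_NDims(g)     AS dimensions,
       ST_AsText(g)    AS wkt
FROM (VALUES
    ('POINT(24 47)'::geometry),                                  -- basic point
    ('LINESTRING(24 47, 25 48)'::geometry),                      -- basic linestring
    ('POLYGON((24 47, 25 47, 25 48, 24 48, 24 47))'::geometry),  -- basic polygon (closed ring)
    ('MULTIPOINT((24 47), (25 48))'::geometry),                  -- homogeneous collection
    ('POINT Z (24 47 120)'::geometry),                           -- 3D point with a Z coordinate
    ('GEOMETRYCOLLECTION(POINT(24 47), LINESTRING(24 47, 25 48))'::geometry)  -- mixed collection
) AS samples(g);
```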

Mapping and navigation applications rely heavily on vector objects to model map features. Most objects on Google Maps can be represented as polygons (such as buildings), points (such as businesses), or lines (such as roads). When viewing a map in 3D, buildings are usually represented as polyhedral (multipatch) surfaces.

To create a table with the "geometry" data type, we can run the following statement:

CREATE TABLE building (
	id UUID PRIMARY KEY,
	geom geometry
);

This will create a table with a "geom" column of type geometry, which is a common type for all vector objects. Think of it as a base class in the OOP world. This means we can combine points, lines, polygons and other vector objects in the same column. If we know in advance which geometries we will be dealing with, we can specify this as part of the column type definition. In this case, PostGIS will not allow other geometry types to be inserted in the same column. This is always the preferred way of storing data, since some operations expect the geometries to be of the same type.

CREATE TABLE building (
	id UUID PRIMARY KEY,
	geom geometry(Polygon)
);

Additionally, we can include the SRID (Spatial Reference Identifier) in the column type definition, forcing all values to conform to the same SRID. SRIDs are covered in more detail later.

CREATE TABLE building (
	id UUID PRIMARY KEY,
	geom geometry(Polygon,4326)
);
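
To see what the typed column enforces, here is a minimal sketch that assumes the last definition above, geometry(Polygon,4326). It also assumes gen_random_uuid() is available, which is built in from PostgreSQL 13 onward. The coordinates are made-up sample values.

```sql
-- Accepted: a polygon tagged with SRID 4326 matches the column definition
INSERT INTO building (id, geom)
VALUES (
    gen_random_uuid(),
    ST_GeomFromText(
        'POLYGON((24.000 47.000, 24.001 47.000, 24.001 47.001, 24.000 47.001, 24.000 47.000))',
        4326
    )
);

-- Expected to be rejected: a Point is not a Polygon
-- INSERT INTO building (id, geom)
-- VALUES (gen_random_uuid(), ST_GeomFromText('POINT(24 47)', 4326));

-- Expected to be rejected: the polygon's SRID (3857) differs from the column's SRID (4326)
-- INSERT INTO building (id, geom)
-- VALUES (gen_random_uuid(),
--         ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 3857));
```

The two rejected statements are left commented out so the script runs cleanly; uncommenting either one should raise a type-constraint error.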

1.2 Rasters

The spatial raster data type is also similar to its graphic design cousins (JPEG, PNG, TIFF, and other raster files we use in our daily lives), with some differences.

Unlike regular rasters, where a pixel is a point on the screen or paper, a spatial raster has a spatial resolution that defines the pixel's width and height. Thus, each pixel of the spatial raster covers a uniformly sized rectangle on the map.
A spatial raster has one or more bands, and each band holds a matrix of "pixel" values. The data type of each band is set individually and can be almost any numeric type: 1-bit boolean (useful for masking), integer, or floating point. In a way, it's a generalization of the 24-bit RGB rasters we're used to in the graphic design world. The spatial equivalent of a 24-bit RGB raster is a 3-band raster, where each band is defined as an unsigned 8-bit integer. Because bands can store values other than color, we can use rasters to store all sorts of information: terrain elevation, population density, vegetation indices, and so on.

Raster data support is included in a separate PostGIS extension, which needs to be installed before we can use it:

CREATE EXTENSION IF NOT EXISTS postgis_raster;

We can then create a table using the raster type:

CREATE TABLE satellite_image (
	id UUID PRIMARY KEY,
	rast raster
);
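
As a sketch of the 3-band, 8-bit layout described earlier, the following inserts an empty raster into the table above and reads back its metadata. The size, position, and pixel scale are arbitrary assumptions chosen for illustration, and gen_random_uuid() assumes PostgreSQL 13 or newer.

```sql
INSERT INTO satellite_image (id, rast)
VALUES (
    gen_random_uuid(),
    ST_AddBand(
        -- 100 x 100 pixels, upper-left corner at (24, 48),
        -- each pixel 0.01 x 0.01 degrees, no skew, SRID 4326
        ST_MakeEmptyRaster(100, 100, 24, 48, 0.01, -0.01, 0, 0, 4326),
        -- three unsigned 8-bit bands initialised to 0 (the spatial "RGB" layout);
        -- a NULL index appends each band after the existing ones
        ARRAY[
            ROW(NULL, '8BUI', 0, NULL),
            ROW(NULL, '8BUI', 0, NULL),
            ROW(NULL, '8BUI', 0, NULL)
        ]::addbandarg[]
    )
);

-- Inspect the band count, the pixel type of band 1, and the pixel size in map units
SELECT ST_NumBands(rast)         AS bands,
       ST_BandPixelType(rast, 1) AS band_1_type,
       ST_PixelWidth(rast)       AS pixel_width,
       ST_PixelHeight(rast)      AS pixel_height
FROM satellite_image;
```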

1.3 Point clouds

The point cloud data format can be seen as a hybrid between raster and vector. Like a raster, it represents a discrete data set made up of individual points rather than shapes. Unlike a raster, it has no fixed resolution or density, so points can be located anywhere in 3D space. Compared with the vector types, it resembles a collection of 3D points.

Point cloud data is typically obtained from LiDAR, 3D scanners, or similar devices that measure the physical properties of objects in 3D space. When visualized, trees (or any other objects) look like continuous 3D objects, but they are all made of discrete points in space.

Point cloud support is provided by separate extensions (from the pgPointcloud project), which need to be installed before we can use point clouds:

CREATE EXTENSION pointcloud;
CREATE EXTENSION pointcloud_postgis;

2. Spatial operations

When dealing with "regular" non-spatial data, we often join and filter tables based on exact values in columns containing primitive values that represent object identifiers (integers, strings, or possibly UUIDs). This is due to the nature of the problems we typically solve in relational databases. A typical query on a non-spatial dataset might look like this:

SELECT *
FROM book b
INNER JOIN publisher p ON p.id = b.publisher_id;

or this:

SELECT *
FROM book b
WHERE b.publisher_id = 12345;

However, with spatial data, we typically don't have real-world use cases that require us to filter spatial objects by equality or join tables by matching spatial objects using an equality comparer. If we think about how it works when using the Google Maps application - zooming, panning, clicking on objects, we can deduce that the most common operation on spatial data is intersection. Whenever we pan or zoom the map, the system needs to figure out which objects should be fetched from storage and rendered on screen. This is usually done by intersecting the object with a rectangle representing the visible part of the map. The following query finds buildings that intersect a given rectangle on the map:

SELECT *
FROM building b
WHERE ST_Intersects(b.geom, ST_MakeEnvelope(24, 47, 25, 48, 4326));

Another common operation is distance calculation, which is often used to determine which objects are near a given point on a map.
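
A distance query might look like the sketch below. It assumes the building table stores longitude/latitude geometries in SRID 4326 and casts them to the geography type so that distances come out in meters. The search point and the 500 m radius are arbitrary sample values.

```sql
-- Buildings within 500 meters of a point, nearest first
SELECT b.id,
       ST_Distance(b.geom::geography, p.geog) AS distance_m
FROM building b
CROSS JOIN (
    -- ST_MakePoint takes (longitude, latitude)
    SELECT ST_SetSRID(ST_MakePoint(24.5, 47.5), 4326)::geography AS geog
) AS p
WHERE ST_DWithin(b.geom::geography, p.geog, 500)
ORDER BY distance_m;
```

Note that casting the column inside the query is convenient for a sketch, but on large tables it may prevent an index on the plain geometry column from being used.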

3. Spatial index

When indexing primitive values, databases usually build the index using a hash table or a B-tree. That approach does not work here, because spatial queries rely on different operations. A spatial index must let us efficiently find, within a collection of spatial objects, those that intersect a given spatial object.

To solve this problem, the spatial index uses an R-Tree ("R" for "Rectangle") structure, which builds a tree of rectangles where each child node rectangle is contained within its parent node rectangle. The leaves of the tree are rectangles that represent the bounding boxes of the spatial objects in a PostGIS column.

This way, we can quickly traverse the tree to find which objects intersect a given object, instead of checking every object for intersection. In typical cases, this reduces the time complexity of the filtering operation from O(N) to roughly O(log N).

The SQL command to create a spatial index is very similar to "regular" index creation:

CREATE INDEX building_geom_idx ON building USING GIST(geom);

The only difference here is the "GIST" part, which tells Postgres to build a GiST ("Generalized Search Tree") index. PostGIS supports three kinds of spatial index (GiST, SP-GiST, and BRIN), but in most cases GiST is a good choice.
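
One way to check whether the index is actually used is to inspect the query plan. The sketch below assumes the building table and the building_geom_idx index from above. On a very small table the planner may still prefer a sequential scan, so the plan output is not shown.

```sql
-- Refresh planner statistics so the index is weighed realistically
ANALYZE building;

-- Show the plan for the viewport query; look for a scan on building_geom_idx
EXPLAIN
SELECT *
FROM building b
WHERE ST_Intersects(b.geom, ST_MakeEnvelope(24, 47, 25, 48, 4326));

-- The && operator compares bounding boxes only, which is the test the index answers directly
SELECT *
FROM building b
WHERE b.geom && ST_MakeEnvelope(24, 47, 25, 48, 4326);

-- Alternative index types use the same syntax (availability depends on the
-- PostgreSQL and PostGIS versions installed):
-- CREATE INDEX building_geom_spgist_idx ON building USING SPGIST (geom);
-- CREATE INDEX building_geom_brin_idx   ON building USING BRIN (geom);
```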

It's worth noting that spatial indexes can also be used with raster data, since we often need to find related rasters quickly. The same syntax can be applied to raster columns, but in this case we are indexing a bounding box around the raster image, so the statement needs to include the ST_ConvexHull function.

CREATE INDEX satellite_image_rast_idx ON satellite_image USING GIST(ST_ConvexHull(rast));

As with any index, there is a performance tradeoff when inserting objects into the database because PostGIS needs to insert new objects into the R-Tree index. But whenever we plan to use spatial operations, we should consider adding indexes to the columns used in the query, as it will significantly improve performance.

4. Coordinate system identification - SRID

The SRID (spatial reference identifier) is an important attribute that we need to assign to each spatial object. It identifies the coordinate system: where the (0, 0) point is on the globe, the resolution of the coordinates, and how coordinates on the map correspond to actual points on the globe.

The most commonly used SRID is 4326 (WGS84), which is used for GPS tracking, Google Maps, and many other applications. Many other SRIDs are also popular, and some provide higher accuracy than WGS84 in certain parts of the world, so we always need to know the SRID of the data coming into the system.

PostGIS is very flexible when it comes to SRIDs. In the example above, we created a table "building" that contained a geometry column with no SRID specified. This means that PostGIS will allow polygons with any SRID to be inserted. This is sometimes useful, or even necessary, in situations where we cannot predict or change the SRID of incoming data, but should be avoided if possible.
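
When a column accepts any SRID, it can help to check what has actually been stored. The queries below are a minimal sketch against the building table; spatial_ref_sys and geometry_columns are the catalog table and view that PostGIS installs.

```sql
-- How many objects use each SRID (0 means "unknown")
SELECT ST_SRID(geom) AS srid, count(*) AS objects
FROM building
GROUP BY ST_SRID(geom)
ORDER BY objects DESC;

-- Look up what an SRID means
SELECT srid, auth_name, auth_srid, proj4text
FROM spatial_ref_sys
WHERE srid IN (4326, 3857);

-- See the type and SRID declared for each geometry column
SELECT f_table_name, f_geometry_column, type, srid
FROM geometry_columns;
```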

Spatial columns can also have a predefined SRID, which forces all objects in that column to use the specified SRID.

CREATE TABLE building (
	id UUID PRIMARY KEY,
	geom geometry(Polygon, 4326)
);

The first reason to use a uniform SRID on all objects is that spatial queries require objects with the same SRID, and will fail if we try to intersect objects with different SRIDs:

SELECT ST_Intersects(
	ST_MakeEnvelope(24, 47, 25, 48, 4979), 
	ST_MakeEnvelope(24, 47, 25, 48, 4326)
);

The following error will be prompted:

ERROR:  ST_Intersects: Operation on mixed SRID geometries (Polygon, 4979) != (Polygon, 4326)

There is a workaround for this problem, but it leads to the next disadvantage. Whenever we have mismatched SRIDs, we can convert one spatial object to another object's SRID.

SELECT ST_Intersects(
	ST_Transform(ST_MakeEnvelope(24, 47, 25, 48, 4979), 4326), 
	ST_MakeEnvelope(24, 47, 25, 48, 4326)
);

In this query, ST_Transform transforms all coordinates from the source SRID to the target SRID and outputs a polygon with SRID 4326 that can intersect another polygon without error.

However, this approach comes with a performance penalty. First, the transformation itself takes some time. More importantly, when ST_Transform is applied to an indexed column, a spatial index on that column cannot speed up the ST_Intersects operation, because the index covers geometries in the original SRID, not the transformed geometries in the target SRID. The query plan then has to scan the first table, transform each object to the target SRID, and check which of them intersect objects in the second table.
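
Two common ways to reduce this penalty are sketched below, assuming building.geom is declared as geometry(Polygon, 4326) and has a GiST index. The envelope coordinates in SRID 3857 are rough sample values, and the index name is hypothetical.

```sql
-- Option 1: transform the search area (a single value) into the column's SRID,
-- leaving the indexed column untouched so its index stays usable
SELECT *
FROM building b
WHERE ST_Intersects(
    b.geom,
    ST_Transform(ST_MakeEnvelope(2670000, 5940000, 2780000, 6110000, 3857), 4326)
);

-- Option 2: if queries must transform the column itself, index the same expression.
-- The index only helps queries that repeat the expression exactly.
CREATE INDEX building_geom_3857_idx
    ON building USING GIST (ST_Transform(geom, 3857));

SELECT *
FROM building b
WHERE ST_Intersects(
    ST_Transform(b.geom, 3857),
    ST_MakeEnvelope(2670000, 5940000, 2780000, 6110000, 3857)
);
```

Option 1 is usually the simpler choice when only the query input is in a different SRID. Option 2 trades extra index storage and slower writes for faster reads.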

One way to handle this is to run ST_Transform on all objects as they are inserted into the database, so that SRIDs are always consistent. This has many benefits, but it's worth noting that conversions are not always exact, and we lose some precision when converting from one SRID to another. If precision is critical to your software, it might be a good idea to store both the original and the transformed objects in the database and use whichever one a given operation needs.
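
Storing both versions could look like the following sketch. The table name building_archive and its column names are hypothetical, the sample polygon uses arbitrary SRID 3857 coordinates, and gen_random_uuid() assumes PostgreSQL 13 or newer.

```sql
-- Keep the geometry exactly as received, plus a copy normalized to SRID 4326
CREATE TABLE building_archive (
    id UUID PRIMARY KEY,
    geom_original geometry(Polygon),       -- any SRID, untouched
    geom_4326     geometry(Polygon, 4326)  -- uniform SRID for spatial queries
);

-- Index only the normalized column, since that is the one queries filter on
CREATE INDEX building_archive_geom_4326_idx
    ON building_archive USING GIST (geom_4326);

-- Transform once, at insert time
INSERT INTO building_archive (id, geom_original, geom_4326)
SELECT gen_random_uuid(), src.g, ST_Transform(src.g, 4326)
FROM (
    SELECT ST_GeomFromText(
        'POLYGON((2700000 6000000, 2700100 6000000, 2700100 6000100, 2700000 6000100, 2700000 6000000))',
        3857
    ) AS g
) AS src;
```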

5. Conclusion

This article provides a brief introduction to PostGIS: what it is, some of the spatial data types and operations it supports, and some real-world problems that can be solved with it. We also covered spatial indexes, which are the first step toward good performance. We hope it helps you climb the steep learning curve into the world of GIS.


Original Link: PostGIS Quick Start - BimAnt
