A one-article tour of the PostGIS spatial database

1. Introduction to Spatial Database

1. What is a spatial database

We perceive the world in three dimensions, while a traditional relational database only stores flat rows and columns of values. To describe geographic locations as points, lines, and polygons, we need a database that understands geometry: the so-called spatial database.
PostGIS is a spatial database.

2. How does a spatial database store data?

In addition to the strings, numbers, dates, and so on that ordinary databases have, spatial databases add spatial data types.

These spatial data types abstract and encapsulate spatial structures such as boundaries and dimensions.

You can think of a spatial value as carrying many built-in attributes that describe its spatial structure.

Take the Point data type as an example: a point is represented by its X and Y coordinates in some coordinate reference system. For instance, "POINT(116.4074 39.9042)" (longitude first, then latitude) represents a point in the center of Beijing.
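
As a minimal sketch of the point described above, the query below builds the same Beijing point from its coordinates and prints it back as text. The SRID 4326 (WGS 84 longitude/latitude) is an assumption made for illustration; the text above does not name a reference system.

```sql
-- Build a point from X (longitude) and Y (latitude) and tag it with an SRID.
-- SRID 4326 (WGS 84) is an assumption, not something stated in the article.
SELECT ST_AsText(
         ST_SetSRID(ST_MakePoint(116.4074, 39.9042), 4326)
       ) AS wkt;
```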

Also, spatial data types are organized in a type hierarchy. Each subtype inherits the structure (properties) and behavior (methods or functions) of its supertype.

3. Does the spatial database have an index?

Ordinary databases have indexes, and spatial databases have spatial indexes too. What are they for?

An example helps. Suppose you want to find all shopping malls within 100 m of you. Without a spatial index, you have to compute the distance to every mall in the table (a square root over the coordinate differences), and then keep the malls whose distance is less than 100.
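
The mall search could be written roughly as follows. This is only a sketch: the table malls, its columns, the SRID 26918, and the coordinates of "your" position are all hypothetical, and the data is assumed to be stored in a projected coordinate system whose unit is the meter.

```sql
-- Hypothetical table: malls(name text, geom geometry(Point, 26918)).
-- ST_DWithin can use a spatial index on geom, so it avoids computing
-- the exact distance to every row in the table.
SELECT name
FROM malls
WHERE ST_DWithin(
        geom,
        ST_SetSRID(ST_MakePoint(583571, 4506714), 26918),  -- made-up position
        100                                                -- radius in meters
      );
```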


The hard part of designing a spatial index is choosing the data structure. Ordinary databases can use B+ trees and the like, whereas spatial indexes come in many kinds: grid indexes, quadtree indexes, pyramid indexes, and so on.

How they work internally is too advanced to cover here.

4. What is the spatial function?

A two-dimensional creature cannot imagine how complicated the three-dimensional world is: analyzing geometric information, determining spatial relationships, and so on.

A spatial database needs specialized tools for these problems, so it ships with built-in spatial functions.

Spatial functions fall into five main categories:

Convert - functions that convert between geometry (the format in which PostGIS stores spatial information) and external data formats
Manage - functions that manage information about spatial tables and PostGIS administration
Retrieve - functions that retrieve attributes and spatial measurements of geometries
Compare - functions that compare the spatial relationship of two geometries
Generate - functions that generate new geometries from other geometries

2. PostGIS quick start

1. What is PostGIS?

As you have probably guessed, it is an extension (plug-in) for PostgreSQL, and it is what turns PostgreSQL into a powerful spatial database.

It is also open source.

2. How to use PostGIS

This tutorial is mainly meant to help you quickly understand what PostGIS is, so it won't go into too much detail.

Download and install PostgreSQL and PostGIS yourself.

Importing data from a shapefile

A shapefile must have:

.shp - stores the geometry of the geographic features
.shx - stores the index of the feature geometry
.dbf - stores the attribute (non-geometric) information of the geographic features

Optional files include:

.prj - stores the spatial reference information, that is, the geographic and projected coordinate system information, described in well-known text format

So that's it: once the file is imported, the data can be displayed visually.

To operate on the data, use SQL. It's the same familiar recipe with the same familiar taste.

SELECT name
FROM nyc_neighborhoods
WHERE boroname = 'Brooklyn';

Metadata management
PostGIS provides two tables for tracking and reporting on the geometries in the database (their contents amount to metadata):

The first table, spatial_ref_sys, defines all the spatial reference systems known to the database; it is described in more detail later.
The second table (actually a view), geometry_columns, describes all the spatial tables in the database.

By querying geometry_columns, GIS clients and libraries can determine what to expect when retrieving data, and can perform any necessary projection, processing, or rendering without inspecting every geometry. That is what metadata buys you.
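
A quick way to see this metadata is to query the two tables directly. The sketch below assumes a database where PostGIS is installed and some spatial tables have already been loaded; the SRID 4326 in the second query is just an example value.

```sql
-- One row per geometry column in the database.
SELECT f_table_name, f_geometry_column, coord_dimension, srid, type
FROM geometry_columns;

-- Look up the definition of one spatial reference system (4326 is an example).
SELECT srid, auth_name, auth_srid, proj4text
FROM spatial_ref_sys
WHERE srid = 4326;
```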

Representing real-world objects
The Simple Features for SQL (SFSQL) specification, the original guiding standard for PostGIS development, defines how real-world objects are represented. That specification only covers two dimensions; PostGIS extends the representation to 3D and 4D.

In plain terms, it can represent points, linestrings, polygons, and collections of geometries.

Here is an example of a geometry collection.

SELECT name, ST_AsText(geom)
FROM geometries
WHERE name = 'Collection';

The returned result is a collection of points and polygons.
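
To take such a collection apart, the part-access functions can be applied to it. The sketch below uses a made-up literal collection instead of the geometries table, so it runs without any data loaded.

```sql
-- A hypothetical collection holding one point and one polygon.
SELECT ST_NumGeometries(g)                 AS parts,        -- 2
       ST_AsText(ST_GeometryN(g, 1))       AS first_part,   -- the point
       ST_GeometryType(ST_GeometryN(g, 2)) AS second_type   -- ST_Polygon
FROM (
  SELECT 'GEOMETRYCOLLECTION(POINT(2 0), POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)))'::geometry AS g
) AS t;
```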
That's just the basic operation. Have you got it?

Storage of geometries
PostGIS supports importing and exporting geometries in a variety of formats:

  • Well-known text (WKT)
  • Well-known binary (WKB)
  • Geography Markup Language (GML)
  • Keyhole Markup Language (KML)
  • GeoJSON
  • Scalable Vector Graphics (SVG)

The following SQL query shows an example of the WKB representation. The call to encode() is only needed to turn the binary output into printable hex text.

SELECT encode(
  ST_AsBinary(ST_GeometryFromText('LINESTRING(0 0,1 0)')),
  'hex');

Since WKT and WKB are defined in the SFSQL specification, they cannot handle geometries in 3 or 4 dimensions. For these cases, PostGIS defines the Extended Well Known Text (EWKT) and Extended Well Known Binary (EWKB) formats for processing 3D or 4D geometries.
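
As a small illustration of the extended format, the query below reads an EWKT string that carries both an SRID and a Z value. The height of 50 and the SRID are made-up values for the sketch.

```sql
-- EWKT = WKT plus an SRID prefix and optional Z/M ordinates.
SELECT ST_AsEWKT(g) AS ewkt,
       ST_NDims(g)  AS dims,   -- 3
       ST_SRID(g)   AS srid    -- 4326
FROM (
  SELECT ST_GeomFromEWKT('SRID=4326;POINT(116.4074 39.9042 50)') AS g
) AS t;
```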

Data type conversion
PostgreSQL includes a short-form syntax for converting data from one type to another, the cast syntax:

olddata::newtype

For example, to convert a numeric value to a text string:

SELECT 0.9::text;
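
The same cast syntax works for spatial types. A minimal sketch with literal values:

```sql
-- Text (WKT) to geometry.
SELECT 'POINT(0 0)'::geometry;

-- Text (EWKT) to geometry, then back to WKT.
SELECT ST_AsText('SRID=4326;POINT(0 0)'::geometry);

-- The SRID given in the EWKT string is kept by the cast.
SELECT ST_SRID('SRID=4326;POINT(0 0)'::geometry);   -- 4326
```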

After all that effort, it turns out there is a manual for all of this, so you don't have to learn everything the hard way.

Built-in functions

Here is an overview of the functions available. Just keep the list in mind and check the manual when needed.

sum(expression) aggregate to return a sum for a set of records
count(expression) aggregate to return the size of a set of records
ST_GeometryType(geometry) returns the type of the geometry
ST_NDims(geometry) returns the number of dimensions of the geometry
ST_SRID(geometry) returns the spatial reference identifier number of the geometry
ST_X(point) returns the X ordinate
ST_Y(point) returns the Y ordinate
ST_Length(linestring) returns the length of the linestring
ST_StartPoint(geometry) returns the first coordinate as a point
ST_EndPoint(geometry) returns the last coordinate as a point
ST_NPoints(geometry) returns the number of coordinates in the linestring
ST_Area(geometry) returns the area of the polygons
ST_NRings(geometry) returns the number of rings (usually 1, more if there are holes)
ST_ExteriorRing(polygon) returns the outer ring as a linestring
ST_InteriorRingN(polygon, integer) returns a specified interior ring as a linestring
ST_Perimeter(geometry) returns the length of all the rings
ST_NumGeometries(multi/geomcollection) returns the number of parts in the collection
ST_GeometryN(geometry, integer) returns the specified part of the collection
ST_GeomFromText(text) returns geometry
ST_AsText(geometry) returns WKT text
ST_AsEWKT(geometry) returns EWKT text
ST_GeomFromWKB(bytea) returns geometry
ST_AsBinary(geometry) returns WKB bytea
ST_AsEWKB(geometry) returns EWKB bytea
ST_GeomFromGML(text) returns geometry
ST_AsGML(geometry) returns GML text
ST_GeomFromKML(text) returns geometry
ST_AsKML(geometry) returns KML text
ST_AsGeoJSON(geometry) returns JSON text
ST_AsSVG(geometry) returns SVG text

In short, you can find areas, find boundaries, count how many parts a multi-polygon has, and so on. Do whatever you like in the multi-dimensional world.
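
To get a feel for a few of the functions listed above, here is a sketch that applies them to one made-up polygon: a 10 x 10 square with a 2 x 2 hole.

```sql
SELECT ST_GeometryType(g)             AS type,        -- ST_Polygon
       ST_NRings(g)                   AS rings,       -- 2 (outer ring + 1 hole)
       ST_Area(g)                     AS area,        -- 100 - 4 = 96
       ST_Perimeter(g)                AS perimeter,   -- 40 + 8 = 48
       ST_AsText(ST_ExteriorRing(g))  AS outer_ring   -- the outer ring as a linestring
FROM (
  SELECT ST_GeomFromText(
    'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0), (4 4, 4 6, 6 6, 6 4, 4 4))'
  ) AS g
) AS t;
```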

Spatial Relations

So far, we have only worked with one geometry at a time.

Spatial databases are powerful because they can not only store geometries, but also analyze the relationships between them.

Questions such as "Which is the closest bike rack to the park?" or "Where are the intersections of subway lines and streets?" can only be answered by comparing the geometries that represent bike racks, streets, and subway lines.

The OGC standard defines the following set of methods for comparing geometries.

  • ST_Equals(geometry A, geometry B) tests whether two geometries are spatially equal.
  • ST_Intersects, ST_Crosses, and ST_Overlaps test whether two geometries share space: ST_Intersects is true if they have any point in common, while ST_Crosses and ST_Overlaps test specific ways in which their interiors intersect.
  • ST_Touches() tests whether two geometries touch at their boundaries but do not intersect in their interiors.
  • ST_Within() and ST_Contains() test whether one geometry is completely contained within another.
  • ST_Distance(geometry A, geometry B) calculates the shortest distance between two geometries.
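
These predicates are easiest to understand on tiny shapes. The sketch below uses three made-up geometries: two squares that share an edge, and a point inside the first square.

```sql
WITH shapes AS (
  SELECT 'POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'::geometry AS a,  -- square
         'POLYGON((2 0, 4 0, 4 2, 2 2, 2 0))'::geometry AS b,  -- neighbor sharing the edge x = 2
         'POINT(1 1)'::geometry                        AS p   -- point inside a
)
SELECT ST_Intersects(a, b) AS a_intersects_b,  -- true: they share an edge
       ST_Touches(a, b)    AS a_touches_b,     -- true: only the boundaries meet
       ST_Overlaps(a, b)   AS a_overlaps_b,    -- false: the interiors do not intersect
       ST_Within(p, a)     AS p_within_a,      -- true
       ST_Contains(a, p)   AS a_contains_p,    -- true
       ST_Distance(p, b)   AS p_to_b           -- 1: from (1,1) to the edge x = 2
FROM shapes;
```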

Spatial joins
Spatial joins are the bread and butter of spatial databases: they let you combine information from different tables by using a spatial relationship as the join key.

Summaries are supported too: combining JOIN with GROUP BY supports the kinds of analysis typically done in GIS systems. This works almost the same as in an ordinary relational database, so there is not much to introduce.

Here is an example to get a feel for it.

SELECT
  n.name,
  Sum(c.popn_total) / (ST_Area(n.geom) / 1000000.0) AS popn_per_sqkm
FROM nyc_census_blocks AS c
JOIN nyc_neighborhoods AS n
ON ST_Intersects(c.geom, n.geom)
WHERE n.name = 'Upper West Side'
OR n.name = 'Upper East Side'
GROUP BY n.name, n.geom;

The SQL above answers the question: "What is the population density (people per square kilometer) of the 'Upper West Side' and the 'Upper East Side'?"

3. PostGIS advanced gameplay

Everything so far has been fairly ordinary. Now for the advanced material.

1. Spatial index

Spatial indexes are one of the greatest assets of PostGIS. In the previous example, building the spatial join required comparing whole tables against each other. That is expensive: joining two tables of 10,000 records each (with no indexes) requires 100,000,000 comparisons; with spatial indexes, the cost can be as low as 20,000 comparisons.

Creating and dropping indexes needs little explanation; you already know how.

CREATE INDEX nyc_census_blocks_geom_idx
ON nyc_census_blocks
USING GIST (geom);

Note: The USING GIST clause tells PostgreSQL to build the index with the GiST (Generalized Search Tree) index structure.

How it works

First, what does a spatial index do?

It improves query efficiency.

How does it do that?

A standard database index builds a tree from the values of the indexed column. A spatial index is slightly different, because the database cannot index the value of a geometry field, that is, the geometry object itself; instead, it indexes the bounding box of each feature's extent.

[Figure: a yellow star, a red line, and a blue line, each with its bounding box]

In the figure above, only one line intersects the yellow star: the red line. But two bounding boxes intersect the yellow box: the red one and the blue one.

The database answers the question "which lines intersect the yellow star?" by first using the spatial index to answer "which bounding boxes intersect the yellow bounding box?" (very quickly), and then running the exact "which lines intersect the yellow star?" test only on the features returned by the first test.

For a large table, this two-pass approach of filtering by index first and then computing exactly on the survivors can drastically reduce the amount of computation.

Put simply, calculations on regular shapes (rectangles) are cheaper than on irregular shapes, and that is the basic idea behind the optimization.

The most commonly used functions in PostGIS (ST_Contains, ST_Intersects, ST_DWithin, and so on) include an automatic index filter. Some functions (such as ST_Relate) do not.

To run a bounding-box-only search using the index (that is, a search that stops after the index test, with no exact filtering), use the "&&" operator.
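
The difference between the two passes can be seen by running the same join twice, once with "&&" and once with ST_Intersects. This is a sketch that reuses the nyc_neighborhoods and nyc_census_blocks tables from the earlier example; it assumes both are loaded and indexed.

```sql
-- Pass 1 only: every census block whose bounding box intersects the
-- neighborhood's bounding box is counted.
SELECT Sum(c.popn_total) AS popn_bbox
FROM nyc_neighborhoods AS n
JOIN nyc_census_blocks AS c
ON n.geom && c.geom
WHERE n.name = 'Upper West Side';

-- Both passes: the index filter first, then the exact geometry test.
SELECT Sum(c.popn_total) AS popn_exact
FROM nyc_neighborhoods AS n
JOIN nyc_census_blocks AS c
ON ST_Intersects(n.geom, c.geom)
WHERE n.name = 'Upper West Side';
```

The first total should never be smaller than the second, because any geometries that intersect also have intersecting bounding boxes, but not the other way around.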

Query planner: is the index used?
The PostgreSQL query planner intelligently chooses whether to evaluate a query with or without the spatial index. Counterintuitively, a spatial index search is not always faster.
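
One way to check what the planner decided is to prefix a query with EXPLAIN ANALYZE. The sketch below reuses the tables from the earlier spatial join; the exact plan text depends on the PostgreSQL version, the data, and the current statistics.

```sql
-- Shows the chosen plan together with actual run times.
-- A line mentioning nyc_census_blocks_geom_idx means the spatial index
-- was used; a "Seq Scan" on nyc_census_blocks means it was not.
EXPLAIN ANALYZE
SELECT count(*)
FROM nyc_census_blocks AS c
JOIN nyc_neighborhoods AS n
ON ST_Intersects(c.geom, n.geom)
WHERE n.name = 'Upper West Side';
```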

By default, PostgreSQL periodically collects statistics for the query planner. However, if you change the contents of a table substantially within a short time, the statistics will be out of date. To make sure the statistics match the table contents, it is wise to run the ANALYZE command manually after bulk loading or bulk deleting data.

ANALYZE nyc_census_blocks;

Vacuuming: reclaiming space
After creating a new index, or after a large number of updates, inserts, or deletes on a table, the table should be vacuumed. The VACUUM command asks PostgreSQL to reclaim any unused space left in the table pages by updated or deleted records.

VACUUM ANALYZE nyc_census_blocks;

2. Data projection

2.1 Introduction to Data Projection

The earth is not flat, and there is no easy way to put it on a flat paper map (or a computer screen), so people have come up with all kinds of ingenious solutions (projections).

Each projection has advantages and disadvantages: some preserve area; some preserve angles, such as the Mercator projection; and some try to find a good compromise with only small distortion on several parameters. What all projections have in common is that they transform the (spherical) Earth onto a flat Cartesian coordinate system.

Working with projections is straightforward: ST_SRID(geometry) reports the spatial reference identifier of a geometry, ST_SetSRID(geometry, SRID) assigns one without changing any coordinates, and ST_Transform(geometry, SRID) reprojects the coordinates into another system.

Comparing data
Coordinates can only be compared in the context of their SRID (strictly speaking, their spatial reference system). If two geometries are not in the same reference system, the comparison is meaningless and PostGIS returns an error, as in the following example.

SELECT ST_Equals(
         ST_GeomFromText('POINT(0 0)', 4326),
         ST_GeomFromText('POINT(0 0)', 26918)
         );

Converting data
Data can be converted from one SRID to another with ST_Transform.

SELECT ST_AsText(
 ST_Transform(
   ST_SetSRID(geom,26918),
 4326)
)
FROM geometries;
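
Putting the two ideas together: geometries in different reference systems can be compared once one of them has been transformed into the other's SRID. In this sketch both points are made-up values near New York, one in longitude/latitude (4326) and one in UTM zone 18N (26918).

```sql
-- Transform the longitude/latitude point into SRID 26918 first,
-- so that both arguments share one reference system.
SELECT ST_Distance(
         ST_Transform(ST_GeomFromText('POINT(-74.0 40.7)', 4326), 26918),
         ST_GeomFromText('POINT(583571 4506714)', 26918)
       ) AS meters;
```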

2.2 Geographic coordinates

It is very common to have data in "geographic" coordinates, that is, latitude/longitude. Geographic coordinates are not Cartesian plane coordinates.

If your data is geographically compact (contained within a state, county, or city), use the Cartesian geometry type with a suitable projection. Otherwise, use the geography type, which works on spherical coordinates. The reasons are mathematical accuracy and performance, which are not explained here.
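
The difference between the two types shows up clearly on a long distance. The sketch below measures from Los Angeles to Paris with both types; the airport coordinates are approximate values chosen for illustration.

```sql
SELECT ST_Distance(
         'SRID=4326;POINT(-118.4079 33.9434)'::geometry,   -- Los Angeles (LAX)
         'SRID=4326;POINT(2.5559 49.0083)'::geometry       -- Paris (CDG)
       ) AS cartesian_degrees,
       ST_Distance(
         'SRID=4326;POINT(-118.4079 33.9434)'::geography,
         'SRID=4326;POINT(2.5559 49.0083)'::geography
       ) AS meters_on_spheroid;
```

The geometry result is a number in "degrees" with no physical meaning, while the geography result is in meters (on the order of 9,000 km for this pair).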

Type casting was introduced earlier; here it is used to convert geography to geometry so that functions such as ST_X can read the coordinates.

SELECT code, ST_X(geog::geometry) AS longitude FROM airports;

3. Geometry creation function

All the functions we've seen so far work on existing geometries and return information about them. A "geometry creation function" takes a geometry as input and outputs a new geometry.

3.1 Replacing polygons with points

A common need when composing spatial queries is to replace polygon features with point representations of those features. This is useful for spatial joins, because using ST_Intersects(geometry, geometry) on two polygon layers often results in double counting: a polygon lying on the boundary between two polygons intersects both of them. Replacing it with a point forces it onto one side or the other.

  • ST_Centroid(geometry) returns the point at the center of mass of the input geometry. The calculation is simple and very fast, but the result is sometimes undesirable because the returned point is not necessarily on the feature itself: if the input geometry is concave (shaped like the letter 'C'), the centroid may fall outside the shape.

  • ST_PointOnSurface(geometry) returns a point guaranteed to be inside the input polygon. It is considerably more expensive to compute than the centroid.
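
The C-shape case can be reproduced with a small made-up polygon whose notch opens to the right. Its centroid works out to roughly (1.36, 1.5), which lies inside the notch and therefore outside the polygon.

```sql
WITH shape AS (
  -- A hypothetical C-shaped polygon: a 3 x 3 square with the notch
  -- x in (1, 3), y in (1, 2) cut out of its right side.
  SELECT 'POLYGON((0 0, 3 0, 3 1, 1 1, 1 2, 3 2, 3 3, 0 3, 0 0))'::geometry AS g
)
SELECT ST_AsText(ST_Centroid(g))              AS centroid,
       ST_Intersects(g, ST_Centroid(g))       AS centroid_on_shape,       -- false
       ST_Intersects(g, ST_PointOnSurface(g)) AS surface_point_on_shape   -- true
FROM shape;
```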

3.2 Buffer

Buffering is common in GIS workflows and is available in PostGIS too. ST_Buffer(geometry, distance) takes a geometry and a buffer distance, and outputs a polygon whose boundary lies at the buffer distance from the input geometry.
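
A minimal sketch: buffering a point by 1 unit produces a polygon that approximates a circle of radius 1.

```sql
SELECT ST_GeometryType(ST_Buffer('POINT(0 0)'::geometry, 1)) AS type,  -- ST_Polygon
       ST_Area(ST_Buffer('POINT(0 0)'::geometry, 1))         AS area;  -- slightly below pi
```

The area comes out a little under pi because the circle is approximated by straight segments.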

3.3 Overlay and union

Another classic GIS operation, overlay, creates a new geometry by computing the intersection of two overlapping polygons.

Use the ST_Intersection(geometry A, geometry B) function for this.

ST_Union(geometry A, geometry B) merges two geometries into one.
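
Both operations can be checked on two made-up 2 x 2 squares that overlap in a 1 x 1 corner.

```sql
WITH s AS (
  SELECT 'POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'::geometry AS a,
         'POLYGON((1 1, 3 1, 3 3, 1 3, 1 1))'::geometry AS b
)
SELECT ST_Area(ST_Intersection(a, b)) AS overlap_area,  -- 1
       ST_Area(ST_Union(a, b))        AS merged_area    -- 4 + 4 - 1 = 7
FROM s;
```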

4. Validity of Geometry

A polygon is not necessarily valid, and functions that operate on an invalid polygon may fail with a TopologyException error. Here are some of the validity rules:

Rings of polygons must be closed.
Inner rings must lie inside outer rings.
Rings cannot intersect themselves (they can neither touch nor cross themselves).
Rings cannot touch other rings except at a point.

The first two rules are intuitive. The last two are more of a convention: other equally self-consistent rule sets are possible, but these are the ones PostGIS checks.

The ST_IsValid(geometry) function can be used to check the validity of geometry.

Invalid geometries can be fixed. The bad news: there is no 100% reliable way to fix invalid geometry.

A good tool for visually fixing invalid geometry is OpenJump (http://openjump.org), which includes a validator under Tools->QA->Validate Selected Layers.

Now for the good news: A large percentage of defects can be fixed in a database using any of the following methods:

ST_MakeValid function
ST_Buffer function
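
As a sketch of both repair methods, the query below starts from a made-up "figure eight" polygon whose ring touches itself at (1, 1), which breaks the self-intersection rule.

```sql
WITH bad AS (
  SELECT 'POLYGON((0 0, 0 1, 1 1, 2 1, 2 2, 1 2, 1 1, 1 0, 0 0))'::geometry AS g
)
SELECT ST_IsValid(g)               AS valid_before,         -- false
       ST_IsValidReason(g)         AS reason,               -- describes the defect
       ST_IsValid(ST_MakeValid(g)) AS valid_after_makevalid,
       ST_IsValid(ST_Buffer(g, 0)) AS valid_after_buffer
FROM bad;
```

A repaired geometry is valid but not necessarily what was intended; in particular, a zero-distance buffer can drop part of a self-crossing shape, so it is worth inspecting the result (for example with ST_AsText) before overwriting the original.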

5. Equality of geometric figures

Determining equality can be tricky when dealing with geometries. PostGIS provides three different functions and operators for testing different levels of equality.

Exact Equality (ST_OrderingEquals)
Exact equality is determined by comparing the vertices of two geometries one by one, in order, to make sure they are identical in position. Two geometries that cover the same space are still considered unequal if their vertices are defined in a different order.

Spatial Equality (ST_Equals)
ST_Equals tests the spatial equality (equivalence) of two geometries. The direction in which a polygon is drawn, its starting point, and the number of points it contains do not matter here. What matters is that the polygons cover the same region of space; if they do, they are equal.

Bounding box equality (~=)
For faster comparisons, the bounding box equality operator '~=' is provided. It operates only on the bounding box (rectangle), checking that the geometries occupy the same 2D extent, not necessarily the same space. It is not exact, but it works well as a coarse first filter, to be followed by an exact test: coarse first, then fine.
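
The three levels can be compared side by side on made-up shapes: a unit square, the same square written from a different starting vertex, and a triangle that happens to have the same bounding box.

```sql
WITH p AS (
  SELECT 'POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry AS a,  -- unit square
         'POLYGON((1 1, 1 0, 0 0, 0 1, 1 1))'::geometry AS b,  -- same square, other start vertex
         'POLYGON((0 0, 1 1, 1 0, 0 0))'::geometry      AS c   -- triangle, same bounding box
)
SELECT ST_OrderingEquals(a, b) AS exact_a_b,    -- false: vertex order differs
       ST_Equals(a, b)         AS spatial_a_b,  -- true: same region of space
       a ~= c                  AS bbox_a_c,     -- true: same bounding box
       ST_Equals(a, c)         AS spatial_a_c   -- false: different shapes
FROM p;
```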

6. Linear referencing

Linear referencing is a method of representing features that can be described by referring to an underlying linear feature. Common examples of modeling using linear referencing include:

Road assets, which are represented in miles along the road network.
Road maintenance operations, which occur along a road network between a pair of mile measurements.
Aquatic inventory, where the presence of fish is recorded as a distance between upstream and downstream locations.
The hydrological characteristics of a river, referenced from one point of the river to another.

What exactly is linear referencing?

Linear referencing is a method of storing geographic locations using relative positions along measured linear features.

Still unclear?

You know what an auxiliary line is. A linear reference can be understood as an auxiliary line: other positions are calculated relative to it. It is like using your older brother's height as a reference to work out how much taller you are than him, and to judge whether you have grown (provided your brother has stopped growing, of course).

For details, see the following example.
[Figure: example of linear referencing]
The figure below shows linear referencing applied to a traffic network; the red line is the reference feature.
[Figure: linear referencing in a traffic network]

A linear reference can be computed with ST_LineLocatePoint, which returns the position of a point along a line as a fraction between 0 and 1 of the line's length.

SELECT ST_LineLocatePoint('LINESTRING(0 0, 2 2)', 'POINT(1 1)');
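
The reverse direction, from a fraction back to a point, is handled by ST_LineInterpolatePoint. A minimal sketch on the same line:

```sql
SELECT ST_LineLocatePoint(
         'LINESTRING(0 0, 2 2)'::geometry,
         'POINT(1 1)'::geometry
       ) AS fraction,                                   -- 0.5: halfway along the line
       ST_AsText(
         ST_LineInterpolatePoint('LINESTRING(0 0, 2 2)'::geometry, 0.5)
       ) AS point_at_half;                              -- POINT(1 1)
```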

7. DE9IM model

"Dimensionally Extended 9-Intersection Model" (DE9IM) is a framework for modeling how two spatial objects interact.

First, each spatial object has:

an interior
a boundary
an exterior

Even line segments and points have interior, exterior and boundaries.

For line segments: the interior is the part of the line bounded by the endpoints; the boundary is the endpoints themselves; the exterior is everything else on the plane.

For points, it's even stranger: the interior is the point itself, the boundary is the empty set, and the exterior is everything else on the plane.

Using these definitions of interior, exterior, and boundary, the relationship between any pair of spatial features can be characterized by the dimensions of the nine possible intersections between the interior, boundary, and exterior of one feature and those of the other.

[Figure: DE9IM matrix for two overlapping polygons]
Note the dim() function in the matrix above, which gives the dimension of each intersection. For the polygons in the example, the interiors intersect in a two-dimensional area, so that cell of the matrix is filled with "2". The boundaries intersect only at zero-dimensional points, so that cell is filled with "0".

Here is another example.
[Figure: a linestring partially entering a polygon]
The DE9IM matrix for their intersection is as follows:

[Figure: DE9IM matrix for the linestring and the polygon]
Note that the boundaries of the two features don't actually intersect at all (the endpoint of the line intersects the interior of the polygon, not its boundary, and vice versa), so the B/B cell is filled with "F".

The SQL to generate the DE9IM matrix is as follows.

SELECT ST_Relate(
         'LINESTRING(0 0, 2 0)',
         'POLYGON((1 -1, 1 1, 3 1, 3 -1, 1 -1))'
);

The power of DE9IM matrices lies not in generating them, but in using them as matching parameters to find geometries that have a specific relationship to each other.

Suppose we have a data model of lakes and docks, and further suppose that docks must be inside lakes and must touch the boundary of their lake at one end. Can we find all docks in the database that obey this rule?

It can be done with the following SQL; the derivation is omitted here.

SELECT docks.*
FROM docks JOIN lakes ON ST_Intersects(docks.geom, lakes.geom)
WHERE ST_Relate(docks.geom, lakes.geom, '1FF00F212');
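
To see the pattern at work without any tables, the sketch below defines one made-up lake and two made-up docks inline: one that starts on the shore and ends in the water, and one that crosses the shore.

```sql
WITH lakes(geom) AS (
  VALUES ('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'::geometry)
),
docks(name, geom) AS (
  VALUES ('starts on shore', 'LINESTRING(0 5, 3 5)'::geometry),
         ('crosses shore',   'LINESTRING(-1 5, 3 5)'::geometry)
)
SELECT docks.name,
       ST_Relate(docks.geom, lakes.geom)              AS matrix,
       ST_Relate(docks.geom, lakes.geom, '1FF00F212') AS obeys_rule
FROM docks CROSS JOIN lakes;
-- 'starts on shore' -> matrix 1FF00F212, obeys_rule true
-- 'crosses shore'   -> matrix 1010F0212, obeys_rule false
```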

8. Clustering on indexes

One way to speed up data access is to make sure that records likely to be retrieved together in the same result set are stored physically close to each other on disk. This is called "clustering".

Clustering on a spatial index makes sense for spatial data that will be accessed by spatial queries: similar things tend to have similar locations (the first law of geography).

The following SQL clusters a table on its spatial index.

-- Cluster the blocks based on their spatial index
CLUSTER nyc_census_blocks USING nyc_census_blocks_geom_idx;

There are two approaches:

  • Clustering on the R-tree (GiST) index
  • Clustering on GeoHash

Both algorithms are worth studying in depth on your own.
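
For the GeoHash variant, a sketch could look like the following. It assumes the geom column of nyc_census_blocks has its SRID set, since ST_GeoHash needs longitude/latitude input and the data must be transformed to 4326 first; the index name is made up.

```sql
-- Build a B-tree index on the GeoHash string of each geometry.
CREATE INDEX nyc_census_blocks_geohash_idx
ON nyc_census_blocks (ST_GeoHash(ST_Transform(geom, 4326)));

-- Physically reorder the table by that index.
CLUSTER nyc_census_blocks USING nyc_census_blocks_geohash_idx;
```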

9. 3-D

So far we've been dealing with 2-D geometries, which have only X and Y coordinates. But PostGIS supports additional dimensions on all geometry types: each coordinate can carry a "Z" dimension for height information and an "M" dimension for an additional measure (usually time, road mileage, or distance).

There are a number of functions for computing relationships between 3D objects, and support can even be extended to N-D if you wish.
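
A small sketch of the difference a Z value makes, using two made-up 3D points:

```sql
WITH pts AS (
  SELECT 'POINT Z (0 0 0)'::geometry  AS a,
         'POINT Z (3 4 12)'::geometry AS b
)
SELECT ST_NDims(b)         AS dims,      -- 3
       ST_Z(b)             AS height,    -- 12
       ST_Distance(a, b)   AS dist_2d,   -- 5: Z is ignored
       ST_3DDistance(a, b) AS dist_3d    -- 13: sqrt(3^2 + 4^2 + 12^2)
FROM pts;
```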

10. Nearest-neighbor search

KNN (k-nearest neighbors) is a neighbor search method that works purely on the spatial index. I won't expand on it here; it is enough to know the technique exists.
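
In PostGIS, this kind of search is usually written with the "<->" distance operator in the ORDER BY clause. The sketch below reuses the nyc_neighborhoods table; the query point and the SRID 26918 are assumptions for illustration.

```sql
-- The three neighborhoods closest to a made-up point.
-- Ordering by "<->" lets PostgreSQL walk the spatial index in distance
-- order instead of computing the distance for every row.
SELECT name,
       ST_Distance(geom, ST_SetSRID(ST_MakePoint(583571, 4506714), 26918)) AS dist
FROM nyc_neighborhoods
ORDER BY geom <-> ST_SetSRID(ST_MakePoint(583571, 4506714), 26918)
LIMIT 3;
```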

11. Use triggers to track historical edits

A common requirement for production databases is the ability to track the history of user edits to data: how did the data change between two dates, who did it, and what changed in them? Some GIS systems track users' editing data operations by including change management functionality in the client interface, but this adds complexity to the client-side editing tool.

With the database's trigger mechanism, the edit history of any table can be tracked inside the database, so the client can keep doing simple "direct edits" on the table (the client is only responsible for CRUD, not for tracking the edit history).
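
One possible shape for such a setup is sketched below. Everything in it is hypothetical: the roads table, the roads_history table, and the function and trigger names are made up for illustration, and a production design would likely record more (for example validity ranges for each version).

```sql
-- Hypothetical edited table.
CREATE TABLE roads (
  id   serial PRIMARY KEY,
  name text,
  geom geometry(LineString, 26918)
);

-- Hypothetical history table: one row per change.
CREATE TABLE roads_history (
  hid        serial PRIMARY KEY,
  id         integer,
  name       text,
  geom       geometry(LineString, 26918),
  operation  text,                             -- INSERT, UPDATE, or DELETE
  changed_by text        DEFAULT current_user, -- who did it
  changed_at timestamptz DEFAULT now()         -- when it happened
);

-- Trigger function: copy the affected row into the history table.
CREATE OR REPLACE FUNCTION roads_log_change() RETURNS trigger AS
$$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO roads_history (id, name, geom, operation)
    VALUES (OLD.id, OLD.name, OLD.geom, TG_OP);
    RETURN OLD;
  ELSE
    INSERT INTO roads_history (id, name, geom, operation)
    VALUES (NEW.id, NEW.name, NEW.geom, TG_OP);
    RETURN NEW;
  END IF;
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION needs PostgreSQL 11 or later;
-- older versions spell it EXECUTE PROCEDURE.
CREATE TRIGGER roads_history_trigger
AFTER INSERT OR UPDATE OR DELETE ON roads
FOR EACH ROW EXECUTE FUNCTION roads_log_change();
```

With this in place, clients keep issuing plain INSERT, UPDATE, and DELETE statements against roads, and the history accumulates in roads_history.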

12. The ST_MakeEmptyRaster function for creating an empty raster

ST_MakeEmptyRaster creates an empty raster with no cell values (no bands). Its parameters define the metadata of the empty raster:

width, height - the number of columns and rows of the raster
upperleftx, upperlefty - the coordinates of the upper-left corner of the raster in the corresponding spatial reference system
scalex, scaley - the width and height of a single cell, in the units of the spatial reference system
skewx, skewy - the rotation (skew) parameters; if the raster is north-up, both are 0. The default is 0.
srid - the spatial reference system, 0 by default
pixelsize - the width and height of a single pixel; when scalex and scaley are equal, you can use this parameter to set the pixel size directly
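
As a sketch of how the parameters fit together, the query below creates a 100 x 50 empty raster and reads its metadata back. The extent, cell size, and SRID are made-up values; on PostGIS 3.x the raster functions live in the separate postgis_raster extension, which is assumed to be installed.

```sql
-- Arguments: width, height, upperleftx, upperlefty,
--            scalex, scaley, skewx, skewy, srid.
-- scaley is negative here, which is the usual convention for a
-- north-up raster whose rows run downward from the upper-left corner.
SELECT (ST_MetaData(
          ST_MakeEmptyRaster(100, 50, 0, 0, 1, -1, 0, 0, 4326)
       )).*;
```

The numbands column in the result is 0, confirming that the raster has no bands yet.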

[Figure: the resulting empty raster]


Origin blog.csdn.net/qq_41708993/article/details/129990365