Python geographic data processing 3: vector data reading and writing (1)

1. Vector data

  Geographic features with clear boundaries, such as cities, are well represented by vector data. Continuous data, such as elevation, are not: in a mountainous area it is very difficult to draw a polygon around every area that shares the same elevation, although polygons can be used to distinguish elevation ranges. Many kinds of data suit a vector representation. On a road map, for example, roads are represented by lines, counties are represented by polygons, and cities are represented by points or polygons depending on the scale of the map. All elements on the map can be expressed as points, lines, or polygons.
  Vector data is well suited to making maps, but it has shortcomings. For example, data captured at one scale may not display well when the map is zoomed in or out to a very different scale.

Coastline paradox
  The British mathematician Lewis Fry Richardson was among the first to study how the measured length of a coastline depends on the scale of measurement. Measuring a coastline is not easy, because the result depends entirely on the scale used. Imagine a rugged coastline with many inlets and a road running beside it. You drive along the road, measuring the distance with the car's odometer, then get out and walk back the way you came. On foot, however, you follow the curving edge of each inlet, which the road does not. The walk is clearly longer than the drive because of all the detours. The same principle applies to measuring an entire coastline: the smaller the measuring increment, the more detail is captured and the longer the result. For the British coastline, the length measured with a 50-kilometer increment is about 600 kilometers longer than the length measured with a 100-kilometer increment.
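
  The effect can be reproduced numerically. The sketch below is only an illustration: the "coastline" is a synthetic curve made of two invented sine wiggles, not real data, and the function names are hypothetical. It measures the same curve with coarse and fine steps.

```python
import math


def make_coastline(n_points=2001, length_km=1000.0):
    """Build a synthetic wiggly 'coastline' as a list of (x, y) points in km."""
    points = []
    for i in range(n_points):
        x = length_km * i / (n_points - 1)
        # Two superimposed wiggles: large bays plus small coves (made-up shapes).
        y = 40.0 * math.sin(x / 25.0) + 5.0 * math.sin(x / 2.0)
        points.append((x, y))
    return points


def measured_length(points, stride):
    """Sum straight-line distances between every `stride`-th point."""
    sampled = points[::stride]
    if sampled[-1] != points[-1]:
        sampled.append(points[-1])  # always finish at the true end point
    return sum(math.dist(a, b) for a, b in zip(sampled, sampled[1:]))


coast = make_coastline()
lengths = {stride: measured_length(coast, stride) for stride in (200, 20, 1)}
for stride, length in lengths.items():
    print(f"every {stride:>3} points: {length:8.1f} km")
```

  Because each finer sampling contains all the points of the coarser one, the measured length can only grow as the step shrinks, which is the paradox in miniature.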

  Shapefile is a common format for storing vector data, but it is not a single file. It requires at least 3 binary files: 1 main file (.shp) that stores the geometry, 1 index file (.shx), and 1 dBASE table (.dbf) that stores the attribute data. All three files must be stored in the same folder.
  Another format is GeoJSON, a plain-text format that can be opened in any text editor. A GeoJSON dataset is a single file that stores all the necessary information.
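
  Because GeoJSON is plain text, it can be produced and read with nothing more than Python's standard json module. The following is a minimal sketch; the feature, its name, and its population value are made up for illustration.

```python
import json

# A hypothetical one-feature dataset; the name and population are illustrative,
# and the coordinates are [longitude, latitude].
city = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-122.33, 47.61]},
            "properties": {"NAME": "Example City", "POP_MAX": 1000},
        }
    ],
}

text = json.dumps(city, indent=2)  # the text that would be saved in a .geojson file
parsed = json.loads(text)          # reading it back needs no special library

for feature in parsed["features"]:
    lon, lat = feature["geometry"]["coordinates"]
    print(feature["properties"]["NAME"], lon, lat)
```

  Geometry and attributes ("properties") travel together in the same file, which is why no companion index or table files are needed.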

2. OGR

  The OGR Simple Features Library is part of the Geospatial Data Abstraction Library (GDAL), a very popular open source library for reading and writing spatial data. The OGR part of GDAL reads and writes many different vector data formats. OGR also lets you create and manipulate feature geometries, edit attribute values, filter vector data by attribute value or spatial location, and it provides data analysis capabilities.
  The GDAL library is written in C and C++, but bindings exist for several other languages, including Python. The code has not been rewritten in Python; the bindings provide a Python interface to the GDAL/OGR library. To use GDAL from Python, you therefore need to install both the GDAL library and the corresponding Python bindings.

OGR class structure
  A data source contains one or more layer objects, and each layer represents one dataset in the data source. A Shapefile contains only one dataset (one layer), whereas a SpatiaLite database can contain several. No matter how many datasets a data source holds, OGR treats each one as a layer.
  Each layer in turn contains features. As in an ArcGIS attribute table, each row represents a feature and each column represents an attribute field.
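
  The hierarchy of data source, layer, and attribute fields can be walked from Python. This is a sketch rather than a definitive implementation: describe_data_source is a hypothetical helper name, and the import guard only exists so that the snippet degrades gracefully when the GDAL bindings are not installed.

```python
try:
    from osgeo import ogr
except ImportError:  # the GDAL Python bindings are not installed
    ogr = None


def describe_data_source(path):
    """Return one dict per layer: its name, feature count, and field names.

    Returns None when GDAL is unavailable or the path cannot be opened.
    """
    if ogr is None:
        return None
    try:
        ds = ogr.Open(path, 0)  # 0 = read-only
    except RuntimeError:  # raised instead of returning None if GDAL exceptions are enabled
        ds = None
    if ds is None:
        return None

    summary = []
    for i in range(ds.GetLayerCount()):
        lyr = ds.GetLayer(i)
        defn = lyr.GetLayerDefn()  # the layer's schema (its attribute fields)
        fields = [defn.GetFieldDefn(j).GetName() for j in range(defn.GetFieldCount())]
        summary.append({
            "name": lyr.GetName(),
            "feature_count": lyr.GetFeatureCount(),
            "fields": fields,
        })
    del ds  # release the data source
    return summary
```

  Calling describe_data_source on a Shapefile path should give a list with a single entry, while a multi-layer source such as SpatiaLite would give one entry per layer.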

2.1 ogrinfo

  ogrinfo is a command-line utility that prints information about vector data in any format supported by OGR.
  If an error appears when running it, one solution is to move ogr_FileGDB.dll from the osgeo\gdalplugins folder to the osgeo folder.
  The utility can display its own parameter information and the list of supported formats (for the latter, run ogrinfo --formats). The format list tells you not only which drivers are included in your version of OGR, but also whether each driver supports reading and writing.
  You can also check which drivers are available from Python, for example in an interactive environment such as IDLE. First import the ogr module from the osgeo package, then use ogr.GetDriverByName to look up a specific driver:

>>> from osgeo import ogr
>>> driver = ogr.GetDriverByName('GeoJSON')  # driver names are not case-sensitive
>>> print(driver)
<osgeo.ogr.Driver; proxy of <Swig Object of type 'OGRDriverShadow *' at 0x0000013B3CBE8840> >

  An incorrect driver name returns None:

>>> driver = ogr.GetDriverByName('shapefile')  # the correct name is 'ESRI Shapefile'
>>> print(driver)
None
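
  Rather than guessing names one at a time, the registered drivers can be enumerated. The sketch below uses ogr.GetDriverCount and ogr.GetDriver; the two helper function names are hypothetical, and the import guard is only there so the snippet still runs (returning an empty list) when the GDAL bindings are missing.

```python
try:
    from osgeo import ogr
except ImportError:  # the GDAL Python bindings are not installed
    ogr = None


def list_vector_drivers():
    """Return the sorted names of all registered OGR drivers ([] without GDAL)."""
    if ogr is None:
        return []
    return sorted(ogr.GetDriver(i).GetName() for i in range(ogr.GetDriverCount()))


def find_driver_name(name):
    """Return the registered driver name that matches `name` ignoring case, else None."""
    for registered in list_vector_drivers():
        if registered.lower() == name.lower():
            return registered
    return None


for driver_name in list_vector_drivers():
    print(driver_name)
```

  With GDAL installed, find_driver_name('esri shapefile') should return the exact registered spelling, while 'shapefile' alone finds nothing, matching the None result above.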

2.2 Upgrade pip command (supplement)

  1. Do not open cmd directly with Win+R. Instead choose "Start" > "Windows System" > "Command Prompt", right-click it, and select "Run as administrator":
python -m pip install --upgrade pip
  2. With the command prompt running as administrator, download and upgrade through a mirror:
python -m pip install --upgrade pip -i https://pypi.douban.com/simple

Timeout problem: raise ReadTimeoutError(self._pool, None, 'Read timed out.')

pip install --index-url https://pypi.douban.com/simple <package>

For example: pip install --index-url https://pypi.douban.com/simple opencv-python

or:

pip --default-timeout=100 install -U pip

Mirrors (fast and stable; personally tested):

  1. Tsinghua mirror: https://pypi.tuna.tsinghua.edu.cn/simple
  2. Ali: http://mirrors.aliyun.com/pypi/simple
  3. University of Science and Technology of China: http://pypi.mirrors.ustc.edu.cn/simple
pip install -i http://pypi.douban.com/simple --trusted-host pypi.douban.com numpy

2.3 ospybook 1.0: Python visualization of geographic data

  1. Advantage: it lets you visualize data without opening another program
  2. Disadvantage: poor interactivity

Installation:

  1. Get the package: it is in the ospybook-1.0 folder (download link: http://manning.com/garrard/?a_aid=geopy&a_bid=c3bae5be )
  2. Change to the directory that contains setup.py and run on the command line: python setup.py build
  3. Then run: python setup.py install

  List the available drivers with the ospybook module:

>>> import ospybook as pb  # use the ospybook module
>>> pb.print_drivers()     # print the list of available drivers
ESRIC (readonly)
FITS (read/write)
PCIDSK (read/write)
netCDF (read/write)
PDS4 (read/write)
VICAR (read/write)
JP2OpenJPEG (readonly)
JPEG2000 (readonly)
PDF (read/write)
MBTiles (read/write)
BAG (read/write)
EEDA (readonly)
OGCAPI (readonly)
DB2ODBC (read/write)
ESRI Shapefile (read/write)
MapInfo File (read/write)

3. Vector data reading

  The Shapefile ne_50m_populated_places.shp contains a global dataset of populated places. It can be opened in ArcGIS to view its attribute table.
  Reading the same data with Python:

import sys
from osgeo import ogr

fn = r'E:\Google\GIS\osgeopy data\global\ne_50m_populated_places.shp'
ds = ogr.Open(fn, 0)  # ds = data source; 0 opens the file read-only, 1 or True opens it for editing
if ds is None:  # make sure the shapefile could be opened
    sys.exit('Could not open {0}.'.format(fn))
lyr = ds.GetLayer(0)  # layer indexes start at 0; with no argument, the first layer is returned

i = 0  # counter used to stop after the first 5 features of the layer
for feat in lyr:
    pt = feat.geometry()  # get the geometry object
    x = pt.GetX()         # get the coordinates
    y = pt.GetY()

    # get attribute values
    name = feat.GetField('NAME')
    pop = feat.GetField('POP_MAX')
    # pop = feat.GetFieldAsString('POP_MAX')   # type conversion
    # pop = feat.GetFieldAsInteger('POP_MAX')
    print(name, pt, pop, x, y)
    i += 1
    if i == 5:
        break
del ds  # delete the ds variable to force the file to close

Output:

Bombo POINT (32.5332995248648 0.583299105614628) 75000 32.533299524864844 0.5832991056146284
Fort Portal POINT (30.2750016159794 0.671004121125236) 42670 30.27500161597942 0.671004121125236
Potenza POINT (15.7989964956403 40.6420021300982) 69060 15.798996495640267 40.642002130098206
Campobasso POINT (14.6559965589219 41.562999118644) 50762 14.655996558921856 41.56299911864397
Aosta POINT (7.31500259570618 45.7370010670723) 34062 7.315002595706176 45.7370010670723
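
  The manual counter in the loop above can also be replaced by itertools.islice, which works on any iterable. This is an alternative idiom, not the approach used in the listing; first_n is a hypothetical helper, and the list of names merely stands in for a layer.

```python
from itertools import islice


def first_n(features, n=5):
    """Return an iterator over at most `n` items from any iterable of features."""
    return islice(features, n)


# Stand-in data; with OGR you would pass the layer object instead of a list.
sample = ["Bombo", "Fort Portal", "Potenza", "Campobasso", "Aosta", "(a sixth feature)"]
for name in first_n(sample, 5):
    print(name)
```

  Assuming the layer is iterable as in the for loop above, the loop header would become for feat in first_n(lyr, 5), with no counter or break needed.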

3.1 Accessing specific features

  Method: look a feature up by its offset, that is, its feature ID (FID). In a Shapefile, FIDs start at 0 and indicate the position of the feature in the dataset.
  Get the last feature in the layer:

>>> num_features = lyr.GetFeatureCount()
>>> last_feature = lyr.GetFeature(num_features - 1)
>>> print(last_feature.NAME)
Hong Kong

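  Current feature: a layer keeps track of the feature it is currently positioned at. To read from the first feature again after a pass through the layer, call ResetReading():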

import os
import sys
from osgeo import ogr
data_dir = r'E:\Google chrome\Download\GIS with python\osgeopy-data\osgeopy-data\osgeopy-data-washington\osgeopy-data'

fn = os.path.join(data_dir, 'Washington', 'large_cities.geojson')
ds = ogr.Open(fn, 0)
lyr = ds.GetLayer(0)
print('First loop')
for feat in lyr:
    print(feat.GetField('Name'), feat.GetField('Population'))

print('Second loop')
lyr.ResetReading() # This is the important line.
for feat in lyr:
    pt = feat.geometry()
    print(feat.GetField('Name'), pt.GetX(), pt.GetY())
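
  Why a reset matters is easier to see with a toy model. The class below is a pure-Python stand-in that imitates the cursor behaviour of GetNextFeature() and ResetReading(); it is not how OGR is implemented, and the class, method, and city names are all illustrative.

```python
class ToyLayer:
    """A toy stand-in for a layer that remembers its read position."""

    def __init__(self, features):
        self._features = list(features)
        self._position = 0  # index of the next feature to return

    def get_next_feature(self):
        """Return the next feature, or None once the end has been reached."""
        if self._position >= len(self._features):
            return None
        feature = self._features[self._position]
        self._position += 1
        return feature

    def reset_reading(self):
        """Move the read position back to the first feature."""
        self._position = 0


def read_all(layer):
    """Collect features until the layer reports that none are left."""
    collected = []
    while (feature := layer.get_next_feature()) is not None:
        collected.append(feature)
    return collected


layer = ToyLayer(["Seattle", "Spokane", "Tacoma"])  # illustrative values
first_pass = read_all(layer)
second_pass = read_all(layer)  # empty: the position is still at the end
layer.reset_reading()
third_pass = read_all(layer)   # the full list again
print(first_pass, second_pass, third_pass)
```

  The second pass comes back empty because the read position was never moved, which is the situation the ResetReading() call in the listing above guards against.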

3.2 View data

3.2.1 View properties

  Use the print_attributes function from ospybook to print attribute values:

print_attributes(lyr_or_fn, [n], [fields], [geom], [reset] )
  1. lyr_or_fn is either a layer or the path to a data source. If it is a data source, the first layer is used.
  2. n is optional and sets the number of records to print; by default all records are printed.
  3. fields is optional and lists the attribute fields to include in the output; by default all fields are included.
  4. geom is an optional boolean that sets whether the geometry type is printed; the default is True.
  5. reset is an optional boolean that sets whether the layer is reset to the first record before printing; the default is True.

  Print the names and populations of the first 3 cities in the file:

>>> import ospybook as pb
>>> fn = r'E:\Google chrome\Download\GIS with python\osgeopy-data\osgeopy-data\osgeopy-data-global\osgeopy-data\global\ne_50m_populated_places.shp'
>>> pb.print_attributes(fn, 3, ['NAME', 'POP_MAX'] )

FID    Geometry                  NAME           POP_MAX    
0      POINT (32.533, 0.583)     Bombo          75000      
1      POINT (30.275, 0.671)     Fort Portal    42670      
2      POINT (15.799, 40.642)    Potenza        69060      
3 of 1249 features

  The pb.print_attributes() function is convenient for inspecting the attributes of a small number of records, but it is not practical for viewing large datasets.

3.2.2 Drawing spatial data

  ospybook contains classes for visualizing spatial data; they rely on Python's matplotlib module. To plot and display data, create a new instance of the VectorPlotter class. In interactive mode the data is displayed as soon as it is plotted; in non-interactive mode you need to call the draw function after plotting.
  The plot function:

plot(self, geom_or_lyr, [symbol], [name], [kwargs])
  1. geom_or_lyr is a geometry, a layer, or the path to a data source. If it is a data source, the first layer in it is drawn.
  2. symbol is optional and sets the symbol style of the geometries, using matplotlib format strings. Examples:
    1. fill=False: hollow polygon (a keyword argument rather than a symbol string)
    2. 'bo': blue circles
    3. 'rs': red squares
    4. 'b-': solid blue line
    5. 'r--': red dashed line
    6. 'g:': green dotted line
  3. name is optional and gives the data a name so that it can be accessed later
  4. kwargs are optional keyword arguments; the name kwargs is a common abbreviation for an arbitrary number of keyword arguments

Map drawn by matplotlib:

>>> import os
>>> os.chdir(r'E:\Google chrome\Download\global')  # change the working directory so files in this folder can be referenced by name without retyping the full path
>>> from ospybook.vectorplotter import VectorPlotter
>>> vp = VectorPlotter(True)  # create an interactive plotting canvas
>>> from matplotlib.pyplot import *     # matplotlib must be imported for plotting
>>> vp.plot('ne_50m_admin_0_countries.shp', fill=False)  # fill=False draws the polygons hollow
>>> vp.plot('ne_50m_populated_places.shp', 'bo')  # 'bo' means blue circles
>>> vp.draw()


Origin blog.csdn.net/amyniez/article/details/113061835