Smoothing shapefiles in Python: GDAL vs. arcpy, and why arcpy shapefile smoothing takes so long

I recently ran into a problem when smoothing a polygon shapefile. Because the data is relatively large (my shapefile is about 30 MB), smoothing it with arcpy is very slow, taking about 40 to 50 minutes, whereas the same operation in ArcGIS takes only a few minutes. I searched the web for ways to reduce the smoothing time but found no one with a similar problem. Later I found code that smooths shapefiles with GDAL and wanted to use it instead, but the GDAL result had problems. Below I compare smoothing with GDAL and with arcpy.

1. Smoothing a polygon shapefile with GDAL

The code is as follows:

```python
from pathlib import Path

from osgeo import ogr


def shapefile_edge_smooth(_input_shapefile, _smoothed_shapefile_output_path, _buffer_distance=0.0005):
    '''Smooth the polygon boundaries of a vector file.

    _input_shapefile: path of the input vector file
    _smoothed_shapefile_output_path: path of the smoothed output vector file
    _buffer_distance: smoothing buffer distance, in the units of the input
        layer's coordinate system (degrees for latitude/longitude data,
        usually meters for projected data). It can be set based on the image
        resolution and should not be smaller than the resolution.'''
    in_ds = ogr.Open(_input_shapefile)
    in_lyr = in_ds.GetLayer()
    driver = ogr.GetDriverByName('ESRI Shapefile')
    if Path(_smoothed_shapefile_output_path).exists():
        # If the output file already exists, delete it first
        driver.DeleteDataSource(_smoothed_shapefile_output_path)
    out_ds = driver.CreateDataSource(_smoothed_shapefile_output_path)
    # Use the file name (without extension) as the layer name
    layer_name = Path(_smoothed_shapefile_output_path).stem
    out_lyr = out_ds.CreateLayer(layer_name, in_lyr.GetSpatialRef(), ogr.wkbPolygon)
    def_feature = out_lyr.GetLayerDefn()

    for feature in in_lyr:
        geometry = feature.GetGeometryRef()
        # Buffer outward, then inward by the same distance: this rounds off
        # sharp corners and fills narrow gaps (a morphological "closing")
        _buffer = geometry.Buffer(_buffer_distance).Buffer(-1 * _buffer_distance)
        # Note: only the geometry is written; attribute fields are not copied
        out_feature = ogr.Feature(def_feature)
        out_feature.SetGeometry(_buffer)
        out_lyr.CreateFeature(out_feature)
        out_feature = None
    # Dereferencing the data sources flushes pending writes and closes them
    in_ds = None
    out_ds = None


if __name__ == '__main__':
    input_shp = r'D:\data\test_data\HZ\shp\HZ.shp'
    output_shp = r'D:\data\test_data\HZ\smooth_result\gdal_smoothed2.shp'
    shapefile_edge_smooth(input_shp, output_shp)
```
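`Buffer()` interprets its distance in the units of the layer's coordinate system, so for latitude/longitude data like this the default of 0.0005 is in degrees, not meters. To get a feel for what that means on the ground, here is a rough conversion sketch. It assumes a spherical Earth with about 111,320 m per degree; the helper names are hypothetical, and the results are approximations only.

```python
import math

# Approximate length of one degree of latitude on a spherical Earth (meters).
# Simplifying assumption: the real value varies slightly with latitude.
METERS_PER_DEGREE = 111_320.0


def degrees_to_meters(deg, latitude_deg=0.0):
    """Approximate ground length of `deg` degrees.

    Returns (north-south meters, east-west meters at the given latitude);
    a degree of longitude shrinks with cos(latitude).
    """
    ns = deg * METERS_PER_DEGREE
    ew = deg * METERS_PER_DEGREE * math.cos(math.radians(latitude_deg))
    return ns, ew


def meters_to_degrees(meters, latitude_deg=0.0):
    """Approximate (degrees of latitude, degrees of longitude) spanned by `meters`."""
    lat_deg = meters / METERS_PER_DEGREE
    lon_deg = meters / (METERS_PER_DEGREE * math.cos(math.radians(latitude_deg)))
    return lat_deg, lon_deg


if __name__ == "__main__":
    # Example latitude of 30 degrees N, chosen only for illustration
    ns, ew = degrees_to_meters(0.0005, latitude_deg=30.0)
    print(f"0.0005 deg is about {ns:.1f} m (N-S) and {ew:.1f} m (E-W) at 30 deg N")
```

Because a degree of longitude is shorter than a degree of latitude away from the equator, a single buffer distance in degrees is slightly anisotropic on the ground; for precise work, reprojecting to a metric CRS before buffering avoids this.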

2. Smoothing a polygon shapefile with arcpy

The code is as follows:

```python
# encoding=utf8
# Import system modules
import os
import time

import arcpy
import arcpy.cartography as CA
import arcpy.management as DM

arcpy.env.overwriteOutput = True


def smooth_shp(workspace_path, inshp_path, outshp_path):
    # Set the workspace path; create the file geodatabase if it does not exist
    tempworkspace = os.path.join(workspace_path, 'tempworkspace.gdb')
    if not arcpy.Exists(tempworkspace):
        DM.CreateFileGDB(workspace_path, 'tempworkspace.gdb')
    arcpy.env.workspace = tempworkspace
    # A feature layer is an in-memory layer referenced by name, not a file path
    infeature_layer = 'infeatures'

    DM.MakeFeatureLayer(inshp_path, infeature_layer)  # create a feature layer from the shapefile
    CA.SmoothPolygon(infeature_layer, outshp_path, "PAEK", 0.0005, "", "FLAG_ERRORS")  # polygon smoothing


if __name__ == '__main__':
    smoothed_result = r'D:\data\test_data\HZ\smoothed_result.shp'
    shp_path = r'D:\data\test_data\HZ\shp\infeature.shp'
    workspace_path = r'D:\data\test_data\HZ\shp'
    start = time.time()
    smooth_shp(workspace_path, inshp_path=shp_path, outshp_path=smoothed_result)
    print('Smoothing took %.1f s' % (time.time() - start))
```

3. Comparing the GDAL and arcpy smoothing results

In the figure below, the black line is the patch boundary in the original data, the red line is the boundary smoothed by GDAL, and the blue line is the boundary smoothed by arcpy.
[Figure: original boundaries (black), GDAL-smoothed boundaries (red), arcpy-smoothed boundaries (blue)]
Both the GDAL and arcpy results have fewer jagged edges and sharp corners than the original boundary, so the smoothing clearly works. The two results are not identical, probably because the underlying algorithms differ, but at first glance both GDAL and arcpy seem to smooth reasonably well.

But is that really the case?

Let's look at a few more results for comparison:
[Figure: further comparison of GDAL and arcpy smoothing results]

As the figure shows, GDAL smoothing produces many small rings inside the patches, and its boundaries are not accurate. I used the value 0.0005 for both methods, although the number does not mean the same thing in each: for GDAL it is the buffer distance, for arcpy it is the PAEK smoothing tolerance. Even so, the outer boundary smoothed by GDAL is clearly much larger than the one smoothed by arcpy, and unwanted boundary lines appear inside the shapes (this is probably related to the GDAL method being based on buffering). So GDAL smoothing does not suit my data, which is large and has complex shapes, although its results on data with relatively simple shapes are still acceptable.
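Buffering outward and then inward by the same distance is, in effect, a morphological closing: notches and gaps narrower than about twice the distance get filled during the outward buffer and do not reopen during the inward one, so (up to the arc approximation) the result never shrinks relative to the input. And because the GDAL function processes each feature independently, the enlarged polygons of neighbouring features can end up overlapping. The pure-Python sketch below illustrates the first point with a raster analogy, using square (Chebyshev) dilation and erosion on a grid of cells. It is not GDAL's vector buffering, and the function names are hypothetical.

```python
# A U-shaped patch with a notch two cells wide; '.' is empty, '#' is filled.
SHAPE = [
    ".........",
    ".##..##..",
    ".##..##..",
    ".######..",
    ".######..",
    ".........",
]


def to_grid(rows):
    """Convert the string picture into a 0/1 grid."""
    return [[int(c == "#") for c in row] for row in rows]


def dilate(grid, r):
    """Cell becomes 1 if any cell within Chebyshev distance r is 1 (like Buffer(+d))."""
    h, w = len(grid), len(grid[0])
    return [[int(any(grid[yy][xx]
                     for yy in range(max(0, y - r), min(h, y + r + 1))
                     for xx in range(max(0, x - r), min(w, x + r + 1))))
             for x in range(w)] for y in range(h)]


def erode(grid, r):
    """Cell stays 1 only if every cell within distance r is 1 (like Buffer(-d)).

    Cells outside the grid count as empty.
    """
    h, w = len(grid), len(grid[0])
    return [[int(all(0 <= yy < h and 0 <= xx < w and grid[yy][xx]
                     for yy in range(y - r, y + r + 1)
                     for xx in range(x - r, x + r + 1)))
             for x in range(w)] for y in range(h)]


def closing(grid, r):
    """Morphological closing: dilate, then erode by the same radius."""
    return erode(dilate(grid, r), r)


if __name__ == "__main__":
    for row in closing(to_grid(SHAPE), 1):
        print("".join("#" if v else "." for v in row))
```

With radius 1 the two-cell notch is filled, so the closed patch has more cells than the original. On real data this shows up as boundaries pushed outward wherever the original outline has narrow inlets.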

In the end, I still had to use arcpy for smoothing.

But why is smoothing so slow when it is called through arcpy?

4. Why SmoothPolygon takes so long when called from arcpy

After countless experiments and parameter tweaks, and after reading the official help documentation, I traced the problem to the parameters passed to SmoothPolygon, the function arcpy uses for smoothing. My call was: SmoothPolygon(infeature_layer, outshp_path, "PAEK", 0.0005, "", "FLAG_ERRORS")

The parameters are, in order:
infeature_layer: the input features (here a feature layer)
outshp_path: the path of the output file
"PAEK": the smoothing algorithm, either "PAEK" or "BEZIER_INTERPOLATION"
0.0005: the smoothing tolerance, in the units of the input data unless a unit is given. For projected data in meters, a value of around 15 is often appropriate (it varies with the data); my data is in latitude/longitude, so I used a very small value.
"": the endpoint option ("FIXED_ENDPOINT" or "NO_FIXED"); an empty string keeps the default.
"FLAG_ERRORS": whether to check the result for topological errors. ArcGIS 10.8 offers three options, "NO_CHECK", "FLAG_ERRORS" and "RESOLVE_ERRORS", meaning respectively: do not check for topological errors, check and flag them, and check and resolve them. ArcGIS 10.2 has only the first two.
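PAEK stands for Polynomial Approximation with Exponential Kernel: roughly, each output point is computed from the input points within a window along the line whose length is governed by the tolerance, weighted by an exponential (Gaussian-like) kernel, so the smoothed line does not pass through the original vertices. The pure-Python sketch below is a simplified kernel-weighted average over a closed ring in that spirit, meant only to show why the tolerance acts as a window length. It is not Esri's implementation; the function names, the resampling step of tolerance/10, and the kernel width are all assumptions.

```python
import math


def resample_ring(points, step):
    """Resample a closed ring at (roughly) equal arc-length spacing `step`."""
    ring = points + [points[0]]          # repeat the first vertex to close the ring
    out = []
    carry = 0.0                          # offset into the next segment where sampling resumes
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = carry
        while d < seg:
            t = d / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        carry = d - seg
    return out


def smooth_ring(points, tolerance, step=None):
    """Gaussian-weighted moving average along a closed ring (PAEK-like sketch).

    Each resampled point is replaced by a weighted average of the points
    within tolerance / 2 of arc length on either side (an assumed window).
    """
    step = step or tolerance / 10.0
    pts = resample_ring(points, step)
    n = len(pts)
    half = max(1, int(round(tolerance / 2.0 / step)))   # neighbours on each side
    sigma = half / 2.0                                   # assumed kernel width
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(weights)
    smoothed = []
    for i in range(n):
        sx = sy = 0.0
        for w, k in zip(weights, offsets):
            x, y = pts[(i + k) % n]                      # wrap around: the ring is closed
            sx += w * x
            sy += w * y
        smoothed.append((sx / total, sy / total))
    return smoothed


if __name__ == "__main__":
    square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    corner = smooth_ring(square, tolerance=4.0)[0]
    print("corner (0, 0) moved to", corner)
```

A larger tolerance widens the window, rounding corners over a longer stretch of boundary; because each output point is a weighted average of input points, the result stays within the original outline's convex hull.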

When you run the tool in ArcGIS, the default is "NO_CHECK", i.e. no topology checking, which is why it runs quickly. When calling arcpy, however, I followed the official documentation, which recommends "FLAG_ERRORS", and that makes the operation very slow on particularly large data; changing it to "NO_CHECK" makes it faster. This small detail troubled me for a long time, so I am recording it here in the hope that it helps others.
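Checking for topological errors means testing the smoothed geometry for crossing or overlapping edges, which is extra work on top of the smoothing itself. The deliberately naive pure-Python sketch below checks a single ring for self-intersections by testing every pair of non-adjacent edges, to show how quickly the number of tests grows: n(n-3)/2 for n vertices. It is only an illustration; ArcGIS does not publish how FLAG_ERRORS works internally, and a real implementation would normally use a spatial index rather than this brute-force loop. All names here are hypothetical.

```python
import math


def segments_intersect(p1, p2, q1, q2):
    """True if closed segments p1-p2 and q1-q2 cross or touch."""
    def orient(a, b, c):
        v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        return (v > 0) - (v < 0)

    def on_segment(a, b, c):
        # c is collinear with a-b: is it inside a-b's bounding box?
        return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

    o1, o2 = orient(p1, p2, q1), orient(p1, p2, q2)
    o3, o4 = orient(q1, q2, p1), orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and on_segment(p1, p2, q1)) or (o2 == 0 and on_segment(p1, p2, q2))
            or (o3 == 0 and on_segment(q1, q2, p1)) or (o4 == 0 and on_segment(q1, q2, p2)))


def find_self_intersections(ring):
    """Brute-force O(n^2) check of a closed ring for crossing edges.

    Returns (list of crossing edge-index pairs, number of pair tests made).
    """
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    hits, tests = [], 0
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                continue  # adjacent edges share a vertex by construction
            tests += 1
            if segments_intersect(*edges[i], *edges[j]):
                hits.append((i, j))
    return hits, tests


def regular_ring(n):
    """Vertices of a regular n-gon on the unit circle (a valid, convex ring)."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)) for i in range(n)]


if __name__ == "__main__":
    for n in (100, 200, 400):
        hits, tests = find_self_intersections(regular_ring(n))
        print(n, "vertices ->", tests, "pair tests,", len(hits), "crossings")
```

Doubling the vertex count roughly quadruples the number of tests, which is one way to see why an optional validation step can dominate the run time on large, detailed polygons.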


Origin blog.csdn.net/persist_ence/article/details/127903046