Zonal Statistics of Raster Data Based on Python

This article uses examples to explain in detail how to perform zonal statistics on raster data with Python, including the design ideas behind the tool's code. To obtain the sample data and code, follow the official account GeodataAnalysis and reply 20230401.

A zonal statistics operation calculates statistics on the cell values of a raster (the value raster) within zones defined by another dataset. First, the raster is divided into multiple zones according to the vector data. Then the pixel values of each zone are extracted and the statistics are computed for each zone separately. Finally, the results are output (generally written directly into a field of the input vector data).
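
The idea can be illustrated without any GIS library. The following is a minimal sketch, assuming the zones have already been rasterized into an integer array of the same shape as the value raster; the arrays and the `zonal_mean` helper are hypothetical and are not part of the tool described in this article.

```python
import numpy as np

# Hypothetical 4x4 value raster and a zone raster of the same shape
# (zone ids 1 and 2; 0 means "outside every zone").
values = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12],
                   [13, 14, 15, 16]], dtype=float)
zones = np.array([[1, 1, 2, 2],
                  [1, 1, 2, 2],
                  [0, 0, 2, 2],
                  [0, 0, 0, 0]])


def zonal_mean(values, zones):
    """Return {zone_id: mean of the value pixels that fall in that zone}."""
    result = {}
    for zone_id in np.unique(zones):
        if zone_id == 0:
            continue  # skip pixels that belong to no zone
        result[int(zone_id)] = float(values[zones == zone_id].mean())
    return result


means = zonal_mean(values, zones)
print(means)
```

Each zone is handled independently: its pixels are selected with a boolean mask and reduced to a single number, which is the same pattern the tool applies to each vector feature.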

ArcGIS has a raster zonal statistics tool, but it is not flexible enough: it only supports built-in statistics such as the maximum and minimum. This article introduces a zonal statistics tool for raster data implemented with the open-source Python libraries rasterio and geopandas, which performs zonal statistics more flexibly and can meet more diverse needs.

Let's first see how to use it. The whole tool is encapsulated in a Python class. To use it, first create an instance of the class; the initialization parameters are the file paths of the raster and the vector data, for example:

zs = ZonalStatistics(tif_path, shp_path)

To perform zonal statistics, you only need to call the zonal_statistics method of the class instance. This method has four parameters, with the following meanings:

  • zone_field: String, required. The name of the field in the vector data that the results are written to; if the field does not exist, it is created.
  • statistics_type: String, the statistic to compute; required when func is empty. The available options are min, max, mean, std, sum, range and median, which stand for the minimum, maximum, mean, standard deviation, sum, range and median respectively.
  • func: User-defined function; required when statistics_type is empty. If both are given, func takes precedence. Note that at run time the input to this function is a numpy masked array.
  • all_touched: Boolean, optional. If True, all pixels touched by the vector geometry take part in the calculation; if False, only pixels whose center lies inside the polygon take part.
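
The difference between the two all_touched settings can be seen on a toy grid. The sketch below is a simplified illustration only: it assumes unit-sized pixels and an axis-aligned rectangle, ignores georeferencing, and does not call rasterio; the `rect_masks` helper is hypothetical.

```python
import numpy as np


def rect_masks(n_rows, n_cols, xmin, ymin, xmax, ymax):
    """Return (center_mask, touched_mask) for a rectangle on a unit-pixel grid.

    Pixel (row, col) is assumed to cover x in [col, col+1], y in [row, row+1].
    """
    cols = np.arange(n_cols)
    rows = np.arange(n_rows)
    # Center rule: the pixel center must fall inside the rectangle
    col_center = (cols + 0.5 >= xmin) & (cols + 0.5 <= xmax)
    row_center = (rows + 0.5 >= ymin) & (rows + 0.5 <= ymax)
    # Touch rule: the pixel only has to overlap the rectangle
    col_touch = (cols + 1 > xmin) & (cols < xmax)
    row_touch = (rows + 1 > ymin) & (rows < ymax)
    center_mask = row_center[:, None] & col_center[None, :]
    touched_mask = row_touch[:, None] & col_touch[None, :]
    return center_mask, touched_mask


center_mask, touched_mask = rect_masks(3, 3, 0.6, 0.6, 2.4, 2.4)
n_center = int(center_mask.sum())    # analogous to all_touched=False
n_touched = int(touched_mask.sum())  # analogous to all_touched=True
print(n_center, n_touched)
```

For small zones relative to the pixel size, the choice can change the number of contributing pixels considerably, so it is worth picking deliberately.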

Example code for calling this method is as follows:

zs.zonal_statistics('min', 'min')
zs.zonal_statistics('max', 'max')
zs.zonal_statistics('range', func=lambda x: np.ma.max(x)-np.ma.min(x))
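
Because func receives a numpy masked array, custom functions should either use the np.ma routines or drop the masked pixels explicitly. The following is a minimal sketch of two such functions; `percentile_90` and `valid_count` are hypothetical names, and the small array stands in for the masked array the tool would pass in.

```python
import numpy as np


def percentile_90(x):
    """90th percentile of the unmasked pixels; masked if none are valid."""
    valid = x.compressed()  # 1-D array holding only the unmasked values
    if valid.size == 0:
        return np.ma.masked
    return float(np.percentile(valid, 90))


def valid_count(x):
    """Number of unmasked pixels in the zone."""
    return int(x.count())


# Stand-in for the masked array of one zone (last pixel is nodata)
zone = np.ma.masked_array([1.0, 2.0, 3.0, 4.0, -9999.0],
                          mask=[False, False, False, False, True])
p90 = percentile_90(zone)
n_valid = valid_count(zone)
print(p90, n_valid)
```

Such functions could then be passed in the same way as the lambda above, for example zs.zonal_statistics('p90', func=percentile_90). Returning np.ma.masked for an empty zone matches how the class skips zones that have no valid pixels.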

Saving the results is also simple. The class uses geopandas to handle the vector data, and its gdf attribute is a GeoDataFrame, so the results can be saved directly through it, as in the following example:

zs.gdf.to_file('./result/zonal_statistics.shp', encoding='utf-8')

The calculation results are as follows, displayed by the maximum value field:

[Figure: zonal statistics results, symbolized by the maximum value field]

The code of the ZonalStatistics class is as follows for reference:

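import numpy as np
import geopandas as gpd
import rasterio as rio
from rasterio import features
from rasterio.windows import Window


class ZonalStatistics(object):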

    def __init__(self, raster_path, shp_path) -> None:
        self.raster_path = raster_path
        self.gdf = gpd.read_file(shp_path)
        self._init_params(self.raster_path)
        
    def _init_params(self, raster_path):
        # Cache the georeferencing info, then close the dataset
        with rio.open(raster_path) as src:
            self.transform = src.transform
            self.crs = src.crs
            self.shape = src.shape
    
    def _geometry_mask(self, src, geometries, all_touched=False):
        if isinstance(src, rio.DatasetReader):
            pass
        elif isinstance(src, str):
            src = rio.open(src)
        else:
            raise ValueError('src must be an open dataset or a file path')

        if not isinstance(geometries, (tuple, list)):
            raise ValueError('geometries must be a tuple or list')

        geometry_mask = features.geometry_mask(
            geometries=geometries,
            out_shape=src.shape,
            transform=src.transform,
            all_touched=all_touched,
            invert=True)
        
        return geometry_mask
    
    def _valid_range(self, mask):
        # Bounding rows/columns of the True pixels in the mask
        if not mask.any():
            raise ValueError('Geometry does not cover any raster pixel')
        col_index = np.where(np.any(mask, axis=0))[0]
        row_index = np.where(np.any(mask, axis=1))[0]
        min_col, max_col = col_index.min(), col_index.max()
        min_row, max_row = row_index.min(), row_index.max()

        return (min_row, max_row), (min_col, max_col)
    
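    def _read_from_geometry(self, geometries, all_touched=False):
        # Open the raster in a context manager so the file handle is closed
        with rio.open(self.raster_path) as src:
            mask = self._geometry_mask(src, geometries, all_touched)
            (min_row, max_row), (min_col, max_col) = self._valid_range(mask)
            # Read only the bounding window of the zone, not the whole raster
            window = Window.from_slices(rows=(min_row, max_row+1), 
                                        cols=(min_col, max_col+1))
            geom_array = src.read(1, window=window)
            nodata = src.nodata

        # Mask pixels outside the geometry, nodata pixels and NaN pixels
        geom_mask = ~mask[min_row:max_row+1, min_col:max_col+1]
        nan_mask = np.isnan(geom_array)
        if nodata is None:
            nodata_mask = np.zeros(geom_array.shape, dtype=bool)
        else:
            nodata_mask = (geom_array == nodata)
        geom_array = np.ma.masked_array(geom_array, 
                                        geom_mask | nodata_mask | nan_mask)

        return geom_array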
    
    def _statistics(self, geom_array, statistics_type):
        if statistics_type == 'min':
            return np.ma.min(geom_array)
        elif statistics_type == 'max':
            return np.ma.max(geom_array)
        elif statistics_type == 'median':
            return np.ma.median(geom_array)
        elif statistics_type == 'sum':
            return np.ma.sum(geom_array)
        elif statistics_type == 'std':
            return np.ma.std(geom_array)
        elif statistics_type == 'range':
            return np.ma.max(geom_array)-np.ma.min(geom_array)
        elif statistics_type == 'mean':
            return np.ma.mean(geom_array)
        else:
            raise ValueError(f'Unsupported statistics_type: {statistics_type}')
    
    def zonal_statistics(self, zone_field, statistics_type=None, 
                         func=None, all_touched=False):
        # Either a built-in statistic or a custom function is required
        if func is None and statistics_type is None:
            raise ValueError('Either statistics_type or func must be given')

        for i, geom in enumerate(self.gdf.geometry.to_list()):
            geom_array = self._read_from_geometry([geom], all_touched)

            # A custom function takes precedence over statistics_type
            if func is None:
                value = self._statistics(geom_array, statistics_type)
            else:
                value = func(geom_array)

            # Skip zones whose pixels are all masked (no valid data)
            if isinstance(value, type(np.ma.masked)):
                continue
            else:
                self.gdf.loc[i, zone_field] = value
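
The core of the class is the combination of a bounding window and a masked array. The following sketch reproduces that step on a plain numpy array so it can be tried without any raster file; the `crop_and_mask` helper and the sample arrays are hypothetical, and the nodata value of -9999 is an assumption made for the example.

```python
import numpy as np


def crop_and_mask(values, zone_mask, nodata):
    """Crop values to the bounding box of zone_mask and mask invalid pixels."""
    rows = np.where(zone_mask.any(axis=1))[0]
    cols = np.where(zone_mask.any(axis=0))[0]
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    window = values[r0:r1, c0:c1]
    # Pixels inside the bounding box but outside the zone
    outside = ~zone_mask[r0:r1, c0:c1]
    # Pixels holding the nodata value or NaN
    invalid = (window == nodata) | np.isnan(window)
    return np.ma.masked_array(window, mask=outside | invalid)


values = np.array([[1., 2., 3., 4.],
                   [5., -9999., 7., 8.],
                   [9., 10., np.nan, 12.],
                   [13., 14., 15., 16.]])
zone_mask = np.zeros((4, 4), dtype=bool)
zone_mask[1:3, 1:4] = True
zone_mask[2, 3] = False  # the zone is not a full rectangle

zone = crop_and_mask(values, zone_mask, nodata=-9999.0)
zone_max = float(zone.max())
zone_count = int(zone.count())
print(zone.shape, zone_max, zone_count)
```

Cropping first keeps the per-zone array small, and the three masks (outside the zone, nodata, NaN) are merged so that every statistic sees only the valid pixels of the zone.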

Origin blog.csdn.net/weixin_44785184/article/details/129904388