For K-means classification I use Jupyter Notebook, which is convenient and makes intermediate results easy to view.
Unsupervised classification of satellite data with Python requires GDAL, NumPy and scikit-learn (sklearn). To view the data you also need Matplotlib:
import numpy as np
from sklearn import cluster
from osgeo import gdal, gdal_array
import matplotlib.pyplot as plt
# Make GDAL raise Python exceptions and register all drivers
gdal.UseExceptions()
gdal.AllRegister()
1. Classification on a single band
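# Read the raster file (a SPOT remote-sensing image)
# Get the data of the first band
# Convert it to a numpy array; for this image the shape is (6000, 6000)
# You can check it with print(img.shape)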
img_ds = gdal.Open('E:/BaiduNetdiskDownload/spot_pan.tif', gdal.GA_ReadOnly)
band = img_ds.GetRasterBand(1)
img = band.ReadAsArray()
# Flatten to one column: one row per pixel (the row count is inferred from -1)
X = img.reshape((-1,1))
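# Run the k-means classifier on the data
# Use 6 clusters
# Fit it to the data X
# Store the fitted labels in a new variable X_cluster
# Reshape the labels back to the original image dimensions
# This classification step takes quite a long time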
k_means = cluster.KMeans(n_clusters=6)
k_means.fit(X)
X_cluster = k_means.labels_
X_cluster = X_cluster.reshape(img.shape)
# Visualize the result
plt.figure(figsize=(10, 10))
plt.imshow(X_cluster, cmap="hsv")
plt.show()
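To show what `k_means.fit(X)` computes on a single band, here is a minimal NumPy sketch of Lloyd's algorithm (alternate between assigning each pixel to its nearest centroid and moving each centroid to the mean of its pixels). It is an illustration under stated assumptions, not scikit-learn's implementation, which among other things uses k-means++ initialisation by default. The function name `kmeans_1band` is hypothetical, it assumes the image has at least `k` distinct values, and it builds an `(n_pixels, k)` distance array, so it is meant for small test arrays rather than a full 6000 × 6000 scene.

```python
import numpy as np


def kmeans_1band(img, k, n_iter=20, seed=0):
    """Minimal Lloyd's k-means on a single-band image (sketch, not sklearn's code).

    Assumes the image has at least k distinct pixel values.
    """
    X = img.reshape(-1, 1).astype(float)       # one row per pixel, one column
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct pixel values picked at random
    centers = rng.choice(np.unique(X[:, 0]), size=k, replace=False).reshape(k, 1)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every pixel
        labels = np.abs(X - centers.T).argmin(axis=1)   # (n_pixels, k) -> (n_pixels,)
        # Update step: move each centroid to the mean of its pixels
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break                              # converged: centroids stopped moving
        centers = new_centers
    # Final assignment against the final centroids, reshaped back to the image grid
    labels = np.abs(X - centers.T).argmin(axis=1)
    return labels.reshape(img.shape), centers


# Tiny synthetic image: left half dark (0), right half bright (100)
img = np.zeros((4, 6))
img[:, 3:] = 100.0
labels, centers = kmeans_1band(img, 2)
```

The flatten / cluster / reshape pattern is the same one the post uses with `img.reshape((-1,1))` and `X_cluster.reshape(img.shape)`.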
Classification results:
[Figure: 4 clusters (better result)]
[Figure: 6 clusters]
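The 4- and 6-cluster results are compared by eye. One optional numeric check (not used in the original post) is the within-cluster sum of squares (WCSS), the quantity scikit-learn reports as `KMeans.inertia_`. The sketch below computes it from a flattened array and a label vector; the name `inertia_from_labels` is hypothetical.

```python
import numpy as np


def inertia_from_labels(X, labels):
    """Within-cluster sum of squares (WCSS) of a labelling.

    X is the flattened (n_pixels, n_bands) array, labels the matching 1-D label array.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        # Squared distances of the cluster's pixels to the cluster mean
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


X_demo = np.array([[0.0], [2.0], [10.0], [12.0]])
two_clusters = inertia_from_labels(X_demo, np.array([0, 0, 1, 1]))
one_cluster = inertia_from_labels(X_demo, np.array([0, 0, 0, 0]))
```

Applied to the section 1 variables, `inertia_from_labels(X, k_means.labels_)` should come out close to `k_means.inertia_`. WCSS tends to shrink as k grows, so a common heuristic is to look for the k after which it stops dropping sharply (the "elbow") rather than for the smallest value.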
2. Classification on all bands
# Read the raster file (a multiband image is assumed in this section)
img_ds = gdal.Open('E:/BaiduNetdiskDownload/spot_pan.tif', gdal.GA_ReadOnly)
# Load the multiband image into a numpy array (the fastest approach)
img = np.zeros((img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
               gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType))
for b in range(img.shape[2]):
    img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()
new_shape = (img.shape[0] * img.shape[1], img.shape[2])
# SPOT has 4 bands, so the reshaped array keeps 4 columns (one per band)
X = img[:, :, :4].reshape(new_shape)
k_means = cluster.KMeans(n_clusters=4)
k_means.fit(X)
X_cluster = k_means.labels_
X_cluster = X_cluster.reshape(img[:, :, 0].shape)
plt.figure(figsize=(10,10))
plt.imshow(X_cluster, cmap="hsv")
plt.show()
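The key step above is turning the `(rows, cols, bands)` cube into a `(pixels, bands)` matrix for sklearn and turning the label vector back into a `(rows, cols)` map. The sketch below checks that mapping on a small synthetic array; the helper names `to_pixel_matrix` and `to_label_image` are hypothetical. With NumPy's default row-major (C) order, pixel `(r, c)` becomes row `r * cols + c`, which is why `X_cluster.reshape(img[:, :, 0].shape)` puts every label back on the right pixel.

```python
import numpy as np


def to_pixel_matrix(img):
    """(rows, cols, bands) -> (rows*cols, bands): one row per pixel, one column per band."""
    rows, cols, bands = img.shape
    return img.reshape(rows * cols, bands)


def to_label_image(labels, img):
    """Inverse mapping for cluster labels: (rows*cols,) -> (rows, cols)."""
    return np.asarray(labels).reshape(img.shape[:2])


# Small synthetic 3 x 5 image with 4 bands
img_demo = np.arange(3 * 5 * 4).reshape(3, 5, 4)
X_demo = to_pixel_matrix(img_demo)
r, c = 2, 3
# Row-major order: pixel (r, c) lands in row r * cols + c
print(np.array_equal(X_demo[r * 5 + c], img_demo[r, c]))  # True
```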
[Figure: classification on all bands]
The result is visibly better than with a single band. A real benefit of this code is how easily the classifier can be swapped: once the data is loaded, switching to the Mini-Batch K-Means clustering algorithm essentially means changing only the line that creates the model.
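One way to make the "swap the classifier" point concrete is to wrap the flatten / fit / reshape steps in a helper that accepts any clusterer following the scikit-learn convention (a `fit(X)` method and a `labels_` attribute after fitting). The function `classify_image` and the toy `_Threshold` model below are hypothetical, written only to illustrate this duck-typed design without requiring sklearn to run.

```python
import numpy as np


def classify_image(img, model):
    """Flatten a (rows, cols, bands) image, fit `model`, and return a (rows, cols) label map.

    `model` is assumed to follow the scikit-learn clusterer convention:
    a fit(X) method and a labels_ attribute after fitting.
    """
    rows, cols, bands = img.shape
    X = img.reshape(rows * cols, bands)      # one row per pixel
    model.fit(X)
    return np.asarray(model.labels_).reshape(rows, cols)


class _Threshold:
    """Hypothetical toy 'clusterer': label 1 where band 0 exceeds 5, else 0."""

    def fit(self, X):
        self.labels_ = (X[:, 0] > 5).astype(int)
        return self


img_demo = np.zeros((2, 3, 2))
img_demo[1, :, 0] = 10.0                     # second row is bright in band 0
label_map = classify_image(img_demo, _Threshold())
```

With sklearn imported as in the post, `classify_image(img[:, :, :4], cluster.KMeans(n_clusters=4))` and `classify_image(img[:, :, :4], cluster.MiniBatchKMeans(n_clusters=4))` would differ only in the model argument.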
K-Means is a commonly used clustering algorithm, but it has drawbacks, notably a long computation time on large datasets. Mini Batch K-Means, a variant of K-Means that updates the centroids from small random batches of samples instead of the full dataset at every iteration, was developed to address this. As a rule of thumb, Mini Batch K-Means is worth considering when clustering more than about 10,000 samples.
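The idea can be sketched in plain NumPy as below. Each step samples a small batch, assigns its samples to the nearest centroid, and moves each centroid with a per-centroid learning rate of 1/count, so a centroid is the running mean of every sample it has absorbed. This follows the general mini-batch scheme and is not scikit-learn's exact implementation; the farthest-point initialisation is a simple stand-in for k-means++ chosen for this sketch, and `minibatch_kmeans` / `_sq_dist` are hypothetical names. The distance arrays are `(n, k)`, so a full scene would need to be labelled in chunks.

```python
import numpy as np


def _sq_dist(X, C):
    # Squared Euclidean distance between every row of X and every row of C -> (len(X), len(C))
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def minibatch_kmeans(X, k, batch_size=256, n_iter=100, seed=0):
    """Sketch of mini-batch k-means on an (n_samples, n_features) array."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation (a simple stand-in for k-means++)
    centers = X[[rng.integers(len(X))]]
    while len(centers) < k:
        far = _sq_dist(X, centers).min(axis=1).argmax()
        centers = np.vstack([centers, X[far]])
    counts = np.zeros(k)                     # samples absorbed by each centroid so far
    for _ in range(n_iter):
        batch = X[rng.integers(0, len(X), size=batch_size)]
        nearest = _sq_dist(batch, centers).argmin(axis=1)
        for j in range(k):
            members = batch[nearest == j]
            if len(members) == 0:
                continue
            # Learning rate 1/count: the centroid becomes the running mean of its samples
            counts[j] += len(members)
            centers[j] += (members.sum(axis=0) - len(members) * centers[j]) / counts[j]
    # Final pass: label every sample against the final centroids
    return _sq_dist(X, centers).argmin(axis=1), centers


# Two well-separated synthetic blobs
rng_demo = np.random.default_rng(1)
X_demo = np.vstack([rng_demo.normal(0, 0.1, size=(200, 2)),
                    rng_demo.normal(10, 0.1, size=(200, 2))])
labels, centers = minibatch_kmeans(X_demo, 2, batch_size=32, n_iter=50)
```

The speed-up comes from touching only `batch_size` samples per step; the price is that centroids are estimated from samples, which is one reason the result can be noisier than full K-Means.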
MB_KMeans = cluster.MiniBatchKMeans(n_clusters=4)
MB_KMeans.fit(X)
X_cluster = MB_KMeans.labels_
X_cluster = X_cluster.reshape(img[:, :, 0].shape)
[Figure: Mini-Batch K-Means classification result]
Mini-Batch K-Means runs faster than standard K-Means, but its result may be noisier.
3. Save the classification
Finally, you need to save the classification results:
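# Reopen the source image to copy its size, georeferencing and projection
ds = gdal.Open("E:/Download/spot_pan.tif")
band = ds.GetRasterBand(1)
arr = band.ReadAsArray()
# NumPy arrays are (rows, cols); GDAL's Create() expects (xsize=cols, ysize=rows)
rows, cols = arr.shape
driver = gdal.GetDriverByName("GTiff")
# One output band of unsigned 8-bit integers, enough for up to 256 cluster labels
outDataRaster = driver.Create("E:/Download/k_means.gtif", cols, rows, 1, gdal.GDT_Byte)
outDataRaster.SetGeoTransform(ds.GetGeoTransform())
outDataRaster.SetProjection(ds.GetProjection())
outDataRaster.GetRasterBand(1).WriteArray(X_cluster)
# Flush to disk and close the dataset
outDataRaster.FlushCache()
del outDataRaster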
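The output band is created as `gdal.GDT_Byte`, while `k_means.labels_` is an integer array of a wider type, so a small guard against labels that would not fit in 0 to 255 can be useful before writing. The sketch below also spells out the dimension order GDAL's `Create()` expects. Both helpers, `labels_to_byte` and `gdal_create_size`, are hypothetical names and only need NumPy.

```python
import numpy as np


def labels_to_byte(label_img):
    """Cast a (rows, cols) label map to uint8 for a GDT_Byte band, refusing values that would wrap."""
    label_img = np.asarray(label_img)
    if label_img.min() < 0 or label_img.max() > 255:
        raise ValueError("labels do not fit in an unsigned 8-bit band (0-255)")
    return label_img.astype(np.uint8)


def gdal_create_size(label_img):
    """GDAL's Create() takes (xsize, ysize) = (columns, rows); NumPy shapes are (rows, columns)."""
    rows, cols = label_img.shape
    return cols, rows


demo = np.array([[0, 3], [5, 1], [2, 2]])    # 3 rows x 2 columns
byte_labels = labels_to_byte(demo)
xsize, ysize = gdal_create_size(demo)
```

In the saving code, the write would then become `outDataRaster.GetRasterBand(1).WriteArray(labels_to_byte(X_cluster))`.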