Simple K-means classification in Python

  For K-means classification I use Jupyter Notebook, which is convenient and makes the results easy to inspect.
  Unsupervised classification of satellite data with Python requires GDAL, NumPy and scikit-learn (sklearn). To view the data, you also need Matplotlib:

import numpy as np
from sklearn import cluster
from osgeo import gdal, gdal_array
import matplotlib.pyplot as plt

# Make GDAL raise Python exceptions and register all drivers
gdal.UseExceptions()
gdal.AllRegister()

1. Classification on a single band

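# Read the raster file (a SPOT remote sensing image)
# Get the data of the first band
# Convert it to a NumPy array; for this image the shape is (6000, 6000)
# This can be checked with print(img.shape)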
img_ds = gdal.Open('E:/BaiduNetdiskDownload/spot_pan.tif', gdal.GA_ReadOnly)
band = img_ds.GetRasterBand(1)
img = band.ReadAsArray()
# Flatten the data into rows (length inferred with -1) while keeping a single column
X = img.reshape((-1,1))

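# Run the k-means classifier on the data
# Use 6 clusters
# Fit it to the data X
# Assign the fitted labels to a new variable, X_cluster
# Reshape the labels back to the dimensions of the original image
# Note: this classification takes a relatively long time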
k_means = cluster.KMeans(n_clusters=6)
k_means.fit(X)

X_cluster = k_means.labels_
X_cluster = X_cluster.reshape(img.shape)

# Visualize the data
plt.figure(figsize=(10, 10))
plt.imshow(X_cluster, cmap="hsv")
plt.show()
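
To see what the flatten, fit and reshape steps do without loading the SPOT scene, here is a minimal sketch on a synthetic image. It uses a hand-written Lloyd-style K-means in NumPy rather than sklearn's KMeans; the function name kmeans_1d, the evenly spaced initial centres and the tiny test image are all assumptions made for illustration.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=20):
    """Minimal Lloyd-style K-means for a column of pixel values, shape (n, 1)."""
    # Assumption: spread the initial centres evenly over the value range
    centers = np.linspace(values.min(), values.max(), k).reshape(k, 1)
    for _ in range(n_iter):
        # Distance from every pixel to every centre, shape (n, k)
        dist = np.abs(values - centers.T)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old centre if a cluster is empty
                centers[j] = members.mean()
    return labels, centers

# Synthetic 4x6 "image": dark left half, bright right half
img = np.hstack([np.full((4, 3), 20.0), np.full((4, 3), 200.0)])
img[0, 0] = 25.0  # a slightly different dark pixel

# Same steps as above: flatten to one column, cluster, reshape the labels back
X = img.reshape((-1, 1))
labels, centers = kmeans_1d(X, k=2)
classified = labels.reshape(img.shape)
print(classified)
```

Because each row of X corresponds to one pixel in row-major order, reshaping the labels with img.shape puts every label back at its pixel position.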

  Classification results:
  4 clusters (better results)

  6 clusters:
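
The comparison between 4 and 6 clusters here is visual. One hedged way to put a number next to it is to compare the within-cluster sum of squares (inertia_ in sklearn) for several cluster counts, fitting on a random subsample of pixels to keep the run time down. The synthetic data, the sample size and the candidate values of k are assumptions; lower inertia alone does not mean a better map, since inertia generally keeps falling as k grows.

```python
import numpy as np
from sklearn import cluster

rng = np.random.default_rng(0)
# Stand-in for a flattened single-band image: 10,000 pixel values in one column
X = rng.integers(0, 256, size=(10_000, 1)).astype(np.float64)

# Fit on a random subsample of pixels to shorten the run time
sample = X[rng.choice(X.shape[0], size=2_000, replace=False)]

inertia_by_k = {}
for k in (2, 4, 6):
    # random_state fixes the initialisation so repeated runs are comparable
    model = cluster.KMeans(n_clusters=k, n_init=10, random_state=0).fit(sample)
    inertia_by_k[k] = model.inertia_
    # predict() assigns every pixel of the full image to its nearest centre
    labels = model.predict(X)

for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 1))
```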

2. Classification on all bands

# Read the raster file
img_ds = gdal.Open('E:/BaiduNetdiskDownload/spot_pan.tif', gdal.GA_ReadOnly)

# Load the multi-band image into NumPy (the fastest method)
img = np.zeros((img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
               gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType))

for b in range(img.shape[2]):
    img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()

new_shape = (img.shape[0] * img.shape[1], img.shape[2])

# SPOT has 4 bands, so keep 4 columns when reshaping
X = img[:, :, :4].reshape(new_shape)

k_means = cluster.KMeans(n_clusters=4)
k_means.fit(X)
X_cluster = k_means.labels_
X_cluster = X_cluster.reshape(img[:, :, 0].shape)

plt.figure(figsize=(10,10))
plt.imshow(X_cluster, cmap="hsv")
plt.show()
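
The band loop above builds an array of shape (rows, cols, bands). As a hedged alternative sketch: calling ReadAsArray() on the dataset itself returns all bands at once in (bands, rows, cols) order, which np.moveaxis can rearrange. The example below uses a small synthetic array in place of the GDAL result (an assumption, so that it runs without the image) and checks that each row of X holds one pixel's values across all bands.

```python
import numpy as np

# Stand-in for img_ds.ReadAsArray() on a 4-band image: shape (bands, rows, cols)
bands, rows, cols = 4, 3, 5
stack = np.arange(bands * rows * cols).reshape(bands, rows, cols)

# Move the band axis to the end: (rows, cols, bands)
img = np.moveaxis(stack, 0, -1)

# One row per pixel, one column per band; the band count comes from the array
X = img.reshape(-1, img.shape[2])

# Row i of X is the pixel at (i // cols, i % cols)
r, c = 1, 2
pixel = X[r * cols + c]
print(X.shape, pixel)
```

Taking the number of columns from img.shape[2] instead of a hard-coded 4 keeps the reshape valid for images with a different band count.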

  The result is shown in the figure below:
4-band classification image
  The visual result is better than with a single band. One real benefit of this code is how easy it is to change the classifier: to use the Mini-Batch K-Means clustering algorithm once the data is loaded, you only need to change one line.
  K-Means is a commonly used clustering algorithm, but it has some problems of its own, such as long computation time. For this reason Mini-Batch K-Means, a variant of K-Means, came into being. Generally, when clustering more than 10,000 samples, Mini-Batch K-Means should be considered.

MB_KMeans = cluster.MiniBatchKMeans(n_clusters=4)
MB_KMeans.fit(X)

X_cluster = MB_KMeans.labels_
X_cluster = X_cluster.reshape(img[:, :, 0].shape)

  The result is shown in the figure below:
Mini-Batch K-Means classification image
  Mini-Batch K-Means runs faster than standard K-Means, but the result may contain more noise.
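
A rough sketch of why the mini-batch variant is cheaper: instead of reassigning every pixel on each iteration, it draws a small random batch and nudges each centre toward the batch samples assigned to it, with a per-centre step size that shrinks as the centre sees more samples. The code below is a simplified NumPy illustration of that update rule, not sklearn's implementation; the function name, batch size and synthetic two-cluster data are assumptions.

```python
import numpy as np

def minibatch_kmeans(X, init_centers, batch_size=50, n_batches=100, seed=0):
    """Simplified mini-batch K-means update (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = np.array(init_centers, dtype=float)
    counts = np.zeros(len(centers), dtype=int)
    for _ in range(n_batches):
        # Draw a small random batch of samples
        batch = X[rng.integers(0, len(X), size=batch_size)]
        # Nearest centre for each sample in the batch
        dist = np.linalg.norm(batch[:, None, :] - centers[None, :, :], axis=2)
        nearest = dist.argmin(axis=1)
        for x, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centre learning rate
            centers[j] = (1 - eta) * centers[j] + eta * x
    return centers

rng = np.random.default_rng(1)
# Two well-separated groups of 2-band "pixels", around (0, 0) and (10, 10)
group_a = rng.uniform(-1, 1, size=(5000, 2))
group_b = 10 + rng.uniform(-1, 1, size=(5000, 2))
X = np.vstack([group_a, group_b])

# Assumption: start from one sample of each group
centers = minibatch_kmeans(X, init_centers=X[[0, -1]])
print(centers)
```

Here the centres are updated from 100 batches of 50 samples, that is 5,000 sample visits in total, fewer than a single full pass over the 10,000 rows of X.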


3. Save the classification

  Finally, you need to save the classification results:

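# Open the source raster to copy its size and georeferencing
ds = gdal.Open("E:/Download/spot_pan.tif")
band = ds.GetRasterBand(1)
arr = band.ReadAsArray()
# NumPy shape is (rows, cols); driver.Create() expects the width (cols) first
rows, cols = arr.shape

driver = gdal.GetDriverByName("GTiff")

# Create a single-band 8-bit output raster with the same georeferencing
outDataRaster = driver.Create("E:/Download/k_means.gtif", cols, rows, 1, gdal.GDT_Byte)
outDataRaster.SetGeoTransform(ds.GetGeoTransform())
outDataRaster.SetProjection(ds.GetProjection())

# Write the cluster labels into band 1
outDataRaster.GetRasterBand(1).WriteArray(X_cluster)

# Flush to disk and close the dataset
outDataRaster.FlushCache()
del outDataRaster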
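
Two details of the save step are easy to get wrong: NumPy reports an array's shape as (rows, cols) while driver.Create() takes the width (cols) first, and a gdal.GDT_Byte band only holds values from 0 to 255. A small, hypothetical helper (the name prepare_for_byte_band is an assumption, not part of GDAL) can check both before writing:

```python
import numpy as np

def prepare_for_byte_band(labels):
    """Return (uint8_labels, xsize, ysize) for an 8-bit single-band raster."""
    if labels.ndim != 2:
        raise ValueError("expected a 2-D label image")
    if labels.min() < 0 or labels.max() > 255:
        raise ValueError("labels do not fit in an 8-bit band")
    rows, cols = labels.shape
    # xsize is the number of columns, ysize the number of rows
    return labels.astype(np.uint8), cols, rows

# Stand-in for X_cluster: a 3-row, 5-column label image with 4 classes
X_cluster = np.arange(15, dtype=np.int32).reshape(3, 5) % 4
out, xsize, ysize = prepare_for_byte_band(X_cluster)
print(out.dtype, xsize, ysize)
```

The returned xsize and ysize would then be passed to driver.Create() in that order, and out to WriteArray().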

Origin blog.csdn.net/amyniez/article/details/113244805