Glacial Lake Extraction Based on Random Forest (Python Implementation)

I. Background

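This post was written together with "Glacial Lake Extraction Based on Bimodal Threshold Segmentation (Python Implementation)". Random forest takes a little more work than threshold segmentation because it needs prior knowledge in the form of training data. I have uploaded the training data as well; see the CSDN download 随机森林冰湖提取的训练数据 ("Training data for random forest glacial lake extraction").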

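The algorithm itself needs little introduction: random forest is a mature, well-established method, so I will skip the theory and go straight to the code.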

II. Code

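Random forest is slightly more involved because the model must first be trained on prior (labelled) data. Its advantage over threshold segmentation is that no DN-to-TOA (top-of-atmosphere reflectance) conversion is needed: the classifier can be trained directly on the raw DN values.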
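Why skipping the DN-to-TOA step can work is easy to check with a small experiment. For Landsat 8 OLI, for example, TOA reflectance is computed per band as ρ = M·DN + A and then divided by the sine of the sun elevation, so within one scene it is an increasing linear function of DN. Decision trees split on thresholds and only depend on the ordering of values, so a forest trained on DN and one trained on rescaled values with the same seed should make the same decisions. This is a hypothetical sketch: the data, the labelling rule and the values of `M` and `A` are made up, not taken from this post's training data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical 7-band DN samples (integers stored as float) and a made-up labelling
# rule (1 where band 3 exceeds band 6, else 0); this is not real training data
dn = rng.integers(5000, 30000, size=(2000, 7)).astype(float)
labels = (dn[:, 2] > dn[:, 5]).astype(int)

# TOA-style rescaling rho = M * DN + A with M > 0; M and A are illustrative values,
# not read from a real MTL metadata file
M, A = 2.0e-5, -0.1


def to_toa(x):
    return M * x + A


# Two forests with identical settings and seed: one on DN, one on rescaled values
rf_dn = RandomForestClassifier(n_estimators=50, random_state=0).fit(dn, labels)
rf_toa = RandomForestClassifier(n_estimators=50, random_state=0).fit(to_toa(dn), labels)

# Classify new pixels with both models and compare the predicted labels
new_dn = rng.integers(5000, 30000, size=(5000, 7)).astype(float)
pred_dn = rf_dn.predict(new_dn)
pred_toa = rf_toa.predict(to_toa(new_dn))
agreement = np.mean(pred_dn == pred_toa)
print(f"share of pixels with identical labels: {agreement:.4f}")
```

The agreement should be at or very close to 1.0; rare differences can come from floating-point rounding when a pixel falls exactly on a split threshold. Note that this invariance only holds within one scene: gain, offset and sun elevation change between scenes, so raw DN from different dates are not directly comparable, and a model trained on one scene's DN is not guaranteed to transfer to another.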

import pickle
import random

import numpy as np
from osgeo import gdal_array
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier


def random_forest(img, train_img, train_mask, txt_path):
    # Reference: https://zhuanlan.zhihu.com/p/114069998
    # img:        image to classify, shape (rows, cols, bands); same bands, in the same order, as train_img
    # train_img:  training image, shape (rows, cols, bands)
    # train_mask: label image for train_img, 255 = water, 0 = non-water
    # txt_path:   text file in which the training samples are stored
    ######################################## Step 1: train the random forest ##############################################

    """ Collect the training samples """
    n_bands = train_img.shape[-1]
    with open(txt_path, 'w') as file_write_obj:
        # Water samples
        count = 0
        for i in range(train_img.shape[0]):
            for j in range(train_img.shape[1]):
                #  Water pixels have the value 255 in the label image
                if train_mask[i, j] == 255:
                    var = ",".join(str(v) for v in train_img[i, j, :]) + ",water"
                    file_write_obj.write(var + '\n')
                    count += 1
        # Background samples: random pixels, as many as there are water samples
        # (at most 60000 draws, with replacement), so the two classes stay balanced
        threshold = count
        count = 0
        for _ in range(60000):
            if count >= threshold:
                break
            x_random = random.randint(0, train_img.shape[0] - 1)
            y_random = random.randint(0, train_img.shape[1] - 1)
            #  Non-water pixels have the value 0 in the label image
            if train_mask[x_random, y_random] == 0:
                var = ",".join(str(v) for v in train_img[x_random, y_random, :]) + ",non-water"
                file_write_obj.write(var + '\n')
                count += 1

    """ Train the random forest """
    #  Convert the text label in the last column to a number (water = 1, non-water = 0).
    #  Depending on the NumPy version, loadtxt passes bytes or str to converters, so handle both.
    def label_converter(s):
        if isinstance(s, bytes):
            s = s.decode()
        return {'water': 1, 'non-water': 0}[s.strip()]

    model_path = r"model.pickle"

    #  1. Read the sample file written above
    data = np.loadtxt(txt_path, dtype=float, delimiter=',',
                      converters={n_bands: label_converter}, ndmin=2)

    #  2. Split into features (one column per band) and labels (last column)
    x, y = np.split(data, indices_or_sections=(n_bands,), axis=1)
    y = y.ravel()  # flatten the label column to 1-D
    train_data, test_data, train_label, test_label = model_selection.train_test_split(
        x, y, random_state=1, train_size=0.9, test_size=0.1)

    #  3. Build a random forest with 100 trees and train it
    classifier = RandomForestClassifier(n_estimators=100,
                                        bootstrap=True,
                                        max_features='sqrt')
    classifier.fit(train_data, train_label)

    #  4. Accuracy on the training set and on the held-out 10% test set
    print("Training set:", classifier.score(train_data, train_label))
    print("Test set:", classifier.score(test_data, test_label))

    #  5. Save the model (pickle, binary mode)
    with open(model_path, "wb") as file:
        pickle.dump(classifier, file)

    """ Predict with the model """
    save_path = r"save.tif"

    ################################################ Load the saved model
    with open(model_path, "rb") as file:
        rf_model = pickle.load(file)
    ################################################ Predict with the loaded model
    #  Rearrange the image to shape (pixels, bands), the same layout as the training samples
    data = np.zeros((img.shape[2], img.shape[0] * img.shape[1]))
    for i in range(img.shape[2]):
        data[i] = img[:, :, i].flatten()
    data = data.swapaxes(0, 1)
    #  Classify every pixel
    pred = rf_model.predict(data)
    #  Reshape the predictions back to the image grid: water (1) -> 255, non-water (0) -> 0
    pred = pred.reshape(img.shape[0], img.shape[1]) * 255
    pred = pred.astype(np.uint8)

    #  Save as GeoTIFF (SaveArray's default format), hence the .tif extension.
    #  No georeferencing is written unless a GDAL dataset is passed as `prototype`.
    gdal_array.SaveArray(pred, save_path)
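The per-pixel loops and the intermediate text file above are easy to follow but slow on large images. As an alternative, here is a minimal sketch that does the same two jobs with NumPy boolean indexing: it collects all water pixels (mask value 255) plus an equal number of random non-water pixels (mask value 0) in memory, and reshapes an image to one row per pixel for prediction. `sample_balanced` and `classify_image` are hypothetical helper names; unlike the function above, background pixels are drawn without replacement and without the 60000-draw cap. The usage part runs on a small synthetic image, not on real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def sample_balanced(train_img, train_mask, seed=0):
    """Return (features, labels): all water pixels (mask == 255) plus an equal number
    of randomly chosen non-water pixels (mask == 0), drawn without replacement."""
    rng = np.random.default_rng(seed)
    water = train_img[train_mask == 255]          # shape (n_water, bands)
    background = train_img[train_mask == 0]       # shape (n_background, bands)
    n = min(len(water), len(background))
    idx = rng.choice(len(background), size=n, replace=False)
    features = np.vstack([water, background[idx]])
    labels = np.concatenate([np.ones(len(water), dtype=np.uint8),
                             np.zeros(n, dtype=np.uint8)])
    return features, labels


def classify_image(model, img):
    """Predict every pixel of a (rows, cols, bands) image and return a uint8 mask
    with 255 for water and 0 for non-water."""
    rows, cols, bands = img.shape
    pixels = img.reshape(rows * cols, bands)       # one row per pixel, one column per band
    pred = model.predict(pixels)
    return (pred.reshape(rows, cols) * 255).astype(np.uint8)


# Usage on synthetic data: a 7-band image whose darker left half is "water"
rng = np.random.default_rng(1)
img = rng.integers(8000, 9000, size=(60, 80, 7))
img[:, :40, :] -= 3000
mask = np.zeros((60, 80), dtype=np.uint8)
mask[:, :40] = 255

X, y = sample_balanced(img, mask)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
result = classify_image(model, img)
print(result.shape, result.dtype, np.mean(result == mask))
```

For a C-ordered array, `img.reshape(rows * cols, bands)` gives the same pixel-by-band layout as the `flatten` + `swapaxes` loop in the function above, without the per-band loop.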

III. Experimental Results

Here is one experimental result:

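This algorithm was originally developed for comparison with the bimodal threshold segmentation algorithm, with neither method using any auxiliary data such as a DEM. The random forest clearly performs much better than bimodal threshold segmentation: at the very least, it filters out most of the glaciers. Bimodal threshold segmentation depends on the choice of threshold, and with a poorly chosen threshold the glaciers are extracted along with the lakes. In large-area glacial lake extraction it is hard to find a single threshold that works everywhere, which magnifies this weakness. The random forest copes better and largely avoids confusion with glaciers, but it has one minor problem: some meltwater and rivers are extracted as well.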
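To make the threshold-selection problem concrete, here is a sketch of one common bimodal rule, the "minimum method": smooth the histogram until only two peaks remain and take the lowest point between them. It is not necessarily the exact rule used in the companion post. The two "regions" are synthetic values (hypothetical, e.g. NIR reflectance, where water is dark), with region B simply brighter than region A to mimic differences across a large study area.

```python
import numpy as np


def minimum_method_threshold(values, bins=256, max_iter=10000):
    """Smooth the histogram with a 3-bin moving average until exactly two local
    maxima remain, then return the bin centre of the minimum between them."""
    hist, edges = np.histogram(values, bins=bins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    kernel = np.ones(3) / 3
    for _ in range(max_iter):
        # interior local maxima (a flat top is counted once, at its left end)
        peaks = np.flatnonzero((hist[1:-1] > hist[:-2]) & (hist[1:-1] >= hist[2:])) + 1
        if len(peaks) == 2:
            valley = peaks[0] + np.argmin(hist[peaks[0]:peaks[1] + 1])
            return centers[valley]
        if len(peaks) < 2:
            break
        hist = np.convolve(hist, kernel, mode="same")
    raise ValueError("could not reduce the histogram to two peaks")


rng = np.random.default_rng(42)
# Hypothetical regions: dark water and brighter land; region B is brighter overall
water_a, land_a = rng.normal(0.1, 0.03, 20000), rng.normal(0.5, 0.05, 20000)
water_b, land_b = rng.normal(0.3, 0.03, 20000), rng.normal(0.7, 0.05, 20000)

t_a = minimum_method_threshold(np.concatenate([water_a, land_a]))
t_b = minimum_method_threshold(np.concatenate([water_b, land_b]))
# Share of region-B water pixels that region A's threshold would call non-water
missed = np.mean(water_b > t_a)
print(f"threshold A = {t_a:.3f}, threshold B = {t_b:.3f}")
print(f"region-B water missed with threshold A: {missed:.1%}")
```

Each region gets its own threshold, and reusing region A's threshold on region B misses a large share of B's water pixels, which is the single-threshold weakness described above.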
