Julia Machine Learning in Practice: Character Image Recognition with Random Forest

Contents

0 Preface
1 Loading the data
2 Training the random forest (train RF)
3 Complete code

0 Preface

1 Loading the data

Load the packages used in this article:

using Images
using DataFrames
using Statistics   # provides mean()
using CSV
using DecisionTree # provides build_forest(), apply_forest(), nfoldCV_forest()

	Note: if a package is not installed yet, install it with the following script

import Pkg
Pkg.add("Images") # replace "Images" with the name of the package you need
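
To avoid reinstalling packages that are already present, one option is to check first which ones are missing. The sketch below is only an illustration: `missing_packages` is a hypothetical helper name, and it relies on `Base.find_package`, which is not exported and may behave differently across Julia versions. The install call is left commented out because it needs network access.

```julia
import Pkg

# Hypothetical helper: return the names that cannot be found
# in the current environment.
function missing_packages(names::Vector{String})
    return filter(name -> Base.find_package(name) === nothing, names)
end

required = ["Images", "DataFrames", "CSV", "DecisionTree"]
to_install = missing_packages(required)
println("Packages to install: ", to_install)

# Uncomment to install them in one call (needs network access):
# isempty(to_install) || Pkg.add(to_install)
```

`Pkg.add` accepts a vector of package names, so all missing packages can be installed together.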

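Read the image files and return the pixel data as a matrix

function read_data(type_data, labelsInfo, imageSize, path)
    # One row per image, one column per pixel
    x = zeros(size(labelsInfo, 1), imageSize)
    for (index, idImage) in enumerate(labelsInfo.ID)
        nameFile = "$(path)/$(type_data)Resized/$(idImage).Bmp"
        img = load(nameFile)
        temp = float32.(img)   # convert pixel values to 32-bit floats
        temp = Gray.(temp)     # convert RGB to grayscale
        x[index, :] = reshape(temp, 1, imageSize)  # flatten into one row
    end
    return x
end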

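Explanation:

float32: converts the pixel values to 32-bit floating-point numbers
Gray: converts the RGB image to a grayscale image
reshape: flattens the image into a single row of imageSize values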
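
The flattening step can be tried without any image files. The following is a minimal sketch that uses small plain matrices as stand-ins for grayscale images; `flatten_images` is a hypothetical helper, not part of the original code. Note that Julia stores arrays column by column, so `reshape` walks down each column before moving to the next one.

```julia
# Hypothetical helper: stack same-sized matrices as rows of a feature matrix.
function flatten_images(images::Vector{<:AbstractMatrix})
    n_pixels = length(first(images))
    x = zeros(length(images), n_pixels)
    for (index, img) in enumerate(images)
        # Same pattern as in read_data: one image becomes one row
        x[index, :] = reshape(img, 1, n_pixels)
    end
    return x
end

# Two 2x2 stand-in "images" with values in [0, 1]
imgs = [[0.0 0.5; 1.0 0.25], [1.0 1.0; 0.0 0.0]]
x = flatten_images(imgs)
println(size(x))   # one row per image, one column per pixel
println(x[1, :])   # pixels of the first image in column-major order
```

The column-major order does not matter for the random forest as long as every image is flattened the same way.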

Set the image size (the number of pixels per image, 400 here) and the project path:

imageSize = 400
path = "..."

Read the training labels:

labelsInfoTrain = CSV.read("$(path)/trainLabels.csv", DataFrame)

Read the training image data:

xTrain = read_data("train", labelsInfoTrain, imageSize, path)

Read the test labels:

labelsInfoTest = CSV.read("$(path)/sampleSubmission.csv", DataFrame)

Read the test image data:

xTest = read_data("test", labelsInfoTest, imageSize, path)

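2 Training the random forest (train RF)

The class labels are single characters. Take the first character of each label and convert it to its integer code:

yTrain = map(x -> x[1], labelsInfoTrain.Class)
yTrain = Int.(yTrain)

Train:

model = build_forest(yTrain, xTrain, 20, 50, 1.0)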

Explanation of the arguments:

3rd argument (20): number of features chosen at each random split
4th argument (50): number of trees
5th argument (1.0): ratio of subsampling
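
To make the two sampling parameters more concrete, here is a conceptual sketch using only the standard library. It is not DecisionTree.jl's internal code; the function names `rows_for_tree` and `features_for_split` are hypothetical, and drawing rows with replacement is an assumption about how the subsampling ratio is typically applied.

```julia
using Random

# Conceptual sketch only; not DecisionTree.jl's internal code.
# Rows used to grow one tree: partial_sampling * n_samples draws,
# assumed here to be taken with replacement.
function rows_for_tree(rng::AbstractRNG, n_samples::Integer, partial_sampling::Real)
    n_draw = round(Int, partial_sampling * n_samples)
    return rand(rng, 1:n_samples, n_draw)
end

# Candidate features examined at one split: n_subfeatures distinct columns.
function features_for_split(rng::AbstractRNG, n_features::Integer, n_subfeatures::Integer)
    return randperm(rng, n_features)[1:n_subfeatures]
end

rng = MersenneTwister(42)
rows = rows_for_tree(rng, 100, 1.0)        # 100 row indices, repeats allowed
feats = features_for_split(rng, 400, 20)   # 20 distinct pixel columns out of 400
println(length(rows), " rows, ", length(feats), " candidate features")
```

With a ratio of 1.0 each tree sees as many rows as the training set contains, and with 20 features per split each split considers only a small subset of the 400 pixels.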

Predict on the test set:

predTest = apply_forest(model, xTest)

Convert the predicted integer codes back to characters:

labelsInfoTest.Class = Char.(predTest)
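
The conversion works because the training labels were turned into integer character codes before training, so `Char.` simply reverses that step. A small round-trip sketch with made-up labels (the three strings below are illustrative, not taken from the dataset):

```julia
# Made-up single-character labels, as strings
labels = ["n", "8", "T"]

# Same encoding as used for yTrain: first character, then its integer code
first_chars = map(s -> s[1], labels)
codes = Int.(first_chars)

# Decoding, as done for predTest
decoded = Char.(codes)

println(codes)     # integer codes handed to the forest
println(decoded)   # characters recovered from the codes
```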

Write the predictions to a file:

CSV.write("$(path)/predTest.csv", labelsInfoTest, header=true)

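4-fold cross-validation (recent DecisionTree.jl releases take the number of folds right after the features; older releases expected it after the number of trees, as in nfoldCV_forest(yTrain, xTrain, 20, 50, 4, 1.0)):

accuracy = nfoldCV_forest(yTrain, xTrain, 4, 20, 50, 1.0);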
println("4 fold accuracy: $(mean(accuracy))")
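
For readers who want to see what n-fold cross-validation does, the sketch below implements the idea with the standard library only. It is a simplified illustration, not how `nfoldCV_forest` is implemented; `kfold_indices`, `cross_validate` and `majority_baseline` are hypothetical names, and the baseline predictor stands in for the forest so that the example runs without DecisionTree.jl.

```julia
using Random, Statistics

# Split 1:n into k disjoint folds of near-equal size.
function kfold_indices(n::Integer, k::Integer; rng=Random.default_rng())
    perm = randperm(rng, n)
    return [perm[i:k:n] for i in 1:k]
end

# fit_predict(Xtrain, ytrain, Xtest) must return predicted labels for Xtest.
function cross_validate(fit_predict, X, y, k; rng=Random.default_rng())
    accuracies = Float64[]
    for test_idx in kfold_indices(length(y), k; rng=rng)
        train_idx = setdiff(1:length(y), test_idx)
        preds = fit_predict(X[train_idx, :], y[train_idx], X[test_idx, :])
        push!(accuracies, mean(preds .== y[test_idx]))
    end
    return accuracies
end

# Hypothetical baseline: always predict the most frequent training label.
function majority_baseline(Xtrain, ytrain, Xtest)
    counts = Dict{eltype(ytrain),Int}()
    for label in ytrain
        counts[label] = get(counts, label, 0) + 1
    end
    best = first(ytrain)
    for (label, count) in counts
        if count > counts[best]
            best = label
        end
    end
    return fill(best, size(Xtest, 1))
end

X = zeros(8, 2)    # dummy features
y = fill(1, 8)     # a single class, so the baseline is always right
acc = cross_validate(majority_baseline, X, y, 4)
println("4 fold accuracy: $(mean(acc))")
```

Each sample is used for testing exactly once, and the reported figure is the mean accuracy over the folds.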

3 Complete code

using Images
using DataFrames
using Statistics
using CSV
using DecisionTree

function read_data(type_data, labelsInfo, imageSize, path)
    x = zeros(size(labelsInfo, 1), imageSize)
    for (index, idImage) in enumerate(labelsInfo.ID)
        nameFile = "$(path)/$(type_data)Resized/$(idImage).Bmp"
        img = load(nameFile)
        temp = float32.(img)
        temp = Gray.(temp)
        x[index, :] = reshape(temp, 1, imageSize)
    end
    return x
end


imageSize = 400
path = "/Users/congying/cyWang/projects/julia/kaggleFirstStepsWithJulia/all"
labelsInfoTrain = CSV.read("$(path)/trainLabels.csv", DataFrame)
xTrain = read_data("train", labelsInfoTrain, imageSize, path)
labelsInfoTest = CSV.read("$(path)/sampleSubmission.csv", DataFrame)
xTest = read_data("test", labelsInfoTest, imageSize, path)
yTrain = map(x -> x[1], labelsInfoTrain.Class)
yTrain = Int.(yTrain)


model = build_forest(yTrain, xTrain, 20, 50, 1.0)
predTest = apply_forest(model, xTest)
labelsInfoTest.Class = Char.(predTest)
CSV.write("$(path)/juliaSubmission.csv", labelsInfoTest, header=true)
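# Argument order: n_folds, n_subfeatures, n_trees, partial_sampling
# (older DecisionTree.jl releases put n_folds after n_trees)
accuracy = nfoldCV_forest(yTrain, xTrain, 4, 20, 50, 1.0);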
println("4 fold accuracy: $(mean(accuracy))")
