Improved Random Forest Regression Based on the Seagull Optimization Algorithm (with Code)
Abstract: To improve the prediction accuracy of random forest (RF) regression, the seagull optimization algorithm is used to tune two random forest parameters: the number of trees and the minimum leaf size.
1. Dataset
The dataset is as follows:
data.mat contains the input data and the output data.
Input data dimensions: 2000 × 2
Output data dimensions: 2000 × 1
The RF model therefore has 2 input features and 1 output.
2. RF model
For background on random forests, refer to a standard machine learning textbook.
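As a point of reference, the two parameters tuned later in this post map naturally onto a regression forest built with MATLAB's TreeBagger. The following is a minimal sketch only: it assumes the Statistics and Machine Learning Toolbox is available, uses synthetic data rather than data.mat, and the variable names (numTrees, minLeafSize) are illustrative. The post does not state which RF implementation it actually uses.

```matlab
% Minimal sketch: a regression random forest with the two parameters
% that are tuned later (number of trees, minimum leaf size).
% Assumption: TreeBagger from the Statistics and Machine Learning Toolbox.
rng(0);                                  % reproducible synthetic data
X = rand(200, 2);                        % 200 samples, 2 input features
Y = sin(2*pi*X(:,1)) + X(:,2).^2;        % illustrative target, 200x1

numTrees    = 30;                        % number of trees in the forest
minLeafSize = 5;                         % minimum observations per leaf

model = TreeBagger(numTrees, X, Y, ...
    'Method', 'regression', ...
    'MinLeafSize', minLeafSize);

Yhat     = predict(model, X);            % numeric column vector for regression
mseTrain = mean((Y - Yhat).^2);          % mean squared error on the training data
fprintf('Training MSE: %.4f\n', mseTrain);
```

More trees usually reduce variance at a higher training cost, while a larger minimum leaf size gives smoother, less flexible trees.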
3. RF based on seagull algorithm optimization
For the principle of the seagull optimization algorithm, see this blog post: https://blog.csdn.net/u011835903/article/details/107535864
The seagull algorithm optimizes two RF parameters: the number of trees and the minimum leaf size (the minimum number of samples per leaf node). The fitness function is the sum of the RF's mean squared error (MSE) on the training set and on the test set; the lower the value, the better.
$$\text{fitness} = \mathrm{MSE}[\text{predict}(\text{train})] + \mathrm{MSE}[\text{predict}(\text{test})]$$
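The fitness definition above can be sketched as a MATLAB function. This is a hypothetical reconstruction of the `fun` that is called in the parameter setup later in the post; its actual body is not shown there. It assumes TreeBagger as the RF implementation, samples stored in rows, and rounding of the candidate vector to integers, none of which the post confirms.

```matlab
function fitness = fun(x, Pn_train, Tn_train, Pn_test, Tn_test)
% Hypothetical fitness function: training MSE + test MSE of an RF built from x.
%   x(1) = number of trees, x(2) = minimum leaf size.
% Assumption: samples are stored in rows (N x 2 inputs, N x 1 targets);
% transpose the inputs first if they are stored in columns.

    % The optimizer works with real numbers, so round to valid integers
    numTrees    = max(1, round(x(1)));
    minLeafSize = max(1, round(x(2)));

    model = TreeBagger(numTrees, Pn_train, Tn_train, ...
        'Method', 'regression', 'MinLeafSize', minLeafSize);

    % Mean squared error on each set
    mseTrain = mean((Tn_train - predict(model, Pn_train)).^2);
    mseTest  = mean((Tn_test  - predict(model, Pn_test)).^2);

    fitness = mseTrain + mseTest;   % lower is better
end
```

Because each forest is grown from random bootstrap samples, two evaluations of the same x can return slightly different fitness values unless the random seed is fixed.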
4. Test results
The data are split as follows: 1900 samples form the training set and 100 samples form the test set.
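A split of this size could look like the sketch below. The variable names inside data.mat are not given in the post, so the load call is only indicated in a comment and synthetic stand-in data are used instead. Taking the first 1900 rows for training is also an assumption, since the post does not say whether the split is sequential or random.

```matlab
% Sketch of a 1900/100 split (hypothetical variable names).
% With the real file, something like  S = load('data.mat');  would replace
% the synthetic stand-ins below; the field names in S are not known.
rng(0);
X = rand(2000, 2);                   % stand-in for the 2000x2 input data
Y = X(:,1) + 0.5*X(:,2);             % stand-in for the 2000x1 output data

nTrain = 1900;                       % number of training samples
nTest  = 100;                        % number of test samples

P_train = X(1:nTrain, :);                    % 1900x2
T_train = Y(1:nTrain, :);                    % 1900x1
P_test  = X(nTrain+1:nTrain+nTest, :);       % 100x2
T_test  = Y(nTrain+1:nTrain+nTest, :);       % 100x1
```

The `Pn_`/`Tn_` prefixes used in the post's own code suggest that the data are normalized before training, but that step is not shown, so it is left out here.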
Seagull parameters are set as follows:
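%% Define the seagull optimization parameters
pop = 20; % population size
Max_iteration = 30; % maximum number of iterations
dim = 2; % number of dimensions: number of trees and minimum leaf size
lb = [1,1]; % lower bounds
ub = [50,20]; % upper bounds
fobj = @(x) fun(x,Pn_train,Tn_train,Pn_test,Tn_test); % fitness function handle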
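The post does not show the optimizer itself or its call signature. As a rough illustration of how these settings might be consumed, here is a minimal sketch of a seagull optimization loop following the commonly published update rules (linearly decreasing A, a random B, and a spiral attack around the best position). The function name soaSketch, its argument order, and the constants fc, u, and v are assumptions, not the author's code.

```matlab
function [bestScore, bestPos, curve] = soaSketch(pop, maxIter, lb, ub, dim, fobj)
% Minimal seagull optimization sketch (hypothetical helper, not the post's code).
% Requires MATLAB R2016b or later (implicit expansion).
    fc = 2; u = 1; v = 1;                          % commonly used constants (assumed)
    pos = lb + rand(pop, dim) .* (ub - lb);        % random initial population
    bestScore = inf;
    bestPos   = zeros(1, dim);
    curve     = zeros(1, maxIter);                 % best fitness per iteration

    for t = 1:maxIter
        % Evaluate every seagull and keep the best position found so far
        for i = 1:pop
            pos(i,:) = min(max(pos(i,:), lb), ub); % clamp to the search bounds
            f = fobj(pos(i,:));
            if f < bestScore
                bestScore = f;
                bestPos   = pos(i,:);
            end
        end

        A = fc - t * (fc / maxIter);               % decreases linearly toward 0
        for i = 1:pop
            B  = 2 * A^2 * rand;                   % random exploration weight
            Cs = A * pos(i,:);                     % collision-avoidance term
            Ms = B * (bestPos - pos(i,:));         % movement toward the best seagull
            Ds = abs(Cs + Ms);                     % distance to the best seagull
            k  = 2*pi*rand;                        % random angle in [0, 2*pi]
            r  = u * exp(k * v);                   % spiral radius
            spiral = (r*cos(k)) * (r*sin(k)) * (r*k);
            pos(i,:) = Ds * spiral + bestPos;      % spiral attack update
        end
        curve(t) = bestScore;
    end
end
```

With the settings above, a call might look like `[bestScore, bestPos] = soaSketch(pop, Max_iteration, lb, ub, dim, fobj);`, after which `round(bestPos)` would give the number of trees and the minimum leaf size to use for the final model.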
Judging by the MSE results, the seagull-optimized RF (Seagull-RF) performs significantly better than the RF with unoptimized parameters.