Extreme Learning Machine (ELM) Regression Prediction Based on the Butterfly Algorithm (with Code)


Abstract: This paper uses the butterfly optimization algorithm to optimize an extreme learning machine (ELM) and applies the optimized model to regression prediction.

1. Overview of the principle of extreme learning machine

A typical single-hidden-layer feedforward neural network structure is shown in Figure 1. It consists of an input layer, a hidden layer, and an output layer; the input layer is fully connected to the hidden layer, and the hidden layer is fully connected to the output layer. The input layer has n neurons, corresponding to n input variables; the hidden layer has l neurons; and the output layer has m neurons, corresponding to m output variables. Without loss of generality, let the connection weight matrix w between the input layer and the hidden layer be:
$$
w=\begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1n}\\
w_{21} & w_{22} & \cdots & w_{2n}\\
\vdots & \vdots &        & \vdots\\
w_{l1} & w_{l2} & \cdots & w_{ln}
\end{bmatrix}\tag{1}
$$
where $w_{ji}$ denotes the connection weight between the $i$-th neuron in the input layer and the $j$-th neuron in the hidden layer.

Let the connection weight between the hidden layer and the output layer be $\beta$:

$$
\beta=\begin{bmatrix}
\beta_{11} & \beta_{12} & \cdots & \beta_{1m}\\
\beta_{21} & \beta_{22} & \cdots & \beta_{2m}\\
\vdots & \vdots &        & \vdots\\
\beta_{l1} & \beta_{l2} & \cdots & \beta_{lm}
\end{bmatrix}\tag{2}
$$
where $\beta_{jk}$ denotes the connection weight between the $j$-th neuron in the hidden layer and the $k$-th neuron in the output layer.

Let the threshold value b of hidden layer neurons be:
$$
b=\begin{bmatrix}b_1\\ b_2\\ \vdots\\ b_l\end{bmatrix}\tag{3}
$$
Let the input matrix X and output matrix Y of the training set with Q samples be
$$
X=\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1Q}\\
x_{21} & x_{22} & \cdots & x_{2Q}\\
\vdots & \vdots &        & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nQ}
\end{bmatrix}\tag{4}
$$

$$
Y=\begin{bmatrix}
y_{11} & y_{12} & \cdots & y_{1Q}\\
y_{21} & y_{22} & \cdots & y_{2Q}\\
\vdots & \vdots &        & \vdots\\
y_{m1} & y_{m2} & \cdots & y_{mQ}
\end{bmatrix}\tag{5}
$$

Assuming that the activation function of neurons in the hidden layer is g(x), it can be obtained from Figure 1 that the output T of the network is:
$$
T=[t_1,\dots,t_Q]_{m\times Q},\quad
t_j=[t_{1j},\dots,t_{mj}]^{T}=
\begin{bmatrix}
\sum_{i=1}^{l}\beta_{i1}\,g(w_i x_j+b_i)\\
\sum_{i=1}^{l}\beta_{i2}\,g(w_i x_j+b_i)\\
\vdots\\
\sum_{i=1}^{l}\beta_{im}\,g(w_i x_j+b_i)
\end{bmatrix}_{m\times 1},\quad j=1,2,\dots,Q\tag{6}
$$
Formula (6) can be expressed as:
$$
H\beta=T'\tag{7}
$$
Among them, T' is the transpose of the matrix T; H is called the hidden layer output matrix of the neural network, and the specific form is as follows:
$$
H(w_1,\dots,w_l,b_1,\dots,b_l,x_1,\dots,x_Q)=
\begin{bmatrix}
g(w_1\cdot x_1+b_1) & g(w_2\cdot x_1+b_2) & \cdots & g(w_l\cdot x_1+b_l)\\
g(w_1\cdot x_2+b_1) & g(w_2\cdot x_2+b_2) & \cdots & g(w_l\cdot x_2+b_l)\\
\vdots & \vdots &        & \vdots\\
g(w_1\cdot x_Q+b_1) & g(w_2\cdot x_Q+b_2) & \cdots & g(w_l\cdot x_Q+b_l)
\end{bmatrix}_{Q\times l}
$$
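As a concrete illustration, the following minimal MATLAB sketch assembles H for a batch of samples, assuming a sigmoid activation g; the variable names (X, w, b) follow the notation above and are not taken from the original code.

```matlab
% Minimal sketch: build the hidden-layer output matrix H (Q x l).
% Assumes: X is the n-by-Q input matrix, w the l-by-n input weights,
% b the l-by-1 hidden thresholds, g the sigmoid activation.
g = @(z) 1./(1 + exp(-z));             % sigmoid activation function
H = g(w*X + b*ones(1, size(X,2)))';    % l-by-Q result, transposed to Q-by-l
```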

2. ELM learning algorithm

From the previous analysis, we can see that ELM can randomly generate w and b before training; it only needs to determine the number of hidden-layer neurons and the (infinitely differentiable) activation function of the hidden-layer neurons in order to calculate $\beta$. Specifically, the ELM learning algorithm consists of the following main steps:

(1) Determine the number of neurons in the hidden layer, and randomly set the connection weight w between the input layer and the hidden layer and the bias b of the hidden layer neurons;

(2) Select an infinitely differentiable function as the activation function of hidden layer neurons, and then calculate the hidden layer output matrix H;

(3) Calculate the output-layer weights: $\beta = H^{+}T'$, where $H^{+}$ is the Moore–Penrose generalized inverse of $H$.

It is worth mentioning that related research shows that ELM is not limited to particular activation functions: many nonlinear activation functions can be used (such as the sigmoid function, sine function, and composite functions), and even non-differentiable or discontinuous functions can serve as the activation function.
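For reference, the three steps above can be written as a short MATLAB sketch. This is a simplified illustration under the notation of Section 1 (sigmoid activation, single random initialization); the variable names Pn_train, Tn_train, Pn_test are assumptions consistent with the parameter listing in Section 5, not code from the referenced book.

```matlab
% Minimal ELM training sketch (illustrative).
% Pn_train: n-by-Q inputs, Tn_train: m-by-Q targets, N: hidden-layer neurons.
R = size(Pn_train, 1);                     % number of input variables
w = rand(N, R)*2 - 1;                      % step (1): random input weights in [-1, 1]
b = rand(N, 1);                            % step (1): random hidden thresholds in [0, 1]
g = @(z) 1./(1 + exp(-z));                 % step (2): sigmoid activation
H = g(w*Pn_train + b*ones(1, size(Pn_train,2)))';   % hidden-layer output, Q-by-N
beta = pinv(H) * Tn_train';                % step (3): beta = H^+ * T'
% Prediction on new (normalized) inputs:
H_test  = g(w*Pn_test + b*ones(1, size(Pn_test,2)))';
Tn_pred = (H_test * beta)';                % m-by-Q_test predicted outputs
```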

3. Regression problem data processing

The training set and test set are generated randomly: the training set contains 1900 samples and the test set contains 100 samples. To reduce the influence of variables with large variances on model performance, the data are normalized before the model is built.
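A sketch of this normalization step, assuming MATLAB's mapminmax (Deep Learning Toolbox); the raw data matrices P_train, T_train, P_test, T_test are illustrative names, not from the original code.

```matlab
% Normalize each variable (row) of the inputs and targets to [-1, 1] (sketch).
[Pn_train, ps_input]  = mapminmax(P_train, -1, 1);
Pn_test               = mapminmax('apply', P_test,  ps_input);
[Tn_train, ps_output] = mapminmax(T_train, -1, 1);
Tn_test               = mapminmax('apply', T_test, ps_output);
% After prediction, map the outputs back to the original scale:
% T_pred = mapminmax('reverse', Tn_pred, ps_output);
```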

4. ELM optimized based on butterfly algorithm

The specific principle of the butterfly optimization algorithm is described in this blog post: https://blog.csdn.net/u011835903/article/details/107855860

As can be seen from the foregoing, the initial weights and thresholds of the ELM are generated randomly, so they differ from run to run. In this paper, the butterfly optimization algorithm is used to optimize the initial weights and thresholds. The fitness function is designed as the MSE of the training-set error:

$$
fitness=\arg\min\left(MSE_{predict}\right)
$$

The fitness function is the MSE of the trained network's prediction error: the smaller the MSE, the closer the predictions are to the original data. The final output of the optimization is the best set of initial weights and thresholds. The network trained with these optimal initial weights and thresholds is then evaluated on the test data set.
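One possible shape of such a fitness function, written as a MATLAB sketch: each candidate solution x concatenates the input weights and hidden thresholds, an ELM is trained from them, and the training-set MSE is returned. The function and variable names are assumptions (for a single-output problem, S = 1), not the original implementation.

```matlab
function fit = elm_fitness(x, Pn_train, Tn_train, N)
% Fitness sketch for BOA-ELM: train an ELM from the candidate weights and
% thresholds in x and return the MSE on the training set (assumes S = 1).
R = size(Pn_train, 1);
w = reshape(x(1:N*R), N, R);               % candidate input weights
b = reshape(x(N*R+1:N*R+N), N, 1);         % candidate hidden thresholds
g = @(z) 1./(1 + exp(-z));
H = g(w*Pn_train + b*ones(1, size(Pn_train,2)))';
beta = pinv(H) * Tn_train';                % analytic output weights
Tn_pred = (H * beta)';                     % predictions on the training set
fit = mean((Tn_pred(:) - Tn_train(:)).^2); % training-set MSE
end
```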

5. Test results

The relevant parameters of the butterfly algorithm and the ELM are set as follows:

```matlab
% Dimensions of the training data
R = size(Pn_train,1);        % number of input features
S = size(Tn_train,1);        % number of output variables
N = 20;                      % number of hidden-layer neurons
%% Butterfly optimization parameters
pop = 20;                    % population size
Max_iteration = 50;          % maximum number of iterations
dim = N*R + N*S;             % dimension, i.e. the number of weights and thresholds
lb = [-1.*ones(1,N*R), zeros(1,N*S)];   % lower bounds
ub = [ones(1,N*R), ones(1,N*S)];        % upper bounds
```
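After the optimizer finishes, the best position vector (here called Best_pos, an assumed name) can be decoded back into ELM parameters using the same layout as dim, lb, and ub above; a sketch:

```matlab
% Decode the best individual into ELM parameters (Best_pos is an assumed name
% for the 1-by-dim vector returned by the butterfly optimizer).
w_best = reshape(Best_pos(1:N*R), N, R);          % optimized input weights
b_best = reshape(Best_pos(N*R+1:end), N, S);      % optimized thresholds (N-by-S)
% The ELM is then retrained with w_best and b_best and evaluated on the test set.
```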

The butterfly-optimized ELM (BOA-ELM) is compared with the base ELM.

The prediction results are shown in the figure below.

[Figure: prediction results of the base ELM and BOA-ELM on the test set]

Base ELM MSE error: 0.00049385
BOA-ELM MSE error: 3.3277e-06

In terms of MSE, BOA-ELM is significantly better than the base ELM.

6. References

Book "MATLAB Neural Network 43 Case Analysis"

7. Matlab code


Original post: blog.csdn.net/u011835903/article/details/130473563