Time series prediction | MATLAB implements SSA-XGBoost (sparrow search algorithm optimized extreme gradient boosting tree) time series prediction


Prediction results


Basic introduction

MATLAB implements SSA-XGBoost time series prediction: the sparrow search algorithm optimizes the extreme gradient boosting tree, tuning the maximum number of iterations, the tree depth, and the learning rate.
1. data is the data set, a univariate time series.
2. MainSSAXGBoostTS.m is the main program file; the others are function files and do not need to be run directly.
3. Evaluation indicators are R2, MAE, MAPE, MSE, and MBE (a sketch of their computation follows this list).
4. Place the program and the data in the same folder, and do not name the folder XGBoost, because that name collides with functions the program uses. The required environment is MATLAB 2018 or later.
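
The five indicators have standard definitions; here is a minimal sketch of how they can be computed in MATLAB, assuming column vectors yTrue and yPred of equal length (the variable names are illustrative, not taken from the repository):

R2   = 1 - sum( (yTrue - yPred).^2 ) / sum( (yTrue - mean(yTrue)).^2 );  % coefficient of determination
MAE  = mean( abs(yTrue - yPred) );                                      % mean absolute error
MAPE = mean( abs((yTrue - yPred) ./ yTrue) );                           % mean absolute percentage error
MSE  = mean( (yTrue - yPred).^2 );                                      % mean squared error
MBE  = mean( yPred - yTrue );                                           % mean bias error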

Model description

The sparrow search algorithm (SSA) was proposed in 2020. It is inspired mainly by the foraging and anti-predation behavior of sparrows. The algorithm is relatively new and offers strong search ability and fast convergence. Algorithm flow:
Step 1: Initialize the population and the number of iterations, and set the initial proportions of producers and joiners.
Step 2: Calculate the fitness values and sort them.
Step 3: Update the producer positions using equation (3).
Step 4: Update the joiner positions using equation (4).
Step 5: Update the vigilante positions using equation (5).
Step 6: Recalculate the fitness values and update the sparrow positions.
Step 7: If the stopping condition is satisfied, exit and output the result; otherwise repeat Steps 2-6. The three update equations are written out below.
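
For reference, the three update rules in the standard SSA formulation, reconstructed to match the code below ($T$ is the maximum number of iterations, $ST$ the safety threshold, $Q$ a standard normal random number, $L$ a 1×dim vector of ones, and $A$ a 1×dim vector with entries ±1):

$$x_{i,j}^{t+1}=\begin{cases}x_{i,j}^{t}\cdot\exp\left(\dfrac{-i}{r_1 T}\right), & r_2<ST\\[4pt] x_{i,j}^{t}+Q\cdot L, & r_2\ge ST\end{cases}\tag{3}$$

$$x_{i,j}^{t+1}=\begin{cases}Q\cdot\exp\left(\dfrac{x_{worst}^{t}-x_{i,j}^{t}}{i^{2}}\right), & i>n/2\\[4pt] x_{p}^{t+1}+\left|x_{i,j}^{t}-x_{p}^{t+1}\right|\cdot A^{T}(AA^{T})^{-1}\cdot L, & \text{otherwise}\end{cases}\tag{4}$$

$$x_{i,j}^{t+1}=\begin{cases}x_{best}^{t}+\beta\cdot\left|x_{i,j}^{t}-x_{best}^{t}\right|, & f_i>f_g\\[4pt] x_{i,j}^{t}+K\cdot\dfrac{\left|x_{i,j}^{t}-x_{worst}^{t}\right|}{(f_i-f_w)+\varepsilon}, & f_i=f_g\end{cases}\tag{5}$$

Here $x_p$ is the best producer position, $\beta\sim N(0,1)$, $K\in[-1,1]$ is a uniform random number, $f_g$ and $f_w$ are the current global best and worst fitness values, and $\varepsilon$ is a small constant that avoids division by zero.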
XGBoost belongs to the boosting family and is an engineering implementation of the GBDT algorithm. During training each new tree focuses on the residual of the current model, the objective function is approximated by its second-order Taylor expansion, and a regularization term is added. When growing a decision tree, XGBoost adopts an exact greedy strategy: to find the best split point, a pre-sorting algorithm first sorts every feature by its values; then all candidate split points on all features are traversed, the objective-function gain of splitting the samples at each candidate is computed, and the feature and split point with the largest gain are chosen for the split.
In this way the tree is built layer by layer. XGBoost trains additively: each new tree is fit to the residual, and the final prediction is the sum of the outputs of all trees.
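
Concretely, at boosting step $t$ the objective is replaced by its second-order Taylor expansion (standard XGBoost notation; $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the current prediction, $T$ the number of leaves, $w$ the leaf weights):

$$\mathcal{L}^{(t)}\approx\sum_{i=1}^{n}\left[g_i f_t(x_i)+\tfrac{1}{2}h_i f_t^{2}(x_i)\right]+\Omega(f_t),\qquad \Omega(f)=\gamma T+\tfrac{1}{2}\lambda\lVert w\rVert^{2},$$

and the gain used to rank candidate split points is

$$\text{Gain}=\frac{1}{2}\left[\frac{G_L^{2}}{H_L+\lambda}+\frac{G_R^{2}}{H_R+\lambda}-\frac{(G_L+G_R)^{2}}{H_L+H_R+\lambda}\right]-\gamma,$$

where $G_L,H_L$ (resp. $G_R,H_R$) are the sums of $g_i$ and $h_i$ over the samples routed to the left (resp. right) child; the split with the largest gain is taken.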

The parameters optimized here are the maximum number of iterations, the tree depth, and the learning rate; a sketch of the corresponding fitness function follows.
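
To make the search space concrete, here is a hedged sketch of how a 3-dimensional decision vector could be decoded inside the fitness function passed to the SSA loop. trainXGBoost and validationError are hypothetical placeholders for the repository's own training and validation routines, not real MATLAB API calls, and the bound values are illustrative only:

% dim = 3 decision variables:
%   x(1) = maximum number of boosting iterations
%   x(2) = maximum tree depth
%   x(3) = learning rate
c = [ 50,  3, 0.01 ];               % illustrative lower bounds
d = [ 500, 10, 0.30 ];              % illustrative upper bounds
fobj = @(x) xgbFitness( x );        % handle handed to the SSA loop below

function err = xgbFitness( x )
    nIter = round( x(1) );          % round the integer-valued hyperparameters
    depth = round( x(2) );
    eta   = x(3);
    model = trainXGBoost( nIter, depth, eta );   % hypothetical training helper
    err   = validationError( model );            % smaller error = better fitness
end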

Programming



   P_percent = 0.2;    % The population size of producers accounts for "P_percent" percent of the total population size       


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
pNum = round( pop *  P_percent );    % The population size of the producers   


lb = c.*ones( 1, dim );    % Lower bounds, a 1-by-dim vector
ub = d.*ones( 1, dim );    % Upper bounds, a 1-by-dim vector
% Initialization
for i = 1 : pop
    
    x( i, : ) = lb + (ub - lb) .* rand( 1, dim );  
    fit( i ) = fobj( x( i, : ) ) ;                       
end
pFit = fit;                      
pX = x;                            % The individual's best position corresponding to the pFit
[ fMin, bestI ] = min( fit );      % fMin denotes the global optimum fitness value
bestX = x( bestI, : );             % bestX denotes the global optimum position corresponding to fMin
 

% Start updating the solutions.
for t = 1 : M

  [ ~, sortIndex ] = sort( pFit );   % Sort by fitness (ascending).
  [ fmax, B ] = max( pFit );
  worse = x( B, : );                 % Current worst individual.

  r2 = rand(1);                      % Alarm value.

  if r2 < 0.8                        % No danger sensed: producers search widely.  Equation (3)
    for i = 1 : pNum
      r1 = rand(1);
      x( sortIndex( i ), : ) = pX( sortIndex( i ), : )*exp( -(i)/(r1*M) ); % Random contraction of the position
      x( sortIndex( i ), : ) = Bounds( x( sortIndex( i ), : ), lb, ub );   % Clip variables that leave the bounds
      fit( sortIndex( i ) ) = fobj( x( sortIndex( i ), : ) );              % Compute the new fitness value
    end
  else                               % Danger sensed: producers take a Gaussian step.
    for i = 1 : pNum
      x( sortIndex( i ), : ) = pX( sortIndex( i ), : ) + randn(1)*ones(1,dim);
      x( sortIndex( i ), : ) = Bounds( x( sortIndex( i ), : ), lb, ub );
      fit( sortIndex( i ) ) = fobj( x( sortIndex( i ), : ) );
    end
  end

  [ fMMin, bestII ] = min( fit );    % Best producer after this update,
  bestXX = x( bestII, : );           % used by the joiners in equation (4).
%---------------------------------------------------------------------------------------------------------------------------
  %%%%%%%%%%%%%%%%%%%% Position update of the joiners (followers) %%%%%%%%%%%%%%%%%%%%
  for i = ( pNum + 1 ) : pop         % Remaining individuals pNum+1 to pop.  Equation (4)

    A = floor( rand(1,dim)*2 )*2 - 1;

    if ( i > (pop/2) )
      % These sparrows are starving (low energy, i.e. poor fitness) and fly elsewhere to forage.
      x( sortIndex( i ), : ) = randn(1)*exp( (worse - pX( sortIndex( i ), : ))/(i)^2 );
    else
      % These joiners forage around the best producer; competing for food there
      % may let them become producers themselves.
      x( sortIndex( i ), : ) = bestXX + ( abs( pX( sortIndex( i ), : ) - bestXX ) )*( A'*(A*A')^(-1) )*ones(1,dim);
    end

    x( sortIndex( i ), : ) = Bounds( x( sortIndex( i ), : ), lb, ub );
    fit( sortIndex( i ) ) = fobj( x( sortIndex( i ), : ) );
  end
  %%%%%%%%%%%%%%%%%%%% Position update of the sparrows that are aware of danger %%%%%%%%%%%%%%%%%%%%
  % (They are only aware of danger; a real predator need not have appeared.)
  % A random subset of the population becomes aware of danger: sparrows at the
  % edge of the group move toward the safe area, while sparrows at the center
  % walk randomly to stay close to the others.
  c = randperm( numel( sortIndex ) );
  b = c( 1 : round( pop*0.2 ) );     % About 20% of the population act as vigilantes.
%---------------------------------------------------------------------------------------------------------------------------
  for j = 1 : length( b )            % Equation (5)
    if ( pFit( sortIndex( b(j) ) ) > fMin )   % Sparrows at the edge of the group.

        x( sortIndex( b(j) ), : ) = bestX + ( randn(1,dim) ).*( abs( pX( sortIndex( b(j) ), : ) - bestX ) );

    else                             % Sparrows at the center of the group.

        x( sortIndex( b(j) ), : ) = pX( sortIndex( b(j) ), : ) + (2*rand(1)-1)*( abs( pX( sortIndex( b(j) ), : ) - worse ) )/( pFit( sortIndex( b(j) ) ) - fmax + 1e-50 );

    end
    x( sortIndex( b(j) ), : ) = Bounds( x( sortIndex( b(j) ), : ), lb, ub );
    fit( sortIndex( b(j) ) ) = fobj( x( sortIndex( b(j) ), : ) );
  end
  for i = 1 : pop
    if ( fit( i ) < pFit( i ) )      % Update each individual's personal best.
      pFit( i ) = fit( i );
      pX( i, : ) = x( i, : );
    end
    if ( pFit( i ) < fMin )          % Update the global best.
      fMin = pFit( i );
      bestX = pX( i, : );
    end
  end

  Convergence_curve(t) = fMin;       % Record the best fitness found so far.

end

%---------------------------------------------------------------------------------------------------------------------------
% Application of simple limits/bounds
function s = Bounds( s, Lb, Ub)
  temp = s;
  % Apply the lower bound vector
  I = temp < Lb;
  temp(I) = Lb(I);
  % Apply the upper bound vector
  J = temp > Ub;
  temp(J) = Ub(J);
  % Update this new move
  s = temp;
end

%---------------------------------------------------------------------------------------------------------------------------
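
To see how the pieces above fit together, here is a minimal smoke-test driver with a toy objective. This is a sketch only; MainSSAXGBoostTS.m wires in the real XGBoost fitness function instead of the sphere function used here:

pop  = 30;                 % population size
M    = 100;                % maximum number of iterations
dim  = 3;                  % number of decision variables
c    = -10;  d = 10;       % scalar bounds, expanded to lb/ub vectors above
fobj = @(x) sum( x.^2 );   % toy sphere objective for a quick test
% ...run the loop above; afterwards bestX and fMin hold the best solution,
% and Convergence_curve(t) records the best fitness at each iteration.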


Origin blog.csdn.net/kjm13182345320/article/details/132527866