Prediction Algorithms | Extreme Learning Machine Optimized by an Improved Particle Swarm Algorithm (IDM-PSO-ELM)

Regression fitting: [result preview figures omitted]

Classification: [result preview figures omitted]

This article is the fourth in the author's prediction-algorithm series. Previous articles introduced BP, SVM, RF, and their optimization; interested readers can find them among the author's earlier posts. This article introduces the extreme learning machine.

Over the past few decades, gradient-based learning methods have been widely used to train neural networks. The BP algorithm, for example, uses back-propagation of errors to adjust the network weights. However, with an inappropriate learning step size, the algorithm converges very slowly and easily falls into local minima, so a large number of iterations is often required to reach satisfactory accuracy. These problems have become the main bottleneck restricting its use in application fields.

Huang et al. [1] proposed a simple and efficient learning algorithm for single-hidden-layer feedforward neural networks called the extreme learning machine (ELM). Its characteristic advantages of very fast training and low complexity overcome the local-minima, overfitting, and learning-rate-selection problems of traditional gradient-based algorithms [2]. ELM is now widely used in pattern recognition, fault diagnosis, machine learning, soft sensing, and other fields.

00 Contents

1 Standard extreme learning machine ELM

2 Code directory

3 Comparison of prediction results of extreme learning machine and its optimization

4 Source code acquisition

5 Outlook

References

01 Standard extreme learning machine ELM

1.1 ELM principle

Given a data set $T=\{(x_1,y_1),\dots,(x_l,y_l)\}$, where $x_i\in\mathbb{R}^n$, $y_i\in\mathbb{R}$, $i=1,\dots,l$, the extreme learning machine regression model with $N$ hidden-layer nodes and activation function $G$ can be expressed as:

$$f(x)=\sum_{i=1}^{N}\beta_i\,G(a_i,b_i,x)=h(x)\beta$$

where $\beta_i$ is the output weight connecting the $i$-th hidden node to the output neuron, $a_i$ is the input weight vector connecting the input neurons to the $i$-th hidden node, $b_i$ is the bias of the $i$-th hidden node, and $h(x)=[G(a_1,b_1,x),\dots,G(a_N,b_N,x)]$ is the hidden-layer output vector (stacked over all samples it forms the hidden-layer output matrix $H$). The parameters $a_i$ and $b_i$ are selected randomly at the start of training and remain unchanged throughout. The output weights can then be obtained as the least-squares solution of the following linear system:

$$H\beta=Y$$

The least-squares solution of this system is:

$$\hat{\beta}=H^{+}Y$$

where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden-layer output matrix $H$.
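To make the computation concrete, here is a minimal MATLAB sketch of ELM training and prediction (an illustration of the formulas above, not the source code accompanying this article; the sigmoid activation, the [-1, 1] weight range, and the function names are assumptions):

```matlab
% Minimal ELM regression sketch (illustrative; save as local functions or
% separate .m files). X: l-by-n inputs, Y: l-by-1 targets, N: hidden nodes.
function [W, b, beta] = elm_train(X, Y, N)
    n = size(X, 2);
    W = 2 * rand(n, N) - 1;              % random input weights a_i in [-1, 1]
    b = rand(1, N);                      % random hidden biases b_i
    H = 1 ./ (1 + exp(-(X * W + b)));    % hidden-layer output matrix, sigmoid G
    beta = pinv(H) * Y;                  % beta = H^+ * Y (Moore-Penrose inverse)
end

function Yp = elm_predict(X, W, b, beta)
    Yp = (1 ./ (1 + exp(-(X * W + b)))) * beta;   % apply the trained network
end
```

Training involves no iteration at all: one random draw of $(a_i, b_i)$ and one pseudo-inverse, which is why ELM trains so much faster than BP.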

1.2 ELM optimization

Since ELM assigns the input weights and hidden-layer biases randomly, and the formulas in Section 1.1 show that the output weight matrix is computed from them, some of the randomly drawn input weights and hidden biases may be zero, i.e., some hidden nodes contribute nothing. As a result, in practical applications ELM often needs a large number of hidden nodes to reach the desired accuracy. ELM also responds poorly to samples that do not appear in the training set, i.e., its generalization ability is insufficient. To address these problems, an optimization algorithm can be combined with the ELM network: the optimization algorithm searches for the input-layer weights and hidden-layer biases that give an optimal network [3].

Taking PSO-optimized ELM as an example, the optimization process is as follows: [flowchart omitted]
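The heart of any such scheme is the fitness function: each particle encodes candidate input weights and hidden biases, the ELM output weights are solved from them by least squares, and the prediction error is returned as fitness. A hedged sketch (the particle encoding and the use of a validation split are assumptions, not details taken from this article's code):

```matlab
% Illustrative PSO-ELM fitness function: a particle holds n*N input
% weights followed by N hidden biases; fitness is validation RMSE.
function rmse = elm_fitness(particle, Xtr, Ytr, Xval, Yval, N)
    n = size(Xtr, 2);
    W = reshape(particle(1:n*N), n, N);         % decode input weights
    b = reshape(particle(n*N+1:end), 1, N);     % decode hidden biases
    Htr  = 1 ./ (1 + exp(-(Xtr * W + b)));      % hidden output, training set
    beta = pinv(Htr) * Ytr;                     % least-squares output weights
    Hval = 1 ./ (1 + exp(-(Xval * W + b)));
    rmse = sqrt(mean((Hval * beta - Yval).^2)); % fitness to be minimized
end
```

The PSO (or IDM-PSO) loop then simply minimizes this function over particles of length n*N + N and keeps the best (W, b) it finds.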

02 Code directory

In this article, the ELM optimized by the improved dynamic multi-swarm particle swarm algorithm (IDM-PSO) is compared with the standard PSO-optimized ELM and the plain ELM.

Improved dynamic multi-swarm particle swarm optimization algorithm (IDM-PSO)

Classification problem:

Here, IDM_PSO_ELM.m, MY_ELM_CLA.m, and PSO_ELM.m are main programs that can each be run independently, while result.m compares the results of the three algorithms; to compare them, simply run result.m on its own.
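As a rough, hypothetical sketch of what such a comparison driver might look like (result.m itself is only shown as a screenshot in the original post; every name below is invented for illustration):

```matlab
% Hypothetical comparison driver in the spirit of result.m: run the three
% methods on the same split and print their test errors side by side.
[Xtr, Ytr, Xte, Yte] = load_my_data();          % hypothetical data loader
names = {'ELM', 'PSO-ELM', 'IDM-PSO-ELM'};
err = [run_elm(Xtr, Ytr, Xte, Yte), ...         % hypothetical wrappers
       run_pso_elm(Xtr, Ytr, Xte, Yte), ...     % around the three main
       run_idm_pso_elm(Xtr, Ytr, Xte, Yte)];    % programs listed above
for m = 1:numel(names)
    fprintf('%-12s  test RMSE = %.4f\n', names{m}, err(m));
end
```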

Part of the source code: [screenshot omitted]

Regression fitting problem: [code directory screenshot omitted]

The procedure of regression fitting is similar to that of classification and will not be repeated here.

Part of the source code: [screenshot omitted]

03 Comparison of prediction results of extreme learning machine and its optimization

3.1 Regression fitting

The application here is a multi-input, single-output fitting problem.

3.1.1 Evaluation indicators

To verify the accuracy and precision of the built model, the root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE) were used as evaluation criteria:

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i-\hat{Y}_i\right)^2}$$

$$\mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{Y_i-\hat{Y}_i}{Y_i}\right|$$

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|Y_i-\hat{Y}_i\right|$$

where $Y_i$ and $\hat{Y}_i$ are the true and predicted values, respectively, and $n$ is the number of samples.
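All three metrics are one-liners to compute; a small sketch, assuming column vectors Y (true values) and Yp (predictions):

```matlab
% RMSE, MAPE, and MAE from true values Y and predictions Yp.
rmse = sqrt(mean((Y - Yp).^2));
mape = mean(abs((Y - Yp) ./ Y)) * 100;   % in percent; assumes no Y_i is zero
mae  = mean(abs(Y - Yp));
fprintf('RMSE = %.4f   MAPE = %.2f%%   MAE = %.4f\n', rmse, mape, mae);
```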

3.1.2 Simulation results
[Simulation result figures omitted.]
It can be seen that the optimization is effective.

3.2 Classification

3.2.1 Evaluation indicators

To verify the model's accuracy, this article uses the confusion matrix, accuracy, precision, recall, and F1-score; a code sketch computing all of these follows the list below.

1. Confusion matrix

The confusion matrix is a visualization tool. Each column represents a predicted category, and the column total is the number of samples the classifier assigned to that category; each row represents the true category, and the row total is the number of instances of that category. The elements on the main diagonal are the counts of correctly classified samples for each category, so the classification accuracy of a multi-class model can be read off intuitively from the confusion matrix.

2. Accuracy

The simplest way to judge the multi-class performance of the model is to compute the accuracy $r$ with the following formula:

$$r=\frac{n_{\mathrm{correct}}}{N}$$

where $n_{\mathrm{correct}}$ is the number of correctly classified samples and $N$ is the total number of samples in the test set.

3. Precision

Precision is the proportion of samples predicted to be the positive class that truly are positive. The formula is as follows:

$$P=\frac{TP}{TP+FP}$$

where $TP$ is the number of positive samples predicted as positive, and $FP$ is the number of negative samples predicted as positive.

4. Recall

Recall is the proportion of truly positive samples that are correctly predicted as positive. The formula is as follows:

$$R=\frac{TP}{TP+FN}$$

where $FN$ is the number of positive samples misjudged as negative.

5. F1-score

The F1-score is the harmonic mean of precision and recall; the higher the F1-score, the more robust the model. It is calculated as:

$$F1=\frac{2PR}{P+R}$$

6. Macro-averaging

Macro-averaging takes the arithmetic mean of a metric over all categories, yielding the macro-precision, macro-recall, and macro-F1 score:

$$P_{\mathrm{macro}}=\frac{1}{k}\sum_{i=1}^{k}P_i,\qquad R_{\mathrm{macro}}=\frac{1}{k}\sum_{i=1}^{k}R_i,\qquad F1_{\mathrm{macro}}=\frac{1}{k}\sum_{i=1}^{k}F1_i$$

where $k$ is the number of categories.
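A compact sketch of how all of these metrics fall out of the confusion matrix (illustrative code assuming integer class labels 1..k, not taken from the article's source):

```matlab
% Confusion matrix and per-class / macro-averaged metrics from true
% labels yt and predicted labels yp (vectors of integers in 1..k).
function [C, acc, P, R, F1] = class_metrics(yt, yp, k)
    C = zeros(k);                        % rows: true class, cols: predicted
    for i = 1:numel(yt)
        C(yt(i), yp(i)) = C(yt(i), yp(i)) + 1;
    end
    acc = sum(diag(C)) / numel(yt);      % accuracy r = n_correct / N
    TP  = diag(C);
    P   = TP ./ sum(C, 1)';              % precision; 0/0 -> NaN if a class is never predicted
    R   = TP ./ sum(C, 2);               % recall
    F1  = 2 * P .* R ./ (P + R);         % per-class F1
    % Macro averages; NaNs propagate, which is exactly the NaN effect
    % noted in Section 3.2.2 below.
    fprintf('macro-P = %.3f  macro-R = %.3f  macro-F1 = %.3f\n', ...
            mean(P), mean(R), mean(F1));
end
```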

3.2.2 Simulation results
[Simulation result figures omitted.]

The optimization remains effective for classification problems. Note that because some categories receive no predicted samples, the precision and F1 formulas above have zero denominators, so NaN values appear in the results.

04 Source code acquisition

Follow the author or send a private message

05 Outlook

This article presented a case study of applying the improved dynamic multi-swarm particle swarm algorithm to optimize ELM; the sparrow search algorithm the author has covered previously could be used as well. At the same time, ELM has problems such as failing to converge on large-scale data, which has motivated variants such as the kernel extreme learning machine, the multi-kernel extreme learning machine, and the deep extreme learning machine. The author will also publish implementations of these algorithms in the future.

References

[1] Huang G B, Zhu Q Y, Siew C K. Extreme learning machine: a new learning scheme of feedforward neural networks[C]//Proceedings of the 2004 IEEE International Joint Conference on Neural Networks. Budapest, Hungary, 2004: 985-990.

[2] Fan S M, Qin X Z, Jia Z H, et al. Time series forecasting based on ELM improved layered ensemble architecture[J]. Computer Engineering and Design, 2019, 40(7): 1915-1921.

[3] Wang J, Bi H Y. An extreme learning machine based on particle swarm optimization[J]. Journal of Zhengzhou University (Science Edition), 2013, 45(1): 100-104.

One more note: if any readers have optimization problems they would like solved (in any field), you can send them to me, and I will selectively write articles that apply optimization algorithms to those problems.

If this article helped or inspired you, you can click the like button (ง •̀_•́)ง in the lower right corner (no obligation). If you have customization needs, send the author a private message.

Origin: blog.csdn.net/sfejojno/article/details/132258547