Quantifying indicator importance: the Entropy Weight Method

01.Definition    

    In the previous article, we introduced the Analytic Hierarchy Process (AHP). That method has a notable limitation: its evaluation is highly subjective. In practical data evaluation and analysis, we often need a more objective evaluation method.

    This article introduces an objective weighting method: the Entropy Weight Method (EWM). The EWM is commonly used in multi-indicator decision analysis, and its calculation principle is based on information entropy theory. It quantifies the importance of different indicators and is applied in fields such as multi-objective decision-making, evaluation, and ranking.
    Entropy is a concept from information theory that describes the uncertainty and information content of a random event. In the entropy weight method, the entropy of an indicator is computed from information entropy: the proportion that each evaluated object contributes to an indicator is treated as a probability, and these probabilities are substituted into the information entropy formula to obtain the indicator's entropy value. Entropy thus measures how dispersed an indicator's values are across the evaluated objects. The smaller an indicator's entropy, the greater the variation of its values, the more information it provides, and the larger its influence on the decision result should be. In multi-objective decision-making, computing the entropy of each indicator quantifies its importance and determines its weight, making the decision more scientific, objective, and accurate.
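
    As a quick numerical illustration of this link between entropy and discriminating power (a toy sketch; the two distributions below are made-up examples, not data from this article):

% An almost-uniform distribution has high entropy (little discriminating
% information); a concentrated distribution has low entropy (much information).
p_uniform = [0.25 0.25 0.25 0.25];   % every object contributes equally
p_skewed  = [0.85 0.05 0.05 0.05];   % one object dominates the indicator
H = @(p) -sum(p .* log(p));          % Shannon entropy with the natural log
fprintf('Entropy of uniform distribution: %.4f\n', H(p_uniform))  % ln(4) = 1.3863
fprintf('Entropy of skewed distribution:  %.4f\n', H(p_skewed))   % about 0.5875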

02.Calculation principle

    The calculation principle of the entropy method is relatively simple and consists mainly of four steps. However, the original data must be preprocessed before calculation. In practical problems, a single problem may involve multiple indicators, and different indicators can behave in different ways: ① the larger the value, the better; ② the smaller the value, the better; ③ the indicator resembles a normal distribution curve, and some point in the middle is best.

    Indicators of different types require different preprocessing, but keep in mind that the original purpose of preprocessing is to eliminate the influence of the different units and dimensions of the indicators. Therefore, we usually scale all indicators to the [0,1] interval for analysis.

    Note that directly normalizing at this point has a fatal problem: the three types of indicator behavior mentioned above. Before formal normalization, all indicators must be made monotone in the same direction (positivization: the larger the value of every indicator, the better; negativization: the smaller the value of every indicator, the better). This article positivizes the indicators, i.e., the original data are transformed as follows depending on the indicator type:

    There are m objects to be evaluated and n evaluation indicators, which can form a data matrix:

$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mn}
\end{pmatrix}
$$

Assume that the elements of the data matrix after positivization are:

$$x'_{ij} \quad (i = 1, \dots, m;\ j = 1, \dots, n)$$

Step0: Positivize the indicators (make them move in the same direction)

    Note: the positivization formulas below are not the only valid choice. You can derive your own, as long as the underlying nature of the indicator is preserved. Also pay attention to removing negative numbers: if an indicator contains negative values, first rescale the original data to the [0,1] interval, and only then positivize the indicator, as in the sketch below.
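
    For example, a column with negative values can be shifted into [0,1] before any positivization (a minimal sketch; the values are made up for illustration):

x = [-3; 0; 5; 2];                       % an indicator column containing negatives
x01 = (x - min(x)) / (max(x) - min(x));  % rescale to [0,1] before positivizing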

    ① Smaller-is-better indicators (take the reciprocal; all values must be positive):

$$x'_{ij} = \frac{1}{x_{ij}}$$

    ② Larger-is-better indicators (keep the value unchanged):

$$x'_{ij} = x_{ij}$$

    ③ Normal-distribution-curve (middle-is-best) indicators, measured by closeness to an ideal middle value $x_{\text{best},j}$ (a common formulation):

$$x'_{ij} = 1 - \frac{\left| x_{ij} - x_{\text{best},j} \right|}{\max\limits_{i} \left| x_{ij} - x_{\text{best},j} \right|}$$
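
    A short MATLAB sketch of the three transforms applied to a single indicator column (the ideal value x_best is a made-up example, not part of the original case):

x = [5; 9; 8; 12];                       % one indicator column over m objects

% (1) smaller-is-better: take the reciprocal (all values must be positive)
x_small = 1 ./ x;

% (2) larger-is-better: keep the value as-is
x_large = x;

% (3) middle-is-best: closeness to an assumed ideal value
x_best = 8;                              % hypothetical ideal point
x_mid  = 1 - abs(x - x_best) ./ max(abs(x - x_best));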

Step1: Normalization processing

    After all indicators have been positivized, they are normalized. The normalized matrix is $(r_{ij})_{m \times n}$:

$$r_{ij} = \frac{x'_{ij} - \min\limits_{i} x'_{ij}}{\max\limits_{i} x'_{ij} - \min\limits_{i} x'_{ij}}$$
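
    Written out by hand, the same scaling looks like this (the 0.001 lower bound mirrors the mapminmax call in the full code below and keeps later logarithms safe; implicit expansion requires MATLAB R2016b or later):

X  = [5 1.4; 9 2; 8 1.8; 12 2.5];        % positivized data, m-by-n
lo = 0.001;                              % lower bound, kept slightly above 0
R  = lo + (1 - lo) * (X - min(X)) ./ (max(X) - min(X));  % columns scaled to [lo, 1]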

    The normalization here has been discussed in detail in previous blog posts. If you have forgotten, you can refer to the link:

Data preprocessing regression model

Step2: Calculate the entropy value of each indicator

$$P_{ij} = \frac{r_{ij}}{\sum_{i=1}^{m} r_{ij}}, \qquad E_j = -\frac{1}{\ln m} \sum_{i=1}^{m} P_{ij} \ln P_{ij}$$

    Note that in this step $P_{ij}$ must not be 0, otherwise the logarithm operation will raise an error. To avoid this, simply set the lower limit of the interval slightly above 0 during the normalization in the previous step. If an indicator's information entropy is smaller, the variation of its values is greater and it provides more information, so the indicator plays a larger role in the comprehensive evaluation.
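
    A vectorized version of this step, continuing from the R matrix above (the 0.001 lower bound chosen during normalization guarantees every entry of P is strictly positive):

P = R ./ sum(R);                         % P(i,j): share of object i under indicator j
E = -sum(P .* log(P)) / log(size(R,1));  % entropy of each indicator, a value in [0,1]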

Step3: Calculate the weight of the indicator

$$g_j = 1 - E_j, \qquad w_j = \frac{g_j}{\sum_{j=1}^{n} g_j}$$

where $g_j$ is the divergence (difference) coefficient of indicator $j$.
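
    In MATLAB this step is just two lines, continuing from E above:

g = 1 - E;       % divergence coefficient: larger g means a more informative indicator
w = g / sum(g);  % normalize so that the weights sum to 1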

Step4: Find the weighted sum and draw a conclusion

$$s_i = \sum_{j=1}^{n} w_j \, r_{ij}$$

where $s_i$ is the composite score of evaluated object $i$; the objects are then ranked by this score.
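
    The corresponding one-liner, continuing from R and w above:

s = R * w';      % s(i) = sum over j of w(j) * R(i,j), one composite score per object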

03.Code implementation

    This article uses a classic car purchase decision case. The original data is as follows:

            Fuel consumption   Power   Price   Safety   Maintainability   Operability
Honda               5            1.4      6       3             5               7
Audi                9            2       30       7             5               9
Santana             8            1.8     11       5             7               5
Buick              12            2.5     18       7             5               5
%% The calculation code is as follows:
%% Program initialization
clear all
clc

%% Read the original data
data = [5,1.4,6,3,5,7;9,2,30,7,5,9;8,1.8,11,5,7,5;12,2.5,18,7,5,5];
[m,n] = size(data);

%% Positivize the indicators
% Here we treat fuel consumption, price, and operability as smaller-is-better,
% and power, safety, and maintainability as larger-is-better.
% Process the smaller-is-better columns by taking reciprocals
index = [1,3,6];
for i = 1:length(index)
    data(:,index(i)) = 1./data(:,index(i));
end

%% Normalize the data
% mapminmax normalizes by row, so transpose the matrix first; to avoid zeros,
% set the lower bound of the normalization range to 0.001
new_data = mapminmax(data',0.001,1);
new_data = new_data';

%% Calculate the entropy value of each indicator
% Step1: compute Pij
for i = 1:m
    for j = 1:n
        P(i,j) = new_data(i,j)/sum(new_data(:,j));
    end
end

% Step2: compute Ej
for i = 1:m
    for j = 1:n
        e(i,j) = P(i,j)*log(P(i,j));
    end
end

for j = 1:n
    E(j) = -1/log(m)*sum(e(:,j));
end

%% Divergence coefficient
g = 1-E;

%% Calculate the indicator weights
for j = 1:n
    w(j) = g(j)/sum(g);
end

disp('The weight of each indicator is:')
disp(w)

%% Calculate the composite scores
for i = 1:m
    score(i,1) = sum(new_data(i,:).*w);
end

disp('The composite score of each brand is:')
disp(score)

(Program output: the weight of each indicator and the composite score of each brand.)

    The entropy method determines indicator weights from the degree of variation in each indicator's values; as an objective weighting method, it avoids deviations caused by human factors. However, it still has shortcomings: it ignores the intrinsic importance of the indicators themselves, and the weights it produces are sometimes far from the expected results. In addition, the entropy method cannot reduce the dimensionality of the evaluation indicators.

Original text: Data evaluation analysis-entropy method

Origin: blog.csdn.net/u013288190/article/details/132993935