Mathematical modeling - detailed explanation of data envelopment analysis steps and procedures


Foreword

Data envelopment analysis (DEA) is a method from operations research and the study of economic production frontiers. It is generally used to measure the productive efficiency of decision-making units. DEA addresses a fairly narrow type of problem, so there are relatively few points to elaborate (a paper built on it alone tends to be thin). However, for specific input-output optimization problems it is highly practical, so it is widely used in real life and is also very common in master's and doctoral dissertations.


1. Introduction to data envelopment analysis

1. Principle

  There are many DEA models, including the CCR model, the BCC model, the cross-efficiency model, and others. The most commonly used are the CCR model and the BCC model. Before looking at these two models, first understand two basic concepts:

1. Decision-making unit (DMU): an operating entity that converts certain inputs into corresponding outputs, that is, the process from input to output.
2. Efficient frontier: with the input held fixed and an output set D, if the output cannot exceed D no matter how the production mix is changed, then (X, Y) is called an efficient production activity. The convex hull formed by all such efficient production activities is the frontier.

  The essential principle of DEA is to analyze the input and output data of the DMUs together, obtain a relative efficiency index for each DMU, and then rank the DMUs by that index to determine which ones are relatively efficient. At the same time, the projection method can show why a DMU is not DEA-efficient or only weakly DEA-efficient, as well as the direction and degree of improvement, which gives managers information for decision-making.
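To make the idea of a relative efficiency index concrete, here is a minimal Python sketch for the special case of one input and one output under constant returns to scale. In that case the efficiency of a DMU reduces to its output/input ratio divided by the best ratio in the sample. The DMU names and numbers are made up for illustration and are not taken from the data used later.

```python
# Hypothetical data: (input, output) per DMU, illustrative numbers only.
dmus = {"A": (2.0, 1.0), "B": (4.0, 4.0), "C": (8.0, 6.0), "D": (6.0, 2.0)}


def relative_efficiency(data):
    """Scale each DMU's output/input ratio by the best ratio in the sample."""
    ratios = {name: y / x for name, (x, y) in data.items()}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}


scores = relative_efficiency(dmus)
# Rank the DMUs from most to least efficient.
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    status = "efficient" if abs(scores[name] - 1.0) < 1e-9 else "inefficient"
    print(f"{name}: {scores[name]:.3f} ({status})")
```

Only the DMU with the best ratio scores 1 and lies on the frontier. With several inputs and outputs the ratio has to be built from weights, which is what the models below do.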

2. CCR model

  The CCR model is the earliest and most classic DEA model. It treats each unit as an economic system that "puts in a certain amount of production factors and turns out a certain amount of products" and judges the relative efficiency of each unit on that basis. From the perspective of input resources, it compares how much input is used at the current output level as the basis for evaluation. This is called the "input-oriented model". In plain terms, in the formulation below the weighted input of the evaluated unit is fixed at 1 and its weighted output is made as large as possible. The model is as follows:
  Suppose the data consist of n DMUs (samples), m input variables and s output variables. The vector $X_i=(x_{1i},x_{2i},\cdots,x_{mi})^T$ holds the input variables of the i-th decision-making unit (sample), and the vector $Y_i=(y_{1i},y_{2i},\cdots,y_{si})^T$ holds its output variables.
  Define the efficiency evaluation index of decision-making unit j as:
$$h_j=\frac{u^TY_j}{v^TX_j},\quad j=1,2,\cdots,n$$
  where $u=(u_1,u_2,\cdots,u_s)^T$ and $v=(v_1,v_2,\cdots,v_m)^T$; $u_r$ is the weight of the r-th output and $v_i$ is the weight of the i-th input.
  The mathematical model for evaluating the efficiency of decision-making unit j is:
  Objective function:
$$\max\frac{u^TY_j}{v^TX_j}$$
  Constraints:
$$s.t.\left\{ \begin{array}{l} \frac{u^TY_k}{v^TX_k}\le 1,\quad k=1,2,\cdots,n \\ v^TX_j=1 \\ u\ge 0,\ v\ge 0 \end{array} \right.$$
  As you can see, the CCR model is a very simple mathematical programming model. The objective function is the ratio of weighted output to weighted input: the larger the efficiency evaluation index, the more efficiently inputs are turned into outputs. The constraints are that the weighted input of the evaluated unit is fixed at 1, that the efficiency evaluation index of every unit is at most 1 (the weights are scaled so that no unit can score above 1), and that the weights are non-negative. The weights do not have to sum to 1; they are computed as the most favorable values for the decision-making unit being evaluated, so "coefficients" would really be a more accurate name.
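Because the weighted input of the evaluated unit is fixed at 1, the ratio objective becomes linear, and each ratio constraint can be multiplied through by its positive denominator. A sketch of the resulting linear program (the index k, which runs over all DMUs, is notation introduced here):

```latex
\begin{aligned}
\max_{u,v}\quad & u^T Y_j \\
\text{s.t.}\quad & v^T X_j = 1, \\
& u^T Y_k - v^T X_k \le 0, \qquad k = 1,2,\cdots,n, \\
& u \ge 0,\quad v \ge 0 .
\end{aligned}
```

This is the form the MATLAB script in the code section passes to linprog, with the decision vector ordered as $[v;u]$: f holds $-Y_j$ (linprog minimizes), A=[-X' Y'] with b=0 holds the inequality rows, and Aeq, beq hold the normalization $v^TX_j=1$.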

3. BCC model

  The BCC model is also a very classic DEA model. It allows for variable returns to scale (VRS): when some decision-making units are not operating at their optimal scale, the measured technical efficiency (TE) is affected by scale efficiency (SE), and the BCC model separates the two. The form given below minimizes the input contraction factor θ, so it is the input-oriented version of the model.
  In plain terms, it studies how output responds as input keeps increasing, without assuming that output grows in proportion.
  Therefore, when constructing the BCC model we assume variable returns to scale and make a simple change to the constraints of the CCR model by adding the convexity condition $\sum_{j=1}^n\lambda_j=1$. The mathematical model is as follows:

Objective function:
$$\min \theta$$
Constraints:
$$s.t.\left\{ \begin{array}{l} \sum\limits_{j=1}^n{\lambda_jx_j}+s^+=\theta x_0\\ \sum\limits_{j=1}^n{\lambda_jy_j}-s^-=y_0\\ \sum\limits_{j=1}^n{\lambda_j}=1\\ \lambda_j\ge 0,\ j=1,2,\cdots,n,\quad s^+\ge 0,\ s^-\ge 0 \end{array} \right.$$
  Here $(x_0,y_0)$ are the inputs and outputs of the decision-making unit being evaluated, $\lambda_j$ is the weight placed on DMU j, and $s^+$, $s^-$ are slack variables.
  The model addresses the effect of scale, that is, whether output can keep up as input increases. The ideal case is θ = 1, where output keeps up with input.
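For intuition, the input-oriented θ can be computed without an LP solver when there is exactly one input and one output. The sketch below relies on an assumption that holds only in that special case: with a single output constraint plus the convexity constraint, an optimal reference point combines at most two DMUs, so checking single DMUs and pairs is enough. The function name and data are hypothetical, and slacks are ignored (only the radial score θ is returned).

```python
def bcc_input_efficiency(xs, ys, o):
    """Radial input-oriented BCC score of DMU o for one input and one output.

    Finds the smallest input among convex combinations of DMUs that
    produce at least ys[o], then divides it by the input of DMU o.
    """
    x0, y0 = xs[o], ys[o]
    best = float("inf")
    n = len(xs)
    for a in range(n):
        # A single DMU that already produces enough output.
        if ys[a] >= y0:
            best = min(best, xs[a])
        for b in range(n):
            # Mix a DMU below the target output with one above it so that
            # the combination produces exactly y0.
            if ys[a] < y0 < ys[b]:
                lam = (y0 - ys[a]) / (ys[b] - ys[a])  # weight on DMU b
                best = min(best, (1.0 - lam) * xs[a] + lam * xs[b])
    return best / x0


# Hypothetical data: input and output of four DMUs.
xs = [2.0, 4.0, 8.0, 6.0]
ys = [1.0, 4.0, 6.0, 2.0]
thetas = [bcc_input_efficiency(xs, ys, o) for o in range(len(xs))]
print([round(t, 3) for t in thetas])
```

With these numbers the first three DMUs lie on the variable-returns frontier (θ = 1), while the fourth could produce its output with a smaller input by mixing the first two. For several inputs or outputs, the full linear program above has to be solved instead.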

4. Practical application of CCR and BCC

  In general, the CCR model alone is enough to answer one sub-question in a mathematical modeling paper. BCC is an extension of CCR that is used less often, so mastering CCR is usually sufficient.
  We can run both the CCR model and the BCC model on the same data to judge the scale efficiency (SE) of each decision-making unit. If a unit's technical efficiency differs between CCR and BCC, the unit is scale-inefficient, and its scale efficiency can be calculated from the two scores (conventionally as the ratio of the CCR technical efficiency to the BCC technical efficiency).
  The CCR and BCC models can only compare the production efficiency of decision-making units cross-sectionally, at the same point in time. Note also that with n samples the model has to be solved n times, once for each decision-making unit.
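As a small illustration of the comparison described above, the following Python sketch computes scale efficiency from a pair of scores using the common decomposition TE(CCR) = TE(BCC) × SE. The scores are hypothetical and are not results from the data in this article.

```python
def scale_efficiency(te_ccr, te_bcc):
    """Scale efficiency as the ratio of CCR (constant returns) to BCC
    (variable returns) technical efficiency."""
    return te_ccr / te_bcc


# Hypothetical efficiency scores for three DMUs (illustrative only).
ccr = {"A": 0.50, "B": 1.00, "D": 1.0 / 3.0}
bcc = {"A": 1.00, "B": 1.00, "D": 4.0 / 9.0}

se = {name: scale_efficiency(ccr[name], bcc[name]) for name in ccr}
for name, value in se.items():
    verdict = "scale efficient" if abs(value - 1.0) < 1e-9 else "scale inefficient"
    print(f"{name}: SE = {value:.3f} ({verdict})")
```

A unit with SE = 1 has identical CCR and BCC scores. An SE below 1 means that part of its CCR inefficiency comes from operating at the wrong scale rather than from wasted input.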

2. Code

The MATLAB code is as follows:

clear
clc
format long
data=[39414	2823	34877	44562	2036	603	322	934936	929914	1492	29811
54934	1911	52242	35262	3862	908	396	1075563	1030664	1780	29811
96442	2743	88737	303221	4307	1596	694	1104835	1010146	1936	32678
107079	3036	98513	478883	3956	2530	1089	909220	862077	2160	36063
124359	3326	116897	378318	4102	2669	1179	1117851	1123109	2349	38951
140167	3900	130355	261203	4180	3538	1991	1116429	1100510	2446	40324
161523	3989	153722	444755	4309	3727	1593	878466	880226	2637	43211
177681	4669	167161	422267	4630	6629	1867	1048053	1003952	2904	47116
124969	4416	111415	286399	3829	5665	2591	1142395	1112661	3092	49406
146015	3200	129997	228695	5308	4911	2506	1202365	1112475	3252	51119
]';
 
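X=data(1:5,:);   % X: input variables (one row per input, one column per DMU)
Y=data(6:11,:);  % Y: output variables
[m,n]=size(X);   % m = number of input variables, n = number of samples (DMUs)
s=size(Y,1);     % s = number of output variables
% Decision vector w = [v; u]: the first m entries are input weights, the last s are output weights
A=[-X' Y'];      % row k encodes u'*Y_k - v'*X_k <= 0 for DMU k
b=zeros(n,1);
LB=zeros(m+s,1);UB=[];   % weights are non-negative, with no upper bound
w=zeros(m+s,n);
theta=zeros(1,n);

for i=1:n
   f=[zeros(1,m) -Y(:,i)'];    % linprog minimizes, so maximizing u'*Y_i means minimizing -u'*Y_i
   Aeq=[X(:,i)',zeros(1,s)];   % normalization v'*X_i = 1
   beq=1;
   w(:,i)=linprog(f,A,b,Aeq,beq,LB,UB);
   theta(i)=Y(:,i)'*w(m+1:m+s,i);   % efficiency = output values * output weights
   % A linear program is convex, so any optimum linprog returns is global;
   % no random restarts are needed.
end
fprintf('Relative efficiency scores from the CCR-DEA method:\n');
disp(theta);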

The Stata code is as follows

ssc install dea, replace
dea ivars = ovars [if] [in] [using/filename ][,  ///
    rts(string) ort(string) stage(#)  ///
    trace saving(filename) ]

ivars: the input variables.
ovars: the output variables.
rts(string): selects the returns-to-scale assumption. The default is rts(crs), constant returns to scale (the CCR model); rts(vrs), rts(drs) and rts(nirs) stand for variable returns to scale (the BCC model), decreasing returns to scale and non-increasing returns to scale respectively.
ort(string): specifies the orientation. The default is ort(i), the input-oriented DEA model; ort(o) is the output-oriented DEA model. The input-oriented model minimizes input while at least maintaining the existing output level; the output-oriented model maximizes output without requiring more input.
stage(#): the default is stage(2), the two-stage DEA model; stage(1) is the single-stage DEA model.
trace: displays all sequences in the results window and saves them in "dea.log".
Note: the decision-making unit variable dmu needs to be present in the imported data.

At its core, solving the DEA model in MATLAB is an optimization search, so I also wrote a simulated annealing algorithm to solve it myself. It is a rough heuristic: the generation of new solutions in particular is not well written and is to be revised later.

clear
clc
format long
data=[39414	2823	34877	44562	2036	603	322	934936	929914	1492	29811
54934	1911	52242	35262	3862	908	396	1075563	1030664	1780	29811
96442	2743	88737	303221	4307	1596	694	1104835	1010146	1936	32678
107079	3036	98513	478883	3956	2530	1089	909220	862077	2160	36063
124359	3326	116897	378318	4102	2669	1179	1117851	1123109	2349	38951
140167	3900	130355	261203	4180	3538	1991	1116429	1100510	2446	40324
161523	3989	153722	444755	4309	3727	1593	878466	880226	2637	43211
177681	4669	167161	422267	4630	6629	1867	1048053	1003952	2904	47116
124969	4416	111415	286399	3829	5665	2591	1142395	1112661	3092	49406
146015	3200	129997	228695	5308	4911	2506	1202365	1112475	3252	51119
]';
 
X=data(1:5,:);   % X: input variables
Y=data(6:11,:);  % Y: output variables
[m,n]=size(X);   % m = number of input variables, n = number of samples (DMUs)
s=size(Y,1);     % s = number of output variables
A=[-X' Y'];      % row k encodes u'*Y_k - v'*X_k <= 0 for DMU k
w=zeros(m+s,n);  % column i stores the weights [v; u] found for DMU i
theta=zeros(1,n);
for i=1:n
    % cost = -(weighted output of DMU i) + penalty for negative weights or violated constraints
    cost=@(z,T) -Y(:,i)'*z(m+1:m+s) + 1000000*T*(any(z<0) || any(A*z>1e-9));
    temperature=1000;  % initial temperature
    iter=1000;         % upper bound of the inner loop
    L=1;               % inner-loop start index, raised after every cooling step
    % initial solution: all input weight on one random input, so that v'*X_i = 1
    x=zeros(m+s,1);
    sj=randi([1,m]);
    x(sj)=1/X(sj,i);
    f=cost(x,temperature);
    xbest=x;fbest=f;
    while temperature>0.01
        for j=L:iter
            % generate a new solution
            x1=x;
            % input weights: move a random share of weighted input from input k
            % to input r, which keeps v'*X_i = 1
            kns=find(x(1:m)~=0);
            k=kns(randi(length(kns)));
            r=randi([1,m]);
            shift=rand()*x(k)*X(k,i);
            x1(k)=x1(k)-shift/X(k,i);
            x1(r)=x1(r)+shift/X(r,i);
            % output weights: shrink the nonzero ones at random, then reset one at random
            kns3=find(x1(m+1:m+s)~=0);
            x1(kns3+m)=x1(kns3+m).*rand(length(kns3),1);
            sj2=randi([m+1,m+s]);
            x1(sj2)=rand()/data(sj2,i);
            f1=cost(x1,temperature);  % fitness of the new solution
            delta_e=f1-f;
            % Metropolis rule: always accept an improvement, accept a worse
            % solution with probability exp(-delta_e/temperature)
            if delta_e<0 || exp(-delta_e/temperature)>rand()
                x=x1;f=f1;
            end
            if f<fbest  % remember the best solution seen so far
                xbest=x;fbest=f;
            end
        end
        L=L+1;
        temperature=temperature*0.99;  % cooling schedule
    end
    w(:,i)=xbest;
    theta(i)=Y(:,i)'*w(m+1:m+s,i);  % efficiency = output values * output weights
end
fprintf('Relative efficiency scores from the DEA method:\n');
disp(theta);

3. Practical example

1. Interpretation of results

  We use the MATLAB results as a demonstration of the analysis, in the hope that it gives you some ideas.
  The MATLAB output is as follows:
[Figure: MATLAB output; image not included]

  From the output we obtain the production efficiency of the 10 decision-making units: in the 5 years 2010, 2011, 2017, 2018 and 2019 the DMU is efficient, with a score of 1; in the other years the DMU is inefficient, with scores of 0.88, 0.85, 0.93, 0.94 and 0.86 respectively.
[Figure: image not included]

  For 2012 (the third column), reducing 0.403 units of government appropriation input, 0.597 units of full-time equivalent input of R&D personnel, 0.107 units of patent application volume output, and 0.107 units of new product output value output will make production more efficient. The other years can be analyzed in the same way.
  Analyze why this situation occurs, and use the findings to make targeted recommendations to Maoming City.

2. Advantages and disadvantages of the model

  Data envelopment analysis does not require any weights to be supplied: it uses the optimal weights derived from the data itself. This works in favor of each decision-making unit (sample) being evaluated and avoids having the indicators' weights fixed by subjective priorities. However, this is also its shortcoming, because some indicators naturally deserve more weight. For example, on the output side, given Maoming's path of socialism with Chinese characteristics, economic development is the absolute priority, while paper output could reasonably be weighted lower.
  DEA is simple in structure, easy to understand, easy to use, and easy to explain.

Summary

  DEA has received considerable attention as a management tool for evaluating organizational performance, and it is widely used to evaluate efficiency in the public and private sectors, for example in banks, airlines, hospitals, universities and manufacturing. The dea command makes it possible to evaluate the efficiency of decision-making units directly in Stata, so it is no longer necessary to use statistical software and separate DEA software side by side, which greatly simplifies the work. Many current papers that evaluate organizational efficiency use Stata to solve the DEA model. I prefer MATLAB, so I wrote my code in MATLAB. However, both approaches only cover the relatively basic models described above. Handling the FG model, the ST model, the CCGSS additive model and the CCW model with infinitely many DMUs would require further improvement of the commands.
  The above are my modest views on data envelopment analysis. I hope readers will point out any mistakes.


Origin blog.csdn.net/weixin_52952969/article/details/125476685