SAS discriminant analysis (Bayes criterion and proc discrim process)

The following table gives financial data for two types of companies. The first group consists of bankrupt companies; the data are four financial indicators recorded for these companies about two years before bankruptcy. The second group consists of the same four financial indicators, for approximately the same period, for non-bankrupt firms. The four indicators appear in the table as x1 through x4.

 The data for each company are as follows (in the last column of the table, "0" indicates a bankrupt company and "1" indicates a non-bankrupt company)

number x1 x2 x3 x4 group
1 -0.45 -0.41 1.09 0.45 0
2 -0.56 -0.31 1.51 0.16 0
3 0.06 0.02 1.01 0.4 0
4 -0.07 -0.09 1.45 0.26 0
5 -0.1 -0.09 1.56 0.67 0
6 -0.14 -0.07 0.71 0.28 0
7 0.04 0.01 1.5 0.71 0
8 -0.06 -0.06 1.37 0.4 0
9 0.07 -0.01 1.37 0.34 0
10 -0.13 -0.14 1.42 0.44 0
11 -0.23 -0.3 0.33 0.18 0
12 0.07 0.02 1.31 0.25 0
13 0.01 0 2.15 0.7 0
14 -0.28 -0.23 1.19 0.66 0
15 0.15 0.05 1.88 0.27 0
16 0.37 0.11 1.99 0.38 0
17 -0.08 -0.08 1.51 0.42 0
18 0.05 0.03 1.68 0.95 0
19 0.01 0 1.26 0.6 0
20 0.12 0.11 1.14 0.17 0
21 -0.28 -0.27 1.27 0.51 0
1 0.51 0.1 2.49 0.54 1
2 0.08 0.02 2.01 0.53 1
3 0.38 0.11 3.27 0.35 1
4 0.19 0.05 2.25 0.33 1
5 0.32 0.07 4.24 0.63 1
6 0.31 0.05 4.45 0.69 1
7 0.12 0.05 2.52 0.69 1
8 -0.02 0.02 2.05 0.35 1
9 0.22 0.08 2.35 0.4 1
10 0.17 0.07 1.8 0.52 1
11 0.15 0.05 2.17 0.55 1
12 -0.1 -0.01 2.5 0.58 1
13 0.14 -0.03 0.46 0.26 1
14 0.14 0.07 2.61 0.52 1
15 0.15 0.06 2.23 0.56 1
16 0.16 0.05 2.31 0.2 1
17 0.29 0.06 1.84 0.38 1
18 0.54 0.11 2.33 0.48 1
19 -0.33 -0.09 3.01 0.47 1
20 0.48 0.09 1.24 0.18 1
21 0.56 0.11 4.29 0.45 1
22 0.2 0.08 1.99 0.3 1
23 0.47 0.16 2.92 0.45 1
24 0.17 0.04 2.45 0.14 1
25 0.58 0.04 5.06 0.13 1

 Experimental code:

/* Import the Excel sheet into the data set temp1 */
proc import out=temp1
    datafile="C:\Users\86166\Desktop\IT\SAS实验\实验9\1.xls"
    dbms=EXCEL2000 replace;
run;
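If the Excel file is not at hand, the table above can be read inline instead. The following is a minimal sketch, assuming the same data set name temp1 and the column names from the table header; it is an alternative to the import step, not part of the original experiment code.

```sas
/* Alternative to PROC IMPORT (sketch): read the table inline.       */
/* Columns follow the table header: number x1 x2 x3 x4 group.        */
data temp1;
    input number x1 x2 x3 x4 group;
    datalines;
1 -0.45 -0.41 1.09 0.45 0
2 -0.56 -0.31 1.51 0.16 0
3 0.06 0.02 1.01 0.4 0
4 -0.07 -0.09 1.45 0.26 0
5 -0.1 -0.09 1.56 0.67 0
6 -0.14 -0.07 0.71 0.28 0
7 0.04 0.01 1.5 0.71 0
8 -0.06 -0.06 1.37 0.4 0
9 0.07 -0.01 1.37 0.34 0
10 -0.13 -0.14 1.42 0.44 0
11 -0.23 -0.3 0.33 0.18 0
12 0.07 0.02 1.31 0.25 0
13 0.01 0 2.15 0.7 0
14 -0.28 -0.23 1.19 0.66 0
15 0.15 0.05 1.88 0.27 0
16 0.37 0.11 1.99 0.38 0
17 -0.08 -0.08 1.51 0.42 0
18 0.05 0.03 1.68 0.95 0
19 0.01 0 1.26 0.6 0
20 0.12 0.11 1.14 0.17 0
21 -0.28 -0.27 1.27 0.51 0
1 0.51 0.1 2.49 0.54 1
2 0.08 0.02 2.01 0.53 1
3 0.38 0.11 3.27 0.35 1
4 0.19 0.05 2.25 0.33 1
5 0.32 0.07 4.24 0.63 1
6 0.31 0.05 4.45 0.69 1
7 0.12 0.05 2.52 0.69 1
8 -0.02 0.02 2.05 0.35 1
9 0.22 0.08 2.35 0.4 1
10 0.17 0.07 1.8 0.52 1
11 0.15 0.05 2.17 0.55 1
12 -0.1 -0.01 2.5 0.58 1
13 0.14 -0.03 0.46 0.26 1
14 0.14 0.07 2.61 0.52 1
15 0.15 0.06 2.23 0.56 1
16 0.16 0.05 2.31 0.2 1
17 0.29 0.06 1.84 0.38 1
18 0.54 0.11 2.33 0.48 1
19 -0.33 -0.09 3.01 0.47 1
20 0.48 0.09 1.24 0.18 1
21 0.56 0.11 4.29 0.45 1
22 0.2 0.08 1.99 0.3 1
23 0.47 0.16 2.92 0.45 1
24 0.17 0.04 2.45 0.14 1
25 0.58 0.04 5.06 0.13 1
;
run;
```

Reading the data this way makes group a numeric variable with the values 0 and 1, and keeps the example independent of a local file path.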

/* 1, 2, 3: x1 and x2, within-group covariance matrices (pool=no), equal priors */
proc discrim data=temp1  wcov simple pool=no manova method=normal crosslisterr listerr;
class group;
var x1-x2;
priors equal;
run;
/* 4: same variables, priors 0.05 for group 0 and 0.95 for group 1 */
proc discrim data=temp1  pool=no manova method=normal crosslisterr listerr;
class group;
var x1-x2;
priors '0'=0.05 '1'=0.95;
run;
/* 5: same variables, pooled covariance matrix (pool=yes), equal priors */
proc discrim data=temp1  pool=yes manova method=normal crosslisterr listerr;
class group;
var x1-x2;
priors equal;
run;
/* 6: repeat with x1 and x3, then with x1 and x4, under both sets of priors */
proc discrim data=temp1  wcov simple pool=no manova method=normal crosslisterr listerr;
class group;
var x1 x3;
priors equal;
run;
proc discrim data=temp1  pool=no manova method=normal crosslisterr listerr;
class group;
var x1 x3;
priors '0'=0.05 '1'=0.95;
run;

proc discrim data=temp1  wcov simple pool=no manova method=normal crosslisterr listerr;
class group;
var x1 x4;
priors equal;
run;
proc discrim data=temp1  pool=no manova method=normal crosslisterr listerr;
class group;
var x1 x4;
priors '0'=0.05 '1'=0.95;
run;
/* 7: all four indicators x1-x4, under both sets of priors */
proc discrim data=temp1  wcov simple pool=no manova method=normal crosslisterr listerr;
class group;
var x1-x4;
priors equal;
run;
proc discrim data=temp1  pool=no manova method=normal crosslisterr listerr;
class group;
var x1-x4;
priors '0'=0.05 '1'=0.95;
run;
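For reference, the rule that method=normal applies in the calls above can be written out. This is a sketch of the standard Bayes rule for normal populations in its generalized squared distance form; the symbols (q_t for the prior of group t, \bar{x}_t for its mean vector, S_t for its within-group covariance matrix, S_p for the pooled covariance matrix) are notation introduced here and do not appear in the original post.

```latex
% Generalized squared distance from observation x to group t
D_t^2(x) = (x - \bar{x}_t)^{\top} V_t^{-1} (x - \bar{x}_t) + g_1(t) + g_2(t)

% pool=no  (quadratic rule): V_t = S_t,  g_1(t) = \ln |S_t|
% pool=yes (linear rule):    V_t = S_p,  g_1(t) = 0
% unequal priors:            g_2(t) = -2 \ln q_t
% equal priors:              g_2(t) = 0

% Posterior probability of group t given x
p(t \mid x) = \frac{\exp\!\left(-\tfrac{1}{2} D_t^2(x)\right)}
                   {\sum_{u} \exp\!\left(-\tfrac{1}{2} D_u^2(x)\right)}
```

An observation is assigned to the group with the largest posterior probability, which is the group with the smallest D_t^2(x). With the priors 0.05 and 0.95 used above, the prior term is about 5.99 for group 0 and about 0.10 for group 1, so an observation must lie considerably closer to the bankrupt group before it is classified as bankrupt.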

Experimental results: output images and data sets of the discriminant analysis code

Analyze the experimental results:

Problems existing in the experiment and solutions:

Question: How can we decide which of the results obtained under different prior probabilities is more reliable?

Solution: For now, we compare the misclassification probabilities directly.
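One caveat when comparing this way is that the total misclassification rate reported in the error-count estimates is itself weighted by the priors. A sketch of the calculation, with m_t, n_t and q_t as notation introduced here for the number of misclassified observations, the group size and the prior of group t:

```latex
% Per-group error rate and prior-weighted total
\hat{e}_t = \frac{m_t}{n_t}, \qquad
\hat{e}_{\text{total}} = \sum_{t} q_t \, \hat{e}_t

% Hypothetical rates (not taken from the experiment): \hat{e}_0 = 0.20, \hat{e}_1 = 0.10
% equal priors:       0.5 \cdot 0.20 + 0.5 \cdot 0.10   = 0.150
% priors 0.05, 0.95:  0.05 \cdot 0.20 + 0.95 \cdot 0.10 = 0.105
```

So the total can drop under unequal priors even when the per-group rates are identical. It is therefore worth comparing the per-group rates as well as the total when the two runs use different priors.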

Experimental experience (conclusion, evaluation, reflections and suggestions)

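  1. simple prints simple statistics such as the means, and wcov prints the within-group covariance matrices. pool=yes/no/test means, respectively: use the pooled covariance matrix, use the within-group covariance matrices, or test the within-group covariance matrices for homogeneity and choose between the two according to the result. manova prints four multivariate statistics. Among them, Wilks' lambda measures the ratio of within-group variation to total variation; a large value (close to 1) indicates that the group means are roughly equal. Discriminant analysis is meaningful only when the group means differ.
  2. listerr and crosslisterr list the misclassified observations under resubstitution (assignment to the group with the largest posterior probability) and under cross-validation (the leave-one-out, or jackknife, method), respectively. method=normal specifies that the populations are normally distributed. priors equal sets equal prior probabilities; different priors can also be specified for each class level.
  3. When the populations are normal and their covariance matrices are unequal, use the within-group covariance matrices: pool=no, method=normal, with priors either equal, proportional to the group frequencies, or set to specific values. When the covariance matrices are equal, use the pooled covariance matrix: pool=yes, method=normal, with the same choices for priors. For small samples the pooled covariance matrix is generally the first recommendation, and the priors are usually set equal. When the populations are not normal, use method=npar to classify with a nonparametric method.
  4. The mean vectors of the total sample and of each class are printed by simple.
    wcov prints the within-group covariance matrices, that is, the sample covariance matrix of each group.
    pcov prints the pooled covariance matrix. Which of the two kinds of covariance matrix is used depends on pool.
    pool=yes uses the pooled covariance matrix, which assumes that the populations share a common covariance matrix.
    pool=no uses the within-group covariance matrices, which allows normal populations with unequal covariance matrices.
    pool=test runs a modified likelihood ratio test of the homogeneity of the within-group covariance matrices; slpool sets the significance level of this test, 0.10 by default.
    method=normal means the classes follow a multivariate normal distribution; method=npar means they do not, and a nonparametric method is used.
    crosslisterr lists the observations misclassified under cross-validation (the jackknife method).
    listerr lists the observations misclassified under resubstitution, based on posterior probabilities computed from the distance criterion.
    priors equal means equal prior probabilities, priors proportional means priors equal to the sample frequencies; priors can also be given explicitly for each class level, but they must sum to 1.
    To compare discriminant rules, look at the Total entry of the error-rate estimates; in general, the rule with the smaller value is the better one.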
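The options that the notes describe but the experiment code does not use (pool=test with slpool, priors proportional, and method=npar) can be tried on the same data. The following is a minimal sketch, assuming the data set temp1 from the experiment; k=5 for the nearest-neighbor rule is an arbitrary illustrative value, not a choice taken from the original.

```sas
/* Sketch: let PROC DISCRIM test covariance homogeneity at level 0.10 */
/* and use the pooled or within-group matrices accordingly.           */
proc discrim data=temp1 pool=test slpool=0.10 method=normal
             crossvalidate crosslisterr listerr;
    class group;
    var x1-x4;
    priors proportional;   /* priors equal to the sample frequencies */
run;

/* Sketch: nonparametric rule for when normality is doubtful.        */
/* method=npar needs k= or r=; k=5 is an assumed value.               */
proc discrim data=temp1 method=npar k=5
             crossvalidate crosslisterr listerr;
    class group;
    var x1-x4;
    priors equal;
run;
```

Comparing the cross-validation error estimates of these runs with those of the pool=no and pool=yes runs gives a rough check on whether the choice of covariance matrix or the normality assumption matters for this data set.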

Origin blog.csdn.net/weixin_56115549/article/details/125021486