An R implementation case of Fisher discrimination, Bayes discrimination and distance discrimination in discriminant analysis

The shape, performance, price and grade data of the products produced by an enterprise are shown in the following table:

The shape, performance, price, grade and other indicators of an enterprise's products

The problem comes from the end-of-chapter exercises in "Multivariate Statistical Analysis - Based on R".

The following uses the Fisher discriminant method and the Bayes discriminant method to carry out the discriminant analysis.

First, Fisher discrimination is performed; the R program is as follows:

# Fisher discriminant analysis
ex5.3<-read.csv("ex5.3.csv",header = T)
ex5.3
library(MASS) # load the MASS package in order to use its lda() function
ld=lda(G~x1+x2+x3,data=ex5.3)
ld

       The result of the operation is as follows:

The program outputs the call used by lda(), the prior probability of each group, the mean vector of each group, the coefficients of the first and second discriminant functions, and the proportion of trace, i.e. the contribution of each discriminant function to the discrimination.
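These quantities can also be read directly from the fitted object; a minimal sketch using the standard components of an lda object:

ld$prior                    # prior probabilities of the groups
ld$means                    # group mean vectors
ld$scaling                  # coefficients of the linear discriminants
ld$svd^2/sum(ld$svd^2)      # proportion of trace (contribution of each discriminant function)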

Next, we use the predict() function to back-classify the original data, compare the classification produced by lda() with the original classification, and examine the size of the error. The R program is as follows:

Z=predict(ld)                    # back-classify the training data
Z
newG=Z$class                     # predicted class labels
cbind(G=ex5.3[,4],newG,Z$post)   # original class, predicted class, posterior probabilities

       The result of the operation is as follows:

In the output, the G column is the original classification, newG is the back-classified result, and the last three columns give the posterior probability of each sample belonging to each class. The Fisher discriminant method assigns a sample to the category with the largest posterior probability, which is why samples 3, 6, 8 and 13 are misjudged.
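The back-classification error can also be summarized in a confusion table; a minimal sketch, assuming (as in the formula above) that the class column of ex5.3 is named G:

tab=table(original=ex5.3$G,predicted=newG)          # confusion matrix of original vs. predicted classes
tab
mean(as.character(newG)!=as.character(ex5.3$G))     # apparent (resubstitution) error rate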

If we further need to determine the categories of two new products, whose indicators are (17,46,79) and (77,54,84), we can use the predict() function to classify them. The R program is as follows:

newdata=data.frame(x1=c(17,77),x2=c(46,54),x3=c(79,84))   # indicators of the two new products
(predict(ld,newdata))

        The result of the operation is as follows:

The discrimination results show that the two new products are classified into the first category and the second category, respectively.

Bayes discrimination is performed below; the R program is as follows:

ex5.3<-read.csv("ex5.3.csv",header = T)
attach(ex5.3)
library(MASS) # load the MASS package in order to use its lda() function
ld=lda(G~x1+x2+x3,prior=c(5,9)/14) # discriminate using the specified prior probabilities
ld

        The result of the operation is as follows:

Then use predict() to back-classify the data and compare the result with the original classification:

Z=predict(ld)              # back-classify the training data
newG=Z$class               # predicted class labels
cbind(G,newG,Z$post,Z$x)   # original class, predicted class, posterior probabilities, discriminant scores

        The result of the operation is as follows:

The running results show that samples 3, 6, 8 and 13 are misjudged. If we further need to determine the categories of the two new products, whose indicators are (17,46,79) and (77,54,84), we again use the predict() function; the R program is as follows:

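A minimal sketch of this step, assuming the same new-product data frame as in the Fisher case, applied to the Bayes fit ld:

newdata=data.frame(x1=c(17,77),x2=c(46,54),x3=c(79,84))   # indicators of the two new products
predict(ld,newdata)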

        The result of the operation is as follows:

The two new products are judged as the first category and the third category respectively; the result for the second product differs from that of the Fisher discriminant.

The following is a distance discrimination case.

According to experience, the humidity temperature difference x1 and the temperature difference x2 between today and yesterday are important factors for predicting whether it will rain tomorrow. We apply the distance discrimination method to the data in the table below.
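The core of the distance rule is simple: a sample is assigned to the group to which its Mahalanobis distance is smaller. A minimal illustrative sketch using the built-in mahalanobis() function (this is only a sketch of the rule, not the DDA2.R program used below, which additionally handles input checking and the equal-covariance case):

# illustrative two-group distance rule, assuming unequal covariance matrices
dist_rule=function(x,G1,G2){
  mu1=colMeans(G1); mu2=colMeans(G2)       # group mean vectors
  d1=mahalanobis(x,mu1,var(G1))            # squared distance to group 1
  d2=mahalanobis(x,mu2,var(G2))            # squared distance to group 2
  ifelse(d1<d2,1,2)                        # assign each sample to the closer group
}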

       The R program is as follows:

# distance discrimination
setwd("D:/学习资料/R软件/多元统计分析 基于R》  14797268/多元统计分析——基于R(第2版) R-data")
ex5.2<-read.csv("ex5.2.csv",header = T)
classG1=ex5.2[1:10,2:3]
classG2=ex5.2[11:17,2:3]
newdata=ex5.2[18:20,2:3]   # samples 18-20 are the new observations to classify

# perform the distance discrimination
source("DDA2.R") # load the user-written program DDA2.R
DDA2(classG1,classG2)

       The result of the operation is as follows:

The new data are classified below; the R program is as follows:

# perform the distance discrimination on the new data
source("DDA2.R") # load the user-written program DDA2.R
DDA2(classG1,classG2,newdata)

        The result of the operation is as follows:

Samples 18, 19 and 20 are all classified into the second category, which is consistent with their classification in the original data.

Appendix: DDA2.R program

DDA2<-function (TrnG1, TrnG2, TstG = NULL, var.equal = FALSE){
    # Two-group distance discrimination based on the Mahalanobis distance.
    # TrnG1, TrnG2: training samples of group 1 and group 2
    # TstG: samples to classify (defaults to the training samples themselves)
    # var.equal: TRUE uses a pooled covariance matrix, FALSE uses separate ones
    if (is.null(TstG) == TRUE) TstG<-rbind(TrnG1,TrnG2)
    if (is.vector(TstG) == TRUE)  TstG<-t(as.matrix(TstG)) else if (is.matrix(TstG) != TRUE)
       TstG<-as.matrix(TstG)
    if (is.matrix(TrnG1) != TRUE) TrnG1<-as.matrix(TrnG1)
    if (is.matrix(TrnG2) != TRUE) TrnG2<-as.matrix(TrnG2)
    nx<-nrow(TstG)
    blong<-matrix(rep(0, nx), nrow=1, byrow=TRUE, dimnames=list("blong", 1:nx))
    mu1<-colMeans(TrnG1); mu2<-colMeans(TrnG2)   # group mean vectors
    if (var.equal == TRUE  || var.equal == T){
        S<-var(rbind(TrnG1,TrnG2))               # pooled covariance estimate
        w<-mahalanobis(TstG, mu2, S)-mahalanobis(TstG, mu1, S)
        } else{
        S1<-var(TrnG1); S2<-var(TrnG2)           # separate covariance estimates
        w<-mahalanobis(TstG, mu2, S2)-mahalanobis(TstG, mu1, S1)
        }
    # w > 0: the sample is closer to group 1; otherwise it is assigned to group 2
    for (i in 1:nx){if (w[i]>0) blong[i]<-1 else blong[i]<-2}
    blong
}
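If the two groups can be assumed to share a common covariance matrix, the pooled estimate can be requested through the var.equal argument; a usage sketch based on the objects defined above:

DDA2(classG1,classG2,newdata,var.equal=TRUE)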

Origin blog.csdn.net/weixin_44734502/article/details/129330757