Data Mining Experiment (V): Naive Bayesian Classification in R

First, the purpose of the experiment:

  1. Understand the basic principles of the naive Bayes algorithm;
  2. Use the naive Bayes algorithm to classify data;
  3. Write functions that implement the algorithm and output the classification result for the example data set.

Second, the experimental software:

RStudio

Third, the experimental approach:

Prepare the data set and the test tuple X to be classified.
The main function is NaiveBayes = function(data, test) {}.
By Bayes' theorem, P(Ci|X) = P(X|Ci)P(Ci) / P(X); since P(X) is the same for every class, it is enough to compare P(X|Ci)P(Ci) across classes.

  • 1. Split the samples into the classes C1 = "no" and C2 = "yes"

Compute the priors P(C1) and P(C2).

  • 2. Compute P(Xi|Ci) with

PXi_Ci = function(data,test,class_result){}
which yields PXi_C1 and PXi_C2.

  • 3. Compute PX_Ci = function(PXi_Ci), using the naive conditional-independence assumption:

P(X|C1) = P(X1|C1)*P(X2|C1)*P(X3|C1)…
P(X|C2) = P(X1|C2)*P(X2|C2)*P(X3|C2)…

  • 4. Compare PX_C1 * PC[1] with PX_C2 * PC[2]; the select function determines whether the predicted class of tuple X is yes or no and outputs the final result.

Test with NaiveBayes(data, test). A compact sketch of the same computation, using base-R idioms, is shown below.
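
For orientation, here is a minimal, self-contained sketch of the same pipeline in base R (table() for the priors, prod() for the likelihood product). The function name nb_sketch and its internals are illustrative only; the full implementation used in the experiment follows in the next section.

# Minimal naive Bayes sketch; assumes the last column of data is the class label
nb_sketch = function(data, test){
  label   = data[, ncol(data)]
  classes = levels(factor(label))
  priors  = table(label) / nrow(data)                 # P(Ci)
  post = sapply(classes, function(ci){
    sub = data[label == ci, , drop = FALSE]           # rows of class Ci
    lik = prod(sapply(names(test), function(a)
      sum(sub[[a]] == test[[a]]) / nrow(sub)))        # P(X|Ci) under independence
    lik * priors[[ci]]                                # P(X|Ci) * P(Ci)
  })
  classes[which.max(post)]                            # predicted class
}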

Fourth, the source code:

data<-data.frame(
  
  Age=c("youth","youth","middle_aged","senior","senior","senior","middle_aged","youth","youth","senior","youth","middle_aged","middle_aged","senior"),
  income=c("high","high","high","medium","low","low","low","medium","low","medium","medium","medium","high","medium"),
  student=c("no","no","no","no","yes","yes","yes","no","yes","yes","yes","no","yes","no"),
  credit_rating=c("fair","excellent","fair","fair","fair","excellent","excellent","fair","fair","fair","excellent","excellent","fair","excellent"),
  buys_computer=c("no","no","yes","yes","yes","no","yes","no","yes","yes","yes","yes","yes","no"))

# test tuple X; the expected class is "yes"
test<-data.frame(Age="youth",income="medium",student="yes",credit_rating="fair")
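
As a quick sanity check (not part of the original script), the class distribution of the training data can be tabulated before running the classifier:

table(data$buys_computer)   # should print: no 5, yes 9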

NaiveBayes = function(data,test){

rowCount=nrow(data) # number of rows, i.e. number of training samples
colCount=ncol(data) # number of columns
class_result = levels(factor(data[,colCount]))  # "no" "yes"

class_Count = c() # counts per class
class_Count[class_result]=rep(0,length(class_result))

 # priors P(C1), P(C2)

for(i in 1:rowCount){
  if(data[i,colCount] %in% class_result){
    temp = data[i,colCount]
    class_Count[temp] = class_Count[temp]+1
  }
}
PC = c()
for (i in 1:length(class_result)) {
  PC[i] = class_Count[i]/rowCount
}
# PC[1] = P(C1 = "no"), PC[2] = P(C2 = "yes")

##### Compute P(Xi|Ci)


PXi_Ci = function(data,test,class_result){

  xCount = c()
  for(k in 1:ncol(test)){
    xCount[k] = 0
    for(i in 1:nrow(data)){
      # count rows whose k-th attribute matches the test tuple and whose class is class_result
      if(as.vector(data[i,k]) == as.vector(test[1,k]) & data[i,ncol(data)] == class_result){
        xCount[k] <- xCount[k]+1
      }
    }
  }
  temp = subset(data, data[,ncol(data)] == class_result)  # rows belonging to class Ci
  Pxi_Ci = xCount/nrow(temp)
  return(Pxi_Ci)
}

PXi_C1= PXi_Ci(data,test,class_result[1]) #"no"
PXi_C2= PXi_Ci(data,test,class_result[2]) #"yes"
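
# Hand-checked against the data set above (not in the original post):
#   PXi_C1 should equal c(3/5, 2/5, 1/5, 2/5) for class "no"
#   PXi_C2 should equal c(2/9, 4/9, 6/9, 6/9) for class "yes"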

##### Compute P(X|Ci)



PX_Ci = function(PXi_Ci){
  # multiply the per-attribute conditional probabilities (independence assumption)
  result = 1
  for(i in 1:length(PXi_Ci)){
    result <- result*PXi_Ci[i]
  }
  return(result)
}

PX_C1 = PX_Ci(PXi_C1)
PX_C2 = PX_Ci(PXi_C2)


#######P(Ci|X)



Ci_X = data.frame(C1_X = PX_C1*PC[1],C2_X = PX_C2*PC[2])  # P(X|Ci)*P(Ci) for each class; P(X) is omitted since it is the same for both

select = function(data){  # pick the class with the larger Ci_X and store it in the new column "decide"
  if(data[,1]>data[,2]){
    data$decide = "no"
  }else{
    data$decide = "yes"
  }
  return(data)
}

final = select(Ci_X)
return(final)
}
# test
NaiveBayes(data,test)

Fifth, the results:

Thus, for the test tuple X, the naive Bayes classifier predicts the class yes (buys_computer = "yes").
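
For reference, the numbers behind this decision can be checked by hand from the data set above (the original post showed them in a screenshot that is not reproduced here):

P(X|C1)P(C1) = (3/5)(2/5)(1/5)(2/5) * (5/14) ≈ 0.0069  for C1 = "no"
P(X|C2)P(C2) = (2/9)(4/9)(6/9)(6/9) * (9/14) ≈ 0.0282  for C2 = "yes"

Since 0.0282 > 0.0069, decide = "yes".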

Origin: blog.csdn.net/qq_43863790/article/details/104069510