R language study notes - naive Bayes classification

Naive Bayes classification (NB) is derived from Bayes' theorem. Its basic idea is: assuming that the sample attributes are independent of each other given the class, for a given item to be classified, compute the probability of each category given that the item's attributes appear; the category with the largest such posterior probability is taken as the predicted class. Spam filtering in email is a classic application of the naive Bayes algorithm.
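The idea can be sketched in a few lines of R using the spam-filter setting. All of the numbers below are made up purely for illustration; under the independence assumption the posterior is proportional to the prior times the product of the per-word conditional probabilities:

```r
# Naive Bayes by hand: P(class | words) ∝ P(class) * prod(P(word_i | class)),
# assuming the words are independent given the class. All numbers are invented.
prior  <- c(spam = 0.4, ham = 0.6)                      # hypothetical class priors
p_word <- rbind(spam = c(free = 0.30, meeting = 0.05),  # hypothetical P(word | class)
                ham  = c(free = 0.02, meeting = 0.20))

# Unnormalized posteriors for a message containing both words
post <- prior * apply(p_word, 1, prod)
post <- post / sum(post)   # normalize so the two probabilities sum to 1
round(post, 3)             # spam ≈ 0.714, ham ≈ 0.286 -> classify as spam
```

The message is assigned to "spam" because that class has the larger posterior, even though the individual word probabilities are small.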

Three stages of naive Bayes classification implementation:

The first stage is preparation. The feature attributes are determined according to the specific problem, each feature attribute is discretized as needed, and some items are classified manually to form a training sample set. The input of this stage is all the data to be classified, and the output is the feature attributes and the training samples. It is the only stage that requires manual work, and its quality requirements are the highest, since it determines the quality of everything that follows.

The second stage is classifier training (generating a classifier). The frequency of each category in the training samples and the conditional probability estimate of each feature attribute given each category are calculated and recorded. The input is the feature attributes and training samples; the output is a classifier.

The third stage is application. The classifier is used to classify new items: the input is the classifier and the items to be classified, and the output is the mapping from each item to its predicted category.

Commonly used R functions for naive Bayes classification are the train function in the caret package, the NaiveBayes function in the klaR package, and the naiveBayes function in the e1071 package.

 

How to use the naiveBayes( ) function in the e1071 package:

Computes the conditional a-posterior probabilities of a categorical class variable given independent predictor variables using the Bayes rule.

naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)

 

formula: A model formula, similar to a general linear regression formula (without a constant term), e.g. accept ~ buy + safety.

data: The training data (a data frame) to be analyzed.

laplace: The Laplace smoothing parameter; defaults to 0 (no smoothing).

subset: An optional index vector selecting a subset of the training data to analyze.

na.action: Handling method for missing values. By default (na.pass), missing values are not included in the model calculation; if set to na.omit, rows with missing values are removed before calculation.
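A minimal, self-contained sketch of naiveBayes() using the built-in iris data (iris stands in here for any training data frame; the car data appears later in these notes):

```r
# Fit a naive Bayes classifier on iris and check resubstitution accuracy.
library(e1071)

model <- naiveBayes(Species ~ ., data = iris, laplace = 0)
# For numeric predictors, per-class means and sds are stored in model$tables
pred  <- predict(model, newdata = iris[, 1:4])
mean(pred == iris$Species)   # fraction of training rows classified correctly
```

For factor predictors (as in the car data below), model$tables instead holds the per-class conditional frequency tables.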

 

How to use the NaiveBayes( ) function in the klaR package:

NaiveBayes(formula, data, ..., subset, na.action = na.pass)
NaiveBayes(x, grouping, prior, usekernel = FALSE, fL = 0, ...) 

formula, data, subset, na.action: as above.
x: The data to be processed, as a data frame or matrix.
grouping: The class variable, a factor.
prior: Prior probabilities can be specified for each category; by default the sample proportion of each class is used as its prior probability.
usekernel: Whether to use a kernel density estimator for the density functions of numeric variables (the default, FALSE, assumes a normal density).
fL: The Laplace correction factor; defaults to 0 (no correction). When the amount of data is small, it can be set to 1 to apply Laplace smoothing.

The klaR package extends the e1071 package, adding options for specifying prior probabilities and for kernel density estimation of the density functions.
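A short sketch of klaR's NaiveBayes() with kernel density estimation, again on the built-in iris data. Note that, unlike e1071, predict() here returns a list with $class and $posterior components:

```r
# Fit klaR's NaiveBayes with kernel density estimation for numeric predictors.
library(klaR)

km <- NaiveBayes(Species ~ ., data = iris, usekernel = TRUE, fL = 0)
kp <- predict(km, newdata = iris[, 1:4])
head(kp$class)                     # predicted classes
mean(kp$class == iris$Species)     # resubstitution accuracy
```

With usekernel = FALSE the numeric densities are modeled as normal, which matches e1071's behavior.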

 

Using the sample data statement:

This is a car evaluation (satisfaction) dataset; the specific indicators are as follows:

buy: purchase price (very high, high, medium, low)

main: maintenance price (very high, high, medium, low)

doors: the number of doors (2, 3, 4, 5, ...)

capacity: the number of passengers

lug boot: body size

safety: level of safety (high, medium, low)

accept: the degree of acceptance (very good, good, satisfied, dissatisfied)

Code:

# Naive Bayes classification

# Import data (factors are needed for the classifiers; in R >= 4.0
# read.table no longer converts strings to factors by default)
car <- read.table(file.choose(), sep = ',', stringsAsFactors = TRUE)
head(car)
# Rename variables
colnames(car) <- c('buy', 'main', 'doors', 'capacity',
                   'lug_boot', 'safety', 'accept')
# Select 75% of the data as the training set and 25% as the test set
# Construct the row-index set of the training set
library(caret)
set.seed(1234)  # make the split reproducible
ind <- createDataPartition(car$accept, times = 1, p = 0.75, list = FALSE)
cartrain <- car[ind, ]
cartest  <- car[-ind, ]

### Using the e1071 package
library(e1071)
nb.model <- naiveBayes(accept ~ ., data = cartrain)
# Predict on the test set
nb_predict <- predict(nb.model, newdata = cartest)
# Cross-tabulate actual vs predicted classes and compute accuracy
nb.table <- table(actual = cartest$accept, predict = nb_predict)
nb_ratio <- sum(diag(nb.table)) / sum(nb.table)

### Using the klaR package (an extension of e1071)
library(klaR)
knb.model <- NaiveBayes(accept ~ ., data = cartrain)
# Predict on the test set; predict() returns a list with $class and $posterior
knb_predict <- predict(knb.model, newdata = cartest[, 1:6])
# Cross-tabulate actual vs predicted classes and compute accuracy
knb.table <- table(actual = cartest$accept, predict = knb_predict$class)
knb_ratio <- sum(diag(knb.table)) / sum(knb.table)
# Compare the results
nb.table; knb.table
nb_ratio; knb_ratio

Result:

Naive Bayes classification achieved about 90.3% accuracy on the test set.

 

References:

Sample data source: https://edu.hellobi.com/course/192/announcement (download: https://pan.baidu.com/s/1nvJpp9J, password: guwv)

 
