Li Hang "statistical learning methods," the study notes - Chapter 4: Naive Bayes method

Copyright notice: when reproducing, please cite the source and credit "Wuhan AI Algorithm Study": https://blog.csdn.net/qq_36931982/article/details/91360346

"Li Air" statistical learning methods, "the study notes" series of tutorials to Li Hang teacher "statistical learning" as the basis, mainly including series of notes I understand summary of the learning process and knowledge of the principles in the book focus algorithm.

My ability is limited, so there are bound to be shortcomings; corrections are very welcome. If you have any ideas, please leave a comment!

For more of my study notes, you are welcome to follow the "Wuhan AI Algorithm Study" public account!

This article is divided into three parts: [Understanding the Naive Bayes Method], [Principles of the Naive Bayes Algorithm], and [Application to Text Classification]. Total reading time is about 10 minutes.

 

[Understanding the Naive Bayes Method]

1. "Naive" means simple: the naivety of the naive Bayes method lies in its conditional independence assumption on the features. Introducing the independence assumption greatly reduces the number of parameters in the hypothesis space, sometimes at the expense of some classification accuracy;

2. The theoretical basis of the naive Bayes algorithm is the Bayes formula; starting from this simple formula, one can use historical experience to predict the future;

3. Naive Bayes classifier: for a given input x, compute the posterior probability P(Y = c_k | X = x) with the learned model, and output the class with the maximum posterior probability as the class of x (a tiny numeric example follows);
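To make point 3 concrete, here is a tiny worked example with invented numbers (not from the book): two classes c_1, c_2 and one observed input x.

P(Y = c_1) = 0.6,  P(Y = c_2) = 0.4,  P(X = x | Y = c_1) = 0.2,  P(X = x | Y = c_2) = 0.5

P(Y = c_1 | X = x) ∝ 0.6 × 0.2 = 0.12,  P(Y = c_2 | X = x) ∝ 0.4 × 0.5 = 0.20

The classifier outputs c_2, the class with the larger (unnormalized) posterior.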

 

[Principles of the Naive Bayes Algorithm]

1. Formula derivation

The theoretical basis of naive Bayes is the Bayes formula, defined as follows (X denotes the features, Y denotes the class). It converts the problem "determine the class of a sample given its known features" into a computation over the historical statistics of samples whose class and features are both known:

P(Y = c_k | X = x) = P(X = x | Y = c_k) P(Y = c_k) / P(X = x)

Here X denotes the features, and in general there are many feature attributes, so:

P(X = x | Y = c_k) = P(X^(1) = x^(1), X^(2) = x^(2), ..., X^(n) = x^(n) | Y = c_k)

The naive Bayes idea: "naively" assume that, once the class is determined, the feature attributes are mutually independent and uncorrelated, so:

P(X = x | Y = c_k) = ∏_{j=1}^{n} P(X^(j) = x^(j) | Y = c_k)

Naive Bayes classifier: with the historical samples known, use the Bayes formula to compute, for an unknown sample, the probability of each candidate class; the class with the largest value determines the final classification, so:

y = f(x) = argmax_{c_k} P(Y = c_k) ∏_{j} P(X^(j) = x^(j) | Y = c_k) / Σ_{k} P(Y = c_k) ∏_{j} P(X^(j) = x^(j) | Y = c_k)

Because the denominator is the same for every class of a given sample, it can be dropped to simplify the computation, so:

y = argmax_{c_k} P(Y = c_k) ∏_{j} P(X^(j) = x^(j) | Y = c_k)
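A minimal Python sketch of this decision rule, assuming the prior and conditional probability tables have already been estimated; all names and numbers below are invented toy values, not code from the book:

# Naive Bayes decision rule: choose the class maximizing
# P(Y = c) * product over features j of P(X^(j) = x^(j) | Y = c).
# All probability tables here are invented toy numbers.

priors = {"c1": 0.6, "c2": 0.4}
# cond[c][j][v] = P(X^(j) = v | Y = c)
cond = {
    "c1": [{"A": 0.5, "B": 0.5}, {"S": 0.2, "M": 0.8}],
    "c2": [{"A": 0.3, "B": 0.7}, {"S": 0.6, "M": 0.4}],
}

def classify(x):
    """Return the class with the largest unnormalized posterior."""
    best_c, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, v in enumerate(x):
            score *= cond[c][j][v]
        if score > best_score:
            best_c, best_score = c, score
    return best_c

print(classify(["A", "S"]))  # -> "c2" (0.4*0.3*0.6 = 0.072 > 0.6*0.5*0.2 = 0.06)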

2. Parameter estimation for naive Bayes

Once the naive Bayes formula is known, the main work is computing the prior probability and the conditional probabilities in it; this estimation is done either by "maximum likelihood estimation" or by "Bayesian estimation".

2.1 Maximum likelihood estimation

Estimation is a means of inferring model parameters: it uses the training data to calibrate the model. Before calibration you first need a model, and the model's parameters are what must be determined. Maximum likelihood estimation reasons from the observed phenomena back to the process that produced them.

The maximum likelihood estimates are as follows:

P(Y = c_k) = Σ_{i=1}^{N} I(y_i = c_k) / N,  k = 1, 2, ..., K

P(X^(j) = a_{jl} | Y = c_k) = Σ_{i=1}^{N} I(x_i^(j) = a_{jl}, y_i = c_k) / Σ_{i=1}^{N} I(y_i = c_k)

where I(·) is the indicator function and a_{jl} is the l-th possible value of the j-th feature.

2.2 Bayesian estimation

Maximum likelihood estimation can produce probability estimates equal to zero (when a feature value never co-occurs with a class in the training set), which distorts the posterior computation. Bayesian estimation fixes this by adding a positive constant λ to the counts:

P_λ(X^(j) = a_{jl} | Y = c_k) = ( Σ_{i=1}^{N} I(x_i^(j) = a_{jl}, y_i = c_k) + λ ) / ( Σ_{i=1}^{N} I(y_i = c_k) + S_j λ )

P_λ(Y = c_k) = ( Σ_{i=1}^{N} I(y_i = c_k) + λ ) / ( N + K λ )

where S_j is the number of possible values of the j-th feature. λ = 0 recovers maximum likelihood, and λ = 1 is called Laplace smoothing.
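A small Python sketch of both estimators for discrete features; the toy data set below is invented for illustration. Setting lambda_ = 0 gives the maximum likelihood estimates of 2.1, and lambda_ = 1 gives Laplace smoothing:

from collections import Counter

# Invented toy training set: two discrete features per sample, plus a label.
X = [("A", "S"), ("A", "M"), ("B", "S"), ("B", "M"), ("B", "M")]
y = ["c1", "c1", "c2", "c2", "c2"]

def estimate(X, y, feature_values, classes, lambda_=1.0):
    """Estimate priors P(Y=c) and conditionals P(X^(j)=a | Y=c).
    lambda_ = 0: maximum likelihood; lambda_ = 1: Laplace smoothing."""
    N, K = len(y), len(classes)
    class_counts = Counter(y)
    priors = {c: (class_counts[c] + lambda_) / (N + K * lambda_) for c in classes}
    cond = {c: [{} for _ in feature_values] for c in classes}
    for j, values in enumerate(feature_values):
        S_j = len(values)  # number of possible values of feature j
        for c in classes:
            for a in values:
                count = sum(1 for xi, yi in zip(X, y) if xi[j] == a and yi == c)
                cond[c][j][a] = (count + lambda_) / (class_counts[c] + S_j * lambda_)
    return priors, cond

priors, cond = estimate(X, y, [("A", "B"), ("S", "M")], ("c1", "c2"))
print(priors["c1"])        # (2 + 1) / (5 + 2) with lambda_ = 1
print(cond["c1"][0]["A"])  # (2 + 1) / (2 + 2) with lambda_ = 1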

[Application to Text Classification]

Background: Text classification is widely used, for example in spam email and SMS filtering; we need to build a classification model to classify text. Using a Bayesian model for text classification: given a document d, determining the category c_k to which it belongs amounts to computing, for each category, the probability value of document d, and taking the category with the largest value as the final classification result. The classification process needs sample features; for text, the most intuitive features are "feature words": a document d can be reduced to its feature words <t1, t2, ..., t_{n_d}>, and these feature words directly determine the final classification result of the article.

Method introduction:

According to the naive Bayes classifier formula:

y = argmax_{c_k} P(Y = c_k) ∏_{j} P(X^(j) = x^(j) | Y = c_k)

we obtain the corresponding text classifier formula:

c(d) = argmax_{c_k} P(c_k) ∏_{j=1}^{n_d} P(t_j | c_k)

In the actual computation, the product of the probabilities P(t_j | c_k) easily underflows to 0, so it is converted to logarithms, turning the product into a sum:

c(d) = argmax_{c_k} [ log P(c_k) + Σ_{j=1}^{n_d} log P(t_j | c_k) ]

Finally, we need to estimate the class priors P(c_k) and the word conditionals P(t_j | c_k) from the sample text data; following section 2.2, with Laplace smoothing (λ = 1) these are:

P(c_k) = N_{c_k} / N, where N_{c_k} is the number of training documents of class c_k and N is the total number of training documents

P(t_j | c_k) = ( count(t_j, c_k) + 1 ) / ( Σ_t count(t, c_k) + |V| ), where count(t, c_k) is the number of occurrences of word t in documents of class c_k and |V| is the vocabulary size
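Putting these pieces together, here is a minimal sketch of such a text classifier in Python, with Laplace smoothing and the log-sum form above; the tiny corpus and the whitespace tokenization are invented for illustration:

import math
from collections import Counter

# Invented toy corpus: (document, class) pairs.
docs = [
    ("cheap pills buy now", "spam"),
    ("limited offer buy cheap", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch plans for today", "ham"),
]

classes = {"spam", "ham"}
class_doc_counts = Counter(c for _, c in docs)
word_counts = {c: Counter() for c in classes}  # count(t, c_k)
for text, c in docs:
    word_counts[c].update(text.split())
vocab = {t for counts in word_counts.values() for t in counts}

def classify(text):
    """argmax over c_k of log P(c_k) + sum_j log P(t_j | c_k)."""
    best_c, best_logp = None, float("-inf")
    for c in classes:
        logp = math.log(class_doc_counts[c] / len(docs))  # log P(c_k)
        total = sum(word_counts[c].values())
        for t in text.split():
            # Laplace-smoothed P(t | c_k); unseen words get a small nonzero value.
            logp += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_c, best_logp = c, logp
    return best_c

print(classify("buy cheap pills"))  # -> "spam" on this toy corpus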

 

References:

Li Hang. Statistical Learning Methods. Tsinghua University Press.

Principles and Practice of the Naive Bayes Classification Algorithm

 
