"Machine learning" Watermelon Chapter 7 Bayesian classifier

Prior probability: the probability of an event estimated from existing knowledge alone, without taking any observed evidence into account.

Posterior probability: the probability of a random event estimated from existing knowledge after the relevant evidence has been taken into account, written P(c | x).

 

7.1 Bayesian Decision Theory

Bayesian decision theory is the basic method for making decisions under a probabilistic framework. It considers how to choose the optimal class label based on the relevant probabilities and the losses incurred by misclassification.

The "conditional risk" of classifying a sample x as class $c_i$ is

$$R(c_i \mid x) = \sum_{j=1}^{N} \lambda_{ij}\, P(c_j \mid x),$$

where $\lambda_{ij}$ is the loss incurred by classifying a sample that truly belongs to class $c_j$ as class $c_i$.

Our task is to find a decision rule $h: \mathcal{X} \to \mathcal{Y}$ that minimizes the overall risk

$$R(h) = \mathbb{E}_{x}\big[\, R(h(x) \mid x) \big].$$

Bayes decision rule: to minimize the overall risk, it suffices to choose, for each sample, the class label that minimizes the conditional risk $R(c \mid x)$:

$$h^{*}(x) = \underset{c \in \mathcal{Y}}{\arg\min}\; R(c \mid x).$$

$h^{*}$ is called the Bayes optimal classifier, and the corresponding overall risk $R(h^{*})$ is called the Bayes risk. $1 - R(h^{*})$ reflects the best classification performance that can be achieved.

When the goal is to minimize the classification error rate (0/1 loss), the conditional risk becomes $R(c \mid x) = 1 - P(c \mid x)$, so the Bayes optimal classifier simplifies to: for each sample x, choose the class label with the largest posterior probability, $h^{*}(x) = \arg\max_{c} P(c \mid x)$.
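As a concrete illustration (my own sketch, not from the book), here is a minimal NumPy example of the Bayes decision rule: given assumed posterior probabilities for one sample and an assumed loss matrix, it picks the class with the smallest conditional risk, and checks that under 0/1 loss this reduces to picking the largest posterior.

```python
import numpy as np

# Assumed posterior probabilities P(c | x) for one sample over three classes.
posterior = np.array([0.2, 0.5, 0.3])

# Assumed loss matrix: lam[i, j] is the loss of predicting class i when the true class is j.
lam = np.array([[0.0, 1.0, 4.0],
                [2.0, 0.0, 1.0],
                [1.0, 3.0, 0.0]])

# Conditional risk R(c_i | x) = sum_j lam[i, j] * P(c_j | x); pick the minimizer.
risk = lam @ posterior
print("risks:", risk, "-> predict class", risk.argmin())

# Under 0/1 loss, R(c | x) = 1 - P(c | x), so the rule reduces to argmax posterior.
risk_01 = (1.0 - np.eye(3)) @ posterior
assert risk_01.argmin() == posterior.argmax()
```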

What machine learning aims to achieve is to estimate the posterior probability $P(c \mid x)$ as accurately as possible from a limited set of training samples. Broadly speaking, there are two main strategies: given x, model $P(c \mid x)$ directly to predict c, which yields a "discriminative model"; or first model the joint distribution $P(x, c)$ and then derive $P(c \mid x)$ from it, which yields a "generative model". Decision trees, BP neural networks, and support vector machines are discriminative models.

For a generative model, based on Bayes' theorem, $P(c \mid x)$ can be written as

$$P(c \mid x) = \frac{P(c)\, P(x \mid c)}{P(x)}.$$

The class prior probability $P(c)$ expresses the proportion of each class in the sample space. By the law of large numbers, when the training set contains enough independent and identically distributed samples, $P(c)$ can be estimated from the frequency with which each class appears.
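A minimal sketch of this frequency estimate, using made-up labels for a three-class problem:

```python
import numpy as np

# Hypothetical training labels for a three-class problem.
y_train = np.array([0, 1, 1, 2, 0, 1, 2, 1, 0, 1])

# Estimate the class prior P(c) by the relative frequency of each class.
classes, counts = np.unique(y_train, return_counts=True)
prior = counts / counts.sum()
print(dict(zip(classes.tolist(), prior.round(2).tolist())))
# {0: 0.3, 1: 0.5, 2: 0.2}
```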

7.2 Maximum Likelihood Estimation

To estimate the class-conditional probability, a common strategy is to first assume that it follows some determined form of probability distribution, and then estimate the parameters of that distribution from the training samples. The training process then becomes a process of parameter estimation for the probabilistic model.

There are two schools of thought in statistics: the frequentist school and the Bayesian school.

Frequentist school: emphasizes the "objectivity" of probability, i.e., probability reflects objective randomness. The model parameters are fixed and the data are random. All observers are assumed to obtain the same information. The frequency of an event over repeated trials is taken as the estimate of its probability. That is, although a parameter is unknown, it is an objective, fixed value, which can be determined by criteria such as maximizing the likelihood function.

Bayesian school: emphasizes subjective probability, i.e., conditional probability. The sample is fixed and the model parameters are treated as random. Different observers obtain different information. An event appears random only because the observer does not know its outcome; randomness is not rooted in the event itself but reflects the observer's state of knowledge about it (whereas the frequentist school holds that the randomness of events is rooted in the events themselves and has nothing to do with the observer). That is, a parameter is an unobserved random variable which itself has a distribution: one may assume the parameter follows a prior distribution, and then compute the posterior distribution of the parameter from the observed data.

Maximum likelihood estimation --- the frequentist school of thought

Let $D_c$ denote the set of training samples belonging to class c, and assume these samples are independent and identically distributed. Then the likelihood of the parameter $\theta_c$ with respect to the data set $D_c$ is

$$P(D_c \mid \theta_c) = \prod_{x \in D_c} P(x \mid \theta_c),$$

and the log-likelihood is

$$LL(\theta_c) = \log P(D_c \mid \theta_c) = \sum_{x \in D_c} \log P(x \mid \theta_c),$$

with the maximum likelihood estimate $\hat{\theta}_c = \arg\max_{\theta_c} LL(\theta_c)$.
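A minimal numeric sketch of this (my own illustration, with made-up one-dimensional samples assumed Gaussian with known unit variance): evaluate the log-likelihood on a grid of candidate means and pick the maximizer.

```python
import numpy as np

# Hypothetical 1-D samples of class c, modeled as Gaussian with known sigma = 1.
x = np.array([4.8, 5.1, 5.3, 4.9, 5.4])

# Log-likelihood LL(mu) = sum_i log N(x_i | mu, 1), evaluated on a grid of candidate means.
candidates = np.linspace(3.0, 7.0, 401)
ll = np.array([np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - mu) ** 2)
               for mu in candidates])

mu_hat = candidates[np.argmax(ll)]           # maximizer of the log-likelihood
print(round(mu_hat, 2), round(x.mean(), 2))  # 5.1 5.1 -- agrees with the sample mean
```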

For example, in the continuous case, suppose the probability density function is $p(x \mid c) \sim \mathcal{N}(\mu_c, \sigma_c^2)$. Then the maximum likelihood estimates of the parameters are

$$\hat{\mu}_c = \frac{1}{|D_c|} \sum_{x \in D_c} x, \qquad \hat{\sigma}_c^2 = \frac{1}{|D_c|} \sum_{x \in D_c} (x - \hat{\mu}_c)(x - \hat{\mu}_c)^{\mathrm{T}}.$$

In other words, the maximum likelihood estimate of the mean of a normal distribution is the sample mean, and the estimate of the variance is the mean of $(x - \hat{\mu}_c)(x - \hat{\mu}_c)^{\mathrm{T}}$.
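A short sketch of these closed-form estimates, using a few made-up 2-D feature vectors for class c:

```python
import numpy as np

# Hypothetical 2-D feature vectors belonging to class c.
D_c = np.array([[5.1, 3.5],
                [4.9, 3.0],
                [5.4, 3.9],
                [5.0, 3.4]])

# Closed-form Gaussian MLE: sample mean, and the mean of (x - mu)(x - mu)^T.
mu_hat = D_c.mean(axis=0)
diff = D_c - mu_hat
sigma_hat = diff.T @ diff / len(D_c)   # MLE covariance, with divisor |D_c| (not |D_c| - 1)

print("mu_hat:", mu_hat)
print("sigma_hat:\n", sigma_hat)
```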

 

7.3 Naive Bayes Classifier

"Naive" refers to the attribute conditional independence assumption: given the class label, all attributes are assumed to be mutually independent.

The class-conditional probability $P(x \mid c)$ is a joint probability over all attributes, which is difficult to estimate directly from a limited number of training samples. Under the attribute conditional independence assumption, the naive Bayes classifier becomes

$$h_{nb}(x) = \underset{c \in \mathcal{Y}}{\arg\max}\; P(c) \prod_{i=1}^{d} P(x_i \mid c),$$

where d is the number of attributes and $x_i$ is the value of x on the i-th attribute.
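A minimal from-scratch sketch of this rule (my own illustration with made-up discrete data; it estimates $P(c)$ and each $P(x_i \mid c)$ by frequency and, for simplicity, omits the Laplacian smoothing discussed later in the chapter):

```python
import numpy as np
from collections import defaultdict

# Hypothetical discrete training data: rows are samples, columns are attributes.
X = np.array([["sunny", "hot"], ["sunny", "mild"], ["rainy", "mild"], ["rainy", "hot"]])
y = np.array(["no", "yes", "yes", "no"])

classes, counts = np.unique(y, return_counts=True)
prior = dict(zip(classes, counts / len(y)))          # P(c) by class frequency

# P(x_i | c) estimated by frequency within each class (no smoothing in this sketch).
cond = defaultdict(float)
for c in classes:
    Xc = X[y == c]
    for i in range(X.shape[1]):
        values, vcounts = np.unique(Xc[:, i], return_counts=True)
        for v, n in zip(values, vcounts):
            cond[(c, i, v)] = n / len(Xc)

def predict(x):
    # h_nb(x) = argmax_c P(c) * prod_i P(x_i | c)
    scores = {c: prior[c] * np.prod([cond[(c, i, v)] for i, v in enumerate(x)])
              for c in classes}
    return max(scores, key=scores.get)

print(predict(["sunny", "mild"]))   # -> "yes" on this toy data
```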
