Machine Learning - Naive Bayes and Bayesian Methods

Bayesian decision criterion: to minimize the overall risk, it suffices to select, for each sample, the class label c that minimizes the conditional risk R(c | x).

 

First, Maximum Likelihood Estimation

1. A common estimation strategy: first assume that the class-conditional probability distribution has some determined form, and then estimate the parameters of that distribution from the training samples. In other words, the training process is the process of estimating the parameters of a probabilistic model.

2. Two schools of parameter estimation: the frequentist school and the Bayesian school.

(1) Frequentist view: although the parameters are unknown, they are objectively fixed values; therefore, the parameter values can be determined by optimizing some criterion, such as the likelihood function (maximum likelihood estimation).

(2) Bayesian view: the parameters are unobserved random variables that may themselves follow a distribution; therefore, the parameters can be assumed to follow a prior distribution, and the posterior distribution of the parameters is then computed from the observed data.
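As an illustration of the frequentist approach in (1), here is a minimal sketch (Python, assuming NumPy is available; the data points are made up for illustration): for a Gaussian, maximizing the likelihood reduces to taking the sample mean and the biased sample variance.

```python
import numpy as np

# Hypothetical one-dimensional samples, assumed to come from a Gaussian.
x = np.array([2.1, 1.9, 2.4, 2.0, 2.6, 1.8])

# Maximum likelihood estimates for a Gaussian: the log-likelihood is
# maximized by the sample mean and the biased sample variance (ddof=0).
mu_mle = x.mean()
sigma2_mle = x.var(ddof=0)

print("MLE mean:", mu_mle)
print("MLE variance:", sigma2_mle)
```

A Bayesian treatment of the same data would instead place a prior on the mean and variance and report their posterior distribution rather than a single point.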

Second, Naive Bayes

(1) Idea: for a given item x to be classified, compute the posterior probability of each class given x using the learned model, i.e., the probability of each target class given that this item is observed, and assign x to the class with the largest posterior probability. The posterior probabilities are computed with Bayes' theorem.

(2) Key point: to avoid the combinatorial explosion and sample-sparsity problems that arise when applying Bayes' theorem directly, the attribute conditional independence assumption is introduced: all features are assumed to be mutually independent given the class.

(3) How it works: 

Bayes' formula:

P(c | x) = P(c) P(x | c) / P(x)

Applying the conditional independence assumption to the class-conditional probability P(x | c), the formula becomes:

P(c | x) = (P(c) / P(x)) * ∏_{i=1}^{d} P(x_i | c)

Since P(x) is the same for every class, the naive Bayes classifier is:

h(x) = argmax_c P(c) ∏_{i=1}^{d} P(x_i | c)

where d is the number of attributes and x_i is the value of x on the i-th attribute.

(4) Workflow:

1) Preparation phase: determine the feature attributes, divide each attribute into suitable value ranges, and then manually label a portion of the items to be classified to form the training samples.

2) Training phase: compute the frequency of each class in the training samples, P(y), and compute the conditional probability of each feature-attribute value for each class, P(x_i | y).

3) Application phase: use the classifier to classify new items. The input is the classifier together with a sample to be classified, and the output is the class that the sample belongs to: under the attribute conditional independence assumption, the classifier evaluates, for each class y, the prior P(y) multiplied by the product over all d attributes of P(x_i | y), where x_i is the sample's value on the i-th attribute, and outputs the class with the largest value. A sketch of this workflow is given below.
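A minimal from-scratch sketch of this workflow in Python (the tiny dataset, its feature values, and the class labels are all hypothetical, invented purely for illustration): the training phase counts class frequencies for P(y) and attribute-value frequencies for P(x_i | y), and the application phase picks the class with the largest product.

```python
from collections import Counter, defaultdict

# Hypothetical training samples: each row is (features, class label).
train = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "yes"),
    (("sunny", "mild"), "yes"),
]

# Training phase: estimate P(y) and P(x_i | y) by counting.
class_counts = Counter(y for _, y in train)
cond_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
for x, y in train:
    for i, v in enumerate(x):
        cond_counts[(i, y)][v] += 1

def posterior_score(x, y):
    # P(y) * prod_i P(x_i | y); P(x) is omitted since it is the same for all classes.
    score = class_counts[y] / len(train)
    for i, v in enumerate(x):
        score *= cond_counts[(i, y)][v] / class_counts[y]
    return score

# Application phase: output the class with the largest score.
sample = ("sunny", "hot")
print(max(class_counts, key=lambda y: posterior_score(sample, y)))
```

In practice Laplace smoothing would usually be added so that an attribute value never observed with a class does not zero out the whole product.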

Fourth, An Example

The problem is this: a friend asks whether a girl should marry a certain boy. The boy has four features: not handsome, bad character, short, and not motivated. Predict whether the girl will decide to marry him or not.

This is a typical two-class problem. Solving it with naive Bayes, it is converted into comparing P(marry | not handsome, bad character, short, not motivated) with P(not marry | not handsome, bad character, short, not motivated), and the answer (marry or not marry) is the class with the larger probability.

We then apply Bayes' formula, together with the conditional independence assumption, to compute these two posterior probabilities from the training samples and choose the larger one.
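The original article carries out this computation on a small labelled table that is not reproduced here; as a stand-in, the following sketch (Python; the training rows are entirely hypothetical and invented only for illustration) shows how the two posteriors would be compared for the boy described above.

```python
# Hypothetical training rows (invented for illustration; the original article
# uses its own small table): (handsome, good_character, tall, motivated) -> label.
rows = [
    ((1, 1, 1, 1), "marry"),
    ((0, 1, 1, 1), "marry"),
    ((1, 1, 0, 1), "marry"),
    ((1, 0, 1, 0), "not marry"),
    ((0, 0, 0, 0), "not marry"),
    ((0, 0, 0, 1), "not marry"),
]

boy = (0, 0, 0, 0)  # not handsome, bad character, short, not motivated

def score(label):
    subset = [x for x, y in rows if y == label]
    p = len(subset) / len(rows)                             # prior P(y)
    for i, v in enumerate(boy):
        p *= sum(x[i] == v for x in subset) / len(subset)   # P(x_i | y)
    return p

print(max(["marry", "not marry"], key=score))
```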
(Source: CSDN blogger "baidu-liuming", original link: https://blog.csdn.net/fisherming/article/details/79509025)


Fifth, Interview Questions

1. What is the difference between a Bayes classifier and Bayesian learning?

The former: a single point estimate obtained by maximum a posteriori (MAP) estimation;

The latter: estimation of a full distribution over the parameters.
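To make the contrast concrete, here is a minimal sketch (Python; the Beta prior hyperparameters and the coin-flip counts are invented for illustration) using the conjugate Beta-Bernoulli model: Bayesian learning keeps the entire posterior distribution Beta(a + k, b + n - k), whereas the MAP approach collapses it to a single point, the posterior mode.

```python
# Hypothetical data: n coin flips with k heads, and a Beta(a, b) prior
# on the heads probability theta (all numbers invented for illustration).
a, b = 2.0, 2.0
n, k = 10, 7

# Bayesian learning: the full posterior is Beta(a + k, b + n - k).
post_a, post_b = a + k, b + n - k

# MAP point estimate: the mode of that Beta posterior (valid for post_a, post_b > 1).
theta_map = (post_a - 1) / (post_a + post_b - 2)

print("posterior: Beta(%.1f, %.1f)" % (post_a, post_b))
print("MAP point estimate: %.3f" % theta_map)
```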

2. What is the significance of maximizing the posterior probability?

It is equivalent to expected risk minimization under the 0-1 loss (the expected risk only needs to be minimized for each item individually).
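For completeness, a short derivation under the 0-1 loss (loss 0 for a correct label, 1 otherwise), showing why minimizing the conditional risk at each x is the same as maximizing the posterior probability:

```latex
R(c \mid x) = \sum_{j} \lambda_{0/1}(c, c_j)\, P(c_j \mid x)
            = \sum_{c_j \neq c} P(c_j \mid x)
            = 1 - P(c \mid x)
\quad\Longrightarrow\quad
\arg\min_{c} R(c \mid x) = \arg\max_{c} P(c \mid x)
```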

 

3. What are the important considerations for naive Bayes?

(1) The given feature vectors may have different lengths, so they need to be normalized to a common length (taking text classification as an example): a sentence made of words is converted into a vector whose length is the size of the whole vocabulary, where each position holds the number of times the corresponding word appears. A sketch of this conversion is given below.

(2) Points to note during computation:
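A minimal sketch illustrating point (1) above (pure Python; the example sentences, and hence the vocabulary, are made up for illustration): each sentence is turned into a count vector whose length equals the vocabulary size.

```python
# Hypothetical corpus (invented for illustration).
sentences = ["the cat sat on the mat", "the dog sat"]

# Build the vocabulary: the vector length is the vocabulary size.
vocab = sorted({w for s in sentences for w in s.split()})

def to_count_vector(sentence):
    # Each position holds how many times that vocabulary word appears.
    words = sentence.split()
    return [words.count(w) for w in vocab]

for s in sentences:
    print(to_count_vector(s))
```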

 

4. A classic question: what is the difference between naive Bayes and logistic regression? The former is a generative model and the latter is a discriminative model; the difference between the two is exactly the difference between generative and discriminative models.

1) First, naive Bayes obtains the prior probability P(Y) and the conditional probability P(X | Y) from the known samples, computes the joint probability for a given instance, and then obtains the posterior probability. In other words, it tries to find out how the data were generated, and then classifies: whichever class is most likely to have generated this observation is the class it belongs to.

Advantages: it converges quickly as the sample size increases; it can also be applied when hidden variables are present.

Disadvantages: it takes a long time, needs more samples, and wastes computational resources.

2) In contrast, logistic regression does not care about the class proportions or the probability characteristics of the features within each class; it directly gives the expression of the prediction model. Each feature has a weight; the weights w are updated from the training data (typically by gradient methods) to obtain the final expression.

Advantages: direct prediction often gives higher accuracy; it simplifies the problem; it can reflect differences in the data distribution between classes; it is suitable for recognizing many classes.

Disadvantages: it converges slowly; it does not apply when hidden variables are present.
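A minimal sketch of this comparison (Python, assuming scikit-learn is installed; the synthetic dataset is generated purely for illustration): the generative model (Gaussian naive Bayes) and the discriminative model (logistic regression) are fitted on the same data and their test accuracies printed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data, generated only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Generative model: estimates P(y) and P(x_i | y), then applies Bayes' rule.
nb = GaussianNB().fit(X_tr, y_tr)

# Discriminative model: learns weights for P(y | x) by gradient-based optimization.
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("naive Bayes accuracy:", nb.score(X_te, y_te))
print("logistic regression accuracy:", lr.score(X_te, y_te))
```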
