Naive Bayes (NB)

Classification background:

Definition: given an object X, assign it to one of a set of pre-defined categories yi.

- Input: X

- Output: Y (a value from a finite set {y1, y2, y3, ..., yn})

Applications: spam detection, patient classification, click-through prediction, and so on.

Intuitive understanding:

The input object X is an article, and Y is its topic, e.g. military or finance.

Common settings are binary classification (e.g. male vs. female) and multi-class classification (e.g. article categories {politics, sports, science fiction}).

 

Classification task workflow (news categorization example):

One: Feature extraction: X = {yesterday, market, ...} [for Chinese text, word segmentation is a prerequisite]

Two: Feature selection: X = {domestic, foreign, ...} [keyword extraction applied to the segmented result]

Three: Model selection: [here, the naive Bayes classifier]

Four: Training data preparation

Five: Model training

Six: Prediction (classification)

Seven: Evaluation: assess the results
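As a sketch of the first two steps, here is a minimal feature-extraction and feature-selection pass. The whitespace tokenizer and the stopword list are illustrative assumptions; real Chinese text would need a proper word segmenter.

```python
# Minimal sketch of the first two workflow steps (illustrative assumptions).
# Whitespace splitting stands in for Chinese word segmentation, and the
# stopword list is a toy example of keyword selection.

STOPWORDS = {"the", "is", "a", "of", "in"}

def extract_features(article: str) -> list[str]:
    # Step one: segment the article into words
    return article.lower().split()

def select_features(words: list[str]) -> list[str]:
    # Step two: keep only keywords by dropping stopwords
    return [w for w in words if w not in STOPWORDS]

words = extract_features("The market is rising in the domestic economy")
print(select_features(words))  # ['market', 'rising', 'domestic', 'economy']
```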

 

Common classifier families:

- Probability-based selection

  - NB

  - Compute the probability that the object to be classified belongs to each class, and output the most probable class

- Space partitioning

  - SVM. Drawback: unsuitable when the sample size is too large

 

The figure (not reproduced here) shows samples from four categories plotted as points in the plane. After learning, several lines (e.g. the blue ones) cleanly separate the samples of the different classes, splitting the two-dimensional space into labeled regions. Algorithms of this kind are space-partitioning classifiers, and SVM is one of them.

 

That covers the background; now for today's topic: the naive Bayes classifier, a common classification algorithm.

Three: the naive Bayes classifier

Formula:

P(yi | X) = P(yi) P(X | yi) / P(X)

Breaking the formula down:

yi is a specific class

Y = {military, finance, sports}

X = an article

xi = a specific word in the article

P(yi | X): given an article, the probability that it belongs to class yi

P(yi): the prior probability

Given 100 articles, of which 50 are military, 30 finance, and 20 sports:

P(y = military) = 50/100

P(y = finance) = 30/100

P(y = sports) = 20/100
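These priors can be computed in a few lines; the label list below just reproduces the 50/30/20 split from the example.

```python
# Estimating the priors P(yi) by counting labels (50/30/20 as above).
from collections import Counter

labels = ["military"] * 50 + ["finance"] * 30 + ["sports"] * 20
counts = Counter(labels)
priors = {y: n / len(labels) for y, n in counts.items()}
print(priors)  # {'military': 0.5, 'finance': 0.3, 'sports': 0.2}
```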

 

P(X): the probability of this article. It is the same fixed value for every class (a given article occurs with probability 1), so it can be ignored when comparing classes.

Therefore: P(yi | X) = P(yi) P(X | yi) / P(X)

can be further abbreviated as: P(yi | X) ≈ P(yi) P(X | yi)

P(X | yi): the probability of X occurring given class yi

P(xi | yi): the probability of the word xi occurring given class yi

For y = military and x = warship: out of all the words in military articles, how many times does "warship" appear? That ratio is the probability of "warship" appearing in a military article.

X = {warship, cannon, aircraft carrier}

P(X | y = military) = P(x = warship | y = military) * P(x = cannon | y = military) * P(x = aircraft carrier | y = military)

Premise: the words are assumed independent and identically distributed (i.i.d.); this is the "naive" assumption by which naive Bayes simplifies a complex problem.

 

P(yi | X) ≈ P(yi) P(X | yi): the final naive Bayes formula.

Compute this for every label and take the class with the largest value as the classification.
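The argmax decision rule can be sketched as follows. The priors and conditional probabilities here are made-up illustrative numbers, not estimates from real data; logarithms are used so that long articles do not underflow.

```python
import math

# Illustrative (assumed) parameters: P(yi) and P(word | yi)
priors = {"military": 0.5, "finance": 0.3, "sports": 0.2}
cond = {
    "military": {"warship": 0.02, "market": 0.001},
    "finance":  {"warship": 0.0005, "market": 0.03},
    "sports":   {"warship": 0.0005, "market": 0.002},
}

def classify(words, priors, cond, unseen=1e-6):
    # score(yi) = log P(yi) + sum_j log P(xj | yi); the argmax wins
    def score(y):
        return math.log(priors[y]) + sum(
            math.log(cond[y].get(w, unseen)) for w in words)
    return max(priors, key=score)

print(classify(["warship"], priors, cond))  # military
```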

 

Derivation of naive Bayes:

The naive Bayes formula can be derived from the conditional probability formula; the derivation is as follows:

1. P(X|Y) = P(X,Y)/P(Y)

2. P(X,Y) = P(X|Y)*P(Y)

3. P(X,Y) = P(Y,X)

4. P(Y,X) = P(Y|X)*P(X)

5. P(X,Y) = P(Y|X)*P(X)

Substituting the P(X,Y) from step 5 into step 1 gives step 6:

6. P(X|Y) = P(Y|X)*P(X)/P(Y)

Swapping the roles of X and Y in step 6 gives the final formula (Bayes' theorem):

P(yi|X) = P(yi)*P(X|yi)/P(X)
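The derivation can be checked numerically on a tiny joint distribution; the table of joint probabilities below is an illustrative assumption.

```python
# Numeric check of steps 1-6 on an assumed 2x2 joint distribution P(X, Y).
joint = {("x1", "y1"): 0.2, ("x1", "y2"): 0.1,
         ("x2", "y1"): 0.3, ("x2", "y2"): 0.4}

def p_x(x):
    # marginal P(X = x), summing the joint over Y
    return sum(v for (xi, _), v in joint.items() if xi == x)

def p_y(y):
    # marginal P(Y = y), summing the joint over X
    return sum(v for (_, yi), v in joint.items() if yi == y)

x, y = "x1", "y1"
p_x_given_y = joint[(x, y)] / p_y(y)   # step 1
p_y_given_x = joint[(x, y)] / p_x(x)   # same definition with roles swapped
# step 6: P(X|Y) = P(Y|X) * P(X) / P(Y)
assert abs(p_x_given_y - p_y_given_x * p_x(x) / p_y(y)) < 1e-12
print(p_x_given_y)  # 0.4
```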

 

Five: parameter estimation and model training

The strategy used for parameter estimation in model training is maximum likelihood estimation.

Maximum likelihood estimation infers the probabilities behind behavior from offline data, including the features. The larger the amount of training data, the more accurate the resulting parameters, and the closer they are to reality. With too little training data, a single mislabeled article becomes noise and easily disturbs the estimates.

 

Conditional probability formula:
P(xj | yi) = P(xj, yi) / P(yi)
P(xj | yi) is not computed directly; instead it is converted into a maximum likelihood estimation problem and solved by counting:

 

For NB to perform classification, two classes of parameters are needed:

1. the prior probabilities P(yi)

2. the conditional probabilities P(X | yi)

These parameters must be computed; the parameters are the model.

Prior probability:
    P(yi) is computed exactly by counting.

Conditional probability:
    numerator: the number of occurrences of the word "Google" in military articles
    denominator: the total number of words in all military articles
    p(x = "Google" | y = "military") = numerator / denominator

There are two ways to estimate P(xj | yi):
the first counts articles, the second counts words.
The first:
    numerator: the number of military articles containing the word "Google"
    denominator: the number of military articles
    p(x = "Google" | y = "military") = numerator / denominator

The second (recommended):
    numerator: the number of occurrences of the word "Google" in military articles
    denominator: the total number of words in all military articles
    p(x = "Google" | y = "military") = numerator / denominator

Training naive Bayes yields a model that works well in practice; the essence of the model is a collection of probabilities.
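The recommended word-count estimator can be written down directly; the toy corpus below is an illustrative assumption.

```python
# MLE of P(word | class) by word counts (the recommended second method).
from collections import Counter, defaultdict

corpus = [  # (segmented words, label): assumed toy data
    (["warship", "cannon", "warship"], "military"),
    (["market", "stock"], "finance"),
    (["cannon", "aircraft"], "military"),
]

word_counts = defaultdict(Counter)  # class -> word frequencies (numerators)
total_words = Counter()             # class -> total word count (denominators)
for words, label in corpus:
    word_counts[label].update(words)
    total_words[label] += len(words)

def p_word_given_class(word, label):
    return word_counts[label][word] / total_words[label]

print(p_word_given_class("warship", "military"))  # 2 of 5 words -> 0.4
```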

 

Seven: model evaluation

Evaluating the model's effect requires the confusion matrix (confusion table):

Evaluation generally targets the positive class. The confusion table applies only to binary classification; for multi-class problems, designate one class as positive and lump the rest together as "other".

From the model's perspective, the confusion counts for class y1 are TP = 50, FP = 5, FN = 10, TN = 35 (original figure not reproduced):

 

Accuracy: (50 + 35) / (50 + 5 + 10 + 35) = 85% [the number of correctly predicted samples divided by the total]

Precision(y1): 50 / (50 + 5) = 90.9%

Recall(y1): 50 / (50 + 10) = 83.3%

Precision: the article labels are unknown beforehand; out of 100 articles the model predicts 55 as military, but only 50 of those really are, so the model's precision is 50/55.

Recall: there are 60 military articles in total and the model identifies 50 of them, missing 10; those 10 missed articles are lost.
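All three metrics can be reproduced from the confusion counts used above (TP = 50, FP = 5, FN = 10, TN = 35 for the positive class):

```python
# Accuracy, precision, and recall from the confusion counts in the example.
tp, fp, fn, tn = 50, 5, 10, 35

accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct / total
precision = tp / (tp + fp)                   # 50 / 55
recall = tp / (tp + fn)                      # 50 / 60

print(round(accuracy, 3), round(precision, 3), round(recall, 3))
# 0.85 0.909 0.833
```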

 

In a recommender system: out of 100 high-quality articles, only 90 enter the candidate queue; the other 10 are left out, i.e. not recalled.

In practical applications, different stages emphasize different metrics: in a recommender system, the index/retrieval stage focuses on recall, while the ranking model focuses on precision.

From precision and recall one can plot a PR curve [P: precision, R: recall]. On a PR curve, high precision generally comes with low recall, and high recall with low precision.

 

The PR curve is one evaluation tool, used to help choose a threshold. Next, let's look at another evaluation tool, the ROC curve.

 

Vertical axis: true positive rate (recall), TP / (TP + FN); horizontal axis: false positive rate, FP / (FP + TN).

So what is the ROC curve for? In fact, the ROC curve is drawn in order to obtain the AUC, i.e. the area under the ROC curve.

The ROC curve is not the real goal; the AUC is.

Since the whole plot has area 1, the AUC (the area under the ROC curve) must be a value between 0 and 1.

In other words, the point of the ROC curve is to get the AUC.

But computing the AUC this way is quite cumbersome.

 

Another way to interpret the AUC: it is the probability that, when samples are sorted by score in ascending order, a negative sample comes before a positive sample (equivalently, that a positive sample scores higher than a negative one). It can be computed with awk:

cat auc.raw | sort -t$'\t' -k2g |awk -F'\t' '($1==-1){++x;a+=y;}($1==1){++y;}END{print 1.0-a/(x*y);}'

x * y is the number of (negative, positive) sample pairs

a is the number of wrongly ordered pairs

a / (x * y) is the probability of a wrong ordering

1 - a / (x * y) is the probability of a correct ordering, i.e. the AUC
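The awk one-liner above can be mirrored in Python: sort by score ascending, and for every negative sample count the positives already seen (those are the wrongly ordered pairs, accumulated in a). The sample data is an illustrative assumption; score ties are not handled, just as in the awk version.

```python
# Pair-counting AUC, mirroring the awk one-liner above.
def auc(samples):
    # samples: list of (label in {+1, -1}, score)
    x = y = a = 0
    for label, _ in sorted(samples, key=lambda s: s[1]):
        if label == -1:
            x += 1
            a += y  # positives scored below this negative: wrong pairs
        else:
            y += 1
    return 1.0 - a / (x * y)

print(auc([(-1, 0.1), (-1, 0.3), (1, 0.4), (1, 0.9)]))  # 1.0 (perfect ranking)
print(auc([(-1, 0.1), (1, 0.2), (-1, 0.3), (1, 0.9)]))  # 0.75 (1 of 4 pairs wrong)
```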

Example 1:

A scores 0.1, B scores 0.9

Assumption: any two samples should appear in ascending score order; if a pair is in that order it counts as correct, otherwise it is wrong.

(A, B): correct

(B, A): wrong

If all negative samples come before all positive samples, the AUC is 100%.

If the model gives a sample a low score, the model considers it a negative example; if the sample truly is negative, it should come before the positives with high probability.

 

 

Example 2:

+1 represents military, -1 represents finance

1. Take a sample whose true label is military [+1]; the model then scores it.

2. The model assigns a score to each article it predicts; this is the predicted score.

3. From the predicted score, the predicted label is derived, e.g. military.

4. If the prediction matches the truth (both military), the model predicted successfully.

5. If the prediction is finance, the model got it wrong.

6. True label +1, predicted label -1: a mistake.

7. The model scores your article; the higher the score, the closer to +1, and the lower, the closer to -1. A score above some threshold is a positive example, otherwise a negative example.

8. The predicted label is controlled by the threshold, which is set manually. With the default threshold of 0, every article the model scores above 0 is a positive example, and everything at or below 0 is a negative example.
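Steps 7 and 8 (turning scores into labels with a manually set threshold) can be sketched as follows; the scores are illustrative.

```python
# Converting model scores to predicted labels with a threshold (default 0),
# as in steps 7-8 above.
def predict_label(score, threshold=0.0):
    return 1 if score > threshold else -1

scores = [0.8, -0.3, 0.0, 0.2]
print([predict_label(s) for s in scores])  # [1, -1, -1, 1]
```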

 

If the model predicts correctly, the samples near the front (low scores) are with high probability -1.

In the perfect case, where the model predicts everything correctly, all the -1 samples come first and all the +1 samples after.

Example 3: the worst case (and why the error ratio a/(x*y) can be at most 1):

100 samples

In the extreme case: the first 50 (lowest-scoring) samples are +1, followed by 50 that are -1

Total: 50 * 50 = 2500 pairs

y = 50

x = 50

a = 50 * 50 = 2500

Finally: 1 - a / (x * y) = 0, i.e. 0 is the worst-case AUC. In practice, however, the worst result is usually around 0.5 [an AUC of 0.5 means no discriminative ability at all].

AUC applies only to binary classification: for multi-class problems, compute a separate AUC for each class (one vs. the rest).


Origin www.cnblogs.com/chen8023miss/p/11302807.html