Review - generative vs discriminative models - directed vs undirected graphs

Intuitive understanding

A generative model, once trained, gives you p(x | y). In essence a generative model goes old samples -> model -> new samples: it learns some probability distribution from which it can generate samples of its own, which is very powerful.

A discriminative model is the kind we use most often; it is only used for making decisions and cannot generate new samples.

Case 1: image classification

Take image classification as an example. Suppose we have two piles of pictures: kittens and puppies.

For the generative model, it builds a model of cats and a model of dogs during training; when a new picture arrives it can tell whether it is a cat or a dog, and because it has tried to memorize what each variety looks like, this model can also generate new pictures.

For the discriminative model, it only needs to remember the differences between the two classes to make a judgment; it does not care what each variety actually looks like in detail.

Case 2: modeling height

Generative: \(P(x, y) = P(y) P(x|y)\), which can generate x from y

Discriminative: \(P(y|x)\)

Student Height:

Male: 170, 172, 174, 176, 168

Female: 160, 162, 164, 166

Assuming heights follow a Gaussian, we can estimate from the samples:

\(P(height \mid M) \thicksim N(172, \sigma)\)

\(P(height \mid F) \thicksim N(163, \sigma)\)

Once the generative model has this probability distribution, it can sample from it to produce brand-new x.
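A minimal sketch of that pipeline in Python (the shared \(\sigma\) value below is an assumption picked just for illustration): first estimate \(p(y)\) and \(p(x \mid y)\) from the samples, then draw y, then draw x.

```python
import random
import statistics

# Heights from the example above
male = [170, 172, 174, 176, 168]
female = [160, 162, 164, 166]

# Estimate the class-conditional Gaussians p(height | gender)
mu_m, mu_f = statistics.mean(male), statistics.mean(female)  # 172, 163
sigma = 5.0  # assumed shared standard deviation, for illustration only

# Class prior p(y) from the sample counts
p_m = len(male) / (len(male) + len(female))

# Generate a brand-new sample: draw y ~ p(y), then x ~ p(x | y)
y = "M" if random.random() < p_m else "F"
x = random.gauss(mu_m if y == "M" else mu_f, sigma)
print(y, round(x, 1))
```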

Of course, I think the more intuitive view is still from the formulas: the generative model \(p(x, y) = p(x) p(y|x)\) has one extra term \(p(x)\) compared with the discriminative model \(p(y|x)\). That is, the generative model not only makes the judgment but also models the distribution of the samples, while the discriminative model only discriminates.

In general, for classification, discriminative models > generative models, because the discriminative model is more focused. But when there is little data, generative models may be > discriminative models.

The generative model's \(p(x)\) can be understood as a prior, and priors are linked to regularization: an L1 regularizer corresponds to a Laplace prior, and an L2 regularizer corresponds to a Gaussian prior. In other words, each prior is, in a sense, equivalent to adding a regularization term to our model. This explains why, when data is scarce, the generative model's p(x) acts as a constraining regularization term and can reduce overfitting.
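To make the prior / regularization link concrete, here is the usual MAP argument (a standard fact, written in this post's notation): \(\hat{w} = \arg\max_w \, \log p(D \mid w) + \log p(w)\). If the prior is Gaussian, \(p(w) \thicksim N(0, \tau^2)\), then \(\log p(w) = -\frac{1}{2\tau^2} \|w\|^2 + const\), which is exactly an L2 penalty added to the log-likelihood; a Laplace prior gives the L1 penalty in the same way.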

Applications of generative models: image generation, chatbots, robots that write code, poetry, songs, ...

Directed graph vs undirected graph

Another quick review, this time from the point of view of computing the joint probability.

Directed graph

1 -> 2

3 -> 2

2 -> 4

5 -> 4

According to these dependencies, the joint probability can be computed with conditional probabilities:

\(p(1,2, 3, 4, 5) = p(1) p(3) p(2|1, 3) p(5) p(4|2, 5)\)
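A small sketch of using this factorization with made-up binary probability tables (all the numbers below are assumptions, only to show how the factors combine; the sum over all configurations comes out to 1, as it should):

```python
from itertools import product

# Made-up conditional probability tables for binary variables 1..5
p1 = {0: 0.6, 1: 0.4}                      # p(1)
p3 = {0: 0.7, 1: 0.3}                      # p(3)
p5 = {0: 0.5, 1: 0.5}                      # p(5)
p2_given_13 = {(0, 0): 0.9, (0, 1): 0.6,   # p(2 = 1 | 1, 3)
               (1, 0): 0.4, (1, 1): 0.2}
p4_given_25 = {(0, 0): 0.8, (0, 1): 0.5,   # p(4 = 1 | 2, 5)
               (1, 0): 0.3, (1, 1): 0.1}

def joint(x1, x2, x3, x4, x5):
    """p(1,2,3,4,5) = p(1) p(3) p(2|1,3) p(5) p(4|2,5)"""
    p2 = p2_given_13[(x1, x3)] if x2 == 1 else 1 - p2_given_13[(x1, x3)]
    p4 = p4_given_25[(x2, x5)] if x4 == 1 else 1 - p4_given_25[(x2, x5)]
    return p1[x1] * p3[x3] * p2 * p5[x5] * p4

print(joint(1, 0, 1, 1, 0))                               # one configuration
print(sum(joint(*x) for x in product([0, 1], repeat=5)))  # sums to 1.0
```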

Undirected graph

1 - 2

3 - 2

1 - 3

2 - 4

4 - 5

Here there are no explicit conditional-probability relationships, so instead we use score (or energy) functions: partition the graph into several cliques (max cliques) and compute a score for each:

\(P(1, 2, 3, 4, 5) = \frac{1}{Z} \phi_1(1, 2, 3) \phi_2(2, 4) \phi_3(4, 5)\)

Then normalize to turn the scores into a probability.

We can also do a pairwise division:

\(P(1, 2, 3, 4, 5) = \frac{1}{Z} \phi_1(1, 3) \phi_2(1, 2) \phi_3(2, 3) \phi_4(2, 4) \phi_5(4, 5)\)
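A toy sketch of this pairwise factorization, with arbitrary potential tables (the potential values are assumptions); the normalizer Z is computed by brute force over all binary configurations:

```python
from itertools import product

# One shared potential table phi(u, v) for every edge; the values are arbitrary assumptions
phi = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 3.0}
# Edges (1,3), (1,2), (2,3), (2,4), (4,5) written with 0-based indices
edges = [(0, 2), (0, 1), (1, 2), (1, 3), (3, 4)]

def unnormalized(x):
    """Product of pairwise potentials: phi_1(1,3) phi_2(1,2) ... phi_5(4,5)"""
    s = 1.0
    for u, v in edges:
        s *= phi[(x[u], x[v])]
    return s

# Partition function Z: brute-force sum over all 2^5 binary configurations
Z = sum(unnormalized(x) for x in product([0, 1], repeat=5))

x = (1, 1, 0, 1, 0)
print(unnormalized(x) / Z)   # normalized probability p(x)
```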

The core of undirected graph models is how to define the score function; different score functions give different results.

Example: defining a score

Consider an undirected graph: a triangle with three vertices a, b, c.

To make it more intuitive, suppose a, b, c represent three different people, and the task is to judge how competitive these three buddies would be as a basketball team (a probability between 0 and 1).

Then we do some feature extraction:

  • whether a, b, c belong to the same institute -> extract feature -> f1
  • whether a, b, c play different positions -> extract feature -> f2
  • how often a, b, c have played together -> extract feature -> f3

Then define score(a, b, c); different models actually define this differently, for example:

\(score(a, b, c) = w_1 f_1 + w_2 f_2 + w_3 f_3\)

It can also be defined the way models like CRF do:

\(\log score(a, b, c) = \sum\limits_i w_i f_i\)

This kind of model is the typical log-linear model: logistic regression, CRF, and the like.
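A sketch of both score definitions for the basketball example (the feature values and weights below are made up):

```python
import math

# Hypothetical feature values for one particular triple (a, b, c):
# f1 = same institute (0/1), f2 = play different positions (0/1),
# f3 = number of games played together
f = [1.0, 1.0, 5.0]
w = [0.8, 1.2, 0.1]          # made-up weights

# Linear score, as in the first definition above
linear_score = sum(wi * fi for wi, fi in zip(w, f))

# CRF-style / log-linear definition: log score = sum_i w_i f_i
log_linear_score = math.exp(sum(wi * fi for wi, fi in zip(w, f)))

print(linear_score, log_linear_score)
```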

For undirected graphs, everything depends on how you define the score function; different definitions train different models.

  • Naive Bayes is a directed graph (joint / conditional probabilities); converted to an undirected graph it becomes Logistic Regression (see the sketch after this list)
  • HMM (Hidden Markov Model) is a directed graph; converted to an undirected graph it becomes a Linear-chain CRF
  • Bayesian network -> undirected -> General CRF
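One way to see the Naive Bayes / Logistic Regression pair concretely: with Gaussian class-conditionals sharing one variance, the generative posterior p(y|x) collapses to a sigmoid of a linear function of x, i.e. exactly the logistic regression form. A tiny sketch reusing the height example above (the shared sigma is again an assumption):

```python
import math

# Parameters from the height example; shared sigma is an assumption
mu_m, mu_f, sigma = 172.0, 163.0, 5.0
prior_m = 5 / 9                      # 5 male samples out of 9

def posterior_male_generative(x):
    """Generative route: p(M|x) proportional to p(x|M) p(M), Gaussian class-conditionals."""
    def lik(mu):
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
    num = lik(mu_m) * prior_m
    return num / (num + lik(mu_f) * (1 - prior_m))

def posterior_male_logistic(x):
    """The same posterior rewritten as a sigmoid of a linear function of x."""
    w = (mu_m - mu_f) / sigma ** 2
    b = (mu_f ** 2 - mu_m ** 2) / (2 * sigma ** 2) + math.log(prior_m / (1 - prior_m))
    return 1 / (1 + math.exp(-(w * x + b)))

print(posterior_male_generative(168), posterior_male_logistic(168))  # identical values
```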

For undirected graphs there is in fact no absolute way to partition into blocks; it also depends on how you define things. For example, I could split the same graph above into pairwise form.

There is no fixed way; it depends entirely on our own understanding and assumptions. This is really a feature-extraction process, and here 1, 2, 3, 4, 5 can be seen as examples (samples), not feature variables.

Originally this post was meant to go from HMM -> CRF (Conditional Random Fields), those sequence models, but after trying a round I feel I cannot quite handle the derivation yet, mainly because my probability background is a bit weak, unlike calculus and linear algebra which I can handle comfortably... For now I'll just get proficient with the APIs, leave myself a hole here, and fill it in later when I actually need it...


Source: www.cnblogs.com/chenjieyouge/p/12149314.html