How can we distinguish between generative and discriminative models?

Generative models and discriminative models

Basic concepts

The goal of supervised learning is to learn a model that, for a given input, produces a particular output so that the category of the data can be predicted. Such a model can be called a classifier. The model normally corresponds to a function Y=f(X) or a conditional probability P(Y|X) (in mathematical statistics, X is a random variable and x is a sample).

For a decision function of the form Y=f(X), a threshold is set to make the judgment.

For a conditional probability distribution P(Y|X), the probability of belonging to each class is computed, and the class with the maximum probability is selected; this completes the discrimination.
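The two decision rules above can be contrasted in a minimal sketch. Everything here is a hypothetical toy classifier (the linear score and the sigmoid squashing are illustrative choices, not anything from the post):

```python
import math

def f(x):
    """Decision function: a raw score that will be thresholded."""
    return 2.0 * x - 1.0                  # toy linear score

def predict_by_threshold(x, threshold=0.0):
    """Rule 1: compare the decision function against a threshold."""
    return 1 if f(x) > threshold else 0

def p_y_given_x(x):
    """Rule 2 setup: a toy conditional distribution over two classes."""
    p1 = 1.0 / (1.0 + math.exp(-f(x)))    # squash the score into a probability
    return {0: 1.0 - p1, 1: p1}

def predict_by_probability(x):
    """Rule 2: pick the class with the largest P(Y|X)."""
    probs = p_y_given_x(x)
    return max(probs, key=probs.get)
```

Because the sigmoid is monotone in the score, the two rules agree here, which illustrates why the text calls them essentially the same.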

The connection between the two:

In fact, the two are essentially the same.

When the network is used in the form Y=f(X), training with an MSE objective drives the network output as close as possible to the true label Y (commonly one-hot encoded), which is actually a maximum-likelihood idea. For a given pair (X, Y), training the network so that the output approaches the true label Y (i.e., maximizing the probability of its occurrence) pushes P(Y|X) toward 1 or 0 (in likelihood terms: whichever event actually occurred has its corresponding probability driven toward the extreme). Therefore the output here is actually P(Y|X).
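The claim that an MSE-trained output is really P(Y|X) can be checked with a toy simulation (the Bernoulli probability 0.7 and the sample size are hypothetical values, not from the post): for binary labels, the constant minimizing the mean squared error against Y at a fixed X is the sample mean of Y, which estimates P(Y=1|X).

```python
import random

random.seed(0)

# Simulate labels Y ~ Bernoulli(0.7) at one fixed input x.  The output c
# that minimizes sum((c - y)^2) over the samples is their mean, so the
# MSE-optimal output should approach P(Y=1|X=x) = 0.7.
true_p = 0.7
labels = [1 if random.random() < true_p else 0 for _ in range(100_000)]
mse_optimal_output = sum(labels) / len(labels)   # the sample mean minimizes MSE
```

With 100,000 samples the estimate lands very close to 0.7, matching the maximum-likelihood reading of MSE training in the text.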

Of course, even when the network takes the form P(Y|X), it is essentially still being used as a function.

Generative methods and discriminative methods

  Supervised learning methods are divided into the generative approach (Generative approach) and the discriminative approach (Discriminative approach); the models they learn are called generative models and discriminative models, respectively.

Discriminative models

  A discriminative model learns the decision function (Y=f(X) or P(Y|X)) directly from the data. Typical discriminative models include k-nearest neighbors, support vector machines, and decision trees. Discriminative models focus only on how to classify (how to map the given data into a feature space that separates the classes and find the optimal classification surface); they mainly reflect the differences between classes. Because they model the prediction directly, they tend to be efficient and to perform well.
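A minimal sketch of the discriminative idea is 1-nearest-neighbor: it never models how the data was generated, it only maps an input to a class by finding the closest training point (the 1-D training points here are hypothetical toy values):

```python
# Toy 1-D training set: (feature value, class label).
train = [(0.1, 'a'), (0.2, 'a'), (0.9, 'b'), (1.1, 'b')]

def predict_1nn(x):
    """Discriminate directly: return the label of the nearest training point."""
    nearest_x, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label
```

Note that nothing here estimates P(X) or P(X,Y); only the class boundary (implicitly, the midpoint between the clusters) matters.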

Generative models

  A generative model learns the joint probability distribution P(X,Y) from the data (a probability density function from which one could sample to produce more data), and then obtains P(Y|X) via Bayes' formula as the prediction model, i.e., the generative model: P(Y|X)=P(X,Y)/P(X). In theory a generative model needs an unlimited number of samples to reach its theoretical prediction, because estimating P(X) reliably requires a large number of samples. Typical generative models include naive Bayes and hidden Markov models. Generative models focus on the data itself, rather than on the optimal classification boundary as discriminative models do. Generative models can also be used with latent (hidden) variables, in which case discriminative models cannot be used.
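The generative recipe — estimate the joint P(X,Y), then discriminate with Bayes' formula P(Y|X)=P(X,Y)/P(X) — can be sketched with simple counting on toy categorical data (the weather/activity pairs are hypothetical values):

```python
from collections import Counter

# Toy observations of (X, Y) pairs.
data = [('sunny', 'play'), ('sunny', 'play'), ('rainy', 'stay'),
        ('rainy', 'stay'), ('rainy', 'play'), ('sunny', 'stay')]

joint = Counter(data)          # empirical counts approximate P(X, Y)
n = len(data)

def p_joint(x, y):
    return joint[(x, y)] / n

def p_x(x):
    # Marginalize the joint over Y to get P(X).
    return sum(c for (xx, _), c in joint.items() if xx == x) / n

def p_y_given_x(y, x):
    # Bayes' formula: P(Y|X) = P(X, Y) / P(X).
    return p_joint(x, y) / p_x(x)

def classify(x):
    labels = {y for (_, y) in joint}
    return max(labels, key=lambda y: p_y_given_x(y, x))
```

The classifier is a by-product: the model first describes how (X, Y) pairs occur, which is exactly what a discriminative model skips.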

How generative and discriminative models correspond to deep networks

A deep network can simulate many probability distribution functions.

Discriminative model   The output of a classification network fits P(Y|X). Assume the network's parameters are \phi and the network is trained under the maximum-likelihood principle; the input of the network is X, and the output of the network is P(Y|X). Written as a mathematical expression: P(Y|X)=f_\phi(X).
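The head of such a network, f_\phi(X), can be sketched as a linear layer followed by a softmax, so that the output can be read as P(Y|X) (all entries positive and summing to 1). The weight matrix \phi here is a hypothetical toy value:

```python
import math

# Toy parameters phi: 3 classes, 2 input features.
phi = [[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]]

def f_phi(x):
    """Network head: linear scores + softmax, read as P(Y|X)."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in phi]
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]             # one probability per class
```

The softmax is what licenses the probabilistic reading: the network still just computes a function of X, but its output satisfies the axioms of a distribution over Y.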

Generative model   The network fits P(X,Y), the joint probability density function, and then uses P(Y|X)=P(X,Y)/P(X) to discriminate. The generative model here is a very narrow concept! (Because in supervised learning, the generative model only appears as a way to solve classification.) In practice, a generative model (Generative Model) is a concept from probability, statistics, and machine learning that refers to any model that can randomly generate observable data. A generative model has two basic functions: one is learning a probability distribution, i.e., the density estimation problem; the other is generating data.

In the supervised setting, the typical generative models are: the naive Bayes method, hidden Markov models, and Gaussian mixture models. These models model P(X,Y) directly and then use Bayesian inference to obtain the category the data belongs to. A generative model in the broad sense models the data itself, in order to generate new data (GANs, VAEs, and the like). For example, a VAE generates images by using latent variables: P(X,Z)=P(Z)\times P(X|Z). With the Monte Carlo approximation E[f(X)]=\int f(x)p(x)dx \approx \frac{1}{S}\sum_{s=1}^{S}f(x_s), one finally obtains P(X) \approx P(X|Z), where Z is obtained by sampling.

A generative model is used to generate data, in particular images; where is this reflected? If the network can model X, you get a P(X), and if this P(X) \approx P_{gt}(X), then we can sample from that probability distribution function to obtain new data (note that no labels are involved at this point); at that point we have a generative model of P(X).
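The Monte Carlo approximation used above can be demonstrated on a case with a known answer (the choice of f(x)=x² and X ~ N(0,1) is an illustrative assumption): E[X²] under a standard normal is exactly 1, and the sample average (1/S) Σ f(x_s) should approach it.

```python
import random

random.seed(1)

# Monte Carlo estimate of E[f(X)] = ∫ f(x) p(x) dx ≈ (1/S) Σ f(x_s),
# with f(x) = x^2 and X ~ N(0, 1), whose exact expectation is 1.
S = 200_000
samples = [random.gauss(0.0, 1.0) for _ in range(S)]
estimate = sum(x * x for x in samples) / S
```

This is the same trick the VAE discussion relies on: replace an intractable integral over the latent variable by an average over samples.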

Deep networks modeling probabilities: both cases above are deep networks establishing probability models, but note that the output of the network is not necessarily P(X) or P(Y|X) itself. For example, when P(Y|X) is Gaussian, the network may output (\mu, \sigma). (Understand that what the network models and what the network outputs are not the same thing; do not confuse them!) Moreover, the quantities the network outputs follow the logic of probability transformations: for example, a prior probability is turned into a posterior probability, and so on.
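The distinction between what the network outputs and what it models can be made concrete (the "network" below is a hypothetical stand-in function, not a real model): the network outputs the parameters (\mu, \sigma), while the object it models is the whole Gaussian density P(Y|X)=N(Y; \mu, \sigma).

```python
import math

def network(x):
    """Toy parameter heads: the network *outputs* (mu, sigma), not a probability."""
    mu = 0.5 * x
    sigma = 1.0
    return mu, sigma

def modeled_density(y, x):
    """What the network *models*: the Gaussian density N(y; mu, sigma)."""
    mu, sigma = network(x)
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
```

The network call returns two numbers; evaluating the modeled probability of any y requires plugging those numbers into the density formula — two different things, as the text warns.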

The following uses the VAE as an example to interpret the probabilities modeled by a deep network:

The VAE uses an MLP to fit the true posterior probability P(Z|X); the output of this network is (\mu_1, \sigma_1), so what the network models is Q(Z|X)=N(\mu_1, \sigma_1), and the input of the network is X. This part is called the recognition model. The other half of the network models P(X|Z); its output is \mu_2, while \sigma_2 is manually set to a small value, so the probability distribution function this network models is P(X|Z)=N(\mu_2, \sigma_2). Its input is Z, where Z is sampled from the distribution output by the preceding recognition network (this can also be understood as sampling directly from P(Z)). Thus, logically, the final network outputs P(Z)P(X|Z)=P(X,Z). If we sample only once (since the final result depends on sampling Z, we sample Z only once), we get P(X)\approx P(X|Z). Because when the value of \sigma_2 is very small, the output of the network is essentially \mu_2, and samples drawn from N(\mu_2, \sigma_2) lie very close to \mu_2, the output of the network can be regarded as X itself; there is no need to sample from P(X) again (indeed sampling from P(X) is not possible here, because sampling would require knowing its explicit expression). So the final output of the network approximates X, and what the network models is P(X|Z). That is the specific analysis; for more on the VAE, see my next blog post.
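The generation path described above can be sketched end to end. Everything here is a hypothetical stand-in (a tanh "decoder" instead of a trained MLP), meant only to show the mechanics: sample Z from the prior P(Z), decode it to \mu_2, and because \sigma_2 is fixed very small, a sample from N(\mu_2, \sigma_2) is essentially \mu_2 itself — the decoder output is taken as the generated X.

```python
import math
import random

random.seed(2)

def decoder(z):
    """Stand-in for the MLP that outputs mu_2 given z."""
    return math.tanh(z)

sigma_2 = 1e-3                    # manually set to a very small value

z = random.gauss(0.0, 1.0)        # Z ~ P(Z) = N(0, 1), sampled once
mu_2 = decoder(z)
x_generated = random.gauss(mu_2, sigma_2)   # ≈ mu_2 because sigma_2 is tiny
```

Because \sigma_2 is three orders of magnitude smaller than the spread of the prior, the single sample x_generated is indistinguishable from \mu_2, which is why the text treats the decoder's output directly as the generated X.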

There will surely be problems in the discussion above; I hope you will point them out, and I will try to correct them!


Reproduced from: https://juejin.im/post/5cfdc86751882562067bb09f


Origin: blog.csdn.net/weixin_33724046/article/details/93182153