Generative models and discriminative models
Basic concepts

The goal of supervised learning is to learn, from given input-output pairs, a model that predicts the category of data from a given input. Such a model can be called a classifier. The model normally corresponds to a decision function Y = f(X) or a conditional probability distribution P(Y|X) (in mathematical statistics, X is a random variable and x is a sample).

For the decision-function type, a threshold is set on f(X) for making the judgment.

For the conditional probability distribution P(Y|X), the probability of belonging to each class is computed, and the class with the maximum probability is selected, which completes the discrimination.
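The two prediction styles above can be sketched as follows; this is a minimal illustration with made-up numbers, not code from the original post.

```python
import numpy as np

def predict_by_threshold(f_x, threshold=0.5):
    """Decision-function style: compare a score f(X) against a threshold."""
    return 1 if f_x > threshold else 0

def predict_by_max_prob(p_y_given_x):
    """Conditional-probability style: pick the class with maximum P(Y|X)."""
    return int(np.argmax(p_y_given_x))

print(predict_by_threshold(0.8))             # score above threshold -> class 1
print(predict_by_max_prob([0.1, 0.7, 0.2]))  # class 1 has the highest probability
```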
The connection between the two:

In fact, the two are essentially the same.
When the network uses a softmax output, the training objective makes the network output as close as possible to the true label (commonly one-hot encoded). This is in fact the idea of maximum likelihood: for a given input x, training the network so that its output approaches the true label y (maximizing the probability of its occurrence) means maximizing P(Y = y | X = x), or equivalently log P(Y = y | X = x) (the likelihood here is the probability of the observed event; the parameters are chosen so that the probability of the event that actually occurred reaches its extremum). Therefore, the output of the network here is actually P(Y|X).
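The equivalence just described can be checked numerically: with a one-hot label, the cross-entropy loss equals the negative log-likelihood -log P(Y = y | X), so minimizing it is exactly maximum likelihood. The probability vector below is made up for illustration.

```python
import numpy as np

p = np.array([0.1, 0.7, 0.2])         # network output after softmax: P(Y|X)
y_onehot = np.array([0.0, 1.0, 0.0])  # true label (class 1), one-hot encoded

cross_entropy = -np.sum(y_onehot * np.log(p))
neg_log_likelihood = -np.log(p[1])    # -log P(Y=1|X)

print(np.isclose(cross_entropy, neg_log_likelihood))  # True
```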
Of course, when a threshold (or argmax) is taken over the output, the network is essentially being used only as a decision function f(X).
Generative approach and discriminative approach
Supervised learning methods are divided into the generative approach and the discriminative approach; the models they learn are, correspondingly, generative models and discriminative models.
Discriminative models
A discriminative model learns a decision function f(X) or a conditional probability P(Y|X) directly from the data. Typical discriminative models include k-nearest neighbors, support vector machines, and decision trees. A discriminative model cares only about how to classify: how to map the given data into a feature space and find the optimal classification surface separating the classes. The model mainly reflects the differences between categories. Because a discriminative model models the prediction P(Y|X) (or f(X)) directly, it is efficient and its results are usually better.
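As a concrete example of a discriminative method listed above, here is a minimal k-nearest-neighbors sketch: it maps a point straight to a label without ever modeling P(X, Y). The toy data is made up for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest points
    return int(np.bincount(nearest).argmax())     # majority vote

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.05, 0.1])))  # -> 0
print(knn_predict(X_train, y_train, np.array([0.95, 1.0])))  # -> 1
```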
Generative models
A generative model learns the joint probability distribution P(X, Y) from the data (a probability density from which more data can be sampled), and then uses Bayes' formula to obtain the prediction model, i.e., the generative model: P(Y|X) = P(X, Y) / P(X). A generative model needs a very large number of samples to approach its theoretical prediction, because estimating P(X, Y) reliably requires a lot of data. Typical generative models include naive Bayes and hidden Markov models. A generative model focuses on the data itself, rather than, as a discriminative model does, on the optimal classification boundary. Generative models can also be used with hidden (latent) variables, in which case discriminative models cannot be used.
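The generative route can be sketched in a few lines: estimate P(Y) and P(X|Y) from counts, then apply Bayes' formula P(Y|X) ∝ P(X|Y) P(Y) to classify. The binary toy data below is made up for illustration.

```python
import numpy as np

X = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # a single binary feature
Y = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # class labels

def predict(x):
    scores = []
    for c in (0, 1):
        prior = np.mean(Y == c)               # P(Y=c)
        likelihood = np.mean(X[Y == c] == x)  # P(X=x | Y=c)
        scores.append(prior * likelihood)     # proportional to P(X=x, Y=c)
    return int(np.argmax(scores))

print(predict(0))  # -> 0, since P(X=0, Y=0) = 0.5 * 0.75 > P(X=0, Y=1) = 0
print(predict(1))  # -> 1
```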
Correspondence between generative and discriminative models in deep networks
A deep network can approximate many kinds of probability distribution functions.
A discriminative classification network fits P(Y|X). Suppose the network parameters are θ; training under the maximum-likelihood principle, the network input is X and the network output is P(Y|X; θ). Written as a mathematical expression: θ̂ = argmax_θ Σ_i log P(y_i | x_i; θ).
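A tiny softmax classifier trained by gradient ascent on Σ_i log P(y_i | x_i; θ) makes this concrete; the data, learning rate, and iteration count are illustrative, not from the original post.

```python
import numpy as np

X = np.array([[0.0], [0.2], [0.8], [1.0]])
y = np.array([0, 0, 1, 1])
W = np.zeros((1, 2))  # theta: one feature, two classes
b = np.zeros(2)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for _ in range(500):
    P = softmax(X @ W + b)           # P(Y | X; theta)
    grad_logits = np.eye(2)[y] - P   # gradient of the log-likelihood w.r.t. logits
    W += 0.5 * X.T @ grad_logits     # gradient ascent step
    b += 0.5 * grad_logits.sum(axis=0)

print(softmax(X @ W + b).argmax(axis=1))  # -> [0 0 1 1]
```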
A generative network fits P(X, Y), the joint probability density, and then uses P(Y|X) = P(X, Y) / P(X) to discriminate. The "generative model" here is a very narrow concept!!! (because only supervised learning is considered, where the generative model is used to solve classification).

In practice, a generative model (Generative Model) is a concept from probability, statistics, and machine learning that refers to a model from which observed data can be randomly generated. A generative model has two basic functions: one is to learn a probability distribution, namely the density-estimation problem; the other is to generate data. In supervised learning, the typical generative models are naive Bayes, hidden Markov models, and Gaussian mixture models. These models model the joint distribution P(X, Y) directly, and Bayesian inference then yields the category the data belongs to.

A generative model in the broad sense models the data itself, P(X), in order to generate new data (GAN, VAE, and the like). E.g., VAE generates images using the latent-variable form P(X) = ∫ P(X|Z) P(Z) dZ. A Monte Carlo approximation finally gives P(X) ≈ (1/m) Σ_i P(X | z_i), where the z_i are sampled from P(Z). A generative model is used to generate data, particularly images; where is this reflected? If the network can model P(X), we obtain a probability distribution over the data itself, and we can then sample from this probability distribution function to obtain new data (note that no label Y is involved here); at that point we have a generative model of P(X).
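The Monte Carlo approximation of P(X) = ∫ P(X|Z) P(Z) dZ above can be tried directly; the choices P(Z) = N(0, 1) and P(X|Z) = N(z, 0.1²) below are toy assumptions, not from the post. With these choices X is marginally N(0, 1 + 0.1²), so the estimate can be compared against the exact marginal density.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def estimate_p_x(x, m=100_000):
    z = rng.standard_normal(m)                  # z_i sampled from the prior P(Z)
    return gaussian_pdf(x, mu=z, sigma=0.1).mean()  # (1/m) sum_i P(x | z_i)

exact = gaussian_pdf(0.0, mu=0.0, sigma=np.sqrt(1 + 0.01))
print(abs(estimate_p_x(0.0) - exact) < 0.02)    # True, up to Monte Carlo noise
```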
Deep networks modeling probabilities: both of the above use a deep network to model a probability, but note that the output of the network is not necessarily P(Y|X) or P(X) itself. For example, when the modeled distribution is a Gaussian, the network output may be its mean. (Understand that what the network models and what the network outputs are not the same thing! Do not confuse them!!) The probabilities involved can also be transformed by the usual probabilistic logic, e.g., a prior probability combined with a likelihood forms a posterior probability, and so on.
The following uses the VAE as an example to interpret how deep networks model probabilities:
In a VAE, an MLP is used to fit the true posterior P(Z|X): what the network models is q(Z|X), and the outputs of the network are the mean μ and variance σ² of this Gaussian. This part of the model is called the recognition model. The other half of the network models P(X|Z): the network outputs its mean, while its standard deviation is manually set to a small value, so the probability distribution this part finally models is P(X|Z). The input of this network is Z, sampled from the q(Z|X) output by the preceding recognition network (this can be understood as directly sampling from q(Z|X)). Thus the final output of the network logically corresponds to E_{Z ~ q(Z|X)} P(X|Z). If Z is sampled only once (since the final result depends on the sampling of Z, here Z is sampled just once), then we get P(X|z). Because the standard deviation of P(X|Z) is set very small, a sample from it is very close to its mean, so the output of the network is taken to be the reconstruction of X itself, and X is not sampled again (can we not sample at this point?? Because sampling requires knowing the specific expression??). The final output of the network is thus approximately X, and what the network models is P(X|Z). The above is the concrete analysis; for more about VAE, see my next blog post.
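The structure just described can be sketched as a single forward pass; the weights below are untrained random matrices and the layer sizes are made up, so this shows only the data flow, not a working VAE.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 2                     # data dimension, latent dimension (illustrative)

# Recognition model q(Z|X): the network outputs the mean and log-variance of Z.
W_mu = rng.normal(size=(D, H))
W_logvar = rng.normal(size=(D, H))

# Decoder P(X|Z): the network outputs the mean of X; sigma is fixed and small.
W_dec = rng.normal(size=(H, D))
sigma_dec = 0.01

x = rng.normal(size=D)

mu, logvar = x @ W_mu, x @ W_logvar                      # q(Z|X) parameters
z = mu + np.exp(0.5 * logvar) * rng.standard_normal(H)   # sample Z once
x_mean = z @ W_dec                                       # mean of P(X|Z)

# Because sigma_dec is tiny, a sample from N(x_mean, sigma_dec^2 I) is almost
# exactly x_mean, so the decoder mean itself is taken as the reconstruction.
x_sample = x_mean + sigma_dec * rng.standard_normal(D)
print(np.allclose(x_sample, x_mean, atol=0.1))  # True
```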
There will surely be many problems in the discussion above; I hope you will point them out, and I will try to correct them!!!!
Reproduced from: https://juejin.im/post/5cfdc86751882562067bb09f