Probabilistic graphical model (PGM)

Excerpted from various sources; these are only my own study notes. No offense intended — the material will be removed if it infringes!

Probabilistic graphical models fall into two categories: Bayesian networks (Bayesian Network) and Markov random fields (Markov Network). A Bayesian network can be represented with a directed graph structure, while a Markov random field can be represented with an undirected graph structure.
More concretely, probabilistic graphical models include the naive Bayes model, the maximum entropy model, hidden Markov models, conditional random fields (CRFs), topic models, and so on; they are widely applied in many machine learning scenarios.
Probabilistic graphical models

Probabilistic graphical models are used very widely and successfully in practice (including in industry). Here are some examples. The hidden Markov model (HMM) is a pillar of speech recognition; the Gaussian mixture model (GMM) and its variant k-means are the most basic models for data clustering; conditional random fields (CRF) are widely used in natural language processing (e.g. part-of-speech tagging, named entity recognition); the Ising model won a Nobel Prize; topic models are heavily used in industry (for example in Tencent's recommendation system); and so on.
 
A core task of machine learning is to dig implicit knowledge out of observed data, and probabilistic graphical models are a very elegant, principled means of achieving this. PGMs cleverly combine graph theory and probability theory.
  From the perspective of graph theory, a PGM is a graph with nodes and edges. Nodes fall into two categories: hidden nodes and observed nodes. Edges can be directed or undirected.
  From the perspective of probability theory, a PGM is a probability distribution: the nodes of the graph correspond to random variables, and the edges correspond to dependence or correlation relations between the random variables.
  Given a real problem, we usually observe some data and hope to dig out the knowledge hidden in it. How do we do this with a PGM? We build a graph: the observed data are represented with observed nodes, the latent knowledge with hidden nodes, and the relations between knowledge and data with the edges, finally obtaining a probability distribution. Once the probability distribution is given, knowledge is acquired by performing two tasks: inference (given the observed nodes, infer the posterior distribution of the hidden nodes) and learning (learning the parameters of the probability distribution). The power of PGMs is that no matter how complex the data and knowledge are, the way we handle them is the same: build a graph, define a probability distribution, and perform inference and learning. This is very important for describing complex practical problems and for building large-scale artificial intelligence systems.
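
As a concrete illustration of the inference task, here is a minimal sketch (my own toy example, not from the excerpted sources), assuming a hypothetical model with one hidden node H and one observed node O and made-up parameter values; the posterior P(H | O) follows from Bayes' rule:

```python
import numpy as np

# Hypothetical parameters for a two-node model: hidden H, observed O.
p_h = np.array([0.7, 0.3])            # prior P(H):       P(H=0), P(H=1)
p_o_given_h = np.array([[0.9, 0.1],   # likelihood P(O|H): rows index H, cols index O
                        [0.2, 0.8]])

# Inference: given the observation O=1, compute the posterior P(H | O=1)
# via Bayes' rule, P(H|O) ∝ P(H) * P(O|H).
o = 1
joint = p_h * p_o_given_h[:, o]       # unnormalized P(H, O=1)
posterior = joint / joint.sum()       # normalize
print(posterior)                      # ≈ [0.226, 0.774]
```

Learning, in this tiny setting, would amount to estimating p_h and p_o_given_h from data, e.g. by simple counting.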
 

In a probabilistic graphical model, the data (samples) are modeled by the expression $G = (V, E)$:

  • $V$ indicates the nodes, i.e., the random variables (here, a node may be a token or a label). Concretely, the random variables are modeled as $Y = (y_1, y_2, \dots, y_n)$; note that $Y$ now stands for a whole batch of random variables (imagine it corresponding to a sequence containing many tokens), and $P(Y)$ is the distribution of these random variables (a small sketch of this notation follows the list);
  • $E$ represents the edges, i.e., the probabilistic dependencies. How exactly to understand or interpret them will be made concrete later, in combination with the graphs of HMM or CRF.
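
As a small sketch of this notation (my own toy example), here is a hypothetical linear-chain graph $G = (V, E)$ for a sequence of four tokens:

```python
# A hypothetical linear-chain graph G = (V, E) over a sequence of 4 tokens.
V = ["y1", "y2", "y3", "y4"]                    # nodes: one random variable per token/label
E = [("y1", "y2"), ("y2", "y3"), ("y3", "y4")]  # edges: probabilistic dependencies (a chain)
G = (V, E)
```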

Probabilistic graphical models can be divided into two types: directed graphs and undirected graphs.

Directed graph vs. undirected graph

As can be seen from the figure, Bayesian networks (belief networks) are directed, while Markov networks are undirected. So Bayesian networks are suited to modeling data with one-way dependencies, while Markov networks are suited to modeling entities that depend on each other. Concretely, their core difference lies in how to compute $P(Y)$, i.e., how to represent the joint probability of $Y = (y_1, \dots, y_n)$.

Directed graph

For a directed graph model, the joint probability is computed like this: $P(x_1, \dots, x_n) = \prod_{i} P(x_i \mid \pi(x_i))$, where $\pi(x_i)$ denotes the parent nodes of $x_i$.

For example, for the random variables in the following directed graph (note that this figure I drew is fairly general):

Their joint probability should be represented as:

$$P(x_1, \dots, x_5) = P(x_1) \cdot P(x_2 \mid x_1) \cdot P(x_3 \mid x_2) \cdot P(x_4 \mid x_2) \cdot P(x_5 \mid x_3, x_4)$$

This should be fairly easy to understand.
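
To make the directed factorization concrete, here is a minimal runnable sketch (my own toy example), assuming the five-variable structure above with binary variables and hypothetical conditional probability tables (the numbers are made up; any valid CPTs would do):

```python
from itertools import product

# Hypothetical CPTs for P(x1,...,x5) = P(x1) P(x2|x1) P(x3|x2) P(x4|x2) P(x5|x3,x4).
p1    = {0: 0.6, 1: 0.4}                                  # P(x1)
p2_1  = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}        # P(x2 | x1)
p3_2  = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}        # P(x3 | x2)
p4_2  = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}        # P(x4 | x2)
p5_34 = {(a, b): {0: 0.5 + 0.1 * (a - b), 1: 0.5 - 0.1 * (a - b)}
         for a, b in product((0, 1), repeat=2)}           # P(x5 | x3, x4)

def joint(x1, x2, x3, x4, x5):
    """Joint probability as the product of the conditionals along the DAG."""
    return (p1[x1] * p2_1[x1][x2] * p3_2[x2][x3]
            * p4_2[x2][x4] * p5_34[(x3, x4)][x5])

# Sanity check: the factorization sums to 1 over all assignments.
total = sum(joint(*xs) for xs in product((0, 1), repeat=5))
print(total)   # 1.0 (up to floating point)
```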

Undirected graph

For undirected graphs, the material I have seen generally means Markov networks (note that this figure I drew is also fairly general).

If a graph is too large, factorization can be used to write $P(Y)$ as a product of several joint probabilities. How to decompose it? The graph is divided into several "cliques", and note that each clique must be a "maximal clique" (one in which any two nodes are connected to each other... fine, I won't explain it in detail; it is somewhat like a maximal connected subgraph). Then we have:

 

         $$P(Y) = \frac{1}{Z(x)} \prod_{c} \psi_c(Y_c)$$

 

where $Z(x) = \sum_{Y} \prod_{c} \psi_c(Y_c)$. The formula should not be hard to understand: the normalization is there so that the result counts as a probability.
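
To make this factorization concrete, here is a minimal runnable sketch (my own toy example, not from the excerpted sources), assuming four binary variables whose maximal cliques are (X1, X3, X4) and (X2, X3, X4), with arbitrary hypothetical potential values (any positive numbers would do); the partition function Z is computed by brute-force summation:

```python
from itertools import product

def psi1(x1, x3, x4):
    return 1.0 + 2.0 * (x1 == x3) + 0.5 * x4      # potential on clique (X1, X3, X4)

def psi2(x2, x3, x4):
    return 1.0 + 1.5 * (x3 == x4) + 0.3 * x2      # potential on clique (X2, X3, X4)

# Partition function Z: sum the product of clique potentials over all assignments.
Z = sum(psi1(x1, x3, x4) * psi2(x2, x3, x4)
        for x1, x2, x3, x4 in product((0, 1), repeat=4))

def p(x1, x2, x3, x4):
    """P(Y) = (1/Z) * psi1(x1, x3, x4) * psi2(x2, x3, x4)."""
    return psi1(x1, x3, x4) * psi2(x2, x3, x4) / Z

print(sum(p(*xs) for xs in product((0, 1), repeat=4)))   # ≈ 1.0
```

Brute-force summation is only feasible for toy graphs like this one; real models compute Z with dynamic programming or approximate inference.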

So, for an undirected graph like the one above:

$$P(Y) = \frac{1}{Z(x)} \big( \psi_1(X_1, X_3, X_4) \cdot \psi_2(X_2, X_3, X_4) \big)$$

where $\psi_c(Y_c)$ is the joint probability of the random variables in one maximal clique $C$, and it is generally taken to be an exponential function:

$$\psi_c(Y_c) = e^{-E(Y_c)} = e^{\sum_k \lambda_k f_k(c, y|c, x)}$$

OK, this thing is called a potential function. Note whether, in $e^{\sum_k \lambda_k f_k(c, y|c, x)}$, you can already see the shadow of CRF.

The joint probability distribution of the probabilistic undirected graph can then be expressed under this factorization as:

$$P(Y) = \frac{1}{Z(x)} \prod_{c} \psi_c(Y_c) = \frac{1}{Z(x)} \prod_{c} e^{\sum_k \lambda_k f_k(c, y|c, x)} = \frac{1}{Z(x)} e^{\sum_c \sum_k \lambda_k f_k(y_i, y_{i-1}, x, i)}$$

Note that the understanding here is quite important; pay attention to the derivation above. Knock on the blackboard: this is where CRF begins!
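
To see that "shadow of CRF" in runnable form, here is a minimal sketch (my own toy example, not from the original post), assuming a toy linear chain over a three-token sequence with binary labels, two hypothetical feature functions f_k, and made-up weights λ_k; the probability is the exponential of the weighted feature sum, normalized by a brute-force Z(x):

```python
import math
from itertools import product

x = ["word_a", "word_b", "word_c"]           # observed sequence (fixed)
labels = (0, 1)                              # possible label values

def features(y_prev, y_cur, x, i):
    """Two toy feature functions f_k(y_{i-1}, y_i, x, i)."""
    return [1.0 if y_prev == y_cur else 0.0,                        # transition feature
            1.0 if (y_cur == 1 and x[i].endswith("_b")) else 0.0]   # emission-like feature

lambdas = [0.8, 1.5]                         # weights lambda_k (hypothetical)

def score(y):
    """sum over positions and features of lambda_k * f_k, with a dummy start label 0."""
    s, y_prev = 0.0, 0
    for i, y_cur in enumerate(y):
        s += sum(l * f for l, f in zip(lambdas, features(y_prev, y_cur, x, i)))
        y_prev = y_cur
    return s

# Normalizer Z(x): brute-force sum over all label sequences.
Z = sum(math.exp(score(y)) for y in product(labels, repeat=len(x)))

def p(y):
    """The CRF-shaped distribution hinted at above: exp(score(Y)) / Z(x)."""
    return math.exp(score(y)) / Z

print(p((0, 1, 0)), sum(p(y) for y in product(labels, repeat=3)))   # second value ≈ 1.0
```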

 
