Multinomial variables (multinomial distribution, Dirichlet distribution)

Probability distributions (2)

Multinomial variables

A binary variable can take only one of two possible values. A variable with \(K\) mutually exclusive states can be represented with the 1-of-\(K\) scheme: a \(K\)-dimensional vector in which exactly one element equals 1 and all the others equal 0.

Take \(K = 6\); then \(\boldsymbol{x}\) and \(\boldsymbol{\mu}\) can be written as:
\[\boldsymbol{x} = (0,0,1,0,0,0)^T \\ \boldsymbol{\mu} = (\mu_1, ..., \mu_K)^T\]
The distribution of \(\boldsymbol{x}\) is then (\(x_k\) denotes the \(k\)-th element of \(\boldsymbol{x}\)):
\[p(\boldsymbol{x} | \boldsymbol{\mu}) = \prod_k \mu_k^{x_k}\]
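The 1-of-\(K\) encoding and the product formula can be sketched in a few lines of Python. The helper names `one_hot` and `categorical_prob` and the parameter values are illustrative assumptions, not part of the original text.

```python
from math import prod

def one_hot(k, K):
    """Return the 1-of-K vector with a 1 at position k (0-based)."""
    return [1 if i == k else 0 for i in range(K)]

def categorical_prob(x, mu):
    """p(x | mu) = prod_k mu_k ** x_k for a 1-of-K vector x."""
    return prod(m ** xi for m, xi in zip(mu, x))

# Illustrative parameters (assumed for this sketch); they sum to 1.
mu = [0.1, 0.2, 0.3, 0.15, 0.15, 0.1]

x = one_hot(2, 6)             # (0, 0, 1, 0, 0, 0), the K = 6 example
p = categorical_prob(x, mu)   # every factor is 1 except mu[2] ** 1
```

Because only one exponent is nonzero, the product simply selects the entry of `mu` that corresponds to the active state.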
The distribution \(p(\boldsymbol{x} | \boldsymbol{\mu})\) can be viewed as a generalization of the Bernoulli distribution to more than two outcomes. It is normalized, and its expectation is \(\boldsymbol{\mu}\):
\[\sum_{\boldsymbol{x}} p(\boldsymbol{x} | \boldsymbol{\mu}) = \sum_k \mu_k = 1 \\ E(\boldsymbol{x} | \boldsymbol{\mu}) = \sum_{\boldsymbol{x}} p(\boldsymbol{x} | \boldsymbol{\mu}) \boldsymbol{x} = \boldsymbol{\mu}\]
Given a data set \(D\) of \(N\) independent observations \(\boldsymbol{x}_1, ..., \boldsymbol{x}_N\), the corresponding likelihood function is:
\[p(D | \boldsymbol{\mu}) = \prod_n \prod_k \mu_k^{x_{nk}} = \prod_k \mu_k^{\sum_n x_{nk}} = \prod_k \mu_k^{m_k}\]
where \(m_k = \sum_n x_{nk}\) is the number of observations with \(x_k = 1\). These counts are called the sufficient statistics of the distribution.
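As a small illustration of why the counts are sufficient, the sketch below computes \(m_k\) from a toy data set and evaluates the log-likelihood from the counts alone. The data set and the function names `counts` and `log_likelihood` are made up for this example.

```python
from math import log

def counts(data):
    """m_k = sum_n x_nk for a list of 1-of-K vectors."""
    return [sum(column) for column in zip(*data)]

def log_likelihood(m, mu):
    """ln p(D | mu) = sum_k m_k * ln(mu_k); terms with m_k = 0 are skipped."""
    return sum(mk * log(muk) for mk, muk in zip(m, mu) if mk > 0)

# Toy data set (assumed): N = 5 observations over K = 3 states.
data = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
]

m = counts(data)                            # [3, 1, 1]
ll = log_likelihood(m, [0.5, 0.25, 0.25])   # depends on data only through m
```

Any other data set with the same counts gives the same likelihood, regardless of the order of the observations.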

Because of the constraint \(\sum_k \mu_k = 1\), the log-likelihood has to be maximized with a Lagrange multiplier. Setting the derivative with respect to \(\mu_k\) to zero gives \(\mu_k\) in terms of \(\lambda\), and substituting into the constraint gives \(\lambda\):
\[\max \sum_k m_k \ln \mu_k + \lambda \left(\sum_k \mu_k - 1\right) \\ \mu_k = -\frac{m_k}{\lambda}, \ \ \ \lambda = -N \\ \mu_k^{MLE} = \frac{m_k}{N}\]
This has essentially the same form as the maximum likelihood result for the Bernoulli distribution.
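The closed-form result can be checked numerically. The following sketch, with assumed counts and hypothetical helper names `mle` and `log_likelihood`, computes \(m_k / N\) and compares its log-likelihood against two other valid parameter vectors.

```python
from math import log

def mle(m):
    """Maximum likelihood estimate mu_k = m_k / N, with N = sum_k m_k."""
    N = sum(m)
    return [mk / N for mk in m]

def log_likelihood(m, mu):
    """ln p(D | mu) = sum_k m_k * ln(mu_k); terms with m_k = 0 are skipped."""
    return sum(mk * log(muk) for mk, muk in zip(m, mu) if mk > 0)

m = [3, 1, 1]          # assumed counts for K = 3, N = 5
mu_mle = mle(m)        # [0.6, 0.2, 0.2]

# Two other parameter vectors on the simplex, chosen arbitrarily.
alternatives = [[1 / 3, 1 / 3, 1 / 3], [0.5, 0.3, 0.2]]
best = log_likelihood(m, mu_mle)
is_best = all(best >= log_likelihood(m, mu) for mu in alternatives)
```

This is only a spot check against a couple of candidates, not a proof; the Lagrangian argument is what establishes the maximum.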

Multinomial distribution

Consider the joint distribution of the counts \(m_1, ..., m_K\), conditioned on the parameters \(\boldsymbol{\mu}\) and the total number of observations \(N\). This is the multinomial distribution:
\[\text{Mult}(m_1, ..., m_K | \boldsymbol{\mu}, N) = \dbinom{N}{m_1 m_2 ... m_K} \prod_k \mu_k^{m_k}\]
The coefficient \(\dbinom{N}{m_1 m_2 ... m_K}\) is the number of ways to partition \(N\) objects into \(K\) groups of sizes \(m_1, ..., m_K\):
\[\dbinom{N}{m_1 m_2 ... m_K} = \frac{N!}{m_1! m_2! ... m_K!}\]
The counts satisfy the constraint \(\sum_k m_k = N\).
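A minimal sketch of the multinomial probability, using only the Python standard library. The function names and the example counts and parameters are assumptions made for illustration.

```python
from math import factorial, prod

def multinomial_coefficient(m):
    """N! / (m_1! * ... * m_K!) with N = sum_k m_k, in exact integer arithmetic."""
    coeff = factorial(sum(m))
    for mk in m:
        coeff //= factorial(mk)   # each intermediate quotient is an integer
    return coeff

def multinomial_pmf(m, mu):
    """Mult(m_1, ..., m_K | mu, N) for a count vector m with N = sum(m)."""
    return multinomial_coefficient(m) * prod(muk ** mk for muk, mk in zip(mu, m))

m = [3, 1, 1]                         # assumed counts, N = 5
mu = [0.6, 0.2, 0.2]                  # assumed parameters, sum to 1

coeff = multinomial_coefficient(m)    # 5! / (3! 1! 1!) = 20
p = multinomial_pmf(m, mu)            # approximately 0.1728
```

Summing `multinomial_pmf` over every count vector with the same \(N\) gives 1, which is a convenient sanity check for small \(N\) and \(K\).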

Dirichlet distribution

The Dirichlet distribution is the conjugate prior of the multinomial distribution. It can also be viewed as the generalization of the Beta distribution to higher dimensions.

Inspecting the form of the multinomial distribution shows that the conjugate prior is:
\[p(\boldsymbol{\mu} | \boldsymbol{\alpha}) \propto \prod_{k=1}^K \mu_k^{\alpha_k - 1}\]
Here \(\boldsymbol{\alpha} = (\alpha_1, ..., \alpha_K)^T\) are the parameters of the distribution, \(\mu_k \in [0,1]\), and \(\sum_k \mu_k = 1\). Because of the summation constraint, the distribution over \(\{\mu_k\}\) is confined to a simplex of dimension \(K-1\).

The normalized form of the Dirichlet distribution is:
\[\text{Dir}(\boldsymbol{\mu} | \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) ... \Gamma(\alpha_K)} \prod_{k=1}^K \mu_k^{\alpha_k - 1}\]
where \(\alpha_0 = \sum_k \alpha_k\).
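The normalized density can be evaluated in log space with `math.lgamma`, which avoids overflow in the Gamma functions. This is a sketch under the assumption that every \(\mu_k\) is strictly positive; the function name `dirichlet_log_pdf` and the test points are illustrative.

```python
from math import exp, lgamma, log

def dirichlet_log_pdf(mu, alpha):
    """ln Dir(mu | alpha), assuming mu lies strictly inside the simplex."""
    # ln of Gamma(alpha_0) / (Gamma(alpha_1) ... Gamma(alpha_K))
    log_norm = lgamma(sum(alpha)) - sum(lgamma(a) for a in alpha)
    # ln of prod_k mu_k ** (alpha_k - 1)
    log_kernel = sum((a - 1) * log(m) for a, m in zip(alpha, mu))
    return log_norm + log_kernel

# alpha = (1, 1, 1) gives a flat density Gamma(3) = 2 over the simplex.
flat = exp(dirichlet_log_pdf([0.2, 0.3, 0.5], [1, 1, 1]))   # approximately 2

# K = 2 reduces to the Beta distribution: Beta(2, 3) at mu = 0.5.
beta_case = exp(dirichlet_log_pdf([0.5, 0.5], [2, 3]))      # approximately 1.5
```

The second call illustrates the Beta connection: with \(K = 2\) the expression becomes \(\frac{\Gamma(5)}{\Gamma(2)\Gamma(3)} \cdot 0.5^1 \cdot 0.5^2 = 1.5\).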

Multiplying the prior by the multinomial likelihood function gives the posterior distribution, up to normalization:
\[p(\boldsymbol{\mu} | D, \boldsymbol{\alpha}) \propto p(D | \boldsymbol{\mu}) p(\boldsymbol{\mu} | \boldsymbol{\alpha}) \propto \prod_k \mu_k^{\alpha_k + m_k - 1}\]
The posterior has the same form as the prior, which confirms that the Dirichlet distribution is indeed a conjugate prior for the multinomial distribution. After determining the normalization coefficient:
\[p(\boldsymbol{\mu} | D, \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1) ... \Gamma(\alpha_K + m_K)} \prod_k \mu_k^{\alpha_k + m_k - 1}\]
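In practice the conjugate update amounts to adding the observed counts to the prior parameters. The sketch below uses assumed values for \(\boldsymbol{\alpha}\) and the counts. The posterior mean \(\alpha_k / \alpha_0\) it reports is a standard property of the Dirichlet distribution that is not derived in this post.

```python
def posterior_alpha(alpha, m):
    """Posterior Dirichlet parameters: alpha_k + m_k for each k."""
    return [a + mk for a, mk in zip(alpha, m)]

def dirichlet_mean(alpha):
    """Mean of Dir(mu | alpha): alpha_k / alpha_0 (standard result, stated without proof)."""
    alpha_0 = sum(alpha)
    return [a / alpha_0 for a in alpha]

alpha = [1, 1, 1]                   # assumed flat prior over K = 3 states
m = [3, 1, 1]                       # assumed counts, N = 5

post = posterior_alpha(alpha, m)    # [4, 2, 2]
post_mean = dirichlet_mean(post)    # [0.5, 0.25, 0.25]
```

The posterior parameters sum to \(\alpha_0 + N\), which matches the argument of the Gamma function in the numerator of the normalized posterior. One common reading is that \(\alpha_k\) acts like a count of prior pseudo-observations of state \(k\).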

Gaussian distribution

The Gaussian distribution involves too much material for this post, so it is covered in a separate chapter.

Origin www.cnblogs.com/LvBaiYang/p/12236935.html