Machine learning notes: the EM algorithm

The EM algorithm, short for Expectation-Maximization, is a fundamental algorithm that underlies methods in many areas of machine learning, such as variational inference, the hidden Markov model (HMM), the LDA topic model, the Gaussian mixture model (GMM), and the pLSA model based on probability and statistics.

Overview of the EM algorithm

    We often need to find the model parameters from observed sample data. The most common approach is to maximize the log-likelihood function of the model distribution.

    In some cases, however, the observations come with hidden (latent) data that we never get to observe. With both the latent data and the model parameters unknown, we cannot directly maximize the log-likelihood to obtain the parameters of the model distribution. This is where the EM algorithm can be used.

    The idea of the EM algorithm is to use a heuristic iterative method. Since we cannot obtain the model parameters of the distribution directly, we first guess the hidden data (the E step of the EM algorithm), and then use the observed data together with the guessed hidden data to maximize the log-likelihood and solve for the model parameters (the M step of the EM algorithm). Because the hidden data was only a guess, the model parameters obtained at this point are generally not yet the ones we want. That is fine: based on the current model parameters we guess the hidden data again (the E step), and then maximize the log-likelihood again to solve for the model parameters (the M step). We iterate in this way until the model parameters of the distribution are essentially unchanged; the algorithm has then converged and suitable model parameters have been found.

    As the description above shows, the EM algorithm is an iterative algorithm for maximum likelihood estimation, and each iteration is divided into two steps: the E step and the M step. Each round of iteration alternately updates the hidden data and the model parameters of the distribution until convergence, giving us the model parameters we need.

    One of the most intuitive illustrations of the EM idea is the K-Means algorithm. In K-Means clustering, the centroid of each cluster is the hidden data. We assume K initial centroids, which plays the role of the E step of the EM algorithm; then, for each sample, we find the nearest centroid and assign the sample to that centroid's cluster, which plays the role of the M step. We repeat these E and M steps until the centroids no longer change, which completes the K-Means clustering; a minimal sketch is shown below.
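    To make the analogy concrete, here is a minimal K-Means sketch in Python with NumPy. The data matrix `X`, the cluster count `k`, and the function name `kmeans` are illustrative assumptions rather than anything from the original post; the loop simply alternates between guessing the hidden cluster assignments and updating the centroids, the two-step alternation described above.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means illustrating the EM-style alternation.

    X: (n_samples, n_features) data matrix; k: number of clusters.
    """
    rng = np.random.default_rng(seed)
    # Initial "guess": pick k random samples as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Guess the hidden data: assign each sample to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update the parameters: recompute each centroid as the mean of its samples.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids no longer change: converged
        centroids = new_centroids
    return centroids, labels
```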

    Of course, the K-Means algorithm is relatively simple; practical problems are often not so simple. The description of the EM algorithm above is still quite rough, so we need a precise mathematical formulation.

Derivation of the EM algorithm

    For m observed data samples $x = (x^{(1)}, x^{(2)}, \ldots, x^{(m)})$, we want to find the parameters $\theta$ of the model $p(x, z)$. Maximizing the log-likelihood function of the model distribution gives:

$$\theta = \arg\max_{\theta} \sum_{i=1}^{m} \log P(x^{(i)}; \theta) \qquad (1)$$

    The model, however, also contains a hidden random variable $z = (z^{(1)}, z^{(2)}, \ldots, z^{(m)})$ that is never observed, so $P(x^{(i)}; \theta) = \sum_{z^{(i)}} P(x^{(i)}, z^{(i)}; \theta)$ and direct parameter estimation is not convenient. We therefore introduce, for each $i$, a quantity $Q_i(z^{(i)})$. Note that this is not just any function: it must be a distribution over $z$, with $Q_i(z^{(i)}) \ge 0$. To leave the original log-likelihood unchanged, we can transform it as in (2) below:

$$\sum_{i=1}^{m} \log \sum_{z^{(i)}} P(x^{(i)}, z^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \qquad (2)$$

$$\ge \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \qquad (3)$$

Step (3) uses Jensen's inequality.

    Jensen's inequality: if $f$ is a concave function (as the logarithm is), then

$$f\!\left(\sum_{j} \lambda_j y_j\right) \ge \sum_{j} \lambda_j f(y_j), \qquad \lambda_j \ge 0, \; \sum_{j} \lambda_j = 1,$$

with equality when all the $y_j$ are equal.
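As a quick numerical sanity check of this inequality for $f = \log$, here is a small Python snippet; the weights and values are made up for illustration and are not part of the original derivation.

```python
import numpy as np

# Hypothetical weights (a valid distribution) and positive values.
lam = np.array([0.2, 0.5, 0.3])   # lambda_j >= 0, and they sum to 1
y = np.array([1.0, 4.0, 9.0])

lhs = np.log(np.sum(lam * y))     # log of the expectation, log E[y]
rhs = np.sum(lam * np.log(y))     # expectation of the log, E[log y]
print(lhs, rhs, lhs >= rhs)       # prints True: log is concave, so log E[y] >= E[log y]
```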

For the inequality in step (3) to hold with equality, we need the ratio inside the logarithm to be constant, i.e.:

$$\frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c,$$

where $c$ is a constant that does not depend on $z^{(i)}$. Moreover, since $Q_i(z^{(i)})$ is a distribution, it satisfies:

$$\sum_{z^{(i)}} Q_i(z^{(i)}) = 1.$$

Combining the two relations above, it follows that:

$$Q_i(z^{(i)}) = \frac{P(x^{(i)}, z^{(i)}; \theta)}{\sum_{z} P(x^{(i)}, z; \theta)} = \frac{P(x^{(i)}, z^{(i)}; \theta)}{P(x^{(i)}; \theta)} = P(z^{(i)} \mid x^{(i)}; \theta).$$
If $Q_i(z^{(i)}) = P(z^{(i)} \mid x^{(i)}; \theta)$, then expression (3) is a lower bound on the log-likelihood that includes the hidden data. If we can maximize this lower bound, we are also working toward maximizing our log-likelihood. That is, we need to maximize:

$$\arg\max_{\theta} \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$$

Removing the terms in the expression above that are constant with respect to $\theta$, the lower bound of the log-likelihood that we need to maximize becomes:

$$\arg\max_{\theta} \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log P(x^{(i)}, z^{(i)}; \theta)$$

The expression above is exactly the M step of the EM algorithm. What about the E step? Note that $Q_i(z^{(i)})$ in the expression above is a distribution, so $\sum_{z^{(i)}} Q_i(z^{(i)}) \log P(x^{(i)}, z^{(i)}; \theta)$ can be understood as the expectation of $\log P(x^{(i)}, z^{(i)}; \theta)$ with respect to the conditional probability distribution $Q_i(z^{(i)})$; computing this distribution, and hence this expectation, is the E step.
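Written out in expectation notation (using the same symbols as the derivation above), the quantity being maximized in the M step is:

$$\arg\max_{\theta} \sum_{i=1}^{m} \mathbb{E}_{z^{(i)} \sim Q_i}\!\left[\log P(x^{(i)}, z^{(i)}; \theta)\right]$$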

Summary of the EM algorithm procedure:

The EM algorithm procedure

    Input: observed data $x = (x^{(1)}, x^{(2)}, \ldots, x^{(m)})$, the joint distribution $p(x, z; \theta)$, the conditional distribution $p(z \mid x; \theta)$, and the maximum number of iterations $J$.

    1) Randomly initialize the model parameters $\theta$ to an initial value $\theta^{0}$.

    2) For $j$ from 1 to $J$, iterate the EM steps:

      a) E step: compute the conditional probability expectation of the joint distribution:

$$Q_i(z^{(i)}) = P(z^{(i)} \mid x^{(i)}; \theta^{j})$$

$$L(\theta, \theta^{j}) = \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log P(x^{(i)}, z^{(i)}; \theta)$$

      b) M step: maximize $L(\theta, \theta^{j})$ to obtain $\theta^{j+1}$:

$$\theta^{j+1} = \arg\max_{\theta} L(\theta, \theta^{j})$$

      c) If $\theta^{j+1}$ has converged, the algorithm terminates; otherwise, return to step a) and repeat the E step.

    Output: the model parameters $\theta$.
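As a concrete instance of this procedure, below is a minimal sketch of EM for a one-dimensional, two-component Gaussian mixture in Python. The function name `em_gmm_1d`, the synthetic data, and the convergence test are illustrative assumptions, not part of the original post; the closed-form M-step updates are the standard ones for a Gaussian mixture.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, K=2, J=100, tol=1e-6, seed=0):
    """EM for a 1-D Gaussian mixture: alternate E and M steps until convergence."""
    rng = np.random.default_rng(seed)
    # 1) Randomly initialize theta = (mixture weights, means, standard deviations).
    pi = np.full(K, 1.0 / K)
    mu = rng.choice(x, size=K, replace=False).astype(float)
    sigma = np.full(K, x.std())
    for _ in range(J):
        # 2a) E step: Q_i(z) = P(z | x_i; theta_j), the posterior "responsibilities".
        dens = np.stack([pi[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(K)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)            # shape (n, K)
        # 2b) M step: maximize the expected complete-data log-likelihood (closed form).
        Nk = resp.sum(axis=0)
        new_pi = Nk / len(x)
        new_mu = (resp * x[:, None]).sum(axis=0) / Nk
        new_sigma = np.sqrt((resp * (x[:, None] - new_mu) ** 2).sum(axis=0) / Nk)
        # 2c) Stop once the parameters have essentially stopped changing.
        converged = np.abs(new_mu - mu).max() < tol
        pi, mu, sigma = new_pi, new_mu, new_sigma
        if converged:
            break
    return pi, mu, sigma

# Usage on synthetic data drawn from two Gaussians (illustrative only).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
print(em_gmm_1d(x))
```

Since each iteration never decreases the lower bound on the log-likelihood, this loop typically converges to a local optimum; different random initializations can give different local optima.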

For the convergence of the EM algorithm, see the discussion of EM convergence in the referenced blog post.

Supplementary note:

If we think about the EM algorithm from the point of view of its underlying idea, we see that in the algorithm the observed data is known while the hidden data and the model parameters are unknown. In the E step, what we do is fix the values of the model parameters and optimize the distribution of the hidden data; in the M step, we fix the distribution of the hidden data and optimize the values of the model parameters.

    


Origin www.cnblogs.com/yang901112/p/11621101.html