Machine Learning-Whiteboard Derivation-Series (11) Notes: Gaussian Mixture Model


0 Notes

These notes are derived from [Machine Learning] [Whiteboard Derivation Series] [Collection 1~23]. While studying, I follow the UP master's derivation on paper, and this blog is a second, written arrangement of those notes. According to my own learning needs, I may add any necessary extra content.

Note: These notes are mainly for the convenience of my own future review and study, and I really do type every word and formula myself. If I run into a complex formula, then, because I have not learned LaTeX, I upload a handwritten picture instead (the phone camera may not capture it perfectly clearly, but I will try my best to keep the content fully visible). For this reason I mark the blog as [original]; if you think that is inappropriate, you can send me a private message, and based on your reply I will decide whether to set the blog to be visible only to you, or handle it in some other way. Thank you!

This blog contains the notes for (Series 11), and the corresponding videos are: [(Series 11) Gaussian Mixture Model 1 - Model Introduction], [(Series 11) Gaussian Mixture Model 2 - Maximum Likelihood], [(Series 11) Gaussian Mixture Model 3 - EM Solution - E-Step], [(Series 11) Gaussian Mixture Model 4 - EM Solution - M-Step].

The text starts below.


1 Model introduction

The Gaussian Mixture Model (GMM) refers to a linear combination of multiple Gaussian distribution functions and is a generative model.

From a geometric point of view: the Gaussian mixture model is a weighted superposition of multiple Gaussian distribution functions. Suppose there are K Gaussian distributions; the k-th Gaussian distribution has the form N(μ_k, Σ_k), where k=1,2,…,K. The PDF (probability density function) of the Gaussian mixture model, p(x), is then written as below, where α_k is the weight of the k-th Gaussian distribution:
p(x) = Σ_{k=1}^{K} α_k · N(x | μ_k, Σ_k),   where α_k ≥ 0 and Σ_{k=1}^{K} α_k = 1
From the perspective of the mixture model: assume there are N sample instances, X = (x_1, x_2, …, x_N). Introduce the hidden variable Z = (z_1, z_2, …, z_N): each sample x_i corresponds to a z_i that indicates which of the K Gaussian distributions x_i belongs to, where i=1,2,…,N. Let C_k denote the k-th Gaussian distribution, where k=1,2,…,K. When z_i = C_k, the probability is P(z_i = C_k) = p_k, where i=1,2,…,N; if p_m is the largest, this means x_i is most likely to belong to the m-th Gaussian distribution C_m. The distribution of z_i is shown below. It can be seen that z_i is a discrete random variable, where i=1,2,…,N, and the parameters of its probability distribution are p = (p_1, p_2, …, p_K):
z_i:      C_1   C_2   …   C_K
P(z_i):   p_1   p_2   …   p_K        (with Σ_{k=1}^{K} p_k = 1)
The probability density function p(x) can then be written as follows (consistent with the first formula in this section):
p(x) = Σ_{k=1}^{K} P(z = C_k) · p(x | z = C_k) = Σ_{k=1}^{K} p_k · N(x | μ_k, Σ_k)
The probabilistic graph of the Gaussian mixture model is as follows:
[Probabilistic graph: for each sample i, the latent variable z_i points to the observation x_i, enclosed in a plate labeled N]
N in the lower right corner of the above figure represents N sample instances.
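
To make the latent-variable view concrete, here is a minimal Python sketch of the generative process described above (this is my own illustration, not part of the original notes; all parameter values are made up): first draw z_i from the discrete distribution p = (p_1, …, p_K), then draw x_i from the selected Gaussian N(μ_{z_i}, Σ_{z_i}).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for K = 3 Gaussian components in 2 dimensions (made up).
p = np.array([0.5, 0.3, 0.2])                          # mixing weights p_1..p_K, sum to 1
mu = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])   # component means
Sigma = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])  # component covariances

def sample_gmm(n):
    """Draw n samples (x_i, z_i) from the mixture, following the generative view."""
    z = rng.choice(len(p), size=n, p=p)                # z_i ~ discrete distribution with P(z_i = C_k) = p_k
    x = np.array([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])  # x_i | z_i = C_k ~ N(mu_k, Sigma_k)
    return x, z

X, Z = sample_gmm(1000)
print(X.shape)                      # (1000, 2)
print(np.bincount(Z) / len(Z))      # empirical component frequencies, roughly equal to p
```

Sampling many points this way and checking the empirical frequency of each component against p is a quick sanity check that the mixture-model view and the geometric view describe the same density.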


2 Maximum likelihood

Assume there are N sample instances X = (x_1, x_2, …, x_N); X is called the observed data, and for i ≠ j, x_i and x_j are independent of each other. Together with the hidden variable Z = (z_1, z_2, …, z_N), (X, Z) is called the complete data, and x | z = C_k ~ N(μ_k, Σ_k), namely:
p(x | z = C_k) = N(x | μ_k, Σ_k)
The parameters of the Gaussian mixture model are θ = {(p_1, p_2, …, p_K), (μ_1, μ_2, …, μ_K), (Σ_1, Σ_2, …, Σ_K)}. What happens if we try to find θ by maximum likelihood estimation? As follows:
θ_MLE = argmax_θ log P(X | θ)
      = argmax_θ log Π_{i=1}^{N} p(x_i | θ)
      = argmax_θ Σ_{i=1}^{N} log p(x_i | θ)
      = argmax_θ Σ_{i=1}^{N} log Σ_{k=1}^{K} p_k · N(x_i | μ_k, Σ_k)
From the last line it can be seen that in the objective function the logarithm acts on a sum over components, so setting derivatives to zero gives no closed-form solution; the problem is difficult or impossible to solve directly in this way.
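
As a numerical illustration of this objective (a sketch only, assuming scipy is available; data and parameter values are placeholders), the log-likelihood below is computed with a log-sum-exp over components; the sum sitting inside the logarithm is exactly what blocks a closed-form maximizer.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, p, mu, Sigma):
    """log P(X | theta) = sum_i log( sum_k p_k * N(x_i | mu_k, Sigma_k) )."""
    N, K = X.shape[0], len(p)
    log_terms = np.empty((N, K))
    for k in range(K):
        # log_terms[i, k] = log p_k + log N(x_i | mu_k, Sigma_k)
        log_terms[:, k] = np.log(p[k]) + multivariate_normal(mu[k], Sigma[k]).logpdf(X)
    # logsumexp over k computes the log of the inner sum; the outer sum runs over the N samples.
    return logsumexp(log_terms, axis=1).sum()

# Tiny example with made-up data and parameters.
X = np.array([[0.1, -0.2], [2.9, 3.1], [-2.5, 1.8]])
p = np.array([0.5, 0.3, 0.2])
mu = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])
Sigma = np.array([np.eye(2)] * 3)
print(gmm_log_likelihood(X, p, mu, Sigma))
```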

Next, the EM algorithm is used to solve for the parameter θ. (For the EM algorithm itself, see the separate EM notes.)


3 EM solution

3.1 E-Step

θ^(t+1) is:
θ^(t+1) = argmax_θ E_{Z | X, θ^(t)} [ log p(X, Z | θ) ]
        = argmax_θ Σ_Z log p(X, Z | θ) · p(Z | X, θ^(t))
Let Q(θ, θ^(t)) denote the objective function above; then Q(θ, θ^(t)) can be rearranged as:
Q(θ, θ^(t)) = Σ_Z [ log Π_{i=1}^{N} p(x_i, z_i | θ) ] · Π_{i=1}^{N} p(z_i | x_i, θ^(t))
            = Σ_{z_1} Σ_{z_2} … Σ_{z_N} [ Σ_{i=1}^{N} log p(x_i, z_i | θ) ] · Π_{i=1}^{N} p(z_i | x_i, θ^(t))
Expand the first term (the i = 1 term of the inner sum) in the last line above:
Σ_{z_1} Σ_{z_2} … Σ_{z_N} log p(x_1, z_1 | θ) · Π_{i=1}^{N} p(z_i | x_i, θ^(t))
  = Σ_{z_1} log p(x_1, z_1 | θ) · p(z_1 | x_1, θ^(t)) · [ Σ_{z_2} … Σ_{z_N} Π_{i=2}^{N} p(z_i | x_i, θ^(t)) ]
The framed (bracketed) part is:
Σ_{z_2} … Σ_{z_N} Π_{i=2}^{N} p(z_i | x_i, θ^(t)) = Π_{i=2}^{N} Σ_{z_i} p(z_i | x_i, θ^(t)) = 1
That is, the framed part above equals 1, so the first term of the formula in the last line expands to:
Σ_{z_1} log p(x_1, z_1 | θ) · p(z_1 | x_1, θ^(t))
Treating the remaining N-1 terms in the same way, Q(θ, θ^(t)) becomes:
Q(θ, θ^(t)) = Σ_{i=1}^{N} Σ_{z_i} log p(x_i, z_i | θ) · p(z_i | x_i, θ^(t))
And because:
p(x, z | θ) = p(z | θ) · p(x | z, θ) = p_z · N(x | μ_z, Σ_z)
p(z | x, θ^(t)) = p(x, z | θ^(t)) / p(x | θ^(t)) = p_z^(t) · N(x | μ_z^(t), Σ_z^(t)) / Σ_{k=1}^{K} p_k^(t) · N(x | μ_k^(t), Σ_k^(t))
So Q(θ, θ^(t)) is:
Q(θ, θ^(t)) = Σ_{i=1}^{N} Σ_{z_i} log [ p_{z_i} · N(x_i | μ_{z_i}, Σ_{z_i}) ] · [ p_{z_i}^(t) · N(x_i | μ_{z_i}^(t), Σ_{z_i}^(t)) / Σ_{k=1}^{K} p_k^(t) · N(x_i | μ_k^(t), Σ_k^(t)) ]
Write the framed posterior factor as p(z_i | x_i, θ^(t)); summing over z_i ∈ {C_1, …, C_K}, Q(θ, θ^(t)) simplifies further to:
Q(θ, θ^(t)) = Σ_{i=1}^{N} Σ_{k=1}^{K} [ log p_k + log N(x_i | μ_k, Σ_k) ] · p(z_i = C_k | x_i, θ^(t))
So θ^(t+1) is:
θ^(t+1) = argmax_θ Σ_{i=1}^{N} Σ_{k=1}^{K} [ log p_k + log N(x_i | μ_k, Σ_k) ] · p(z_i = C_k | x_i, θ^(t))
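
The factor p(z_i = C_k | x_i, θ^(t)) appearing in Q(θ, θ^(t)) is the posterior responsibility of component k for sample x_i under the current parameters. A minimal Python sketch of this E-step computation (my own illustration; the helper name e_step and the placeholder data and parameters are assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, p, mu, Sigma):
    """gamma[i, k] = p(z_i = C_k | x_i, theta^(t)) for the current parameters p, mu, Sigma."""
    N, K = X.shape[0], len(p)
    weighted = np.empty((N, K))
    for k in range(K):
        # numerator: p_k^(t) * N(x_i | mu_k^(t), Sigma_k^(t))
        weighted[:, k] = p[k] * multivariate_normal(mu[k], Sigma[k]).pdf(X)
    # denominator: sum over the K components, so each row of gamma sums to 1
    return weighted / weighted.sum(axis=1, keepdims=True)

# Example with placeholder data and parameters theta^(t).
X = np.array([[0.1, -0.2], [2.9, 3.1], [-2.5, 1.8]])
p = np.array([0.5, 0.3, 0.2])
mu = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])
Sigma = np.array([np.eye(2)] * 3)
gamma = e_step(X, p, mu, Sigma)
print(gamma.sum(axis=1))   # each entry is 1
```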

3.2 M-Step

The parameters of the Gaussian mixture model are θ = {(p_1, p_2, …, p_K), (μ_1, μ_2, …, μ_K), (Σ_1, Σ_2, …, Σ_K)}; there are three groups of parameters:

(1) p = (p_1, p_2, …, p_K);

(2) μ = (μ_1, μ_2, …, μ_K);

(3) Σ = (Σ_1, Σ_2, …, Σ_K).

In the following, as an example, we only derive p^(t+1) = (p_1^(t+1), p_2^(t+1), …, p_K^(t+1)). For p_k^(t+1), where k=1,2,…,K, we have:
p^(t+1) = argmax_p Σ_{i=1}^{N} Σ_{k=1}^{K} [ log p_k + log N(x_i | μ_k, Σ_k) ] · p(z_i = C_k | x_i, θ^(t))
        = argmax_p Σ_{i=1}^{N} Σ_{k=1}^{K} log p_k · p(z_i = C_k | x_i, θ^(t))     (the log N term does not depend on p)
subject to the constraint:
Σ_{k=1}^{K} p_k = 1
This is a constrained optimization problem; below it is solved with the Lagrange multiplier method. First construct the Lagrangian function L(p, λ):
L(p, λ) = Σ_{i=1}^{N} Σ_{k=1}^{K} log p_k · p(z_i = C_k | x_i, θ^(t)) + λ ( Σ_{k=1}^{K} p_k - 1 )
Take the partial derivative of L(p, λ) with respect to p_k and set it to 0:
∂L/∂p_k = Σ_{i=1}^{N} (1/p_k) · p(z_i = C_k | x_i, θ^(t)) + λ = 0
⇒ Σ_{i=1}^{N} p(z_i = C_k | x_i, θ^(t)) + λ · p_k = 0,   k = 1, 2, …, K
Summing this equation over k = 1, 2, …, K:
Σ_{i=1}^{N} Σ_{k=1}^{K} p(z_i = C_k | x_i, θ^(t)) + λ · Σ_{k=1}^{K} p_k = 0
The two framed parts at the end, Σ_{k=1}^{K} p(z_i = C_k | x_i, θ^(t)) and Σ_{k=1}^{K} p_k, are both equal to 1, so:
Σ_{i=1}^{N} 1 + λ = 0,   i.e.  N + λ = 0
So λ = -N, where N is the number of sample instances. Substitute λ = -N into the following formula:
Σ_{i=1}^{N} p(z_i = C_k | x_i, θ^(t)) + λ · p_k = 0
to get:
p_k = (1/N) · Σ_{i=1}^{N} p(z_i = C_k | x_i, θ^(t))
Finally, p_k^(t+1) is as follows, where k=1,2,…,K:
p_k^(t+1) = (1/N) · Σ_{i=1}^{N} p(z_i = C_k | x_i, θ^(t))
The solution for p^(t+1) is complete, where p^(t+1) = (p_1^(t+1), p_2^(t+1), …, p_K^(t+1)).
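
In code, the update p_k^(t+1) = (1/N) · Σ_i p(z_i = C_k | x_i, θ^(t)) is just the column mean of the responsibility matrix from the E-step. A minimal sketch (my own illustration; gamma stands for the responsibilities, e.g. as computed by the hypothetical e_step sketch above):

```python
import numpy as np

def m_step_p(gamma):
    """p_k^(t+1) = (1/N) * sum_i gamma[i, k], with gamma[i, k] = p(z_i = C_k | x_i, theta^(t))."""
    N = gamma.shape[0]
    return gamma.sum(axis=0) / N

# Example responsibilities for N = 3 samples and K = 3 components (each row sums to 1).
gamma = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7]])
p_next = m_step_p(gamma)
print(p_next, p_next.sum())   # the updated weights still sum to 1
```

Because each row of gamma sums to 1, the new weights automatically sum to 1, which is consistent with the constraint handled by the Lagrangian above.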


END

Origin: blog.csdn.net/qq_40061206/article/details/113825016