Li Hang "statistical learning methods," the study notes - Chapter IX: EM algorithm and its promotion (Gaussian mixture model)

Copyright notice: if you reproduce this article, please indicate the source and credit "Wuhan AI Algorithm Study": https://blog.csdn.net/qq_36931982/article/details/91345524

"Li Air" statistical learning methods, "the study notes" series of tutorials to Li Hang teacher "statistical learning" as the basis, mainly including series of notes I understand summary of the learning process and knowledge of the principles in the book focus algorithm.

My ability is limited, so there are bound to be shortcomings; corrections are very welcome, and feel free to leave a comment with any ideas!

For more of my study notes, you are welcome to follow the "Wuhan AI Algorithm Study" public account!

This article is divided into two parts, "[Understanding the Gaussian Mixture Model]" and "[Mathematical principles of GMM]"; total reading time is about 10 minutes.

[Understanding the Gaussian Mixture Model]

1. A Gaussian mixture model (GMM) is still a probabilistic model; it is an extension of the single Gaussian model. A GMM uses a combination of several Gaussian distributions to characterize the distribution of the data, with each Gaussian component representing one class (one cluster). Each sample is projected onto the individual Gaussian components, which gives its probability under each class.

2. The weight coefficients of all Gaussian components are greater than zero and sum to 1.

3. The GMM parameters \theta to be solved for include the mean and variance (or covariance) of each Gaussian component, together with the weight of each component.

4. When initializing a GMM, one generally passes in the number of mixture components, the assumed form of the covariance matrix (which controls the shape of each Gaussian), and the number of EM iterations to run (see the MATLAB sketch after this list).

5. For a GMM, BIC (the Bayesian Information Criterion) is an effective scoring criterion for determining the number of components.

6. GMM is similar to k-means, except that what a GMM learns is a probability density function, giving for each point its probability of belonging to each class; besides clustering, a GMM can therefore also be used for density estimation.

7. The class probabilities produced by a GMM are in many cases more widely applicable (the probability values are continuous) and more interpretable than the result of a simple hard classification.

8. The EM algorithm is guaranteed to converge, but not to the global maximum; it may only find a local maximum. A common remedy is to run several iterations with different parameter initializations and keep the best result.
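To make points 4 to 8 concrete, here is a minimal MATLAB sketch (it assumes the Statistics and Machine Learning Toolbox; the toy data and variable names are my own, not anything from the book). It fits GMMs with 1 to 4 components, scores each with BIC, restarts EM from several initializations, and then uses the chosen model for clustering and for density estimation.

% Sketch: choose the number of components by BIC, then use the GMM for
% clustering and density estimation (toy 2-D data).
rng(1);
X = [mvnrnd([0 0], eye(2), 300); mvnrnd([4 4], [2 0.5; 0.5 1], 200)];
bestBIC = inf;
for k = 1:4
    gm = fitgmdist(X, k, ...                      % k mixture components
        'CovarianceType', 'full', ...             % shape of each Gaussian
        'Options', statset('MaxIter', 500), ...   % cap on EM iterations
        'Replicates', 5);                         % 5 random restarts, best one kept
    if gm.BIC < bestBIC
        bestBIC = gm.BIC;  bestGM = gm;  bestK = k;
    end
end
labels  = cluster(bestGM, X);     % hard cluster assignments
resp    = posterior(bestGM, X);   % per-class probabilities (responsibilities)
density = pdf(bestGM, X);         % GMM used as a density estimator
fprintf('Chosen K = %d, BIC = %.1f\n', bestK, bestBIC);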

 

[Mathematical principles of GMM]

This part references details from the PDF of Li Hang's "Statistical Learning Methods"; you can reply "statistical learning methods" to the public account to obtain it.

1. Basic mathematical concepts

Sample mean and variance: for one-dimensional sample data, the sample mean and standard deviation are simple to compute. The sample mean reflects the central location of the sample set, while the standard deviation and variance describe how dispersed the sample points are.
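For reference, the usual definitions (using the unbiased, divide-by-(N-1) convention for the sample variance) are:

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_{i}, \qquad s^{2} = \frac{1}{N-1} \sum_{i=1}^{N} (x_{i} - \bar{x})^{2}, \qquad s = \sqrt{s^{2}}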

Covariance and covariance matrix: covariance is a statistic that describes the correlation between two variables; when the covariance is greater than 0, the two variables are positively correlated. In the two-dimensional case covariance can be defined simply by analogy with variance, but in the multidimensional case a covariance matrix is needed to express the relationships among all the variables; the variance of each individual variable sits on the diagonal of the covariance matrix.

The covariance matrix can also be computed directly: center the sample matrix, i.e., subtract each dimension's mean from that dimension so that every dimension has mean 0, then multiply the transpose of the centered matrix by the matrix itself and divide by (N-1).

% MATLAB source: center the sample matrix so that each dimension has zero mean,
% then form the unbiased sample covariance matrix.
X = MySample - repmat(mean(MySample), size(MySample, 1), 1);   % subtract each column's mean
C = (X' * X) ./ (size(X, 1) - 1);                              % divide by (N - 1)
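As a quick sanity check (with hypothetical random data of my own choosing), this hand-rolled computation should match MATLAB's built-in cov:

% Toy data: 10 samples with 3 features
MySample = randn(10, 3);
X = MySample - repmat(mean(MySample), size(MySample, 1), 1);
C = (X' * X) ./ (size(X, 1) - 1);
max(max(abs(C - cov(MySample))))     % essentially zero (floating-point error only)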

2. Steps of the EM algorithm for a GMM

The parameters of a Gaussian mixture model are updated iteratively with the EM algorithm. Suppose we have sample data x_{1}, x_{2}, ..., x_{N} and a Gaussian mixture model with K sub-models, and we want to compute the optimal parameters of the mixture.

2.1 First, initialize the parameters

Scheme 1: take the covariance matrix of each component to be the identity matrix, set the prior probability of each sub-model to 1/K, and set the means to random values.

Scheme 2: cluster the samples with the k-means algorithm, use the mean of each cluster as \mu_{k}, compute \Sigma_{k} within each cluster, and take \alpha_{k} to be the proportion of the total samples that fall in cluster k.
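A small sketch of Scheme 2 in MATLAB (Statistics and Machine Learning Toolbox assumed; the toy one-dimensional data and variable names are mine):

% Scheme 2 (sketch): initialize GMM parameters from a k-means clustering
x = [1 + 0.5*randn(200,1); 5 + randn(300,1)];   % hypothetical 1-D samples
K = 2;
[idx, mu0] = kmeans(x, K);                      % cluster labels and centers (initial means)
alpha0  = accumarray(idx, 1) / numel(x);        % cluster proportions as initial weights
sigma20 = accumarray(idx, x, [], @var);         % within-cluster variances as initial variances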

2.2 E step: based on the current parameters, compute for each sample x_{j} the responsibility of sub-model k, i.e. the posterior probability \hat{\gamma}_{jk} = \frac{\alpha_{k} \phi(x_{j} | \theta_{k})}{\sum_{k=1}^{K} \alpha_{k} \phi(x_{j} | \theta_{k})}.

2.3 M step: compute the model parameters for the next iteration: \hat{\mu}_{k} = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk} x_{j}}{\sum_{j=1}^{N} \hat{\gamma}_{jk}}, \quad \hat{\sigma}_{k}^{2} = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk} (x_{j} - \hat{\mu}_{k})^{2}}{\sum_{j=1}^{N} \hat{\gamma}_{jk}}, \quad \hat{\alpha}_{k} = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk}}{N}.

2.4 Repeat the E and M steps until convergence: ||\theta_{i+1} - \theta_{i}|| < \varepsilon, where \varepsilon is a small positive number, meaning the parameters change only very slightly between iterations. A from-scratch sketch of these four steps is given below.
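Putting steps 2.1 to 2.4 together, here is a from-scratch MATLAB sketch for a one-dimensional, two-component mixture (the toy data and variable names are my own; the element-wise expressions rely on implicit expansion, so MATLAB R2016b or later is assumed):

% EM for a 1-D Gaussian mixture, following steps 2.1-2.4 above (toy data).
rng(0);
x = [1 + 0.5*randn(200,1); 5 + randn(300,1)];   % N = 500 samples from two Gaussians
N = numel(x);  K = 2;

% 2.1 Initialization (Scheme 1): equal weights, random means, unit variances
alpha  = ones(1, K) / K;
mu     = reshape(x(randperm(N, K)), 1, K);
sigma2 = ones(1, K);

for it = 1:200
    % 2.2 E step: responsibilities gamma(j,k) = alpha_k*phi(x_j|theta_k) / sum_k(...)
    phi   = exp(-(x - mu).^2 ./ (2*sigma2)) ./ sqrt(2*pi*sigma2);   % N-by-K densities
    gamma = (phi .* alpha) ./ sum(phi .* alpha, 2);

    % 2.3 M step: re-estimate means, variances and mixing weights
    Nk     = sum(gamma, 1);                          % effective sample count per component
    mu_new = sum(gamma .* x, 1) ./ Nk;
    sigma2 = sum(gamma .* (x - mu_new).^2, 1) ./ Nk;
    alpha  = Nk / N;

    % 2.4 Convergence check: stop once the parameters barely change
    if norm(mu_new - mu) < 1e-6, mu = mu_new; break; end
    mu = mu_new;
end
disp([alpha; mu; sqrt(sigma2)])   % rows: weights, means, standard deviations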

 

3. A purely mathematical derivation for the GMM (presented somewhat more clearly than in "Statistical Learning Methods")

3.1 E step: determining the Q function

Define the observed variable as y and the hidden variable as \gamma_{jk} \in \{0, 1\}, indicating whether observation y_{j} was generated by component k. The likelihood function of the complete data is:

P(y, \gamma | \theta) = \prod_{k=1}^{K} \alpha_{k}^{n_{k}} \prod_{j=1}^{N} [\phi(y_{j} | \theta_{k})]^{\gamma_{jk}}, \qquad n_{k} = \sum_{j=1}^{N} \gamma_{jk}

For convenience of calculation, take the logarithm of this expression:

\log P(y, \gamma | \theta) = \sum_{k=1}^{K} \{ n_{k} \log \alpha_{k} + \sum_{j=1}^{N} \gamma_{jk} [ \log \frac{1}{\sqrt{2\pi}} - \log \sigma_{k} - \frac{(y_{j} - \mu_{k})^{2}}{2\sigma_{k}^{2}} ] \}

The Q function is then the expectation of this complete-data log-likelihood over the hidden variables, given the data y and the current parameters \theta^{(i)}:

Q(\theta, \theta^{(i)}) = \sum_{k=1}^{K} \{ \sum_{j=1}^{N} \hat{\gamma}_{jk} \log \alpha_{k} + \sum_{j=1}^{N} \hat{\gamma}_{jk} [ \log \frac{1}{\sqrt{2\pi}} - \log \sigma_{k} - \frac{(y_{j} - \mu_{k})^{2}}{2\sigma_{k}^{2}} ] \}

In the Q function above, the posterior probability (the responsibility) is:

\hat{\gamma}_{jk} = E[\gamma_{jk} | y, \theta^{(i)}] = \frac{\alpha_{k} \phi(y_{j} | \theta_{k})}{\sum_{k=1}^{K} \alpha_{k} \phi(y_{j} | \theta_{k})}

3.2 M step

The M step can be computed by referring to "Statistical Learning Methods": take the partial derivative of the Q function with respect to each parameter to be estimated and set it to zero, handling the constraint \sum_{k} \alpha_{k} = 1 with a Lagrange multiplier.
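As a sketch of one such calculation (in the book's notation, with \hat{\gamma}_{jk} the responsibility from the E step), setting \partial Q / \partial \mu_{k} = 0 recovers the mean update used in step 2.3; the variance update follows the same pattern, and \hat{\alpha}_{k} comes from maximizing over \alpha_{k} under the constraint \sum_{k} \alpha_{k} = 1:

\frac{\partial Q}{\partial \mu_{k}} = \sum_{j=1}^{N} \hat{\gamma}_{jk} \frac{y_{j} - \mu_{k}}{\sigma_{k}^{2}} = 0 \quad \Longrightarrow \quad \hat{\mu}_{k} = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk} y_{j}}{\sum_{j=1}^{N} \hat{\gamma}_{jk}}, \qquad \hat{\alpha}_{k} = \frac{n_{k}}{N} = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk}}{N}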

 

 


 

 
