Gaussian mixture model (GMM)

1. Introduction

My reason for studying the two-dimensional Gaussian distribution:

In general I feel that my mathematical background is not strong enough, so along the way I also recap the Gaussian mixture model.

2. Single Gaussian model (GSM)

2.1 One-dimensional Gaussian model

The Gaussian distribution, sometimes called the normal distribution, occurs widely in nature and is one of the most common forms of distribution.

If we randomly sample the heights of a large number of people and plot the collected data as a histogram, we get a pattern like the one shown in the figure. The figure shows simulated statistics for 334 adults; with heights binned in 2.5 cm intervals, the most frequent height is around 180 cm.

 

The probability density function of the one-dimensional Gaussian distribution is:

$$P(x \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

where $\mu$ is the data mean (expectation) and $\sigma$ is the data standard deviation. The mean corresponds to the center of the normal distribution; in this example we can infer that the mean is close to 180 cm. The standard deviation measures how widely the data are spread around the mean.
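As a minimal sketch of the one-dimensional density, the function can be written in plain Python. The mean of 180 cm follows the height example; the standard deviation of 7 cm is an assumed value chosen only for illustration, since the text does not give one.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a one-dimensional Gaussian with mean mu and standard deviation sigma."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# mu = 180 cm comes from the height example; sigma = 7 cm is an assumption.
mu, sigma = 180.0, 7.0
for height in (166.0, 173.0, 180.0, 187.0, 194.0):
    print(f"density at {height} cm: {gaussian_pdf(height, mu, sigma):.5f}")
```

The printed densities are largest at the mean and fall off symmetrically on both sides, which is the bell shape seen in the histogram.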

The equation above is a probability density function: with the parameters known, plugging in a value of the variable x gives the corresponding probability density. Note also that a probability distribution must be normalized before it is used in practice, that is, the total area under the curve must equal 1, so that the probabilities computed from it stay within the valid range.
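The normalization requirement can be checked numerically. The sketch below integrates the one-dimensional density with a simple trapezoidal rule; the helper name `trapezoid_area`, the integration range of eight standard deviations, and the parameter values are all assumptions made for this illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def trapezoid_area(f, lo, hi, steps=10_000):
    """Approximate the integral of f over [lo, hi] with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

mu, sigma = 180.0, 7.0  # sigma is an assumed value
area = trapezoid_area(lambda x: gaussian_pdf(x, mu, sigma), mu - 8 * sigma, mu + 8 * sigma)
print(f"area under the curve: {area:.6f}")
```

The area comes out very close to 1. Note that the density value at a single point is not itself a probability: for a small standard deviation it can be larger than 1, while the area under the curve still equals 1.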

If you need the probability that x falls within a given range, you can compute the area under the curve between the two endpoints of the interval. Besides computing the area directly, a more convenient way to get the same result is to subtract the values of the cumulative distribution function (CDF) at the two endpoints, because the CDF gives the probability that the variable is less than or equal to x.
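The CDF subtraction can be sketched with the error function from Python's standard library. The function names and the parameter values here are assumptions for illustration only.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """P(X <= x) for a one-dimensional Gaussian, expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_probability(a, b, mu, sigma):
    """P(a < X <= b), obtained by subtracting the CDF at the two endpoints."""
    return gaussian_cdf(b, mu, sigma) - gaussian_cdf(a, mu, sigma)

mu, sigma = 180.0, 7.0  # sigma is an assumed value
# Probability of a height within one standard deviation of the mean: about 0.683
print(interval_probability(mu - sigma, mu + sigma, mu, sigma))
```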

3. Gaussian mixture model (GMM)

3.1 Formulation

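A Gaussian mixture model is a simple extension of the single Gaussian model: a GMM uses a combination of several Gaussian distributions to characterize the distribution of the data.

For example:

Imagine that we no longer look at the heights of all users together, but want the model to account for male and female heights at the same time. Assuming the earlier sample contained both men and women, the Gaussian distribution we drew before is actually the result of superimposing two Gaussian distributions. Instead of modeling with a single Gaussian, we can now use two (or more) Gaussian distributions.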

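For multidimensional data, the probability density function of a single Gaussian is:

$$P(x \mid \theta) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{(x-\mu)^{T}\Sigma^{-1}(x-\mu)}{2}\right)$$

where $\mu$ is the data mean (expectation), $\Sigma$ is the covariance, and $D$ is the data dimension.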
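For the two-dimensional case (D = 2), the multidimensional density can be sketched without any external library by using the closed-form inverse of a 2x2 matrix. This assumes the standard multivariate normal density with mean $\mu$ and covariance $\Sigma$; the function name and the sample covariance matrices are made up for illustration.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-D Gaussian; x and mu are pairs, cov is ((a, b), (b2, c))."""
    (a, b), (b2, c) = cov
    det = a * c - b * b2
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), with Sigma^{-1} = [[c, -b], [-b2, a]] / det
    quad = (c * dx * dx - (b + b2) * dx * dy + a * dy * dy) / det
    dim = 2
    return math.exp(-0.5 * quad) / ((2.0 * math.pi) ** (dim / 2) * math.sqrt(det))

mu = (0.0, 0.0)
point = (1.0, 1.0)
# Hypothetical covariance matrices: uncorrelated, positively and negatively correlated
for label, cov in [
    ("identity", ((1.0, 0.0), (0.0, 1.0))),
    ("positive correlation", ((1.0, 0.8), (0.8, 1.0))),
    ("negative correlation", ((1.0, -0.8), (-0.8, 1.0))),
]:
    print(f"{label}: density at {point} = {gaussian_pdf_2d(point, mu, cov):.5f}")
```

With positive correlation the point (1, 1) lies along the long axis of the elliptical contours and gets a higher density than with the identity covariance; with negative correlation it lies across the long axis and gets a much lower one.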

 

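3.2 Gaussian mixture model

A Gaussian mixture model can be viewed as a model composed of K single Gaussian models; these K submodels are the hidden variables of the mixture model. In general, a mixture model can use any probability distribution. The Gaussian is used here because it has good mathematical properties and good computational performance.

To give a not entirely rigorous example: suppose we have a set of sample data on dogs. Different breeds differ in body size, color, and appearance, yet all of them are dogs. A single Gaussian model may not describe this distribution well, because the sample data do not form a single ellipse, so a Gaussian mixture describes the problem better, as shown in the figure below.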

 

 

 

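First, define the following notation:

  • $x_j$ denotes the $j$-th observation, $j = 1, 2, \dots, N$
  • $K$ is the number of Gaussian submodels in the mixture model, $k = 1, 2, \dots, K$
  • $\alpha_k$ is the probability that an observation belongs to the $k$-th submodel, $\alpha_k \geq 0$, $\sum_{k=1}^{K} \alpha_k = 1$
  • $\phi(x \mid \theta_k)$ is the Gaussian density function of the $k$-th submodel, $\theta_k = (\mu_k, \sigma_k^{2})$. Its expanded form is the same as that of the single Gaussian model introduced above
  • $\gamma_{jk}$ denotes the probability that the $j$-th observation belongs to the $k$-th submodel

The probability distribution of the Gaussian mixture model is:

$$P(x \mid \theta) = \sum_{k=1}^{K} \alpha_k \, \phi(x \mid \theta_k)$$

For this model, the parameters are $\theta = (\mu_k, \sigma_k, \alpha_k)$, that is, the expectation, the variance (or covariance), and the probability of occurrence in the mixture model of each submodel.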
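The mixture density and the per-observation membership probability can be sketched for the one-dimensional case as follows. The two-component height model (weights, means, and standard deviations) is entirely hypothetical. The membership probability is computed with Bayes' rule as the weighted component density divided by the mixture density; the text above only names this quantity, so treat that formula as an assumption of the sketch.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a one-dimensional Gaussian (one submodel of the mixture)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def mixture_pdf(x, alphas, mus, sigmas):
    """Mixture density: sum over k of alpha_k * phi(x | theta_k)."""
    return sum(a * gaussian_pdf(x, m, s) for a, m, s in zip(alphas, mus, sigmas))

def responsibilities(x, alphas, mus, sigmas):
    """Probability that observation x belongs to each submodel (Bayes' rule)."""
    weighted = [a * gaussian_pdf(x, m, s) for a, m, s in zip(alphas, mus, sigmas)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical two-component height model; every number here is made up.
alphas = [0.5, 0.5]    # mixing weights, must sum to 1
mus = [165.0, 178.0]   # component means in cm
sigmas = [6.0, 7.0]    # component standard deviations in cm

for height in (160.0, 171.0, 185.0):
    gammas = responsibilities(height, alphas, mus, sigmas)
    print(height, round(mixture_pdf(height, alphas, mus, sigmas), 5), [round(g, 3) for g in gammas])
```

For each height the membership probabilities sum to 1, and an observation close to one component's mean is assigned mostly to that component.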

  

 

 

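4. Understanding the parameters of the two-dimensional Gaussian distribution

4.1 Effect of the mean and the covariance matrix on the two-dimensional Gaussian distribution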


4.2 Summary


Reference links:

https://blog.csdn.net/lin_limin/article/details/81024228

https://blog.csdn.net/lin_limin/article/details/81048411

https://zhuanlan.zhihu.com/p/31103654

https://zhuanlan.zhihu.com/p/30483076

 

 

Origin www.cnblogs.com/jiashun/p/gmm.html