Statistical Machine Learning: The Multivariate Gaussian Distribution

Multivariate Gaussian distribution:

Assume $y^1, y^2, \dots, y^d$ are independent, each distributed as $N(0,1)$.
The joint vector $y=(y^1,y^2,\dots,y^d)^T$ then has the density
$$g(y)=\prod_{j=1}^d \frac{1}{\sqrt{2\pi}}e^{-\frac{(y^j)^2}{2}}=\frac{1}{(2\pi)^{\frac{d}{2}}}e^{-\frac{1}{2}y^Ty}$$
Mean: $E(y)=0$; covariance: $Var(y)=I$ (the $d\times d$ identity matrix).
Apply the change of variables $x=Ty+\mu$, with $T$ invertible, and let $\Sigma=TT^T$. With $y=T^{-1}(x-\mu)$, the density of $x$ is
$$f(x)=g(y)\,|\det(T)|^{-1} \qquad (\det \text{ denotes the determinant})$$
$$=\frac{1}{(2\pi)^{\frac{d}{2}}(\det\Sigma)^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}$$
Mean: $E(x)=TE(y)+\mu=\mu$; covariance: $Var(x)=Var(Ty+\mu)=T\,Var(y)\,T^T=\Sigma$.
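The construction $x=Ty+\mu$ can be checked numerically. The sketch below is only an illustration: the values of `mu` and `Sigma` are made up, and taking $T$ to be the Cholesky factor of $\Sigma$ is one convenient choice among many matrices satisfying $TT^T=\Sigma$. The helper name `gaussian_pdf` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed, not from the text above).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# One valid choice of T: the lower-triangular Cholesky factor, so T @ T.T == Sigma.
T = np.linalg.cholesky(Sigma)

# Draw y ~ N(0, I) row by row and map each row through x = T y + mu.
y = rng.standard_normal((200_000, 2))
x = y @ T.T + mu

sample_mean = x.mean(axis=0)          # should be close to mu
sample_cov = np.cov(x, rowvar=False)  # should be close to Sigma


def gaussian_pdf(point, mu, Sigma):
    """Density f(x) of N(mu, Sigma) evaluated at a single point."""
    d = mu.shape[0]
    diff = point - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm


# Change-of-variables check at one point: f(x0) == g(y0) / |det T|.
y0 = np.array([0.3, -1.2])
x0 = T @ y0 + mu
g_y0 = np.exp(-0.5 * y0 @ y0) / (2 * np.pi) ** (len(y0) / 2)
f_x0 = gaussian_pdf(x0, mu, Sigma)
rhs = g_y0 / abs(np.linalg.det(T))

print("sample mean:", sample_mean)
print("sample covariance:\n", sample_cov)
print("f(x0) vs g(y0)/|det T|:", f_x0, rhs)
```

The sample mean and covariance agree with `mu` and `Sigma` only up to sampling noise, whereas the two density values at `x0` should match to floating-point precision.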

Conditional and marginal Gaussian distributions:

If two sets of variables are jointly Gaussian, then the distribution of one set conditioned on the other is also Gaussian. Likewise, the marginal distribution of either set is Gaussian.
Partition the variable, the mean, the covariance matrix, and the precision matrix $\Lambda=\Sigma^{-1}$ conformably:
$$x=\begin{pmatrix}x_a\\x_b\end{pmatrix},\quad \mu=\begin{pmatrix}\mu_a\\\mu_b\end{pmatrix},\quad \Sigma=\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix},\quad \Lambda=\begin{pmatrix}\Lambda_{aa}&\Lambda_{ab}\\\Lambda_{ba}&\Lambda_{bb}\end{pmatrix}$$
The conditional distribution is Gaussian:
$$p(x_a|x_b)=N(x_a|\mu_{a|b},\Lambda_{aa}^{-1}),\qquad \mu_{a|b}=\mu_a-\Lambda_{aa}^{-1}\Lambda_{ab}(x_b-\mu_b)$$
The marginal distribution is Gaussian:
$$p(x_a)=N(x_a|\mu_a,\Sigma_{aa})$$
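As a rough numerical illustration of the conditional and marginal formulas, the sketch below evaluates them on a made-up 3-dimensional example; `mu`, `Sigma`, the index split `a`/`b`, and the observed `x_b` are all assumed values. It also compares the precision-matrix form against the equivalent covariance form $\mu_a+\Sigma_{ab}\Sigma_{bb}^{-1}(x_b-\mu_b)$ and $\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$, a standard identity that is not stated in the text above.

```python
import numpy as np

# Illustrative joint Gaussian (assumed values).
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

a = [0]       # indices forming x_a
b = [1, 2]    # indices forming x_b
x_b = np.array([0.5, -0.2])  # observed value of x_b

# Precision matrix and the blocks used by the conditional formulas.
Lambda = np.linalg.inv(Sigma)
Lambda_aa = Lambda[np.ix_(a, a)]
Lambda_ab = Lambda[np.ix_(a, b)]

# Conditional p(x_a | x_b): covariance Lambda_aa^{-1},
# mean mu_a - Lambda_aa^{-1} Lambda_ab (x_b - mu_b).
cond_cov = np.linalg.inv(Lambda_aa)
cond_mean = mu[a] - cond_cov @ Lambda_ab @ (x_b - mu[b])

# Same quantities written with covariance blocks (Schur complement form).
S_aa = Sigma[np.ix_(a, a)]
S_ab = Sigma[np.ix_(a, b)]
S_bb = Sigma[np.ix_(b, b)]
cond_mean_alt = mu[a] + S_ab @ np.linalg.solve(S_bb, x_b - mu[b])
cond_cov_alt = S_aa - S_ab @ np.linalg.solve(S_bb, S_ab.T)

# Marginal p(x_a): just read off the corresponding blocks.
marg_mean = mu[a]
marg_cov = S_aa

print("conditional mean:", cond_mean, cond_mean_alt)
print("conditional covariance:", cond_cov, cond_cov_alt)
print("marginal mean / covariance:", marg_mean, marg_cov)
```

Note that the marginal needs no matrix inversion at all, while the conditional covariance does not depend on the observed value `x_b`.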

Maximum likelihood estimation:
Given independent samples $x_1,\dots,x_n$ from $N(\mu,\Sigma)$, the log-likelihood, up to an additive constant, is
$$L=\frac{n}{2}\log\det(\Sigma^{-1})-\sum_{i=1}^n\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)$$
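The log-likelihood above can be evaluated and maximized numerically. The sketch below assumes the standard closed-form maximizers, the sample mean and the $1/n$-normalized sample covariance; these are a well-known result but are not written out in the surviving text. The function name `log_likelihood` and the parameter values are illustrative.

```python
import numpy as np


def log_likelihood(X, mu, Sigma):
    """L(mu, Sigma) with the additive constant dropped, as in the formula above."""
    n = X.shape[0]
    diff = X - mu
    _, logdet = np.linalg.slogdet(Sigma)          # log det(Sigma), numerically stable
    quad = np.sum((diff @ np.linalg.inv(Sigma)) * diff)  # sum_i diff_i^T Sigma^{-1} diff_i
    # (n/2) log det(Sigma^{-1}) equals -(n/2) log det(Sigma).
    return -0.5 * n * logdet - 0.5 * quad


rng = np.random.default_rng(1)

# Illustrative ground-truth parameters (assumed values).
true_mu = np.array([0.5, -1.0])
true_Sigma = np.array([[1.0, 0.4],
                       [0.4, 2.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=5000)

# Closed-form maximum likelihood estimates.
mu_hat = X.mean(axis=0)
centered = X - mu_hat
Sigma_hat = centered.T @ centered / X.shape[0]   # divides by n, not n - 1

L_hat = log_likelihood(X, mu_hat, Sigma_hat)
L_true = log_likelihood(X, true_mu, true_Sigma)

print("mu_hat:", mu_hat)
print("Sigma_hat:\n", Sigma_hat)
print("L at MLE vs. L at true parameters:", L_hat, L_true)
```

On any given sample, the likelihood at the estimates is at least as large as at the true parameters, since the estimates maximize $L$ for that sample.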

Visualizing the multivariate Gaussian distribution:

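import numpy as np
import matplotlib.pyplot as plt

if __name__ == '__main__':
    # Draw samples from the standard bivariate Gaussian N(0, I).
    d = np.random.randn(10000000, 2)
    N = 30  # number of histogram bins along each axis
    # 2-D histogram: density[i, j] counts the samples in x-bin i and y-bin j.
    density, edges = np.histogramdd(d, bins=[N, N])
    print("Total number of samples: ", np.sum(density))
    # Rescale so that the tallest bin has height 1.
    density = density / density.max()
    x = y = np.arange(N)
    # indexing='ij' keeps the grid aligned with density[i, j].
    t = np.meshgrid(x, y, indexing='ij')
    fig = plt.figure()
    # Axes3D(fig) no longer attaches the axes to the figure in recent
    # Matplotlib versions; add_subplot with a 3-D projection does.
    ax = fig.add_subplot(projection='3d')
    ax.scatter(t[0], t[1], density, c='r', s=15*density, marker='o', depthshade=True)
    ax.plot_surface(t[0], t[1], density, cmap='rainbow', rstride=1, cstride=1, alpha=0.9, linewidth=1)
    ax.set_xlabel("x axis")
    ax.set_ylabel("y axis")
    ax.set_zlabel("z axis")
    plt.title("Bivariate Gaussian distribution")
    # pad must be passed by keyword in recent Matplotlib versions.
    plt.tight_layout(pad=0.1)
    plt.show()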



Reposted from blog.csdn.net/qq_43613793/article/details/104967406