Statistical Learning II.7 Generalized Linear Models 1: The Exponential Family


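This part introduces generalized linear models (GLMs), a class of supervised learning methods commonly used to build classifiers, among other things. Given data $\{(X_i,Y_i)\}_{i=1}^N$, a GLM typically assumes that the response $Y_i$ follows a distribution from some exponential family. We therefore start with the exponential family, and then look at how GLMs derived from different exponential families behave.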


Definition of the exponential family

Let $p(x|\theta)$ be a density (or probability mass) function. It is said to belong to an exponential family if it can be written as
$$p(x|\theta) = h(x)\exp(\theta^T \phi(x)-A(\theta))$$

Because a density must integrate to one,
$$\int p(x|\theta)\,dx =\int h(x)\exp(\theta^T \phi(x)-A(\theta))\,dx = \exp(-A(\theta))\int h(x)\exp(\theta^T \phi(x))\,dx =1$$

and therefore

$$A(\theta)=\log Z(\theta),\quad Z(\theta)=\int h(x)\exp(\theta^T\phi(x))\,dx$$

Here $\theta$ is called the natural parameter, $\phi(X)$ is the sufficient statistic of the family (by the Fisher-Neyman factorization theorem), $Z(\theta)$ is the partition function, and $A(\theta)$ is the cumulant function (also called the log-partition function). If $\phi(X)=X$, the family is called a natural exponential family.
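To make the role of $A(\theta)=\log Z(\theta)$ concrete, here is a minimal sketch for a scalar-parameter family on a finite support, where the integral becomes a sum. The support $\{0,1,2,3\}$, the choices $\phi(x)=x$ and $h(x)=1$, and the function names are illustrative assumptions, not part of the original text.

```python
import math

def log_partition(theta, support, phi, h):
    """A(theta) = log Z(theta), with the integral replaced by a sum over a finite support."""
    return math.log(sum(h(x) * math.exp(theta * phi(x)) for x in support))

def pmf(x, theta, support, phi, h):
    """p(x | theta) = h(x) * exp(theta * phi(x) - A(theta))."""
    return h(x) * math.exp(theta * phi(x) - log_partition(theta, support, phi, h))

# Hypothetical toy family: phi(x) = x and h(x) = 1 on {0, 1, 2, 3}.
support = [0, 1, 2, 3]
phi = lambda x: x
h = lambda x: 1.0
theta = 0.7

# Subtracting A(theta) in the exponent is exactly what makes the probabilities sum to one.
total = sum(pmf(x, theta, support, phi, h) for x in support)
print(total)
```

Whatever value of `theta` is chosen, `total` should come out as one up to floating-point error, which is the normalization argument above in numerical form.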

A more general form of the exponential family is
$$p(x|\theta) = h(x)\exp(\eta(\theta)^T \phi(x)-A(\eta(\theta)))$$
If $\dim(\theta)<\dim(\eta(\theta))$, it is called a curved exponential family; in this case there are more sufficient statistics than parameters. If $\eta(\theta)=\theta$, the model is said to be in canonical form.
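As a worked illustration of a curved exponential family, consider the Gaussian $N(\theta,\theta^2)$ with $\theta>0$, whose mean and standard deviation are tied to a single parameter. This example is not from the original text; it is a standard textbook case, included here as a sketch.

```latex
p(x|\theta)
  = \frac{1}{\sqrt{2\pi}\,\theta}\exp\left(-\frac{(x-\theta)^2}{2\theta^2}\right)
  = \frac{1}{\sqrt{2\pi}}\exp\left(\frac{1}{\theta}\,x-\frac{1}{2\theta^2}\,x^2
      -\left(\frac{1}{2}+\log\theta\right)\right)

\eta(\theta)=\left[\frac{1}{\theta},\,-\frac{1}{2\theta^2}\right]^T,\quad
\phi(x)=[x,\,x^2]^T,\quad
h(x)=\frac{1}{\sqrt{2\pi}},\quad
A(\eta(\theta))=\frac{1}{2}+\log\theta
```

Here $\dim(\theta)=1<\dim(\eta(\theta))=2$: two sufficient statistics, $x$ and $x^2$, for one free parameter.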

Examples of exponential families

Bernoulli distribution

For $x\in\{0,1\}$,
$$p(x|\mu)=\mu^x(1-\mu)^{1-x}=\exp(\phi(x)^T\theta)$$

where
$$\phi(x)=[1_{x=0},1_{x=1}]^T,\quad\theta=[\log(1-\mu),\log(\mu)]^T$$

This is not a good representation. For $x \in \{0,1\}$ we have $\mathbf{1}^T \phi(x)=1$, so the two components of $\phi(x)$ are linearly dependent (the representation is over-complete). As a result $\theta$ is not uniquely identifiable: estimation yields only one independent equation for two parameters. A better representation is
$$p(x|\mu)=(1-\mu)\exp \left[ x\log \left( \frac{\mu}{1-\mu} \right) \right]=\exp(\theta\,\phi(x)-A(\theta))$$

where
$$\phi(x)=x,\quad\theta = \log \left( \frac{\mu}{1-\mu} \right),\quad A(\theta)=-\log(1-\mu)=\log(1+e^{\theta})$$

$\theta$ is the log-odds. The map from the natural parameter back to $\mu$ is the sigmoid function
$$\mu = \mathrm{sigm}(\theta)=\frac{1}{1+e^{-\theta}}$$
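The round trip between the mean $\mu$ and the natural parameter $\theta$ can be sketched in a few lines. The function names below are illustrative, and the log-partition function $A(\theta)=\log(1+e^{\theta})$ is the one implied by the factor $(1-\mu)$ in the minimal representation.

```python
import math

def logit(mu):
    """Mean parameter -> natural parameter (log-odds)."""
    return math.log(mu / (1.0 - mu))

def sigm(theta):
    """Natural parameter -> mean parameter."""
    return 1.0 / (1.0 + math.exp(-theta))

def log_partition(theta):
    """A(theta) = log(1 + e^theta), which equals -log(1 - mu)."""
    return math.log1p(math.exp(theta))

def bernoulli_pmf(x, theta):
    """p(x | theta) = exp(theta * x - A(theta)) for x in {0, 1}."""
    return math.exp(theta * x - log_partition(theta))

mu = 0.3
theta = logit(mu)
print(sigm(theta))              # recovers mu up to rounding
print(bernoulli_pmf(1, theta))  # P(X = 1), again mu up to rounding
```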

Multinoulli distribution

Write $x_k=1_{x=k}$ for the indicator that $x$ takes the $k$-th value. Then
$$p(x|\mu_1,\cdots,\mu_K)=\prod_{k=1}^K \mu_k^{x_k}=\exp\left[ \sum_{k=1}^{K-1} x_k\log \left( \frac{\mu_k}{\mu_K}\right) +\log \mu_K\right]$$

where
$$\sum_{k=1}^K \mu_k = 1$$

Hence
$$p(x|\theta)=h(x)\exp(\theta^T \phi(x)-A(\theta))$$
where
$$\theta=\left[\log \frac{\mu_1}{\mu_K},\cdots,\log \frac{\mu_{K-1}}{\mu_K}\right]^T,\quad\phi(x)=[1_{x=1},\cdots,1_{x=K-1}]^T,\quad A(\theta)=\log \left( 1+ \sum_{k=1}^{K-1} e^{\theta_k} \right)$$

The natural parameter is mapped back to $\mu$ by
$$\begin{cases} \mu_k = \dfrac{e^{\theta_k}}{1+\sum_{j=1}^{K-1}e^{\theta_j}}, & k=1,\cdots,K-1 \\ \mu_K = \dfrac{1}{1+\sum_{j=1}^{K-1}e^{\theta_j}} \end{cases}$$
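The same round trip for the Multinoulli case, as a minimal sketch: the last class $K$ serves as the reference, so $\theta$ has $K-1$ components. The function names and the example probabilities are illustrative assumptions.

```python
import math

def to_natural(mu):
    """theta_k = log(mu_k / mu_K) for k = 1, ..., K-1 (class K is the reference)."""
    return [math.log(m / mu[-1]) for m in mu[:-1]]

def to_mean(theta):
    """Invert the map: mu_k = e^{theta_k} / (1 + sum_j e^{theta_j}), mu_K = 1 / (1 + sum_j e^{theta_j})."""
    denom = 1.0 + sum(math.exp(t) for t in theta)
    return [math.exp(t) / denom for t in theta] + [1.0 / denom]

mu = [0.2, 0.5, 0.3]      # hypothetical class probabilities, K = 3
theta = to_natural(mu)    # K - 1 = 2 natural parameters
mu_back = to_mean(theta)
print(mu_back)
```

Note that the denominator $1+\sum_j e^{\theta_j}$ is $e^{A(\theta)}$, so the recovered probabilities sum to one by construction.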

Properties of the exponential family

Property 1
$$\frac{dA}{d\theta}=E[\phi(X)]$$

This follows by computing the derivative directly (written here for scalar $\theta$); the next two properties are obtained the same way.
$$\frac{dA}{d\theta}=\frac{d}{d\theta}\log \int h(x)\exp(\theta\phi(x))\,dx=\frac{\int \phi(x)h(x)\exp(\theta\phi(x))\,dx}{Z(\theta)}=\int \phi(x)p(x|\theta)\,dx$$

Property 2
$$\frac{d^2A}{d\theta^2}=Var[\phi(X)]$$

Property 3 (vector-valued $\theta$)
$$\nabla^2 A(\theta)=Cov(\phi(X))$$
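Properties 1 and 2 can be checked numerically with finite differences. The sketch below uses the Bernoulli family with $\phi(x)=x$, for which $A(\theta)=\log(1+e^{\theta})$, $E[\phi(X)]=\mu$ and $Var[\phi(X)]=\mu(1-\mu)$; the step size `eps` and the test point are arbitrary choices.

```python
import math

def A(theta):
    """Bernoulli log-partition function, A(theta) = log(1 + e^theta)."""
    return math.log1p(math.exp(theta))

def sigm(theta):
    return 1.0 / (1.0 + math.exp(-theta))

theta = 0.4
eps = 1e-4  # finite-difference step (an arbitrary but reasonable choice)

# Central differences for the first and second derivatives of A.
dA = (A(theta + eps) - A(theta - eps)) / (2 * eps)
d2A = (A(theta + eps) - 2 * A(theta) + A(theta - eps)) / eps ** 2

# Exact moments of phi(X) = X under Bernoulli(mu).
mu = sigm(theta)
mean_phi = mu
var_phi = mu * (1.0 - mu)

print(dA, mean_phi)   # should agree to several decimal places
print(d2A, var_phi)   # should agree to several decimal places
```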

MLE for the exponential family

The moment matching equation of the exponential-family MLE
Suppose $X_1,\cdots,X_N \overset{iid}{\sim} p(x|\theta)$. The likelihood function is
$$L(\theta|X_1,\cdots,X_N)=\left[ \prod_{i=1}^N h(X_i) \right] \exp \left( \theta^T \sum_{i=1}^N \phi(X_i) -NA(\theta)\right)$$

The log-likelihood is
$$\log L(\theta|X_1,\cdots,X_N)=\log \left[ \prod_{i=1}^N h(X_i) \right] +\theta^T \sum_{i=1}^N \phi(X_i) -NA(\theta)$$

The MLE satisfies the first-order condition (using $\nabla A(\theta)=E[\phi(X)]$)
$$\nabla \log L(\theta|X_1,\cdots,X_N) = \sum_{i=1}^N \phi(X_i)-N\nabla A(\theta)=\sum_{i=1}^N \phi(X_i)-NE[\phi(X)]=0$$

that is,
$$E[\phi(X)]=\frac{1}{N}\sum_{i=1}^N \phi(X_i)$$

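Here $\phi(X)$ is the sufficient statistic of the exponential family. This equation is called the moment matching equation: at the MLE, the model expectation of the sufficient statistic equals its sample mean.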
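For the Bernoulli family the moment matching equation reads $\mathrm{sigm}(\hat\theta)=\bar{X}$, so $\hat\theta$ is the log-odds of the sample mean. A minimal sketch follows; the data are made up for illustration, and the closed form requires $0<\bar{X}<1$.

```python
import math

def sigm(theta):
    return 1.0 / (1.0 + math.exp(-theta))

def log_lik(theta, data):
    """theta * sum(phi(x_i)) - N * A(theta); the h term is dropped since h(x) = 1."""
    return theta * sum(data) - len(data) * math.log1p(math.exp(theta))

def score(theta, data):
    """Gradient of the log-likelihood: sum(phi(x_i)) - N * E[phi(X)]."""
    return sum(data) - len(data) * sigm(theta)

# Hypothetical sample of 10 Bernoulli draws (6 ones, 4 zeros).
data = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]
sample_mean = sum(data) / len(data)

# Moment matching: sigm(theta_hat) = sample_mean; needs 0 < sample_mean < 1.
theta_hat = math.log(sample_mean / (1.0 - sample_mean))

print(theta_hat, score(theta_hat, data))  # the score is zero up to rounding
```

Because $A$ is convex (its second derivative is a variance), the log-likelihood is concave in $\theta$, so this stationary point is the maximum.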

Bayesian methods for the exponential family

The exponential family admits conjugate priors
Write the likelihood in the form
$$L(\theta|X_1,\cdots,X_N)\propto g(\theta)^N e^{\eta(\theta)^T s_N},\quad s_N = \sum_{i=1}^N s(X_i)$$
where $g(\theta)=\exp(-A(\eta(\theta)))$ and $s(x)$ plays the role of the sufficient statistic $\phi(x)$.

Introduce a prior of the same functional form,
$$p(\theta|\nu_0,\tau_0) \propto g(\theta)^{\nu_0}e^{\eta(\theta)^T \tau_0}$$

Then the posterior is
$$p(\theta|X_1,\cdots,X_N)=p(\theta|\nu_0+N,\tau_0+s_N)\propto g(\theta)^{\nu_0+N}e^{\eta(\theta)^T(\tau_0+s_N)}$$
so the posterior stays in the same family as the prior, and the update simply adds the sample size and the summed sufficient statistics to the hyperparameters.
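As a sketch of this update, take the Bernoulli likelihood with the parameter $\theta=\mu$, so that $g(\mu)=1-\mu$, $\eta(\mu)=\log\frac{\mu}{1-\mu}$ and $s(x)=x$. The prior $g(\mu)^{\nu_0}e^{\eta(\mu)\tau_0}=\mu^{\tau_0}(1-\mu)^{\nu_0-\tau_0}$ is then a $\mathrm{Beta}(\tau_0+1,\ \nu_0-\tau_0+1)$ density. The hyperparameter values and the data below are made up for illustration.

```python
def posterior_hyperparams(nu0, tau0, data):
    """Conjugate update: nu0 -> nu0 + N, tau0 -> tau0 + s_N, with s(x) = x."""
    return nu0 + len(data), tau0 + sum(data)

def beta_params(nu, tau):
    """Rewrite mu^tau * (1 - mu)^(nu - tau) as Beta(a, b) with a = tau + 1, b = nu - tau + 1."""
    return tau + 1.0, nu - tau + 1.0

# Hypothetical prior hyperparameters: (nu0, tau0) = (2, 1) corresponds to Beta(2, 2).
nu0, tau0 = 2.0, 1.0
data = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]  # 6 ones, 4 zeros

nu_n, tau_n = posterior_hyperparams(nu0, tau0, data)
a, b = beta_params(nu_n, tau_n)
post_mean = a / (a + b)  # mean of a Beta(a, b) distribution

print(nu_n, tau_n)  # updated hyperparameters
print(a, b, post_mean)
```

In this parameterization $\nu_0$ acts like a prior sample size and $\tau_0$ like a prior count of ones, which is why the update is a plain addition.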


Reposted from blog.csdn.net/weixin_44207974/article/details/112387622