"Statistical Learning Methods" - Maximum Likelihood Estimation of Naive Bayesian Parameters

Step 1. The likelihood function

In the Naive Bayes model, the parameters to be estimated from the training set are $\theta_k = P(y = c_k)$ and $\mu_{jlk} = P(x^{(j)} = a_{jl} \mid y = c_k)$.

Likelihood function:

$$
\begin{aligned}
L(\theta,\mu) &= \prod\limits_{i=1}^{N} P(x_i, y_i) \\
&= \prod\limits_{i=1}^{N} P(y_i)\,P(x_i \mid y_i) && \text{(multiplication rule)} \\
&= \prod\limits_{i=1}^{N} \Big( P(y_i) \prod\limits_{j=1}^{n} P(x_i^{(j)} \mid y_i) \Big) && \text{(conditional independence assumption)} \\
&= \prod\limits_{i=1}^{N} \prod\limits_{k=1}^{K} \Big( P(y = c_k) \prod\limits_{j=1}^{n} P(x_i^{(j)} \mid y_i = c_k) \Big)^{I(y_i = c_k)} \\
&= \prod\limits_{i=1}^{N} \prod\limits_{k=1}^{K} \Big( \theta_k \prod\limits_{j=1}^{n} \prod\limits_{l=1}^{L_j} P(x^{(j)} = a_{jl} \mid y = c_k)^{I(x_i^{(j)} = a_{jl})} \Big)^{I(y_i = c_k)} \\
&= \prod\limits_{i=1}^{N} \prod\limits_{k=1}^{K} \Big( \theta_k \prod\limits_{j=1}^{n} \prod\limits_{l=1}^{L_j} \mu_{jlk}^{\,I(x_i^{(j)} = a_{jl})} \Big)^{I(y_i = c_k)}
\end{aligned}
$$
Here $N$ is the number of samples, $n$ is the dimension of $X$, $L_j$ is the number of possible values of $X^{(j)}$, and $K$ is the number of possible values of $Y$.

Taking the logarithm gives the log-likelihood:

$$
l(\theta,\mu) = \sum\limits_{i=1}^{N} \sum\limits_{k=1}^{K} I(y_i = c_k) \Big( \log\theta_k + \sum\limits_{j=1}^{n} \sum\limits_{l=1}^{L_j} I(x_i^{(j)} = a_{jl}) \log\mu_{jlk} \Big)
$$
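The log-likelihood above can be evaluated numerically for a small discrete dataset. The sketch below is my own illustration (function and variable names are not from the book); it assumes features and labels are encoded as small integers, so the indicator sums reduce to simple lookups.

```python
import numpy as np

def log_likelihood(X, y, theta, mu):
    """Evaluate l(theta, mu) = sum_i [ log theta_{y_i} + sum_j log mu_{j, x_i^(j), y_i} ].

    X     : (N, n) integer feature matrix, X[i, j] in {0, ..., L_j - 1}
    y     : (N,)   integer labels in {0, ..., K - 1}
    theta : (K,)   class priors theta_k = P(y = c_k)
    mu    : (n, L, K) conditionals mu[j, l, k] = P(x^(j) = a_jl | y = c_k)
    """
    total = 0.0
    for i in range(len(y)):
        # the I(y_i = c_k) indicator selects exactly the class k = y_i
        total += np.log(theta[y[i]])
        for j in range(X.shape[1]):
            # the I(x_i^(j) = a_jl) indicator selects exactly l = X[i, j]
            total += np.log(mu[j, X[i, j], y[i]])
    return total
```

This is a plain double loop for readability; a vectorized version would index `mu` with the whole of `X` at once.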

Step 2. Solve for $\theta_k$

Using the Lagrange multiplier method, introduce the constraint $\sum\limits_{k=1}^{K}\theta_k = 1$ to obtain:

$$
F(\theta,\mu,\lambda) = \sum\limits_{i=1}^{N} \sum\limits_{k=1}^{K} I(y_i = c_k) \Big( \log\theta_k + \sum\limits_{j=1}^{n} \sum\limits_{l=1}^{L_j} I(x_i^{(j)} = a_{jl}) \log\mu_{jlk} \Big) + \lambda \Big( \sum\limits_{k=1}^{K} \theta_k - 1 \Big)
$$

Take the partial derivative of $F$ with respect to $\theta_k$, set it to $0$, and obtain:

$$
\begin{aligned}
\theta_k &= -\frac{\sum\limits_{i=1}^{N} I(y_i = c_k)}{\lambda} \\
\sum\limits_{k=1}^{K} \theta_k &= -\frac{N}{\lambda} = 1
\end{aligned}
$$
where $N_k = \sum\limits_{i=1}^{N} I(y_i = c_k)$ is the number of samples with $Y = c_k$. Combining the two equations above gives:

$$
\theta_k = \frac{\sum\limits_{i=1}^{N} I(y_i = c_k)}{N} = \frac{N_k}{N}
$$
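In code, the closed-form estimate $\theta_k = N_k / N$ is just a normalized class count. A minimal sketch (names are my own, not from the book):

```python
import numpy as np

def estimate_theta(y, K):
    """MLE of the class priors: theta_k = N_k / N, where N_k = #{i : y_i = c_k}."""
    y = np.asarray(y)
    counts = np.bincount(y, minlength=K)  # counts[k] = N_k
    return counts / len(y)
```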

Step 3. Solve for $\mu_{jlk}$

Using the Lagrange multiplier method, introduce the constraint $\sum\limits_{l=1}^{L_j}\mu_{jlk} = 1$ to obtain:

$$
F(\theta,\mu,\lambda) = \sum\limits_{i=1}^{N} \sum\limits_{k=1}^{K} I(y_i = c_k) \Big( \log\theta_k + \sum\limits_{j=1}^{n} \sum\limits_{l=1}^{L_j} I(x_i^{(j)} = a_{jl}) \log\mu_{jlk} \Big) + \lambda \Big( \sum\limits_{l=1}^{L_j} \mu_{jlk} - 1 \Big)
$$

Take the partial derivative of $F$ with respect to $\mu_{jlk}$, set it to $0$, and obtain:

$$
\begin{aligned}
\mu_{jlk} &= -\frac{\sum\limits_{i=1}^{N} I(y_i = c_k,\, x_i^{(j)} = a_{jl})}{\lambda} \\
\sum\limits_{l=1}^{L_j} \mu_{jlk} &= -\frac{\sum\limits_{i=1}^{N} I(y_i = c_k)}{\lambda} = 1
\end{aligned}
$$
Combining the two equations above gives:

$$
\mu_{jlk} = \frac{\sum\limits_{i=1}^{N} I(y_i = c_k,\, x_i^{(j)} = a_{jl})}{\sum\limits_{i=1}^{N} I(y_i = c_k)}
$$
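The estimate $\mu_{jlk}$ is likewise a ratio of counts: the number of class-$c_k$ samples whose $j$-th feature equals $a_{jl}$, divided by the number of class-$c_k$ samples. A sketch of this counting (my own names; for simplicity it assumes all features share the same number of values $L$ and that every class appears at least once, otherwise the division would be by zero):

```python
import numpy as np

def estimate_mu(X, y, L, K):
    """MLE mu[j, l, k] = #{i : y_i = c_k and x_i^(j) = a_jl} / #{i : y_i = c_k}."""
    X = np.asarray(X)
    y = np.asarray(y)
    n = X.shape[1]
    mu = np.zeros((n, L, K))
    for k in range(K):
        mask = (y == k)                      # selects the class-c_k samples
        for j in range(n):
            counts = np.bincount(X[mask, j], minlength=L)
            mu[j, :, k] = counts / mask.sum()  # normalize within feature j, class k
    return mu
```

In practice one usually adds Laplace smoothing (adding $\lambda$ to each count) to avoid zero probabilities, which the book treats separately as Bayesian estimation.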

Origin blog.csdn.net/MaTF_/article/details/131458222