References:
- "Statistical Learning Methods" Li Hang
- https://www.zhihu.com/question/33959624
Step 1. The likelihood function
In the Naive Bayes model, the parameters we need to determine from the training set are $\theta_k = P(y=c_k)$ and $\mu_{jlk} = P(x^{(j)} = a_{jl} \mid y = c_k)$.
Likelihood function:

$$
\begin{aligned}
L(\theta,\mu) &= \prod_{i=1}^{N} P(x_i, y_i) \\
&= \prod_{i=1}^{N} P(y_i)\,P(x_i \mid y_i) && \text{(multiplication rule)} \\
&= \prod_{i=1}^{N} \Big( P(y_i) \prod_{j=1}^{n} P(x_i^{(j)} \mid y_i) \Big) && \text{(conditional independence assumption)} \\
&= \prod_{i=1}^{N} \prod_{k=1}^{K} \Big( P(y=c_k) \prod_{j=1}^{n} P(x_i^{(j)} \mid y_i = c_k) \Big)^{I(y_i=c_k)} \\
&= \prod_{i=1}^{N} \prod_{k=1}^{K} \Big( \theta_k \prod_{j=1}^{n} \prod_{l=1}^{L_j} P(x^{(j)}=a_{jl} \mid y_i=c_k)^{I(x_i^{(j)}=a_{jl})} \Big)^{I(y_i=c_k)} \\
&= \prod_{i=1}^{N} \prod_{k=1}^{K} \Big( \theta_k \prod_{j=1}^{n} \prod_{l=1}^{L_j} \mu_{jlk}^{I(x_i^{(j)}=a_{jl})} \Big)^{I(y_i=c_k)}
\end{aligned}
$$
Here $N$ is the number of samples, $n$ is the dimension of $x$, $L_j$ is the number of possible values of $x^{(j)}$, and $K$ is the number of possible values of $y$.
Taking logarithms gives the log-likelihood:
$$
l(\theta,\mu) = \sum_{i=1}^{N} \sum_{k=1}^{K} I(y_i=c_k) \Big( \log\theta_k + \sum_{j=1}^{n} \sum_{l=1}^{L_j} I(x_i^{(j)}=a_{jl}) \log\mu_{jlk} \Big)
$$
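Note that the indicator sums simply pick out each sample's own class and feature values, so the log-likelihood collapses to a direct lookup per sample. A minimal sketch in Python (the function name `log_likelihood` and the `(feature index, value, class)` dictionary keys are illustrative choices, not from the source):

```python
import math

def log_likelihood(X, y, theta, mu):
    """Evaluate l(theta, mu): the indicator sums collapse to looking up
    each sample's own class prior and conditional probabilities."""
    ll = 0.0
    for xi, yi in zip(X, y):
        ll += math.log(theta[yi])           # I(y_i = c_k) selects theta_{y_i}
        for j, v in enumerate(xi):
            ll += math.log(mu[(j, v, yi)])  # I(x_i^(j) = a_jl) selects mu_{j, v, y_i}
    return ll

# Toy example: one feature with values 'a'/'b', two classes 0/1.
theta = {0: 0.5, 1: 0.5}
mu = {(0, 'a', 0): 0.8, (0, 'b', 0): 0.2,
      (0, 'a', 1): 0.3, (0, 'b', 1): 0.7}
ll = log_likelihood([('a',), ('b',)], [0, 1], theta, mu)
# ll = log(0.5 * 0.8 * 0.5 * 0.7) = log(0.14)
```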
Step 2. Solve for $\theta_k$
Use the Lagrange multiplier method to introduce the constraint $\sum\limits_{k=1}^{K}\theta_k = 1$, giving:

$$
F(\theta,\mu,\lambda) = \sum_{i=1}^{N}\sum_{k=1}^{K} I(y_i=c_k)\Big(\log\theta_k + \sum_{j=1}^{n}\sum_{l=1}^{L_j} I(x_i^{(j)}=a_{jl})\log\mu_{jlk}\Big) + \lambda\Big(\sum_{k=1}^{K}\theta_k - 1\Big)
$$
Taking the partial derivative of $F$ with respect to $\theta_k$ and setting it to $0$ gives:

$$
\begin{aligned}
\theta_k &= -\frac{\sum\limits_{i=1}^{N} I(y_i=c_k)}{\lambda} \\
\sum_{k=1}^{K}\theta_k &= -\frac{N}{\lambda} = 1
\end{aligned}
$$
Here $N_k = \sum\limits_{i=1}^{N} I(y_i=c_k)$ is the number of samples with $y = c_k$. Combining the two equations above gives:

$$
\theta_k = \frac{\sum\limits_{i=1}^{N} I(y_i=c_k)}{N}
$$
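So the MLE of the prior is simply the class frequency $N_k / N$. A quick sketch (the helper name `estimate_theta` is hypothetical):

```python
from collections import Counter

def estimate_theta(y):
    """MLE of the class priors: theta_k = N_k / N,
    the fraction of training samples belonging to class c_k."""
    N = len(y)
    return {c_k: N_k / N for c_k, N_k in Counter(y).items()}

# Example: 3 positive and 2 negative labels.
theta = estimate_theta([1, 1, 1, -1, -1])
# theta == {1: 0.6, -1: 0.4}
```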
Step 3. Solve for $\mu_{jlk}$
Use the Lagrange multiplier method to introduce the constraint $\sum\limits_{l=1}^{L_j}\mu_{jlk} = 1$, giving:

$$
F(\theta,\mu,\lambda) = \sum_{i=1}^{N}\sum_{k=1}^{K} I(y_i=c_k)\Big(\log\theta_k + \sum_{j=1}^{n}\sum_{l=1}^{L_j} I(x_i^{(j)}=a_{jl})\log\mu_{jlk}\Big) + \lambda\Big(\sum_{l=1}^{L_j}\mu_{jlk} - 1\Big)
$$
Taking the partial derivative of $F$ with respect to $\mu_{jlk}$ and setting it to $0$ gives:

$$
\begin{aligned}
\mu_{jlk} &= -\frac{\sum\limits_{i=1}^{N} I(y_i=c_k,\, x_i^{(j)}=a_{jl})}{\lambda} \\
\sum_{l=1}^{L_j}\mu_{jlk} &= -\frac{\sum\limits_{i=1}^{N} I(y_i=c_k)}{\lambda} = 1
\end{aligned}
$$
Combining the two equations above gives:

$$
\mu_{jlk} = \frac{\sum\limits_{i=1}^{N} I(y_i=c_k,\, x_i^{(j)}=a_{jl})}{\sum\limits_{i=1}^{N} I(y_i=c_k)}
$$
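Putting both estimators together, each conditional probability is a co-occurrence count divided by a class count. A minimal sketch assuming discrete features stored as tuples (the `(j, value, class)` key layout and the name `estimate_mu` are arbitrary illustrative choices):

```python
from collections import Counter, defaultdict

def estimate_mu(X, y):
    """MLE of mu_jlk: count(x^(j) = a_jl and y = c_k) / count(y = c_k)."""
    class_counts = Counter(y)          # N_k for each class c_k
    joint = defaultdict(int)           # (j, a_jl, c_k) -> joint count
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            joint[(j, v, yi)] += 1
    return {key: cnt / class_counts[key[2]] for key, cnt in joint.items()}

# Example: two features, classes -1 and 1.
X = [(1, 'S'), (1, 'M'), (2, 'S'), (2, 'M')]
y = [-1, -1, 1, 1]
mu = estimate_mu(X, y)
# mu[(0, 1, -1)] == 1.0  (both class -1 samples have x^(1) = 1)
# mu[(1, 'S', 1)] == 0.5
```

Note that, as the derivation requires, the estimates for each fixed feature $j$ and class $c_k$ sum to 1 over the values $a_{jl}$.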