Chapter 4: Naive Bayes

Learning and classification

Basic method

Learning prior probability distribution:

\[P(Y=c_k)\]

Learning conditional probability distribution:

\[P(X=x|Y=c_k)\]

Thus the joint probability distribution \(P(X,Y)\) is learned, which makes naive Bayes a generative model.

Conditional independence assumption

\[\begin{align*}P(X=x|Y=c_k)=&P(X^{(1)}=x^{(1)},\cdots,X^{(n)}=x^{(n)}|Y=c_k)\\=&\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)\end{align*}\]

The assumption keeps the model simple at the expense of some classification accuracy: without it, estimating \(P(X=x|Y=c_k)\) directly would require a number of parameters exponential in the number of attributes.

Posterior probability

\[\begin{align*}P(Y=c_m|X=x)=\frac{P(X=x|Y=c_m)P(Y=c_m)}{\sum_{k}P(Y=c_k)P(X=x|Y=c_k)} \end{align*}\]

\[y=f(x)=\underset{c_m}{argmax}P(Y=c_m|X=x)\]

\[y=\underset{c_m}{argmax} P(Y=c_m)\prod_{j=1}^nP(X^{(j)}=x^{(j)}|Y=c_m)\]
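As a minimal sketch of this decision rule in Python (the container names `prior` and `cond` are hypothetical, holding probabilities assumed to have been estimated beforehand, e.g. by the maximum likelihood or Bayesian estimates given below):

```python
# Minimal sketch of the naive Bayes decision rule.
# prior[c]      = P(Y=c)
# cond[c][j][v] = P(X^(j)=v | Y=c)

def classify(x, prior, cond):
    """Return argmax over classes c of P(Y=c) * prod_j P(X^(j)=x_j | Y=c)."""
    best_class, best_score = None, -1.0
    for c in prior:
        score = prior[c]
        for j, v in enumerate(x):
            # Conditional independence: multiply per-attribute probabilities.
            score *= cond[c][j].get(v, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

In practice one usually sums log-probabilities instead of multiplying raw probabilities, to avoid floating-point underflow when the number of attributes is large.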

Interpretation

Maximizing the posterior probability is equivalent to minimizing the expected risk under the 0-1 loss function:

\[L(Y,f(x))=\begin{cases}1,&Y\neq f(x)\\0,&Y=f(x)\end{cases}\]

\(R_{exp}(f) = E[L(Y,f(x))]\)

Since the expectation is taken with respect to the joint distribution \(P(X,Y)\), it can be rewritten as a conditional expectation over \(X\):

\(R_{exp}(f) = E_X\sum_{k=1}^{K}L(c_k,f(x))P(c_k|X)\)

To minimize the expected risk, it suffices to minimize the conditional risk pointwise, for each \(X=x\):

\[\begin{align*}f(x) = &\underset{y\in Y}{argmin}\sum_{k=1}^{K}L(c_k,y)P(c_k|X=x) \\=&\underset{y\in Y}{argmin}\sum_{k=1}^{K}P(y\neq c_k|X=x) \\=& \underset{y\in Y}{argmin}(1-P(y=c_k|X=x)) \\=& \underset{y\in Y}{argmax}P(y=c_k|X=x)\end{align*} \]

Hence

\[f(x)=\underset{c_k}{argmax}P(c_k|X=x)\]
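To make the middle step explicit: under the 0-1 loss, the conditional risk of predicting \(y\) is the total posterior mass of all other classes,

\[\sum_{k=1}^{K}L(c_k,y)P(c_k|X=x)=\sum_{c_k\neq y}P(c_k|X=x)=1-P(Y=y|X=x)\]

so minimizing the conditional risk and maximizing the posterior probability select the same class.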

Parameter Estimation

Maximum likelihood estimate

\[P(Y=c_k) = \frac{\sum_{i=1}^{N}I(y_i=c_k)}{N}\]

\[P(X^{(j)}=x^{(jl)}|Y=c_k) = \frac{\sum_{i=1}^{N}I(x_i^{(j)}=x^{(jl)},y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}\]

\(x^{(jl)}\) denotes the \(l\)-th possible value of the \(j\)-th attribute.
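A minimal sketch of these counting estimates in Python; the toy dataset here is hypothetical, chosen only to make the counting concrete:

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: each sample is (attribute tuple, label).
data = [((1, 'S'), -1), ((1, 'M'), -1), ((2, 'S'), -1),
        ((2, 'M'), 1), ((2, 'L'), 1), ((3, 'L'), 1)]

N = len(data)
class_count = Counter(y for _, y in data)

# Maximum likelihood estimate of the prior P(Y=c_k).
prior = {c: n / N for c, n in class_count.items()}

# Raw counts of (class, attribute index, attribute value) triples.
counts = defaultdict(lambda: defaultdict(Counter))
for x, y in data:
    for j, v in enumerate(x):
        counts[y][j][v] += 1

# Maximum likelihood estimate of P(X^(j) = x^(jl) | Y = c_k).
cond = {c: {j: {v: n / class_count[c] for v, n in vc.items()}
            for j, vc in js.items()}
        for c, js in counts.items()}
```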

Bayesian estimation

Maximum likelihood estimation may yield probability estimates that are exactly 0, which would zero out the entire product in the decision rule. Bayesian estimation avoids this by adding a smoothing term \(\lambda\geq 0\) to the counts:

\[P_\lambda(X^{(j)}=x^{(jl)}|Y=c_k) = \frac{\sum_{i=1}^{N}I(x_i^{(j)}=x^{(jl)},y_i=c_k)+\lambda}{\sum_{i=1}^N I(y_i=c_k)+S_j\lambda}\]

\(S_j\) is the number of possible values of the \(j\)-th attribute. Taking \(\lambda=1\) gives Laplace smoothing, while \(\lambda=0\) recovers the maximum likelihood estimate.

\[P_\lambda(Y=c_k) = \frac{\sum_{i=1}^{N}I(y_i=c_k)+\lambda}{N+K\lambda}\]
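Continuing the hypothetical sketch above, the smoothed estimates could be computed as follows (with \(\lambda=1\), i.e. Laplace smoothing):

```python
lam = 1.0  # lambda; lam = 1 is Laplace smoothing, lam = 0 recovers MLE

K = len(class_count)  # number of classes
S = {0: 3, 1: 3}      # S_j: number of possible values of attribute j
                      # (hypothetical: {1, 2, 3} and {'S', 'M', 'L'})

# Smoothed prior: P_lambda(Y=c_k) = (count_k + lambda) / (N + K*lambda).
prior_smooth = {c: (n + lam) / (N + K * lam) for c, n in class_count.items()}

# Smoothed conditional: never exactly zero, even for unseen (value, class) pairs.
def cond_smooth(c, j, v):
    return (counts[c][j][v] + lam) / (class_count[c] + S[j] * lam)
```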
