Learning and classification
Basic method
Learn the prior probability distribution:
\[P(Y=c_k)\]
Learn the conditional probability distribution:
\[P(X=x|Y=c_k)\]
Together these give the learned joint probability distribution \(P(X,Y)\), making this a generative model.
Conditional independence assumptions
\[\begin{align*}P(X=x|Y=c_k)=&P(X^{(1)}=x^{(1)},\cdots,X^{(n)}=x^{(n)}|Y=c_k)\\=&\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)\end{align*}\]
The assumption keeps the model simple, at the expense of some classification accuracy.
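To make the simplification concrete, one can count how many conditional probabilities have to be estimated with and without the assumption. The sketch below assumes \(K\) classes and attributes with \(S_j\) possible values each; the function names and the example sizes are illustrative, not from the original notes.

```python
from math import prod

def full_joint_param_count(num_classes, values_per_attribute):
    """Count of P(X=x | Y=c_k) entries without the independence assumption."""
    return num_classes * prod(values_per_attribute)

def naive_bayes_param_count(num_classes, values_per_attribute):
    """Count of P(X^(j)=x^(j) | Y=c_k) entries with the independence assumption."""
    return num_classes * sum(values_per_attribute)

# Hypothetical problem: 2 classes, 10 attributes with 3 possible values each.
print(full_joint_param_count(2, [3] * 10))   # 2 * 3**10
print(naive_bayes_param_count(2, [3] * 10))  # 2 * (3 * 10)
```

The first count grows exponentially with the number of attributes, the second only linearly.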
Posterior probability
\[\begin{align*}P(Y=c_m|X=x)=\frac{P(X=x|Y=c_m)P(Y=c_m)}{\sum_{k}P(Y=c_k)P(X=x|Y=c_k)} \end{align*}\]
\[y=f(x)=\underset{c_m}{argmax}P(Y=c_m|X=x)\]
\[y=\underset{c_m}{argmax} P(Y=c_m)\prod_{j=1}^nP(X^{(j)}=x^{(j)}|Y=c_m)\]
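The decision rule can be sketched in a few lines. This is a minimal illustration, assuming the prior and conditional probabilities are already available as dictionaries; the name `predict`, the table layout, and all numbers are made up for the example. Log-probabilities are summed rather than multiplying raw probabilities, which leaves the argmax unchanged but avoids underflow when \(n\) is large.

```python
import math

def _log(p):
    # log(0) is treated as -inf so a zero factor rules the class out.
    return math.log(p) if p > 0 else float("-inf")

def predict(x, prior, cond):
    """Return argmax over c of P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c).

    prior: dict mapping class -> P(Y=c)
    cond:  dict mapping (j, value, class) -> P(X^(j)=value | Y=c)
    """
    def log_score(c):
        return _log(prior[c]) + sum(_log(cond[(j, v, c)]) for j, v in enumerate(x))
    return max(prior, key=log_score)

# Hypothetical tables: two classes, two binary attributes.
prior = {"c1": 0.4, "c2": 0.6}
cond = {
    (0, 1, "c1"): 0.8, (0, 0, "c1"): 0.2, (1, 1, "c1"): 0.7, (1, 0, "c1"): 0.3,
    (0, 1, "c2"): 0.1, (0, 0, "c2"): 0.9, (1, 1, "c2"): 0.2, (1, 0, "c2"): 0.8,
}
print(predict((1, 1), prior, cond))  # c1: 0.4*0.8*0.7 = 0.224 beats 0.6*0.1*0.2 = 0.012
print(predict((0, 0), prior, cond))  # c2: 0.6*0.9*0.8 = 0.432 beats 0.4*0.2*0.3 = 0.024
```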
Meaning of maximizing the posterior probability
Maximizing the posterior probability is equivalent to minimizing the expected risk under the 0-1 loss function:
\[L(Y,f(x))= \begin{cases}1,&Y\neq f(x)\\0,&Y=f(x)\end{cases}\]
The expected risk is
\[R_{exp}(f) = E[L(Y,f(x))]\]
Because the expectation is taken over the joint distribution \(P(X,Y)\), it can be rewritten as a conditional expectation:
\(R_{exp}(f) = E_X \sum_{k=1}^{K}[L(c_k,f(x))]P(c_k|X)\)
To minimize the expected risk, it suffices to minimize pointwise for each \(X = x\):
\[\begin{align*}f(x) = &\underset{y\in Y}{argmin}\sum_{k=1}^{K}L(c_k,y)P(c_k|X=x) \\=&\underset{y\in Y}{argmin}\sum_{k=1}^{K}P(y\neq c_k|X=x) \\=& \underset{y\in Y}{argmin}(1-P(y=c_k|X=x) ) \\=& \underset{y\in Y}{argmax}P(y=c_k|X=x)\end{align*} \]
So the rule that minimizes the expected risk is the rule that maximizes the posterior probability:
\[f(x)=\underset{c_k}{argmax}P(c_k|X=x)\]
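The equivalence can be checked numerically for a single \(x\). The sketch below assumes a made-up posterior over three classes; `expected_zero_one_loss` is an illustrative helper that evaluates \(\sum_k L(c_k,y)P(c_k|X=x)\) for a candidate prediction \(y\).

```python
def expected_zero_one_loss(y, posterior):
    """Conditional expected 0-1 loss of predicting y: sum of P(c_k | X=x) over c_k != y."""
    return sum(p for c, p in posterior.items() if c != y)

# Hypothetical posterior P(c_k | X=x) for one fixed x.
posterior = {"c1": 0.2, "c2": 0.5, "c3": 0.3}

# Class minimizing the conditional expected loss.
risk_minimizer = min(posterior, key=lambda y: expected_zero_one_loss(y, posterior))
# Class maximizing the posterior probability.
posterior_maximizer = max(posterior, key=posterior.get)

print(risk_minimizer, posterior_maximizer)
```

Both selections pick the same class, because the loss of predicting \(y\) equals \(1 - P(y|X=x)\).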
Parameter Estimation
Maximum likelihood estimate
\[P(Y=c_k) = \frac{\sum_{i=1}^{N}I(y_i=c_k)}{N}\]
\[P(X^{(j)}=x^{(jl)}|Y=c_k) = \frac{\sum_{i=1}^{N}I(x_i^{(j)}=x^{(jl)},y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}\]
\(x^{(jl)}\) denotes the \(l\)-th possible value of the \(j\)-th attribute.
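Both maximum likelihood estimates are relative frequencies, so they reduce to counting. A minimal sketch, assuming samples are tuples of discrete attribute values; the function name `fit_mle` and the toy data are illustrative only.

```python
from collections import Counter

def fit_mle(X, y):
    """Estimate P(Y=c_k) and P(X^(j)=a | Y=c_k) by relative frequencies."""
    n = len(y)
    class_counts = Counter(y)   # number of samples with y_i = c_k
    joint_counts = Counter()    # number of samples with x_i^(j) = a and y_i = c_k
    for xi, yi in zip(X, y):
        for j, value in enumerate(xi):
            joint_counts[(j, value, yi)] += 1
    prior = {c: class_counts[c] / n for c in class_counts}
    cond = {key: count / class_counts[key[2]] for key, count in joint_counts.items()}
    return prior, cond

# Hypothetical training set: two attributes, labels in {-1, 1}.
X = [(1, "S"), (1, "M"), (2, "M"), (2, "S"), (3, "L")]
y = [-1, -1, 1, 1, 1]
prior, cond = fit_mle(X, y)
print(prior[-1])                    # 2 of 5 samples have label -1
print(cond[(0, 1, -1)])             # both label -1 samples have first attribute 1
print(cond.get((0, 3, -1), 0.0))    # combination never observed, so the estimate is 0
```

Combinations that never occur in the training data are simply absent from `cond`, which corresponds to an estimate of 0.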
Bayesian estimation
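Maximum likelihood estimation may produce probability estimates equal to 0, which zeroes out the whole product for that class. Bayesian estimation avoids this by adding a constant \(\lambda\) to every count: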
\[P_\lambda(X^{(j)}=x^{(jl)}|Y=c_k) = \frac{\sum_{i=1}^{N}I(x_i^{(j)}=x^{(jl)},y_i=c_k)+\lambda}{\sum_{i=1}^N I(y_i=c_k)+S_j\lambda}\]
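\(S_j\) is the number of possible values of the \(j\)-th attribute, and \(\lambda \geq 0\). With \(\lambda = 0\) this is the maximum likelihood estimate; \(\lambda = 1\) gives Laplace smoothing.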
\[P_\lambda(Y=c_k) = \frac{\sum_{i=1}^{N}I(y_i=c_k)+\lambda}{N+K\lambda}\]
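The smoothed estimates can be sketched the same way as the counting estimates, with \(\lambda\) added to each count. This is an illustration under stated assumptions: the function name `fit_smoothed`, its arguments, and the toy data are hypothetical, and the possible values of each attribute must be listed explicitly so that \(S_j\) is known.

```python
from collections import Counter

def fit_smoothed(X, y, values_per_attribute, classes, lam=1.0):
    """Smoothed estimates of P(Y=c_k) and P(X^(j)=a | Y=c_k).

    values_per_attribute[j] lists the S_j possible values of attribute j,
    classes lists the K class labels, lam is the smoothing constant lambda.
    """
    n, K = len(y), len(classes)
    class_counts = Counter(y)
    joint_counts = Counter()
    for xi, yi in zip(X, y):
        for j, value in enumerate(xi):
            joint_counts[(j, value, yi)] += 1
    prior = {c: (class_counts[c] + lam) / (n + K * lam) for c in classes}
    cond = {}
    for c in classes:
        for j, values in enumerate(values_per_attribute):
            for a in values:
                # Counter returns 0 for combinations that were never observed.
                cond[(j, a, c)] = (joint_counts[(j, a, c)] + lam) / (
                    class_counts[c] + len(values) * lam
                )
    return prior, cond

# Hypothetical training set: two attributes, labels in {-1, 1}.
X = [(1, "S"), (1, "M"), (2, "M"), (2, "S"), (3, "L")]
y = [-1, -1, 1, 1, 1]
values = [[1, 2, 3], ["S", "M", "L"]]
prior, cond = fit_smoothed(X, y, values, classes=[-1, 1], lam=1.0)
print(prior[-1])           # (2 + 1) / (5 + 2)
print(cond[(0, 3, -1)])    # (0 + 1) / (2 + 3), no longer 0
```

For each class and attribute the smoothed conditional estimates still sum to 1, so they remain a valid probability distribution.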