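Naive Bayes Algorithm: Study Notes
This article is intended only for personal study and understanding.
Naive Bayes classifier (a classification method)
Bayes' formula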
$$P(A \mid B)=\frac{P(A, B)}{P(B)}=\frac{P(B \mid A)\, P(A)}{P(B)}$$
where:
$P(A)$: the prior probability
$P(A \mid B)$: the posterior probability
$P(B \mid A)$: the probability that B occurs given that event A has occurred, i.e. the likelihood
$P(B)$: the evidence factor; it is the same for every class label, so it does not depend on the class
Expanding the evidence $P(B)$ with the law of total probability over the events $A_j$ gives:
$$P\left(A_{i} \mid B\right)=\frac{P\left(B \mid A_{i}\right) P\left(A_{i}\right)}{\sum_{j} P\left(B \mid A_{j}\right) P\left(A_{j}\right)}$$
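As a small numeric illustration of the formula above, the sketch below computes the posterior of each hypothesis from its prior and likelihood. The function name `posterior` and the numbers are made up for illustration; they do not come from the original notes.

```python
def posterior(priors, likelihoods):
    """Return P(A_i | B) for every hypothesis A_i using Bayes' formula."""
    # Numerators P(B | A_i) * P(A_i), one per hypothesis.
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    # Evidence P(B): the sum of all numerators (law of total probability).
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Hypothetical inputs: two hypotheses with priors 0.3 and 0.7,
# and likelihoods P(B | A_1) = 0.8, P(B | A_2) = 0.1.
priors = [0.3, 0.7]
likelihoods = [0.8, 0.1]
post = posterior(priors, likelihoods)
print(post)
```

Because every numerator is divided by the same evidence term, the posteriors always sum to 1, and the ranking of the hypotheses is already decided by the numerators alone.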
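Classification problem
Naive Bayes algorithm (with a worked example)
For a class label $c_n$ and an observed feature vector $X$, Bayes' formula can be written as: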
$$P\left(y=c_{n} \mid x=X\right)=\frac{P\left(x=X, y=c_{n}\right)}{P(x=X)}=\frac{P\left(x=X \mid y=c_{n}\right) P\left(y=c_{n}\right)}{P(x=X)}$$
Assuming the $m$ attributes are independent of one another given the class:
$$P\left(x=X \mid y=c_{n}\right)=\prod_{i=1}^{m} P\left(x^{i}=a_{i} \mid y=c_{n}\right)$$
Naive Bayes formula: given the feature vector $x = X$, the probability of class $y = c_n$ is
$$P\left(y=c_{n} \mid x=X\right)=\frac{P\left(y=c_{n}\right) \prod_{i=1}^{m} P\left(x^{i}=a_{i} \mid y=c_{n}\right)}{P(x=X)}$$
The class $y$ that maximizes this conditional probability is taken as the prediction:
$$f(x)=\arg \max _{c_{n}}\left(\frac{P\left(y=c_{n}\right) \prod_{i=1}^{m} P\left(x^{i}=a_{i} \mid y=c_{n}\right)}{P(x=X)}\right)$$
The evidence factor $P(x=X)$ does not depend on the class label, so it can be dropped:
$$f(x)=\arg \max _{c_{n}}\left(P\left(y=c_{n}\right) \prod_{i=1}^{m} P\left(x^{i}=a_{i} \mid y=c_{n}\right)\right)$$
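The decision rule above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `naive_bayes_predict`, the class names `c1` and `c2`, and the probability tables are all hypothetical, and the tables are taken as given rather than estimated from data.

```python
from math import prod  # available in Python 3.8+


def naive_bayes_predict(priors, cond_probs, x):
    """Return the class maximizing P(y=c) * prod_i P(x^i = a_i | y=c), plus all scores."""
    scores = {}
    for c, prior in priors.items():
        # cond_probs[c][i] maps a value of attribute i to P(x^i = value | y = c).
        # A value missing from the table is treated as probability 0.
        scores[c] = prior * prod(
            cond_probs[c][i].get(a, 0.0) for i, a in enumerate(x)
        )
    return max(scores, key=scores.get), scores


# Hypothetical tables for two classes and two binary attributes.
priors = {"c1": 0.4, "c2": 0.6}
cond_probs = {
    "c1": [{"yes": 0.7, "no": 0.3}, {"yes": 0.2, "no": 0.8}],
    "c2": [{"yes": 0.1, "no": 0.9}, {"yes": 0.5, "no": 0.5}],
}
label, scores = naive_bayes_predict(priors, cond_probs, ("yes", "no"))
print(label, scores)
```

Note that `c2` has the larger prior, yet `c1` wins here because its per-attribute likelihoods for the observed values are much higher. The evidence term is never computed, since it would scale both scores by the same factor.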
Both factors are estimated from the training data by counting. Here $N$ is the number of training samples, $K$ is the number of classes, $I(\cdot)$ is the indicator function, and $x_{i}^{j}$ is the $j$-th attribute of the $i$-th sample:
$$P\left(y=c_{n}\right)=\frac{\sum_{i=1}^{N} I\left(y_{i}=c_{n}\right)}{N}, \quad n=1,2, \ldots, K$$
$$P\left(x^{j}=a_{j} \mid y=c_{n}\right)=\frac{\sum_{i=1}^{N} I\left(x_{i}^{j}=a_{j},\, y_{i}=c_{n}\right)}{\sum_{i=1}^{N} I\left(y_{i}=c_{n}\right)}$$
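The two counting estimates can be sketched as follows. The helper name `estimate` and the toy dataset are assumptions made for illustration only; the sketch simply turns the indicator sums into counters.

```python
from collections import Counter, defaultdict


def estimate(samples, labels):
    """Estimate P(y=c) and P(x^j=a | y=c) by relative frequency."""
    n = len(labels)
    class_counts = Counter(labels)  # sum_i I(y_i = c) for each class c
    priors = {c: k / n for c, k in class_counts.items()}
    # joint[(j, c)][a] counts samples whose attribute j equals a and whose class is c.
    joint = defaultdict(Counter)
    for x, y in zip(samples, labels):
        for j, a in enumerate(x):
            joint[(j, y)][a] += 1
    cond = {
        (j, c): {a: k / class_counts[c] for a, k in counts.items()}
        for (j, c), counts in joint.items()
    }
    return priors, cond


# Hypothetical toy data: two attributes per sample, two classes.
samples = [(0, 1), (0, 0), (1, 1), (1, 0)]
labels = ["a", "a", "b", "a"]
priors, cond = estimate(samples, labels)
print(priors)
print(cond[(0, "a")])
```

Attribute values that never occur together with a class simply have no entry in `cond`, which corresponds to an estimated probability of 0 under these formulas.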
Worked example 1
Naive Bayes code example (Python)
Price A | Class hours B | Sales C | A (encoded) | B (encoded) | C (encoded) |
---|---|---|---|---|---|
low | many | high | 0 | 2 | 2 |
high | medium | high | 2 | 1 | 2 |
low | few | high | 0 | 0 | 2 |
low | medium | low | 0 | 1 | 0 |
medium | medium | medium | 1 | 1 | 1 |
high | many | high | 2 | 2 | 2 |
low | few | medium | 0 | 0 | 1 |
Task: predict the sales C when the price is A = 2 (high) and the class hours are B = 2 (many).

    # Integer code for each categorical level, matching the encoded table columns.
    PRICE_CODES = {"low": 0, "medium": 1, "high": 2}
    TIME_CODES = {"few": 0, "medium": 1, "many": 2}
    SALE_CODES = {"low": 0, "medium": 1, "high": 2}


    def set_data(price, time, sale):
        """Encode the three categorical columns as lists of integers."""
        # An unknown level raises KeyError instead of being skipped silently.
        price_number = [PRICE_CODES[p] for p in price]
        time_number = [TIME_CODES[t] for t in time]
        sale_number = [SALE_CODES[s] for s in sale]
        return price_number, time_number, sale_number


    price = ["low", "high", "low", "low", "medium", "high", "low"]
    time = ["many", "medium", "few", "medium", "medium", "many", "few"]
    sale = ["high", "high", "high", "low", "medium", "high", "medium"]
    price_number, time_number, sale_number = set_data(price, time, sale)
    print(price_number, time_number, sale_number)

For the query $X = (A=2, B=2)$, the score of each sales class is:
$$P(C=0 \mid x=X) \propto P(C=0)\, P(A=2 \mid C=0)\, P(B=2 \mid C=0)$$
$$P(C=1 \mid x=X) \propto P(C=1)\, P(A=2 \mid C=1)\, P(B=2 \mid C=1)$$
$$P(C=2 \mid x=X) \propto P(C=2)\, P(A=2 \mid C=2)\, P(B=2 \mid C=2)$$
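As a hand check, the counts can be read directly off the encoded table: of the $N = 7$ samples, 1 has $C=0$, 2 have $C=1$ and 4 have $C=2$, and the only samples with $A=2$ or $B=2$ belong to class $C=2$ (two samples each). Plugging these into the three expressions gives:

$$
\begin{aligned}
P(C=0)\, P(A=2 \mid C=0)\, P(B=2 \mid C=0) &= \tfrac{1}{7} \cdot \tfrac{0}{1} \cdot \tfrac{0}{1} = 0 \\
P(C=1)\, P(A=2 \mid C=1)\, P(B=2 \mid C=1) &= \tfrac{2}{7} \cdot \tfrac{0}{2} \cdot \tfrac{0}{2} = 0 \\
P(C=2)\, P(A=2 \mid C=2)\, P(B=2 \mid C=2) &= \tfrac{4}{7} \cdot \tfrac{2}{4} \cdot \tfrac{2}{4} = \tfrac{1}{7} \approx 0.143
\end{aligned}
$$

Only $C=2$ receives a nonzero score, so the predicted sales level is high.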

    from __future__ import division  # only matters on Python 2; harmless on Python 3

    # Encoded training data from the table.
    price_number = [0, 2, 0, 0, 1, 2, 0]
    time_number = [2, 1, 0, 1, 1, 2, 0]
    sale_number = [2, 2, 2, 0, 1, 2, 1]

    # Query to classify: price A = 2 (high), class hours B = 2 (many).
    exprice_number = 2
    extime_number = 2

    # Number of training samples in each sales class C = 0, 1, 2.
    sale0p = sale_number.count(0)
    sale1p = sale_number.count(1)
    sale2p = sale_number.count(2)

    # a0..a2 / b0..b2: samples of each class whose price / class hours match the query.
    a0 = a1 = a2 = 0
    b0 = b1 = b2 = 0
    for i in range(len(sale_number)):
        if price_number[i] == exprice_number:
            if sale_number[i] == 0:
                a0 += 1
            elif sale_number[i] == 1:
                a1 += 1
            elif sale_number[i] == 2:
                a2 += 1
        if time_number[i] == extime_number:
            if sale_number[i] == 0:
                b0 += 1
            elif sale_number[i] == 1:
                b1 += 1
            elif sale_number[i] == 2:
                b2 += 1

    # Conditional probabilities P(A=2 | C=c) and P(B=2 | C=c).
    pa0 = a0 / sale0p
    pa1 = a1 / sale1p
    pa2 = a2 / sale2p
    pb0 = b0 / sale0p
    pb1 = b1 / sale1p
    pb2 = b2 / sale2p

    # Priors P(C=c).
    pc0 = sale0p / len(sale_number)
    pc1 = sale1p / len(sale_number)
    pc2 = sale2p / len(sale_number)

    # Unnormalized posteriors P(C=c) * P(A=2 | C=c) * P(B=2 | C=c).
    pcc0 = pc0 * pa0 * pb0
    pcc1 = pc1 * pa1 * pb1
    pcc2 = pc2 * pa2 * pb2
    indf = (pcc0, pcc1, pcc2)
    print(indf)

    # Pick the class with the largest score.
    max_indf = indf.index(max(indf))
    if max_indf == 0:
        print('Sales: low')
    elif max_indf == 1:
        print('Sales: medium')
    elif max_indf == 2:
        print('Sales: high')

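The script above hard-codes one counter per class. One possible way to generalize it to any query is sketched below, using the same encoded data. The helper names `class_scores` and `predict` are hypothetical, and `fractions.Fraction` is used only so that the scores come out exact rather than as floats.

```python
from fractions import Fraction

# Encoded training data, as in the table.
price_number = [0, 2, 0, 0, 1, 2, 0]
time_number = [2, 1, 0, 1, 1, 2, 0]
sale_number = [2, 2, 2, 0, 1, 2, 1]


def class_scores(query_price, query_time):
    """Return {c: P(C=c) * P(A=query_price | C=c) * P(B=query_time | C=c)}."""
    n = len(sale_number)
    scores = {}
    for c in sorted(set(sale_number)):
        # Indices of the training samples that belong to class c.
        rows = [i for i in range(n) if sale_number[i] == c]
        prior = Fraction(len(rows), n)
        p_a = Fraction(sum(price_number[i] == query_price for i in rows), len(rows))
        p_b = Fraction(sum(time_number[i] == query_time for i in rows), len(rows))
        scores[c] = prior * p_a * p_b
    return scores


def predict(query_price, query_time):
    """Return the sales class with the largest score (first one wins on ties)."""
    scores = class_scores(query_price, query_time)
    return max(scores, key=scores.get)


print(class_scores(2, 2))
print(predict(2, 2))
print(predict(0, 1))
```

For the query (2, 2) this reproduces the result of the script, class 2 (high), with an exact score of 1/7. For the query (0, 1) the single low-sales sample matches both attributes, so class 0 wins despite having the smallest prior.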
Blog posts referenced
A simple example of the Naive Bayes classification algorithm
Naive Bayes algorithm (with a worked example)
Naive Bayes algorithm code implementation (Python)