Naive Bayes (NB)

1. Learning and Classification with Naive Bayes

1.1 Basic Method

  • Input space: $\chi \subseteq R^n$, the set of $n$-dimensional vectors
  • Output space: the set of class labels $Y'=\{c_1,c_2,...,c_K\}$
  • Input: a feature vector $x \in \chi$
  • Output: a class label $y \in Y'$
  • $X$ is a random vector on the space $\chi$
  • $Y$ is a random variable on the output space $Y'$
  • Training set $T=\{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$, drawn i.i.d. from the joint distribution $P(X,Y)$

Goal: learn the joint probability distribution $P(X,Y)$ from the training data, via

  1. the prior distribution: $P(Y=c_k),\ k=1,2,...,K$
  2. the conditional distribution: $P(X=x|Y=c_k)=P(X^{(1)}=x^{(1)},...,X^{(n)}=x^{(n)}|Y=c_k),\ k=1,2,...,K$
    Multiplying the two gives the joint probability $P(X,Y)$.

However, $P(X=x|Y=c_k)$ has an exponential number of parameters: if $x^{(j)}$ can take $S_j$ values, $j=1,2,...,n$, and $Y$ can take $K$ values, the total parameter count is $K \prod_{j=1}^n S_j$, which is infeasible to estimate.
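To see the savings concretely, a quick sketch with made-up sizes (three features taking 2, 3, and 4 values, and two classes); the naive factorization below replaces the product over feature sizes with a sum:

```python
from math import prod

# Hypothetical sizes: n = 3 features with S_j in (2, 3, 4), K = 2 classes.
S = [2, 3, 4]
K = 2

full_model = K * prod(S)   # one entry per feature combination per class: K * prod(S_j)
naive_model = K * sum(S)   # one small table per feature per class:       K * sum(S_j)
print(full_model, naive_model)  # -> 48 18
```

With, say, 50 binary features the full table needs $2 \cdot 2^{50}$ entries while the factorized model needs only $2 \cdot 100$, which is the whole point of the assumption that follows.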

We therefore make the conditional independence assumption: the features $X^{(j)}$ are independent given the class.
$$P(X=x|Y=c_k)=P(X^{(1)}=x^{(1)},...,X^{(n)}=x^{(n)}|Y=c_k)=\prod_{j=1}^nP(X^{(j)}=x^{(j)}|Y=c_k) \quad\quad (1)$$

Naive Bayes actually learns the mechanism that generates the data, so it is a generative model. The conditional independence assumption says that, once the class is fixed, the features used for classification are all conditionally independent. This assumption makes Naive Bayes simple, but it can sacrifice some classification accuracy.

At classification time, for a given input $x$, the learned model computes the posterior distribution $P(Y=c_k | X=x)$ and outputs the class with the largest posterior probability as the class of $x$.

Derivation

$$P(Y=c_k | X=x) = \frac {P(X=x|Y=c_k)P(Y=c_k)}{\sum_k P(X=x|Y=c_k)P(Y=c_k)} \quad\quad (2)$$ by Bayes' theorem.

Substituting (1) into (2):

$$P(Y=c_k | X=x) = \frac {P(Y=c_k)\prod_{j}P(X^{(j)}=x^{(j)}|Y=c_k)}{\sum_k P(Y=c_k)\prod_{j}P(X^{(j)}=x^{(j)}|Y=c_k)},\ k=1,2,...,K$$
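As a quick numeric check of this normalization (the probability values used here are the ones that appear in the worked example of section 2.2.1 below, for $x=(2,S)^T$):

```python
# Numerator terms P(Y=c_k) * prod_j P(X^(j)=x^(j) | Y=c_k) for the two classes,
# then normalize by their sum to get a posterior probability.
num_pos = (9/15) * (3/9) * (1/9)   # Y = +1 term, equals 1/45
num_neg = (6/15) * (2/6) * (3/6)   # Y = -1 term, equals 1/15
posterior_neg = num_neg / (num_pos + num_neg)
print(posterior_neg)  # -> 0.75
```

So the posterior $P(Y=-1 | X=x)$ is $3/4$; the denominator only rescales the numerators, which is why it can be dropped from the argmax below.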

So the Naive Bayes classifier is:

$$\color{red} y=f(x)=\arg\max_{c_k} \frac {P(Y=c_k)\prod_{j}P(X^{(j)}=x^{(j)}|Y=c_k)}{\sum_k P(Y=c_k)\prod_{j}P(X^{(j)}=x^{(j)}|Y=c_k)}$$

In the expression above, the denominator is the same for every $c_k$, so

$$\color{red} y=\arg\max_{c_k} P(Y=c_k) \prod_{j} P(X^{(j)}=x^{(j)}|Y=c_k)$$
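A minimal sketch of this decision rule (the function name and data layout are mine, not from the text). One practical detail the formula hides: with many features the product of probabilities underflows, so implementations usually sum log-probabilities instead of multiplying:

```python
import math

def nb_classify(prior, cond, x):
    """argmax over classes of P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c),
    computed in log space so products over many features do not underflow."""
    best_c, best_score = None, -math.inf
    for c, p in prior.items():
        score = math.log(p) + sum(math.log(cond[c][j][v]) for j, v in enumerate(x))
        if score > best_score:
            best_c, best_score = c, score
    return best_c

# Probability tables taken from the worked example in section 2.2.1:
prior = {1: 9/15, -1: 6/15}
cond = {
    1:  [{1: 2/9, 2: 3/9, 3: 4/9}, {'S': 1/9, 'M': 4/9, 'L': 4/9}],
    -1: [{1: 3/6, 2: 2/6, 3: 1/6}, {'S': 3/6, 'M': 2/6, 'L': 1/6}],
}
print(nb_classify(prior, cond, (2, 'S')))  # -> -1
```

Since $\log$ is monotone, the argmax in log space is the same as the argmax of the products.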

2. Parameter Estimation

2.1 Maximum Likelihood Estimation

  1. Prior probability: $P(Y=c_k)=\frac{\sum_{i=1}^N I(y_i=c_k)}{N},\ k=1,2,...,K$
  2. Conditional probability: $P(X^{(j)}=x^{(j)}|Y=c_k)$
    Let the $j$-th feature $x^{(j)}$ take values in $\{a_{j1},a_{j2},...,a_{jS_j}\}$. The maximum likelihood estimate of the conditional probability is:
    $$P(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\ y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$$
    where $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, and $I$ is the indicator function.

2.2 Learning and Classification Algorithm

Naive Bayes algorithm:

Input

  • Training data $T=\{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$
  • where $x_i=(x_i^{(1)},x_i^{(2)},...,x_i^{(n)})^T$ and $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample
  • $x_i^{(j)} \in \{a_{j1},a_{j2},...,a_{jS_j}\}$, where $a_{jl}$ is the $l$-th possible value of the $j$-th feature, $j=1,2,...,n;\ l=1,2,...,S_j$
  • $y_i \in \{c_1,c_2,...,c_K\}$
  • An instance $x$

Output

  • The class of instance $x$

Steps

  1. Compute the prior and conditional probabilities
    $$P(Y=c_k)=\frac{\sum_{i=1}^N I(y_i=c_k)}{N},\ k=1,2,...,K$$
    $$P(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\ y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)},\ j=1,...,n;\ l=1,...,S_j;\ k=1,...,K$$
  2. For the given instance $x=(x^{(1)},x^{(2)},...,x^{(n)})^T$, compute
    $$P(Y=c_k) \prod_{j=1}^n P(X^{(j)}=x^{(j)}|Y=c_k),\ k=1,2,...,K$$
  3. Determine the class of instance $x$:
    $$y=\arg\max_{c_k} P(Y=c_k) \prod_{j=1}^n P(X^{(j)}=x^{(j)}|Y=c_k)$$

2.2.1 Worked Example

Example:
Using the training data in the table below, learn a Naive Bayes classifier and determine the class label $y$ of $x=(2,S)^T$. Here $X^{(1)},X^{(2)}$ are features with value sets $A_1=\{1,2,3\}$ and $A_2=\{S,M,L\}$, and $Y$ is the class label with $Y \in C=\{1,-1\}$.

| Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $X^{(1)}$ | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 |
| $X^{(2)}$ | S | M | M | S | S | S | M | M | L | L | L | M | M | L | L |
| $Y$ | -1 | -1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 |


Prior probabilities: $P(Y=1)=\frac {9}{15},\ P(Y=-1)=\frac {6}{15}$
Conditional probabilities:
$P(X^{(1)}=1|Y=1)= \frac{2}{9},\ P(X^{(1)}=2|Y=1)= \frac{3}{9},\ P(X^{(1)}=3|Y=1)= \frac{4}{9}$
$P(X^{(2)}=S|Y=1)= \frac{1}{9},\ P(X^{(2)}=M|Y=1)= \frac{4}{9},\ P(X^{(2)}=L|Y=1)= \frac{4}{9}$
$P(X^{(1)}=1|Y=-1)= \frac{3}{6},\ P(X^{(1)}=2|Y=-1)= \frac{2}{6},\ P(X^{(1)}=3|Y=-1)= \frac{1}{6}$
$P(X^{(2)}=S|Y=-1)= \frac{3}{6},\ P(X^{(2)}=M|Y=-1)= \frac{2}{6},\ P(X^{(2)}=L|Y=-1)= \frac{1}{6}$

For the given $x=(2,S)^T$, compute:
When $Y=1$:
$\quad P(Y=1)P(X^{(1)}=2|Y=1)P(X^{(2)}=S|Y=1)=\frac{9}{15} \cdot \frac{3}{9} \cdot \frac{1}{9} = \frac{1}{45}$
When $Y=-1$:
$\quad P(Y=-1)P(X^{(1)}=2|Y=-1)P(X^{(2)}=S|Y=-1)=\frac{6}{15} \cdot \frac{2}{6} \cdot \frac{3}{6} = \frac{1}{15}$

The value for $Y=-1$ is larger, so $y=-1$.

2.2.2 Example Code

```python
# -*- coding:utf-8 -*-
# Python 3.7
# @Time: 2020/1/19 22:08
# @Author: Michael Ming
# @Website: https://michael.blog.csdn.net/
# @File: naiveBayes.py

import numpy as np

data = [[1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
        ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L'],
        [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]]
X1 = []
X2 = []
Y = []
for i in range(len(data[0])):  # collect the distinct values of each variable
    if data[0][i] not in X1:
        X1.append(data[0][i])
    if data[1][i] not in X2:
        X2.append(data[1][i])
    if data[2][i] not in Y:
        Y.append(data[2][i])
nY = [0] * len(Y)
for i in range(len(data[0])):  # count the samples in each class Yi
    nY[Y.index(data[2][i])] += 1
PY = [0.0] * len(Y)
for i in range(len(Y)):
    PY[i] = nY[i] / len(data[0])  # prior probability of Yi
PX1_Y = np.zeros((len(X1), len(Y)))  # conditional probability tables
PX2_Y = np.zeros((len(X2), len(Y)))

for i in range(len(data[0])):
    PX1_Y[X1.index(data[0][i])][Y.index(data[2][i])] += 1  # count frequencies
    PX2_Y[X2.index(data[1][i])][Y.index(data[2][i])] += 1
for i in range(len(Y)):
    PX1_Y[:, i] /= nY[i]  # normalize counts into conditional probabilities
    PX2_Y[:, i] /= nY[i]
x = [2, 'S']
PX_Y = [PX1_Y, PX2_Y]
X = [X1, X2]
ProbY = [0.0] * len(Y)
for i in range(len(Y)):
    ProbY[i] = PY[i]
    for j in range(len(x)):
        ProbY[i] *= PX_Y[j][X[j].index(x[j])][i]
maxProb = -1
idx = -1
for i in range(len(Y)):  # pick the class with the largest score
    if ProbY[i] > maxProb:
        maxProb = ProbY[i]
        idx = i
print(Y)
print(ProbY)
print(x, ", most likely class y = %d" % (Y[idx]))

# Output:
# [-1, 1]
# [0.06666666666666667, 0.02222222222222222]
# [2, 'S'] , most likely class y = -1
```

2.3 Bayesian Estimation (Smoothing)

Maximum likelihood estimation can produce probability estimates of zero, which distorts the posterior computation and biases the classification. The remedy is Bayesian estimation.

Bayesian estimate of the conditional probability:
$$P_\lambda(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\ y_i=c_k)+\lambda}{\sum_{i=1}^N I(y_i=c_k)+S_j\lambda}$$
where $\lambda \geq 0$. Taking $\lambda = 0$ recovers the maximum likelihood estimate;
a positive $\lambda$ adds a positive count to the frequency of every possible value of the random variable;
the common choice $\lambda = 1$ is called Laplace smoothing (Laplacian smoothing).

Bayesian estimate of the prior probability:
$$P_\lambda(Y=c_k)=\frac{\sum_{i=1}^N I(y_i=c_k)+\lambda}{N+K\lambda}$$
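The two smoothed estimates can be written as small helpers (the function names are mine, not from the text); plugging in the counts from the example in section 2.3.1 reproduces its numbers:

```python
def bayes_cond_prob(count_xy, count_y, S_j, lam=1.0):
    """Smoothed estimate of P(X^(j)=a_jl | Y=c_k).
    count_xy: samples with both the feature value and the class;
    count_y:  samples in the class; S_j: number of values of feature j.
    lam = 0 gives the MLE; lam = 1 gives Laplace smoothing."""
    return (count_xy + lam) / (count_y + S_j * lam)

def bayes_prior(count_y, N, K, lam=1.0):
    """Smoothed estimate of P(Y=c_k); K is the number of classes."""
    return (count_y + lam) / (N + K * lam)

print(bayes_prior(9, 15, 2))           # 10/17, as in section 2.3.1
print(bayes_cond_prob(3, 9, 3))        # 4/12, P(X^(1)=2 | Y=1) with lam=1
print(bayes_cond_prob(3, 9, 3, 0.0))   # 3/9, the unsmoothed MLE
```

Note that the smoothed conditional probabilities still sum to 1 over the $S_j$ values, since the numerators gain $S_j\lambda$ in total, exactly matching the $S_j\lambda$ added to the denominator.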

2.3.1 Worked Example

Example (same as above, but estimating the probabilities with Laplace smoothing, $\lambda=1$):
Using the training data in the table below, learn a Naive Bayes classifier and determine the class label $y$ of $x=(2,S)^T$. Here $X^{(1)},X^{(2)}$ are features with value sets $A_1=\{1,2,3\}$ and $A_2=\{S,M,L\}$, and $Y$ is the class label with $Y \in C=\{1,-1\}$.

| Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $X^{(1)}$ | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 |
| $X^{(2)}$ | S | M | M | S | S | S | M | M | L | L | L | M | M | L | L |
| $Y$ | -1 | -1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 |


Prior probabilities: $P(Y=1)=\frac {9+1}{15+2}=\frac{10}{17},\ P(Y=-1)=\frac {6+1}{15+2}=\frac{7}{17}$
Conditional probabilities:
$P(X^{(1)}=1|Y=1)= \frac{2+1}{9+3}=\frac{3}{12},\ P(X^{(1)}=2|Y=1)= \frac{3+1}{9+3}=\frac{4}{12},\ P(X^{(1)}=3|Y=1)= \frac{4+1}{9+3}=\frac{5}{12}$
$P(X^{(2)}=S|Y=1)= \frac{1+1}{9+3}=\frac{2}{12},\ P(X^{(2)}=M|Y=1)= \frac{4+1}{9+3}=\frac{5}{12},\ P(X^{(2)}=L|Y=1)= \frac{4+1}{9+3}=\frac{5}{12}$
$P(X^{(1)}=1|Y=-1)= \frac{3+1}{6+3}=\frac{4}{9},\ P(X^{(1)}=2|Y=-1)= \frac{2+1}{6+3}=\frac{3}{9},\ P(X^{(1)}=3|Y=-1)= \frac{1+1}{6+3}=\frac{2}{9}$
$P(X^{(2)}=S|Y=-1)= \frac{3+1}{6+3}=\frac{4}{9},\ P(X^{(2)}=M|Y=-1)= \frac{2+1}{6+3}=\frac{3}{9},\ P(X^{(2)}=L|Y=-1)= \frac{1+1}{6+3}=\frac{2}{9}$

For the given $x=(2,S)^T$, compute:
When $Y=1$:
$\quad P(Y=1)P(X^{(1)}=2|Y=1)P(X^{(2)}=S|Y=1)=\frac{10}{17} \cdot \frac{4}{12} \cdot \frac{2}{12} = \frac{5}{153} \approx 0.0327$
When $Y=-1$:
$\quad P(Y=-1)P(X^{(1)}=2|Y=-1)P(X^{(2)}=S|Y=-1)=\frac{7}{17} \cdot \frac{3}{9} \cdot \frac{4}{9} = \frac{28}{459} \approx 0.0610$

The value for $Y=-1$ is larger, so $y=-1$.

2.3.2 Example Code

```python
# -*- coding:utf-8 -*-
# Python 3.7
# @Time: 2020/1/19 22:08
# @Author: Michael Ming
# @Website: https://michael.blog.csdn.net/
# @File: naiveBayes.py

import numpy as np

data = [[1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
        ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L'],
        [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]]
X1 = []
X2 = []
Y = []
for i in range(len(data[0])):  # collect the distinct values of each variable
    if data[0][i] not in X1:
        X1.append(data[0][i])
    if data[1][i] not in X2:
        X2.append(data[1][i])
    if data[2][i] not in Y:
        Y.append(data[2][i])
nY = [0] * len(Y)
for i in range(len(data[0])):  # count the samples in each class Yi
    nY[Y.index(data[2][i])] += 1
PY = [0.0] * len(Y)
for i in range(len(Y)):
    PY[i] = (nY[i] + 1) / (len(data[0]) + len(Y))  # prior of Yi, +1 for Laplace smoothing
PX1_Y = np.zeros((len(X1), len(Y)))  # conditional probability tables
PX2_Y = np.zeros((len(X2), len(Y)))

for i in range(len(data[0])):
    PX1_Y[X1.index(data[0][i])][Y.index(data[2][i])] += 1  # count frequencies
    PX2_Y[X2.index(data[1][i])][Y.index(data[2][i])] += 1
for i in range(len(Y)):
    PX1_Y[:, i] = (PX1_Y[:, i] + 1) / (nY[i] + len(X1))  # conditional probabilities, smoothed
    PX2_Y[:, i] = (PX2_Y[:, i] + 1) / (nY[i] + len(X2))
x = [2, 'S']
PX_Y = [PX1_Y, PX2_Y]
X = [X1, X2]
ProbY = [0.0] * len(Y)
for i in range(len(Y)):
    ProbY[i] = PY[i]
    for j in range(len(x)):
        ProbY[i] *= PX_Y[j][X[j].index(x[j])][i]
maxProb = -1
idx = -1
for i in range(len(Y)):  # pick the class with the largest score
    if ProbY[i] > maxProb:
        maxProb = ProbY[i]
        idx = i
print(Y)
print(ProbY)
print(x, ", most likely class y = %d" % (Y[idx]))

# Output:
# [-1, 1]
# [0.06100217864923746, 0.0326797385620915]
# [2, 'S'] , most likely class y = -1
```

Source: blog.csdn.net/qq_21201267/article/details/104033972