Naive Bayes in Machine Learning

Introduction

Naive Bayes is a classification method that learns a classifier from the statistical regularities of a training set. For a given input $x$, it applies Bayes' theorem to find the class $y$ with the highest posterior probability.

Basic Method and Derivation

  • Input: a feature vector $x = (x^{(1)}, x^{(2)}, \ldots, x^{(n)})$
  • Output: a class label $y \in \{c_1, c_2, \ldots, c_K\}$

Let $X$ be a random variable over the input space and $Y$ a random variable over the output space. What we want is $\max_k P(Y=c_k \mid X=x)$: concretely, we compute $P(Y=c_k \mid X=x)$ for every $k$ and pick the class with the largest probability.

This is a posterior probability, which cannot be read directly off the training statistics (see the distinction between prior and posterior probabilities), so we invoke Bayes' theorem:

$$P(Y=c_k \mid X=x) = \frac{P(Y=c_k, X=x)}{P(X=x)} = \frac{P(X=x \mid Y=c_k)\,P(Y=c_k)}{\sum_k P(X=x \mid Y=c_k)\,P(Y=c_k)} \tag{1}$$
To reduce the computational complexity, we make a simplifying assumption: the features of the input are mutually independent given the class. Then:

$$P(X=x \mid Y=c_k) = P(X^{(1)}=x^{(1)}, X^{(2)}=x^{(2)}, \ldots, X^{(n)}=x^{(n)} \mid Y=c_k) = \prod_{j=1}^{n} P(X^{(j)}=x^{(j)} \mid Y=c_k) \tag{2}$$
This is also why the method is called "naive" Bayes: the simplifying assumption is what makes the computation tractable.
Substituting (2) into (1) gives:

$$P(Y=c_k \mid X=x) = \frac{P(Y=c_k)\prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_k)}{\sum_k P(Y=c_k)\prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_k)} \tag{3}$$
Writing $P'_k = P(Y=c_k)\prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_k)$, equation (3) becomes:

$$P(Y=c_k \mid X=x) = \frac{P'_k}{\sum_k P'_k} \tag{4}$$
Clearly the denominator of (4) is a constant for a given training set, so the naive Bayes classifier can be written as:

$$y = \arg\max_{c_k} P'_k = \arg\max_{c_k} P(Y=c_k)\prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_k) \tag{5}$$
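
To make decision rule (5) concrete, here is a minimal Python sketch (my own illustration, not part of the original post; the probability values are made up for a two-class problem with two binary features, and `prior`/`cond` stand in for the estimates derived in the next section):

import numpy as np

# Hypothetical estimates for 2 classes and 2 binary features.
prior = np.array([0.6, 0.4])            # prior[k] = P(Y = c_k)
cond = np.array([                       # cond[k, j, v] = P(X^(j) = v | Y = c_k)
    [[0.7, 0.3], [0.2, 0.8]],           # class 0
    [[0.4, 0.6], [0.9, 0.1]],           # class 1
])

x = [1, 0]  # observed feature values (x^(1), x^(2))

# Unnormalized posteriors: P'_k = P(Y = c_k) * prod_j P(X^(j) = x^(j) | Y = c_k)
scores = [prior[k] * np.prod([cond[k, j, x[j]] for j in range(len(x))])
          for k in range(len(prior))]
print(int(np.argmax(scores)))           # prints 1: class 1 wins (0.216 vs. 0.036)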

Parameter Estimation

We now have the model form of the naive Bayes classifier; the next step is to estimate its parameters. For naive Bayes, the parameters to learn are $P(Y=c_k)$ and $P(X^{(j)}=x^{(j)} \mid Y=c_k)$, and there are two common estimation methods:

Maximum Likelihood Estimation

Estimation method:

  • The maximum likelihood estimate of the prior probability $P(Y=c_k)$ is $P(Y=c_k) = \frac{\sum_{i=1}^{N} I(y_i=c_k)}{N}$, i.e., the fraction of the $N$ training instances whose class is $c_k$; here $I(y_i=c_k)$ is an indicator function that returns 1 when $y_i=c_k$ is true and 0 otherwise
  • Suppose the $j$-th feature $x^{(j)}$ can take values in the set $\{a_{j1}, a_{j2}, \ldots, a_{jS_j}\}$. The maximum likelihood estimate of the conditional probability $P(X^{(j)}=a_{jl} \mid Y=c_k)$ is
    $$P(X^{(j)}=a_{jl} \mid Y=c_k) = \frac{\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\, y_i=c_k)}{\sum_{i=1}^{N} I(y_i=c_k)}$$
    i.e., the relative frequency of the value $a_{jl}$ among the training instances of class $c_k$ (a counting sketch is given right after this list)
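
These counting formulas translate almost directly into code. Below is a minimal sketch (my own illustration; it assumes integer-coded features, that every feature takes the same number of possible values, and that every class occurs in the training data):

import numpy as np

def mle_estimates(X, y, num_class, num_values):
	# X: (N, n) integer feature matrix; y: (N,) integer class labels.
	# num_values: number of possible values per feature (assumed the same for all).
	N, n = X.shape
	prior = np.bincount(y, minlength=num_class) / N          # P(Y = c_k)
	cond = np.zeros((num_class, n, num_values))
	for k in range(num_class):
		Xk = X[y == k]                                       # instances with class c_k
		for j in range(n):
			counts = np.bincount(Xk[:, j], minlength=num_values)
			cond[k, j] = counts / len(Xk)                    # relative frequency of each value
	return prior, cond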

Bayesian Estimation

Maximum likelihood estimation has one drawback: some estimated probabilities may come out as zero. If a feature value never co-occurs with a class in the training data, then $P(X^{(j)}=x^{(j)} \mid Y=c_k) = 0$, which zeroes out the entire product and hurts the model's generalization.
Estimation method:

  • Prior probability: $P_{\lambda}(Y=c_k) = \frac{\sum_{i=1}^{N} I(y_i=c_k) + \lambda}{N + K\lambda}$, where $K$ is the number of classes
  • Conditional probability: $P_{\lambda}(X^{(j)}=a_{jl} \mid Y=c_k) = \frac{\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\, y_i=c_k) + \lambda}{\sum_{i=1}^{N} I(y_i=c_k) + S_j\lambda}$
    In other words, we add a positive constant $\lambda$ to the frequency count of every value of the random variable. When $\lambda = 0$ this reduces to maximum likelihood estimation; when $\lambda = 1$ it is called Laplace smoothing. For any $l = 1, 2, \ldots, S_j$ and $k = 1, 2, \ldots, K$ we then have
    $$P_{\lambda}(X^{(j)}=a_{jl} \mid Y=c_k) > 0, \qquad \sum_{l=1}^{S_j} P_{\lambda}(X^{(j)}=a_{jl} \mid Y=c_k) = 1$$
    so the smoothed values are still valid probabilities (a smoothed counting sketch follows this list)
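
A smoothed variant of the sketch above (again my own illustration; `lam=1.0` gives Laplace smoothing, `lam=0.0` recovers maximum likelihood):

import numpy as np

def bayes_estimates(X, y, num_class, num_values, lam=1.0):
	# Same inputs as mle_estimates above, plus the smoothing constant lambda.
	N, n = X.shape
	# Prior: add lambda to each class count and K * lambda to the total.
	prior = (np.bincount(y, minlength=num_class) + lam) / (N + num_class * lam)
	cond = np.zeros((num_class, n, num_values))
	for k in range(num_class):
		Xk = X[y == k]
		for j in range(n):
			counts = np.bincount(Xk[:, j], minlength=num_values)
			# Conditional: add lambda per value and S_j * lambda to the class count.
			cond[k, j] = (counts + lam) / (len(Xk) + num_values * lam)
	return prior, cond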

Learning and Classification Algorithm

  • Compute the prior probabilities $P(Y=c_k)$ and the conditional probabilities $P(X^{(j)}=a_{jl} \mid Y=c_k)$
  • For a given input instance $x = (x^{(1)}, x^{(2)}, \ldots, x^{(n)})$, compute $P(Y=c_k)\prod_{j=1}^{n} P(X^{(j)}=x^{(j)} \mid Y=c_k)$ for every class $c_k$
  • Assign the instance to the class with the largest posterior: $y = \arg\max_{c_k} P(Y=c_k)\prod_{j=1}^{n} P(X^{(j)}=x^{(j)} \mid Y=c_k)$

Code Implementation

import numpy as np


class Bayes_classifier:
	def __init__(self, data_set, num_class, num_eigen):
		# data_set: (N, num_eigen + 1) integer array whose last column is the class label.
		# num_eigen is the number of features; each feature takes values in {0, 1, 2}.
		self.data_set = data_set
		self.num_node = data_set.shape[0]
		self.num_class = num_class
		self.num_eigen = num_eigen
		self.prior_probability = np.zeros(self.num_class)
		# conditional_probability[c, j, v] stores P(X^(j) = v | Y = c).
		self.conditional_probability = np.zeros((self.num_class, self.num_eigen, 3))

	def learn(self):
		# Estimate the priors P(Y = c_k) by relative frequency (maximum likelihood).
		for node in self.data_set:
			t_class = node[self.num_eigen]
			self.prior_probability[t_class] += 1
		self.prior_probability = self.prior_probability / self.num_node

		# Estimate the conditionals by counting feature values within each class.
		for c in range(self.num_class):
			for node in self.data_set:
				if node[self.num_eigen] == c:
					for j in range(self.num_eigen):
						self.conditional_probability[c, j, node[j]] += 1
			# Divide by the class count (prior * N) to turn counts into frequencies.
			self.conditional_probability[c, :, :] /= self.prior_probability[c] * self.num_node

	def classify(self, node):
		# Score each class with P'_k = P(Y = c_k) * prod_j P(X^(j) = x^(j) | Y = c_k),
		# i.e. equation (5), and return the class with the largest score.
		scores = []
		for c in range(self.num_class):
			p_con = 1.0
			for j in range(self.num_eigen):
				p_con *= self.conditional_probability[c, j, node[j]]
			scores.append(self.prior_probability[c] * p_con)
		return int(np.argmax(scores))

	def get_model(self):
		return self.prior_probability, self.conditional_probability
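
A minimal usage sketch with a made-up dataset (my own example, not from the original post): two classes, two features taking values in {0, 1, 2}, and the class label in the last column.

import numpy as np

# Toy dataset: each row is (x^(1), x^(2), label).
data = np.array([
	[0, 0, 0],
	[0, 1, 0],
	[1, 1, 0],
	[1, 2, 1],
	[2, 2, 1],
	[2, 1, 1],
])

clf = Bayes_classifier(data, num_class=2, num_eigen=2)
clf.learn()
print(clf.classify(np.array([2, 2])))  # prints 1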


Reposted from blog.csdn.net/qq_44026293/article/details/104606891