Python Bayesian algorithm

As I understand it, a Bayesian classifier uses probability to decide whether a sample C belongs to class A or class B. The code below was tested with Python 3.5.
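
For reference, the rule behind this idea is Bayes' theorem. Using the same letters (C is the sample, A and B are the candidate classes), a standard statement is:

```latex
P(A \mid C) = \frac{P(C \mid A)\,P(A)}{P(C)},
\qquad
P(B \mid C) = \frac{P(C \mid B)\,P(B)}{P(C)}
```

C is assigned to A when P(A | C) > P(B | C). Since P(C) appears in both denominators, comparing the numerators is enough.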

The steps, described in words:

  1) Load the training data and the category label of each training sample

  2) Generate the vocabulary, which is the union of all words in the training data

  3) Convert each training sample to a vector over the vocabulary containing only 0 and 1

  4) Calculate the per-class probabilities from the training vectors

  5) Load the test data

  6) Convert the test data to vectors in the same way

  7) Multiply the test vector by each class's probability vector and sum the result

  8) Assign the test data to the class with the larger sum
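
Steps 2 and 3 can be illustrated on a toy input. This is a minimal sketch in plain Python; the two sample documents and the names `docs`, `vocab`, and `vectors` are made up for illustration and are not part of the code that follows.

```python
# Toy data: two tiny "documents" (hypothetical, for illustration only).
docs = [["dog", "park"], ["stupid", "dog"]]

# Step 2: the vocabulary is the sorted union of all words.
vocab = sorted(set(word for doc in docs for word in doc))

# Step 3: one 0/1 vector per document, one position per vocabulary word.
vectors = [[1 if word in doc else 0 for word in vocab] for doc in docs]

print(vocab)    # ['dog', 'park', 'stupid']
print(vectors)  # [[1, 1, 0], [1, 0, 1]]
```

Each vector has the same length as the vocabulary, so training and test samples can be compared position by position.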

Code implementation

from numpy import *  # provides zeros() and an array-aware sum()
# Bayesian algorithm

def loadDataSet():
    trainData=[['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                 ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                 ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                 ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                 ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                 ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    labels = [0, 1, 0, 1, 0, 1]  # 1 means insulting speech, 0 means normal speech
    return trainData, labels

# Generate the vocabulary
def createVocabList(trainData):
    VocabList = set()
    for item in trainData:
        VocabList = VocabList | set(item)  # take the union of the two sets
    return sorted(VocabList)  # return the vocabulary as a sorted list

# Convert each sample to a vector over the vocabulary containing only 0 and 1
def createWordSet(VocabList, trainData):
    VocabList_len = len(VocabList)  # size of the vocabulary
    trainData_len = len(trainData)  # number of samples
    # One row per sample, one column per vocabulary word
    WordSet = zeros((trainData_len, VocabList_len))
    for index in range(0, trainData_len):
        for word in trainData[index]:
            if word in VocabList:  # set the word's position to 1; all other positions stay 0
                WordSet[index][VocabList.index(word)] = 1
    return WordSet

# Calculate the per-class probability vectors
def opreationProbability(WordSet, labels):
    WordSet_col = len(WordSet[0])
    labels_len = len(labels)
    WordSet_labels_0 = zeros(WordSet_col)
    WordSet_labels_1 = zeros(WordSet_col)
    num_labels_0 = 0
    num_labels_1 = 0
    for index in range(0, labels_len):
        if labels[index] == 0:
            WordSet_labels_0 += WordSet[index]  # vector addition
            num_labels_0 += 1                   # count class-0 samples
        else:
            WordSet_labels_1 += WordSet[index]  # vector addition
            num_labels_1 += 1                   # count class-1 samples
    # Per-word count in each class, scaled by that class's share of the samples
    p0 = WordSet_labels_0 * num_labels_0 / labels_len
    p1 = WordSet_labels_1 * num_labels_1 / labels_len
    return p0, p1


trainData, labels = loadDataSet()
VocabList = createVocabList(trainData)
train_WordSet = createWordSet(VocabList,trainData)
p0, p1 = opreationProbability(train_WordSet, labels)
# Training is complete at this point

# Start testing
testData = [['not', 'take', 'ate', 'my', 'stupid']]  # test data

test_WordSet = createWordSet(VocabList, testData)  # 0/1 vectors for the test data

res_test_0 = sum(p0 * test_WordSet)
res_test_1 = sum(p1 * test_WordSet)
if res_test_0 > res_test_1:
    print("belongs to category 0")
else:
    print("belongs to category 1")

Note:

  The numbers I compute differ from other people's results, although the final classification is the same. I don't know exactly why. This is how I understand the algorithm, and my understanding may be wrong. I would welcome guidance from anyone more experienced.
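
One possible source of the difference, offered as an assumption rather than a diagnosis: the code above scales raw per-class word counts by each class's share of the samples and sums them, whereas a common naive Bayes formulation estimates P(word | class) with add-one (Laplace) smoothing and adds log-probabilities to the log prior. Below is a minimal sketch of that conventional variant on the same training data. The names `train_nb` and `classify_nb` are hypothetical, and this is not necessarily what the referenced post does.

```python
import math

train_data = [
    ['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
    ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
    ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
    ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
    ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
    ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid'],
]
labels = [0, 1, 0, 1, 0, 1]  # 1 means insulting speech, 0 means normal speech


def train_nb(docs, labels):
    """Estimate log P(class) and smoothed log P(word | class)."""
    vocab = sorted(set(w for d in docs for w in d))
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        # Number of class-c documents that contain each word
        counts = {w: sum(1 for d in class_docs if w in d) for w in vocab}
        total = sum(counts.values())
        # Add-one smoothing: no word gets probability zero
        log_likelihood[c] = {
            w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
        }
    return log_prior, log_likelihood


def classify_nb(words, log_prior, log_likelihood):
    """Return the class with the highest log posterior score, plus all scores."""
    scores = {}
    for c in log_prior:
        # Words outside the vocabulary are ignored
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in set(words) if w in log_likelihood[c]
        )
    return max(scores, key=scores.get), scores


log_prior, log_likelihood = train_nb(train_data, labels)
label, scores = classify_nb(['not', 'take', 'ate', 'my', 'stupid'],
                            log_prior, log_likelihood)
print(label)  # 1
```

The predicted class matches the result above, but the scores are log-probabilities rather than scaled counts, so the raw numbers will not agree. Working in log space avoids underflow when many small probabilities are multiplied.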

Partly based on this expert's blog post:

  Link: https://blog.csdn.net/moxigandashu/article/details/71480251

 
