Bayesian network probabilistic inference in Python (Probabilistic Inference)

Preface

This is the third experiment of HIT's 2019 Artificial Intelligence course. Because of time constraints, the code has not been optimized at all; the algorithm is for reference only.

Experiment requirements

Implement probabilistic inference in a Bayesian network (Probabilistic Inference).

For the detailed experiment instructions, see GitHub.

First, here is the code.

Background knowledge

To learn about Bayesian networks, I mainly referred to this blog:

Bayesian networks (belief network)

That blog is comprehensive, but it only touches in passing on the details, especially the concrete implementation of probabilistic inference in a Bayesian network. Since this experiment requires implementing exactly that, after finishing the blog I went back and studied the slides the teacher had handed out (they are in English, so at first I was reluctant to read them). Finally, I picked out the key points and read the slides alongside the blog, and everything suddenly clicked.

So if you have not studied Bayesian networks yet, I recommend learning them in the order listed above.

Since the slides are large, they are shared via a cloud-drive link (extraction code: cn3h). The slides are shared only for personal learning reference, not for profit.

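For probabilistic inference in a Bayesian network, the two most important formulas are the factorization of the full joint distribution into the nodes' conditional probabilities, and inference by enumeration, which sums the joint distribution over the hidden variables and then normalizes.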

What these two formulas mean is explained both online and in the slides, so I will not repeat it here. The point to stress is that these two core formulas are exactly what the experiment code implements. I only realized this after finishing the experiment; when I studied the slides earlier, there were so many formulas that I did not notice how important these two are.
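Written in the usual textbook notation, and reconstructed here from what `calculateTotalProbability` and `calculateProbability` compute (the symbols are my own labels), the two formulas are presumably the following. $X$ is the query variable, $\mathbf{e}$ the observed evidence, $\mathbf{y}$ ranges over every value combination of the hidden variables, and $\alpha$ is the constant that makes the result sum to 1:

```latex
P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\big(x_i \mid \mathrm{parents}(X_i)\big)

P(X \mid \mathbf{e}) = \alpha \, P(X, \mathbf{e}) = \alpha \sum_{\mathbf{y}} P(X, \mathbf{e}, \mathbf{y})
```

The first is what `calculateTotalProbability` tabulates for all $2^n$ assignments; the second is the double loop in `calculateProbability`, followed by the division by the sum of the unnormalized values.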

Experiment Code

The GitHub address of the code was given above.

Note that because the test data format is fixed, the code is written entirely around that format and is not general-purpose, so treat the experiment code as an algorithm reference only.

The CPT (conditional probability table) class is designed as follows:

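class cpt:
    """Conditional probability table (CPT) for one node of the network."""

    def __init__(self, name, parents, probabilities):
        self.name = name            # variable name
        self.parents = parents      # parent variable names, in input-variable order
        # One row per combination of parent values, read as a binary number
        # (first parent = most significant bit, 1 = true); each row is
        # [P(true | parents), P(false | parents)]. A root node has a single row.
        self.probabilities = probabilities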
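To make the table layout concrete, here is a small sketch of how a single CPT entry is looked up. `cpt_lookup` is a hypothetical helper, not part of the experiment code; the row/column convention is inferred from the indexing in `BN.calculateTotalProbability`, and the Alarm numbers are the usual textbook burglary-network values, assumed here rather than taken from the test file.

```python
# Hypothetical helper (not in the original code) mirroring the CPT indexing used by
# BN.calculateTotalProbability:
#   row    = parent values read as a binary number (first parent = MSB, 1 = true)
#   column = 0 for P(true | parents), 1 for P(false | parents)
def cpt_lookup(probabilities, parent_values, value):
    """Return P(node = value | parents = parent_values), with values encoded as 0/1."""
    row = 0
    for bit in parent_values:   # accumulate bits, most significant first
        row = row * 2 + bit
    return probabilities[row][1 - value]


# Alarm with parents [Burglar, Earthquake]; textbook numbers, ASSUMED for illustration.
alarm = [
    [0.001, 0.999],  # Burglar=0, Earthquake=0
    [0.29, 0.71],    # Burglar=0, Earthquake=1
    [0.94, 0.06],    # Burglar=1, Earthquake=0
    [0.95, 0.05],    # Burglar=1, Earthquake=1
]
print(cpt_lookup(alarm, [1, 0], 1))  # P(Alarm=true | Burglar=true, Earthquake=false) -> 0.94
```

A root node has no parents, so the loop leaves `row` at 0 and the lookup reads the single row `[P(true), P(false)]`.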

The Bayesian network class is as follows:

from cpt import cpt

class BN:
    def __init__(self, nums, variables, graph, cpts):
        self.nums = nums
        self.variables = variables
        self.graph = graph
        self.cpts = cpts
        # Build a name -> index dictionary for quick lookup
        index_list = [i for i in range(self.nums)]
        self.variables_dict = dict(zip(self.variables, index_list))
        # Precompute the full joint probability table
        self.TotalProbability = self.calculateTotalProbability()

    def calculateProbability(self, event):
        # Count the query variables (k1) and the hidden variables to sum out (k2); the rest are evidence variables
        k1 = self.count(event, 2)
        k2 = self.count(event, 3)

        probability = []
        for i in range(2**k1):
            p = 0
            for j in range(2**k2):
                index = self.calculateIndex(self.int2bin_list(i, k1), self.int2bin_list(j, k2), event)
                p = p + self.TotalProbability[index]
            probability.append(p)
        # Normalize, then reverse so the output lists true first, then false
        total = sum(probability)
        return list(reversed([x / total for x in probability]))

    def calculateTotalProbability(self):
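        # The full joint table is a 1 x 2^n list; writing a column index in binary gives the value of every variable.
        # e.g. with 5 variables, column 7 = 00111 holds the probability that variables 1 and 2 are false and 3, 4, 5 are true
        TotalProbability = [0] * (2 ** self.nums)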
        for i in range(2 ** self.nums):
            p = 1
            binary_list = self.int2bin_list(i,self.nums)
            for j in range(self.nums):
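                # Handle nodes without parents and nodes with parents separately.
                # To reduce float rounding error, each factor is scaled by 1000 before multiplying
                # and the product is scaled back at the end (assumes at most three decimal places).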
                if self.cpts[j].parents == []:
                    p = p * (self.cpts[j].probabilities[0][1-binary_list[j]] * 1000)
                else:
                    parents_list = self.cpts[j].parents
                    parents_index_list = [self.variables_dict[k] for k in parents_list]
                    index = self.bin_list2int([binary_list[k] for k in parents_index_list])
                    p = p * (self.cpts[j].probabilities[index][1 - binary_list[j]] * 1000)
            TotalProbability[i] = p / 10 ** (self.nums * 3)
        return TotalProbability

    def int2bin_list(self, a, b):
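        # Convert the integer a (a column index) into a list of b binary digits, most significant first.
        # bin(a) gives a string such as '0b111'; strip the '0b' prefix, split it into characters,
        # convert each one to int, then left-pad with zeros up to length b.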
        binary_list = list(map(int, list(bin(a).replace("0b", ''))))
        binary_list = (b - len(binary_list)) * [0] + binary_list
        return binary_list

    def bin_list2int(self, b):
        # Convert a list of binary digits (most significant first) back into an integer
        result = 0
        for i in range(len(b)):
            result = result + b[len(b)-1-i] * (2 ** i)
        return result

    def calculateIndex(self, i, j, event):
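        # Build the index into the full joint table for one complete assignment:
        # query positions (2) take successive bits from i, hidden positions (3) take
        # successive bits from j, and evidence positions keep their fixed 0/1 value.
        # Note: the lists i and j are consumed (emptied) by this call.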
        index_list = []
        for k in range(len(event)):
            if event[k] == 2:
                index_list.append(i[0])
                del(i[0])
            elif event[k] == 3:
                index_list.append(j[0])
                del(j[0])
            else:
                index_list.append(event[k])

        return self.bin_list2int(index_list)

    def count(self, values, a):
        # Count how many elements of values are equal to a
        # (parameter renamed from `list` to avoid shadowing the built-in)
        c = 0
        for i in values:
            if i == a:
                c = c + 1
        return c
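To check the approach independently of the data files, here is a compact sketch of the same two steps (chain-rule joint probability, then summing out hidden variables and normalizing). It is an illustration, not the experiment's code: it uses a dictionary-based network and the standard-library `itertools.product` instead of the binary-index tricks above, the names `NETWORK`, `joint`, and `enumerate_ask` are my own, and the CPT numbers are the textbook burglary-network values, assumed rather than read from `burglarnetwork.txt`.

```python
from itertools import product

# Textbook burglary-network CPTs, ASSUMED for illustration (not read from a file).
# variable -> (parents, {tuple of parent values: P(variable = True | parents)})
NETWORK = {
    "Burglar": ((), {(): 0.001}),
    "Earthquake": ((), {(): 0.002}),
    "Alarm": (("Burglar", "Earthquake"),
              {(True, True): 0.95, (True, False): 0.94,
               (False, True): 0.29, (False, False): 0.001}),
    "John": (("Alarm",), {(True,): 0.90, (False,): 0.05}),
    "Mary": (("Alarm",), {(True,): 0.70, (False,): 0.01}),
}
ORDER = list(NETWORK)


def joint(assignment):
    """Chain rule: product of P(x_i | parents(X_i)) over all variables."""
    p = 1.0
    for var in ORDER:
        parents, table = NETWORK[var]
        p_true = table[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def enumerate_ask(query, evidence):
    """Return [P(query=True | e), P(query=False | e)] by summing out hidden variables."""
    hidden = [v for v in ORDER if v != query and v not in evidence]
    unnormalized = []
    for q_value in (True, False):
        total = 0.0
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, **dict(zip(hidden, values)), query: q_value}
            total += joint(assignment)
        unnormalized.append(total)
    alpha = 1.0 / sum(unnormalized)  # normalizing constant
    return [alpha * p for p in unnormalized]


print(enumerate_ask("Burglar", {"John": True, "Mary": True}))
```

With these assumed numbers, P(Burglar | John=true, Mary=true) comes out at roughly [0.284, 0.716], the value usually quoted for this textbook example. The sketch handles a single query variable and evaluates joint entries on demand, whereas `BN` precomputes all 2^n entries once and reuses them for every query.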

The main experiment program (including the functions that read the specified data files) is as follows:

import sys

from BN import BN
from cpt import cpt

# Read the network file and build a Bayesian network
def readBN(filename):
    with open(filename, 'r') as f:
        # Number of variables
        nums = int(f.readline())
        f.readline()
        # Variable names
        variables = f.readline().strip().split(' ')
        f.readline()
        # Adjacency matrix of the directed graph: graph[i][j] == 1 means i -> j
        graph = []
        for i in range(nums):
            line = f.readline().strip().split(' ')
            graph.append(list(map(int, line)))
        f.readline()
        # CPTs: one block per variable, blocks separated by a blank line.
        # Note: the file must follow the specified format exactly, with no extra blank lines.
        cpts = []
        for i in range(nums):
            probabilities = []
            while True:
                line = f.readline().strip().split(' ')
                if line != ['']:
                    probabilities.append(list(map(float, line)))
                else:
                    break
            cpts.append(cpt(variables[i], [], probabilities))
    # Derive each node's parents from the adjacency matrix.
    # Note: parents are listed in the order of the input variables,
    # which is not guaranteed to be correct for other test files.
    for i in range(nums):
        for j in range(nums):
            if graph[i][j] == 1:
                cpts[j].parents.append(variables[i])

    # Debug: print the parents generated for each node
    # for i in range(nums):
    #     print(cpts[i].parents)
    return BN(nums, variables, graph, cpts)

# Read the queries whose probabilities are to be computed
def readEvents(filename, variables):
    # How a conditional probability query is represented in this program:
    # each variable is classified as 2 = query variable, 3 = hidden variable to be
    # summed out, 0 / 1 = evidence variable observed false / true.
    # e.g. with variables [Burglar, Earthquake, Alarm, John, Mary],
    # the query P(Burglar | John=true, Mary=false) becomes event = [2, 3, 3, 1, 0]
    events = []
    with open(filename, 'r') as f:
        while True:
            line = f.readline()
            if not line:            # '' means end of file
                break
            if line.strip() == '':  # skip blank lines
                continue
            event = []
            for v in variables:
                index = line.find(v)
                if index != -1:
                    after = line[index + len(v)]
                    if after == ' ' or after == ',':
                        event.append(2)
                    elif after == '=':
                        if line[index + len(v) + 1] == 't':
                            event.append(1)
                        else:
                            event.append(0)
                else:
                    event.append(3)
            # Abort if the query line is malformed
            if len(event) != len(variables):
                sys.exit("Malformed query: " + line.strip())
            events.append(event)
    return events

# Main program
if __name__ == "__main__":
    filename1 = "burglarnetwork.txt"
    bayesnet = readBN(filename1)
    filename2 = "burglarqueries.txt"
    events = readEvents(filename2, bayesnet.variables)
    for event in events:
        print(bayesnet.calculateProbability(event))
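Designing the event encoding was the hardest part, so here is an alternative sketch of parsing one query line into it. `parse_query` is a hypothetical helper, and the line format `P(Burglar | John=true, Mary=false)` is assumed from the comment in `readEvents`, not checked against the real query file. Matching whole words with the standard `re` module avoids one weakness of `line.find(v)`: a variable name can be found inside a longer name.

```python
import re


# Hypothetical helper; assumes query lines shaped like "P(Burglar | John=true, Mary=false)".
def parse_query(line, variables):
    """Encode one query as an event list: 2 = query, 3 = hidden, 1/0 = evidence true/false."""
    # Evidence: every "Name=value" pair on the line
    evidence = dict(re.findall(r"(\w+)\s*=\s*(\w+)", line))
    # Query variables: whole words before the '|', minus the leading "P"
    query_vars = set(re.findall(r"\w+", line.split("|")[0])) - {"P"}
    event = []
    for v in variables:
        if v in evidence:
            # as in readEvents, the first letter decides: 't' means true
            event.append(1 if evidence[v].lower().startswith("t") else 0)
        elif v in query_vars:
            event.append(2)
        else:
            event.append(3)
    return event


print(parse_query("P(Burglar | John=true, Mary=false)",
                  ["Burglar", "Earthquake", "Alarm", "John", "Mary"]))  # [2, 3, 3, 1, 0]
```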

Knowledge summary

This part records the blogs I referred to during the experiment, for easy review later.

Since I have never studied Python systematically, there are many basic techniques I do not know; it seems I still need to learn it properly.

Python readline: detecting the end of the file
I referred to this blog for how to tell that the end of the file has been reached while reading it.
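A minimal sketch of the `readline()` end-of-file check used in both reading functions; `io.StringIO` stands in for a real file so the example runs on its own, and the text content is made up for illustration.

```python
import io

# io.StringIO stands in for open(filename) so the sketch runs without a file
f = io.StringIO("5\n\nBurglar Earthquake Alarm John Mary\n")
lines = []
while True:
    line = f.readline()
    if not line:                      # readline() returns '' only at end of file
        break
    lines.append(line.rstrip("\n"))   # a blank line comes back as '\n', not ''
print(lines)  # ['5', '', 'Burglar Earthquake Alarm John Mary']
```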

Converting between strings, integers, and floats in Python
I referred to this blog for converting the text read from the file into numbers.
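For example, a CPT row and an adjacency-matrix row can be parsed like this (sample strings made up for illustration). `split()` without an argument also drops the trailing newline and tolerates repeated spaces, unlike `split(' ')`.

```python
# Parse one CPT line and one adjacency-matrix line read from a file
cpt_row = [float(x) for x in "0.95 0.05\n".split()]
adjacency_row = list(map(int, "0 0 1 0 0".split()))
print(cpt_row, adjacency_row)  # [0.95, 0.05] [0, 0, 1, 0, 0]
```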

Python: creating a dictionary from lists
I referred to this blog for how to build a dictionary from a list.
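Two equivalent ways to build the name-to-index dictionary that `BN.__init__` needs, sketched with the burglary-network variable names:

```python
variables = ["Burglar", "Earthquake", "Alarm", "John", "Mary"]
# zip the names with their indices, as BN.__init__ does
index_of = dict(zip(variables, range(len(variables))))
# the same dictionary with enumerate and a dict comprehension
index_of_2 = {name: i for i, name in enumerate(variables)}
print(index_of["Alarm"])  # 2
```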

Python: finding a character in a string
Python: how to convert an array of strings into an array of integers
Python-8: converting a binary string into an integer in Python
I referred to these three blogs as well for processing the data read from the files.
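The binary conversions done by `int2bin_list` and `bin_list2int` can also be written with the built-ins `format` and `int(s, 2)`. The sketch below uses two hypothetical helper names and reproduces the 00111 example from the code comments.

```python
# Hypothetical helpers: integer <-> list of binary digits (most significant first)
def int_to_bits(a, width):
    return [int(c) for c in format(a, f"0{width}b")]  # zero-padded binary string


def bits_to_int(bits):
    return int("".join(map(str, bits)), 2)  # parse as a base-2 string


print(int_to_bits(7, 5))             # [0, 0, 1, 1, 1]
print(bits_to_int([0, 0, 1, 1, 1]))  # 7
print("John=true".find("="))         # 4, index of the first '='
```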

Python3: how to deal with inexact float operations
Multiplying many floating-point probabilities together introduced visible rounding errors, so I consulted this blog. In the end I did not use the Decimal module; I simply multiplied each factor by 1000 and divided the product back at the end.
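For reference, a small sketch of the standard-library options. The ×1000 workaround depends on every probability having at most three decimal places and on the network being small; comparing with a tolerance via `math.isclose`, or computing exactly with `fractions.Fraction`, avoids those assumptions.

```python
import math
from fractions import Fraction

print(0.1 * 3)                     # 0.30000000000000004
print(math.isclose(0.1 * 3, 0.3))  # True: compare floats with a tolerance

# Fractions built from decimal strings multiply exactly
p = Fraction("0.95") * Fraction("0.9") * Fraction("0.7")
print(p)         # 1197/2000
print(float(p))  # 0.5985
```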

Python tips (3): three ways to delete an element from a list
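The usual built-in ways to delete list elements, shown on a made-up bit list; `calculateIndex` uses the `del` form to consume bits from the front.

```python
bits = [1, 0, 1, 1]
first = bits.pop(0)  # remove by index and return the element: first == 1
del bits[0]          # remove by index, as calculateIndex does: bits == [1, 1]
bits.remove(1)       # remove the first occurrence of a value: bits == [1]
print(first, bits)   # 1 [1]
```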

Counting how many times a given number occurs in a Python numpy array
I tried the method in this blog but could not get it to work: it raised an error saying that booleans cannot be summed. The cause is probably the Python version; I did not get to the bottom of it and wrote my own count function instead.
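For plain Python lists, such as the event lists here, no extra library is needed: the built-in `list.count` method does the same job as `BN.count`.

```python
event = [2, 3, 3, 1, 0]
print(event.count(3))                   # 2: built-in list method
print(sum(1 for x in event if x == 2))  # 1: the same count written as a loop
```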

Multiplying every element of a Python list by a number
This blog gives a one-line way to multiply every element of a list by a number, and notes that with the numpy library an array can be multiplied by a number directly. However, since I did not use numpy anywhere else, I did not want to pull it in for this one place, so I used the list-based method from this blog.
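A sketch of the list-comprehension and `map` forms without numpy, using the normalization step as the example; the values are made up so the results are exact.

```python
probability = [1.0, 3.0]
total = sum(probability)                       # compute the sum once, not per element
normalized = [x / total for x in probability]  # divide every element by the same number
scaled = list(map(lambda x: x * 1000, [0.5, 0.25]))
print(normalized, scaled)  # [0.25, 0.75] [500.0, 250.0]
```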

Three ways to reverse a list in Python
The experiment instructions specify an output order that is the reverse of the order my code computes, so I reverse the list.
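The common built-in options, on a made-up two-value result in "false first" order; `calculateProbability` uses the first one.

```python
probs = [0.75, 0.25]          # e.g. [P(false), P(true)] in the computed order
print(list(reversed(probs)))  # [0.25, 0.75]; reversed() returns an iterator
print(probs[::-1])            # [0.25, 0.75]; slicing returns a reversed copy
probs.reverse()               # reverses in place and returns None
print(probs)                  # [0.25, 0.75]
```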

Experiment summary

To sum up the experiment in one sentence: I now understand probabilistic inference in Bayesian networks much more thoroughly.

If I had not done this experiment and worked through Bayesian network inference by hand a few times, it would have been almost as if I had never learned it, and I certainly could not have written it in an exam. (This reminds me of a previous exam: I had only listened in class and never practiced hidden Markov models, so when the exam had a very simple HMM problem, I was too unpracticed, ran out of time, and did not finish it.)

The experiment went fairly smoothly overall and took about 8 hours in total. Writing the code took only a short time and I hit almost no bugs; most of the time went into designing how to represent a conditional probability query. That took me a very long time, and I personally feel the final form is not particularly elegant, but it works very well in the program.


Source: www.cnblogs.com/fyunaru/p/12099139.html