Write your own compiler: from NFA to DFA

In the previous section, we used an NFA to recognize strings. One problem with the NFA is that it has too many state nodes, which makes matching inefficient. In this section, we introduce an algorithm called "subset construction", which converts an NFA with many nodes into a DFA. From the epsilon closure operation described in the previous section, we can see that all nodes connected by epsilon edges can be regarded as a single state, so we can merge multiple NFA nodes into one DFA node. The outgoing edges of every node in the set produced by an epsilon closure can then be regarded as the outgoing edges of the new DFA node.

Let's use the NFA state machine built in the previous section to walk through the process. The epsilon closure starting from node 0 is epsilon-closure(0) = {0, 27, 11, 19, 9, 12, 13}, so we merge these nodes into a new node, which we mark as DFA state 0.

Next, we apply the move operation to the set {0, 27, 11, 19, 9, 12, 13}, where D stands for any digit. Since move({0, 27, 11, 19, 9, 12, 13}, D) = {10, 20}, nodes 10 and 20 are merged into a new node, recorded as "DFA state 1". Since move({0, 27, 11, 19, 9, 12, 13}, .) = {14}, node 14 becomes another new node, recorded as "DFA state 2". This gives us the following DFA state machine:
[Figure: DFA states 0, 1 and 2]
Next, we take the epsilon closure of {10, 20}: epsilon-closure({10, 20}) = {10, 20, 9, 12, 13, 21}, and then apply move to the result. Since move({10, 20, 9, 12, 13, 21}, D) = {10}, we generate a new DFA node, DFA state 3. Since move({10, 20, 9, 12, 13, 21}, .) = {14, 22}, we generate another new DFA node, DFA state 4. The state machine now looks like this:
[Figure: DFA states 0 to 4]
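The hand walkthrough above can be condensed into a small self-contained Go program. Everything in it is hypothetical and for illustration only: a toy NFA (`toyNFA`) for the expression a*b with integer node numbers, and the helpers `closure`, `move` and `subsetConstruction`. It follows the same steps: take the epsilon closure of the start node, then for each character apply move followed by closure, and process DFA states in the order they were created. A DFA state is marked accepting when its set contains the NFA's accepting node. Unlike the article's code, it finds existing sets through a map keyed by the sorted node list.

```go
package main

import (
	"fmt"
	"sort"
)

const F = -1 // "no transition" marker, as in the jump table

// toyNFA is hypothetical: eps = epsilon edges, char = labelled edges.
type toyNFA struct {
	eps    map[int][]int
	char   map[int]map[byte][]int
	accept int
}

// closure returns the sorted epsilon closure of nodes.
func (n toyNFA) closure(nodes []int) []int {
	seen := map[int]bool{}
	stack := append([]int(nil), nodes...)
	for len(stack) > 0 {
		s := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if !seen[s] {
			seen[s] = true
			stack = append(stack, n.eps[s]...)
		}
	}
	out := make([]int, 0, len(seen))
	for s := range seen {
		out = append(out, s)
	}
	sort.Ints(out)
	return out
}

// move returns the nodes reachable from set by one edge labelled c.
func (n toyNFA) move(set []int, c byte) []int {
	var out []int
	for _, s := range set {
		out = append(out, n.char[s][c]...)
	}
	return out
}

// subsetConstruction returns the DFA jump table and which DFA states accept.
func subsetConstruction(n toyNFA, start int) ([][128]int, []bool) {
	var sets [][]int
	var dtrans [][128]int
	var accepting []bool
	index := map[string]int{} // sorted NFA set (as text) -> DFA state
	add := func(set []int) int {
		k := fmt.Sprint(set)
		if id, ok := index[k]; ok {
			return id // this NFA set already has a DFA state
		}
		index[k] = len(sets)
		sets = append(sets, set)
		var row [128]int
		for c := range row {
			row[c] = F
		}
		dtrans = append(dtrans, row)
		acc := false
		for _, s := range set {
			acc = acc || s == n.accept
		}
		accepting = append(accepting, acc)
		return len(sets) - 1
	}

	add(n.closure([]int{start}))
	// Process DFA states in creation order, like lastMarked/getUnMarked.
	for cur := 0; cur < len(sets); cur++ {
		for c := 0; c < 128; c++ {
			moved := n.move(sets[cur], byte(c))
			if len(moved) == 0 {
				continue // the entry stays F
			}
			next := add(n.closure(moved)) // add may grow dtrans, so call it first
			dtrans[cur][c] = next
		}
	}
	return dtrans, accepting
}

func main() {
	// a*b: 0 -ε-> 1, 0 -ε-> 3, 1 -a-> 2, 2 -ε-> 1, 2 -ε-> 3, 3 -b-> 4 (accepting).
	n := toyNFA{
		eps:    map[int][]int{0: {1, 3}, 2: {1, 3}},
		char:   map[int]map[byte][]int{1: {'a': {2}}, 3: {'b': {4}}},
		accept: 4,
	}
	dtrans, accepting := subsetConstruction(n, 0)
	fmt.Println(len(dtrans), accepting)                         // 3 [false false true]
	fmt.Println(dtrans[0]['a'], dtrans[0]['b'])                 // 1 2
	fmt.Println(dtrans[1]['a'], dtrans[1]['b'], dtrans[2]['a']) // 1 2 -1
}
```

The three DFA states correspond to the NFA sets {0, 1, 3}, {1, 2, 3} and {4}. Only {4} contains the accepting node, so only the third state accepts. Using `fmt.Sprint` of a sorted slice as the key is simply a convenient canonical form. Any order-independent encoding would work equally well.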
The rest of the process continues in the same way. Note that if the node set obtained by an epsilon closure contains an accepting (terminal) node of the NFA, the corresponding DFA node is also an accepting node. Now let's see how to implement this in code. We add a file named nfa_to_dfa.go with the following code:

package nfa

import "fmt"

const (
	DFA_MAX   = 254 // maximum number of DFA nodes
	F         = -1  // marks "no transition" in the jump table
	MAX_CHARS = 128 // the 128 ASCII characters
)

type ACCEPT struct {
	acceptString string // code string to execute when this accepting node is reached
	anchor       Anchor
}

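type DFA struct {
	group        int    // used later by the minimization algorithm
	mark         bool   // whether this node's outgoing edges have already been set
	anchor       Anchor
	set          []*NFA // the set of NFA nodes this DFA node represents
	state        int    // DFA node number
	acceptString string
}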

Here we first define the basic data structures. The converted DFA state machine contains at most 254 nodes, and it only accepts the characters 0 to 127 of the ASCII table. The NFA state machine from the previous sections was a linked structure of nodes. This DFA uses a jump (transition) table instead: a two-dimensional array dtrans indexed by state and character. For example, suppose state node 1 jumps to state node 2 on the character ".". Since the ASCII value of "." is 46, dtrans[1][46] = 2.
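The jump-table idea can be shown in isolation. The following Go sketch is an illustration only. The helper names `newTable` and `run` and the small hand-built DFA (one or more digits, a dot, one or more digits) are hypothetical and not part of the article's code. The sketch fills every entry with F first, sets dtrans[1]['.'] = 2 as in the example above, and then matches a string with one table lookup per character.

```go
package main

import "fmt"

const (
	F         = -1  // no transition
	MAX_CHARS = 128 // ASCII characters 0-127
)

// newTable allocates a jump table for the given number of states with
// every entry set to F.
func newTable(states int) [][]int {
	t := make([][]int, states)
	for i := range t {
		t[i] = make([]int, MAX_CHARS)
		for c := range t[i] {
			t[i][c] = F
		}
	}
	return t
}

// run starts in state 0, follows one table entry per input byte, and
// reports whether it ends in an accepting state.
func run(dtrans [][]int, accepting map[int]bool, input string) bool {
	state := 0
	for i := 0; i < len(input); i++ {
		c := input[i]
		if c >= MAX_CHARS {
			return false // outside the ASCII range the table covers
		}
		state = dtrans[state][c]
		if state == F {
			return false // no edge for this character
		}
	}
	return accepting[state]
}

func main() {
	// Hypothetical 4-state DFA: one or more digits, '.', one or more digits.
	dtrans := newTable(4)
	for c := '0'; c <= '9'; c++ {
		dtrans[0][c] = 1
		dtrans[1][c] = 1
		dtrans[2][c] = 3
		dtrans[3][c] = 3
	}
	dtrans[1]['.'] = 2 // '.' is ASCII 46, so this is dtrans[1][46] = 2
	accepting := map[int]bool{3: true}

	fmt.Println(dtrans[1][46])                  // 2
	fmt.Println(run(dtrans, accepting, "3.14")) // true
	fmt.Println(run(dtrans, accepting, "3."))   // false
}
```

Each input character costs a single array lookup, and there are no epsilon edges to chase during matching.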

The DFA struct defined earlier represents a DFA node. Since a DFA node is converted from a set of NFA nodes, its definition contains a slice of pointers to NFA nodes. Next, we define a type that converts an NFA into a DFA:

type NfaDfaConverter struct {
	nstates    int     // number of DFA nodes created so far
	lastMarked int     // next DFA node whose edges may still need to be set
	dtrans     [][]int // jump table of the DFA state machine
	accepts    []*ACCEPT
	dstates    []DFA // all DFA nodes
}

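func NewNfaDfaConverter() *NfaDfaConverter {
	n := &NfaDfaConverter{
		nstates:    0,
		lastMarked: 0,
		dtrans:     make([][]int, DFA_MAX),
		dstates:    make([]DFA, DFA_MAX),
	}

	for i := range n.dtrans {
		n.dtrans[i] = make([]int, MAX_CHARS)
		// Start every entry as F (no transition) rather than 0, which is a valid state.
		for c := range n.dtrans[i] {
			n.dtrans[i][c] = F
		}
	}

	return n
}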

A few fields deserve attention. dtrans is the two-dimensional array holding the DFA jump table. nstates records how many DFA nodes have been created so far. lastMarked is the number of the next DFA node whose jump edges have not been created yet. dstates stores the DFA nodes created so far. Now let's look at the conversion logic:

func (n *NfaDfaConverter) getUnMarked() *DFA {
	// Return the earliest-created DFA node whose edges have not been set yet.
	for ; n.lastMarked < n.nstates; n.lastMarked++ {
		if !n.dstates[n.lastMarked].mark {
			return &n.dstates[n.lastMarked]
		}
	}

	return nil
}

func (n *NfaDfaConverter) compareNfaSlice(setOne []*NFA, setTwo []*NFA) bool {
	// Check whether the two sets contain the same elements.
	if len(setOne) != len(setTwo) {
		return false
	}

	for _, nfaOne := range setOne {
		// Reset for every element; otherwise one early match would make
		// every later element look matched too.
		equal := false
		for _, nfaTwo := range setTwo {
			if nfaTwo == nfaOne {
				equal = true
				break
			}
		}

		if !equal {
			return false
		}
	}

	return true
}

func (n *NfaDfaConverter) hasDfaContainsNfa(nfaSet []*NFA) (bool, int) {
	// Check whether an existing DFA node corresponds to the same NFA node set.
	// Only the first nstates entries of dstates are in use.
	for i := 0; i < n.nstates; i++ {
		if n.compareNfaSlice(n.dstates[i].set, nfaSet) {
			return true, n.dstates[i].state
		}
	}

	return false, -1
}

func (n *NfaDfaConverter) addDfaState(epsilonResult *EpsilonResult) int {
	// Create a new DFA node for the current NFA node set.
	if n.nstates >= DFA_MAX {
		panic("Too many DFA states")
	}

	nextState := n.nstates
	n.nstates += 1
	n.dstates[nextState].set = epsilonResult.results
	n.dstates[nextState].mark = false
	n.dstates[nextState].acceptString = epsilonResult.acceptStr
	n.dstates[nextState].anchor = epsilonResult.anchor
	n.dstates[nextState].state = nextState // record this DFA node's number

	n.printDFAState(&n.dstates[nextState])
	fmt.Print("\n")

	return nextState
}

func (n *NfaDfaConverter) printDFAState(dfa *DFA) {
	fmt.Printf("DFA state : %d, it is nfa are: {", dfa.state)
	for _, nfa := range dfa.set {
		fmt.Printf("%d,", nfa.state)
	}

	fmt.Printf("}")
}

func (n *NfaDfaConverter) MakeDTran(start *NFA) {
	// Build the DFA jump table from the start node of the NFA state machine.
	startStates := make([]*NFA, 0)
	startStates = append(startStates, start)
	statesCopied := make([]*NFA, len(startStates))
	copy(statesCopied, startStates)

	// The epsilon closure of the start state gives the first DFA node.
	epsilonResult := EpsilonClosure(statesCopied)
	n.dstates[0].set = epsilonResult.results
	n.dstates[0].anchor = epsilonResult.anchor
	n.dstates[0].acceptString = epsilonResult.acceptStr
	n.dstates[0].mark = false

	// debug purpose
	n.printDFAState(&n.dstates[0])
	fmt.Print("\n")
	nextState := 0
	n.nstates = 1 // one DFA node exists so far
	// Get the first DFA node whose jump edges have not been set yet.
	current := n.getUnMarked()
	for current != nil {
		current.mark = true
		for c := 0; c < MAX_CHARS; c++ {
			nfaSet := move(current.set, c)
			if len(nfaSet) > 0 {
				statesCopied = make([]*NFA, len(nfaSet))
				copy(statesCopied, nfaSet)
				epsilonResult = EpsilonClosure(statesCopied)
				nfaSet = epsilonResult.results
			}

			if len(nfaSet) == 0 {
				nextState = F
			} else {
				// If no existing DFA node corresponds to nfaSet, add a new one.
				isExist, state := n.hasDfaContainsNfa(nfaSet)
				if !isExist {
					nextState = n.addDfaState(epsilonResult)
				} else {
					nextState = state
				}
			}

			// Fill in the DFA jump table.
			n.dtrans[current.state][c] = nextState
		}

		current = n.getUnMarked()
	}
}

func (n *NfaDfaConverter) PrintDfaTransition() {
	for i := 0; i < DFA_MAX; i++ {
		if !n.dstates[i].mark {
			break
		}

		for j := 0; j < MAX_CHARS; j++ {
			if n.dtrans[i][j] != F {
				n.printDFAState(&n.dstates[i])
				fmt.Print(" jump to : ")
				n.printDFAState(&n.dstates[n.dtrans[i][j]])
				// %c prints the character itself; string(j) on an int is flagged by go vet.
				fmt.Printf("by character %c\n", j)
			}
		}
	}
}

As we saw earlier, a DFA node essentially corresponds to a set of NFA nodes. So whenever move and epsilon closure produce a set of NFA nodes, we must check whether some DFA node already corresponds to that set. If one does, that DFA node has already been generated. The functions compareNfaSlice and hasDfaContainsNfa perform this check. If no DFA node corresponds to the current NFA node set, the addDfaState function creates a new DFA node and adds it to the dstates array.
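compareNfaSlice compares two slices as sets by checking equal lengths and then checking that each element of the first slice appears in the second. The NFA sets can contain repeated entries (DFA state 6 in the output below lists node 28 twice). When duplicates occur, this check can treat two different sets as equal, or two equal sets as different. Below is a hedged Go sketch of a comparison that ignores both order and duplicates. The name `sameSet` is hypothetical, and plain ints stand in for `*NFA` pointers. The same code works with a `map[*NFA]bool`.

```go
package main

import "fmt"

// sameSet reports whether a and b hold the same elements, ignoring order
// and duplicates. Ints stand in for *NFA pointers.
func sameSet(a, b []int) bool {
	inA := make(map[int]bool, len(a))
	for _, x := range a {
		inA[x] = true
	}
	inB := make(map[int]bool, len(b))
	for _, x := range b {
		if !inA[x] {
			return false // b has an element that a lacks
		}
		inB[x] = true
	}
	// Every element of b is in a; equal distinct counts mean a is in b too.
	return len(inA) == len(inB)
}

func main() {
	// DFA state 6's set as printed (28 repeated) versus the same nodes once each.
	fmt.Println(sameSet([]int{16, 28, 24, 23, 26, 28}, []int{23, 24, 26, 28, 16})) // true
	// Same length and every element of the first is in the second, yet the sets differ.
	fmt.Println(sameSet([]int{28, 28, 16}, []int{28, 16, 24})) // false
}
```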

Every newly created DFA node has its mark flag set to false, meaning its jump edges have not been set yet. The function getUnMarked finds the earliest-created DFA node whose mark is still false. The core of the code above is MakeDTran, which implements the algorithm described earlier. It takes the start node of the NFA state machine, computes its epsilon closure to obtain a set of NFA nodes, and creates the corresponding DFA node. It then applies move to that set for each of the 128 characters and takes the epsilon closure of each non-empty result. If no existing DFA node corresponds to the resulting set, it creates a new one. Finally, it records the jump between the two nodes in the two-dimensional table dtrans using their numbers. This repeats until getUnMarked finds no unmarked DFA node.

Next, we call the code above from the main function to see the result. Enter the following code in main.go:

package main

import (
	"nfa"
)

func main() {
	lexReader, _ := nfa.NewLexReader("input.lex", "output.py")
	lexReader.Head()
	parser, _ := nfa.NewRegParser(lexReader)
	start := parser.Parse()
	parser.PrintNFA(start)
	//str := "3.14"
	//if nfa.NfaMatchString(start, str) {
	//	fmt.Printf("string %s is accepted by given regular expression\n", str)
	//}
	nfaConverter := nfa.NewNfaDfaConverter()
	nfaConverter.MakeDTran(start)
	nfaConverter.PrintDfaTransition()
}

After running the above code, the output is as follows:

DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,}
DFA state : 1, it is nfa are: {14,15,}
DFA state : 2, it is nfa are: {10,9,12,13,20,21,}
DFA state : 3, it is nfa are: {16,28,}
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,}
DFA state : 5, it is nfa are: {10,9,12,13,}
DFA state : 6, it is nfa are: {16,28,24,23,26,28,}
DFA state : 7, it is nfa are: {24,23,26,28,}
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 1, it is nfa are: {14,15,}by character .
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 2, it is nfa are: {10,9,12,13,20,21,}by character 0
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 2, it is nfa are: {10,9,12,13,20,21,}by character 1
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 2, it is nfa are: {10,9,12,13,20,21,}by character 2
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 2, it is nfa are: {10,9,12,13,20,21,}by character 3
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 2, it is nfa are: {10,9,12,13,20,21,}by character 4
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 2, it is nfa are: {10,9,12,13,20,21,}by character 5
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 2, it is nfa are: {10,9,12,13,20,21,}by character 6
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 2, it is nfa are: {10,9,12,13,20,21,}by character 7
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 2, it is nfa are: {10,9,12,13,20,21,}by character 8
DFA state : 0, it is nfa are: {0,27,19,11,12,13,9,} jump to : DFA state : 2, it is nfa are: {10,9,12,13,20,21,}by character 9
DFA state : 1, it is nfa are: {14,15,} jump to : DFA state : 3, it is nfa are: {16,28,}by character 0
DFA state : 1, it is nfa are: {14,15,} jump to : DFA state : 3, it is nfa are: {16,28,}by character 1
DFA state : 1, it is nfa are: {14,15,} jump to : DFA state : 3, it is nfa are: {16,28,}by character 2
DFA state : 1, it is nfa are: {14,15,} jump to : DFA state : 3, it is nfa are: {16,28,}by character 3
DFA state : 1, it is nfa are: {14,15,} jump to : DFA state : 3, it is nfa are: {16,28,}by character 4
DFA state : 1, it is nfa are: {14,15,} jump to : DFA state : 3, it is nfa are: {16,28,}by character 5
DFA state : 1, it is nfa are: {14,15,} jump to : DFA state : 3, it is nfa are: {16,28,}by character 6
DFA state : 1, it is nfa are: {14,15,} jump to : DFA state : 3, it is nfa are: {16,28,}by character 7
DFA state : 1, it is nfa are: {14,15,} jump to : DFA state : 3, it is nfa are: {16,28,}by character 8
DFA state : 1, it is nfa are: {14,15,} jump to : DFA state : 3, it is nfa are: {16,28,}by character 9
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,}by character .
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 5, it is nfa are: {10,9,12,13,}by character 0
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 5, it is nfa are: {10,9,12,13,}by character 1
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 5, it is nfa are: {10,9,12,13,}by character 2
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 5, it is nfa are: {10,9,12,13,}by character 3
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 5, it is nfa are: {10,9,12,13,}by character 4
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 5, it is nfa are: {10,9,12,13,}by character 5
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 5, it is nfa are: {10,9,12,13,}by character 6
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 5, it is nfa are: {10,9,12,13,}by character 7
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 5, it is nfa are: {10,9,12,13,}by character 8
DFA state : 2, it is nfa are: {10,9,12,13,20,21,} jump to : DFA state : 5, it is nfa are: {10,9,12,13,}by character 9
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,} jump to : DFA state : 6, it is nfa are: {16,28,24,23,26,28,}by character 0
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,} jump to : DFA state : 6, it is nfa are: {16,28,24,23,26,28,}by character 1
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,} jump to : DFA state : 6, it is nfa are: {16,28,24,23,26,28,}by character 2
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,} jump to : DFA state : 6, it is nfa are: {16,28,24,23,26,28,}by character 3
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,} jump to : DFA state : 6, it is nfa are: {16,28,24,23,26,28,}by character 4
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,} jump to : DFA state : 6, it is nfa are: {16,28,24,23,26,28,}by character 5
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,} jump to : DFA state : 6, it is nfa are: {16,28,24,23,26,28,}by character 6
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,} jump to : DFA state : 6, it is nfa are: {16,28,24,23,26,28,}by character 7
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,} jump to : DFA state : 6, it is nfa are: {16,28,24,23,26,28,}by character 8
DFA state : 4, it is nfa are: {22,25,26,28,23,14,15,} jump to : DFA state : 6, it is nfa are: {16,28,24,23,26,28,}by character 9
DFA state : 6, it is nfa are: {16,28,24,23,26,28,} jump to : DFA state : 7, it is nfa are: {24,23,26,28,}by character 0
DFA state : 6, it is nfa are: {16,28,24,23,26,28,} jump to : DFA state : 7, it is nfa are: {24,23,26,28,}by character 1
DFA state : 6, it is nfa are: {16,28,24,23,26,28,} jump to : DFA state : 7, it is nfa are: {24,23,26,28,}by character 2
DFA state : 6, it is nfa are: {16,28,24,23,26,28,} jump to : DFA state : 7, it is nfa are: {24,23,26,28,}by character 3
DFA state : 6, it is nfa are: {16,28,24,23,26,28,} jump to : DFA state : 7, it is nfa are: {24,23,26,28,}by character 4
DFA state : 6, it is nfa are: {16,28,24,23,26,28,} jump to : DFA state : 7, it is nfa are: {24,23,26,28,}by character 5
DFA state : 6, it is nfa are: {16,28,24,23,26,28,} jump to : DFA state : 7, it is nfa are: {24,23,26,28,}by character 6
DFA state : 6, it is nfa are: {16,28,24,23,26,28,} jump to : DFA state : 7, it is nfa are: {24,23,26,28,}by character 7
DFA state : 6, it is nfa are: {16,28,24,23,26,28,} jump to : DFA state : 7, it is nfa are: {24,23,26,28,}by character 8
DFA state : 6, it is nfa are: {16,28,24,23,26,28,} jump to : DFA state : 7, it is nfa are: {24,23,26,28,}by character 9

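Note that the program numbers the DFA states in a different order from the hand walkthrough at the beginning of this section. MakeDTran tries the characters in ASCII order, and "." (46) comes before the digits "0" to "9" (48 to 57). As a result, the set reached by "." becomes DFA state 1 and the set reached by a digit becomes DFA state 2. We graph the above output as follows: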
[Figure: the DFA state machine built from the output above]

Compared with the NFA state diagram, this DFA state diagram is much simpler. The generated DFA state machine can be simplified even further, and we will look at the relevant algorithm in the next section. The code can be downloaded from https://pan.baidu.com/s/1kStrJMznrexQkGGBs8vN3w (extraction code: dqss).


Source: blog.csdn.net/tyler_download/article/details/128514727