Project location: Regex in Python
The first two posts built a regular expression engine based on an NFA. The next step is to convert the NFA to a DFA and then minimize the DFA.
DFA definition
The NFA-to-DFA conversion algorithm mainly merges the NFA state nodes that can be combined, so that for any input character a state node jumps to exactly one target node.
A DFA node therefore contains a set of NFA nodes, a unique identifier, and a flag that marks whether it is an accepting state.
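class Dfa(object):
    # Class-level counter used to hand out unique state numbers
    STATUS_NUM = 0

    def __init__(self):
        self.nfa_sets = []      # NFA nodes merged into this DFA state
        self.accepted = False   # True if this is an accepting state
        self.status_num = -1    # unique identifier of this DFA state

    @classmethod
    def nfas_to_dfa(cls, nfas):
        dfa = cls()
        for n in nfas:
            dfa.nfa_sets.append(n)
            # An NFA node with no outgoing edges marks the state as accepting
            if n.next_1 is None and n.next_2 is None:
                dfa.accepted = True
        dfa.status_num = Dfa.STATUS_NUM
        Dfa.STATUS_NUM = Dfa.STATUS_NUM + 1
        return dfa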
NFA conversion to DFA
The ultimate goal of converting the NFA to a DFA is to obtain a jump table, somewhat like the syntax analysis table in the earlier C compiler.
The function below holds the main logic of the NFA-to-DFA conversion. The algorithm works like this:
- First use the closure algorithm to compute the set of NFA nodes that can be merged; this set becomes one DFA node
- Then traverse the collection of DFA nodes
- For each input character, apply the move operation, then apply closure again to the resulting set; this gives the next DFA state (a duplicate check is also needed here, because that DFA state may already have been generated)
- Record the correspondence between the two nodes in the jump table
- If the NFA set contained in a DFA state includes an accepting node, then of course the DFA state is accepting as well
def convert_to_dfa(nfa_start_node):
    # dfa_list is assumed to be a module-level list of every DFA state created
    jump_table = list_dict(MAX_DFA_STATUS_NUM)
    ns = [nfa_start_node]
    n_closure = closure(ns)
    dfa = Dfa.nfas_to_dfa(n_closure)
    dfa_list.append(dfa)
    # The start state only shows up as new_dfa if something jumps back to it,
    # so record its accepting flag here
    if dfa.accepted:
        jump_table[dfa.status_num]['accepted'] = True

    dfa_index = 0
    while dfa_index < len(dfa_list):
        dfa = dfa_list[dfa_index]
        for i in range(ASCII_COUNT):
            c = chr(i)
            nfa_move = move(dfa.nfa_sets, c)
            if nfa_move is not None:
                nfa_closure = closure(nfa_move)
                if nfa_closure is None:
                    continue
                # Reuse the DFA state if this NFA set was already converted
                new_dfa = convert_completed(dfa_list, nfa_closure)
                if new_dfa is None:
                    new_dfa = Dfa.nfas_to_dfa(nfa_closure)
                    dfa_list.append(new_dfa)
                next_state = new_dfa.status_num
                jump_table[dfa.status_num][c] = next_state
                if new_dfa.accepted:
                    jump_table[new_dfa.status_num]['accepted'] = True
        dfa_index = dfa_index + 1
    return jump_table
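convert_to_dfa relies on closure, move and convert_completed, which are not shown in this post. The following is only a minimal sketch of what they might look like. The Nfa stand-in class, its edge attribute and the EPSILON marker are assumptions made for illustration; only next_1 and next_2 come from the code above.

```python
import types

EPSILON = -1  # assumed marker for an epsilon edge (hypothetical)


class Nfa(object):
    """Hypothetical stand-in for the engine's NFA node."""

    def __init__(self, edge=None):
        self.edge = edge    # a character, EPSILON, or None for no outgoing edge
        self.next_1 = None
        self.next_2 = None


def closure(nfa_nodes):
    """Return the given nodes plus everything reachable through epsilon edges."""
    if not nfa_nodes:
        return None
    result = list(nfa_nodes)
    stack = list(nfa_nodes)
    while stack:
        node = stack.pop()
        if node.edge != EPSILON:
            continue
        for target in (node.next_1, node.next_2):
            if target is not None and target not in result:
                result.append(target)
                stack.append(target)
    return result


def move(nfa_nodes, ch):
    """Return the nodes reached from nfa_nodes by consuming ch, or None."""
    result = []
    for node in nfa_nodes:
        target = node.next_1
        if node.edge == ch and target is not None and target not in result:
            result.append(target)
    return result or None


def convert_completed(dfa_list, nfa_closure):
    """Return the DFA state already built from the same NFA set, or None."""
    for dfa in dfa_list:
        same_size = len(dfa.nfa_sets) == len(nfa_closure)
        if same_size and all(n in dfa.nfa_sets for n in nfa_closure):
            return dfa
    return None


# NFA for a|b: the start node has two epsilon edges, one per branch
start, n1, n2, end = Nfa(EPSILON), Nfa('a'), Nfa('b'), Nfa()
start.next_1, start.next_2 = n1, n2
n1.next_1 = n2.next_1 = end

start_set = closure([start])                # the NFA set of the first DFA state
after_a = closure(move(start_set, 'a'))     # the NFA set reached on 'a'
known = [types.SimpleNamespace(nfa_sets=start_set)]  # stand-in for a Dfa object
```

Both closure and move return None for an empty result here, because convert_to_dfa checks their results against None.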
DFA minimization
Minimizing a DFA essentially means merging state nodes once more, this time by partitioning them:
- First partition the states by whether they are accepting
- Then refine each partition using the jump relations in the DFA jump table: if the states that two nodes jump to on the same input lie in the same partition, the two nodes can stay in one partition, otherwise they must be split
- Repeat the above until no partition changes
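The three steps above can be sketched on a plain dictionary-based DFA. This is a signature-based variant written for illustration, not the partition function used in this project; the minimize name and the input layout (state -> {char: next_state}) are assumptions.

```python
def minimize(transitions, accepting, alphabet):
    """Group the states that cannot be told apart by any input.

    transitions: dict mapping state -> {char: next_state}
    accepting:   set of accepting states
    alphabet:    iterable of input characters to consider
    """
    states = sorted(transitions)
    # Step 1: split into accepting and non-accepting states
    first_split = ([s for s in states if s in accepting],
                   [s for s in states if s not in accepting])
    groups = [g for g in first_split if g]

    changed = True
    while changed:                      # Step 3: repeat until stable
        changed = False
        group_of = {s: i for i, g in enumerate(groups) for s in g}
        new_groups = []
        for g in groups:
            # Step 2: states stay together only if every character
            # sends them into the same group
            buckets = {}
            for s in g:
                signature = tuple(group_of.get(transitions[s].get(ch))
                                  for ch in alphabet)
                buckets.setdefault(signature, []).append(s)
            if len(buckets) > 1:
                changed = True
            new_groups.extend(buckets.values())
        groups = new_groups
    return groups


# DFA for (a|b)*abb with five states; state 4 is accepting
transitions = {
    0: {'a': 1, 'b': 2},
    1: {'a': 1, 'b': 3},
    2: {'a': 1, 'b': 2},
    3: {'a': 1, 'b': 4},
    4: {'a': 1, 'b': 2},
}
groups = minimize(transitions, {4}, 'ab')
print(sorted(groups))   # [[0, 2], [1], [3], [4]]
```

States 0 and 2 end up in one group because they jump to the same groups on both a and b, so the minimized DFA has four states instead of five.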
DFA partition definition
DfaGroup is similar to the Dfa class defined earlier: it has a unique identifier and a list that holds DFA state nodes
class DfaGroup(object):
    GROUP_COUNT = 0

    def __init__(self):
        self.set_count()
        self.group = []

    def set_count(self):
        self.group_num = DfaGroup.GROUP_COUNT
        DfaGroup.GROUP_COUNT = DfaGroup.GROUP_COUNT + 1

    def remove(self, element):
        self.group.remove(element)

    def add(self, element):
        self.group.append(element)

    def get(self, count):
        if count > len(self.group) - 1:
            return None
        return self.group[count]

    def __len__(self):
        return len(self.group)
Minimize DFA
partition is the most important part of the DFA minimization algorithm
- First look up in the jump table the next state that each DFA node jumps to on the current input
- The first DFA node of the partition is the one the others are compared against
- If the next state of the first node and the next state of the compared node are not in the same partition, the two nodes cannot stay in the same partition
- In that case a new partition is created for the compared node
So in effect DFA minimization merges the nodes whose jumps all lead to the same partitions
def partition(jump_table, group, first, next, ch):
    # States reached on ch from the reference node and the compared node
    goto_first = jump_table[first.status_num].get(ch)
    goto_next = jump_table[next.status_num].get(ch)

    # Different target partitions: move next into a new partition
    if dfa_in_group(goto_first) != dfa_in_group(goto_next):
        new_group = DfaGroup()
        group_list.append(new_group)
        group.remove(next)
        new_group.add(next)
        return True
    return False
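partition depends on dfa_in_group, which this post does not show. Judging from how it is called, it takes a state number and returns the partition that contains that state. A possible sketch under that assumption follows; the SimpleNamespace objects are stand-ins for Dfa and DfaGroup instances, and group_list is assumed to be a module-level list.

```python
from types import SimpleNamespace

group_list = []  # assumed module-level list of all partitions


def dfa_in_group(status_num):
    """Return the partition holding the DFA state numbered status_num, or None."""
    for group in group_list:
        for dfa in group.group:
            if dfa.status_num == status_num:
                return group
    return None


# Stand-ins for Dfa and DfaGroup objects, just enough for the lookup
s0, s1, s2 = (SimpleNamespace(status_num=n) for n in range(3))
group_list.append(SimpleNamespace(group_num=0, group=[s0, s2]))
group_list.append(SimpleNamespace(group_num=1, group=[s1]))

print(dfa_in_group(2).group_num)   # 0
print(dfa_in_group(None))          # None
```

Returning None for an unknown state means that two nodes which both have no jump on ch compare as equal in partition and are not split.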
Create a jump table
Once partitioning is complete, jumps between nodes become jumps between partitions
- Traverse the collection of DFA nodes
- Look up each node and its jumps in the earlier jump table
- Then find the partitions that the two nodes belong to, which converts the jump into one between partitions
def create_mindfa_table(jump_table):
    trans_table = list_dict(ASCII_COUNT)
    for dfa in dfa_list:
        from_dfa = dfa.status_num
        for i in range(ASCII_COUNT):
            ch = chr(i)
            to_dfa = jump_table[from_dfa].get(ch)
            # Compare against None: state number 0 is a valid jump target
            if to_dfa is not None:
                from_group = dfa_in_group(from_dfa)
                to_group = dfa_in_group(to_dfa)
                trans_table[from_group.group_num][ch] = to_group.group_num
        # A partition is accepting if it contains an accepting DFA state
        if dfa.accepted:
            from_group = dfa_in_group(from_dfa)
            trans_table[from_group.group_num]['accepted'] = True
    return trans_table
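To see the effect on concrete data, here is a small sketch that collapses a dictionary-based DFA into a partition-level table. The build_group_table name and the input layout are assumptions for illustration; the partitions are written out by hand for a DFA of (a|b)*abb in which states 0 and 2 behave identically.

```python
def build_group_table(transitions, accepting, groups):
    """Turn state-to-state jumps into partition-to-partition jumps.

    transitions: dict mapping state -> {char: next_state}
    accepting:   set of accepting states
    groups:      list of partitions, each a list of states
    """
    group_of = {s: i for i, g in enumerate(groups) for s in g}
    table = [dict() for _ in groups]
    for state, row in transitions.items():
        src = group_of[state]
        for ch, target in row.items():
            table[src][ch] = group_of[target]
        if state in accepting:
            table[src]['accepted'] = True
    return table


transitions = {
    0: {'a': 1, 'b': 2},
    1: {'a': 1, 'b': 3},
    2: {'a': 1, 'b': 2},
    3: {'a': 1, 'b': 4},
    4: {'a': 1, 'b': 2},
}
groups = [[0, 2], [1], [3], [4]]   # states 0 and 2 share a partition
table = build_group_table(transitions, {4}, groups)
for num, row in enumerate(table):
    print(num, row)
# 0 {'a': 1, 'b': 0}
# 1 {'a': 1, 'b': 2}
# 2 {'a': 1, 'b': 3}
# 3 {'a': 1, 'b': 0, 'accepted': True}
```

The jump from state 0 to state 2 on b turns into a jump from partition 0 to itself, which is exactly the merge that minimization is after.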
Matching the input string
Matching an input string against the jump table is very simple
- Traverse the input string
- Look up the jump for the current state and the current input character
- Either jump to the next state, or finish the match
def dfa_match(input_string, jump_table, minimize=True):
    # State 0 is the start state; with minimization, start from its partition
    if minimize:
        cur_status = dfa_in_group(0).group_num
    else:
        cur_status = 0

    for c in input_string:
        js = jump_table[cur_status].get(c)
        # No jump for this character: the match fails
        if js is None:
            return False
        cur_status = js

    # The whole input is consumed: accept only if the final state is accepting
    return jump_table[cur_status].get('accepted') is not None
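To make the walk through the table visible, the sketch below records every state visited while matching. The trace helper and the hand-written table (a minimized DFA for (a|b)*abb, in the same row format as the jump table above) are assumptions for illustration.

```python
def trace(input_string, jump_table, start=0):
    """Return (visited_states, matched) for input_string."""
    state = start
    visited = [state]
    for ch in input_string:
        state = jump_table[state].get(ch)
        if state is None:
            # No jump for this character, so the input is rejected
            return visited, False
        visited.append(state)
    return visited, bool(jump_table[state].get('accepted'))


# One dict per state: character -> next state, plus an 'accepted' flag
jump_table = [
    {'a': 1, 'b': 0},
    {'a': 1, 'b': 2},
    {'a': 1, 'b': 3},
    {'a': 1, 'b': 0, 'accepted': True},
]

print(trace('abb', jump_table))    # ([0, 1, 2, 3], True)
print(trace('aabb', jump_table))   # ([0, 1, 1, 2, 3], True)
print(trace('ab', jump_table))     # ([0, 1, 2], False)
print(trace('abc', jump_table))    # ([0, 1, 2], False)
```

The input ab is rejected because it stops in a non-accepting state, while abc is rejected because state 2 has no jump on c.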
Summary
With this, every stage of a simple regular expression engine is complete
Regular expression -> NFA -> DFA -> DFA minimized -> match