Implementing a regular expression engine in Python (Part 3)

Project location: Regex in Python

The first two parts built a regular expression engine based on an NFA. What remains is to go one step further: convert the NFA to a DFA, and then minimize the DFA.

DFA definition

The NFA-to-DFA conversion algorithm mainly merges the NFA state nodes that can be combined, so that for any input character each state node has exactly one node to jump to.

So a DFA node contains a set of NFA nodes, a unique identifier, and a flag that marks whether it is an accepting state.

class Dfa(object):
    STATUS_NUM = 0

    def __init__(self):
        self.nfa_sets = []
        self.accepted = False
        self.status_num = -1

    @classmethod
    def nfas_to_dfa(cls, nfas):
        dfa = cls()
        for n in nfas:
            dfa.nfa_sets.append(n)
            if n.next_1 is None and n.next_2 is None:
                dfa.accepted = True

        dfa.status_num = Dfa.STATUS_NUM
        Dfa.STATUS_NUM = Dfa.STATUS_NUM + 1
        return dfa

NFA conversion to DFA

The ultimate goal of converting the NFA to a DFA is to obtain a jump table, a bit like the syntax analysis table in the earlier C language compiler.

This function converts the whole NFA to a DFA. The main logic of the algorithm is this:

  • First use the closure algorithm to compute the set of NFA nodes that can be merged, and generate one DFA node from that set
  • Then traverse the collection of DFA nodes
  • For each input character perform the move operation, then apply the closure operation to the resulting set to get the next DFA state (a duplicate check is also needed here, because that DFA state may already have been generated)
  • Then record the correspondence between the two nodes in the jump table
  • If the NFA set of a DFA node contains an accepting NFA node, then that DFA state is of course also accepting

def convert_to_dfa(nfa_start_node):
    jump_table = list_dict(MAX_DFA_STATUS_NUM)
    ns = [nfa_start_node]
    n_closure = closure(ns)
    dfa = Dfa.nfas_to_dfa(n_closure)
    dfa_list.append(dfa)
    # The start state itself may be accepting (for example for a pattern like a*)
    if dfa.accepted:
        jump_table[dfa.status_num]['accepted'] = True

    dfa_index = 0
    while dfa_index < len(dfa_list):
        dfa = dfa_list[dfa_index]
        for i in range(ASCII_COUNT):
            c = chr(i)
            nfa_move = move(dfa.nfa_sets, c)
            if nfa_move is None:
                continue
            nfa_closure = closure(nfa_move)
            if nfa_closure is None:
                continue
            # Reuse the DFA state if this set of NFA nodes was already generated
            new_dfa = convert_completed(dfa_list, nfa_closure)
            if new_dfa is None:
                new_dfa = Dfa.nfas_to_dfa(nfa_closure)
                dfa_list.append(new_dfa)
            # Only record a jump when the character actually leads somewhere
            jump_table[dfa.status_num][c] = new_dfa.status_num
            if new_dfa.accepted:
                jump_table[new_dfa.status_num]['accepted'] = True
        dfa_index = dfa_index + 1

    return jump_table
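
The closure and move helpers are not shown in this part, so here is a minimal self-contained sketch of the same steps. It is not the project's code: the NFA is assumed to be a plain dict of the form state -> {label: [targets]}, the EPSILON marker and the function signatures are made up for this illustration, and the alphabet is passed in explicitly instead of looping over all ASCII characters.

```python
EPSILON = None  # hypothetical label for epsilon edges in this sketch


def closure(nfa, states):
    """Return every NFA state reachable from `states` through epsilon edges."""
    stack = list(states)
    seen = set(states)
    while stack:
        state = stack.pop()
        for target in nfa.get(state, {}).get(EPSILON, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return frozenset(seen)


def move(nfa, states, ch):
    """Return the NFA states reachable from `states` by consuming `ch`."""
    result = set()
    for state in states:
        result.update(nfa.get(state, {}).get(ch, ()))
    return result


def subset_construction(nfa, start, accepting, alphabet):
    """Build a DFA jump table: a list of dicts indexed by DFA state number."""
    start_set = closure(nfa, {start})
    numbering = {start_set: 0}  # set of NFA states -> DFA state number
    worklist = [start_set]
    table = [{}]
    while worklist:
        current = worklist.pop(0)
        row = table[numbering[current]]
        if current & accepting:
            row['accepted'] = True
        for ch in alphabet:
            moved = move(nfa, current, ch)
            if not moved:
                continue
            target = closure(nfa, moved)
            if target not in numbering:  # duplicate check
                numbering[target] = len(table)
                table.append({})
                worklist.append(target)
            row[ch] = numbering[target]
    return table


# A hand-built NFA for a*b:
# 0 -eps-> 1, 0 -eps-> 3, 1 -a-> 2, 2 -eps-> 1, 2 -eps-> 3, 3 -b-> 4
nfa = {
    0: {EPSILON: [1, 3]},
    1: {'a': [2]},
    2: {EPSILON: [1, 3]},
    3: {'b': [4]},
}
table = subset_construction(nfa, start=0, accepting={4}, alphabet='ab')
```

Using a frozenset as the dictionary key makes the duplicate check a simple lookup; the article's convert_completed presumably does the same job by comparing NFA sets against the DFA nodes generated so far.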

DFA minimization

Minimizing the DFA essentially also merges state nodes; it does so by partitioning them.

  1. First partition the states by whether they are accepting
  2. Then partition the nodes of each partition again according to the jump relations in the DFA jump table: if the states that two nodes jump to are also in the same partition, the two nodes can stay in one partition
  3. Repeat the above until no partition changes
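
The three steps above can be sketched as a small standalone function. This is only an illustration under assumptions, not the project's implementation: the jump table is assumed to be a list of dicts (one per DFA state) with an optional 'accepted' key, and the name minimize and the bucket-by-signature approach are choices made for this sketch.

```python
def minimize(table, alphabet):
    """Split DFA states into partitions of indistinguishable states.

    `table` is a list of dicts (one per state) mapping a character to the
    next state, with an optional 'accepted' key. Returns a list of partitions.
    """
    # Step 1: separate accepting states from the rest
    accepting = [s for s, row in enumerate(table) if row.get('accepted')]
    others = [s for s, row in enumerate(table) if not row.get('accepted')]
    groups = [g for g in (others, accepting) if g]

    changed = True
    while changed:  # Step 3: repeat until nothing splits any more
        changed = False
        # Map each state to the index of the partition that currently holds it
        group_of = {s: i for i, g in enumerate(groups) for s in g}
        new_groups = []
        for g in groups:
            # Step 2: states whose jumps land in the same partitions stay together
            buckets = {}
            for s in g:
                signature = tuple(group_of.get(table[s].get(ch)) for ch in alphabet)
                buckets.setdefault(signature, []).append(s)
            if len(buckets) > 1:
                changed = True
            new_groups.extend(buckets.values())
        groups = new_groups
    return groups


# DFA for a*b: states 0 and 1 behave identically and end up in one partition
merged = minimize([{'a': 1, 'b': 2}, {'a': 1, 'b': 2}, {'accepted': True}], 'ab')
# DFA for ab: every state is distinguishable, so nothing is merged
split = minimize([{'a': 1}, {'b': 2}, {'accepted': True}], 'ab')
```

A missing jump is treated as its own target (None) in the signature, so a state that can consume a character is never merged with one that cannot.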

DFA partition definition

DfaGroup is similar to the Dfa class defined earlier: it has a unique identifier and a list that holds DFA states.

class DfaGroup(object):
    GROUP_COUNT = 0

    def __init__(self):
        self.set_count()
        self.group = []

    def set_count(self):
        self.group_num = DfaGroup.GROUP_COUNT
        DfaGroup.GROUP_COUNT = DfaGroup.GROUP_COUNT + 1

    def remove(self, element):
        self.group.remove(element)

    def add(self, element):
        self.group.append(element)

    def get(self, count):
        if count > len(self.group) - 1:
            return None
        return self.group[count]

    def __len__(self):
        return len(self.group)

Minimize DFA

The partition function is the most important part of the DFA minimization algorithm.

  • First look up in the jump table the next state that each DFA node jumps to for the current character
  • The first DFA node in the partition is used as the reference for the comparison
  • If the next state of the first node and the next state of another node are not in the same partition, the two nodes cannot be in the same partition
  • In that case a new partition has to be created

So minimizing the DFA in fact merges the nodes whose jumps lead to the same state.

def partition(jump_table, group, first, next, ch):
    goto_first = jump_table[first.status_num].get(ch)
    goto_next = jump_table[next.status_num].get(ch)

    if dfa_in_group(goto_first) != dfa_in_group(goto_next):
        new_group = DfaGroup()
        group_list.append(new_group)
        group.remove(next)
        new_group.add(next)
        return True

    return False

Create a jump table

After partitioning is complete, the jumps between nodes become jumps between partitions.

  • Traverse the DFA collection
  • Look up each node and its jumps in the previous jump table
  • Then find the partitions the two nodes belong to, which converts the jump into a jump between partitions

def create_mindfa_table(jump_table):
    trans_table = list_dict(ASCII_COUNT)
    for dfa in dfa_list:
        from_dfa = dfa.status_num
        for i in range(ASCII_COUNT):
            ch = chr(i)
            to_dfa = jump_table[from_dfa].get(ch)
            if to_dfa is not None:  # state 0 is a valid target, so do not test truthiness
                from_group = dfa_in_group(from_dfa)
                to_group = dfa_in_group(to_dfa)
                trans_table[from_group.group_num][ch] = to_group.group_num
        if dfa.accepted:
            from_group = dfa_in_group(from_dfa)
            trans_table[from_group.group_num]['accepted'] = True

    return trans_table
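
To make the conversion concrete, here is a small standalone sketch that collapses a state-level jump table into a partition-level one. It assumes the table is a list of dicts and the partitions are plain lists of state numbers; the name collapse and this data layout are hypothetical and do not come from the project.

```python
def collapse(table, groups):
    """Rewrite a state-level jump table as a partition-level jump table."""
    # Map each DFA state to the index of the partition that contains it
    group_of = {s: i for i, g in enumerate(groups) for s in g}
    collapsed = [{} for _ in groups]
    for state, row in enumerate(table):
        target_row = collapsed[group_of[state]]
        for key, value in row.items():
            if key == 'accepted':
                target_row['accepted'] = True
            else:
                # A jump between states becomes a jump between their partitions
                target_row[key] = group_of[value]
    return collapsed


# DFA for a*b, where states 0 and 1 were merged into one partition
table = [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}, {'accepted': True}]
collapsed = collapse(table, [[0, 1], [2]])
```

For this input the three-state table shrinks to two rows: the merged partition loops to itself on 'a' and jumps to the accepting partition on 'b'.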

Matching the input string

The logic for matching an input string with the jump table is very simple.

  • Traverse the input string
  • Look up the jump for the current state and the current input character
  • Either jump, or finish the match

def dfa_match(input_string, jump_table, minimize=True):
    # Start from the partition that holds DFA state 0, or from state 0 itself
    if minimize:
        cur_status = dfa_in_group(0).group_num
    else:
        cur_status = 0
    for c in input_string:
        # No jump for this character means the match fails
        js = jump_table[cur_status].get(c)
        if js is None:
            return False
        cur_status = js

    # The whole input was consumed: it matches only if we ended in an accepting state
    return jump_table[cur_status].get('accepted') is not None
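
As a quick check of the matching loop, the sketch below runs it against a hand-written jump table for a*b. The table layout (a list of dicts with an optional 'accepted' key) follows the article, but the function name match and the table contents are written by hand for this example.

```python
def match(table, text, start=0):
    """Return True if the DFA described by `table` accepts the whole `text`."""
    state = start
    for ch in text:
        state = table[state].get(ch)
        if state is None:  # no jump for this character
            return False
    return bool(table[state].get('accepted'))


# Minimized DFA for a*b: state 0 loops on 'a' and jumps to accepting state 1 on 'b'
table = [{'a': 0, 'b': 1}, {'accepted': True}]

results = {text: match(table, text) for text in ['aab', 'b', 'a', 'ba', '']}
```

Only 'aab' and 'b' are accepted here; 'a' and the empty string stop in a non-accepting state, and 'ba' fails because the accepting state has no jump on 'a'.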

Summary

This completes the whole process of a simple regular expression engine:

Regular expression -> NFA -> DFA -> DFA minimized -> match

Origin www.cnblogs.com/secoding/p/11582310.html