Python KMP string matching

  First, a disclaimer: I am a beginner. I write this blog to record my learning process, along with my own understanding and experience. Some parts may not be well written, and I hope more experienced readers will point out any mistakes.

The problem

  Given a text string text_str (the string to search in) and a pattern string pat_str (the string to search for), find the position of the first occurrence of pat_str in text_str. If pat_str does not occur, return -1.
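  As a reference for the expected result, Python's built-in str.find already behaves this way: it returns the lowest index at which the substring starts, or -1 if it does not occur. The small check below only pins down what our own functions should return. The helper name expected_index is mine, just for illustration.

```python
text_str = 'asdabcdace'
pat_str = 'abcdace'

def expected_index(text, pat):
    # str.find returns the lowest index of pat in text, or -1 if pat is absent
    return text.find(pat)

print(expected_index(text_str, pat_str))   # 3
print(expected_index(text_str, 'xyz'))     # -1
```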

Brute-force method

  Before talking about KMP, let's look at the brute-force method, which is the most primitive approach.

  

text_str = 'asdabcdace'
pat_str = 'abcdace'

def str_match(text_str, pat_str):
    n, m = len(text_str), len(pat_str)
    for i in range(n - m + 1):
        j = 0
        # Compare character by character, starting from the i-th character of text_str
        while j < m and text_str[i + j] == pat_str[j]:
            j += 1       # Matched so far, so move on to the next character
        if j == m:       # Every character in pat_str has been matched
            return i
        # The match failed: move on to i+1 and match again from the first character
    return -1

print(str_match(text_str, pat_str))

  It is called the brute-force solution because every time a match fails, the pattern string is moved forward by just one position and matching starts again from its first character, over and over. This causes high time complexity: O(n*m) in the worst case for a text of length n and a pattern of length m. KMP optimizes exactly this step: when a match fails, the next value tells us how far the pattern can move, so characters that are already known to match are not compared again.
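  To see the cost in numbers, here is a small sketch of the same brute-force idea that also counts single-character comparisons. The function name count_comparisons and the test strings are my own choices, not part of the original code.

```python
def count_comparisons(text, pat):
    """Brute-force search that also counts single-character comparisons."""
    comparisons = 0
    n, m = len(text), len(pat)
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[i + j] != pat[j]:
                break            # mismatch: give up on this alignment
            j += 1
        if j == m:
            return i, comparisons
    return -1, comparisons

# A bad case: every alignment matches 4 characters and fails on the 5th
print(count_comparisons('a' * 20, 'a' * 4 + 'b'))   # (-1, 80)
```

  There are 16 possible alignments and each one costs 5 comparisons, so almost all of the work is repeated. This is the waste that KMP removes.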

KMP

  It may not be easy for me to explain the KMP algorithm completely, so I will only implement it roughly, step by step. I think it comes down to one point:

  How to find the next value corresponding to each character of the pattern string

    The number of characters matched before a failure differs from case to case, so the distance the pattern should move differs as well. How, then, do we find the next value for each character? This leads to another concept:

    The longest prefix that is also a suffix

    

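  Suppose the first i characters of the pattern have matched, and within them the longest prefix that is also a suffix (not counting all i characters themselves) has length k. Then the next value for i is k+1: the first k characters of the pattern are already known to match, so comparison resumes at character k+1 and the pattern moves forward by i-k positions (at least 1). The next values for the whole pattern can be found in one loop.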
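  To make the concept concrete, the sketch below computes k straight from the definition for every prefix of the pattern and prints k+1 next to it. It is written for clarity, not speed, and the helper name longest_prefix_suffix is my own.

```python
def longest_prefix_suffix(s):
    """Length of the longest proper prefix of s that is also a suffix of s."""
    for k in range(len(s) - 1, 0, -1):   # try the longest candidate first
        if s[:k] == s[-k:]:
            return k
    return 0

pat_str = 'abcdace'
for x in range(1, len(pat_str) + 1):
    k = longest_prefix_suffix(pat_str[:x])
    print(pat_str[:x], 'k =', k, 'next =', k + 1)
```

  For this pattern only the prefix 'abcda' has a non-zero k (the leading 'a' equals the trailing 'a'), so its next value is 2 and all the others are 1.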

Code

  

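# Compute the next values of the pattern, then use them to match
text_str = 'asdabcdace'
pat_str = 'abcdace'

# Get the next value for each number of matched characters of the pattern s
def str_next(s):
    # next_vals[x] = k + 1, where k is the length of the longest proper prefix
    # of s[0:x] that is also its suffix. For x = 0 and x = 1, k is 0, so the
    # first two values are 1 by default
    next_vals = [1, 1]
    for x in range(2, len(s)):
        next_vals.append(str_max_prx(s, x, next_vals[x - 1]) + 1)
    return next_vals

# Parameters: s is the string, x is the number of characters already matched,
# max_len is the largest value k can take (it grows by at most 1 per character)
def str_max_prx(s, x, max_len):
    for i in range(min(max_len, x - 1), 0, -1):   # try the longest candidate first
        if s[0:i] == s[x - i:x]:
            return i
    return 0

# s is the pattern string, m is the text string
def str_match(s, m):
    next_vals = str_next(s)
    s_len = len(s)
    m_len = len(m)
    i = 0        # current alignment: s[0] is lined up with m[i]
    index = 0    # number of characters of s already matched at this alignment
    while i + s_len <= m_len:
        while index < s_len and m[i + index] == s[index]:
            index += 1
        if index == s_len:          # the whole pattern matched
            return i
        k = next_vals[index] - 1    # length of the prefix that can be reused
        i += max(1, index - k)      # move the pattern by index - k (at least 1)
        index = k                   # the first k characters are not compared again
    return -1

res = str_match(pat_str, text_str)
print(res)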
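  For comparison, KMP is more commonly written with a 0-based table and without any string slicing, so that both building the table and searching take linear time. The sketch below is that common form as I understand it, not the code above. The names build_table and kmp_search are my own, and here table[x] stores k itself rather than k+1.

```python
def build_table(pat):
    """table[x] = length of the longest proper prefix of pat[:x+1] that is also its suffix."""
    table = [0] * len(pat)
    k = 0
    for x in range(1, len(pat)):
        while k > 0 and pat[x] != pat[k]:
            k = table[k - 1]      # fall back to the next shorter candidate
        if pat[x] == pat[k]:
            k += 1
        table[x] = k
    return table

def kmp_search(text, pat):
    """Return the index of the first occurrence of pat in text, or -1."""
    if not pat:
        return 0
    table = build_table(pat)
    k = 0                         # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pat[k]:
            k = table[k - 1]      # reuse the part that is known to match
        if ch == pat[k]:
            k += 1
        if k == len(pat):
            return i - k + 1      # start position of the match
    return -1

print(kmp_search('asdabcdace', 'abcdace'))   # 3
```

  The text index i never moves backwards here, which is where the linear running time comes from.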

  That is the code. Many details still need to be worked through on your own. I am keeping this note so I can look it up easily later, and I hope it is helpful to you too.

 
