[Algorithm] Detailed explanation and code implementation of character matching algorithms

In computer science, a character matching (string matching) algorithm finds occurrences of a pattern string within a given text. These algorithms play an important role in many applications, including text editors, search engines, network security, and bioinformatics. This article introduces two commonly used character matching algorithms in detail, the naive method and the KMP (Knuth-Morris-Pratt) algorithm, and provides a Python implementation of each.

1. Naive method

The naive method is the simplest character matching algorithm: it tries every possible starting position in the main string and compares the pattern string against it character by character. Its worst-case time complexity is O(mn), where m and n are the lengths of the main string and the pattern string, respectively.

The algorithm steps are as follows:

(1) Align the pattern string with the start of the main string and compare their first characters.

(2) If they are equal, continue with the next pair of characters until every character of the pattern string has matched the corresponding character of the main string.

(3) If a mismatch is found, move the pattern string one position to the right: restart the comparison from the character after the current starting position in the main string and the first character of the pattern string.

(4) Repeat steps (2) and (3) until a match is found or the pattern string no longer fits in the remaining part of the main string.

The following is the Python code implementation:

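def naive_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m = len(text)
    n = len(pattern)

    # Try every alignment at which the pattern still fits inside the text.
    for i in range(m - n + 1):
        j = 0
        # Extend the match while the characters agree.
        while j < n and text[i + j] == pattern[j]:
            j += 1
        if j == n:  # every pattern character matched
            return i
    return -1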
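
To see where the O(mn) bound comes from, the following sketch instruments the same loop with a comparison counter. The function name count_naive_comparisons and the sample input are illustrative assumptions, not part of the implementation above; the input is chosen so that almost every alignment fails only on its last character.

```python
def count_naive_comparisons(text, pattern):
    """Run the naive search and return (match_index, character_comparisons)."""
    m, n = len(text), len(pattern)
    comparisons = 0
    for i in range(m - n + 1):
        j = 0
        while j < n:
            comparisons += 1  # one character comparison
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == n:
            return i, comparisons
    return -1, comparisons


# Hypothetical worst-case style input: a long run of 'a' followed by one 'b'.
text = "a" * 20 + "b"
pattern = "a" * 4 + "b"
print(count_naive_comparisons(text, pattern))
```

On this input every one of the m - n + 1 = 17 alignments costs n = 5 comparisons, so the count is 85, which is the worst case for these lengths.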

2. KMP algorithm

The KMP (Knuth-Morris-Pratt) algorithm is an improved character matching algorithm. When a mismatch occurs, it uses what is already known about the characters matched so far to skip comparisons that cannot succeed, so the position in the main string never moves backward. Its time complexity is O(m + n), where m and n are the lengths of the main string and the pattern string, respectively: O(n) to preprocess the pattern string and O(m) to scan the main string.

The algorithm steps are as follows:

(1) Preprocess the pattern string and build a next array. next[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]. The array is built in O(n) time: for each position i, the previous value is extended when the next characters agree, and earlier next values are used to fall back when they do not.

(2) Start the comparison between the first character of the main string and the first character of the pattern string.

(3) If the current characters are equal, advance one position in both the main string and the pattern string.

(4) If they are not equal and j > 0, where j is the number of pattern characters matched so far, keep the position in the main string and continue from position next[j - 1] of the pattern string, so that the longest prefix already known to match is reused. If j = 0, advance one position in the main string.

(5) Repeat steps (3) and (4) until the whole pattern string has been matched or the entire main string is traversed.
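
To make step (1) concrete, the sketch below builds the table directly from the prefix/suffix definition, assuming next[i] is read as the length of the longest proper prefix of pattern[0..i] that is also a suffix of it. The name next_by_definition and the sample pattern are illustrative; this brute-force version is only meant to show what the values mean, not how KMP computes them efficiently.

```python
def next_by_definition(pattern):
    """Build the next array straight from its definition (slow, for illustration)."""
    table = []
    for i in range(len(pattern)):
        prefix = pattern[: i + 1]
        best = 0
        # Try every proper prefix length k and keep the longest one
        # that is also a suffix of pattern[0..i].
        for k in range(1, len(prefix)):
            if prefix[:k] == prefix[-k:]:
                best = k
        table.append(best)
    return table


# Print each prefix of a sample pattern next to its table entry.
sample = "ababaca"
for i, value in enumerate(next_by_definition(sample)):
    print(i, sample[: i + 1], value)
```

For "ababaca" the table is [0, 0, 1, 2, 3, 0, 1]: for example, the prefix "ababa" has "aba" as both a proper prefix and a suffix, so its entry is 3.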

The following is the Python code implementation:

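def compute_prefix_function(pattern):
    """Return next, where next[i] is the length of the longest proper prefix
    of pattern[:i + 1] that is also a suffix of it."""
    n = len(pattern)
    next = [0] * n  # note: this name shadows Python's built-in next() inside the function
    j = 0  # length of the currently matched prefix
    for i in range(1, n):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while j > 0 and pattern[j] != pattern[i]:
            j = next[j - 1]
        if pattern[j] == pattern[i]:
            j += 1
        next[i] = j
    return next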

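def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m = len(text)
    n = len(pattern)
    next = compute_prefix_function(pattern)
    i = 0  # position in text
    j = 0  # position in pattern (number of characters matched so far)
    while i < m and j < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
        elif j > 0:
            # Mismatch after j matches: reuse the longest matched prefix.
            j = next[j - 1]
        else:
            # Mismatch on the first pattern character: move on in the text.
            i += 1
    if j == n:
        return i - j  # start index of the match
    return -1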
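
The O(m + n) claim can be checked empirically with a small sketch that counts comparisons in a KMP search loop. The names build_next and kmp_search_counted and the sample input are illustrative assumptions; the block is self-contained and counts only the comparisons between the main string and the pattern string, not those made while building the table.

```python
def build_next(pattern):
    """Prefix function: next_table[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    next_table = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[j] != pattern[i]:
            j = next_table[j - 1]
        if pattern[j] == pattern[i]:
            j += 1
        next_table[i] = j
    return next_table


def kmp_search_counted(text, pattern):
    """Return (index of first match or -1, number of text/pattern comparisons)."""
    next_table = build_next(pattern)
    i = j = comparisons = 0
    while i < len(text) and j < len(pattern):
        comparisons += 1
        if text[i] == pattern[j]:
            i += 1
            j += 1
        elif j > 0:
            j = next_table[j - 1]  # reuse the matched prefix; i does not move back
        else:
            i += 1
    index = i - j if j == len(pattern) else -1
    return index, comparisons


# Hypothetical input: a long run of 'a' followed by one 'b'.
text = "a" * 20 + "b"
index, comparisons = kmp_search_counted(text, "aaaab")
print(index, comparisons, comparisons <= 2 * len(text))
```

The search loop makes at most 2m comparisons: each iteration either advances i, which can happen at most m times, or reduces j, and j cannot be reduced more often than it was increased.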

Origin blog.csdn.net/qq_22744093/article/details/132461493