1.BF algorithm
BF (Brute Force) matching, also called the naive or simple matching algorithm, is easy to understand but inefficient.
To look up string B in string A, call A the main string (length n) and B the pattern string (length m). The algorithm takes the n-m+1 substrings of length m that start at positions 0, 1, 2, ..., n-m of the main string and compares each one with the pattern string.
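As a small illustration of the start positions described above, the following sketch lists every candidate substring for a sample pair of strings. The function name `candidate_windows` and the sample strings are assumptions made for this example only.

```python
def candidate_windows(main: str, pattern: str) -> list:
    """Return (start, substring) for every length-m window of the main string."""
    n, m = len(main), len(pattern)
    # Start positions run from 0 to n - m inclusive, giving n - m + 1 windows.
    # If the pattern is longer than the main string, the range is empty.
    return [(i, main[i:i + m]) for i in range(n - m + 1)]


windows = candidate_windows("abcabc", "ca")
print(len(windows))  # 5, which is n - m + 1 = 6 - 2 + 1
print(windows)       # [(0, 'ab'), (1, 'bc'), (2, 'ca'), (3, 'ab'), (4, 'bc')]
```

BF matching amounts to comparing each of these windows with the pattern string until one is equal.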
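The worst case occurs when, for example, the two strings are highly similar, because each substring is then compared almost to its end before a mismatch is found. The algorithm is nevertheless often used in engineering, because:
- In most cases neither string is very long, and each comparison exits early at the first mismatch
- The idea is simple, so the implementation is less error-prone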
Time complexity: O(n * m)
# coding:utf-8
class Bf(object):
    """
    String matching: BF algorithm (brute-force matching)
    """

    def match(self, string1: str, string2: str) -> bool:
        """
        :param string1: main string
        :param string2: pattern string
        :return: True if string2 occurs in string1, otherwise False
        """
        n = len(string1)
        m = len(string2)
        loop_nums = n - m + 1  # number of candidate start positions
        for i in range(loop_nums):
            for j in range(m):
                # Compare the pattern with the substring that starts at i
                if string1[i + j] != string2[j]:
                    break  # mismatch: try the next start position
            else:
                # The inner loop finished without break: all m characters matched
                return True
        return False


if __name__ == "__main__":
    bf = Bf()
    assert bf.match("abcabc", "ca")
    assert bf.match("abcabc", "bc")
    assert not bf.match("abcabc", "cc")
    assert bf.match("abcabc", "cab")
    assert not bf.match("abcabc", "cb")
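
To make the early-exit argument and the O(n * m) worst case concrete, the sketch below counts character comparisons for two inputs of the same size. The function name `count_comparisons` and both sample inputs are assumptions chosen for illustration.

```python
def count_comparisons(main: str, pattern: str) -> int:
    """Count character comparisons made by brute-force matching until it stops."""
    n, m = len(main), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        matched = True
        for j in range(m):
            comparisons += 1
            if main[i + j] != pattern[j]:
                matched = False
                break  # early exit on the first mismatch
        if matched:
            return comparisons
    return comparisons


# Dissimilar strings: every window fails on its first character.
print(count_comparisons("abcdefghij", "xyz"))  # 8
# Highly similar strings: every window fails only on its last character.
print(count_comparisons("aaaaaaaaaa", "aab"))  # 24
```

With n = 10 and m = 3 there are 8 windows. The dissimilar input needs one comparison per window, while the similar input needs all 3, which is the (n-m+1) * m worst case.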
2.RK algorithm
The RK (Rabin-Karp) algorithm is an upgraded version of the BF algorithm that introduces hashing.
A hash function is used to compute the hash value of each of the n-m+1 substrings of the main string, and each hash value is then compared with the hash value of the pattern string. Because a hash value is a number, this comparison is fast. To avoid errors caused by hash collisions, when the two hash values are equal the substring is compared once more with the pattern string itself.
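The need for that second comparison is easy to see with a very simple hash. The sketch below assumes a toy hash that sums character codes; the helper name `sum_hash` is hypothetical, and other hash functions can collide too, only less often.

```python
def sum_hash(text: str) -> int:
    """Hypothetical toy hash: the sum of the character codes."""
    return sum(ord(ch) for ch in text)


# "ab" and "ba" contain the same characters, so their sums collide.
print(sum_hash("ab") == sum_hash("ba"))  # True: equal hash values
print("ab" == "ba")                      # False: the strings differ
```

Equal hash values therefore only mark a candidate position; the direct string comparison decides whether it is a real match.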
Optimization: compare each hash value as soon as it is computed, so that once a match is found the remaining hash values do not have to be calculated.
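Time complexity: O(n), provided each substring's hash value is derived from the previous one in constant time and hash collisions are rare. Recomputing every hash value from scratch costs O(m) per substring, which gives O(n * m) overall.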
# coding:utf-8
class Rk(object):
    """
    String matching: RK algorithm, implemented with the help of a hash function
    Assumes the strings contain only letters and digits
    """

    def hash_func(self, string):
        """
        Hash function: the sum of the ASCII codes of the characters in string
        :param string: string to hash
        :return: hash value of string
        """
        return sum(ord(ch) for ch in string)

    def match(self, string1: str, string2: str) -> bool:
        """
        :param string1: main string
        :param string2: pattern string
        :return: True if string2 occurs in string1, otherwise False
        """
        n = len(string1)
        m = len(string2)
        loop_nums = n - m + 1  # number of candidate start positions
        target_value = self.hash_func(string2)
        for i in range(loop_nums):
            tmp_string = string1[i:i + m]
            tmp_value = self.hash_func(tmp_string)
            if tmp_value == target_value:
                # Guard against hash collisions: confirm with a direct comparison
                if tmp_string == string2:
                    return True
        return False


if __name__ == "__main__":
    rk = Rk()
    assert rk.match("abcabc", "ca")
    assert rk.match("abcabc", "bc")
    assert not rk.match("abcabc", "cc")
    assert rk.match("abcabc", "cab")
    assert not rk.match("abcabc", "cb")
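
The O(n) figure given for RK relies on updating each substring's hash value from the previous one instead of recomputing it. The following is a minimal sketch of that idea, keeping the same sum-of-character-codes hash for simplicity. The function name `rk_match_rolling` is hypothetical, and this is one possible way to write the update, not necessarily the source's method.

```python
def rk_match_rolling(main: str, pattern: str) -> bool:
    """Rabin-Karp style search with a rolling sum-of-character-codes hash."""
    n, m = len(main), len(pattern)
    if m > n:
        return False
    target = sum(ord(ch) for ch in pattern)
    window = sum(ord(ch) for ch in main[:m])  # hash of the first window
    for i in range(n - m + 1):
        # Confirm with a direct comparison to rule out hash collisions.
        if window == target and main[i:i + m] == pattern:
            return True
        if i + m < n:
            # Slide right: drop main[i], add main[i + m], in O(1).
            window += ord(main[i + m]) - ord(main[i])
    return False


print(rk_match_rolling("abcabc", "cab"))  # True
print(rk_match_rolling("abcabc", "cb"))   # False
```

A plain sum collides for any two strings made of the same characters, so many windows may still need the direct comparison. A hash that also depends on character positions, such as a polynomial hash, would reduce collisions at the cost of a slightly more involved update step.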
References
- Data structures and algorithms - WANG Zheng