String Matching: the Rabin–Karp Algorithm (Part 2)

The previous post (https://blog.csdn.net/To_be_to_thought/article/details/84890018) only introduced the naive Rabin–Karp algorithm. This post focuses on how to optimize it.

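Let the pattern P have length L and the text S have length n. The goal is to find the position of P in a single pass over S. As noted in the previous post, computing hash(P) costs O(L). Hashing every length-L substring of S from scratch costs O(L) per substring, and there are n-L+1 of them, so O(nL) in total. Whenever a substring's hash equals hash(P), that substring is compared with P character by character, which costs O(L) per comparison. The naive method therefore takes O(nL) time overall.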
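To make the O(nL) cost concrete, here is a minimal sketch of the naive approach, in which every window is hashed from scratch. The class name `NaiveHashSearch` and its method names are hypothetical; the base 256 and modulus 101 are assumed here to match the constants used in the code later in this post.

```java
final class NaiveHashSearch {
    static final int BASE = 256;     // assumed base, same as the later code
    static final int MODULUS = 101;  // assumed modulus, same as the later code

    // Polynomial hash of s[start, start + len), recomputed from scratch: O(len).
    static int hash(String s, int start, int len) {
        int h = 0;
        for (int j = 0; j < len; j++) {
            h = (h * BASE + s.charAt(start + j)) % MODULUS;
        }
        return h;
    }

    // Returns the first index of pattern in text, or -1 if it does not occur.
    static int indexOf(String text, String pattern) {
        int n = text.length(), len = pattern.length();
        if (len == 0) return 0;
        if (n < len) return -1;
        int target = hash(pattern, 0, len);
        for (int i = 0; i + len <= n; i++) {
            // O(len) hash per window, plus an O(len) check when the hashes agree.
            if (hash(text, i, len) == target && text.regionMatches(i, pattern, 0, len)) {
                return i;
            }
        }
        return -1;
    }
}
```

The inner call to `hash` is what makes each window cost O(L); the rest of this post is about removing that repeated work.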

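Notice that these substrings overlap heavily. For example, the 5-character substrings "algor" and "lgori" of the string "algorithms" share four letters. Reusing the shared part to cut down the computation is the idea behind the "rolling hash".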

An example: P = "90210", S = "48902107".

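The 5-character substrings of S are "48902", "89021", "90210", and "02107", and the digit alphabet is $\{0, 1, \dots, 9\}$.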

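The hash of a substring $k = k_0 k_1 \dots k_{L-1}$ is computed as:

$$H(k) = \Big(\sum_{j=0}^{L-1} k_j \, b^{L-1-j}\Big) \bmod m$$

where $b$ is the base (10 for this digit alphabet) and $m$ is the modulus. For instance, $H(\text{"48902"}) = 48902 \bmod m$.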

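The recurrence (similar to a sliding window) is, for the first two substrings of the example:

$$H(\text{"89021"}) = \big((H(\text{"48902"}) - 4 \cdot 10^{4}) \cdot 10 + 1\big) \bmod m$$

That is, remove the contribution of the leading digit 4, shift the rest by one position, and append the new digit 1, reducing the result by the modulus $m$.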

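With a rolling hash, once the first window has been hashed in O(L), each subsequent substring's hash is obtained in O(1), so computing the hashes of all substrings drops to O(n).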
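The example above can be traced with a short sketch. It hashes every 5-digit window of S = "48902107" in base 10 using the rolling update. The class name `RollingHashDemo` is hypothetical, and the modulus is passed in as a parameter because the example does not fix one; 101 is borrowed from the code later in this post purely for illustration.

```java
final class RollingHashDemo {
    // Hashes of every length-len window of a digit string, base 10.
    // Assumes 1 <= len <= digits.length() and that digits contains only '0'..'9'.
    static int[] windowHashes(String digits, int len, int modulus) {
        int n = digits.length();
        int[] out = new int[n - len + 1];

        // high = 10^(len-1) mod modulus, the weight of the leading digit.
        int high = 1;
        for (int j = 0; j < len - 1; j++) {
            high = (high * 10) % modulus;
        }

        // First window: computed from scratch in O(len).
        int h = 0;
        for (int j = 0; j < len; j++) {
            h = (h * 10 + (digits.charAt(j) - '0')) % modulus;
        }
        out[0] = h;

        // Every later window: O(1) update from the previous hash.
        for (int i = 0; i + len < n; i++) {
            int leaving = digits.charAt(i) - '0';
            int entering = digits.charAt(i + len) - '0';
            h = ((h - leaving * high) * 10 + entering) % modulus;
            if (h < 0) h += modulus;  // Java's % can return a negative value
            out[i + 1] = h;
        }
        return out;
    }
}
```

With modulus 101, `windowHashes("48902107", 5, 101)` yields 18, 40, 17, 87, which are 48902, 89021, 90210, and 2107 each reduced mod 101. The pattern "90210" also hashes to 17, so only the third window needs a character-by-character check.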

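More generally:

$$H(S_{i+1}) = \big(b \, (H(S_i) - s_i \, b^{L-1}) + s_{i+L}\big) \bmod m$$

Here $S_i$ denotes the (i+1)-th length-L substring of the text, i.e. the one starting at index i (i ranges from 0 to n-L), $s_i$ is the character at index i of the text, $b$ is the base, and $m$ is the modulus. The recurrence is applied for i < n-L, since the last window has no successor.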
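A single step of the general recurrence might look like the sketch below. It is a variation rather than the code used in this post: it assumes `long` arithmetic and `Math.floorMod` to keep intermediate values non-negative, and the names `RollingStep`, `roll`, and `highPower` are hypothetical.

```java
final class RollingStep {
    // Polynomial hash of text[start, start + len), computed from scratch.
    static long hash(String text, int start, int len, long base, long mod) {
        long h = 0;
        for (int j = 0; j < len; j++) {
            h = (h * base + text.charAt(start + j)) % mod;
        }
        return h;
    }

    // base^(len-1) mod mod: the weight of the window's leading character.
    static long highPower(int len, long base, long mod) {
        long p = 1 % mod;
        for (int j = 0; j < len - 1; j++) {
            p = (p * base) % mod;
        }
        return p;
    }

    // One application of H(S_{i+1}) = (b * (H(S_i) - s_i * b^(L-1)) + s_{i+L}) mod m.
    static long roll(long prev, char leaving, char entering,
                     long high, long base, long mod) {
        // floorMod keeps the result in [0, mod) even when the difference is negative.
        long withoutLeading = Math.floorMod(prev - leaving * high, mod);
        return (withoutLeading * base + entering) % mod;
    }
}
```

For example, starting from the hash of "algor" in "algorithms", calling `roll` with leaving character 'a' and entering character 'i' gives the same value as hashing "lgori" from scratch. Using `long` leaves room for a much larger modulus than 101, which would reduce spurious hash matches.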

The code is as follows:

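class Solution {
    public static int base=256;    // size of the character alphabet used as the hash base
    public static int module=101;  // modulus that keeps hash values small

    // Character-by-character comparison, used to confirm a hash match.
    public static boolean match(String str1,String str2)
    {
        assert str1.length()==str2.length();
        for(int i=0;i<str1.length();i++)
        {
            if(str1.charAt(i)!=str2.charAt(i))
                return false;
        }
        return true;
    }
    
    // Returns the first index of needle in haystack, or -1 if it does not occur.
    public int strStr(String haystack, String needle) {
        if(needle.isEmpty())
            return 0;
        int m=needle.length(),n=haystack.length(),h=1;
        if(n<m)
            return -1;
        // h = base^(m-1) % module, the weight of the leading character of a window
        for(int i=0;i<m-1;i++)
            h=(h*base)%module;
        // p = hash of the pattern, t = hash of the first window of the text
        int p=0,t=0;
        for(int i=0;i<m;i++)
        {
            p=(p*base+needle.charAt(i))%module;
            t=(t*base+haystack.charAt(i))%module;
        }
        for(int i=0;i<n-m+1;i++)
        {
            // Equal hashes may still be a collision, so verify the characters.
            if(t==p)
            {
                if(match(needle,haystack.substring(i,i+m)))
                    return i;
            }
            if(i<n-m)
            {
                // Rolling update: drop haystack[i], shift by base, append haystack[i+m].
                // With module=101 the intermediate values stay within int range.
                t=( base * (t-haystack.charAt(i) * h) + haystack.charAt(i+m) )%module;
                // Java's % can return a negative value, so bring t back into [0, module).
                if(t<0)
                    t=t+module;
            }
        }
        return -1;
    }
}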
