Detailed explanation of Rabin-Karp (string matching)

String matching means locating one string inside another: for example, finding the string T (length m) inside the string S (length n).

Classic idea: traverse the string S and, taking each position as a starting point, try to match T once. The complexity is O(nm),
because the same characters of S are compared many times.
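
As a baseline, the classic idea can be sketched as follows. The function name count_naive is only illustrative and is not part of the original post.

```cpp
#include <string>

// Counts occurrences of t in s by trying every starting position.
// Worst case O(nm): characters of s are re-examined for each start i.
int count_naive(const std::string& s, const std::string& t)
{
    int n = s.length(), m = t.length(), res = 0;
    for (int i = 0; i + m <= n; i++)
    {
        int j = 0;
        // Compare t against s[i .. i+m-1] character by character.
        while (j < m && s[i + j] == t[j]) j++;
        if (j == m) res++;
    }
    return res;
}
```

For example, count_naive("19735859734", "973") counts the matches starting at positions 1 and 7.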

So what should we do to avoid judging the same character multiple times?
Let's first consider the case where the string S and the string T consist only of digits.
For example, S is "19735859734" and T is "973".
First comparison: 197 and 973.
Second comparison: (197 - 1 * 100) * 10 + 3 = 973 and 973, which is a match.
Third comparison: (973 - 9 * 100) * 10 + 5 = 735 and 973.
Fourth comparison: ...
Each window value is obtained from the previous one in O(1): drop the leading digit, shift left by one digit, and append the next digit.
So we can process S from start to finish in O(n + m).
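
The digit example can be turned into a small sketch. It assumes both strings contain only the characters '0' to '9' and that T has at most 18 digits, so every window value fits in a long long; the name count_digits is hypothetical.

```cpp
#include <string>

// Counts occurrences of the digit string t in the digit string s by
// comparing numeric window values instead of individual characters.
// Sketch only: returns 0 if t is empty or longer than 18 digits.
int count_digits(const std::string& s, const std::string& t)
{
    int n = s.length(), m = t.length(), res = 0;
    if (m == 0 || m > n || m > 18) return 0;

    long long high = 1;                    // 10^(m-1), weight of the leading digit
    for (int i = 1; i < m; i++) high *= 10;

    long long target = 0, window = 0;
    for (int i = 0; i < m; i++)
    {
        target = target * 10 + (t[i] - '0');
        window = window * 10 + (s[i] - '0');
    }

    for (int i = 0; i + m <= n; i++)
    {
        if (window == target) res++;
        // Drop the leading digit, shift left, append the next digit.
        if (i + m < n)
            window = (window - (s[i] - '0') * high) * 10 + (s[i + m] - '0');
    }
    return res;
}
```

Because every window has exactly m digits, two windows with equal values are equal as strings, leading zeros included.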

Why is this feasible? Because the number is written in decimal, and each digit identifies exactly one character.
To generalize this idea, we only need to replace base 10 with base B (B at least the number of distinct character types, or larger than the largest character code if the codes are used directly), so that each base-B digit still identifies exactly one character.
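
Written out, the generalization looks like this. The notation H, c_i and s_k is introduced here for illustration: c_i is the numeric code of the i-th character, and s_k is the character of S at position k.

```latex
\begin{align*}
H(c_1 c_2 \dots c_m) &= c_1 B^{m-1} + c_2 B^{m-2} + \dots + c_m B^{0} \\
H(s_{k+1} \dots s_{k+m}) &= \bigl( H(s_k \dots s_{k+m-1}) - s_k B^{m-1} \bigr)\, B + s_{k+m}
\end{align*}
```

With B = 10 and m = 3 the second line is exactly the step (197 - 1 * 100) * 10 + 3 = 973 from the digit example.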

If there are many types of characters, B can be made larger. But then B to the power m quickly overflows, so we take every value modulo h. This is valid because we only test whether two values are equal and never compare their sizes. (Note that B and h should be coprime.)
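
A minimal sketch of the same update with an explicit modulus is shown here. The constants BASE = 131 and MOD = 1000000007 are assumptions chosen for illustration (any coprime pair with BASE larger than the character codes would do), and the helper names hash_of and roll are hypothetical.

```cpp
#include <string>

typedef unsigned long long ull;

const ull BASE = 131;          // assumed base, larger than any ASCII code
const ull MOD  = 1000000007;   // assumed prime modulus h, coprime with BASE

// Hash of s[pos .. pos+len-1], reduced modulo MOD at every step.
ull hash_of(const std::string& s, int pos, int len)
{
    ull h = 0;
    for (int i = 0; i < len; i++)
        h = (h * BASE + (unsigned char)s[pos + i]) % MOD;
    return h;
}

// Given the hash h of a window whose first character is 'out', returns the
// hash of the window shifted one position right, whose last character is 'in'.
// 'top' must equal BASE^(len-1) % MOD.
ull roll(ull h, unsigned char out, unsigned char in, ull top)
{
    // Add MOD before subtracting so the intermediate value never goes negative.
    h = (h + MOD - out * top % MOD) % MOD;
    return (h * BASE + in) % MOD;
}
```

Every intermediate product stays well below 2^64, so no overflow occurs and the result of roll always equals hash_of applied to the shifted window.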

Note: different strings may have the same hash value. When the hashes match, we should therefore also perform a naive check (a character-by-character comparison), which may degrade the complexity to O(nm) in the worst case. In competition problems, however, we usually just compare the hash values.
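
For cases where a wrong answer from a collision is not acceptable, the naive check can be added as in the sketch below. The base 131 and the reliance on unsigned 64-bit wraparound (an implicit modulus of 2^64) are assumptions of this sketch, and count_verified is a hypothetical name.

```cpp
#include <string>

typedef unsigned long long ull;

// Counts occurrences of t in s. A hash match is confirmed by a direct
// comparison, so a collision costs time but never changes the count.
int count_verified(const std::string& s, const std::string& t)
{
    const ull BASE = 131;   // assumed odd base; arithmetic wraps modulo 2^64
    int n = s.length(), m = t.length(), res = 0;
    if (m > n) return 0;

    ull pw = 1;             // BASE^m, weight of the character leaving the window
    for (int i = 0; i < m; i++) pw *= BASE;

    ull th = 0, sh = 0;     // hash of t and of the current window of s
    for (int i = 0; i < m; i++)
    {
        th = th * BASE + (unsigned char)t[i];
        sh = sh * BASE + (unsigned char)s[i];
    }

    for (int i = 0; i + m <= n; i++)
    {
        // Only compare characters when the hashes agree.
        if (sh == th && s.compare(i, m, t) == 0) res++;
        if (i + m < n)
            sh = sh * BASE + (unsigned char)s[i + m] - (unsigned char)s[i] * pw;
    }
    return res;
}
```

The character comparison runs only on hash matches, so the extra cost is paid at true occurrences and at the rare collisions.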

The implementation code is as follows (counting how many times string a occurs in string b):

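#include <string>
using namespace std;

typedef unsigned long long ull;
// An odd base larger than the ASCII range, so that it is coprime with the
// implicit modulus 2^64 of unsigned 64-bit arithmetic.
const ull B = 131;

// Returns how many times a occurs in b (overlapping occurrences included).
// Only hash values are compared; there is no character-by-character check.
int compare(const string& a, const string& b)
{
    int res = 0;
    int al = a.length(), bl = b.length();
    if(al > bl) return 0;

    // t = B^al, the weight of the character that leaves the window.
    ull t = 1;
    for(int i = 0; i < al; i++) t = t * B;

    // Hash of a and of the first al characters of b.
    ull ah = 0, bh = 0;
    for(int i = 0; i < al; i++)
    {
        ah = ah * B + (unsigned char)a[i];
        bh = bh * B + (unsigned char)b[i];
    }

    // Slide the window over b, updating bh in O(1) per step.
    for(int i = 0; i + al <= bl; i++)
    {
        if(ah == bh) res++;
        if(i + al < bl) bh = bh * B + (unsigned char)b[i+al] - (unsigned char)b[i] * t;
    }
    return res;
}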

Origin blog.csdn.net/m0_52212261/article/details/121467570