Oulipo HDU 1686 (Hashing or KMP)

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
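
As a baseline for the definition above, here is a minimal brute-force sketch that counts overlapping occurrences with `std::string::find`. The function name `countNaive` is made up for illustration. It takes O(|W|·|T|) time in the worst case, so it is only a reference for checking faster solutions on small inputs, not something that fits the limits of this problem.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts occurrences of `word` in `text`, allowing overlaps.
// Reference implementation only: O(|W| * |T|) in the worst case.
int countNaive(const std::string& word, const std::string& text) {
    assert(!word.empty());  // the problem guarantees |W| >= 1
    int count = 0;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        ++count;
        // Resume one character after the start of the match, not after
        // its end, so that overlapping occurrences are still found.
        pos = text.find(word, pos + 1);
    }
    return count;
}
```

On the second sample, `countNaive("AZA", "AZAZAZA")` finds matches starting at positions 0, 2 and 4, which is where the answer 3 comes from.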

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0

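Problem summary: there are n test cases. Each one gives two strings s1 and s2, and we have to count how many times s1 occurs as a substring of s2.

I solved this problem once before with KMP. It was the first problem I wrote when I was just learning KMP, and it is a pure template problem; looking at it now, it is trivially easy...

I have been studying hashing lately, so I also wrote a hashing solution. It is easy to come up with and not hard to write, and it cleared up a small question that had bothered me the whole time I was learning hashing.

First, one key point.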

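For example, s1: 12303, s2: 212303

With hashing, we first compute the hash of the first len1 characters of each string and compare them (that is, the hashes of 12303 and 21230). Next we compare s1's 12303 with the 12303 inside s2, since this step matches from the second character of s2. So the question is how to obtain the hash of that 12303 window in s2.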

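We fix a base seed (the code below uses 133) and let p = seed^len1. The hash treats the window as a number written in base seed, so p is simply the place value of the character that drops out once the window has shifted. The hash of the window moved one position to the right is therefore _Hash2 = _Hash2 * seed + s2[i + len1] - p * s2[i] ;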
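
The update rule can be checked by writing the hash out as a polynomial. The notation below is introduced only for this check and is not the author's: b stands for seed, m for len1, s for s2, and H_i for the hash of the window starting at index i, with all arithmetic taken modulo the word size.

```latex
\begin{aligned}
H_i       &= \sum_{k=0}^{m-1} s[i+k]\, b^{\,m-1-k} \\
b \cdot H_i &= s[i]\, b^{m} + \sum_{k=1}^{m-1} s[i+k]\, b^{\,m-k} \\
H_{i+1}   &= \sum_{k=1}^{m-1} s[i+k]\, b^{\,m-k} + s[i+m] \\
          &= b \cdot H_i + s[i+m] - b^{m}\, s[i]
\end{aligned}
```

The last line is the same expression as the code, with p = b^m.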

Put simply: take the current window, drop its first character, append one character at the end, and compute the hash of the result.
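
To make the 12303 / 212303 example concrete, the following sketch computes every window hash by sliding and lets you compare it against a hash computed from scratch. The helper names `directHash` and `slidingHashes` are hypothetical, and unsigned 64-bit arithmetic is an assumption chosen here so that overflow wraps around instead of being undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using ull = unsigned long long;

// Polynomial hash of s[start .. start + len), computed from scratch.
ull directHash(const std::string& s, std::size_t start, std::size_t len, ull seed) {
    ull h = 0;
    for (std::size_t k = 0; k < len; ++k)
        h = h * seed + static_cast<unsigned char>(s[start + k]);
    return h;
}

// Hashes of every length-`len` window of s, in order of starting index.
// Each hash after the first is derived from the previous one in O(1).
std::vector<ull> slidingHashes(const std::string& s, std::size_t len, ull seed) {
    assert(len >= 1 && len <= s.size());
    ull p = 1;
    for (std::size_t k = 0; k < len; ++k) p *= seed;  // p = seed^len
    ull h = directHash(s, 0, len, seed);
    std::vector<ull> out{h};
    for (std::size_t i = 0; i + len < s.size(); ++i) {
        // Shift left by one digit, append the new character,
        // then remove the contribution of the character that left.
        h = h * seed + static_cast<unsigned char>(s[i + len])
            - p * static_cast<unsigned char>(s[i]);
        out.push_back(h);
    }
    return out;
}
```

For `slidingHashes("212303", 5, 133)` the result has two entries: the hash of 21230 and, after one slide, the hash of 12303, which equals `directHash("12303", 0, 5, 133)`.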

KMP solution:

/*
@Author: Top_Spirit
@Language: C++
*/
#include <bits/stdc++.h>
using namespace std ;
typedef unsigned long long ull ;
typedef long long ll ;
const int Maxn = 1e4 + 10 ;
const int INF = 0x3f3f3f3f ;
const double PI = acos(-1.0) ;
const int seed = 133 ;

int n ;
string s1, s2 ;
int len1, len2 ;
int Next[Maxn] ;

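// Build the failure table for s1: Next[j] is the length of the longest
// proper prefix of s1[0..j-1] that is also a suffix of it (Next[0] = -1).
void Get_Next() {
    Next[0] = -1 ;
    int k = -1, j = 0 ;
    while (j < len1){
        if (k == -1 || s1[j] == s1[k]) Next[++j] = ++k ;
        else k = Next[k] ;
    }
}

// Count the (possibly overlapping) occurrences of s1 in s2.
int KMP (){
    int i = 0, j = 0 ;
    int ans = 0 ;
    while (i < len2){
        if (j == -1 || s1[j] == s2[i]) {
            i++ ;
            j++ ;
        }
        else j = Next[j] ;
        if (j == len1) {
            ans++ ;
            // Fall back explicitly so overlapping matches are found and
            // s1[len1] is never compared against the text.
            j = Next[j] ;
        }
    }
    return ans ;
}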

int main (){
    ios_base::sync_with_stdio(false) ;
    cin.tie(0) ;
    cout.tie(0) ;
    cin >> n ;                  // number of test cases (the global n)
    while (n--){
        cin >> s1 >> s2 ;
        len1 = s1.size() ;
        len2 = s2.size() ;
        Get_Next() ;
        cout << KMP() << '\n' ;
    }
    return 0 ;
}
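
To see what the failure table looks like, here is a small self-contained sketch of the same idea written as functions that take their inputs as parameters. The names `buildNext` and `kmpCount` are made up for illustration; the table layout (Next[0] = -1, length |W| + 1) follows the code above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Failure table in the same convention as Get_Next: next[0] = -1 and
// next[j] is the longest proper border length of the first j characters.
std::vector<int> buildNext(const std::string& w) {
    const int m = static_cast<int>(w.size());
    std::vector<int> next(m + 1, 0);
    next[0] = -1;
    int k = -1, j = 0;
    while (j < m) {
        if (k == -1 || w[j] == w[k]) next[++j] = ++k;
        else k = next[k];
    }
    return next;
}

// Counts overlapping occurrences of w in t in O(|w| + |t|) time.
int kmpCount(const std::string& w, const std::string& t) {
    assert(!w.empty());
    const std::vector<int> next = buildNext(w);
    const int m = static_cast<int>(w.size());
    const int n = static_cast<int>(t.size());
    int i = 0, j = 0, count = 0;
    while (i < n) {
        if (j == -1 || w[j] == t[i]) { ++i; ++j; }
        else j = next[j];
        if (j == m) {
            ++count;
            j = next[j];  // keep the longest border to allow overlaps
        }
    }
    return count;
}
```

For the word AZA, `buildNext` returns {-1, 0, 0, 1}. The final 1 says that after a full match the trailing A can be reused as the first character of the next match, which is why AZAZAZA yields 3.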

Hashing solution:

/*
@Author: Top_Spirit
@Language: C++
*/
#include <bits/stdc++.h>
using namespace std ;
typedef unsigned long long ull ;
typedef long long ll ;
const int Maxn = 1e4 + 10 ;
const int INF = 0x3f3f3f3f ;
const double PI = acos(-1.0) ;
const int seed = 133 ;

int n ;
string s1, s2 ;
int len1, len2 ;

int main (){
    ios_base::sync_with_stdio(false) ;
    cin.tie(0) ;
    cout.tie(0) ;
    cin >> n ;
    while (n--){
        cin >> s1 >> s2 ;
        len1 = s1.size() ;
        len2 = s2.size() ;
        if (len1 > len2 ) {
            cout << 0 << endl ;
            continue ;
        }
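        // p = seed^len1: the weight of the character that leaves the window.
        // Unsigned 64-bit arithmetic makes overflow wrap (mod 2^64);
        // with signed int it would be undefined behaviour.
        ull p = 1 ;
        for (int i = 1; i <= len1; i++) p *= seed ;
        ull _Hash1 = 0, _Hash2 = 0 ;
        for (int i = 0; i < len1; i++){
            _Hash1 = _Hash1 * seed + s1[i] ;
            _Hash2 = _Hash2 * seed + s2[i] ;
        }
        int ans = 0 ;
        for (int i = 0; i + len1 <= len2; i++){
            // Here _Hash2 is the hash of s2[i .. i + len1 - 1].
            if (_Hash1 == _Hash2) ans++ ;
            // Slide the window by one. On the last pass s2[len2] reads the
            // terminating '\0' of std::string and the result is not used.
            _Hash2 = _Hash2 * seed + s2[i + len1] - p * s2[i] ;
        }
        cout << ans << '\n' ;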
    }
    return 0 ;
}
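
The solution above treats equal hashes as a match, so in principle a collision could inflate the count. As a cross-check, here is a hedged sketch that confirms each candidate with a direct comparison. The function name `hashCountVerified` is hypothetical. Note the trade-off: each confirmation costs O(|W|), so on inputs with very many real matches (such as a long run of one letter) the running time degrades to O(|W|·|T|). That makes it a debugging aid rather than a replacement for the submission.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using ull = unsigned long long;

// Rolling-hash count of overlapping occurrences of `word` in `text`,
// with every hash hit confirmed by comparing the actual characters.
int hashCountVerified(const std::string& word, const std::string& text,
                      ull seed = 133) {
    assert(!word.empty());
    const std::size_t m = word.size(), n = text.size();
    if (m > n) return 0;
    ull p = 1, hw = 0, ht = 0;
    for (std::size_t k = 0; k < m; ++k) {
        p *= seed;  // ends as seed^m
        hw = hw * seed + static_cast<unsigned char>(word[k]);
        ht = ht * seed + static_cast<unsigned char>(text[k]);
    }
    int count = 0;
    for (std::size_t i = 0; ; ++i) {
        // ht is the hash of text[i .. i + m); confirm before counting.
        if (hw == ht && text.compare(i, m, word) == 0) ++count;
        if (i + m >= n) break;  // no further window to slide to
        ht = ht * seed + static_cast<unsigned char>(text[i + m])
             - p * static_cast<unsigned char>(text[i]);
    }
    return count;
}
```

Running both versions on random small inputs and comparing their answers is a simple way to gain confidence in the unverified one.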
