Oulipo HDU 1686 (Hashing or KMP)

Copyright notice: when reposting, please cite the source https://blog.csdn.net/weixin_41190227/article/details/86503452

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book: 

Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais… 

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces. 

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap. 
 

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format: 

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W). 
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000. 

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T. 
 

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0

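Problem summary: the input has n test cases. Each gives two strings s1 and s2; count how many times s1 occurs as a substring of s2 (occurrences may overlap).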
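As a reference point before the faster solutions, here is a minimal brute-force sketch of the counting rule, assuming small inputs only. The name count_naive is a hypothetical helper, not part of the original solutions. It restarts the search one character after each match so that overlapping occurrences are counted; in the worst case it does on the order of |W| * |T| character comparisons, which is likely too slow for the limits above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts occurrences of `word` in `text`, overlaps included.
// Brute-force reference: worst case O(|W| * |T|), so only for small inputs.
int count_naive(const std::string& word, const std::string& text) {
    assert(!word.empty());  // the problem guarantees |W| >= 1
    int count = 0;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        ++count;
        // Resume one character after the match start so overlaps are found.
        pos = text.find(word, pos + 1);
    }
    return count;
}
```

Calling count_naive("AZA", "AZAZAZA") finds matches starting at positions 0, 2 and 4, which agrees with the second sample.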

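I solved this problem once before with KMP. It was the first problem I wrote when I had just learned KMP, and it is a pure template problem; looking at it now, it is almost trivially easy...

I have been studying hashing recently, so I wrote a hash version as well. It is easy to come up with and painless to write, and it cleared up a small question I had kept running into while learning hashing.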

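First, one key point.

For example, s1 : 12303, s2 : 212303

With hashing, we first compute the hash of s1 and the hash of the first len1 characters of s2, then compare them (here, the hashes of 12303 and 21230). Next we compare 12303 from the first string with the 12303 inside the second string (at this step the window starts at the second character of s2). So the question is how to obtain the hash of that 12303 window of s2.

We fix a base seed (the code below uses 133). The hash treats the string as a number written in base seed, and p = seed^len1 is just the place value of the character that leaves the window. Each time the window moves one position to the right, the new hash is _Hash2 = _Hash2 * seed + s2[i + len1] - p * s2[i] ;

Put simply: starting from the current window, drop its first character, append one character at the end, and that gives the hash of the new window.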
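The update rule can be checked on the 12303 / 212303 example with a small sketch. The names kSeed, direct_hash and slide are hypothetical helpers introduced for illustration; the sketch assumes unsigned 64-bit arithmetic, so every operation is taken modulo 2^64.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef unsigned long long ull;

const ull kSeed = 133;  // assumed base; any fixed base works the same way

// Hash of s[start, start + len), read as a base-kSeed number modulo 2^64.
ull direct_hash(const std::string& s, std::size_t start, std::size_t len) {
    assert(start + len <= s.size());
    ull h = 0;
    for (std::size_t i = 0; i < len; i++) {
        h = h * kSeed + (unsigned char)s[start + i];
    }
    return h;
}

// Given h = hash of s[i, i + len) and p = kSeed^len, returns the hash of
// s[i + 1, i + 1 + len): shift every digit up one place, append the new
// character, and cancel the character that fell off the front.
ull slide(ull h, ull p, const std::string& s, std::size_t i, std::size_t len) {
    assert(i + len < s.size());
    return h * kSeed + (unsigned char)s[i + len] - p * (unsigned char)s[i];
}
```

Starting from the hash of 21230, one call to slide with p = kSeed^5 yields the same value as hashing 12303 from scratch, because multiplying by kSeed raises the leading 2 to weight kSeed^5 and the subtraction removes exactly that term.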

KMP solution:

/*
@Author: Top_Spirit
@Language: C++
*/
#include <bits/stdc++.h>
using namespace std ;
typedef unsigned long long ull ;
typedef long long ll ;
const int Maxn = 1e4 + 10 ;
const int INF = 0x3f3f3f3f ;
const double PI = acos(-1.0) ;
const int seed = 133 ;

int n ;
string s1, s2 ;
int len1, len2 ;
int Next[Maxn] ;

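// Build the failure table: for j >= 1, Next[j] is the length of the longest
// proper prefix of s1[0..j) that is also its suffix; Next[0] = -1 is a sentinel.
void Get_Next() {
    Next[0] = -1 ;
    int k = -1, j = 0 ;
    while (j < len1){
        if (k == -1 || s1[j] == s1[k]) Next[++j] = ++k ;
        else k = Next[k] ;
    }
}

// Count occurrences of s1 in s2, overlaps included.
int KMP (){
    int i = 0, j = 0 ;
    int ans = 0 ;
    while (i < len2){
        if (j == -1 || s1[j] == s2[i]) {
            i++ ;
            j++ ;
        }
        else j = Next[j] ;
        if (j == len1) {
            ans++ ;
            j = Next[j] ;  // fall back explicitly instead of comparing s1[len1]
        }
    }
    return ans ;
}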

int main (){
    ios_base::sync_with_stdio(false) ;
    cin.tie(0) ;
    cout.tie(0) ;
    int n ;
    cin >> n ;
    while (n--){
        cin >> s1 >> s2 ;
        len1 = s1.size() ;
        len2 = s2.size() ;
        Get_Next() ;
        int ans = KMP() ;
        cout << ans << '\n' ;
    }
    return 0 ;
}
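To make the failure table more concrete, here is a small sketch that builds it for an arbitrary pattern and returns it as a vector. build_next is a hypothetical helper; it follows the same convention as Get_Next above (first entry -1) but avoids the global arrays.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the failure table in the convention used above: next[0] = -1 and,
// for j >= 1, next[j] is the length of the longest proper prefix of
// pattern[0..j) that is also a suffix of it.
std::vector<int> build_next(const std::string& pattern) {
    assert(!pattern.empty());
    const int m = static_cast<int>(pattern.size());
    std::vector<int> next(m + 1);
    next[0] = -1;
    int k = -1, j = 0;
    while (j < m) {
        if (k == -1 || pattern[j] == pattern[k]) next[++j] = ++k;
        else k = next[k];
    }
    return next;
}
```

For the sample word AZA the table is -1, 0, 0, 1. The last entry is what makes overlapping matches work: after a full match the search falls back to state 1, keeping the trailing A as the start of the next candidate match.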

Hash solution:

/*
@Author: Top_Spirit
@Language: C++
*/
#include <bits/stdc++.h>
using namespace std ;
typedef unsigned long long ull ;
typedef long long ll ;
const int Maxn = 1e4 + 10 ;
const int INF = 0x3f3f3f3f ;
const double PI = acos(-1.0) ;
const int seed = 133 ;

int n ;
string s1, s2 ;
int len1, len2 ;

int main (){
    ios_base::sync_with_stdio(false) ;
    cin.tie(0) ;
    cout.tie(0) ;
    cin >> n ;
    while (n--){
        cin >> s1 >> s2 ;
        len1 = s1.size() ;
        len2 = s2.size() ;
        if (len1 > len2 ) {
            cout << 0 << endl ;
            continue ;
        }
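        // Use unsigned 64-bit arithmetic so overflow wraps modulo 2^64
        // (signed int overflow would be undefined behavior).
        ull p = 1 ;  // p = seed^len1, place value of the character leaving the window
        for (int i = 1; i <= len1; i++) p *= seed ;
        ull _Hash1 = 0, _Hash2 = 0 ;
        for (int i = 0; i < len1; i++){
            _Hash1 = _Hash1 * seed + s1[i] ;
            _Hash2 = _Hash2 * seed + s2[i] ;
        }
        int ans = 0 ;
        for (int i = 0; i + len1 <= len2; i++){
            if (_Hash1 == _Hash2) ans++;
            // Slide the window one character to the right. On the last pass
            // s2[len2] reads the terminating '\0' and the result is unused.
            _Hash2 = _Hash2 * seed + s2[i + len1] - p * s2[i] ;
        }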
        cout << ans << endl ;
    }
    return 0 ;
}
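Comparing a single hash value can, in principle, count a window that merely collides with the word. One common guard, which is not part of the solution above, is to keep a second independent hash and count a window only when both agree. The sketch below assumes a second base of 131 and the modulus 1000000007; those constants and the names count_double_hash and code are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef unsigned long long ull;

// Maps a character to a non-negative value regardless of char signedness.
static ull code(char c) { return (ull)(unsigned char)c; }

// Counts overlapping occurrences of `word` in `text` using two independent
// rolling hashes; a window is counted only if both hashes match.
int count_double_hash(const std::string& word, const std::string& text) {
    assert(!word.empty());
    const std::size_t m = word.size(), n = text.size();
    if (m > n) return 0;

    const ull kSeedA = 133;          // first hash wraps modulo 2^64
    const ull kSeedB = 131;          // assumed base for the second hash
    const ull kMod = 1000000007ULL;  // assumed prime modulus for the second hash

    ull powA = 1, powB = 1;          // seed^m for each hash
    ull wordA = 0, wordB = 0, winA = 0, winB = 0;
    for (std::size_t i = 0; i < m; i++) {
        powA *= kSeedA;
        powB = powB * kSeedB % kMod;
        wordA = wordA * kSeedA + code(word[i]);
        wordB = (wordB * kSeedB + code(word[i])) % kMod;
        winA = winA * kSeedA + code(text[i]);
        winB = (winB * kSeedB + code(text[i])) % kMod;
    }

    int count = 0;
    for (std::size_t i = 0; ; i++) {
        if (winA == wordA && winB == wordB) ++count;
        if (i + m >= n) break;       // the window already ends at the last character
        winA = winA * kSeedA + code(text[i + m]) - powA * code(text[i]);
        // Add kMod before subtracting so the intermediate value stays non-negative.
        winB = (winB * kSeedB + code(text[i + m]) + kMod
                - powB * code(text[i]) % kMod) % kMod;
    }
    return count;
}
```

The running time stays linear in |T|, and the loop stops before sliding past the final window, so it never reads beyond the last character of the text.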

Amid falling petals, one stands alone

In the light rain, swallows fly in pairs
