Hash Practice

A - Crazy Search POJ - 1200
Many people like to solve hard puzzles, some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such a number could be the number of different substrings of a given size that exist in the text. As you will soon discover, you really need the help of a computer and a good algorithm to solve such a puzzle.
Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text.

As an example, consider N=3, NC=4 and the text “daababac”. The different substrings of size 3 that can be found in this text are: “daa”; “aab”; “aba”; “bab”; “bac”. Therefore, the answer should be 5.

Input
The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 million.

Output
The program should output just an integer corresponding to the number of different substrings of size N found in the given text.

Sample Input
3 4
daababac

Sample Output
5

Hint
Huge input; scanf is recommended.

Analysis:
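The task is to count the distinct substrings of length N in a text that uses at most NC different characters. A hash table is the natural way to detect duplicates. Because the problem guarantees that the number of possible substrings (NC^N) does not exceed 16 million, we can map each character to a digit in 0..NC-1 and read every substring as an N-digit number in base NC. This has a nice property: distinct substrings always get distinct values, so there are no collisions and no collision handling is needed. A boolean array indexed by the value is enough.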

Accepted code:

#include<cstdio>
#include<cstring>
#define N 16000003

bool hash[N];
char w[N];
int id[500];

int main()
{
    int n,nc,i,j;
    while(~scanf("%d%d",&n,&nc))
    {
        memset(hash,false,sizeof(hash));
        memset(id,-1,sizeof(id));
        scanf("%s",w);
        int len=strlen(w);
        int cnt=0;
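        // Assign each distinct character a digit 0..nc-1 in order of first appearance.
        for(i=0;i<len&&cnt<nc;i++)
        {
            if(id[w[i]]!=-1) continue;
            id[w[i]]=cnt++;
        }

        int ans=0;

        // Read each window of n characters as an n-digit number in base nc.
        for(i=0;i<len-n+1;i++)
        {
            int s=0;
            for(j=i;j<i+n;j++)
                s=s*nc+id[w[j]];
            if(hash[s]) continue;   // this substring was already counted
            ans++;
            hash[s]=true;
        }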
        printf("%d\n",ans);
    }
}
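
The accepted code above rebuilds every window's value from scratch, which costs N steps per window. As a minimal sketch of an alternative (not the author's submission), the value can be updated in constant time per step by dropping the leading digit and appending the new one. The helper name countDistinct is hypothetical, and the sketch assumes the text really uses at most nc different characters and that nc^n is small enough to allocate a flag per value.

```cpp
#include <string>
#include <vector>
#include <cstddef>

// Counts distinct length-n substrings of text with a rolling base-nc value.
// Returns -1 if the text uses more than nc different characters.
long long countDistinct(const std::string& text, int n, int nc)
{
    int len = (int)text.size();
    if (n <= 0 || nc <= 0 || len < n) return 0;

    // Map each character to a digit in order of first appearance.
    std::vector<int> id(256, -1);
    int next = 0;
    for (int i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)text[i];
        if (id[ch] == -1) id[ch] = next++;
    }
    if (next > nc) return -1;   // input violates the stated assumption

    // high = nc^(n-1) is the weight of the leading digit; total = nc^n.
    long long high = 1;
    for (int i = 1; i < n; i++) high *= nc;
    long long total = high * nc;

    std::vector<bool> seen(static_cast<std::size_t>(total), false);
    long long code = 0, answer = 0;
    for (int i = 0; i < len; i++) {
        if (i >= n) {
            unsigned char old = (unsigned char)text[i - n];
            code -= id[old] * high;              // drop the leading digit
        }
        unsigned char ch = (unsigned char)text[i];
        code = code * nc + id[ch];               // append the new digit
        if (i >= n - 1 && !seen[code]) {         // a full window ends at i
            seen[code] = true;
            answer++;
        }
    }
    return answer;
}
```

Calling countDistinct("daababac", 3, 4) reproduces the sample answer of 5.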

B - Babelfish POJ - 2503
You have just moved from Waterloo to a big city. The people here speak an incomprehensible dialect of a foreign language. Fortunately, you have a dictionary to help you understand them.

Input
Input consists of up to 100,000 dictionary entries, followed by a blank line, followed by a message of up to 100,000 words. Each dictionary entry is a line containing an English word, followed by a space and a foreign language word. No foreign word appears more than once in the dictionary. The message is a sequence of words in the foreign language, one word on each line. Each word in the input is a sequence of at most 10 lowercase letters.

Output
Output is the message translated to English, one word per line. Foreign words not in the dictionary should be translated as “eh”.

Sample Input
dog ogday
cat atcay
pig igpay
froot ootfray
loops oopslay

atcay
ittenkay
oopslay

Sample Output
cat
eh
loops

Hint
Huge input and output; scanf and printf are recommended.

Analysis:
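This problem amounts to building a dictionary. Using map is much slower than a hand-written hash table!
Hash function: built the same way as in the first problem, reading each word as a base-26 number. There are about 26^10 possible words (up to 10 lowercase letters), far more than the table has slots, so collisions are certain to occur.
Collision handling: separate chaining. Compute the bucket index first, then walk through all elements of list[index].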

Accepted code:

#include <iostream>
#include<cstdio>
#include<cstring>
#include<list>
using namespace std;

const int maxn=100005;

struct Pair
{
    char English[12],Foreign[12];
    Pair(){}
    Pair(char *str1,char *str2)
    {
        strcpy(English,str1);
        strcpy(Foreign,str2);
    }
    void set(char *str1,char *str2)
    {
        strcpy(English,str1);
        strcpy(Foreign,str2);
    }
    bool operator ==(const Pair& obj) const
    {
        // Two pairs are equal only when both words match (strcmp returns 0 on a match).
        return strcmp(English,obj.English)==0&&strcmp(Foreign,obj.Foreign)==0;
    }
};

class Hash
{
private:
    list<Pair> data[maxn];
public:
    void insert(Pair& obj)
    {
        int index=0;
        for(int i=0;i<strlen(obj.Foreign);i++)
        {
            index=index*26+(obj.Foreign[i]-'a');
            index%=maxn;
        }
        data[index].push_back(Pair(obj.English,obj.Foreign));
    }

    void search(char *str)
    {
        int index=0;
        for(int i=0;i<strlen(str);i++)
            index=(index*26+str[i]-'a')%maxn;

        bool flag=false;
        for(list<Pair>::iterator i=data[index].begin();i!=data[index].end();i++)
            if(strcmp(str,i->Foreign)==0)
            {
                printf("%s\n",i->English);
                flag=true;
            }
        if(!flag)
        {
            printf("eh\n");
        }
    }
};

int main()
{
    Hash hash;
    char English[12],Foreign[12];
    while(1)
    {
        char c;
        int index=0;
        while(1)
        {
            c=getchar();
            if(c==' '||c=='\n') break;
            English[index++]=c;
        }
        if(c=='\n') break;
        English[index]=0;
        index=0;
        while(1)
        {
            c=getchar();
            if(c=='\n') break;
            Foreign[index++]=c;
        }
        Foreign[index]=0;
        Pair obj(English,Foreign);
        hash.insert(obj);
    }
    char wd[12];
    while(~scanf("%s",wd))
    {
        hash.search(wd);
    }
    return 0;
}
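
For comparison, here is a minimal sketch of the same chaining idea with plain arrays instead of list<Pair>, a layout often used in contest code. It is not the accepted submission: every name below (TABLE, MAXE, headOf, nextOf, addEntry, translate) is hypothetical, and it assumes words of at most 11 lowercase letters and at most MAXE entries.

```cpp
#include <cstring>

const int TABLE = 100003;   // number of buckets (arbitrary choice for this sketch)
const int MAXE  = 100005;   // maximum number of dictionary entries

char englishWord[MAXE][12], foreignWord[MAXE][12];
int headOf[TABLE];          // headOf[b]: most recent entry in bucket b, or -1
int nextOf[MAXE];           // nextOf[e]: next entry in the same bucket, or -1
int entryCount = 0;

// Empties every bucket.
void initTable()
{
    memset(headOf, -1, sizeof(headOf));
    entryCount = 0;
}

// Base-26 value of the word, reduced modulo the table size at every step.
int bucketOf(const char* s)
{
    int h = 0;
    for (; *s; s++) h = (h * 26 + (*s - 'a')) % TABLE;
    return h;
}

// Stores the pair and links it at the front of its bucket's chain.
void addEntry(const char* en, const char* fo)
{
    int b = bucketOf(fo);
    strcpy(englishWord[entryCount], en);
    strcpy(foreignWord[entryCount], fo);
    nextOf[entryCount] = headOf[b];
    headOf[b] = entryCount++;
}

// Walks the chain of the word's bucket; returns "eh" when nothing matches.
const char* translate(const char* fo)
{
    for (int e = headOf[bucketOf(fo)]; e != -1; e = nextOf[e])
        if (strcmp(foreignWord[e], fo) == 0) return englishWord[e];
    return "eh";
}
```

After initTable(), each dictionary line becomes one addEntry call and each message word one translate call. The arrays live in static storage, which also avoids placing a large table on the stack.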

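C - Substring with Equal Numbers of 0s and 1s 51Nod - 1393
Given a 0-1 string, find the longest possible substring that contains equal numbers of 0s and 1s.

Input
A single string consisting only of the characters 0 and 1, with length at most 1000000.

Output
A single line with one integer: the length of the longest substring with equal numbers of 0s and 1s.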

Sample Input
1011

Sample Output
2

Analysis:
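Compute prefix sums and store them in an array: add 1 for each '1' and -1 for each '0'.
A substring has equal numbers of 0s and 1s exactly when the prefix sums at its two ends are equal, so the problem becomes finding the largest distance between two positions with the same prefix sum. That naturally suggests a hash table for the lookup. Since every prefix sum lies between -len and len, shifting it by a constant offset gives a direct index with no collisions.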
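
The reduction can be sketched as a standalone function. This is a minimal illustration rather than the submitted solution; the name longestBalanced is hypothetical, and the table is sized from the input length instead of a fixed maximum.

```cpp
#include <string>
#include <vector>
#include <algorithm>

// Returns the length of the longest substring with as many '0's as '1's.
int longestBalanced(const std::string& s)
{
    int n = (int)s.size();
    // first[v + n] = shortest prefix length whose running sum equals v, or -1.
    std::vector<int> first(2 * n + 1, -1);
    first[n] = 0;                    // the empty prefix has sum 0
    int sum = 0, best = 0;
    for (int i = 1; i <= n; i++) {
        sum += (s[i - 1] == '1') ? 1 : -1;
        int& slot = first[sum + n];
        if (slot == -1) slot = i;                 // first time this sum appears
        else best = std::max(best, i - slot);     // equal sums: balanced substring
    }
    return best;
}
```

For the sample input, longestBalanced("1011") gives 2.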

Accepted code:

#include<cstdio>
#include<cstring>

using namespace std;
const int maxn=1000005;
const int maxlen=2*maxn;
char str[maxn];
int sum[maxn],hash[maxlen];

int hash_function(int x)
{
    return x+maxn;
}

int main()
{
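    if(!fgets(str,maxn,stdin)) return 0;
    sum[0]=0;
    int len=strlen(str);
    // Strip the trailing line break if there is one, instead of assuming it is always present.
    while(len>0&&(str[len-1]=='\n'||str[len-1]=='\r')) len--;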
    for(int i=0;i<len;i++)
    {
        str[i]=='1'?(sum[i+1]=sum[i]+1):(sum[i+1]=sum[i]-1);
//        printf("%d ",sum[i+1]);
    }
//    printf("\n");
    memset(hash,-1,sizeof(hash));
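// This initialization matters: the empty prefix has sum 0 at position 0.
// Without it, balanced substrings that start at the first character are missed.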
    int index=hash_function(0);
    hash[index]=0;
    int res=0;
    for(int i=1;i<=len;i++)
    {
        index=hash_function(sum[i]);
        if(hash[index]!=-1)
            res=res>i-hash[index]?res:i-hash[index];
        else
            hash[index]=i;
    }
    printf("%d\n",res);
}

D - Gold Balanced Lineup POJ - 3274
Farmer John’s N cows (1 ≤ N ≤ 100,000) share many similarities. In fact, FJ has been able to narrow down the list of features shared by his cows to a list of only K different features (1 ≤ K ≤ 30). For example, cows exhibiting feature #1 might have spots, cows exhibiting feature #2 might prefer C to Pascal, and so on.

FJ has even devised a concise way to describe each cow in terms of its “feature ID”, a single K-bit integer whose binary representation tells us the set of features exhibited by the cow. As an example, suppose a cow has feature ID = 13. Since 13 written in binary is 1101, this means our cow exhibits features 1, 3, and 4 (reading right to left), but not feature 2. More generally, we find a 1 in the 2^(i-1) place if a cow exhibits feature i.

Always the sensitive fellow, FJ lined up cows 1..N in a long row and noticed that certain ranges of cows are somewhat “balanced” in terms of the features they exhibit. A contiguous range of cows i..j is balanced if each of the K possible features is exhibited by the same number of cows in the range. FJ is curious as to the size of the largest balanced range of cows. See if you can determine it.

Input
Line 1: Two space-separated integers, N and K.
Lines 2.. N+1: Line i+1 contains a single K-bit integer specifying the features present in cow i. The least-significant bit of this integer is 1 if the cow exhibits feature #1, and the most-significant bit is 1 if the cow exhibits feature # K.

Output
Line 1: A single integer giving the size of the largest contiguous balanced group of cows.

Sample Input
7 3
7
6
7
2
1
4
2

Sample Output
4

Hint
In the range from cow #3 to cow #6 (of size 4), each feature appears in exactly 2 cows in this range.

Analysis:
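This problem is like an upgraded version of problem C, but the underlying method is the same.
First compute the prefix sums sum[i][k], the number of cows among the first i that exhibit feature k (k runs from 0 to K-1, as in the code). A range of cows i+1..j is balanced when every feature gains the same count over it:
sum[j][k]-sum[i][k]=sum[j][0]-sum[i][0], 1<=k<K
So we have sum[j][k]-sum[j][0]=sum[i][k]-sum[i][0].
Now let c[i][k]=sum[i][k]-sum[i][0]. The problem becomes finding the largest distance j-i between two indices whose vectors c[i] and c[j] are equal.
I tried several hash functions here, and the cubic one seemed slightly faster.
Collisions are resolved by open addressing with quadratic probing.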
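
To make the transformation concrete, here is a minimal sketch that stores the first index of each difference vector in a std::map instead of a hand-written hash table. This swaps in an ordered map for clarity, so it is slower than the open-addressing table described above and is not the accepted submission. The name largestBalancedRange is hypothetical, and k is assumed to be at least 1.

```cpp
#include <map>
#include <vector>
#include <algorithm>

// cows holds the feature IDs in line order; k is the number of features.
int largestBalancedRange(const std::vector<int>& cows, int k)
{
    std::vector<int> sum(k, 0), diff(k, 0);
    std::map<std::vector<int>, int> firstSeen;
    firstSeen[diff] = 0;      // before any cow, every difference is 0
    int best = 0;
    for (int i = 1; i <= (int)cows.size(); i++) {
        int id = cows[i - 1];
        // Update the per-feature prefix counts with cow i.
        for (int j = 0; j < k; j++) sum[j] += (id >> j) & 1;
        // Normalize against feature 0 so balanced ranges give equal vectors.
        for (int j = 0; j < k; j++) diff[j] = sum[j] - sum[0];
        std::map<std::vector<int>, int>::iterator it = firstSeen.find(diff);
        if (it == firstSeen.end()) firstSeen[diff] = i;   // earliest occurrence
        else best = std::max(best, i - it->second);
    }
    return best;
}
```

On the sample (cows 7, 6, 7, 2, 1, 4, 2 with k = 3) the vector (0, 1, 1) appears after cow 2 and again after cow 6, which yields the answer 4.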

Accepted code:

#include<cstdio>
#include<cstring>
using namespace std;

const int maxn=100005;
const int maxcol=33;
int hash[maxn*20];
int sum[maxn][maxcol],c[maxn][maxcol];
int n,k;
const int maxlen=maxn*20;

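int hash_function(int cc[])
{
    // Accumulate in 64 bits: the cubic term can reach about 1e15 and would overflow an int.
    long long key=0;
    for(int i=0;i<k;i++)
    {
        long long v=cc[i];
        key+=v*(v+10)*(v+50);
        key%=maxlen;
    }
    // key lies in (-maxlen, maxlen); fold negative values into the table range.
    return (int)(key>0?key:-key);
}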

int main()
{
    scanf("%d%d",&n,&k);
    memset(hash,-1,sizeof(hash));
    memset(sum,0,sizeof(sum));
    hash[0]=0;
    int res=0;
    for(int i=1;i<=n;i++)
    {
        int num;
        scanf("%d",&num);

        for(int j=0;j<k;j++)
        {
            sum[i][j]=sum[i-1][j]+num%2;
            c[i][j]=sum[i][j]-sum[i][0];
            num>>=1;
        }
        int key=hash_function(c[i]),cp_key=key;
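        // Quadratic probing: try cp_key+1, cp_key+4, cp_key+9, ... until a match or an empty slot.
        for(int kk=1;hash[key]!=-1;kk++)
        {
            int index=hash[key],l;
            for(l=0;l<k;l++)
                if(c[i][l]!=c[index][l]) break;
            if(l==k)
            {
                // Same vector seen before; the stored index is its earliest occurrence,
                // so stop here and do not insert a duplicate entry.
                if(res<i-index) res=i-index;
                break;
            }
            key=(cp_key+kk*kk)%maxlen;
        }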
        if(hash[key]==-1)
            hash[key]=i;

    }
    printf("%d\n",res);
}
