Codeforces Contest 1120 problem C Compress String: string hashing + DP, and how to use unique on structs

Suppose you are given a string s of length n consisting of lowercase English letters. You need to compress it using the smallest possible number of coins.

To compress the string, you have to represent s as a concatenation of several non-empty strings: s=t1t2…tk. The i-th of these strings should be encoded with one of the two ways:

if |ti|=1, meaning that the current string consists of a single character, you can encode it paying a coins;
if ti is a substring of t1t2…ti−1, then you can encode it paying b coins.
A string x is a substring of a string y if x can be obtained from y by deletion of several (possibly, zero or all) characters from the beginning and several (possibly, zero or all) characters from the end.

So your task is to calculate the minimum possible number of coins you need to spend in order to compress the given string s.

Input
The first line contains three positive integers, separated by spaces: n, a and b (1≤n,a,b≤5000) — the length of the string, the cost to compress a one-character string and the cost to compress a string that appeared before.

The second line contains a single string s, consisting of n lowercase English letters.

Output
Output a single integer — the smallest possible number of coins you need to spend to compress s.

Examples
Input
3 3 1
aba
Output
7
Input
4 1 1
abcd
Output
4
Input
4 10 1
aaaa
Output
12
Note
In the first sample case, you can set t1= ‘a’, t2= ‘b’, t3= ‘a’ and pay 3+3+1=7 coins, since t3 is a substring of t1t2.

In the second sample, you just need to compress every character by itself.

In the third sample, you set t1=t2= ‘a’, t3= ‘aa’ and pay 10+1+1=12 coins, since t2 is a substring of t1 and t3 is a substring of t1t2.
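
To make the statement concrete, here is a minimal brute-force sketch of the underlying DP, intended only for cross-checking small inputs (it is cubic in the worst case and is not the approach used later in this post). The function name compressBrute is hypothetical. It relies on the fact that if any occurrence of a piece fits inside the already encoded prefix, then the first occurrence does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Reference DP (assumed helper, not from the original post):
// dp[i] = minimum cost to encode the prefix s[0..i).
long long compressBrute(const std::string& s, int a, int b) {
    const int n = static_cast<int>(s.size());
    std::vector<long long> dp(n + 1, 0);
    for (int i = 1; i <= n; ++i) {
        // Option 1: the last piece is a single character, cost a.
        dp[i] = dp[i - 1] + a;
        // Option 2: the last piece is s[j..i) and occurs inside s[0..j), cost b.
        for (int j = 1; j < i; ++j) {
            const std::string piece = s.substr(j, i - j);
            const std::size_t at = s.find(piece);  // first occurrence in s
            if (at != std::string::npos &&
                at + piece.size() <= static_cast<std::size_t>(j)) {
                dp[i] = std::min(dp[i], dp[j] + b);
            }
        }
    }
    return dp[n];
}
```

Calling compressBrute("aba", 3, 1), compressBrute("abcd", 1, 1) and compressBrute("aaaa", 10, 1) reproduces the three sample answers 7, 4 and 12.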

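Problem summary:

You are given a string and must compress it by splitting it into pieces. A piece of length 1 can always be encoded for a coins. A piece of any length that already occurs as a substring of the text before it can be encoded for b coins. Find the minimum total cost of encoding the whole string.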

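Solution:

There are many ways to solve this problem (LCS, suffix arrays, and so on). I used string hashing + DP, but it is not a great approach because it easily exceeds the time limit. Storing the substrings in a map exceeds the memory limit, so hashing is the only option. There are up to n(n+1)/2 substrings, roughly 1.25e7 for n=5000, so hashing modulo 1e9+7 is very likely to collide; instead I use unsigned long long and let it overflow naturally.

First, generate all substrings and sort them by hash value, then by position (the index where the substring ends). Since unique keeps the first element of each run of equal elements, the position that survives is the smallest one. The cal function returns the hash of a range (excluding sta); because this problem needs the hash to be built from back to front, sta and fin are swapped.

Then loop over the n positions. For each position i, find the farthest point we can transition from, that is, the longest substring ending at i that has already appeared earlier. There is one optimization here: brute-forcing the length is too slow. I keep a position pointer l, and each time I check whether the substring between l and i (excluding l) has appeared within the first l characters; if not, l++. Why is this correct? If the segment from l to i has not appeared before, then appending more characters to it later cannot make it appear either, so l never needs to move back.

Time complexity: O(n^2 log n)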
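
As a hedged illustration of the natural-overflow hashing described above, the sketch below builds the hash from the back of the string and extracts the hash of any range in O(1). The struct ReverseHash and its member names are hypothetical; the base 37 and the character code c - 'a' + 28 are taken from the solution code. Indices here are 0-based and half-open, unlike the 1-based arrays in the solution.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Polynomial hash over std::uint64_t; arithmetic wraps modulo 2^64 by definition.
struct ReverseHash {
    static constexpr std::uint64_t BASE = 37;  // same base as in the post
    std::vector<std::uint64_t> suf;            // suf[i] = hash of s[i..n)
    std::vector<std::uint64_t> pw;             // pw[k]  = BASE^k

    explicit ReverseHash(const std::string& s)
        : suf(s.size() + 1, 0), pw(s.size() + 1, 1) {
        const std::size_t n = s.size();
        for (std::size_t i = 1; i <= n; ++i) pw[i] = pw[i - 1] * BASE;
        // Build from the back: suf[i] = code(s[i]) + BASE * suf[i + 1].
        for (std::size_t i = n; i-- > 0;) suf[i] = suf[i + 1] * BASE + code(s[i]);
    }

    static std::uint64_t code(char c) {
        return static_cast<std::uint64_t>(c - 'a' + 28);
    }

    // Hash of s[l..r): subtract the shifted tail; unsigned wraparound keeps it exact mod 2^64.
    std::uint64_t get(std::size_t l, std::size_t r) const {
        return suf[l] - suf[r] * pw[r - l];
    }
};
```

Equal substrings always receive equal hashes, so for "abcabc" the ranges [0,3) and [3,6) compare equal. Different substrings can still collide in principle; overflow hashing only makes that unlikely, it does not rule it out.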

#include<stdio.h>
#include<algorithm>
#include<math.h>
using namespace std;
#define ll unsigned long long
const int N=5e3+5;
struct node
{
    ll h;
    int pos;
    node(){}
    node(ll h,int pos):h(h),pos(pos){}
    bool operator< (const node& a)const
    {
        if(h!=a.h)
            return h<a.h;
        return pos<a.pos;
    }
    bool operator== (const node& a)const
    {
        return h==a.h;
    }
}mp[N*N/2];
char s[N];
int cha[N];
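ll has;
// ll is unsigned long long, so every operation wraps modulo 2^64 automatically.
ll hh[N],p[N],dp[N];
// Hash of s[fin..sta-1]; hh[] is built from the back, hence the swapped argument order.
ll cal(int fin,int sta)
{
    return hh[fin]-hh[sta]*p[sta-fin];
}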
int main()
{
    int n,a,b;
    scanf("%d%d%d",&n,&a,&b);
    scanf("%s",s+1);
    p[0]=1;
    for(int i=1;i<=n;i++)
        cha[i]=s[i]-'a'+28,p[i]=p[i-1]*37;
    for(int i=n;i>=1;i--)
        hh[i]=hh[i+1]*37+cha[i];
    int all=0,pos;
    for(int i=1;i<=n;i++)
    {
        has=0;
        for(int j=i;j>=1;j--)
        {
            has=has*37+cha[j];
            mp[++all].h=has,mp[all].pos=i;
        }
    }
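    // Sort by (hash, end position); unique compares hashes only and keeps the first
    // element of each run, i.e. the smallest end position for every distinct hash.
    sort(mp+1,mp+1+all);
    all=unique(mp+1,mp+1+all)-mp-1;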
    dp[1]=a;
    int l=1;
    for(int i=2;i<=n;i++)
    {
        dp[i]=dp[i-1]+a;
        pos=lower_bound(mp+1,mp+1+all,node(cha[i],0))-mp;
        if(mp[pos].pos<i)
            dp[i]=min(dp[i],dp[i-1]+b);
        has=cal(l+1,i+1);
        pos=lower_bound(mp+1,mp+1+all,node(has,0))-mp;
        while(mp[pos].pos>l&&l<i)
            l++,has=cal(l+1,i+1),pos=lower_bound(mp+1,mp+1+all,node(has,0))-mp;
        if(l<i)
            dp[i]=min(dp[i],dp[l]+b);
    }
    printf("%llu\n",dp[n]);
    return 0;
}

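An expert's solution:
This is the LCS approach mentioned above. lcs[i][j] is the length of the longest common suffix of the prefix ending at i and the prefix ending at j. If s[i] equals s[j], it is lcs[i-1][j-1]+1; otherwise it is 0. The two matched copies may overlap, though: in bbbbb, two occurrences of length 3 intersect. So the transition point is max(j, i-lcs[i][j]), which caps the matched piece so that it starts after position j.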

#include<bits/stdc++.h>
using namespace std;
int f[5010],lcs[5010][5010];
int n,a,b;string s;
int main()
{
	cin>>n>>a>>b>>s;s="P"+s;
	for(int i=1;i<=n;i++)
	{
		f[i]=f[i-1]+a;
		for(int j=1;j<i;j++)
		{
			lcs[i][j]=(s[i]==s[j])?lcs[i-1][j-1]+1:0;
			f[i]=min(f[i],f[max(j,i-lcs[i][j])]+b);
		}
	}
	printf("%d",f[n]);return 0;
}
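
Since lcs[i][*] depends only on lcs[i-1][*], the 5010 x 5010 table is not strictly needed. The following is a sketch of the same recurrence with two rolling rows, wrapped in a hypothetical function compressLcs so it can be checked against the samples; it is a variant for illustration, not the code from the solution above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Same DP as above, but keeping only rows i-1 (prev) and i (cur) of lcs.
// s is 0-based here, so s[i - 1] is the i-th character.
int compressLcs(const std::string& s, int a, int b) {
    const int n = static_cast<int>(s.size());
    std::vector<int> f(n + 1, 0);
    std::vector<int> prev(n + 1, 0), cur(n + 1, 0);
    for (int i = 1; i <= n; ++i) {
        f[i] = f[i - 1] + a;  // encode the i-th character on its own
        cur[0] = 0;
        for (int j = 1; j < i; ++j) {
            // Longest common suffix of the prefixes ending at i and at j.
            cur[j] = (s[i - 1] == s[j - 1]) ? prev[j - 1] + 1 : 0;
            // Cap the match so the copied piece starts after position j.
            f[i] = std::min(f[i], f[std::max(j, i - cur[j])] + b);
        }
        std::swap(prev, cur);  // row i becomes the previous row
    }
    return f[n];
}
```

This keeps the O(n^2) running time while reducing memory from O(n^2) to O(n).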

Reposted from blog.csdn.net/tianyizhicheng/article/details/88743359