Number of occurrences of string B in A (no modulo + modulo template)

Given a string A and a string B, find the number of occurrences of B in A.

Occurrences of B at different positions in A may overlap.

Input

There are two lines of input, string A and string B.

Output

Output an integer representing the number of occurrences of B in A.

Sample Input

zyzyzyz
zyz

Sample Output
3

HINT

1 ≤ length of A, length of B ≤ 10^6. A and B contain only uppercase and lowercase letters.

Analysis:
This is a string-matching problem, and there are generally two methods for it: hashing and KMP.

String Hash (hash):
The problem to solve: find the positions, or the number of times, that string B appears in string A.

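Brute-force approach

	Enumerate every starting position in string A and compare character by character, which wastes a lot of time.
	For example, string A is aaaaaaaaaaab and string B is aaaab.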
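
The brute-force idea can be sketched as follows. This is only an illustration for comparison with the hash method; the function name countBruteForce is an assumption and does not come from the original post.

```cpp
#include <cstddef>
#include <string>

// Brute-force count: try every start position in `text` and compare
// character by character. Worst case O(|text| * |pattern|).
// The name countBruteForce is illustrative only.
long long countBruteForce(const std::string& text, const std::string& pattern) {
    if (pattern.empty() || text.size() < pattern.size()) return 0;
    long long count = 0;
    for (std::size_t start = 0; start + pattern.size() <= text.size(); ++start) {
        std::size_t k = 0;
        while (k < pattern.size() && text[start + k] == pattern[k]) ++k;
        if (k == pattern.size()) ++count;  // overlapping matches are counted
    }
    return count;
}
```

On inputs such as a long run of a's followed by b, almost every start position compares nearly the whole pattern before failing, which is why this approach is too slow for lengths up to 10^6.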

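Hash approach

For example, string A is aaaaaaaaaaab and string B is aaaab.
String B has length 5. Compute the hash value of string B, and use the prefix-sum idea to compute the prefix hash values of string A.
With those prefix hashes, the hash value of any substring of A can then be obtained.

The next step is to find every position where a length-5 substring of A has the same hash value as string B. Counting those positions gives the number of occurrences.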

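Using the prefix-sum idea to compute the hash values of string A. The specific idea:

Suppose string A is ACDA and string B is CD, and assume the character values A=1, C=3, D=4.
These values only simplify the explanation below; there is no need to write it this way in code.
First, introduce a prime number B as the base (the name follows the code below; it is not the string B), neither too large nor too small.
Next, define H(0)=0, and let H(i) be the hash value of the prefix of length i.
H(1)=1
H(2)=1*B+3
H(3)=1*B^2+3*B+4
H(4)=1*B^3+3*B^2+4*B+1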

Obviously, H(k+1)=H(k)*B+c[k+1]

Here c is the string, c[k] is the value of its k-th character, and m is the length of the string.

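Similarly, the hash value of string B (CD) is s=3*B+4.
We can see that H(3)-H(1)*B^2==3*B+4.
In general, the hash of the substring covering positions l..r is H(r)-H(l-1)*B^(r-l+1). Clearly, the power of B in the final multiplication is determined by the length of string B.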
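
The small example with ACDA and CD can be checked with a short sketch. It assumes base 31 (the value used in the code further down) and the toy character values A=1, C=3, D=4; the helper names value, prefixHashes, and substringHash are illustrative only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long long ull;

const ull BASE = 31;  // assumed base, matching the post's code

// Toy character values from the example: A=1, C=3, D=4 (ch - 'A' + 1).
ull value(char ch) { return static_cast<ull>(ch - 'A' + 1); }

// H[0] = 0 and H[k+1] = H[k] * BASE + value of the (k+1)-th character.
std::vector<ull> prefixHashes(const std::string& c) {
    std::vector<ull> H(c.size() + 1, 0);
    for (std::size_t k = 0; k < c.size(); ++k)
        H[k + 1] = H[k] * BASE + value(c[k]);
    return H;
}

// Hash of the substring covering 1-based positions l..r:
// H[r] - H[l-1] * BASE^(r-l+1).
ull substringHash(const std::vector<ull>& H, std::size_t l, std::size_t r) {
    ull p = 1;
    for (std::size_t i = l; i <= r; ++i) p *= BASE;
    return H[r] - H[l - 1] * p;
}
```

With these definitions, substringHash(prefixHashes("ACDA"), 2, 3) equals 3*31+4, which is also the hash of CD computed on its own.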

The correctness of string hashing (I don't understand it very well myself, but these are results established by predecessors)

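Does string hashing necessarily produce different hash values for any two different strings?
Obviously not.
But if the hash values produced by our hash function are randomly distributed, the probability that two different strings have equal hash values is very low, so we usually assume that contest problems will not produce such a collision. In fact, by the birthday paradox, for a hash function uniformly distributed over [0,n), the expected number of distinct strings hashed before two of them collide is O(sqrt(n)). This can serve as a reference for efficiency and correctness when choosing a hash function.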
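
To get a feel for the birthday bound, here is a rough sketch that uses the standard approximation 1 - exp(-k(k-1)/(2n)) for the probability that at least two of k distinct strings collide. It assumes perfectly uniform hash values, which real polynomial hashes only approximate; the function name collisionProbability is illustrative.

```cpp
#include <cmath>

// Approximate probability that at least two of k distinct strings share a
// hash value, assuming hash values are uniform over [0, n).
// Uses the birthday approximation 1 - exp(-k(k-1)/(2n)).
double collisionProbability(double k, double n) {
    return 1.0 - std::exp(-k * (k - 1.0) / (2.0 * n));
}
```

For the classic case k=23, n=365 this gives about 0.5. For n=10^9+7 and k around sqrt(n), roughly 31623 strings, it gives about 0.39, which matches the O(sqrt(n)) rule of thumb.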

To explain, the hash function used here is:
H(c)=(c[1]*B^(m-1)+c[2]*B^(m-2)+...+c[m]*B^0) mod h
where c is the string, m is its length, B is the base, and h is the modulus.

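In fact, the purpose of taking the modulus is to keep the values from overflowing, but for this problem it is also acceptable not to take it. Unsigned integer overflow is handled by a fixed rule rather than returning an arbitrary value: the result wraps around modulo 2^64 for unsigned long long, so skipping the explicit modulus amounts to using 2^64 as the modulus. Two different strings can still wrap to the same value, but this is unlikely on typical data. Note that overflow of a signed type such as long long is undefined behavior in C++, so the no-modulo version should use an unsigned type. In this problem we only test whether hash values are equal, so it does not matter whether a value looks positive or negative.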
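
A minimal sketch of why skipping the modulus still works with unsigned 64-bit arithmetic: the window hash obtained from prefix hashes equals the hash computed directly on the substring, even after the values have wrapped around. Base 31 follows the post's code; the names wrapHash and windowHash are illustrative, and windowHash rebuilds the tables on every call only to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long long ull;

const ull BASE = 31;  // same base as the post's code

// Direct polynomial hash; unsigned overflow wraps modulo 2^64.
ull wrapHash(const std::string& s) {
    ull h = 0;
    for (char ch : s) h = h * BASE + static_cast<unsigned char>(ch);
    return h;
}

// Hash of s[start, start+len) derived from prefix hashes, also modulo 2^64.
ull windowHash(const std::string& s, std::size_t start, std::size_t len) {
    std::vector<ull> prefix(s.size() + 1, 0), power(s.size() + 1, 1);
    for (std::size_t i = 0; i < s.size(); ++i) {
        prefix[i + 1] = prefix[i] * BASE + static_cast<unsigned char>(s[i]);
        power[i + 1] = power[i] * BASE;
    }
    return prefix[start + len] - prefix[start] * power[len];
}
```

Since 31^13 already exceeds 2^64, any string longer than about 13 characters forces wraparound, yet the two functions still agree on every window.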

But if a problem requires using the hash value as an array subscript, you must take the modulus.

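To improve the correctness of the hash

We can use "double hashing" to reduce the probability that different strings produce the same hash value: choose different moduli, record the hash value computed under each modulus, and declare that two strings match only when all of the hash values agree.
Double hashing is usually enough to reduce collisions to a minimum. If we take h=10^9+7 and h=10^9+9 respectively, a collision is almost impossible; these two commonly used moduli are a pair of "twin primes". Of course, if the hash value must serve as a subscript, we can consider a prime around 10^6, such as 999979.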
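
A double-hash version of the counting routine might look like the sketch below. It assumes base 31 for both hashes and the moduli 10^9+7 and 10^9+9; the helper names buildPrefix, windowHash, and countDoubleHash are illustrative and are not part of the original post's code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef long long ll;

const ll BASE = 31;
const ll MOD1 = 1000000007LL;  // 10^9 + 7
const ll MOD2 = 1000000009LL;  // 10^9 + 9

// Prefix hashes of s under modulus m; also fills pw with BASE^i mod m.
static std::vector<ll> buildPrefix(const std::string& s, ll m, std::vector<ll>& pw) {
    std::vector<ll> h(s.size() + 1, 0);
    pw.assign(s.size() + 1, 1);
    for (std::size_t i = 0; i < s.size(); ++i) {
        h[i + 1] = (h[i] * BASE + static_cast<unsigned char>(s[i])) % m;
        pw[i + 1] = pw[i] * BASE % m;
    }
    return h;
}

// Hash of s[start, start+len) from prefix hashes, normalized into [0, m).
static ll windowHash(const std::vector<ll>& h, const std::vector<ll>& pw,
                     std::size_t start, std::size_t len, ll m) {
    return ((h[start + len] - h[start] * pw[len]) % m + m) % m;
}

// Counts (possibly overlapping) occurrences of pattern in text; a window
// is counted only when both hash values agree.
ll countDoubleHash(const std::string& text, const std::string& pattern) {
    if (pattern.empty() || text.size() < pattern.size()) return 0;
    std::vector<ll> pw1, pw2, qw1, qw2;
    std::vector<ll> t1 = buildPrefix(text, MOD1, pw1);
    std::vector<ll> t2 = buildPrefix(text, MOD2, pw2);
    ll p1 = buildPrefix(pattern, MOD1, qw1).back();
    ll p2 = buildPrefix(pattern, MOD2, qw2).back();
    ll count = 0;
    for (std::size_t i = 0; i + pattern.size() <= text.size(); ++i) {
        if (windowHash(t1, pw1, i, pattern.size(), MOD1) == p1 &&
            windowHash(t2, pw2, i, pattern.size(), MOD2) == p2)
            ++count;
    }
    return count;
}
```

Both moduli are below 2^31, so every intermediate product stays under about 10^18 and fits in a signed 64-bit integer.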

The code of the question: (no modulo)

#include<stdio.h>
#include<string.h>
#include<algorithm>
#define maxn 1000010
#define B 31
using namespace std;
typedef unsigned long long ull;//unsigned: overflow wraps modulo 2^64 instead of being undefined
char a[maxn],b[maxn];
ull sum[maxn];//sum[i] stores the hash value of the first i characters of a
ull power[maxn];//power[i] stores B^i
void init()//precompute B^n (n=0,1,2,3,...)
{
	power[0]=1;
	for(int i=1;i<maxn;i++)
		power[i]=power[i-1]*B;//overflow simply wraps around
}
int main()
{
	init();//preprocessing
	while(scanf("%s",a+1)==1)//indices start from 1
	{
		int i;
		if(scanf("%s",b+1)!=1)
			break;
		int la=strlen(a+1);
		int lb=strlen(b+1);
		if(la<lb)
			printf("0\n");
		else
		{
			sum[0]=0;
			for(i=1;i<=la;i++)
				sum[i]=sum[i-1]*B+(a[i]-'A'+1);//subtracting 'A' is optional

			ull s=0;//hash value of the pattern b

			for(i=1;i<=lb;i++)
				s=s*B+(b[i]-'A'+1);

			int ans=0;//number of matching windows
			for(i=0;i<=la-lb;i++)
			{
				//hash of a[i+1..i+lb] compared with the hash of b
				if(s==sum[i+lb]-sum[i]*power[lb])
					ans++;
			}
			printf("%d\n",ans);
		}
	}
	return 0;
}

Modulo code (single hash)

#include<stdio.h>
#include<string.h>
#include<algorithm>
#define maxn 1000010
#define B 31
using namespace std;
typedef long long ll;
char a[maxn],b[maxn];
ll sum[maxn];//sum[i] stores the hash value of the first i characters of a
ll power[maxn];//power[i] stores B^i mod mod
const ll mod=1000000007;//10^9+7
void init()//precompute B^n mod mod (n=0,1,2,3,...)
{
	power[0]=1;
	for(int i=1;i<maxn;i++)
		power[i]=(power[i-1]*B)%mod;//reduced at every step, so no overflow
}
int main()
{
	init();//preprocessing
	while(scanf("%s",a+1)==1)//indices start from 1
	{
		int i;
		if(scanf("%s",b+1)!=1)
			break;
		int la=strlen(a+1);
		int lb=strlen(b+1);
		if(la<lb)
			printf("0\n");
		else
		{
			sum[0]=0;
			for(i=1;i<=la;i++)
				sum[i]=(sum[i-1]*B+a[i])%mod;

			ll s=0;//hash value of the pattern b

			for(i=1;i<=lb;i++)
				s=(s*B+b[i])%mod;

			int ans=0;//number of matching windows
			for(i=0;i<=la-lb;i++)
			{
				//the difference can be negative, so add mod before the final %
				if(s==((sum[i+lb]-sum[i]*power[lb])%mod+mod)%mod)
					ans++;
			}
			printf("%d\n",ans);
		}
	}
	return 0;
}

Origin blog.csdn.net/Helinshan/article/details/109533923