Longest Common Substring

SP1811

Problem description
A string is a finite sequence of characters over a non-empty finite set Σ.
In this problem, Σ is the set of lowercase letters.
Substring, also called factor, is a consecutive sequence of characters that occurs at least once in a string.
Now your task is simple: for two given strings, find the length of their longest common substring.
Here common substring means a substring of two or more strings.
Input format
The input contains exactly two lines; each line consists of no more than 250000 lowercase letters, representing a string.
Output format
The length of the longest common substring. If no such string exists, print “0” instead.
Problem summary
Given 2 strings, each of length at most 250000, output the length of their longest common substring. If there is no common substring, output 0.
Translated by @xyz32768
Sample input and output
Input
alsdfkjfjkdsal
fdjskalajfkdsla
Output
3
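
As a sanity check for small inputs such as the sample, here is a minimal brute-force sketch. It uses the classic quadratic dynamic program, not the suffix-automaton method this post is about, and the function name `lcsBruteForce` is only illustrative. It is far too slow for the full limit of 250000 characters, but handy for cross-checking a faster solution.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Reference O(|a| * |b|) dynamic program (an assumption for testing purposes,
// NOT the suffix-automaton approach described in this post).
// cur[j] = length of the longest common suffix of a[0..i) and b[0..j).
int lcsBruteForce(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            // Extend the common suffix on a match, otherwise reset it to 0.
            cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            best = std::max(best, cur[j]);
        }
        std::swap(prev, cur);  // the row just computed becomes the previous row
    }
    return best;
}
```

Calling `lcsBruteForce("alsdfkjfjkdsal", "fdjskalajfkdsla")` reproduces the sample answer 3 (for example via the common substring "kds").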

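Given two strings $A$ and $B$, we want the length of their longest common contiguous substring.
First, build a suffix automaton (SAM) for $A$.
Define $Len$ as the length of the longest suffix of the first $i$ characters of $B$ that also occurs as a substring of $A$, and $State$ as the current state of the automaton, initialized to 0 (the initial state). $next(State, ch)$ denotes the state reached from $State$ by following the transition labeled $ch$.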
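
To make the meaning of $Len$ concrete, here is a deliberately naive sketch that computes it straight from its definition, without any automaton. The name `matchLengthsNaive` is hypothetical and the function is only meant for tiny inputs; the suffix automaton exists precisely to avoid this repeated searching.

```cpp
#include <string>
#include <vector>

// Naive illustration of the definition of Len (tiny inputs only).
// result[i] = length of the longest suffix of b[0..i] that occurs in a.
std::vector<int> matchLengthsNaive(const std::string& a, const std::string& b) {
    std::vector<int> result(b.size(), 0);
    for (std::size_t i = 0; i < b.size(); ++i) {
        // Try the longest candidate suffix first, then shorter ones.
        for (std::size_t len = i + 1; len >= 1; --len) {
            if (a.find(b.substr(i + 1 - len, len)) != std::string::npos) {
                result[i] = static_cast<int>(len);
                break;
            }
        }
    }
    return result;
}
```

For $A$ = "ab" and $B$ = "aab" this yields 1, 1, 2: after the second 'a' only the suffix "a" still occurs in $A$, and after 'b' the suffix "ab" does. The final answer is the maximum of these values, here 2.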
Matching
Then, starting from $B[0]$, we walk along the SAM of $A$ and match one character $B[i]$ at a time, distinguishing three cases:

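  1. If the current state has a transition on $B[i]$, the match can be extended, so follow it: $State = next(State, B[i])$, $++Len$.
  2. If there is no such transition, jump to the parent of the current state in the suffix-link tree. If the parent has no transition on $B[i]$ either, keep jumping upward (that is, $State = link(State)$) until a state with such a transition is found; the initial state counts as a candidate too. Then set $Len = len(State) + 1$, followed by $State = next(State, B[i])$.
  3. If even the initial state has no transition on $B[i]$, then $B[i]$ does not occur in $A$ at all, and matching restarts from scratch at the next character: $Len = 0$, $State = 0$.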

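Why this works: if $B[i]$ mismatches at the current state (there is no transition), then none of the substrings represented by that state can be extended by $B[i]$. Its parent in the suffix-link tree represents shorter suffixes of those substrings, which may still be extendable, so we move up; this amounts to shifting the left boundary of the currently matched substring of $A$ to the right and then looking for a transition again. If there is still none, we keep moving up until we reach the initial state. If the initial state has no transition either, the match that began at the last restart cannot be lengthened any further, so we start matching afresh from the initial state. Because $Ans = \max(Ans, Len)$ is updated after every character, the whole process finds, for every prefix of $B$, the longest suffix of that prefix that matches a substring of $A$, and the answer is the maximum of these lengths.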

#include<cstdio>
#include<cstring>
#include<algorithm>
using namespace std;
const int MAXN = 250005;
char A[MAXN], B[MAXN];
struct SAM {
	int size, last;
	struct Node {
		int len = 0, link = 0;
		int next[26];
		void clear() {
			len = link = 0;
			memset(next, 0, sizeof(next));
		}
	} node[MAXN * 2];
	void init() {
		for (int i = 0; i < size; i++) {
			node[i].clear();
		}
		node[0].link = -1;
		size = 1;
		last = 0;
	}
	void insert(char x) {
		int ch = x - 'a';
		int cur = size++;
		node[cur].len = node[last].len + 1;
		int p = last;
		while (p != -1 && !node[p].next[ch]) {
			node[p].next[ch] = cur;
			p = node[p].link;
		}
		if (p == -1) {
			node[cur].link = 0;
		}
		else {
			int q = node[p].next[ch];
			if (node[p].len + 1 == node[q].len) {
				node[cur].link = q;
			}
			else {
				int clone = size++;
				node[clone] = node[q];
				node[clone].len = node[p].len + 1;
				while (p != -1 && node[p].next[ch] == q) {
					node[p].next[ch] = clone;
					p = node[p].link;
				}
				node[q].link = node[cur].link = clone;
			}
		}
		last = cur;
	}
}sam;
// Transition from CurState on character ch; 0 means "no transition",
// because no edge ever leads back to the initial state 0.
int getNextState(int CurState, char ch) {
	return sam.node[CurState].next[ch - 'a'];
}
int Compute(int n) {
	int Ans = 0, CurState = 0, Len = 0;
	for (int i = 0; i < n; ++i) {
		// Rule 1: a transition on B[i] exists, so follow it.
		if (getNextState(CurState, B[i])) {
			CurState = getNextState(CurState, B[i]);
			++Len;
		}
		else {
			// Rules 2 and 3: climb the suffix links.
			for (CurState = sam.node[CurState].link;; CurState = sam.node[CurState].link) {
				// Climbed past the initial state: B[i] does not occur in A.
				if (CurState == -1) {
					Len = 0;
					CurState = 0;
					break;
				}
				// Found a state (possibly the initial state) with a transition on B[i].
				if (getNextState(CurState, B[i])) {
					Len = sam.node[CurState].len + 1;
					CurState = getNextState(CurState, B[i]);
					break;
				}
			}
		}
		Ans = max(Ans, Len);
	}
	return Ans;
}
int main() {
	scanf("%s%s", A, B);
	int Len_A = strlen(A);
	sam.init();
	for (int i = 0; i < Len_A; ++i) {
		sam.insert(A[i]);
	}
	printf("%d", Compute(strlen(B)));
	return 0;
}

Reposted from blog.csdn.net/qq_42971794/article/details/104088982