"Suffix array"

Foreword

The problems I have done so far are fairly easy, so please forgive how thin this summary is.

Applications

The problems I have done so far are fairly routine; some applications are covered directly in the problem notes below.

Suffix arrays are very useful for counting substrings, because every occurrence of a substring is a prefix of exactly one suffix.

They also suit "many-to-many" matching, whereas the AC automaton is "many-to-one".

To be honest, I have forgotten the rest.

Related structures

Suffix tree

Suffix automata

AC automaton

I think the difference between the "fail tree" of the AC automaton and the suffix array is that the "fail tree" describes suffixes of all prefixes, while the suffix array describes prefixes of all suffixes.

Examples

A. Sandy's Cards

It took me 26 rounds of fixes to get this one right; it is full of details.

First, "the same" as defined in the problem is equivalent to the difference arrays of the strings being equal.

The problem is then equivalent to finding the longest common substring of several strings.

The approach is to concatenate the strings with special separator characters, run a suffix array, and compute "height".

Binary search on the answer: for a given mid, the array indexed by rank splits into several ranges in which "height" >= mid.

Check whether some range contains positions from every string.

Note that "height" at i is the longest common prefix of the suffixes ranked i and i-1, so each range must also count the string that owns the position just before it.
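The steps above can be sketched roughly as follows. This is only a minimal Python sketch under my own assumptions, not the code used for the problem: `sa_and_height` is a naive stand-in for a real suffix array builder, indices are 0-based, the function names are made up for illustration, and the difference-array conversion is left out, so it just returns the longest common substring length of plain strings.

```python
def sa_and_height(s):
    """Naive stand-in for a real builder: returns (sa, height), 0-indexed."""
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])
    height = [0] * n  # height[r] = lcp of the suffixes ranked r-1 and r
    for r in range(1, n):
        a, b, k = sa[r - 1], sa[r], 0
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        height[r] = k
    return sa, height


def longest_common_substring_length(strings):
    seq, owner = [], []
    for idx, t in enumerate(strings):
        seq += [ord(c) for c in t] + [-(idx + 1)]  # unique separator per string
        owner += [idx] * len(t) + [-1]             # separators belong to no string
    sa, height = sa_and_height(seq)

    def feasible(mid):
        seen = set()
        for r in range(len(sa)):
            if height[r] < mid:
                seen = set()  # rank r opens a new range and is its first member
            if owner[sa[r]] >= 0:
                seen.add(owner[sa[r]])
            if len(seen) == len(strings):
                return True
        return False

    lo, hi = 0, min(len(t) for t in strings)
    while lo < hi:  # binary search on the answer
        mid = (lo + hi + 1) // 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Resetting `seen` before adding rank r is what keeps "the position just before the range" inside the range.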

B. Roll Call on Planet Meow

First concatenate the name strings and the query strings, then run SA.

For each query string, binary search for the two ends of the rank range [L, R] in which the lcp with the query equals the query's length; this is clearly monotonic, so binary search applies.
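A minimal sketch of that binary search, with assumptions stated up front: the query is passed as a separate string and compared against truncated suffixes directly instead of going through lcp values, the suffix array is built naively, ranks are 0-based, and `match_range` is a name invented here.

```python
def match_range(s, sa, q):
    """Return the half-open rank range [left, right) of suffixes starting with q."""
    m = len(q)
    lo, hi = 0, len(sa)
    while lo < hi:  # first rank whose suffix, cut to m characters, is >= q
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < q:
            lo = mid + 1
        else:
            hi = mid
    left = lo
    hi = len(sa)
    while lo < hi:  # first rank whose suffix, cut to m characters, is > q
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= q:
            lo = mid + 1
        else:
            hi = mid
    return left, lo


s = "banana"
sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
left, right = match_range(s, sa, "ana")
occurrences = sorted(sa[left:right])  # start positions of "ana" in s
```

The search works because suffixes in rank order stay sorted after they are all cut to the same length.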

The answer is the number of distinct name strings in [L, R], which can be computed with Mo's algorithm, an offline BIT, or a persistent segment tree that counts colors.

The second question can be done with a BIT or with a difference trick.

The difference trick:

While running Mo's algorithm over $m$ queries, when a color newly enters the window during query number $now$, add a contribution of $m - now + 1$ to that color, i.e. assume it appears in all of the remaining $m - now + 1$ queries.

When a color leaves the window, subtract a contribution of $m - now + 1$.

It is easy to see that this difference is correct.
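Here is a small sketch of the difference trick inside Mo's algorithm. It is an illustration under assumptions: colors are already integer ids starting from 0, queries are inclusive 0-based ranges, and `mo_color_stats` and its return layout are invented for this example.

```python
from math import isqrt


def mo_color_stats(colors, queries):
    """Return (distinct, covered): distinct[q] = distinct colors in query q,
    covered[c] = number of queries whose range contains color c."""
    n, m = len(colors), len(queries)
    block = max(1, isqrt(n))
    order = sorted(range(m), key=lambda q: (queries[q][0] // block, queries[q][1]))
    cnt = [0] * (max(colors) + 1)
    covered = [0] * len(cnt)
    distinct = [0] * m
    cur = 0         # distinct colors in the current window
    lo, hi = 0, -1  # current window is colors[lo..hi]

    def add(p, w):
        nonlocal cur
        c = colors[p]
        if cnt[c] == 0:
            cur += 1
            covered[c] += w  # assume c appears in all remaining queries
        cnt[c] += 1

    def remove(p, w):
        nonlocal cur
        c = colors[p]
        cnt[c] -= 1
        if cnt[c] == 0:
            cur -= 1
            covered[c] -= w  # c has left, take the assumption back

    for now, q in enumerate(order, start=1):
        l, r = queries[q]
        w = m - now + 1  # queries from this one to the last, in Mo order
        while lo > l:
            lo -= 1
            add(lo, w)
        while hi < r:
            hi += 1
            add(hi, w)
        while lo < l:
            remove(lo, w)
            lo += 1
        while hi > r:
            remove(hi, w)
            hi -= 1
        distinct[q] = cur
    return distinct, covered
```

A color that enters and leaves while the window is moving toward the same query adds and subtracts the same weight, so only the state at query time counts.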

BIT:

The first question asks for the number of distinct colors in each interval.

The second question asks, for each color, how many of the intervals cover it.

The second question:

Sort the intervals by L in ascending order and enumerate the current position i. Add +1 in the BIT at an interval's left endpoint when it starts, and add -1 at that same left endpoint once its right endpoint has been passed. The contribution of position i is $sum(i) - sum(pre[i])$.

$pre[i]$ is the previous position with the same color as position i.

Why I think this is correct:

The answer for a color is the sum of the contributions of its positions, $\sum (sum(i) - sum(pre[i]))$.

The contribution of a position is the number of intervals that contain it and whose left endpoint lies between the previous position of the same color and this position.

With +1 at each left endpoint and -1 at the corresponding left endpoint after the right endpoint, $sum(i)$ is the number of intervals that contain the current position i.

Subtracting $sum(pre[i])$ gives the answer.
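The sweep above might look like this in Python. It is a sketch under assumptions: colors are integer ids from 0, intervals are inclusive and 0-based on input, the intervals are bucketed by endpoint instead of sorted by L, and `count_covering_intervals` is a made-up name.

```python
def count_covering_intervals(colors, intervals):
    """covered[c] = number of intervals containing at least one position of color c."""
    n = len(colors)
    tree = [0] * (n + 1)  # Fenwick tree (BIT) over 1-indexed positions

    def update(i, delta):
        while i <= n:
            tree[i] += delta
            i += i & -i

    def prefix(i):
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    starts = [[] for _ in range(n + 2)]  # starts[p]: left endpoints of intervals beginning at p
    ended = [[] for _ in range(n + 2)]   # ended[p]: left endpoints of intervals that ended at p - 1
    for l, r in intervals:
        starts[l + 1].append(l + 1)
        ended[r + 2].append(l + 1)

    covered = [0] * (max(colors) + 1)
    last = [0] * len(covered)  # last[c] plays the role of pre[i]; 0 means "none"
    for i in range(1, n + 1):
        for left in starts[i]:
            update(left, 1)
        for left in ended[i]:
            update(left, -1)
        c = colors[i - 1]
        covered[c] += prefix(i) - prefix(last[c])  # intervals containing i but not pre[i]
        last[c] = i
    return covered
```

Each interval is counted once per color, at the first position of that color inside it.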

C. Strings

Problem: find the maximum lcp between the substrings of an interval and a given string.

Binary search on the lcp length and find the rank range [l, r] as in problem A; mid is feasible if the range contains a suffix that starts inside the query interval [a, b] and whose match of length mid still fits inside [a, b].

I did not write a persistent segment tree; directly enumerating every position in [l, r] is enough to pass.

D. Differences

Use a monotonic stack to find, for each "height", the range in which it is the minimum; the accumulated contributions of those ranges give the answer.
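As an illustration, the monotonic stack part can be sketched like this. The sketch only computes the sum of lcp over all pairs of suffixes, which I assume is the core quantity the problem needs; the suffix array builder is a naive stand-in, indices are 0-based, and the function names are made up.

```python
def sa_and_height(s):
    """Naive stand-in for a real builder: returns (sa, height), 0-indexed."""
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])
    height = [0] * n  # height[r] = lcp of the suffixes ranked r-1 and r
    for r in range(1, n):
        a, b, k = sa[r - 1], sa[r], 0
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        height[r] = k
    return sa, height


def sum_pairwise_lcp(s):
    _, height = sa_and_height(s)
    vals = height[1:]  # lcp of two suffixes = min of vals over the ranks between them
    m = len(vals)
    left, right = [0] * m, [0] * m
    stack = []
    for i in range(m):  # left[i]: how far vals[i] stays the minimum going left
        while stack and vals[stack[-1]] > vals[i]:
            stack.pop()
        left[i] = i - (stack[-1] if stack else -1)
        stack.append(i)
    stack = []
    for i in range(m - 1, -1, -1):  # right[i]: same going right, ties resolved the other way
        while stack and vals[stack[-1]] >= vals[i]:
            stack.pop()
        right[i] = (stack[-1] if stack else m) - i
        stack.append(i)
    return sum(vals[i] * left[i] * right[i] for i in range(m))
```

Using a strict comparison on one side and a non-strict one on the other assigns each range to exactly one of its minima, so equal heights are not counted twice.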

E. Similar Substrings

First run SA; the ranks of essentially different substrings come from the prefix sums of $n - sa_i + 1 - height_i$, whose total is $\sum\limits_{i=1}^{n} (n - sa_i + 1 - height_i)$.
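A short sketch of how the formula is used, assuming 0-based indices (so the term becomes `n - sa[r] - height[r]`), a naive suffix array builder, and helper names invented here:

```python
from bisect import bisect_left
from itertools import accumulate


def sa_and_height(s):
    """Naive stand-in for a real builder: returns (sa, height), 0-indexed."""
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])
    height = [0] * n  # height[r] = lcp of the suffixes ranked r-1 and r
    for r in range(1, n):
        a, b, k = sa[r - 1], sa[r], 0
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        height[r] = k
    return sa, height


def distinct_prefix_counts(s):
    sa, height = sa_and_height(s)
    n = len(s)
    # rank r contributes the prefixes of its suffix that are longer than height[r]
    prefix = list(accumulate(n - sa[r] - height[r] for r in range(n)))
    return sa, height, prefix


def kth_distinct_substring(s, k):
    """k is 1-based; returns the k-th smallest essentially different substring."""
    sa, height, prefix = distinct_prefix_counts(s)
    r = bisect_left(prefix, k)  # first rank whose running total reaches k
    before = prefix[r - 1] if r else 0
    length = height[r] + (k - before)
    return s[sa[r]:sa[r] + length]
```

For "banana" the last prefix sum is 15, the number of essentially different substrings.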

Binary search for the left endpoints of the i-th and the j-th substring; the rest is a range-minimum query on "height" with a sparse table (ST) to get the lcp.

When computing the lcp of a suffix with itself, special-case it and return $n - sa_i + 1$; otherwise add 1 to the left endpoint and then do the RMQ.
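The lcp query could be sketched as below. Assumptions: 0-based positions (so the self case returns `n - p`), a naive suffix array builder, and a closure-based `build_lcp_query` helper that is purely illustrative.

```python
def sa_and_height(s):
    """Naive stand-in for a real builder: returns (sa, height), 0-indexed."""
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])
    height = [0] * n  # height[r] = lcp of the suffixes ranked r-1 and r
    for r in range(1, n):
        a, b, k = sa[r - 1], sa[r], 0
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        height[r] = k
    return sa, height


def build_lcp_query(s):
    sa, height = sa_and_height(s)
    n = len(s)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    table = [height[:]]  # table[j][i] = min(height[i .. i + 2^j - 1])
    j = 1
    while (1 << j) <= n:
        prev, half = table[-1], 1 << (j - 1)
        table.append([min(prev[i], prev[i + half]) for i in range(n - (1 << j) + 1)])
        j += 1

    def lcp(p, q):
        if p == q:
            return n - p  # a suffix against itself: its whole length
        a, b = sorted((rank[p], rank[q]))
        a += 1            # left endpoint + 1: minimum over height[a+1 .. b]
        j = (b - a + 1).bit_length() - 1
        return min(table[j][a], table[j][b - (1 << j) + 1])

    return lcp


lcp = build_lcp_query("abracadabra")
shared = lcp(0, 7)  # "abracadabra" against "abra"
```

The self case has to be special-cased because the RMQ range would be empty after the +1.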

The same has to be done on the reversed string, so build two SA structures; appending the reversed string directly after the original string also seems to work (not verified).

F. Wine Tasting Conference

First, "r-similar" can be turned into "0 ~ r-similar": "1 ~ r-similar" is given by the problem statement, and "0-similar" can be included because "x-similar" and "y-similar" are both relations between pairs of points, while the "0-similar" relation required by the problem holds for all pairs of points.

Of course, special-casing it is fine too.

The problem becomes computing the lcp of two suffixes, and taking a suffix sum when accumulating the answers.

Because there are negative numbers, the maximum product is obtained by querying both the minimum and the maximum of the two ranges $[l_i, i-1]$ and $[i, r_i]$ on either side of the split point and taking the max over the resulting products.
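To see why both the minimum and the maximum are needed, here is a tiny sketch. It takes the two sides as plain lists instead of RMQ structures, and `best_cross_product` is a hypothetical helper, not something from the problem.

```python
def best_cross_product(left, right):
    """Largest product of one value from `left` and one value from `right`."""
    lo_l, hi_l = min(left), max(left)
    lo_r, hi_r = min(right), max(right)
    # with negative numbers any extreme-by-extreme pairing can win
    return max(lo_l * lo_r, lo_l * hi_r, hi_l * lo_r, hi_l * hi_r)


example = best_cross_product([-5, 2], [-4, 3])  # (-5) * (-4) beats 2 * 3
```

With only the maxima, this example would give 6 instead of 20.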

Use a monotonic stack to find, for each "height", the range in which it is the minimum; the accumulated contributions of those ranges give the answer.

"Doesn't this sentence look familiar?"


 

A. Extraterrestrial Contact

Emergent of a question.

The task is to find the number of occurrences of each essentially different substring, so run SA and traverse in rank order.

For the substrings at each position, extend directly through the following ranks using "height", $O(n^2)$ in total; a new string starts accumulating its answer from its own rank.

The answer is the occurrence count of every substring that appears more than once, output in rank order.
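A brute-force sketch of that traversal, assuming 0-based ranks, a naive suffix array builder, and a made-up function name:

```python
def sa_and_height(s):
    """Naive stand-in for a real builder: returns (sa, height), 0-indexed."""
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])
    height = [0] * n  # height[r] = lcp of the suffixes ranked r-1 and r
    for r in range(1, n):
        a, b, k = sa[r - 1], sa[r], 0
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        height[r] = k
    return sa, height


def repeated_substring_counts(s):
    """Occurrence counts of substrings that appear more than once, in rank order."""
    sa, height = sa_and_height(s)
    n = len(s)
    out = []
    for r in range(n):
        # prefixes longer than height[r] are substrings first seen at rank r
        for length in range(height[r] + 1, n - sa[r] + 1):
            cnt, j = 1, r + 1
            while j < n and height[j] >= length:  # following ranks share this prefix
                cnt += 1
                j += 1
            if cnt == 1:
                break  # longer prefixes cannot repeat either
            out.append(cnt)
    return out
```

On "1010101" this gives 3 3 2 2 4 3 3 2 2 for the substrings 0, 01, 010, 0101, 1, 10, 101, 1010, 10101.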

B. Fleas

A good problem that I solved without falling back on the editorial ("thanks to $ak_t$").

My approach:

First binary search on the final answer. A string itself cannot be binary searched, but the final answer must occur in the original string, so binary search on the answer's rank among the substrings.

Substrings ranked after mid are not allowed to appear and have to be cut; the goal is to check whether all of them can be cut with at most k-1 cuts.

By a greedy argument, more cuts are never worse, so there is no need to binary search on k; even if one did, it would be monotonic.

Using SA and the ranking of essentially different substrings, work out the substring ranked mid: it starts at $i$ and has length $j$.

Here $i$ always refers to the suffix ranked $i$.

Then every string starting at $i$ with length $j+1$ ~ $n - sa_i + 1$ needs at least one cut, where a cut at point $i$ is imagined as the gap between $i$ and $i+1$.

So the best strategy is to make one cut within $i$ ~ $i+j-1$, because that cuts every illegal string starting at $i$.

The same holds for the strings starting at $i+1$ ~ $n$: as long as all illegal strings starting at $i$ ~ $n$ are cut and the number of cuts is <= $k-1$, this mid is feasible.

So the question becomes: for each starting point, where is the range of optimal cut positions?

Take $j$ again: for a later starting point $k$, the strings starting there that need no cut have length <= "the min of $j$ and the height values over $i+1$ ~ $k$". I do not know whether this thing is called an lcp; I made it up myself, and it felt right as I coded it.

$k$ can be understood as having "a legal substring equivalent to the one starting at $i$".

This turns the problem into: "given m intervals and p chances to place a covering point, can every interval be covered at least once?"

The greedy idea that I honestly could not work out and $Ak_t$ told me:

Split each interval into its two endpoints $l$ and $r$, and store the interval's index in the vectors of those two positions.

Enumerate $1$ ~ $n$: push every left endpoint onto a stack; on reaching the right endpoint of an interval that has not been visited, cut right there, mark every interval index on the stack as visited, and do cnt++.

Why is this minimal? Intuitively, cutting only at the moment when it can no longer be put off must be optimal.

Another point is that the gap corresponding to point $n$ must not be counted, because the head and the tail are already cut automatically.
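The greedy itself, stripped of the string part, can be sketched as follows. Assumptions: intervals are inclusive with 1-based endpoints in 1 ~ n, the special handling of the gap at point n is left out, and `min_cuts` is an illustrative name.

```python
def min_cuts(n, intervals):
    """Minimum number of cut points so that every interval contains at least one."""
    starts = [[] for _ in range(n + 1)]  # starts[p]: indices of intervals with l == p
    ends = [[] for _ in range(n + 1)]    # ends[p]: indices of intervals with r == p
    for idx, (l, r) in enumerate(intervals):
        starts[l].append(idx)
        ends[r].append(idx)

    visited = [False] * len(intervals)
    stack, cnt = [], 0
    for p in range(1, n + 1):
        stack.extend(starts[p])  # intervals that are open at p
        if any(not visited[idx] for idx in ends[p]):
            cnt += 1             # an uncovered interval ends here: cut now
            for idx in stack:
                visited[idx] = True
            stack.clear()
    return cnt


cuts = min_cuts(8, [(1, 3), (2, 5), (4, 6), (7, 8)])
```

In the example the cuts land at 3, 6 and 8; the cut at 3 also covers (2, 5), which is why it is skipped at 5.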

Editorial approach:

The binary search part is unchanged.

Then scan the original string backwards, adding one character at a time; if after adding a character the current string ranks > mid, a cut is needed in the gap between that character and the one added just before it.

Correctness is obvious. Optimality holds because a cut is made only when the current piece cannot be extended any further.

If the number of cuts is <= $k-1$, mid is feasible.
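A rough sketch of this approach, with loud assumptions: I take the task to be "split the string into at most k pieces so that the largest substring inside any piece is as small as possible", and I replace the SA-based rank comparison with direct string comparison over a sorted list of all substrings, which is only practical for short strings. `min_largest_piece` is a made-up name.

```python
def min_largest_piece(s, k):
    n = len(s)
    # stand-in for "rank among essentially different substrings"
    candidates = sorted({s[i:j] for i in range(n) for j in range(i + 1, n + 1)})

    def feasible(limit):
        cuts, end = 0, n  # the current piece is s[i:end]
        for i in range(n - 1, -1, -1):
            if s[i:end] > limit:  # the longest new substring is the largest one starting at i
                if s[i] > limit:
                    return False  # even a single character is too large
                cuts += 1
                end = i + 1       # cut in the gap between i and i + 1
        return cuts <= k - 1

    lo, hi = 0, len(candidates) - 1  # the largest substring is always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]
```

For "abab" with k = 2 the best split is "ab" + "ab", whose largest substring is "b".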

skyh's approach:

The binary search part is unchanged.

Then maintain $mn$, the minimum position at which a cut is required, updating it each time with $mn = \min(mn, \min(len, lcp(st, j)))$.

I find it equivalent to my approach, but more concise.

D. Stock Market Prediction

E. SvT

Divide and conquer, lcp.

In the divide and conquer, each point serves as the RMQ minimum only once, so the total complexity is $O(n \log n)$.


Origin www.cnblogs.com/hzoi2018-xuefeng/p/12093115.html