Z algorithm

The Z algorithm is a string-matching algorithm. Its core is the \(z\) array and the method for computing it.

(From here on, strings are 1-indexed.)

\(\bm z\) array and Z-boxes

Definition of the \(z\) array: \(z_{a,i}\) is the length of the longest substring of \(a\) that starts at position \(i\) and matches a prefix of \(a\). Obviously, \(z_{a,1}=|a|\) always holds.
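As a minimal sketch of the definition itself (not the fast algorithm), the \(z\) array can be computed naively in \(\mathrm O(n^2)\) by extending every match one character at a time. The C++ below is 0-indexed, so `z[i]` corresponds to \(z_{a,i+1}\); the name `z_naive` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Naive O(n^2) z array, straight from the definition.
// 0-indexed: z[i] here is the document's z_{a,i+1}.
std::vector<int> z_naive(const std::string& a) {
    int n = static_cast<int>(a.size());
    std::vector<int> z(n, 0);
    for (int i = 0; i < n; i++) {
        // extend the match between a[i..] and the prefix a[0..] character by character
        while (i + z[i] < n && a[i + z[i]] == a[z[i]]) z[i]++;
    }
    return z;  // z[0] == n, matching z_{a,1} = |a|
}
```

For example, `z_naive("ACACTAAC")` gives the array used in the example of this section.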

A Z-box is an interval. Given a string \(a\), an interval \([l,r]\) is a Z-box of \(a\) if and only if all of the following conditions hold:

  • \(l\ne1\);
  • \(z_{a,l}\ne0\);
  • \(r=l+z_{a,l}-1\).

Intuitively, if the substring of \(a\) starting at position \(l\) matches at least \(1\) character of the prefix of \(a\), then the interval covered by the longest such match is a Z-box. (\(l\ne1\) is required because position \(1\) is itself the start of the prefix, which is special, so it is considered separately.)

For example, if \(a=\texttt{ACACTAAC}\), then \(z_a=[8,0,2,0,0,1,2,0]\), and the Z-boxes are \([3,4],[6,6],[7,8]\).
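The three Z-box conditions translate directly into a filter over the \(z\) array. The sketch below assumes the input is a 0-indexed `std::vector<int>` holding \(z_a\) (so `z[i]` is \(z_{a,i+1}\)) and reports boxes 1-indexed to match the text; `z_boxes` is a hypothetical name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// List every Z-box [l, r] (1-indexed, as in the text) given a 0-indexed z array.
std::vector<std::pair<int, int>> z_boxes(const std::vector<int>& z) {
    std::vector<std::pair<int, int>> boxes;
    for (int i = 1; i < static_cast<int>(z.size()); i++) {  // start at i = 1 to skip position 1 (l != 1)
        if (z[i] != 0) {                                     // condition z_{a,l} != 0
            int l = i + 1;                                   // convert to 1-indexed l
            boxes.push_back({l, l + z[i] - 1});              // r = l + z_{a,l} - 1
        }
    }
    return boxes;
}
```

On \(z_a=[8,0,2,0,0,1,2,0]\) from the example, this should return exactly \([3,4],[6,6],[7,8]\).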

Computing the \(\bm z\) array

Given a string \(a\), we now want to compute \(z_a\).

The value of \(z_{a,1}\) needs no computation, and position \(1\) is special (it is the prefix itself), so we handle it separately.

Suppose we already know \(z_{a,2\sim i-1}\) and, among the Z-boxes found so far, the one \([zl,zr]\) with the largest right endpoint \(zr\). To compute \(z_{a,i}\) and update \(zl,zr\), there are \(2\) cases:

  1. \(zr<i\). In this case, match by brute force from position \(i\) onward to determine \(z_{a,i}\). If \(z_{a,i}\ne0\), set \(zl=i,\ zr=i+z_{a,i}-1\);
  2. \(zr\ge i\). Let \(i'=i-zl+1\), i.e., \(i'\) is the position that \(i\) lands on when the Z-box \([zl,zr]\) containing \(i\) is shifted onto the prefix of \(a\). There are \(2\) sub-cases:
    1. \(i+z_{a,i'}\le zr\). Then \([i,i+z_{a,i'}]\subsetneq[zl,zr]\). By the definition of a Z-box, \(\forall j\in[i,i+z_{a,i'}],\ a_j=a_{j-zl+1}\). So matching the prefix of \(a\) from position \(i\) behaves exactly like matching it from position \(i'\), including the mismatch at offset \(z_{a,i'}\). Hence \(z_{a,i}=z_{a,i'}\) directly, and \(zl,zr\) stay unchanged;
    2. \(i+z_{a,i'}>zr\). Similarly, \(\forall j\in[i,zr],\ a_j=a_{j-zl+1}\). Then positions \(i\sim zr\) of \(a\) match the prefix of \(a\) exactly as positions \(i'\sim zr-zl+1\) do, so \(z_{a,i}\) is at least \(zr-i+1\). So set \(z_{a,i}=zr-i+1\), continue brute-force matching from position \(zr+1\) onward to determine \(z_{a,i}\), and then set \(zl=i,\ zr=i+z_{a,i}-1\) (here \(z_{a,i}\) cannot be \(0\)).
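To see which case fires where, here is a hypothetical tracing helper `z_cases`. It runs the case analysis above on a 1-indexed copy of the string (padded with one dummy character) and records the case used for each \(i=2\sim|a|\) as `"1"`, `"2.1"` or `"2.2"`. It is an illustrative sketch, not part of the original method.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Record which case of the analysis handles each i = 2..|s| (1-indexed, as in the text).
std::vector<std::string> z_cases(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<std::string> trace;
    if (n == 0) return trace;
    std::string a = " " + s;              // pad so that a[1..n] is the string
    std::vector<int> z(n + 1, 0);
    z[1] = n;
    int zl = 0, zr = 0;                   // Z-box with the largest right endpoint so far
    for (int i = 2; i <= n; i++) {
        if (zr < i) {                     // case 1: brute force from position i
            trace.push_back("1");
            while (i + z[i] <= n && a[i + z[i]] == a[1 + z[i]]) z[i]++;
            if (z[i]) zl = i, zr = i + z[i] - 1;
        } else {
            int ip = i - zl + 1;          // i' = i - zl + 1
            if (i + z[ip] <= zr) {        // case 2.1: copy z_{a,i'}
                trace.push_back("2.1");
                z[i] = z[ip];
            } else {                      // case 2.2: start at zr - i + 1, extend from zr + 1
                trace.push_back("2.2");
                z[i] = zr - i + 1;
                while (i + z[i] <= n && a[i + z[i]] == a[1 + z[i]]) z[i]++;
                zl = i, zr = i + z[i] - 1;
            }
        }
    }
    return trace;
}
```

On \(\texttt{ACACTAAC}\) only cases 1 and 2.1 occur (2.1 at \(i=4\) and \(i=8\)); on \(\texttt{aaaa}\), positions \(3\) and \(4\) both fall into case 2.2.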

So set \(z_{a,1}=|a|\), then iterate as described above from \(i=2\) to \(i=|a|\) to obtain the whole \(z_a\) array.

The following code computes the \(z\) array:

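```cpp
const int N = 1000005;  // assumed maximum length (plus slack)
int n, z[N];            // |a| = n; z[1..n] is the z array
char a[N];              // the string, stored 1-indexed in a[1..n]

void z_init() {  // compute the z array
    z[1] = n;  // handle z[1] specially
    int zl = 0, zr = 0;  // the Z-box with the largest right endpoint so far
    for (int i = 2; i <= n; i++) {  // iterate from i=2 to i=n
        if (zr < i) {  // case 1
            z[i] = 0;
            while (i + z[i] <= n && a[i + z[i]] == a[1 + z[i]]) z[i]++;  // brute-force match onward
            if (z[i]) zl = i, zr = i + z[i] - 1;  // update the Z-box with the largest right endpoint
        } else if (i + z[i - zl + 1] <= zr) {
            z[i] = z[i - zl + 1];  // case 2, sub-case 1
        } else {  // case 2, sub-case 2
            z[i] = zr - i + 1;  // z[i] is at least zr-i+1
            while (i + z[i] <= n && a[i + z[i]] == a[1 + z[i]]) z[i]++;  // then keep brute-force matching
            zl = i; zr = i + z[i] - 1;  // update the Z-box with the largest right endpoint
        }
    }
}
```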
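For use outside a contest template, the same idea can be packaged as a function on `std::string` with 0-indexing. This is a common variant, not the author's code: it merges sub-cases 1 and 2 of case 2 by starting from \(\min(zr-i+1,\ z_{i'})\) and always running the extension loop, which stops at its first comparison in sub-case 1. The name `z_function` is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// 0-indexed z array: z[i] is the document's z_{a,i+1}, and z[0] = |s|.
std::vector<int> z_function(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    if (n == 0) return z;
    z[0] = n;                              // handle the whole-string prefix separately
    int l = 0, r = -1;                     // Z-box [l, r] with the largest r (inclusive)
    for (int i = 1; i < n; i++) {
        if (i <= r) z[i] = std::min(r - i + 1, z[i - l]);       // reuse the value at i' = i - l
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;  // brute-force extension
        if (i + z[i] - 1 > r) l = i, r = i + z[i] - 1;          // keep the rightmost Z-box
    }
    return z;
}
```

For example, `z_function("abacaba")` should yield `{7, 0, 1, 0, 3, 0, 1}`.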

Time complexity

Computing the \(z\) array as described above takes linear time, \(\mathrm O(n)\).

Proof (informal): In the procedure above, a character of \(a\) at position \(j\) can be successfully matched against the prefix only when \(j>zr\), and when a round of matching ends, \(zr\) is updated to the last successfully matched position. So each position is successfully matched at most \(1\) time, and the total number of successful matches is \(\mathrm O(n)\). For each \(z_i\), if brute-force matching happens at all (i.e., we are not in sub-case \(1\) of case \(2\)), it stops at the first failure, so the total number of failed matches is also \(\mathrm O(n)\). Hence matching takes \(\mathrm O(n+n)=\mathrm O(n)\) time in total; adding the \(\mathrm O(1)\) assignments and updates of \(zl,zr\) per position, the total is still \(\mathrm O(n)\).
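The counting argument can be checked empirically. The sketch below (hypothetical name `z_comparisons`) is an instrumented copy of the 1-indexed procedure above that counts character comparisons made in the brute-force loops. Per the argument there are at most \(n-1\) successes and at most \(n-1\) failures, so the count should never exceed \(2(n-1)\).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Count character comparisons made while computing the z array (1-indexed, as in the text).
long long z_comparisons(const std::string& s) {
    int n = static_cast<int>(s.size());
    if (n == 0) return 0;
    std::string a = " " + s;               // a[1..n] is the string
    std::vector<int> z(n + 1, 0);
    z[1] = n;
    long long cmp = 0;
    auto extend = [&](int i) {             // brute-force matching from position i + z[i]
        while (i + z[i] <= n) {
            cmp++;
            if (a[i + z[i]] != a[1 + z[i]]) break;  // at most one failure per i
            z[i]++;                                  // each success lies beyond the old zr
        }
    };
    int zl = 0, zr = 0;
    for (int i = 2; i <= n; i++) {
        if (zr < i) {                      // case 1
            extend(i);
            if (z[i]) zl = i, zr = i + z[i] - 1;
        } else if (i + z[i - zl + 1] <= zr) {
            z[i] = z[i - zl + 1];          // case 2.1: no comparisons at all
        } else {                           // case 2.2
            z[i] = zr - i + 1;
            extend(i);
            zl = i, zr = i + z[i] - 1;
        }
    }
    return cmp;
}
```

For a string of \(1000\) copies of `a`, all comparisons happen at \(i=2\) and all succeed, giving \(999\); every later position is settled by case 2.2 without a single comparison.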

Applications of the Z algorithm

The most common use is string pattern matching (KMP and hashing can also do this in linear time). To find a pattern string \(b\) in a text string \(a\), put \(b\) in front of \(a\) with a rarely used character as a separator, i.e., let \(c=b+\text{'!'}+a\). Then compute \(z_c\) and scan from \(i=|b|+2\) to \(i=|c|\); whenever \(z_{c,i}=|b|\), \(b\) matches at this position (position \(i-|b|-1\) of \(a\)). Note: the separator must not appear in either string, or the result will be wrong. To match a pattern string \(c\) against two text strings \(a,b\), you can let \(d=c+\text{'!'}+a+\text{'@'}+b\); the two separators must also differ, or the result will be wrong.
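A minimal sketch of the separator trick, assuming `'!'` (the default of the hypothetical `sep` parameter) occurs in neither string. `find_occurrences` and `z_function` are hypothetical names, the code is 0-indexed internally, and match positions are reported 1-indexed in \(a\), as in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Compact 0-indexed z array (z[0] = |s|), a variant of the procedure in the text.
std::vector<int> z_function(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    if (n == 0) return z;
    z[0] = n;
    for (int i = 1, l = 0, r = -1; i < n; i++) {
        if (i <= r) z[i] = std::min(r - i + 1, z[i - l]);       // reuse the mirrored value
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;  // brute-force extension
        if (i + z[i] - 1 > r) l = i, r = i + z[i] - 1;          // keep the rightmost Z-box
    }
    return z;
}

// 1-indexed start positions of pattern b in text a, via c = b + sep + a.
std::vector<int> find_occurrences(const std::string& a, const std::string& b, char sep = '!') {
    assert(a.find(sep) == std::string::npos && b.find(sep) == std::string::npos);
    std::string c = b + sep + a;
    std::vector<int> z = z_function(c);
    int m = static_cast<int>(b.size());
    std::vector<int> pos;
    for (int i = m + 1; i < static_cast<int>(c.size()); i++)  // the part of c that holds a
        if (z[i] == m) pos.push_back(i - m);  // 0-indexed i in c -> 1-indexed i - m in a
    return pos;
}
```

For example, `find_occurrences("ACACTAAC", "AC")` should return `{1, 3, 7}`. Because the separator never matches, no \(z\) value in the text part can exceed \(|b|\), which is why testing for equality is enough.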

Why does the Z algorithm take the same time as hashing for string pattern matching? The Z algorithm computes, for every position, the length of the longest match with the prefix starting there, whereas pattern matching only needs to know whether the prefix \(c_{1\sim|b|}\) matches, so it does not make full use of the \(z\) array. If you want the length of the longest prefix match starting at some position, hashing needs a binary search, which adds a \(\log\) factor, so it is better to precompute with the Z algorithm. For concrete applications, see the \(2\) example problems below.
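As a sketch of making full use of the \(z\) array: a single pass over \(b+\text{'!'}+a\) gives, for every position of \(a\), the length of the longest prefix of \(b\) that matches there, with no binary search. The names `prefix_match_lengths` and `z_function` are hypothetical, and the separator is again assumed to occur in neither string.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Compact 0-indexed z array (z[0] = |s|), a variant of the procedure in the text.
std::vector<int> z_function(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    if (n == 0) return z;
    z[0] = n;
    for (int i = 1, l = 0, r = -1; i < n; i++) {
        if (i <= r) z[i] = std::min(r - i + 1, z[i - l]);       // reuse the mirrored value
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;  // brute-force extension
        if (i + z[i] - 1 > r) l = i, r = i + z[i] - 1;          // keep the rightmost Z-box
    }
    return z;
}

// res[j] = length of the longest prefix of b matching a starting at 0-indexed position j.
// Total time O(|a| + |b|); the separator caps every value at |b|.
std::vector<int> prefix_match_lengths(const std::string& a, const std::string& b, char sep = '!') {
    std::string c = b + sep + a;
    std::vector<int> z = z_function(c);
    int m = static_cast<int>(b.size());
    return std::vector<int>(z.begin() + m + 1, z.end());  // the entries that belong to a
}
```

With \(a=\texttt{ACACTAAC}\) and \(b=\texttt{ACA}\), this should give `{3, 0, 2, 0, 0, 1, 2, 0}`; answering the same queries with hashing would need one binary search per position.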

Moreover, the Z algorithm has a smaller constant factor than hashing (on Codeforces, to avoid anti-hash tests and failing system tests (FST), people generally write a double hash), and it is more reliable than hashing (the Z algorithm is, of course, \(100\%\) correct).

\(\bm2\) example problems

CodeForces 526D Om Nom and Necklace

Solution write-up (link)

CodeForces 427D Match & Catch

Solution write-up (link)

Source: www.cnblogs.com/ycx-akioi/p/Z-algorithm.html