"Introduction to Algorithms" book notes: string matching algorithms and a JavaScript implementation of each

Concept

Find the position at which string b (the pattern) occurs in string a (the text).

Comparison of various algorithms

[Figure: table comparing the algorithms]

Naive algorithm

[Figure: the pattern sliding along the text]
As the figure shows, the idea is a sliding window: treat str2 (the pattern) as a skateboard that slides along the text from left to right, and at each position compare it with the substring underneath it.

JavaScript implementation

/**
 * @param {string} haystack
 * @param {string} needle
 * @return {number}
 */
var strStr = function(haystack, needle) {
    if (needle === '') return 0;
    let l = haystack.length;
    let n = needle.length;
    for (let i = 0; i < l - n + 1; i++) {
      if (haystack.substring(i, i + n) === needle) {
        return i;
      }
    }
    return -1;
};
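
The version above stops at the first match and lets substring do the comparison. As a rough sketch of the same sliding-window idea with the character comparisons written out, the hypothetical helper naiveMatchAll below (the name and the choice to return an empty list for an empty pattern are assumptions, not from the original post) collects every position where the pattern occurs:

```javascript
// Hypothetical helper: returns every shift s at which pattern occurs in text.
function naiveMatchAll(text, pattern) {
    const n = text.length;
    const m = pattern.length;
    const shifts = [];
    if (m === 0) return shifts; // convention assumed here: empty pattern gives no shifts
    for (let s = 0; s <= n - m; s++) {
        let j = 0;
        // compare the window text[s .. s+m-1] with the pattern, left to right
        while (j < m && text[s + j] === pattern[j]) {
            j++;
        }
        if (j === m) shifts.push(s); // all m characters matched
    }
    return shifts;
}

console.log(naiveMatchAll('abababa', 'aba')); // [ 0, 2, 4 ]
```

In the worst case every window is compared almost to the end, which is where the O((n - m + 1) m) running time of the naive approach comes from.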

Rabin-Karp algorithm

Horner's rule
[Figure: Horner's rule]
The idea of the algorithm is as follows:

  1. Treat a string as a number written in radix d (a string of decimal digits, for example, is simply a decimal number). The resulting number may be too large, so we reduce it modulo a prime q.
  2. Slide a window over the text (as in the naive algorithm above) and compare the value modulo q of each substring with that of the pattern. Thanks to Horner's rule, the value for the next window can be computed from the previous one in O(1).
  3. Equal values modulo q may still be a spurious hit, so when they agree the substring must be compared with the pattern character by character.
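
The O(1) update in step 2 can be sketched as follows. This is only an illustration: the radix D = 256, the prime Q = 101 and the helper names hornerHash and rollingHashes are assumptions chosen for the example. It checks that rolling the hash forward gives the same values as recomputing each window from scratch with Horner's rule.

```javascript
const D = 256; // radix (assumed): each character code is one "digit"
const Q = 101; // small prime modulus (assumed, for illustration only)

// Horner's rule: (...((c0 * D + c1) * D + c2)...) mod Q
function hornerHash(s) {
    let hash = 0;
    for (let i = 0; i < s.length; i++) {
        hash = (D * hash + s.charCodeAt(i)) % Q;
    }
    return hash;
}

// Hash of every length-m window of text; each one is derived from the previous in O(1).
function rollingHashes(text, m) {
    const hashes = [];
    if (m === 0 || m > text.length) return hashes;
    let h = 1; // D^(m-1) mod Q, the weight of the leading character
    for (let i = 0; i < m - 1; i++) h = (h * D) % Q;
    let hash = hornerHash(text.substring(0, m));
    hashes.push(hash);
    for (let i = 0; i + m < text.length; i++) {
        // drop the leading character, shift by one digit, append the next character
        hash = (D * (hash - text.charCodeAt(i) * h) + text.charCodeAt(i + m)) % Q;
        if (hash < 0) hash += Q; // JavaScript's % can return a negative result
        hashes.push(hash);
    }
    return hashes;
}

const text = 'abracadabra';
const m = 4;
const rolled = rollingHashes(text, m);
const direct = [];
for (let i = 0; i + m <= text.length; i++) {
    direct.push(hornerHash(text.substring(i, i + m)));
}
console.log(rolled.join(',') === direct.join(',')); // true
```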

Why choose a prime?

If the modulus q is chosen as a prime such that 10q (in general d·q for radix d) just fits within one computer word, all the necessary computations can be performed with single-precision arithmetic.

JavaScript implementation

var strStr = function (haystack, needle) {
	const Q = 101; // prime modulus
	const D = 256; // radix
	const N = haystack.length;
	const M = needle.length;
	if (M === 0) return 0;
	if (M > N) return -1;
	let hashHaystack = 0; // hash of the current window of haystack
	let hashNeedle = 0; // hash of needle
	let h = 1;
	// compute h = D^(M-1) mod Q
	for (let i = 0; i < (M - 1); i++) {
		h = (h * D) % Q;
	}
	// hash needle and the first window of haystack with Horner's rule
	for (let i = 0; i < M; i++) {
		hashNeedle = (D * hashNeedle + needle.charCodeAt(i)) % Q;
		hashHaystack = (D * hashHaystack + haystack.charCodeAt(i)) % Q;
	}
	for (let i = 0; i <= N - M; i++) {
		if (hashNeedle === hashHaystack) {
			// equal hashes may be a spurious hit, so compare the actual characters
			if (haystack.substring(i, i + M) === needle) {
				return i;
			}
		}
		if (i < N - M) {
			// compute the hash of the next window from the current one
			hashHaystack = (D * (hashHaystack - haystack.charCodeAt(i) * h) + haystack.charCodeAt(i + M)) % Q;
			if (hashHaystack < 0) {
				hashHaystack += Q;
			}
		}
	}
	return -1;
};

Finite automata

A finite automaton, also known as a sequential machine, is an abstract mathematical model of a finite discrete digital system. A finite automaton M is given by a five-tuple (X, Y, S, δ, λ), where X, Y and S are non-empty finite sets called the input set, output set and state set of M respectively; δ is a mapping from the Cartesian product S×X to S, called the next-state function of M; and λ is a single-valued mapping from S×X to Y, called the output function of M. When δ is single-valued, M is a deterministic finite automaton; when δ is multi-valued, M is a non-deterministic finite automaton. A finite automaton can serve three purposes: as a sequence transducer, turning an input sequence into an output sequence; as a sequence recognizer, deciding whether an input sequence has a certain property; and as a sequence generator, producing sequences with a required property.

This pile of definitions is hard to digest. It is easier to see with a LeetCode problem on the topic. Put simply, each input symbol makes the automaton jump from one state to another.

Personally, I feel the finite automaton is not a great algorithm for string matching, because generating the transition table takes a lot of computation.
[Figure: definition of a deterministic finite automaton]
Here Σ is a finite alphabet, each element of which is called an input symbol;
S is a finite set of states, each element of which is called a state;
f is the transition function, a single-valued mapping from S×Σ to S: f(p, a) = q means that when the current state is p and the input symbol is a, the automaton moves to the next state q, which is called a successor state of p;
s0 is the unique initial state;
Z is the set of terminal states.
At each step of a state transition, the current state of the automaton and the input symbol it faces uniquely determine the next state; that is, the value of the transition function is unique. In the state transition diagram this means that if the alphabet contains n input symbols, each node has at most n outgoing edges, and the labels on those edges are all different. This is why a finite automaton defined in this way is called a deterministic finite automaton.
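
To make the definition concrete, here is a minimal sketch of a deterministic finite automaton in code. The machine itself is a made-up example (it accepts strings over {a, b} that end in "ab"), and the names dfa and accepts are assumptions for illustration; the fields correspond to s0, Z and f from the definition above.

```javascript
// Hypothetical DFA: accepts strings over {a, b} that end with "ab".
const dfa = {
    start: 0,                // s0, the initial state
    accepting: new Set([2]), // Z, the set of terminal states
    // f: transitions[state][symbol] gives the unique next state
    transitions: [
        { a: 1, b: 0 }, // state 0: nothing useful seen yet
        { a: 1, b: 2 }, // state 1: the last symbol was "a"
        { a: 1, b: 0 }, // state 2: the last two symbols were "ab"
    ],
};

function accepts(machine, input) {
    let state = machine.start;
    for (const symbol of input) {
        const next = machine.transitions[state][symbol];
        if (next === undefined) return false; // symbol is not in the alphabet
        state = next; // exactly one successor per (state, symbol) pair
    }
    return machine.accepting.has(state);
}

console.log(accepts(dfa, 'aab')); // true
console.log(accepts(dfa, 'aba')); // false
```

Each state has one outgoing edge per input symbol and no two edges share a label, which is exactly the "deterministic" property described above.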

Pseudocode
[Figure: pseudocode for the finite-automaton matcher and for computing the transition function]

JavaScript implementation

Building the state transition table is straightforward to implement:

function createStatusMap(needle) {
    // let wordmap = 'abcdefghijklmnopqrstuvwxyz';
    let wordmap = 'abc'; // input alphabet
    let len = needle.length;
    let wordmapLength = wordmap.length;
    let map = [];
    for (let i = 0; i < len + 1; i++) {
        let template = needle.substring(0, i); // characters already matched in state i
        map.push([]);
        for (let j = 0; j < wordmapLength; j++) { // try each possible next input character
            let status = 0; // stays 0 if no suffix of output is a prefix of needle
            let output = template + wordmap[j]; // matched characters plus the new one
            let k = template.length;
            // find the longest suffix of output that is also a prefix of needle
            for (let q = 0; q < k + 1; q++) {
                if (output.substring(q) === needle.substring(0, k + 1 - q)) {
                    status = i + 1 - q;
                    break;
                }
            }
            map[i][j] = status; // write the next state into the table
        }
    }
    return map;
}
console.log(createStatusMap('ababaca'));
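
The table is only half of the matcher; the other half is a single pass over the text that follows the table. Below is a minimal sketch of that scanning step. To keep the block self-contained it uses its own compact table builder, buildTable, which is a hypothetical stand-in indexed the same way as the table above (table[state][index of character in the alphabet]); automatonMatch and its parameters are likewise assumed names.

```javascript
// Hypothetical compact builder: table[q][j] is the length of the longest prefix
// of pattern that is a suffix of (first q characters of pattern + alphabet[j]).
function buildTable(pattern, alphabet) {
    const m = pattern.length;
    const table = [];
    for (let q = 0; q <= m; q++) {
        table.push([]);
        for (let j = 0; j < alphabet.length; j++) {
            const seen = pattern.substring(0, q) + alphabet[j];
            let k = Math.min(m, q + 1);
            while (k > 0 && !seen.endsWith(pattern.substring(0, k))) k--;
            table[q][j] = k;
        }
    }
    return table;
}

// Scan text once; every time the accepting state m is reached, record the shift.
function automatonMatch(text, pattern, alphabet) {
    const m = pattern.length;
    const shifts = [];
    if (m === 0) return shifts; // convention assumed here for the empty pattern
    const table = buildTable(pattern, alphabet);
    let state = 0;
    for (let i = 0; i < text.length; i++) {
        const j = alphabet.indexOf(text[i]);
        // a character outside the alphabet cannot extend any match
        state = j === -1 ? 0 : table[state][j];
        if (state === m) shifts.push(i - m + 1);
    }
    return shifts;
}

console.log(automatonMatch('abababacaba', 'ababaca', 'abc')); // [ 2 ]
```

Once the table exists, each text character costs one lookup, so the scan is linear in the length of the text; the expensive part is building the table.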

Knuth-Morris-Pratt algorithm

Suppose the text is aaaabaac and the pattern is aaab.
With ordinary matching, the comparison fails at the fourth position (aaaa versus aaab), yet three characters had already matched, so the pointers into both the text and the pattern have to be rolled back by three. In some cases this is time-consuming.
So we can precompute a table which tells us, once n characters have been matched, which part of the pattern needs to be matched again. With it the text pointer never has to roll back, and the pattern does not have to be re-matched from the start every time.
The table means this: when the first i characters have matched and position i+1 mismatches, there are still π characters that are known to match.
For example, take the pattern ababaca. After i = 5 characters (ababa) have matched, the longest suffix of ababa that is also a prefix of ababaca is aba, which means that at least three characters are already matched.
[Figure: prefix table for the pattern ababaca]
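
One way to check the ababa example by hand is to compute each table entry by brute force. The sketch below does exactly that; longestBorder is a hypothetical helper written for illustration, not the efficient construction.

```javascript
// Hypothetical brute-force helper: length of the longest proper prefix of s
// that is also a suffix of s.
function longestBorder(s) {
    for (let k = s.length - 1; k > 0; k--) {
        if (s.substring(0, k) === s.substring(s.length - k)) return k;
    }
    return 0;
}

const pattern = 'ababaca';
const table = [];
for (let i = 1; i <= pattern.length; i++) {
    // entry for "the first i characters have matched"
    table.push(longestBorder(pattern.substring(0, i)));
}
console.log(table); // [ 0, 0, 1, 2, 3, 0, 1 ]
```

The fifth entry is 3, matching the ababa example: after five matched characters, three of them can be reused.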

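JavaScript implementation

// The core of the algorithm is building this table
function createMap(pattern) {
    let len = pattern.length;
    let map = [...Array(len)].map(() => 0); // initialize the table with zeros
    let q = 1; // position in pattern whose table entry is being computed
    let k = 0; // length of the longest prefix that is also a suffix so far
    for (; q < pattern.length; q++) {
        while (k > 0 && pattern[q] !== pattern[k]) {
            k = map[k - 1]; // fall back to the next shorter candidate prefix
        }
        if (pattern[q] === pattern[k]) {
            // characters are equal: the longest common prefix/suffix grows by one
            k++;
        }
        map[q] = k;
    }
    return map
}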
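
With the table in hand, the search itself is short. The following is a minimal sketch of how the table might be used; prefixTable repeats the construction above so the block runs on its own, and kmpSearch is an assumed name that returns the first match position (or -1), following the strStr convention used earlier in this post.

```javascript
// Same construction as above, repeated so this block is self-contained.
function prefixTable(pattern) {
    const table = new Array(pattern.length).fill(0);
    let k = 0;
    for (let q = 1; q < pattern.length; q++) {
        while (k > 0 && pattern[q] !== pattern[k]) k = table[k - 1];
        if (pattern[q] === pattern[k]) k++;
        table[q] = k;
    }
    return table;
}

// Hypothetical search function: index of the first occurrence of pattern, or -1.
function kmpSearch(text, pattern) {
    if (pattern === '') return 0;
    const table = prefixTable(pattern);
    let k = 0; // number of pattern characters currently matched
    for (let i = 0; i < text.length; i++) { // the text pointer i never moves back
        while (k > 0 && text[i] !== pattern[k]) {
            k = table[k - 1]; // reuse the part of the match that is still valid
        }
        if (text[i] === pattern[k]) k++;
        if (k === pattern.length) return i - k + 1;
    }
    return -1;
}

console.log(kmpSearch('aaaabaac', 'aaab')); // 1
console.log(kmpSearch('aaaabaac', 'abc'));  // -1
```

In the first call the mismatch at the fourth character only drops k from 3 to 2, so the search continues from the same text position instead of rolling back by three.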

Origin blog.csdn.net/weixin_38616850/article/details/106933063