Concept
String matching: find the position at which string b (the pattern) occurs in string a (the text).
Comparison of various algorithms
Naive algorithm
As shown in the figure above, the idea is a sliding window. Treat str2 as a window that slides over the text from left to right, one position at a time. At each position, compare the window with the substring of the text beneath it.
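JavaScript implementation
/**
 * @param {string} haystack
 * @param {string} needle
 * @return {number}
 */
var strStr = function (haystack, needle) {
  if (needle === '') return 0; // an empty needle matches at index 0
  const l = haystack.length;
  const n = needle.length;
  // Slide a window of length n over haystack: there are l - n + 1 possible starts.
  for (let i = 0; i < l - n + 1; i++) {
    if (haystack.substring(i, i + n) === needle) {
      return i; // first position where the window equals needle
    }
  }
  return -1; // no match
};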
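The version above hides the character comparisons inside `substring` and `===`. The minimal sketch below writes them out as an explicit inner loop. The name `naiveSearchAll` is hypothetical, and unlike `strStr` it collects every matching shift rather than only the first. In the worst case (for example, text aaaa...a and pattern aa...ab), each of the n − m + 1 windows costs m comparisons. That gives the O((n − m + 1)m) bound of the naive algorithm.

```javascript
// Naive matching with an explicit inner loop, so each character
// comparison is visible. Returns every shift s where pattern occurs in text.
function naiveSearchAll(text, pattern) {
  const n = text.length;
  const m = pattern.length;
  const shifts = [];
  // Try every window start s = 0 .. n - m.
  for (let s = 0; s <= n - m; s++) {
    let j = 0;
    // Compare left to right and stop at the first mismatch.
    while (j < m && text[s + j] === pattern[j]) {
      j++;
    }
    if (j === m) {
      shifts.push(s); // all m characters matched
    }
  }
  return shifts;
}

naiveSearchAll('acaabc', 'aab'); // returns [2]
naiveSearchAll('aaaa', 'aa');    // returns [0, 1, 2] (overlapping matches)
```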
Rabin-Karp algorithm
Horner's rule
The idea of the algorithm is as follows:
- Treat a string as a number written in radix (base) d, where each character is a digit represented by a specific number (for example, its character code). The resulting number can be very large, so we reduce it modulo a prime q.
- Slide a window over the text (as in the naive algorithm above) and compare the value mod q of each window with that of the pattern. Thanks to Horner's rule, the value of the next window can be computed from the previous one in O(1) time.
- Equal values do not guarantee a match. Different strings can leave the same remainder (a spurious hit), so when the values agree the window must still be compared with the pattern character by character.
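Here are the same steps as formulas. The notation is the usual textbook one, not names from the code: pattern P[1..m], text T[1..n], radix d and prime modulus q. Horner's rule gives the pattern value p. Each window value t_s (for the window T[s+1..s+m]) then follows from the previous one in constant time, using a precomputed h:

$$
p_0 = 0, \qquad p_i = \left(d \cdot p_{i-1} + P[i]\right) \bmod q \quad (1 \le i \le m), \qquad p = p_m
$$
$$
h = d^{\,m-1} \bmod q
$$
$$
t_{s+1} = \Big(d\,\big(t_s - T[s+1]\,h\big) + T[s+m+1]\Big) \bmod q, \qquad 0 \le s < n - m
$$

Here t_0 is computed with the same Horner loop as p, and mod denotes the non-negative remainder. A match is only possible at shifts where t_s = p, and each such shift still needs a direct comparison.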
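Why choose a prime modulus?
The modulus q is usually chosen as a prime such that d·q (10q in the decimal case) just fits within one computer word. Then all the necessary arithmetic can be done with single-precision operations. A prime is the conventional choice because it tends to spread the remainders of different windows evenly, which keeps spurious hits rare.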
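JavaScript has no separate single-precision integer type. Every number is an IEEE-754 double, and integers are exact only up to `Number.MAX_SAFE_INTEGER` (2^53 − 1). So the requirement that d·q fit in one machine word becomes a different one: every intermediate value of the rolling update must stay below that limit. The sketch below is a rough check of this bound; `isSafeModulus` is a hypothetical helper, not part of the original article. It assumes character codes come from `charCodeAt`, so they lie between 0 and 0xFFFF.

```javascript
// Hypothetical helper: can the rolling update
//   D * (hash - c * h) + c2
// be evaluated exactly with JavaScript numbers?
// Assumes hash and h lie in [0, Q - 1] and character codes come from
// charCodeAt, so c and c2 lie in [0, 0xFFFF].
function isSafeModulus(D, Q) {
  const MAX_CODE = 0xffff;
  // Upper bound on the magnitude of any intermediate value:
  // D * MAX_CODE * (Q - 1) on the negative side, plus a character code.
  const worstCase = D * MAX_CODE * (Q - 1) + MAX_CODE;
  return worstCase <= Number.MAX_SAFE_INTEGER;
}

isSafeModulus(256, 101);        // true
isSafeModulus(256, 1000000007); // false
```

With D = 256, this bound allows primes up to roughly 5 × 10^8. A larger modulus such as 10^9 + 7 would need `BigInt` or a different reduction order.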
JavaScript implementation
var strStr = function (haystack, needle) {
  const Q = 101; // prime modulus
  const D = 256; // radix
  const N = haystack.length;
  const M = needle.length;
  if (M > N) return -1; // needle cannot fit inside haystack
  let hashHaystack = 0; // hash of the current window of haystack
  let hashNeedle = 0;
  let h = 1;
  // h = D^(M-1) mod Q: the weight of the window's leading character
  for (let i = 0; i < M - 1; i++) {
    h = (h * D) % Q;
  }
  // Horner's rule for the needle and for the first window
  for (let i = 0; i < M; i++) {
    hashNeedle = (D * hashNeedle + needle.charCodeAt(i)) % Q;
    hashHaystack = (D * hashHaystack + haystack.charCodeAt(i)) % Q;
  }
  for (let i = 0; i <= N - M; i++) {
    // Equal hashes may be a spurious hit, so confirm character by character
    if (hashNeedle === hashHaystack && haystack.substring(i, i + M) === needle) {
      return i;
    }
    if (i < N - M) {
      // Compute the next window's hash: drop haystack[i], append haystack[i + M]
      hashHaystack = (D * (hashHaystack - haystack.charCodeAt(i) * h) + haystack.charCodeAt(i + M)) % Q;
      if (hashHaystack < 0) {
        hashHaystack += Q; // JavaScript's % can return a negative remainder
      }
    }
  }
  return -1;
};
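To make spurious hits visible, here is a minimal sketch of a variant of the implementation above. It returns every match and also counts how often the hashes agree while the strings do not. `rabinKarpAll` and its `q` parameter are hypothetical names chosen for illustration. The radix 256 and the default modulus 101 mirror the code above.

```javascript
// Sketch: Rabin-Karp that returns all matches plus the number of
// spurious hits (equal hashes whose window differs from the pattern).
function rabinKarpAll(text, pattern, q = 101) {
  const D = 256;
  const n = text.length;
  const m = pattern.length;
  const result = { matches: [], spuriousHits: 0 };
  if (m === 0 || m > n) return result; // sketch choice: nothing to report

  // h = D^(m-1) mod q, the weight of the window's leading character
  let h = 1;
  for (let i = 0; i < m - 1; i++) h = (h * D) % q;

  // Horner's rule for the pattern and the first window
  let p = 0;
  let t = 0;
  for (let i = 0; i < m; i++) {
    p = (D * p + pattern.charCodeAt(i)) % q;
    t = (D * t + text.charCodeAt(i)) % q;
  }

  for (let s = 0; s <= n - m; s++) {
    if (p === t) {
      // Equal hashes: confirm character by character
      if (text.startsWith(pattern, s)) result.matches.push(s);
      else result.spuriousHits++;
    }
    if (s < n - m) {
      // Roll the window: drop text[s], append text[s + m]
      t = (D * (t - text.charCodeAt(s) * h) + text.charCodeAt(s + m)) % q;
      if (t < 0) t += q;
    }
  }
  return result;
}

rabinKarpAll('adg', 'a', 3); // { matches: [0], spuriousHits: 2 }
rabinKarpAll('adg', 'a');    // { matches: [0], spuriousHits: 0 }
```

The codes of a, d and g (97, 100, 103) all leave remainder 1 modulo 3, so the tiny modulus produces two false alarms. With q = 101 they are all distinct.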
Finite automata
A finite automaton, also known as a sequential machine, is an abstract mathematical model of a finite discrete digital system. A finite automaton M is given by a five-tuple (X, Y, S, δ, λ), where:
- X, Y and S are non-empty finite sets, called the input set, output set and state set of M respectively.
- δ is a mapping from the Cartesian product S×X to S, called the next-state function of M.
- λ is a single-valued mapping from S×X to Y, called the output function of M.
When δ is single-valued, M is called a deterministic finite automaton; when δ is multi-valued, M is called a nondeterministic finite automaton. A finite automaton can serve three purposes. As a sequence transducer, it converts an input sequence into an output sequence. As a sequence recognizer, it decides whether an input sequence has a certain property. As a sequence generator, it produces a sequence with the required properties.
These definitions are hard to digest on their own, but they become clearer in the context of this LeetCode problem. Put simply, each input symbol makes the automaton jump from one state to another.
Personally, I don't think finite-automaton matching is a good algorithm for this problem, because generating the state-transition table requires a lot of computation.
A deterministic finite automaton can be written as a five-tuple $M = (\Sigma, S, f, s_0, Z)$, where:
- $\Sigma$ is a finite alphabet; each of its elements is called an input symbol;
- $S$ is a finite set of states; each of its elements is called a state;
- $f$ is the transition function, a single-valued mapping $f: S \times \Sigma \to S$. $f(p, a) = q$ means that when the current state is $p$ and the input symbol is $a$, the automaton moves to the next state $q$, called a successor state of $p$;
- $s_0 \in S$ is the unique initial state;
- $Z \subseteq S$ is the set of terminal (accepting) states.
At each step, the next state is uniquely determined by the automaton's current state and the input symbol it reads. In other words, the transition function has a unique value. In the state-transition diagram this means that if $|\Sigma| = n$, every node has exactly n outgoing edges, and the labels on these edges are all different. This is why an automaton defined in this way is called a deterministic finite automaton.
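To tie the five-tuple (alphabet Σ, states S, transition function f, initial state s0, terminal states Z) to code, here is a minimal sketch of a deterministic automaton stored as plain JavaScript data. `runDFA` and the example automaton `endsWithAB` are hypothetical illustrations, not part of the original article. The example accepts strings over {a, b} that end in ab. Each state has exactly one outgoing transition per symbol, which is the determinism described above.

```javascript
// A DFA as plain data. f[state][symbol] gives the single successor state.
const endsWithAB = {
  alphabet: ['a', 'b'],    // Σ
  states: [0, 1, 2],       // S: 0 = no progress, 1 = just read "a", 2 = just read "ab"
  f: [
    { a: 1, b: 0 },        // transitions from state 0
    { a: 1, b: 2 },        // transitions from state 1
    { a: 1, b: 0 },        // transitions from state 2
  ],
  start: 0,                // s0
  accepting: new Set([2]), // Z
};

// Feed the input one symbol at a time; each symbol moves the
// automaton to exactly one successor state.
function runDFA(dfa, input) {
  let state = dfa.start;
  for (const symbol of input) {
    if (!dfa.alphabet.includes(symbol)) {
      throw new Error(`symbol ${symbol} is not in the alphabet`);
    }
    state = dfa.f[state][symbol];
  }
  return dfa.accepting.has(state); // accept iff we stop in a terminal state
}

runDFA(endsWithAB, 'aab');  // true
runDFA(endsWithAB, 'abba'); // false
```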
Pseudocode
JavaScript implementation
Building the state-transition table is straightforward:
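// Builds the transition table of the string-matching automaton for needle.
// map[i][j] is the next state when i characters of needle are matched
// and the next input character is wordmap[j].
function createStatusMap(needle) {
  // let wordmap = 'abcdefghijklmnopqrstuvwxyz';
  let wordmap = 'abc'; // input alphabet (only a, b, c for this example)
  let len = needle.length;
  let wordmapLength = wordmap.length;
  let map = [];
  for (let i = 0; i < len + 1; i++) {
    let template = needle.slice(0, i); // characters already matched in state i
    map.push([]);
    for (let j = 0; j < wordmapLength; j++) {
      // Try each possible next input character
      let status = 0; // default: no suffix matches, fall back to state 0
      let output = template + wordmap[j]; // the string after reading wordmap[j]
      let k = template.length;
      for (let q = 0; q < k + 1; q++) {
        // Find the longest suffix of output that is also a prefix of needle;
        // its length (k + 1 - q) is the next state
        if (output.slice(q) === needle.substring(0, k + 1 - q)) {
          status = i + 1 - q;
          break;
        }
      }
      map[i][j] = status; // record the transition in the table
    }
  }
  return map;
}
console.log(createStatusMap('ababaca'));
// rows = states 0..7, columns = a, b, c:
// [[1,0,0],[1,2,0],[3,0,0],[1,4,0],[5,0,0],[1,4,6],[7,0,0],[1,2,0]]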
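Building the table is only half of the method; the other half feeds the text through the automaton. Below is a minimal sketch of that matching step. `buildTransitionTable` and `finiteAutomatonMatch` are hypothetical names. The builder is a compact re-implementation of the same suffix/prefix idea as `createStatusMap`, included so the block runs on its own. The sketch assumes a non-empty pattern made only of alphabet characters. Once the table exists, each text character costs one table lookup, so matching is Θ(n). The expense moves into building the table, which is the concern raised above.

```javascript
// Compact table builder: the next state is the length of the longest
// prefix of pattern that is a suffix of (matched prefix + new character).
function buildTransitionTable(pattern, alphabet) {
  const m = pattern.length;
  const table = [];
  for (let state = 0; state <= m; state++) {
    table.push({});
    for (const ch of alphabet) {
      const seen = pattern.slice(0, state) + ch;
      let k = Math.min(m, state + 1);
      // Shrink k until pattern[0..k) is a suffix of seen
      while (k > 0 && !seen.endsWith(pattern.slice(0, k))) k--;
      table[state][ch] = k;
    }
  }
  return table;
}

// Run the automaton over the text with one lookup per character and
// report each shift where the accepting state m is reached.
function finiteAutomatonMatch(text, pattern, alphabet) {
  const table = buildTransitionTable(pattern, alphabet);
  const m = pattern.length;
  const shifts = [];
  let state = 0;
  for (let i = 0; i < text.length; i++) {
    const next = table[state][text[i]];
    // A character outside the alphabet cannot be part of a match
    state = next === undefined ? 0 : next;
    if (state === m) shifts.push(i - m + 1);
  }
  return shifts;
}

finiteAutomatonMatch('abababacaba', 'ababaca', 'abc'); // returns [2]
```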
Knuth-Morris-Pratt algorithm
Suppose we match the pattern aaab against the text aaaabaac in the ordinary (naive) way. At the fourth character, aaaa does not match aaab, even though the first three characters already matched. Both pointers must then roll back: the pattern pointer returns to the start, and the text pointer moves back to the next starting position. The three characters already matched are thrown away, and in some cases this back-tracking is very time-consuming.
So we can precompute a table for this. When n characters have already been matched, it tells us which part of the pattern still needs to be re-matched. With it, the text-side pointer never rolls back, and the pattern does not have to be re-matched from the start every time.
The table means the following. When the first i characters have matched and the match fails at position i + 1, π[i] characters are still guaranteed to match. Take the pattern ababaca with i = 5 characters matched (ababa). The longest proper suffix of ababa that is also a prefix of ababaca is aba, so π[5] = 3. At least three characters are already matched, and matching resumes from the fourth pattern character.
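The table can be computed straight from this definition. The brute-force sketch below is far slower than the real construction, but it makes the meaning concrete and can be used to cross-check a faster version. `prefixTableByDefinition` is a hypothetical name. Index i of the result holds the value for the first i + 1 characters, so the i = 5 example above sits at index 4.

```javascript
// Compute the table from its definition: for each prefix pattern[0..i],
// the length of the longest *proper* suffix that is also a prefix of
// pattern. Cubic time in the worst case; meant for illustration only.
function prefixTableByDefinition(pattern) {
  const table = [];
  for (let i = 0; i < pattern.length; i++) {
    const prefix = pattern.slice(0, i + 1);
    let best = 0;
    // Proper suffixes only, so len stops before prefix.length
    for (let len = 1; len < prefix.length; len++) {
      if (prefix.endsWith(pattern.slice(0, len))) best = len;
    }
    table.push(best);
  }
  return table;
}

prefixTableByDefinition('ababaca'); // returns [0, 0, 1, 2, 3, 0, 1]
```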
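JavaScript implementation
// The core of KMP is building this table (the prefix function π).
// map[q] is the length of the longest proper suffix of pattern[0..q]
// that is also a prefix of pattern.
function createMap(pattern) {
  let len = pattern.length;
  let map = [...Array(len)].map(() => 0); // initialize the table with zeros
  let q = 1; // pointer: end of the prefix currently being processed
  let k = 0; // pointer: length of the currently matched prefix
  for (; q < pattern.length; q++) {
    // On a mismatch, fall back to the next shorter candidate prefix
    while (k > 0 && pattern[q] !== pattern[k]) {
      k = map[k - 1];
    }
    if (pattern[q] === pattern[k]) {
      // Characters are equal: the longest common prefix/suffix grows by 1
      k++;
    }
    map[q] = k;
  }
  return map;
}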
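The table alone does not perform the search. Below is a minimal sketch of the search loop that uses it. `kmpSearch` and `buildPrefixTable` are hypothetical names, and the latter repeats `createMap` so the block runs on its own. Returning [0] for an empty pattern mirrors `strStr`. Take the example of text aaaabaac and pattern aaab. At the mismatch on the fourth text character, the text index stays put, and the pattern simply falls back from 3 matched characters to table[2] = 2.

```javascript
// Same construction as createMap: table[q] is the length of the longest
// proper suffix of pattern[0..q] that is also a prefix of pattern.
function buildPrefixTable(pattern) {
  const table = new Array(pattern.length).fill(0);
  let k = 0;
  for (let q = 1; q < pattern.length; q++) {
    while (k > 0 && pattern[q] !== pattern[k]) k = table[k - 1];
    if (pattern[q] === pattern[k]) k++;
    table[q] = k;
  }
  return table;
}

// KMP search: i only moves forward through the text; on a mismatch,
// only the count of matched pattern characters (k) falls back.
function kmpSearch(text, pattern) {
  const m = pattern.length;
  if (m === 0) return [0]; // sketch choice, mirroring strStr
  const table = buildPrefixTable(pattern);
  const shifts = [];
  let k = 0; // number of pattern characters currently matched
  for (let i = 0; i < text.length; i++) {
    while (k > 0 && text[i] !== pattern[k]) k = table[k - 1];
    if (text[i] === pattern[k]) k++;
    if (k === m) {
      shifts.push(i - m + 1); // a full match ends at i
      k = table[k - 1];       // continue, allowing overlapping matches
    }
  }
  return shifts;
}

kmpSearch('aaaabaac', 'aaab'); // returns [1]
kmpSearch('abababa', 'aba');   // returns [0, 2, 4]
```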