Study notes 8: regular expressions

Regular expressions are patterns used to match character combinations in strings. ECMAScript supports regular expressions through the RegExp type, and regular expressions are also objects.

1 Create a regular expression

You can use the following two methods to construct a regular expression:
(1) Use a regular expression literal

let expression = /pattern/flags;

A regular expression literal is compiled when the script is loaded. If the regular expression does not change, this form gives better performance.
The pattern can be any simple or complex regular expression, including character classes, quantifiers, grouping, lookaheads, and backreferences. Each regular expression can carry zero or more flags that control its behavior. The flags for the matching modes are as follows:

  • g: Global mode, which means the entire string is searched for all matches instead of stopping after the first match.
  • i: Case-insensitive mode, which means the case of the pattern and the string is ignored when searching for a match.
  • m: Multi-line mode, which means ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
  • y: Sticky mode, which means a match is attempted only at the position given by lastIndex.
  • u: Unicode mode, which enables Unicode matching.
  • s: dotAll mode, which means the metacharacter . matches any character (including \n or \r).

Different patterns and flags can be combined to create a variety of regular expressions:

// Match all instances of "at" in the string
let pattern1 = /at/g;

// Match the first "bat" or "cat", ignoring case
let pattern2 = /[bc]at/i;

// Match all three-character combinations ending in "at", ignoring case
let pattern3 = /.at/gi;
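
The m and s flags are easiest to understand by comparing a pattern with and without them. The following is a minimal sketch; the sample text and variable names are made up for illustration.

```javascript
// Two lines of sample text (invented for this sketch).
const lines = "first line\nsecond line";

// Without m, ^ matches only at the very start of the string.
const startOnly = lines.match(/^\w+/g);         // ["first"]

// With m, ^ also matches right after each line break.
const everyLine = lines.match(/^\w+/gm);        // ["first", "second"]

// Without s, "." does not match a line break, so this pattern fails.
const dotDefault = /first.+second/.test(lines); // false

// With s, "." matches the line break as well.
const dotAll = /first.+second/s.test(lines);    // true

console.log(startOnly, everyLine, dotDefault, dotAll);
```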

As with regular expressions in other languages, all metacharacters must be escaped when they are meant as literal text in the pattern, including:
( [ { \ ^ $ | ) ] } ? * + .
Each metacharacter has one or more special functions in a regular expression, so to match any of these characters literally, you must escape it with a backslash.

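// Match the first "bat" or "cat", ignoring case
let pattern1 = /[bc]at/i;

// Match the first "[bc]at", ignoring case
let pattern2 = /\[bc\]at/i;

// Match all three-character combinations ending in "at", ignoring case
let pattern3 = /.at/gi;

// Match all instances of ".at", ignoring case
let pattern4 = /\.at/gi;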

Special characters in regular expressions:
https://developer.mozilla.org/zh-CN/docs/Web/JavaScript/Guide/Regular_Expressions

(2) Call the RegExp constructor
A regular expression created with the constructor is compiled at runtime, while the script is running. If the pattern will change, or is generated dynamically from a source such as user input, use the constructor.
The regular expressions in the previous examples are all defined as literals. Regular expressions can also be created with the RegExp constructor, which accepts two arguments: a pattern string and an (optional) flags string. Any regular expression defined as a literal can also be created through the constructor:

// Match the first "bat" or "cat", ignoring case
let pattern1 = /[bc]at/i;

// Same as pattern1, but created with the constructor
let pattern2 = new RegExp("[bc]at","i");

Here pattern1 and pattern2 are equivalent regular expressions. **Note that both parameters of the RegExp constructor are strings.** Because the pattern parameter is a string, some characters need to be escaped twice. Every metacharacter escape must be doubled, and so must escape sequences such as \n: the backslash is itself the escape character inside a string, so \n in a literal is written as "\\n" in the pattern string, and a literal backslash (\\ in a literal) is written as "\\\\".
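
The double-escaping rule can be checked by comparing the source of a literal with the source of its constructor equivalent. This is a small sketch; the patterns and variable names are chosen only for illustration.

```javascript
// Each pair describes the same pattern: first as a literal, then as a string.
const literalBracket = /\[bc\]at/;
const stringBracket = new RegExp("\\[bc\\]at");

const literalDigits = /\d{3}\.\d{2}/;
const stringDigits = new RegExp("\\d{3}\\.\\d{2}");

// A literal backslash: \\ in a literal becomes "\\\\" in a string.
const literalBackslash = /name\\age/;
const stringBackslash = new RegExp("name\\\\age");

console.log(literalBracket.source === stringBracket.source);     // true
console.log(literalDigits.source === stringDigits.source);       // true
console.log(literalBackslash.source === stringBackslash.source); // true

// The string "name\\age" contains a single backslash character.
console.log(stringBackslash.test("name\\age"));                  // true
```
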
In addition, the RegExp constructor can copy an existing regular expression instance and optionally change its flags:

const re1 = /cat/g;
console.log(re1); //"/cat/g"

const re2 = new RegExp(re1);
console.log(re2); //"/cat/g"

const re3 = new RegExp(re1,"i");
console.log(re3); //"/cat/i"
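
When the pattern comes from user input, the constructor is the only option, and any metacharacters in that input usually need escaping first. The sketch below assumes a hypothetical helper named escapeRegExp and an invented input string; neither comes from the original notes.

```javascript
// Hypothetical helper: backslash-escape every metacharacter so that the
// input is matched as literal text.
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Pretend this string came from a search box.
const userInput = "1+1=2?";

// Built at runtime from the escaped input.
const userPattern = new RegExp(escapeRegExp(userInput), "i");

// Without escaping, + and ? act as quantifiers instead of literal text.
const unescapedPattern = new RegExp(userInput, "i");

console.log(userPattern.source);                     // 1\+1=2\?
console.log(userPattern.test("Is 1+1=2? Yes"));      // true
console.log(userPattern.test("Is 11=2 Yes"));        // false
console.log(unescapedPattern.test("Is 11=2 Yes"));   // true
```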

2 Instance properties

Each RegExp instance has the following properties, which provide information about various aspects of the pattern:

  • global: Boolean value, indicating whether the g flag is set.
  • ignoreCase: Boolean value, indicating whether the i flag is set.
  • unicode: Boolean value, indicating whether the u flag is set.
  • sticky: Boolean value, indicating whether the y flag is set.
  • lastIndex: an integer, which represents the starting position of the next search in the source string, always starting from 0.
  • multiline: Boolean value, indicating whether the m flag is set.
  • dotAll: Boolean value, indicating whether the s flag is set.
  • source: the literal string of the regular expression, without the leading and trailing slashes.
  • flags: the flag string of the regular expression. Like source, it is always returned in literal form rather than as the string passed to the constructor.
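
As a quick check of the properties listed above, the following sketch reads them from a literal and from an equivalent constructor call. The variable names are illustrative only.

```javascript
const literalPattern = /\[bc\]at/gi;
const builtPattern = new RegExp("\\[bc\\]at", "ig");

console.log(literalPattern.global);     // true
console.log(literalPattern.ignoreCase); // true
console.log(literalPattern.multiline);  // false
console.log(literalPattern.lastIndex);  // 0
console.log(literalPattern.source);     // \[bc\]at
console.log(literalPattern.flags);      // gi

// source and flags come back in literal form for the constructor version too:
// single backslashes, and flags in canonical order rather than as typed ("ig").
console.log(builtPattern.source);       // \[bc\]at
console.log(builtPattern.flags);        // gi
```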

3 Instance methods

The main method of a RegExp instance is exec(), which is designed for use with capture groups. It accepts a single argument, the string to which the pattern is applied. If a match is found, it returns an array containing information about the first match; if not, it returns null. Although the returned array is an instance of Array, it has two additional properties: index and input. index is the position in the string where the match starts, and input is the string that was searched. The first element of the array is the string that matches the entire pattern, and the remaining elements are the strings matched by the capture groups in the expression. If the pattern has no capture groups, the array contains only one element.

let text = "mom and dad and baby";
let pattern = /mom( and dad( and baby)?)?/gi;

let matches = pattern.exec(text);
console.log(matches.index); //0
console.log(matches.input); //"mom and dad and baby"
console.log(matches[0]); //"mom and dad and baby"
console.log(matches[1]); //" and dad and baby"
console.log(matches[2]); //" and baby"

In this example, the pattern contains two capture groups: the inner one matches " and baby", and the outer one matches " and dad" or " and dad and baby". Note that each group starts with a space, so the captured strings do too. Calling exec() finds a match. Because the entire string matches the pattern, the index property of the matches array is 0. The first element of the array is the entire matched string, the second element is the string matched by the first capture group, and the third element is the string matched by the second capture group.
If the pattern has the global flag set, each call to exec() returns information about the next match. If the global flag is not set, exec() returns information about the first match only, no matter how many times it is called on the same string.

let text = "cat,bat,sat,fat";
let pattern = /.at/;

let matches = pattern.exec(text);
console.log(matches.index);//0
console.log(matches[0]); //cat
console.log(pattern.lastIndex);//0

matches = pattern.exec(text);
console.log(matches.index);//0
console.log(matches[0]); //cat
console.log(pattern.lastIndex); //0

The pattern in the above example does not set the global flag, so calling exec() only returns the first match ("cat"). lastIndex is always unchanged in non-global mode.
If the g flag is set on this pattern, each call to exec() will search forward in the string for the next match:

let text = "cat,bat,sat,fat";
let pattern = /.at/g;
let matches = pattern.exec(text);
console.log(matches.index);//0
console.log(matches[0]);//cat
console.log(pattern.lastIndex);//3

matches = pattern.exec(text);
console.log(matches.index);//4
console.log(matches[0]);//bat
console.log(pattern.lastIndex);//7

matches = pattern.exec(text);
console.log(matches.index);//8
console.log(matches[0]);//sat
console.log(pattern.lastIndex);//11

This time the pattern has the global flag set, so each call to exec() returns the next match in the string until the end of the string is reached. Note that the pattern's lastIndex property changes on every call. In global mode, lastIndex is updated after each call to exec() to the index immediately after the last character of the most recent match.
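
Because exec() returns null once no further match exists, a global pattern is commonly driven by a loop. The following is a minimal sketch of that idiom; the variable names and the shape of the collected objects are chosen for illustration.

```javascript
const animalText = "cat,bat,sat,fat";
const animalPattern = /.at/g;
const found = [];

let match;
// Each call resumes at lastIndex. When exec() finally returns null,
// it also resets lastIndex to 0.
while ((match = animalPattern.exec(animalText)) !== null) {
  found.push({
    text: match[0],
    index: match.index,
    next: animalPattern.lastIndex,
  });
}

console.log(found.map((item) => item.text)); // ["cat", "bat", "sat", "fat"]
console.log(found[3].index, found[3].next);  // 12 15
console.log(animalPattern.lastIndex);        // 0
```
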
If the pattern has the sticky flag y set, each call to exec() looks for a match only at the position given by lastIndex. The sticky flag overrides the global flag:

let text = "cat,bat,sat,fat";
let pattern = /.at/y;

let matches = pattern.exec(text);
console.log(matches.index); //0
console.log(matches[0]); //cat
console.log(pattern.lastIndex); //3

// No match starts at the character at index 3, so exec() returns null
// Because exec() found no match, it resets lastIndex to 0
matches = pattern.exec(text);
console.log(matches); //null
console.log(pattern.lastIndex); //0

// Advancing lastIndex lets the sticky pattern find the next match via exec()
pattern.lastIndex = 4;
matches = pattern.exec(text);
console.log(matches.index); //4
console.log(matches[0]); //bat
console.log(pattern.lastIndex); //7

Another regular expression method is test(), which accepts a string argument. If the input text matches the pattern, the method returns true; otherwise it returns false. This method suits cases where you only need to know whether the pattern matches and do not need the matched content. test() is often used in if statements:

let text = "000-00-0000";
let pattern = /\d{3}-\d{2}-\d{4}/;

if (pattern.test(text)){
	console.log("The pattern was matched.");
}

This usage is common for validating user input, where we only care whether the input is valid, not why it is invalid. Regardless of how a regular expression is created, the inherited methods toLocaleString() and toString() return its literal representation:

let pattern = new RegExp("\\[bc\\]at","gi");
console.log(pattern.toString()); // /\[bc\]at/gi
console.log(pattern.toLocaleString()); // /\[bc\]at/gi

4 RegExp constructor properties

The RegExp constructor itself also has several attributes. (In other languages, such attributes are called static attributes.) These attributes apply to all expressions in the scope, and will vary based on the last regular expression operation performed. Another feature of these properties is that they can be accessed in two different ways. In other words, every attribute has a full name and a shorthand:

Full name      Shorthand   Description
input          $_          The last string searched (non-standard feature)
lastMatch      $&          The last matched text
lastParen      $+          The last matched capture group (non-standard feature)
leftContext    $`          The text that appears before lastMatch in the input string
rightContext   $'          The text that appears after lastMatch in the input string

let text = "this has been a short summer";
let pattern = /(.)hort/g;

if(pattern.test(text)){
	console.log(RegExp.input);//"this has been a short summer"
	console.log(RegExp.leftContext);//"this has been a "
	console.log(RegExp.rightContext);//" summer"
	console.log(RegExp.lastMatch);//"short"
	console.log(RegExp.lastParen);//"s"
}

These attribute names can also be replaced with abbreviations, but they must be accessed using bracket syntax, such as:

let text = "this has been a short summer";
let pattern = /(.)hort/g;

if(pattern.test(text)){
	console.log(RegExp["$`"]);
}

RegExp also has constructor properties that store up to nine capture group matches. They are accessed as RegExp.$1 through RegExp.$9 and contain the matches for the first through ninth capture groups, respectively. These properties are filled in when exec() or test() is called, and can then be used as follows:

let text = "this has been a short summer";
let pattern = /(..)or(.)/g;

if(pattern.test(text)){
	console.log(RegExp.$1);//sh
	console.log(RegExp.$2);//t
}

In this example, the pattern contains two capture groups.
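
Since the constructor properties are shared by every regular expression operation, it can be useful to read the same values from the array that exec() returns. The sketch below assumes a hypothetical helper named describeMatch, which is not part of the original notes.

```javascript
// Hypothetical helper: collects the values that the RegExp constructor
// properties would expose, using only the result of exec().
function describeMatch(pattern, text) {
  const result = pattern.exec(text);
  if (result === null) {
    return null;
  }
  const matched = result[0];
  return {
    input: result.input,                                     // like RegExp.input
    lastMatch: matched,                                      // like RegExp.lastMatch
    leftContext: text.slice(0, result.index),                // like RegExp.leftContext
    rightContext: text.slice(result.index + matched.length), // like RegExp.rightContext
    groups: result.slice(1),                                 // like RegExp.$1, RegExp.$2, ...
  };
}

const summary = describeMatch(/(..)or(.)/, "this has been a short summer");
console.log(summary.lastMatch);    // "short"
console.log(summary.leftContext);  // "this has been a "
console.log(summary.rightContext); // " summer"
console.log(summary.groups);       // ["sh", "t"]
```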

Origin blog.csdn.net/qq_43599049/article/details/113107771