Regular expression objects and their properties

Regular expression objects

Regular expression objects are used to match text against a pattern.

How to create a regular expression object

  1. Literal form: the pattern is written between slashes, e.g. /pattern/flags.
  2. Constructor: the object is created with new RegExp().

For example, the following three forms create regular expression objects with the same effect:

/ab+c/i;
new RegExp('ab+c', 'i');
new RegExp(/ab+c/, 'i');

The difference is when the pattern is compiled. The literal form is compiled when the expression is evaluated, while the constructor, such as new RegExp('ab+c'), compiles the regular expression at runtime.

RegExp() constructor

Syntax:

new RegExp(pattern[, flags])
RegExp(pattern[, flags])

The first parameter, pattern, can be a string or a regular expression object; the second parameter, flags, is a string of flag characters. If pattern is a regular expression object and flags is given, the new flags replace the flags of that object. If flags is omitted, the flags of that object are kept. The available flags are listed in the flags section below.
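The flag rules above can be checked with a short sketch. The variable names are illustrative only and are not part of the original text.

```javascript
// Copying a regular expression with the constructor:
// the flags are either replaced or inherited.
const original = /ab+c/g;

const replaced = new RegExp(original, 'i'); // second argument replaces the "g" flag
const inherited = new RegExp(original);     // no second argument: flags are kept

console.log(replaced.flags);  // "i"
console.log(inherited.flags); // "g"
console.log(replaced.source === original.source); // true

// A string pattern needs its backslashes escaped, a literal does not.
const fromString = new RegExp('\\d+');
console.log(fromString.source); // "\d+"
```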

Properties of regular expression objects

  • lastIndex
  • flags
  • dotAll
  • global
  • ignoreCase
  • multiline
  • source
  • sticky
  • unicode
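As a quick illustration, the properties listed above can be read from any regular expression object. The pattern /ab+c/gi is just an example chosen for this sketch.

```javascript
const re = /ab+c/gi;

console.log(re.source);     // "ab+c"
console.log(re.flags);      // "gi"
console.log(re.global);     // true
console.log(re.ignoreCase); // true
console.log(re.multiline);  // false
console.log(re.dotAll);     // false
console.log(re.sticky);     // false
console.log(re.unicode);    // false
console.log(re.lastIndex);  // 0
```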

lastIndex

lastIndex is a readable and writable integer property of a regular expression object. It specifies the index at which the next match starts.

This property only takes effect when the regular expression uses the "g" (global) or "y" (sticky) flag. It changes according to the following rules:

  • If lastIndex is greater than the length of the string, regexp.test and regexp.exec will fail to match, and lastIndex will be set to 0.
  • If lastIndex is equal to the length of the string, and the regular expression matches an empty string, then the regular expression matches the string starting from lastIndex.
  • If lastIndex is equal to the length of the string, and the regular expression does not match an empty string, then the regular expression does not match the string and lastIndex is set to 0.
  • Otherwise, lastIndex is set to the position immediately following the last successful match.
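The rules above can be traced step by step. This is a minimal sketch with an arbitrary pattern and string.

```javascript
const re = /o/g;
const text = 'foo';

console.log(re.lastIndex);  // 0
console.log(re.test(text)); // true, matched the "o" at index 1
console.log(re.lastIndex);  // 2
console.log(re.test(text)); // true, matched the "o" at index 2
console.log(re.lastIndex);  // 3
console.log(re.test(text)); // false, nothing left to match
console.log(re.lastIndex);  // 0, reset after the failed match

// lastIndex is writable: a value past the end of the string makes the match fail.
re.lastIndex = 10;
console.log(re.test(text)); // false
console.log(re.lastIndex);  // 0
```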

Methods of regular expression objects

exec()

The exec() method searches for a match in a specified string and returns a result array or null. The argument is converted to a string, so a regular expression object can also be passed, in which case it is treated as its string form, including the flags.

When the global or sticky flag is set (e.g. /foo/g or /foo/y), JavaScript RegExp objects are stateful: they record the position after the last successful match in the lastIndex property. Using this feature, exec() can walk through multiple matches in a single string one by one (including captured groups). In contrast, String.prototype.match() returns only the matched strings.

If the match succeeds, exec() returns an array and, when the g or y flag is set, updates the lastIndex property of the regular expression object. The fully matched text is the first item of the returned array. From the second item on, each item holds the text matched by the corresponding capturing parentheses in the regular expression.

If the match fails, exec() returns null and resets lastIndex to 0.
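The stateful behaviour described above is what makes the classic exec() loop work. The sketch below uses a made-up input string and collects each match together with the lastIndex value recorded after it.

```javascript
const re = /(\w+)@(\w+)/g;
const text = 'alice@home bob@work';

const found = [];
let match;
while ((match = re.exec(text)) !== null) {
  // match[0] is the full match, match[1] and match[2] are the captured groups.
  found.push({ user: match[1], host: match[2], index: match.index, next: re.lastIndex });
}

console.log(found);
// [ { user: 'alice', host: 'home', index: 0, next: 10 },
//   { user: 'bob', host: 'work', index: 11, next: 19 } ]
console.log(re.lastIndex); // 0, reset by the final failed exec()
```

Without the g flag this loop would never end, because every call would start again from index 0.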

The following example shows the properties of the returned array:

// Match "quick brown" followed by "jumps", ignoring characters in between
// Remember "brown" and "jumps"
// Ignore case
var re = /quick\s(brown).+?(jumps)/ig;
var result = re.exec('The Quick Brown Fox Jumps Over The Lazy Dog');
Object  Property/index  Description                                         Example
result  [0]             The full matched text                               Quick Brown Fox Jumps
        [1], …[n]       Text captured by the parenthesized groups           [1] = Brown, [2] = Jumps
        index           0-based index of the match in the original string   4
        input           The original string                                 The Quick Brown Fox Jumps Over The Lazy Dog

test()

The test() method checks whether the regular expression matches the specified string and returns true or false. As with exec(), the argument is converted to a string, so a regular expression object can also be passed. Use test() when you only want to know whether a pattern exists in a string. It is similar to String.prototype.search(); the difference is that test() returns a boolean, while search() returns the index of the match (if found) or -1 (if not found).

If the regular expression has the global flag set, calling test() changes the lastIndex property of the regular expression.
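A short sketch of why this matters: calling test() repeatedly on the same global regular expression can give alternating results, while a regular expression without the g flag is unaffected. The variable names are illustrative.

```javascript
const stateful = /foo/g;
console.log(stateful.test('foo')); // true, lastIndex is now 3
console.log(stateful.test('foo')); // false, the search started at index 3
console.log(stateful.test('foo')); // true again, lastIndex had been reset to 0

const stateless = /foo/;
console.log(stateless.test('foo')); // true
console.log(stateless.test('foo')); // true
console.log(stateless.lastIndex);   // 0, never updated without g or y
```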

Methods called by the corresponding String methods

[@@match](), [@@matchAll](), [@@replace](), [@@search]() and [@@split]() are called internally by the corresponding String.prototype methods, with the parameters in a different order. RegExp subclasses can override these methods to change the default behavior.
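As a sketch of such an override, the hypothetical subclass below (CountingRegExp is a name invented for this example) redefines [Symbol.match] so that match() returns the number of matches instead of an array.

```javascript
// Hypothetical subclass, used only for illustration.
class CountingRegExp extends RegExp {
  [Symbol.match](str) {
    // Delegate to the built-in behaviour, then post-process the result.
    const result = RegExp.prototype[Symbol.match].call(this, str);
    return result === null ? 0 : result.length;
  }
}

const counter = new CountingRegExp('a', 'g');

// str.match(re) calls re[Symbol.match](str): same operation, swapped receiver and argument.
console.log('banana'.match(counter));         // 3
console.log(counter[Symbol.match]('banana')); // 3
console.log('banana'.match(/a/g));            // [ 'a', 'a', 'a' ]
```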

match()

The match() method retrieves the result of matching a string against a regular expression.

  • Syntax:
str.match(regexp)
  • Parameters:

regexp: a regular expression object. If a non-RegExp object is passed, it is implicitly converted to a RegExp with new RegExp(obj). If match() is called with no argument at all, the result is an Array containing one empty string: [""].

  • Return value:

An Array whose content depends on the presence of the global (g) flag, or null if no match is found.

    • If the g flag is used, all results matching the full regular expression are returned, but capturing groups are not.
    • If the g flag is not used, only the first complete match and its capturing groups are returned (as an Array). In this case the returned array has the additional properties described below.
  • Additional properties:

    • groups: an object of named capturing groups, or undefined if no named capturing groups are defined.
    • index: the index at which the match starts.
    • input: the string that was searched.

If the regular expression does not contain the g flag, str.match() returns the same result as RegExp.prototype.exec().
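The two return shapes can be compared side by side. The sample sentence and group names below are made up for this sketch.

```javascript
const text = 'The year 2019 came before 2020';

// With g: every full match, no capturing groups.
const all = text.match(/(\d{2})(\d{2})/g);
console.log(all); // [ '2019', '2020' ]

// Without g: the first match plus its groups, index and input.
const first = text.match(/(?<century>\d{2})(?<rest>\d{2})/);
console.log(first[0]);             // "2019"
console.log(first[1], first[2]);   // "20" "19"
console.log(first.index);          // 9
console.log(first.groups.century); // "20"
console.log(first.input === text); // true

// No match gives null, no argument gives an array holding one empty string.
console.log(text.match(/xyz/));    // null
console.log(text.match()[0]);      // ""
```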

matchAll()

The matchAll() method returns an iterator of all results matching a string against a regular expression, including capturing groups.

  • Syntax:
str.matchAll(regexp)
  • Parameters:

regexp: a regular expression object. If the argument is not a regular expression object, it is implicitly converted to a RegExp with the g flag, using new RegExp(obj, 'g'). Note: a RegExp passed in must have the global flag g set, otherwise a TypeError is thrown.

  • Return value:

An iterator (not restartable: once the results are exhausted, call the method again to obtain a new iterator).

Before matchAll() was available, all match information had to be collected by calling regexp.exec() in a loop (with the /g flag on the regexp). With matchAll() the while loop with exec() is no longer needed, although the regular expression still needs the /g flag. matchAll() returns an iterator, which is convenient to consume with for...of, array spread, or Array.from(). If the /g flag is missing, matchAll() throws an exception. Unlike exec(), matchAll() does not change the lastIndex of the regular expression.
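A minimal sketch of the iterator in use, with an arbitrary sample string. It also shows the TypeError raised for a regular expression without the g flag.

```javascript
const text = 'a1 b2 c3';
const re = /([a-z])(\d)/g;

const pairs = [];
for (const m of text.matchAll(re)) {
  // Each result has the same shape as an exec() result.
  pairs.push([m[1], m[2], m.index]);
}
console.log(pairs);        // [ [ 'a', '1', 0 ], [ 'b', '2', 3 ], [ 'c', '3', 6 ] ]
console.log(re.lastIndex); // 0, the original regular expression is left untouched

// The iterator can also be spread into an array.
const fullMatches = [...text.matchAll(re)].map((m) => m[0]);
console.log(fullMatches);  // [ 'a1', 'b2', 'c3' ]

// Without the g flag a TypeError is thrown.
let threw = false;
try {
  text.matchAll(/([a-z])(\d)/);
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true
```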

search()

The search() method performs a search match between the regular expression and the String object. If the match is successful, search() returns the index of the first match of the regular expression in the string; otherwise, it returns -1.
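For illustration, a few calls on a sample string:

```javascript
const text = 'hello world';

const firstO = text.search(/o/);
const missing = text.search(/z/);

console.log(firstO);  // 4, index of the first "o"
console.log(missing); // -1, no match

// test() answers the same question with a boolean instead of an index.
console.log(/o/.test(text)); // true
console.log(/z/.test(text)); // false
```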

replace()

The replace() method returns a new string in which some or all matches of a pattern are replaced by a replacement. The pattern can be a string or a regular expression, and the replacement can be a string or a callback function that is called for every match. If the pattern is a string, only the first match is replaced. The original string is never changed.

  • Syntax:
str.replace(regexp|substr, newSubStr|function)
  • Parameters:

    • regexp (pattern): a RegExp object or literal. The text matched by the regular expression is replaced by the value given by the second parameter.
    • substr (pattern): a string to be replaced by newSubStr. It is treated as a literal string, not as a regular expression. Only the first match is replaced.
    • newSubStr (replacement): a string that replaces the part of the original string matched by the first parameter. Some special replacement patterns can be used in this string.
    • function (replacement): a function that creates the new substring. Its return value replaces the text matched by the first parameter.

When the replacement is a string, it can include the following special replacement patterns:

Pattern  Inserts
$$       A literal "$".
$&       The matched substring.
$`       The portion of the string to the left of the matched substring.
$'       The portion of the string to the right of the matched substring.
$n       If the first parameter is a RegExp object and n is a positive integer less than 100, the string matched by the n-th capturing group. Note: the index starts from 1.
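Each pattern in the table can be tried on a small input. The strings below are examples chosen for this sketch.

```javascript
const text = 'John Smith';

const swapped = text.replace(/(\w+)\s(\w+)/, '$2, $1');
const bracketed = text.replace(/Smith/, '[$&]');
const dollar = text.replace(/ /, '$$');
const context = 'a-b'.replace(/-/, "($`|$')");

console.log(swapped);   // "Smith, John"
console.log(bracketed); // "John [Smith]"
console.log(dollar);    // "John$Smith"
console.log(context);   // "a(a|b)b"

// A string pattern replaces only the first occurrence; a global regex replaces all.
const firstOnly = 'aaa'.replace('a', 'b');
const every = 'aaa'.replace(/a/g, 'b');
console.log(firstOnly); // "baa"
console.log(every);     // "bbb"
```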

When a function is given as the second parameter, its return value is used as the replacement string. (Note: the special replacement patterns above do not apply in this case.) If the first parameter is a regular expression with the global flag, the function is called multiple times, once for each match.

The function receives the following parameters:

Parameter          Value
match              The matched substring. (Corresponds to $& above.)
p1, p2, …          If the first parameter of replace() is a RegExp object, the string matched by the n-th capturing group. (Corresponds to $1, $2, etc. above.) For example, if /(\a+)(\b+)/ is used to match, p1 is the match for \a+ and p2 is the match for \b+.
offset             The offset of the matched substring within the original string. (For example, if the original string is 'abcd' and the matched substring is 'bc', this parameter is 1.)
string             The original string being examined.
NamedCaptureGroup  An object holding the matches of the named capturing groups.

The exact number of parameters depends on whether the first parameter of replace() is a RegExp object and on how many capturing groups the regular expression specifies. If the regular expression uses named capturing groups, the object of named groups is added as the last parameter.
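The parameter list above can be seen in a small sketch. The function name swap and the sample inputs are assumptions made for this example.

```javascript
// Receives the full match, the two captured groups, the offset and the whole string.
function swap(match, p1, p2, offset, string) {
  return `${p2}${p1}@${offset}`;
}
const swapped = 'xx ab12'.replace(/([a-z]+)(\d+)/, swap);
console.log(swapped); // "xx 12ab@3"

// With the g flag the function is called once per match.
const doubled = 'a1b2'.replace(/\d/g, (digit) => String(Number(digit) * 2));
console.log(doubled); // "a2b4"

// With named groups, the groups object arrives as the last argument.
const reordered = '2020-10-12'.replace(
  /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/,
  (...args) => {
    const groups = args[args.length - 1];
    return `${groups.d}/${groups.m}/${groups.y}`;
  }
);
console.log(reordered); // "12/10/2020"
```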

split()

The split() method splits a String object into an array of substrings, using a specified separator to decide where each split occurs.

Syntax:

str.split([separator[, limit]])

Parameters:

  • separator: the pattern at which each split should occur. separator can be a string or a regular expression.
  • limit: an integer that limits the number of pieces returned. When this parameter is provided, split() splits the string at each occurrence of the separator but stops once limit entries have been placed in the array. If the end of the string is reached first, the array may contain fewer than limit entries. Any remaining text is not included in the array.
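A brief sketch of both parameters, using made-up input strings:

```javascript
const byString = 'a,b,c'.split(',');
const byRegex = 'a1b22c'.split(/\d+/);
const limited = 'a,b,c'.split(',', 2);
const noSeparator = 'a,b,c'.split();

console.log(byString);    // [ 'a', 'b', 'c' ]
console.log(byRegex);     // [ 'a', 'b', 'c' ]
console.log(limited);     // [ 'a', 'b' ], the remaining "c" is dropped
console.log(noSeparator); // [ 'a,b,c' ], the whole string as a single entry
```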

Assertions

Assertions include boundaries, which mark the beginning or end of text, words, or patterns. They are divided into boundary assertions and other assertions. The boundary assertions are ^, $, \b and \B; the other assertions are x(?=y), x(?!y), (?<=y)x and (?<!y)x.
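The sketch below tries each kind of assertion once. The sample strings are invented for illustration, and the lookbehind forms assume an engine that supports them (ES2018 or later).

```javascript
// Boundary assertions
const wholeWord = 'cat concat'.match(/\bcat\b/g); // only the standalone word
const insideWord = 'concat'.search(/\Bcat/);      // "cat" not at a word start

// Lookahead: x(?=y) and x(?!y)
const lookahead = 'price: 100 USD'.match(/\d+(?= USD)/)[0];
const negLookaheadHit = /foo(?!bar)/.test('foobaz');
const negLookaheadMiss = /foo(?!bar)/.test('foobar');

// Lookbehind: (?<=y)x and (?<!y)x
const lookbehind = '$42 and 7'.match(/(?<=\$)\d+/)[0];
const negLookbehind = '$42 and 7'.match(/(?<!\$)\b\d+/)[0];

console.log(wholeWord);        // [ 'cat' ]
console.log(insideWord);       // 3
console.log(lookahead);        // "100"
console.log(negLookaheadHit);  // true
console.log(negLookaheadMiss); // false
console.log(lookbehind);       // "42"
console.log(negLookbehind);    // "7"
```

In every case the asserted text (" USD", "$", and so on) is checked but is not part of the returned match.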

Special characters in regular expressions

Character               Meaning
\                       A backslash before a non-special character indicates that the next character is special and is not to be interpreted literally. For example, /b/ matches the character 'b', while /\b/ no longer matches any character and instead denotes a word boundary. A backslash before a special character indicates that the next character is not special and is to be interpreted literally, that is, it is escaped.
^                       Matches the beginning of the input. If the multiline flag is set, it also matches the position immediately after a line break. When '^' appears as the first character in a character set, it has a different meaning (see [^xyz]).
$                       Matches the end of the input. If the multiline flag is set, it also matches the position immediately before a line break.
*                       Matches the preceding expression 0 or more times. Equivalent to {0,}.
+                       Matches the preceding expression 1 or more times. Equivalent to {1,}.
?                       Matches the preceding expression 0 or 1 times. Equivalent to {0,1}. If it immediately follows any of the quantifiers *, +, ? or {}, it makes the quantifier non-greedy (matching as few characters as possible), as opposed to the default greedy mode (matching as many characters as possible). For example, applying /\d+/ to "123abc" matches "123", while /\d+?/ matches only "1".
.                       (Decimal point) By default, matches any single character except line terminators.
(x)                     Matches 'x' and remembers the match. The parentheses are called capturing parentheses. In the pattern /(foo) (bar) \1 \2/, '(foo)' and '(bar)' match and remember the first two words of the string "foo bar foo bar". The \1 and \2 in the pattern refer to the first and second substrings matched by the capturing parentheses, namely foo and bar, and match the last two words of the string. Note that \1, \2, ..., \n are used in the matching part of a regular expression; in the replacement part, the syntax $1, $2, ..., $n must be used instead.
(?:x)                   Matches 'x' but does not remember the match. These are called non-capturing parentheses; they let you define sub-expressions for regular expression operators to work on.
x(?=y)                  Matches 'x' only if 'x' is followed by 'y'. This is called a lookahead assertion.
(?<=y)x                 Matches 'x' only if 'x' is preceded by 'y'. This is called a lookbehind assertion.
x(?!y)                  Matches 'x' only if 'x' is not followed by 'y'. This is called a negative lookahead assertion.
(?<!y)x                 Matches 'x' only if 'x' is not preceded by 'y'. This is called a negative lookbehind assertion.
x|y                     Matches 'x' or 'y'.
{n}                     n is a positive integer. Matches exactly n occurrences of the preceding expression.
{n,}                    n is a positive integer. Matches at least n occurrences of the preceding expression.
{n,m}                   n and m are integers with n <= m, and n may be 0. Matches at least n and at most m occurrences of the preceding expression.
[xyz]                   A character set. Matches any one of the characters in the square brackets, including escape sequences. A hyphen (-) can be used to specify a range of characters. Special symbols such as the dot (.) and asterisk (*) have no special meaning inside a character set.
[^xyz]                  A negated character set. Matches any character that is not in the square brackets. A hyphen (-) can be used to specify a range of characters. Ordinary characters work here as usual.
[\b]                    Matches a backspace (U+0008). (Not to be confused with \b.)
\b                      Matches a word boundary: a position where a word character is not followed or not preceded by another word character, for example between a letter and a space. The matched boundary itself is not included in the match. In other words, the length of a matched word boundary is 0.
\B                      Matches a non-word boundary. It matches in the following cases: before the first character of the string if that character is a non-word character, after the last character of the string if that character is a non-word character, between two word characters, between two non-word characters, and in the empty string.
\cX                     When X is a character from A to Z, matches a control character in the string.
\d                      Matches a digit. Equivalent to [0-9].
\D                      Matches a non-digit character. Equivalent to [^0-9].
\f                      Matches a form feed (U+000C).
\n                      Matches a line feed (U+000A).
\r                      Matches a carriage return (U+000D).
\s                      Matches a whitespace character, including space, tab, form feed and line feed. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
\S                      Matches a non-whitespace character. Equivalent to [^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
\t                      Matches a horizontal tab (U+0009).
\v                      Matches a vertical tab (U+000B).
\w                      Matches a word character (a letter, a digit or an underscore). Equivalent to [A-Za-z0-9_].
\W                      Matches a non-word character. Equivalent to [^A-Za-z0-9_].
\n (n is a number)      Where n is a positive integer, a back reference to the substring matched by the n-th capturing group (groups are counted by their left parentheses). For example, /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach.".
\0                      Matches the NULL (U+0000) character. Do not follow it with another digit, because \0<digits> is an octal escape sequence.
\xhh                    Matches the character with the two-digit hexadecimal code hh (\x00-\xFF).
\uhhhh                  Matches the UTF-16 code unit with the four-digit hexadecimal value hhhh.
\u{hhhh} or \u{hhhhh}   (Only when the u flag is set) Matches the Unicode character with the given hexadecimal value.
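Several rows of the table can be verified directly. The following sketch reuses the examples quoted in the table plus a few small inputs chosen for illustration.

```javascript
// Greedy versus non-greedy quantifiers
const greedy = '123abc'.match(/\d+/)[0];
const lazy = '123abc'.match(/\d+?/)[0];
console.log(greedy); // "123"
console.log(lazy);   // "1"

// Back references inside the pattern, $n in the replacement
const repeated = /(foo) (bar) \1 \2/.test('foo bar foo bar');
const fruit = 'apple, orange, cherry, peach.'.match(/apple(,)\sorange\1/)[0];
const swapped = 'foo bar'.replace(/(foo) (bar)/, '$2 $1');
console.log(repeated); // true
console.log(fruit);    // "apple, orange,"
console.log(swapped);  // "bar foo"

// Inside a character set, "." and "*" are ordinary characters.
const literals = 'a.b*c'.match(/[.*]/g);
console.log(literals); // [ '.', '*' ]

// \u{...} is only recognized when the u flag is set.
const codePoint = /\u{1F600}/u.test('\u{1F600}');
console.log(codePoint); // true
```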

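Search flags

Flag  Description
g     Global search.
i     Case-insensitive search.
m     Multi-line search.
s     Allows . to match newline characters.
u     Matches using Unicode code points.
y     Performs a "sticky" search that matches starting at the current position in the target string.

Note: the m flag specifies that a multi-line input string should be treated as multiple lines. If the m flag is used, ^ and $ match at the start or end of each line of the input string, rather than only at the start or end of the whole string.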
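The effect of the m, s and i flags can be seen on a two-line string. This is a small sketch with an arbitrary input.

```javascript
const text = 'first\nsecond';

// Without m, ^ and $ only match at the start and end of the whole string.
const wholeOnly = text.match(/^\w+$/g);
// With m, they match at the start and end of every line.
const perLine = text.match(/^\w+$/gm);
console.log(wholeOnly); // null
console.log(perLine);   // [ 'first', 'second' ]

// Without s, "." does not match the newline between the two words.
const dotDefault = /first.second/.test(text);
const dotAll = /first.second/s.test(text);
console.log(dotDefault); // false
console.log(dotAll);     // true

// i ignores case.
const caseInsensitive = /FIRST/i.test(text);
console.log(caseInsensitive); // true
```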

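The y flag and the g flag

The y flag performs a sticky match. Like the global flag g, it makes the engine update the lastIndex property of the regular expression object after a match attempt finishes. With y, however, the regular expression carries a hidden anchor (namely lastIndex): the match must start exactly at the position given by lastIndex, and if that fails the attempt ends without searching further.

Note: for replace() and match() called with a global (g) regular expression, lastIndex is always reset to 0 before matching starts.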
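The difference between the two flags shows up when lastIndex points at a position where the pattern does not begin. The variable names in this sketch are illustrative.

```javascript
const text = 'barfoo';

const sticky = /foo/y;
sticky.lastIndex = 0;
console.log(sticky.test(text)); // false, "foo" does not start exactly at index 0
console.log(sticky.lastIndex);  // 0

sticky.lastIndex = 3;
console.log(sticky.test(text)); // true, "foo" starts exactly at index 3
console.log(sticky.lastIndex);  // 6

const scanning = /foo/g;
scanning.lastIndex = 0;
console.log(scanning.test(text)); // true, g searches forward from lastIndex
console.log(scanning.lastIndex);  // 6
```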

Origin blog.csdn.net/WuLex/article/details/109026877