Regular expression usage: after reading this, you won't need to look it up online

Whenever I needed regular expressions during my internship or while solving coding problems, I used to just copy them from the Internet. Yesterday I studied them systematically, and I'm recording what I learned here in my own words to consolidate it. Maybe it can help you too.
Reference link: https://github.com/ziishaned/learn-regex/blob/master/translations/README-cn.md

First of all, regular expressions are used to match strings. JavaScript has a built-in RegExp object for this. We only need to write the matching rules, and it will find the strings we care about or tell us whether a string fits.
For example, you can use a string's replace method to replace the matched text with the text you want. You can also use regexp.test(str) to check whether a string matches the regular expression you wrote.
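Here is a minimal sketch of both calls. It uses only JavaScript's built-in String.prototype.replace and RegExp.prototype.test, and the sample sentence is made up for illustration:

```javascript
// Made-up sample input for illustration
const sentence = 'I like cats. Cats are great.';

// replace: without the g flag, only the first match of /cats/ is replaced.
// The pattern is case-sensitive, so "Cats" is left alone.
const replaced = sentence.replace(/cats/, 'dogs');
console.log(replaced); // I like dogs. Cats are great.

// test: returns true if the pattern matches anywhere in the string
console.log(/great/.test(sentence)); // true
console.log(/[0-9]/.test(sentence)); // false (the sentence has no digits)
```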

Basic matching

The most basic regular expression is just a literal string. For example, /test/ matches the "test" in the string "this is the test".
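The following is a quick check of the literal pattern. The uppercase string is made up to show that literal patterns are case-sensitive by default:

```javascript
const literal = /test/;

// exec returns the first match (with its position) or null
const found = literal.exec('this is the test');
console.log(found[0], found.index); // test 12

// Literal patterns are case-sensitive by default
console.log(literal.test('THIS IS THE TEST')); // false
```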

Metacharacters

The real power of regular expressions comes from metacharacters. They stand in for ordinary characters, which lets us match exactly what we want inside long, complex strings. The following table lists some metacharacters, each of which has a special meaning:

Metacharacter    Description
.    A period matches any single character except a newline.
*    Matches 0 or more repetitions of the character before the *.
+    Matches 1 or more repetitions of the character before the +.
?    Makes the character before the ? optional (0 or 1 occurrence).
[ ]    Character class. Matches any one character inside the square brackets.
[^ ]    Negated character class. Matches any one character that is not inside the square brackets.
{num1,num2}    Matches at least num1 and at most num2 repetitions of the character or group before the curly braces (no space after the comma).
( )    Group. Matches the characters inside the parentheses as a unit, in that exact order.
^    Matches the start of the input (what the string must start with).
$    Matches the end of the input (what the string must end with).
|    OR operator. Matches the expression before or after the symbol.
\    Escape character. Used to match reserved characters literally: [ ] ( ) { } . * + ? ^ $ \ |
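To see the table in action, here is a small sketch that runs test() on one made-up input per metacharacter. The comment on each line gives the expected result:

```javascript
// Each entry is [pattern, input]; inputs are made up for illustration
const examples = [
  [/c.t/, 'cut'],            // true: . matches any single character
  [/colou?r/, 'color'],      // true: ? makes the "u" optional
  [/ab*c/, 'ac'],            // true: * allows zero "b"s
  [/ab+c/, 'ac'],            // false: + needs at least one "b"
  [/^[abc]+$/, 'cab'],       // true: only characters from the class
  [/^[^abc]+$/, 'xyz'],      // true: none of a, b, c
  [/^[0-9]{2,4}$/, '12345'], // false: 5 digits exceeds {2,4}
  [/^(ab)+$/, 'ababab'],     // true: the group repeats as a unit
  [/cat|dog/, 'hotdog'],     // true: either alternative may match
  [/3\.14/, '3x14'],         // false: \. only matches a literal dot
];

for (const [pattern, input] of examples) {
  console.log(String(pattern), input, pattern.test(input));
}
```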

With these metacharacters we can start writing real expressions. As an example, let's match mobile phone numbers: 11-digit numbers that must start with 13, 18 or 15.

const reg = new RegExp(/^1[385][0-9]{9}$/)
const str1 = '13226154267'
const str2 = '14226154267'
console.log(reg.test(str1), reg.test(str2)) // true  false

The example above uses four metacharacters: ^, $, [ ] and { }. Together, ^ and $ require the entire string to be a phone number in this exact format. Adding even one more digit at the end makes the match fail, because the length is fixed.
1[385] means the first digit must be 1 and the second digit must be one of 3, 8 or 5. That fixes the first two digits and leaves 9 more. So we follow with the digit class [0-9] and limit it to exactly 9 repetitions with {9}. The result is a regular expression that meets the requirements.
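The following sketch shows why the anchors matter. It compares the pattern above with a hypothetical unanchored copy, using made-up inputs:

```javascript
const anchored = /^1[385][0-9]{9}$/;
const unanchored = /1[385][0-9]{9}/; // hypothetical variant without ^ and $

const tooLong = '132261542671';     // 12 digits
const embedded = 'tel:13226154267'; // valid number with a prefix in front

console.log(anchored.test(tooLong));    // false: $ rejects the extra digit
console.log(unanchored.test(tooLong));  // true: any matching 11-digit run inside is enough
console.log(anchored.test(embedded));   // false: ^ requires the number at the very start
console.log(unanchored.test(embedded)); // true
```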
Of course, this is not the only way to write it. If you understand the metacharacters above, the following versions should be easy to follow:

const reg1 = new RegExp(/^1(3|8|5)[0-9]{9}$/)
// or
const reg2 = new RegExp(/^(13|18|15)[0-9]{9}$/)
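One way to confirm the three patterns are interchangeable is to run all of them over the same inputs. The sample numbers below are made up. The parenthesized versions also create capture groups, which does not affect test():

```javascript
const variants = [
  /^1[385][0-9]{9}$/,     // character class on the second digit
  /^1(3|8|5)[0-9]{9}$/,   // alternation on the second digit
  /^(13|18|15)[0-9]{9}$/, // alternation on the two-digit prefix
];

// Made-up samples: three valid numbers, a wrong prefix, and one digit too few
const samples = ['13226154267', '18012345678', '15000000000', '14226154267', '1322615426'];

for (const s of samples) {
  // Each row should show the same result for all three patterns
  console.log(s, variants.map((re) => re.test(s)));
}
```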

Shorthand character set

At this point you may find the expressions above a bit verbose, since even a single digit has to be written as [0-9]. If a character may also be an uppercase or lowercase letter or a symbol, matching just one character takes something as long as this:

const reg = new RegExp(/[0-9a-zA-Z_@]/)

So the next step is to learn the shorthand character sets shown in the following table:

Shorthand    Description
.    Any character except a newline
\w    A word character (letter, digit or underscore), equivalent to [a-zA-Z0-9_]
\W    A non-word character (such as a symbol or space), equivalent to [^\w]
\d    A digit, equivalent to [0-9]
\D    A non-digit, equivalent to [^\d]
\s    A whitespace character: space, \t, \n, \v, \f, \r and other Unicode space characters
\S    A non-whitespace character, equivalent to [^\s]
\f    A form feed
\n    A newline
\r    A carriage return
\t    A tab
\v    A vertical tab
\p{…}    With the u flag, a Unicode property escape; for example \p{L} matches any letter. To match a DOS/Windows CR/LF line terminator, write \r\n.
\b    A word boundary: the position between a \w character and a non-\w character (or the start or end of the string)
\B    A non-word boundary: a position between two \w characters or between two non-\w characters. Like \b, it matches a position, not a character.
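Here is a short sketch exercising several of these shorthands with made-up inputs. The \p{L} line assumes a runtime with ES2018 Unicode property escapes, such as a current Node.js:

```javascript
// \d, \w and \s
console.log(/^\d{4}$/.test('2024'));        // true
console.log(/^\w+$/.test('user_name1'));    // true
console.log(/^\w+$/.test('user-name'));     // false: "-" is not a word character
console.log(/\s/.test('no_spaces_here'));   // false

// \b vs \B: "cat" as a whole word vs. "cat" inside a longer word
console.log(/\bcat\b/.test('a cat sat'));   // true
console.log(/\bcat\b/.test('concatenate')); // false
console.log(/\Bcat\B/.test('concatenate')); // true

// CR/LF line ending, and a non-ASCII letter (é written as \u00e9)
console.log(/\r\n/.test('line1\r\nline2'));     // true
console.log(/^\w+$/.test('h\u00e9llo'));        // false: \w only covers ASCII letters
console.log(/^\p{L}+$/u.test('h\u00e9llo'));    // true: \p{L} matches any Unicode letter
```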

With shorthand character sets, we can also simplify our phone-number expression:

const reg = new RegExp(/^1[385]\d{9}$/)
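To check that the shorthand forms behave like the long ones, the following sketch compares each pair on a few made-up inputs. The long class from earlier is wrapped in ^...+$ here, which is an assumption made so that test() checks the whole string:

```javascript
// Long form from earlier, anchored (an assumption) so test() checks the whole string
const longForm = /^[0-9a-zA-Z_@]+$/;
const shortForm = /^[\w@]+$/; // \w already covers 0-9, a-z, A-Z and _

const phoneLong = /^1[385][0-9]{9}$/;
const phoneShort = /^1[385]\d{9}$/;

const inputs = ['user_01@example', 'user-01', '13226154267', '1322615426a'];
for (const s of inputs) {
  // Each pair of patterns should agree on every input
  console.log(s, longForm.test(s) === shortForm.test(s), phoneLong.test(s) === phoneShort.test(s));
}
```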

What we have learned so far covers most situations that call for regular expressions, but there is more. Everything above matches the text we want directly. Sometimes you want to match a string only if it is (or is not) preceded or followed by a certain prefix or suffix, without including that prefix or suffix in the match. For that you need zero-width assertions (lookahead and lookbehind).

Zero-width assertions (lookahead and lookbehind)

(?=...) Positive lookahead
A positive lookahead requires the part of the expression before it to be followed by the pattern defined inside (?=...).
(?!...) Negative lookahead
A negative lookahead filters the matches the other way: only matches that are not followed by the pattern defined inside (?!...) are kept.
This may sound a little confusing, but an example makes it clear. Suppose that, given an array of file names, I want to match the names of compressed archives. An archive name is made of letters, digits and underscores (and cannot start with a digit), followed by the suffix .tar, .rar or .zip. I want only the name part, without the suffix.

const filename = ['demo', '1reg.zip', 're0.txt', 'study1.tar', 'setup.exe']
// \. matches a literal dot; an unescaped . would also accept names like "study1xtar"
const reg = new RegExp(/^[a-zA-Z_]\w*(?=\.(tar|rar|zip)$)/)
for (const name of filename) {
    console.log(reg.exec(name)) // the matched name, or null if there is no match
}
// null
// null
// null
// ['study1', 'tar', index: 0, input: 'study1.tar', groups: undefined]  ('tar' is captured by the group inside the lookahead)
// null

The example above uses a positive lookahead, which should now make sense. A negative lookahead works the same way, as do the positive and negative lookbehinds below. Also note that every assertion must be written inside parentheses ( ).
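A negative lookahead can be sketched the same way. The file names below are made up, and rejecting ".exe" is just an illustrative rule. The second pattern shows a pitfall: when the lookahead comes after \w*, the engine can backtrack one character to satisfy it.

```javascript
const names = ['demo', '1reg.zip', 'study1.tar', 'setup.exe', 'notes.txt'];

// Lookahead placed right after ^: reject the whole name if it ends with ".exe"
const notExe = /^(?!.*\.exe$)[a-zA-Z_][\w.]*$/;
console.log(names.filter((n) => notExe.test(n))); // [ 'demo', 'study1.tar', 'notes.txt' ]

// Pitfall: \w* gives back one character, so (?!\.exe$) then succeeds before "p.exe"
const naive = /^[a-zA-Z_]\w*(?!\.exe$)/;
console.log(naive.exec('setup.exe')[0]); // setu
```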
(?<=...) Positive lookbehind
A positive lookbehind, written (?<=...), filters the matches so that only those preceded by the pattern defined in the assertion are kept.
(?<!...) Negative lookbehind
A negative lookbehind, written (?<!...), filters the matches so that only those not preceded by the pattern defined in the assertion are kept.
As another example, suppose I want to pick out mainland-China mobile numbers, which are written with the +86 prefix, but keep only the number itself and not the prefix. Once you know the rules, the regular expression is straightforward:

const reg = new RegExp(/(?<=^\+86)1[385]\d{9}$/) // note: the plus sign must be escaped
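Here is a sketch of both lookbehinds, assuming a runtime with ES2018 lookbehind support (for example, a current Node.js). The inputs are made up, and the "$" price rule is only an illustration of the negative form:

```javascript
const plus86 = /(?<=^\+86)1[385]\d{9}$/;

// The +86 prefix must be present, but it is not part of the match
console.log(plus86.exec('+8613226154267')[0]); // 13226154267
console.log(plus86.test('13226154267'));        // false: no +86 in front

// Negative lookbehind: a number that is not preceded by "$".
// \b stops the engine from matching the "0" at the end of "$30".
const notDollar = /(?<!\$)\b\d+/;
console.log(notDollar.exec('pay $30 for 2 items')[0]); // 2
```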

Flags

The examples so far only check whether an entire string matches. What if we want to find every substring that matches a regular expression? For that we need flags, also called mode modifiers. Here are a few common ones:

Flag    Description
i    Case-insensitive matching.
g    Global search: find all matches instead of stopping at the first one.
m    Multiline: the anchors ^ and $ match at the start and end of each line, not just of the whole string.

This should be easy to understand, so I won't give an example.
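That said, a short sketch with a made-up two-line string shows all three flags side by side:

```javascript
const text = 'Cat cat CAT\nsecond line';

// No flags: first case-sensitive match only
console.log(text.match(/cat/).index); // 4

// i + g: every match, ignoring case
console.log(text.match(/cat/gi)); // [ 'Cat', 'cat', 'CAT' ]
console.log([...text.matchAll(/cat/gi)].map((m) => m.index)); // [ 0, 4, 8 ]

// m: ^ can match at the start of any line, not just of the whole string
console.log(/^second/.test(text));  // false
console.log(/^second/m.test(text)); // true
```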

Greedy and lazy matching

Regular expressions use greedy matching by default, meaning a quantifier matches as long a substring as possible. Adding ? after a quantifier (for example *? instead of *) switches it to lazy matching, which matches as short a substring as possible.

const str = 'The fat cat sat on the mat.'
const reg1 = new RegExp(/(.*at)/)
const reg2 = new RegExp(/(.*?at)/)
console.log(reg1.exec(str))
console.log(reg2.exec(str))
//['The fat cat sat on the mat', 'The fat cat sat on the mat', index: 0, input: 'The fat cat sat on the mat.', groups: undefined]
//['The fat', 'The fat', index: 0, input: 'The fat cat sat on the mat.', groups: undefined]
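The difference is easy to see on a made-up string with tag-like markers. This is only a demonstration of quantifiers; a regular expression is not a full HTML parser.

```javascript
const html = '<b>bold</b> and <i>italic</i>';

const greedyTag = /<.+>/;  // .+ runs to the end, then backs up to the LAST ">"
const lazyTag = /<.+?>/;   // .+? stops at the FIRST ">"

console.log(greedyTag.exec(html)[0]); // <b>bold</b> and <i>italic</i>
console.log(lazyTag.exec(html)[0]);   // <b>
console.log(html.match(/<.+?>/g));    // [ '<b>', '</b>', '<i>', '</i>' ]
```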

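That more or less covers regular expressions. They look easy when you just read about them, but once you actually try to use them you may not know how to apply them, or may not remember the syntax. The key is plenty of practice!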


Origin blog.csdn.net/weixin_45732455/article/details/129814975