Original blog post address: https://www.cnblogs.com/realcare/p/6028622.html

Getting Started with Regular Expressions

First, a brief introduction to regular expressions:

When writing programs or web pages that process strings, you often need to find strings that conform to some complex rule. Regular expressions are the tool for describing such rules. In other words, a regular expression is code that records a text-matching rule.

Let's look at what the cryptic characters in a regular expression mean:

1. Commonly used metacharacters

Code	Description
.	Matches any character except a newline
\w	Matches a letter, digit, or underscore (some engines, such as .NET, also match Chinese characters and other Unicode letters; JavaScript matches only [A-Za-z0-9_])
\s	Matches any whitespace character
\d	Matches a digit
\b	Matches the beginning or end of a word
^	Matches the beginning of the string
$	Matches the end of the string

Let's work through a few examples:

\bhello\b finds the whole word hello: first the beginning of a word (\b), then the string hello, and finally the end of the word (\b).

010-\d\d\d\d\d\d\d\d matches, for example, a Beijing landline number: first 010-, then eight digits (\d).

^\d{18}$ matches, for example, an 18-digit ID number: the start of the string (^), then 18 digits (\d{18}), and finally the end of the string ($).
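
The three patterns above can be tried out in JavaScript. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```javascript
// Word boundaries: "hello" matches only as a whole word.
const helloWord = /\bhello\b/;
const helloResults = [
  helloWord.test("say hello to them"),  // whole word present
  helloWord.test("othello is a play"),  // no word boundary before "hello"
];

// Beijing landline: the literal "010-" followed by eight digits.
const beijingPhone = /010-\d\d\d\d\d\d\d\d/;
const phoneMatch = "call 010-12345678 today".match(beijingPhone);

// Exactly 18 digits, anchored to the start and end of the string.
const eighteenDigits = /^\d{18}$/;
const idResults = [
  eighteenDigits.test("123456789012345678"),   // 18 digits
  eighteenDigits.test("1234567890123456789"),  // 19 digits
];

console.log(helloResults);   // [ true, false ]
console.log(phoneMatch[0]);  // 010-12345678
console.log(idResults);      // [ true, false ]
```

Without the ^ and $ anchors, the last pattern would also accept a longer string that merely contains 18 consecutive digits.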

2. Commonly used quantifiers

Code	Description
*	Repeat zero or more times
+	Repeat one or more times
?	Repeat zero or one time
{n}	Repeat exactly n times
{n,}	Repeat n or more times
{n,m}	Repeat n to m times

\ba\w*\b matches words starting with the letter a: the beginning of a word (\b), then the letter a, then any number of letters, digits, or underscores (\w*), and finally the end of the word (\b).

windows\d+ matches windows followed by one or more digits, such as windows7 or windows10: \d+ matches one or more digits.

010-\d{8} also matches a Beijing landline number. It is equivalent to the earlier 010-\d\d\d\d\d\d\d\d but shorter: \d{8} means a digit repeated eight times in a row.
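
A short sketch of these quantifier examples in JavaScript. The sample sentence is invented, and the g flag (described in section 6) is used only so that match returns every hit instead of the first one.

```javascript
const text = "an apple and a banana ran on windows7 and windows10, not windows";

// Words that start with "a": boundary, "a", then zero or more word characters.
const aWords = text.match(/\ba\w*\b/g);

// "windows" followed by at least one digit; the bare "windows" is skipped.
const windowsVersions = text.match(/windows\d+/g);

// {8} is shorthand for writing \d eight times.
const shortForm = /010-\d{8}/.test("010-12345678");

console.log(aWords);           // [ 'an', 'apple', 'and', 'a', 'and' ]
console.log(windowsVersions);  // [ 'windows7', 'windows10' ]
console.log(shortForm);        // true
```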

3. Commonly used negated codes

Code	Description
\W	Matches any character that \w does not match (anything other than a letter, digit, or underscore)
\S	Matches any non-whitespace character
\D	Matches any non-digit character
\B	Matches a position that is not the beginning or end of a word
[^x]	Matches any character except x
[^aeiou]	Matches any character except the letters a, e, i, o, u

"s[^"]+" matches a double-quoted string whose content starts with s and is followed by at least one more character that is not a quote.

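The negated codes can be tried out the same way. A minimal sketch with made-up input strings:

```javascript
const line = 'he said "stop" and "go" and "s" and "start now"';

// A quote, the letter s, one or more non-quote characters, a closing quote.
// "go" fails on the s, and "s" fails because [^"]+ needs at least one character.
const quotedS = line.match(/"s[^"]+"/g);

// \D is the inverse of \d: remove everything that is not a digit.
const digitsOnly = "tel: 010-1234 5678".replace(/\D/g, "");

// [^aeiou] matches any single character that is not a lowercase vowel.
const nonVowels = "regex".match(/[^aeiou]/g);

console.log(quotedS);     // [ '"stop"', '"start now"' ]
console.log(digitsOnly);  // 01012345678
console.log(nonVowels);   // [ 'r', 'g', 'x' ]
```
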
4. Commonly used grouping syntax

Code	Description
(exp)	Matches exp and captures the text into an automatically numbered group
(?<name>exp)	Matches exp and captures the text into the group named name; some engines, such as .NET, also accept (?'name'exp)
(?:exp)	Matches exp without capturing the text or assigning a group number
(?=exp)	Matches a position that is followed by exp
(?<=exp)	Matches a position that is preceded by exp
(?!exp)	Matches a position that is not followed by exp
(?<!exp)	Matches a position that is not preceded by exp

\b\w*h(?!e)\w*\b is a bit more complicated, but with the help of the table above you should be able to read it. Broken down: the beginning of a word (\b); then zero or more word characters (\w*), which inside a word usually means letters, although \w also accepts digits and underscores; then the letter h; then a lookahead (?!e) asserting that the next character is not e, without consuming anything, so the h may also be the last letter of the word; then zero or more word characters (\w*) up to the end of the word (\b). In other words, it finds words that contain a letter h not immediately followed by e, such as him and honey, and it excludes words such as hello and help.
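
The lookahead example, plus a named group and a lookbehind from the table, can be sketched in JavaScript as follows. The input strings and the group names area and number are made up for illustration. JavaScript supports (?<name>exp) but not the (?'name'exp) spelling, and lookbehind needs an ES2018-capable engine such as a current Node.js.

```javascript
// Words containing an "h" that is not immediately followed by "e".
// "oh" qualifies because (?!e) also succeeds at the end of a word.
const hNotE = /\b\w*h(?!e)\w*\b/g;
const hWords = "him honey hello help the oh".match(hNotE);

// Named groups: capture the area code and the number separately.
const phone = "010-12345678".match(/(?<area>\d{3})-(?<number>\d{8})/);

// Lookbehind: digits that come right after a dollar sign; the sign is not part of the match.
const price = "cost: $42 or 17 euros".match(/(?<=\$)\d+/);

console.log(hWords);                                 // [ 'him', 'honey', 'oh' ]
console.log(phone.groups.area, phone.groups.number); // 010 12345678
console.log(price[0]);                               // 42
```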

5. Lazy quantifiers

Code	Description
*?	Repeat any number of times, but as few as possible
+?	Repeat one or more times, but as few as possible
??	Repeat zero or one time, but as few as possible
{n,m}?	Repeat n to m times, but as few as possible
{n,}?	Repeat n or more times, but as few as possible

When a regular expression contains a repeating quantifier, the default behavior is to match as many characters as possible. For example, a.*b matches the longest string that starts with a and ends with b. Searching aabab with it matches the entire string aabab. This is called greedy matching. Searching with a.*?b instead matches aab (the first to third characters) and ab (the fourth to fifth characters), which is called lazy matching.
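
The greedy and lazy behavior described above can be checked directly. A minimal JavaScript sketch using the same aabab string (the variable names are illustrative):

```javascript
const sample = "aabab";

// Greedy: .* grabs as much as it can, so there is a single match.
const greedy = sample.match(/a.*b/g);

// Lazy: .*? stops at the first b it can reach, so there are two matches.
const lazy = sample.match(/a.*?b/g);

// Zero-based start positions of the lazy matches.
const lazyStarts = [...sample.matchAll(/a.*?b/g)].map((m) => m.index);

console.log(greedy);      // [ 'aabab' ]
console.log(lazy);        // [ 'aab', 'ab' ]
console.log(lazyStarts);  // [ 0, 3 ]
```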

6. Other commonly used symbols

Code	Description
\.	Metacharacter escape. Because . is a metacharacter, a bare . cannot match a literal period; prefix it with \ to cancel its special meaning, giving \.. Other metacharacters such as *, ?, and + are escaped the same way.
[]	Character set. For example, [0-9] matches the digits 0 to 9, like \d; [a-z] matches lowercase letters; [.?!] matches the punctuation marks ., ?, and !.
()	Grouping. Each group automatically gets a group number, counted from left to right: the first group is 1, the second is 2, and so on. (\d{1,3}\.){3}\d{1,3} is a simple IP address pattern: \d{1,3} matches one to three digits, (\d{1,3}\.){3} matches one to three digits plus a period (this unit is the group) repeated three times, and a final \d{1,3} matches one to three more digits. It checks only the shape, so it also accepts out-of-range values such as 999.999.999.999. \b(\w+)\b\s+\1\b matches repeated words such as go go: first a word of one or more word characters, \b(\w+)\b, which is captured in group 1, then one or more whitespace characters (\s+), then the text captured in group 1 (\1), and finally the end of the word (\b).
|	Branch (alternation). ^\d{17}(\d|[xX])$ can be used to check the format of an ID number: the start of the string (^), then 17 digits (\d{17}), then either a digit (\d) or (|) the letter x or X ([xX]), and finally the end of the string ($).
//i	Case-insensitive match. A flag on a regular expression literal, written /pattern/i. See below for examples.
//g	Global match: every occurrence is found instead of only the first. A flag on a regular expression literal, written /pattern/g. See below for examples.
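
The escape, grouping, backreference, and alternation patterns from the table can be tried as follows. This is a sketch with made-up inputs; the IP pattern is anchored with ^ and $ here so that it validates a whole string, which is an addition to the pattern as given in the table.

```javascript
// Escaped dot: replace a literal period, not "any character".
const decimalComma = "1.5".replace(/\./, ",");

// IPv4 shape check: three "digits + dot" groups, then a final run of digits.
const ipShape = /^(\d{1,3}\.){3}\d{1,3}$/;
const ipResults = [
  ipShape.test("192.168.0.1"),      // well-formed
  ipShape.test("999.999.999.999"),  // right shape, although out of range
  ipShape.test("192.168.0"),        // only three parts
];

// Backreference: \1 must repeat whatever group 1 captured.
const repeated = "let us go go to the the park".match(/\b(\w+)\b\s+\1\b/g);

// Alternation: 17 digits, then either a digit or x/X.
const idShape = /^\d{17}(\d|[xX])$/;
const idResults = [
  idShape.test("12345678901234567X"),
  idShape.test("1234567890123456XX"),
];

console.log(decimalComma);  // 1,5
console.log(ipResults);     // [ true, true, false ]
console.log(repeated);      // [ 'go go', 'the the' ]
console.log(idResults);     // [ true, false ]
```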

The following code shows how the i and g flags are used in practice:

<html>
<body>

<script type="text/javascript">

    var str = "Welcome to Microsoft! ";
    str = str + "We are proud to announce that Microsoft has ";
    str = str + "one of the largest Web Developers sites in the world.";
    document.write(str.replace(/Microsoft/i, "W3School"));

</script>

</body>
</html>

The code above replaces Microsoft in the string with W3School. With the regular expression /Microsoft/i, the result is: Welcome to W3School! We are proud to announce that Microsoft has one of the largest Web Developers sites in the world. Only the first Microsoft is replaced. That is because the pattern has no g flag, so replace stops after the first match; the i flag only makes the match case-insensitive.

If we change the regular expression from /Microsoft/i to /Microsoft/g, the result becomes: Welcome to W3School! We are proud to announce that W3School has one of the largest Web Developers sites in the world. Every occurrence of Microsoft is replaced with W3School, which is global matching.
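
To separate the two flags, here is a minimal sketch that runs the same replacement outside the browser (in Node.js, for example): i controls case sensitivity, and g controls whether every occurrence is replaced. The lowercase microsoft patterns and the variable names are added for illustration.

```javascript
const str =
  "Welcome to Microsoft! We are proud to announce that Microsoft has " +
  "one of the largest Web Developers sites in the world.";

// No flags: case-sensitive, so lowercase "microsoft" matches nothing.
const noFlag = str.replace(/microsoft/, "W3School");

// i only: case-insensitive, but still replaces just the first occurrence.
const firstOnly = str.replace(/microsoft/i, "W3School");

// g only: case-sensitive, replaces every occurrence.
const everyMatch = str.replace(/Microsoft/g, "W3School");

// g and i combined: every occurrence, in any letter case.
const everyAnyCase = str.replace(/microsoft/gi, "W3School");

console.log(noFlag === str);                  // true
console.log(firstOnly.includes("Microsoft")); // true (second occurrence remains)
console.log(everyMatch.includes("Microsoft")); // false
console.log(everyAnyCase === everyMatch);     // true
```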
