Regular expression operator precedence and matching rules | Understand at a glance! (Part 4)

Table of contents

1. Regular expression - operator precedence

2. Regular expression - matching rules

(1) Basic pattern matching

(2) Character clusters

(3) Determining repeated occurrences


1. Regular expression - operator precedence

(1) Regular expressions are evaluated from left to right and follow an order of precedence, much like arithmetic expressions.

(2) Operators with the same precedence are evaluated from left to right; operators with different precedence are evaluated from highest to lowest.

(3) The following table lists the regular expression operators in order of precedence, from highest to lowest:

Operator                                 Description
\                                        Escape character
(), (?:), (?=), []                       Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m}                Quantifiers
^, $, \anymetacharacter, anycharacter    Anchors and sequences (i.e. position and order)
|                                        Alternation, the "or" operator
Characters have higher precedence than the alternation operator, so "m|food" matches "m" or "food". To match "mood" or "food", use parentheses to create a subexpression: "(m|f)ood".
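The precedence rule above can be checked with a short sketch. It assumes Python's built-in re module; the names loose, grouped and whole_match are illustrative and do not come from the original article.

```python
import re

# Alternation has the lowest precedence, so "m|food" means "m" OR "food".
loose = re.compile(r"m|food")
# Parentheses group the alternatives, so "(m|f)ood" means "mood" OR "food".
grouped = re.compile(r"(m|f)ood")

def whole_match(pattern, text):
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None

for word in ["m", "food", "mood"]:
    print(word, whole_match(loose, word), whole_match(grouped, word))
```

Only the grouped pattern accepts "mood", because without parentheses the alternation splits the whole expression into "m" and "food".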

2. Regular expression - matching rules

(1) Basic pattern matching

1. Let us start with the basics. A pattern is the most basic element of a regular expression: a group of characters that describes the characteristics of a string.

2. A pattern can be very simple, consisting only of ordinary characters, or very complex, using special characters to represent a range of characters, repeated occurrences, or context.

3. For example:

(1) The ^ symbol matches strings that begin with the given pattern.

^once

This pattern contains the special character ^, which means it matches only strings that begin with "once".

For example, the pattern matches the string "once upon a time", but not "There once was a man from New York".

(2) The $ symbol matches strings that end with the given pattern.

bucket$

This pattern matches "Who kept all of his cash in a bucket", but not "buckets".

(3) When ^ and $ are used together, the pattern must match the entire string (an exact match).

^bucket$

This matches only the string "bucket".

(4) If a pattern includes neither ^ nor $, it matches any string that contains the pattern.

For example, the pattern

once

matches the string

There once was a man from New York
Who kept all of his cash in a bucket.
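The anchor rules in items (1) to (4) can be tried out with a small sketch, assuming Python's re module. The helper name found is made up for this illustration.

```python
import re

line1 = "There once was a man from New York"
line2 = "Who kept all of his cash in a bucket"

def found(pattern, text):
    """Return True if the pattern matches anywhere it is allowed to in text."""
    return re.search(pattern, text) is not None

print(found(r"^once", "once upon a time"))  # starts with "once"
print(found(r"^once", line1))               # "once" is not at the start
print(found(r"bucket$", line2))             # ends with "bucket"
print(found(r"bucket$", "buckets"))         # ends with "s", not "bucket"
print(found(r"^bucket$", "bucket"))         # exact match
print(found(r"once", line1))                # unanchored: found anywhere
```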

The letters in this pattern (once) are literal characters: each one stands for itself, and the same goes for digits. Some other characters, such as certain punctuation marks and whitespace characters (spaces, tabs, etc.), are written as escape sequences. All escape sequences start with a backslash (\). The escape sequence for a tab is \t.

(5) To detect whether a string starts with a tab, you can use this pattern:

^\t

Similarly, \n stands for a newline and \r for a carriage return. Other special symbols can be matched literally by preceding them with a backslash: the backslash itself is written \\, the period is written \., and so on.
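As a rough illustration of escape sequences, the following sketch assumes Python's re module and raw string literals (r"..."), so that backslashes reach the regex engine unchanged. The helper name matches is hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found somewhere in text."""
    return re.search(pattern, text) is not None

# ^\t : the string must start with a tab character.
print(matches(r"^\t", "\tindented line"))
print(matches(r"^\t", "plain line"))

# An unescaped dot matches any character; \. matches only a literal period.
print(matches(r"a.c", "abc"))
print(matches(r"a\.c", "abc"))
print(matches(r"a\.c", "a.c"))

# \\ matches a single literal backslash.
print(matches(r"\\", "C:\\temp"))
```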

(2) Character clusters

1. In web applications, regular expressions are commonly used to validate user input.

2. After a user submits a form, plain literal characters are not enough to determine whether an entered phone number, address, email address, credit card number, and so on is valid.

3. We therefore need a more flexible way to describe the pattern we want: the character cluster (also known as a character class).

4. To create a character cluster that represents all vowels, put them inside square brackets:

[AaEeIiOoUu]

This pattern matches any vowel, but it matches only a single character.

5. A hyphen can be used to indicate a range of characters.

For example:

[a-z] // matches any lowercase letter
[A-Z] // matches any uppercase letter
[a-zA-Z] // matches any letter
[0-9] // matches any digit
[0-9\.\-] // matches any digit, period or minus sign
[ \f\r\t\n] // matches any whitespace character

6. To match a string consisting of exactly one lowercase letter followed by one digit, such as "z2", "t6" or "g7", but not "ab2", "r2d3" or "b52", use this pattern:

^[a-z][0-9]$

Although [a-z] represents a range of 26 letters, here it matches only one character: the first character of the string, which must be a lowercase letter.
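The character clusters above can be exercised with a brief sketch, assuming Python's re module. The variable names vowel, letter_digit, vowels_found and accepted are illustrative only.

```python
import re

# A cluster matches exactly one character from the listed set.
vowel = re.compile(r"[AaEeIiOoUu]")
vowels_found = vowel.findall("Regular Expression")
print(vowels_found)

# One lowercase letter followed by one digit, and nothing else.
letter_digit = re.compile(r"^[a-z][0-9]$")
candidates = ["z2", "t6", "g7", "ab2", "r2d3", "b52"]
accepted = [s for s in candidates if letter_digit.search(s)]
print(accepted)
```

The anchors matter here: without ^ and $, strings such as "ab2" would also be accepted, because "b2" occurs inside them.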

7. When ^ is the first character inside square brackets, it means "not" or "excluded", and is often used to rule out certain characters.

^[^0-9][0-9]$

This pattern matches "&5", "g7" and "-2", but not "12" or "66".

8. Examples of excluding specific characters:

[^a-z] // any character except lowercase letters
[^\\\/\^] // any character except a backslash (\), a slash (/) or a caret (^)
[^\"\'] // any character except double quotes (") and single quotes (')

The special character . (dot, period) represents any character except a newline. So the pattern ^.5$ matches any two-character string that ends with the digit 5 and starts with any character other than a newline. On its own, the pattern . matches any single character except a newline (\n; some engines also exclude \r).
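Negated clusters and the dot can be tried out as follows. This is a minimal sketch assuming Python's re module, where the dot excludes only \n by default; other engines may differ. The variable names are made up for the example.

```python
import re

# First character must NOT be a digit, second character must be a digit.
non_digit_then_digit = re.compile(r"^[^0-9][0-9]$")
samples = ["&5", "g7", "-2", "12", "66"]
kept = [s for s in samples if non_digit_then_digit.search(s)]
print(kept)

# Any single character except a newline, followed by the digit 5.
any_then_five = re.compile(r"^.5$")
print(bool(any_then_five.search("a5")))
print(bool(any_then_five.search("15")))
print(bool(any_then_five.search("\n5")))  # the dot does not match "\n"
```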

9. PHP's regular expressions have some built-in general-purpose character clusters, listed below:

Character cluster    Description
[[:alpha:]]          any letter
[[:digit:]]          any digit
[[:alnum:]]          any letter or digit
[[:space:]]          any whitespace character
[[:upper:]]          any uppercase letter
[[:lower:]]          any lowercase letter
[[:punct:]]          any punctuation mark
[[:xdigit:]]         any hexadecimal digit, equivalent to [0-9a-fA-F]

(3) Determining repeated occurrences

1. A word consists of several letters, and a number consists of several digits. Curly braces ({}) following a character or character cluster specify how many times the preceding item is repeated.

Pattern              Description
^[a-zA-Z_]$          a single letter or underscore
^[[:alpha:]]{3}$     any three-letter word
^a$                  the letter a
^a{4}$               aaaa
^a{2,4}$             aa, aaa or aaaa
^a{1,3}$             a, aa or aaa
^a{2,}$              a string consisting of two or more a's
^a{2,}               a string starting with two or more a's, such as aardvark and aaab, but not apple
a{2,}                a string containing two or more consecutive a's, such as baad and aaa, but not Nantucket
\t{2}                two tabs
.{2}                 any two characters

These examples show three different uses of curly braces. A single number, {x}, means the preceding character or character cluster appears exactly x times; a number followed by a comma, {x,}, means it appears x or more times; two comma-separated numbers, {x,y}, mean it appears at least x times but no more than y times.
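A few rows of the table can be checked with a short sketch, assuming Python's re module. The helper name matches is hypothetical, and the POSIX [[:alpha:]] row is left out because Python's re does not support that syntax.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found in text."""
    return re.search(pattern, text) is not None

# {x,y}: between two and four a's, and nothing else.
print([s for s in ["a", "aa", "aaaa", "aaaaa"] if matches(r"^a{2,4}$", s)])

# {x,} anchored at the start only: the string must begin with "aa".
print([s for s in ["aardvark", "aaab", "apple"] if matches(r"^a{2,}", s)])

# {x,} with no anchors: "aa" may appear anywhere.
print([s for s in ["baad", "aaa", "Nantucket"] if matches(r"a{2,}", s)])

# {x}: exactly two tabs in a row.
print(matches(r"\t{2}", "col1\t\tcol2"))
```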

2. The patterns can be extended to cover longer words or numbers:

^[a-zA-Z0-9_]{1,}$ // any string of one or more letters, digits or underscores
^[1-9][0-9]{0,}$ // all positive integers
^\-{0,1}[0-9]{1,}$ // all integers
^[-]?[0-9]+\.?[0-9]+$ // all floating point numbers

The last example reads: start (^) with an optional minus sign ([-]?), followed by one or more digits ([0-9]+), followed by an optional decimal point (\.?), followed by one or more digits ([0-9]+), followed by nothing else ($). Because both digit groups are required, this pattern needs at least two digits: it accepts "12" and "-3.14" but rejects a single digit such as "5".

3. The special character ? is equivalent to {0,1}; both mean zero or one occurrence of the preceding item, that is, the preceding item is optional.

For example:

^\-?[0-9]{1,}\.?[0-9]{1,}$

4. The special character * is equivalent to {0,}; both mean zero or more occurrences of the preceding item.

5. The special character + is equivalent to {1,}; both mean one or more occurrences of the preceding item.

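So the four examples above can be rewritten as follows. The last one additionally groups the decimal point with the fractional digits, (\.[0-9]+)?, so the whole fractional part is optional and a single digit such as "5" also matches:

^[a-zA-Z0-9_]+$ // any string of one or more letters, digits or underscores
^[1-9][0-9]*$ // all positive integers
^\-?[0-9]+$ // all integers
^[-]?[0-9]+(\.[0-9]+)?$ // all floating point numbers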
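The equivalence between the brace forms and the ?, * and + shorthands, and the difference between the two floating point patterns, can be sketched as follows. This assumes Python's re module; the sample strings and variable names are chosen for illustration and are not from the original article.

```python
import re

# The brace form and the shorthand form of the integer pattern.
long_form = re.compile(r"^\-{0,1}[0-9]{1,}$")
short_form = re.compile(r"^\-?[0-9]+$")

samples = ["0", "-42", "007", "4-2", "", "-"]
same = all(
    bool(long_form.search(s)) == bool(short_form.search(s)) for s in samples
)
print(same)  # both forms agree on every sample

# The two floating point patterns are not quite equivalent.
first_float = re.compile(r"^[-]?[0-9]+\.?[0-9]+$")
grouped_float = re.compile(r"^[-]?[0-9]+(\.[0-9]+)?$")

for s in ["-3.14", "12", "5", "3."]:
    print(repr(s), bool(first_float.search(s)), bool(grouped_float.search(s)))
```

Both patterns accept "-3.14" and "12" and reject "3.", but only the grouped version accepts the single digit "5".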

Origin blog.csdn.net/wuds_158/article/details/131544410