The role of parentheses
Classification / Code/Syntax / Description
Capture
(exp) Matches exp and captures the text into an automatically numbered group
(?<name>exp) Matches exp and captures the text into a group named name; can also be written as (?'name'exp)
(?:exp) Matches exp, does not capture the matched text, and does not assign a group number to the group
Zero-width assertion
(?=exp) Matches a position that is followed by exp
(?<=exp) Matches a position that is preceded by exp
(?!exp) Matches a position that is not followed by exp
(?<!exp) Matches a position that is not preceded by exp
Comment
(?#comment) Has no effect on how the regular expression is processed; it only provides a comment for human readers
It is important to note that a zero-width assertion does not consume any characters: the text it checks is not included in the match result.
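To make the "does not consume characters" point concrete, here is a minimal sketch. The class name, the method name, and the "px" sample string are made up for illustration and are not part of the original article. The lookahead checks that "px" follows the digits, but the reported match contains only the digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZeroWidthDemo
{
    // Finds digits that are immediately followed by "px".
    // (?=px) only checks for "px"; it does not add it to the match.
    public static Match FindPixelValue(string input)
    {
        return Regex.Match(input, @"\d+(?=px)");
    }

    public static void Main()
    {
        Match m = FindPixelValue("width: 120px");
        Console.WriteLine(m.Value);   // 120
        Console.WriteLine(m.Index);   // 7
        Console.WriteLine(m.Length);  // 3, "px" is not counted
    }
}
```

The Length of 3 shows that the assertion contributed nothing to the matched text.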
(?:exp) neither captures the matched text nor assigns a group number to this group, so what's the use of this thing?
(?:exp) is a non-capturing group that matches the content of exp, but does not capture it in the group.
Generally speaking, it is to save resources and improve efficiency.
For example, to verify whether the input is an integer, you can write it like this:
^([1-9][0-9]*|0)$
Here we need () to limit the scope of "|", which expresses the "or" relationship. But we only want to check the rule and have no need to save the content matched by exp into a group, so we can use a non-capturing group:
^(?:[1-9][0-9]*|0)$
Sometimes we have to use (), and by default () captures the content matched by exp into a group. In some cases we only want to check a rule, or we never refer to the content matched inside () later on, so there is no need to capture it. Capturing it anyway wastes resources and reduces efficiency. In these cases a non-capturing group is the right choice.
These constructs are hard to explain in words and hard to grasp from the symbols alone. It is best to look at the examples below.
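The difference can be observed directly by counting groups. The sketch below is illustrative only (the class and method names are assumptions, not from the original article). It runs the integer-check pattern with and without a capturing group and prints how many groups the match holds. Group 0, the whole match, is always present.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // Returns how many groups the match contains, including group 0.
    public static int GroupCount(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups.Count;
    }

    public static void Main()
    {
        // Capturing version: group 0 plus group 1.
        Console.WriteLine(GroupCount(@"^([1-9][0-9]*|0)$", "2024"));     // 2
        // Non-capturing version: only group 0.
        Console.WriteLine(GroupCount(@"^(?:[1-9][0-9]*|0)$", "2024"));   // 1
        // Both versions accept and reject the same inputs.
        Console.WriteLine(Regex.IsMatch("007", @"^(?:[1-9][0-9]*|0)$")); // False
    }
}
```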
using System;
using System.Text.RegularExpressions;

class Program
{
    // Regular expressions and their terminology sound intimidating, but they are actually quite simple
    static void Main(string[] args)
    {
        // (exp) matches exp and captures the text into an automatically numbered group
        Regex reg = new Regex(@"A(\w+)A");
        Console.WriteLine(reg.Match("dsA123A"));            // Output: A123A
        Console.WriteLine(reg.Match("dsA123A").Groups[1]);  // Output: 123

        // (?<name>exp) matches exp and captures the text into a group named name; can also be written as (?'name'exp)
        Regex reg2 = new Regex(@"A(?<num>\w+)A");
        Console.WriteLine(reg2.Match("dsA123A").Groups["num"]);  // Output: 123

        // (?:exp) matches exp without capturing it
        Regex reg3 = new Regex(@"A(?:\w+A)");
        Console.WriteLine(reg3.Match("dsA123A"));  // Output: A123A
        Console.WriteLine("==============================");

        // (?=exp) zero-width positive lookahead: matches a position that is followed by exp
        // The expression means: match sing only if it is followed by ing. The asserted text is not part of the match.
        Regex reg4 = new Regex(@"sing(?=ing)");
        Console.WriteLine(reg4.Match("ksingkksingingkkk"));       // Output: sing
        Console.WriteLine(reg4.Match("singddddsingingd").Index);  // Output: 8, so the first sing was not matched

        // (?<=exp) zero-width positive lookbehind: matches a position that is preceded by exp
        Regex reg5 = new Regex(@"(?<=wo)man");
        Console.WriteLine(reg5.Match("Hi man Hi woman"));        // Output: man
        Console.WriteLine(reg5.Match("Hi man Hi woman").Index);  // Output: 12, count the characters to see which one matched

        // (?!exp) zero-width negative lookahead: matches a position that is not followed by exp
        Regex reg6 = new Regex(@"sing(?!ing)");
        Console.WriteLine(reg6.Match("singing-singabc"));        // Output: sing
        Console.WriteLine(reg6.Match("singing-singabc").Index);  // Output: 8, count the characters to check

        // (?<!exp) zero-width negative lookbehind: matches a position that is not preceded by exp
        Regex reg7 = new Regex(@"(?<!wo)man");
        Console.WriteLine(reg7.Match("Hi woman Hi man"));        // Output: man
        Console.WriteLine(reg7.Match("Hi woman Hi man").Index);  // Output: 12, count the characters to see which one matched

        // (?#comment) has no effect on processing and only provides a comment for human readers
        Regex reg8 = new Regex("ABC(?#This is just a comment)DEF");
        Console.WriteLine(reg8.Match("ABCDEFG"));  // Output: ABCDEF
    }
}
Lazy matching
Code/Syntax Explanation
*? Repeat any number of times, but as few times as possible
+? Repeat 1 or more times, but as few times as possible
?? Repeat 0 or 1 times, but as few times as possible
{n,m}? Repeat n to m times, but as few as possible
{n,}? Repeat n or more times, but as few times as possible
If you look carefully, you will see that a lazy quantifier is simply the original quantifier with a ? appended, which means "match as few times as possible".
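A quick way to see the effect of the appended ? is to run the greedy and the lazy form of the same quantifier against one input. The sketch below uses a made-up tag string and illustrative names. It is not taken from the original article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // Returns the text of the first match, or an empty string if nothing matches.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        const string html = "<b>bold</b>";
        // Greedy: .+ runs to the last > it can reach.
        Console.WriteLine(FirstMatch(@"<.+>", html));   // <b>bold</b>
        // Lazy: .+? stops at the first > that lets the match succeed.
        Console.WriteLine(FirstMatch(@"<.+?>", html));  // <b>
    }
}
```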
using System;
using System.Text.RegularExpressions;

class Program
{
    // Regular expressions and their terminology sound intimidating, but they are actually quite simple
    static void Main(string[] args)
    {
        // Greedy matching is the default: match as much as possible
        Regex reg1 = new Regex(@"A(\w)*B");
        Console.WriteLine(reg1.Match("A12B34B56B"));  // Output: A12B34B56B

        // Lazy matching: \w repeats any number of times, but as few times as possible
        Regex reg2 = new Regex(@"A(\w)*?B");
        Console.WriteLine(reg2.Match("A12B34B56B"));  // Output: A12B

        // \w repeats 1 or more times, but as few times as possible
        Regex reg3 = new Regex(@"A(\w)+?");
        Console.WriteLine(reg3.Match("AB12B34B56B"));  // Output: AB (note the different test string)

        // \w repeats 0 or 1 times, but as few times as possible
        Regex reg4 = new Regex(@"A(\w)??B");
        Console.WriteLine(reg4.Match("A12B34B56B"));   // Output is empty: two characters (12) separate A from B, but at most one \w is allowed
        Console.WriteLine(reg4.Match("A1B2B34B56B"));  // Output: A1B

        // \w repeats at least 4 and at most 10 times, but as few times as possible
        Regex reg5 = new Regex(@"A(\w){4,10}?B");
        // Output: A1B2B3B. The minimum of four \w consumes 1B2B; the next character is 3, not B,
        // so a fifth \w is taken and the B after 3 completes the match.
        Console.WriteLine(reg5.Match("A1B2B3B4B5B"));

        // \w repeats at least 4 times with no upper limit, but as few times as possible
        Regex reg6 = new Regex(@"A(\w){4,}?");
        Console.WriteLine(reg6.Match("A1B2B3B4B5B"));  // Output: A1B2B (nothing follows the quantifier, so it stops at the minimum of 4)
        Console.ReadKey();
    }
}
Balancing groups
A balancing group is used to match content in which the opening and closing symbols occur in equal numbers and are properly paired.
For example, in the string "xx <aa <bbb> <bbb> aa> yy>" the numbers of < and > are not equal. A simple <.+> matches everything from the first opening bracket < to the last closing bracket >, so the opening and closing brackets inside the match do not balance. If you want to match a substring whose opening and closing brackets are correctly paired, you need balancing groups.
Balancing group syntax:
(?'group'exp) Capture the text matched by exp under the name group and push it onto the stack
(?'-group'exp) Match exp and pop the most recent capture of group from the stack; if the stack is already empty, this group fails to match
(?(group)yes|no) If a capture named group is still on the stack, continue matching with the yes part; otherwise continue with the no part
(?!) A zero-width negative lookahead with an empty expression; since the empty expression always matches, this assertion always fails
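The four constructs can be combined into a small balance check. The sketch below is an illustration, not the article's own code. The class name, the method name, and the choice of round parentheses are assumptions. Each ( pushes a capture named Open, each ) pops one, and the final conditional forces a failure if anything is left on the stack.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedDemo
{
    // [^()]          any character that is not a parenthesis
    // (?'Open'\()    push one "Open" for every (
    // (?'-Open'\))   pop one "Open" for every ); fails if the stack is empty
    // (?(Open)(?!))  fail if any "Open" is still on the stack at the end
    private static readonly Regex Balanced = new Regex(
        @"^(?:[^()]|(?'Open'\()|(?'-Open'\)))*(?(Open)(?!))$");

    public static bool IsBalanced(string input)
    {
        return Balanced.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsBalanced("(a(b)c)"));  // True
        Console.WriteLine(IsBalanced("(a(b)c"));   // False: one "Open" is left over
        Console.WriteLine(IsBalanced("a)b("));     // False: ) arrives with an empty stack
    }
}
```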
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Balancing groups: we want to match the content of the outermost correctly paired brackets
        string strTag = "xx <aa <bbb> <bbb> aa> yy>";  // The target is <aa <bbb> <bbb> aa>; note that the numbers of < and > are not equal

        Regex reg = new Regex("<.+>");
        // Output: <aa <bbb> <bbb> aa> yy>
        // This is not the desired target, because the numbers of < and > in the string are not equal
        Console.WriteLine(reg.Match(strTag));

        Regex reg3 = new Regex("<[^<>]*(((?'Open'<)[^<>]*)+((?'-Open'>)[^<>]*)+)*(?(Open)(?!))>");
        Console.WriteLine(reg3.Match(strTag));  // Output: <aa <bbb> <bbb> aa>, the correct target

        // The most common use of balancing groups is matching HTML; this matches the content of nested DIVs
        Regex reg2 = new Regex(@"<div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>");
        string str = "<a href='http://www.baidu.com'></a><div id='div1'><div id='div2'>Are you doing well in another country?</div></div><p></p>";
        Console.WriteLine(reg2.Match(str));  // Output: <div id='div1'><div id='div2'>Are you doing well in another country?</div></div>
        Console.ReadKey();
    }
}
Syntax Explanation:
<                     # The outermost opening bracket
  [^<>]*              # Content after the outermost opening bracket that is not a bracket
  (
    (
      (?'Open'<)      # On an opening bracket, write an "Open" on the blackboard
      [^<>]*          # Match the non-bracket content after the opening bracket
    )+
    (
      (?'-Open'>)     # On a closing bracket, erase one "Open"
      [^<>]*          # Match the non-bracket content after the closing bracket
    )+
  )*
  (?(Open)(?!))       # Before the outermost closing bracket, check whether any "Open" is still on the blackboard; if so, the match fails
>                     # The outermost closing bracket
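The annotated layout above can also be used directly in code. As a sketch (the class and member names are illustrative assumptions), passing RegexOptions.IgnorePatternWhitespace makes the .NET engine skip unescaped white space in the pattern and treat everything from # to the end of the line as a comment, so the pattern can keep its explanation next to each part.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedPatternDemo
{
    // The same balancing-group pattern, written across several lines with comments.
    private const string Pattern = @"
        <                   # outermost opening bracket
          [^<>]*            # non-bracket content
          (
            (
              (?'Open'<)    # push an Open for every opening bracket
              [^<>]*
            )+
            (
              (?'-Open'>)   # pop an Open for every closing bracket
              [^<>]*
            )+
          )*
          (?(Open)(?!))     # fail if any Open is left on the stack
        >                   # outermost closing bracket
    ";

    public static string MatchOutermost(string input)
    {
        return Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace).Value;
    }

    public static void Main()
    {
        Console.WriteLine(MatchOutermost("xx <aa <bbb> <bbb> aa> yy>"));  // <aa <bbb> <bbb> aa>
    }
}
```

Without the option, the white space and # characters would be treated as literal pattern text and the match would fail.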