Regular Expression Groups in C#

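When a single regular expression has to extract several distinct parts (subexpressions) of the matched text, you need groups.

In C#, the regex classes are related as shown below. Group is the class that represents a single group.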

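Regex –> MatchCollection (the collection of all matches)
          –> Match (a single match)
                –> GroupCollection (the groups, i.e. parenthesized subexpressions, inside one match)
                      –> Group (the content of one group)
                            –> CaptureCollection (every capture made by that group; a group inside a quantifier can capture more than once)
                                  –> Capture (a single captured substring)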
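The lower levels of this hierarchy only become visible when a group repeats. The following is a minimal console sketch, not taken from the original article: the class name CaptureWalk, the method CollectCaptures, and the sample input are hypothetical. It walks from Regex down to each Capture and shows that Group.Value holds only the last capture, while Group.Captures keeps all of them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureWalk
{
    // Hypothetical helper: returns every capture recorded by group 1
    // across all matches of the pattern in the input.
    public static List<string> CollectCaptures(string input, string pattern)
    {
        var result = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))   // MatchCollection -> Match
        {
            Group group = match.Groups[1];                       // GroupCollection -> Group
            foreach (Capture capture in group.Captures)          // CaptureCollection -> Capture
            {
                result.Add(capture.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        // The quantifier + repeats group 1, so one match holds several captures.
        string input = "10,20,30,";
        string pattern = @"(\d+,)+";

        Match match = Regex.Match(input, pattern);
        Console.WriteLine(match.Groups[1].Value);   // prints "30," (the last capture only)

        foreach (string value in CollectCaptures(input, pattern))
        {
            Console.WriteLine(value);               // prints "10,", "20,", "30," on separate lines
        }
    }
}
```

For groups that are not inside a quantifier, Captures contains exactly one item, so Group.Value is usually all you need.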

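Groups can be accessed in two ways:

1. Access by index

The regular expression ((\d+)([a-z]))\s+ contains four groups in total, counting group 0 (the whole match). Groups are numbered by the position of their opening parenthesis, from left to right:

Groups[0]    is the match itself, i.e. the whole expression ((\d+)([a-z]))\s+

Groups[1]    is the subexpression ((\d+)([a-z]))

Groups[2]    is the subexpression (\d+)

Groups[3]    is the subexpression ([a-z])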

 
        // Requires: using System.Text.RegularExpressions;
        // Response.Write assumes this runs inside an ASP.NET page.
        string text = "1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080";
        Response.Write(text + "<br/>");

        string strPattern = @"((\d+)([a-z]))\s+";
        Regex rex = new Regex(strPattern, RegexOptions.IgnoreCase);
        MatchCollection matches = rex.Matches(text);

        // Iterate over every match.
        foreach (Match match in matches)
        {
            GroupCollection groups = match.Groups;
            Response.Write(string.Format("<br/>{0} has {1} groups: {2}<br/>",
                match.Value, groups.Count, strPattern));

            // List the groups inside the current match.
            for (int i = 0; i < groups.Count; i++)
            {
                Response.Write(string.Format(
                    "Group {0} is {1}, index {2}, length {3}<br/>",
                    i, groups[i].Value, groups[i].Index, groups[i].Length));
            }
        }

        /*
         * Output:
         1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080

         1A has 4 groups: ((\d+)([a-z]))\s+
         Group 0 is 1A , index 0, length 3
         Group 1 is 1A, index 0, length 2
         Group 2 is 1, index 0, length 1
         Group 3 is A, index 1, length 1
         ....
         */
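Response.Write is only available inside an ASP.NET page, so here is a console-only sketch of the same index-based access. The class GroupIndexDemo, the helper FirstMatchGroups, and the shortened input are hypothetical and chosen for illustration; the pattern is the one used above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupIndexDemo
{
    // Hypothetical helper: returns the values of Groups[0..n] for the first
    // match, or an empty array when the pattern does not match at all.
    public static string[] FirstMatchGroups(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        if (!match.Success)
        {
            return new string[0];
        }

        string[] values = new string[match.Groups.Count];
        for (int i = 0; i < match.Groups.Count; i++)
        {
            values[i] = match.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string[] groups = FirstMatchGroups("10J 11Q", @"((\d+)([a-z]))\s+");
        for (int i = 0; i < groups.Length; i++)
        {
            Console.WriteLine("Groups[{0}] = \"{1}\"", i, groups[i]);
        }
        // Prints:
        // Groups[0] = "10J "
        // Groups[1] = "10J"
        // Groups[2] = "10"
        // Groups[3] = "J"
    }
}
```

Note that Groups[0] includes the trailing space consumed by \s+, while Groups[1] does not, because the whitespace sits outside the outer parentheses.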

2. Access by name

Use (?<xxx>subexpression) to give a group a name; its content can then be read through Groups["xxx"].

 
        // Requires: using System.Text.RegularExpressions;
        string text = "I've found this amazing URL at http://www.sohu.com, and then find ftp://ftp.sohu.comisbetter.";
        Response.Write(text + "<br/>");

        string pattern = @"\b(?<protocol>\S+)://(?<address>\S+)\b";

        // HTML-escape the angle brackets so the pattern is displayed literally.
        Response.Write(pattern.Replace("<", "&lt;").Replace(">", "&gt;") + "<br/><br/>");

        MatchCollection matches = Regex.Matches(text, pattern);

        foreach (Match match in matches)
        {
            GroupCollection groups = match.Groups;
            Response.Write(string.Format(
                "URL: {0}; Protocol: {1}; Address: {2} <br/>",
                match.Value,
                groups["protocol"].Value,
                groups["address"].Value));
        }

        /*
         * Output:
         I've found this amazing URL at http://www.sohu.com, and then find ftp://ftp.sohu.comisbetter.

         \b(?<protocol>\S+)://(?<address>\S+)\b

         URL: http://www.sohu.com; Protocol: http; Address: www.sohu.com
         URL: ftp://ftp.sohu.comisbetter; Protocol: ftp; Address: ftp.sohu.comisbetter
         */
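Named groups still have numbers, and asking for a name that does not exist does not throw. The sketch below illustrates both points in a console program. The class NamedGroupDemo, the helper TryParseUrl, the group name "port", and the sample inputs are hypothetical; the pattern is the one from the example above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Same pattern as in the example above.
    private static readonly Regex UrlRegex =
        new Regex(@"\b(?<protocol>\S+)://(?<address>\S+)\b");

    // Hypothetical helper: extracts protocol and address from the first URL
    // found in the text. Returns false (and empty strings) when none is found.
    public static bool TryParseUrl(string text, out string protocol, out string address)
    {
        Match match = UrlRegex.Match(text);
        protocol = match.Groups["protocol"].Value;
        address = match.Groups["address"].Value;
        return match.Success;
    }

    public static void Main()
    {
        string protocol, address;
        if (TryParseUrl("see http://www.sohu.com, please", out protocol, out address))
        {
            Console.WriteLine("{0} | {1}", protocol, address);   // prints "http | www.sohu.com"
        }

        // Named groups can also be reached by number.
        Console.WriteLine(UrlRegex.GroupNumberFromName("protocol"));   // prints 1
        Console.WriteLine(UrlRegex.GroupNumberFromName("address"));    // prints 2

        // "port" is not defined in the pattern: the indexer returns a Group
        // whose Success is false and whose Value is empty.
        Match m = UrlRegex.Match("ftp://ftp.sohu.com");
        Console.WriteLine(m.Groups["port"].Success);                   // prints False
    }
}
```

Checking Group.Success before reading Value is a cheap way to tell an absent group from one that matched an empty string.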

Sources:

C# regular expression programming (3): using the Match and Group classes http://blog.csdn.net/zhoufoxcn/archive/2010/03/09/5358644.aspx

Understanding the Match and Group classes of C# regular expressions http://tech.ddvip.com/2008-10/122483707982616.html
