C# Regular Expressions: Related Knowledge (Detailed Usage of the Regex Class)


This document is a summary of my own study notes and is shared free of charge; it is not for commercial use. If any content infringes your rights, please contact me and I will delete it.
Part of this document is reprinted from other sources, and reprinted sections are marked below.

1. Literal matching

Any ordinary character matches itself literally.
For example, the pattern 123 checks whether a string contains the characters "123" in sequence. Literal matching is typically used for format recognition.
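A minimal sketch of literal matching with Regex.IsMatch, written as a C# 9+ top-level program; the sample strings are made up for illustration. Characters with a special meaning, such as "+" or ".", must be escaped to be matched literally, and Regex.Escape does that for you.

```csharp
using System;
using System.Text.RegularExpressions;

// An ordinary pattern matches its characters literally, anywhere in the input
bool hasLiteral = Regex.IsMatch("Order A123-9", "123");   // True
bool noLiteral = Regex.IsMatch("Order A12-39", "123");    // False

// "+" is a quantifier, so escape it to match the text "1+1=2" literally
string escaped = Regex.Escape("1+1=2");                    // 1\+1=2
bool literalPlus = Regex.IsMatch("Is 1+1=2?", escaped);    // True

Console.WriteLine($"{hasLiteral} {noLiteral} {escaped} {literalPlus}");
```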

2. Escape sequences (reprinted)

\b Matches a word boundary, that is, the position between a word character and a non-word character (such as a space). For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
\B Matches a non-word boundary. "er\B" matches the "er" in "verb" but not the "er" in "never".
\cx Matches the control character specified by x. For example, \cM matches Control-M, the carriage return character. The value of x must be one of A-Z or a-z. In some engines (e.g. JScript), any other x causes c to be treated as a literal "c".
\d Matches a digit character. In .NET, \d matches any Unicode decimal digit; it is equivalent to [0-9] only when RegexOptions.ECMAScript is set.
\D Matches a non-digit character (the complement of \d; [^0-9] under RegexOptions.ECMAScript).
\f Matches a form feed character. Equivalent to \x0c and \cL.
\n Matches a newline character. Equivalent to \x0a and \cJ.
\r Matches a carriage return character. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to [ \f\n\r\t\v] (in .NET, \s also matches other Unicode whitespace unless RegexOptions.ECMAScript is set).
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab character. Equivalent to \x0b and \cK.
\w Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_] under RegexOptions.ECMAScript; by default, .NET \w also matches Unicode letters, including Chinese characters.
\W Matches any non-word character (the complement of \w; [^A-Za-z0-9_] under RegexOptions.ECMAScript).
\xn Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, "\x41" matches "A", and "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer: a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n Identifies either an octal escape value or a backreference. If at least n captured subexpressions precede \n, then n is a backreference. Otherwise, if n is an octal digit (0-7), then n is an octal escape value.
\nm Identifies either an octal escape value or a backreference. If at least nm captured subexpressions precede \nm, then nm is a backreference. If at least n captures precede \nm, then n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml If n is an octal digit (0-3) and both m and l are octal digits (0-7), matches the octal escape value nml.
\un Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
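A small sketch exercising several of the escapes above in .NET (C# 9+ top-level program; all sample strings are made up). It also shows that .NET's \d and \w are Unicode-aware by default, which matters for text such as Chinese.

```csharp
using System;
using System.Text.RegularExpressions;

// \b: "er" followed by a word boundary
bool inNever = Regex.IsMatch("never", @"er\b");   // True
bool inVerb = Regex.IsMatch("verb", @"er\b");     // False

// \1: backreference to group 1, i.e. two identical adjacent characters
string doubled = Regex.Match("bookkeeper", @"(.)\1").Value;   // "oo"

// \xnn and \unnnn escapes
bool hexA = Regex.IsMatch("ABC", @"\x41");                   // True ("A")
bool copyright = Regex.IsMatch("\u00A9 2019", @"\u00A9");    // True

// .NET \d and \w are Unicode-aware by default; ECMAScript mode restricts them to ASCII
string arabicIndicDigits = "\u0661\u0662\u0663";
bool unicodeDigits = Regex.IsMatch(arabicIndicDigits, @"^\d+$");                          // True
bool asciiDigits = Regex.IsMatch(arabicIndicDigits, @"^\d+$", RegexOptions.ECMAScript);   // False
bool chineseWord = Regex.IsMatch("\u8BED\u6587", @"^\w+$");   // True: the string is "语文"

Console.WriteLine($"{inNever} {inVerb} {doubled} {hexA} {copyright} {unicodeDigits} {asciiDigits} {chineseWord}");
```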

3. Special characters

\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\", and "\(" matches "(".
^ Matches the beginning of the input string. If RegexOptions.Multiline is set (the Multiline property of the RegExp object in JScript), ^ also matches the position after each "\n".
$ Matches the end of the input string (or the position before a final "\n"). If RegexOptions.Multiline is set, $ also matches the position before each "\n"; it does not match before "\r", so lines ending in "\r\n" need a pattern such as "\r?$".
* Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" as well as "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo" but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero times or once. For example, "do(es)?" matches "do" as well as "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob" but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
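{n,m} m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the two numbers.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy (lazy). A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's.
. Matches any single character except "\n". To match any character including "\n", use a pattern such as "(.|\n)" (or set RegexOptions.Singleline in .NET).
(pattern) Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection (in C#, through Match.Groups; VBScript uses the SubMatches collection, and JScript uses the $0…$9 properties). To match a parenthesis character, use "\(" or "\)".
(?:pattern) Matches pattern without capturing the result; that is, it is a non-capturing match that is not stored for later use. This is useful when combining parts of a pattern with the alternation character "|". For example, "industr(?:y|ies)" is a more concise expression than "industry|industries".
(?=pattern) Positive lookahead: matches the search string at any position where a string matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" matches the "Windows" in "Windows2000" but not the "Windows" in "Windows3.1". Lookarounds do not consume characters: after a match, the search for the next match begins right after the last match, not after the characters examined by the lookahead.
(?!pattern) Negative lookahead: matches the search string at any position where no string matching pattern begins. This is also a non-capturing match. For example, "Windows(?!95|98|NT|2000)" matches the "Windows" in "Windows3.1" but not the "Windows" in "Windows2000".
(?<=pattern) Positive lookbehind: like positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches the "Windows" in "2000Windows" but not the "Windows" in "3.1Windows".
(?<!pattern) Negative lookbehind: like negative lookahead, but in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" matches the "Windows" in "3.1Windows" but not the "Windows" in "2000Windows".
(.*?) Lazily matches any characters. For example, to extract the digits in the middle of "aaa555aaa", you can write "aaa(.*?)aaa".
() Parentheses define a group. The parentheses themselves match nothing and impose no constraint; they only make the enclosed content be treated as a single expression. For example, (ab){1,3} means "ab" occurs consecutively at least once and at most three times.
x|y Matches either x or y. For example, "z|food" matches "z" or "food", and "(z|f)ood" matches "zood" or "food".
[xyz] Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz] Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z] Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".
[^a-z] Negated character range. Matches any character not in the specified range. For example, "[^a-z]" matches any character that is not a lowercase letter from "a" to "z".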
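The special characters above can be checked against the .NET engine with a sketch like the following (C# 9+ top-level program; inputs mirror the examples in the table, plus a few made-up strings for the Multiline cases).

```csharp
using System;
using System.Text.RegularExpressions;

// Greedy vs. lazy, and bounded repetition
string greedy = Regex.Match("oooo", "o+").Value;            // "oooo"
string lazy = Regex.Match("oooo", "o+?").Value;             // "o"
string bounded = Regex.Match("fooooood", "o{1,3}").Value;   // "ooo"

// Groups, alternation and non-capturing groups
string repeated = Regex.Match("abababab", "(ab){1,3}").Value;   // "ababab"
string food = Regex.Match("my food", "(z|f)ood").Value;         // "food"
Match industry = Regex.Match("industries", "industr(?:y|ies)");
int groupCount = industry.Groups.Count;   // 1: only group 0, the whole match

// Lookarounds check the context without consuming it
string ahead = Regex.Match("Windows2000", "Windows(?=95|98|NT|2000)").Value;   // "Windows"
bool notAhead = Regex.IsMatch("Windows3.1", "Windows(?=95|98|NT|2000)");       // False
bool behind = Regex.IsMatch("2000Windows", "(?<=95|98|NT|2000)Windows");       // True
bool notBehind = Regex.IsMatch("2000Windows", "(?<!95|98|NT|2000)Windows");    // False

// Character sets
string inSet = Regex.Match("plain", "[abc]").Value;      // "a"
string notInSet = Regex.Match("plain", "[^abc]").Value;  // "p"

// ^ with RegexOptions.Multiline, and the "\r\n" pitfall for $
int lineStarts = Regex.Matches("a1\nb2\nc3", "^[a-z]", RegexOptions.Multiline).Count;   // 3
bool crlfPlain = Regex.IsMatch("ab\r\ncd", "b$", RegexOptions.Multiline);               // False
bool crlfFixed = Regex.IsMatch("ab\r\ncd", @"b\r?$", RegexOptions.Multiline);           // True

Console.WriteLine($"{greedy} {lazy} {bounded} {repeated} {food} {industry.Value} {groupCount}");
Console.WriteLine($"{ahead} {notAhead} {behind} {notBehind} {inSet} {notInSet} {lineStarts} {crlfPlain} {crlfFixed}");
```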

4. Implementing regular expressions in C#

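The above covers the commonly used regular expression syntax. Now let's look at how regular expressions are used in C# and how to use the Regex class!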

First, import the namespace System.Text.RegularExpressions.

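There are two ways to apply a regular expression (they can be mixed):

	1. Call the static methods: Regex.Match returns a Match object, and subsequent matches are obtained with Match.NextMatch().
	2. Instantiate the Regex class and call Matches to get a MatchCollection, then iterate over the MatchCollection to get each Match.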

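Let's explain directly with examples:
1. Checking whether a text matches a regular expression
Regex.IsMatch(input, pattern)
Example: loop over an array and check whether each string consists of 5 to 11 digits + @ + qq or 163 + .com (a simple email check)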

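static void IsMatch()
{
    // Illustrative sample inputs (hypothetical addresses)
    string[] text = { "12345@qq.com", "1234567890@163.com", "sfdd5f1d5f1ds5@s15d35f1", "1234@qq.com", "123456@gmail.com" };
    // (qq|163) groups the alternatives; without it, '|' splits the whole pattern
    // into "\d{5,11}@qq" OR "163\.com". ^ and $ require the whole string to match.
    string pattern = @"^\d{5,11}@(qq|163)\.com$";
    foreach (string t in text)
    {
        Console.WriteLine(t + " matches the rule: " + Regex.IsMatch(t, pattern));
    }
    Console.Read();
}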

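A small comparison sketch (made-up inputs, C# 9+ top-level program) showing why grouping the alternatives and anchoring the pattern matter. Because | has the lowest precedence, the ungrouped, unanchored pattern accepts any string that merely contains "163.com".

```csharp
using System;
using System.Text.RegularExpressions;

// '|' has the lowest precedence, so this means "\d{5,11}@qq" OR "163\.com"
string loose = @"\d{5,11}@qq|163\.com";
// Grouping the alternatives and anchoring with ^...$ checks the whole string
// ($ also allows a trailing "\n"; use \z instead to forbid it)
string strict = @"^\d{5,11}@(qq|163)\.com$";

string[] inputs = { "12345@qq.com", "abc163.com", "12345@qq.comXYZ", "1234@163.com" };
foreach (string s in inputs)
{
    Console.WriteLine($"{s}: loose={Regex.IsMatch(s, loose)}, strict={Regex.IsMatch(s, strict)}");
}
```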

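2. Extracting data from a text
Note: parentheses () create a subexpression (a capturing group). The concept is a little tricky; put simply, only the content of a parenthesized subexpression is extracted as a separate group. See the examples below, where the input lists 语文 (Chinese) and 数学 (math) scores:
First implementation: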

static void Match()
{
    string text = "语文:95 数学:54|语文:98 数学:45|语文:56 数学:87|语文:15 数学:55|语文:89 数学:100|语文:89 数学:0";
    // Group 1 captures the Chinese score, group 2 the math score
    string pattern = @"语文:(\d{0,3})\s数学:(\d{0,3})";
    Match match = Regex.Match(text, pattern);
    while (match.Success)
    {
        Console.WriteLine("Chinese score: " + match.Groups[1].Value + ", math score: " + match.Groups[2].Value);
        match = match.NextMatch();   // continue searching after the current match
    }
}

Second implementation:

static void MatchCollection()
{
    string text = "语文:95 数学:54|语文:98 数学:45|语文:56 数学:87|语文:15 数学:55|语文:89 数学:100|语文:89 数学:0";
    string pattern = @"语文:(\d{0,3})\s数学:(\d{0,3})";
    Regex regex = new Regex(pattern);
    MatchCollection matchs = regex.Matches(text);   // all matches at once
    foreach (Match match in matchs)
    {
        Console.WriteLine("Chinese score: " + match.Groups[1].Value + ", math score: " + match.Groups[2].Value);
    }
}

Both implementations produce exactly the same output.
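Note: put the data you want to extract inside a subexpression (); the rest of the pattern acts as markers that locate it.
In this pattern, the expression as a whole enforces the format, and the values are taken from the subexpressions.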

Note: a Regex instance is bound to a single regular expression; to use a different pattern, create a new instance.
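The same score extraction can also use named groups, (?<name>...), so that values are read by name rather than by index. A minimal sketch (C# 9+ top-level program; the group names chinese and math and the shortened input are chosen for illustration). It uses \d{1,3} rather than \d{0,3} so that int.Parse never receives an empty string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One Regex instance per pattern; its pattern cannot be changed after construction
var scoreRegex = new Regex(@"语文:(?<chinese>\d{1,3})\s数学:(?<math>\d{1,3})");

string text = "语文:95 数学:54|语文:98 数学:45|语文:89 数学:100";
var scores = new List<(int Chinese, int Math)>();
foreach (Match m in scoreRegex.Matches(text))
{
    // Read each captured value by group name instead of by index
    int chinese = int.Parse(m.Groups["chinese"].Value);
    int math = int.Parse(m.Groups["math"].Value);
    scores.Add((chinese, math));
    Console.WriteLine($"Chinese: {chinese}, Math: {math}");
}
```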

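3. Replacing and splitting with regular expressions
Note: only the basic format is shown here; I personally prefer to perform replacements in program logic (in code).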

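$number Includes in the replacement string the last substring matched by the capturing group identified by number, where number is a decimal value.
${name} Includes in the replacement string the last substring matched by the named group designated by (?<name>...).
$$ Includes a single literal "$" in the replacement string.
$& Includes a copy of the entire match in the replacement string.
$` Includes all the text of the input string before the match in the replacement string.
$' Includes all the text of the input string after the match in the replacement string.
$+ Includes the last group captured in the replacement string.
$_ Includes the entire input string in the replacement string.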
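The substitutions in the table can be tried with Regex.Replace, as in this sketch (C# 9+ top-level program; the date and the other inputs are made up, and the group names y and m are chosen for illustration).

```csharp
using System;
using System.Text.RegularExpressions;

string date = "2019-03-05";

// $number: refer to numbered groups (here, to reorder them)
string dmy = Regex.Replace(date, @"(\d{4})-(\d{2})-(\d{2})", "$3/$2/$1");               // 05/03/2019
// ${name}: refer to a named group defined with (?<name>...)
string monthYear = Regex.Replace(date, @"(?<y>\d{4})-(?<m>\d{2})-\d{2}", "${m}.${y}");  // 03.2019
// $$ is a literal "$"; $& is the whole match
string dollars = Regex.Replace("cost 36", @"\d+", "$$$&");   // cost $36
// $` and $' are the text before and after the match
string around = Regex.Replace("abc", "b", "[$`|$']");        // a[a|c]c
// $_ is the entire input string
string whole = Regex.Replace("ab", "b", "<$_>");             // a<ab>

Console.WriteLine($"{dmy} {monthYear} {dollars} {around} {whole}");
```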
static void Replace()   // replace text using a regular expression
{
    string text = "价格为:36";
    string pattern = @"\d{2}";
    string replacement = "¥$&";   // $& inserts the whole match, so "36" becomes "¥36"
    Console.WriteLine(Regex.Replace(text, pattern, replacement));
}

Effect of the code above: 36 is replaced with ¥36, so "价格为:36" ("the price is: 36") is printed as "价格为:¥36".

static void Result()   // expand a replacement pattern from an existing Match object
{
    string text = "价格为:36";
    string pattern = @"\d{2}";
    string replacement = "¥$&";
    Regex regex = new Regex(pattern);
    Match match = regex.Match(text);
    Console.WriteLine(match.Result(replacement));   // prints only the expansion for this match
}

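Note: because the match result is already stored in the match variable, the second approach only needs the replacement pattern.
The two outputs are not identical, however: Regex.Replace returns the whole input with the match replaced ("价格为:¥36"), whereas Match.Result returns only the expanded replacement for that match ("¥36").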
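A sketch contrasting Regex.Replace and Match.Result on the same input, plus the MatchEvaluator overload of Regex.Replace, which lets ordinary C# code compute each replacement (C# 9+ top-level program; the doubling rule and the "A:36 B:50" input are purely illustrative).

```csharp
using System;
using System.Text.RegularExpressions;

string text = "价格为:36";
var regex = new Regex(@"\d{2}");

// Regex.Replace returns the whole input with each match replaced
string replaced = regex.Replace(text, "¥$&");        // 价格为:¥36
// Match.Result returns only the expanded replacement for that one match
string expanded = regex.Match(text).Result("¥$&");   // ¥36

// A MatchEvaluator computes each replacement in code; here it doubles every number
string doubled = Regex.Replace("A:36 B:50", @"\d+", m => (int.Parse(m.Value) * 2).ToString());   // A:72 B:100

Console.WriteLine(replaced);
Console.WriteLine(expanded);
Console.WriteLine(doubled);
```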
Splitting text with a regular expression

static void Split()   // split text using a regular expression
{
    string text = "abc bcde fdsa jdji fgkkkkk fdfh";
    string pattern = @"\s";   // split at each whitespace character
    string[] results = Regex.Split(text, pattern);
    foreach (string result in results)
    {
        Console.WriteLine(result);
    }
}
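Splitting on a single \s produces empty entries when separators are adjacent. The sketch below (C# 9+ top-level program, made-up inputs) compares \s with \s+ and shows that text captured by parentheses in the split pattern is kept in the result array.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "abc  bcde\tfdsa";                          // two spaces, then a tab
string[] single = Regex.Split(text, @"\s");               // empty entry between the two spaces
string[] runs = Regex.Split(text, @"\s+");                // a whitespace run counts as one separator
string[] withDelims = Regex.Split("a,b;c", @"([,;])");    // captured delimiters are kept

Console.WriteLine(string.Join("|", single));      // abc||bcde|fdsa
Console.WriteLine(string.Join("|", runs));        // abc|bcde|fdsa
Console.WriteLine(string.Join("|", withDelims));  // a|,|b|;|c
```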


The regular expression I personally use most often

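Universal matcher: (.*?)
The universal matcher lazily matches any run of characters (except newlines) that lies between the fixed parts of the pattern.
For example, given text like this: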
aaa<text to match>aaa
aaa<text to match>aaa
aaa<text to match>aaa
aaa<text to match>aaa
aaa<text to match>aaa
aaa<text to match>aaa

The regular expression can then be written as aaa(.*?)aaa
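A sketch of what the lazy (.*?) does compared with a greedy (.*), and of its newline limitation (C# 9+ top-level program; the inputs are made up).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "aaa first aaa aaa second aaa";

// Lazy: each match stops at the nearest following "aaa"
string[] lazy = Regex.Matches(text, "aaa(.*?)aaa")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();                                                     // " first ", " second "

// Greedy: a single match runs to the last "aaa" in the input
string greedy = Regex.Match(text, "aaa(.*)aaa").Groups[1].Value;    // " first aaa aaa second "

// "." does not match "\n", so (.*?) cannot span lines unless RegexOptions.Singleline is set
string multiLine = "aaa line1\nline2 aaa";
bool withoutSingleline = Regex.IsMatch(multiLine, "aaa(.*?)aaa");                        // False
bool withSingleline = Regex.IsMatch(multiLine, "aaa(.*?)aaa", RegexOptions.Singleline);  // True

Console.WriteLine(string.Join(",", lazy) + " | " + greedy + " | " + withoutSingleline + " " + withSingleline);
```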

And the score-extraction code can be rewritten as:

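static void MatchCollection()
{
    string text = "语文:95 数学:54|语文:98 数学:45|语文:56 数学:87|语文:15 数学:55|语文:89 数学:100|语文:89 数学:0";
    // (.*?) is lazy; the lookahead (?=\||$) stops the second group at "|" or at the end of the input.
    // A lazy group at the very end of a pattern would always match an empty string.
    string pattern = @"语文:(.*?)\s数学:(.*?)(?=\||$)";
    Regex regex = new Regex(pattern);
    MatchCollection matchs = regex.Matches(text);
    foreach (Match match in matchs)
    {
        Console.WriteLine("Chinese score: " + match.Groups[1].Value + ", math score: " + match.Groups[2].Value);
    }
}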

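Provided the last (.*?) is followed by something that marks where the score ends, such as a lookahead for "|" or the end of the input, the output is identical to that of the \d{0,3} version. A lazy (.*?) at the very end of a pattern always matches an empty string.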

Some of the material is reprinted from: original link


Origin blog.csdn.net/qq_42628989/article/details/88202320