Regular expressions in C#: an explanation

This article is excerpted from the LTP.NET knowledge base.

The regular expression classes are contained in the System.Text.RegularExpressions.dll file, so the application must reference this file when it is compiled:

System.Text.RegularExpressions.dll

Introduction to the namespace

The namespace contains only six classes and one delegate definition:

Capture : contains the result of a single capture;

CaptureCollection : a sequence of Capture objects;

Group : the result of a capturing group, derived from Capture;

Match : the result of a single match of the expression, derived from Group;

MatchCollection : a sequence of Match objects;

MatchEvaluator : the delegate used in replace operations;

Regex : an instance of a compiled expression.

The Regex class also contains a number of static methods:

Escape : escapes the regex metacharacters in a string;

IsMatch : returns a Boolean value indicating whether the expression matches anywhere in the string;

Match : returns a Match instance;

Matches : returns a collection of Match objects;

Replace : replaces the text matched by the expression with a replacement string;

Split : returns an array of strings, split at the positions determined by the expression;

Unescape : converts the escaped characters in a string back to their unescaped form.
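The static methods listed above can be tried out in a few lines. The following is a minimal sketch, not part of the original article; the class name StaticRegexDemo and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class StaticRegexDemo
{
    static void Main()
    {
        // Escape: protect metacharacters so the text is matched literally.
        string escaped = Regex.Escape("1+1=2");
        Console.WriteLine(escaped);                                   // 1\+1=2

        // Unescape: reverse the transformation.
        Console.WriteLine(Regex.Unescape(escaped));                   // 1+1=2

        // IsMatch: true if the pattern is found anywhere in the input.
        Console.WriteLine(Regex.IsMatch("abracadabra", "cad"));       // True

        // Match: only the first match.
        Console.WriteLine(Regex.Match("abracadabra", "a.").Value);    // ab

        // Matches: every non-overlapping match.
        Console.WriteLine(Regex.Matches("abracadabra", "a.").Count);  // 4

        // Replace: substitute every match.
        Console.WriteLine(Regex.Replace("abracadabra", "a", "A"));    // AbrAcAdAbrA

        // Split: cut the input wherever the pattern matches.
        string[] parts = Regex.Split("one, two,three", @",\s*");
        Console.WriteLine(string.Join("|", parts));                   // one|two|three
    }
}
```

Each static call builds (or reuses) a Regex behind the scenes, so for a pattern that is applied many times it may be worth creating a Regex instance once and calling its instance methods instead.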

Simple match

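Let's start learning Regex with a simple expression and the Match class. (The code fragments in this article assume using System; and using System.Text.RegularExpressions;.)

Match m = Regex.Match("abracadabra", "(a|b|r)+");

The resulting Match instance can be tested, for example:

if (m.Success) ...

If you want to use the matched text, you can convert the match to a string:

Console.WriteLine("Match="+m.ToString());

Output: Match=abra.

This is the matched string.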
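A Match also records where the matched text was found, and NextMatch continues the search from the end of the previous match. The sketch below is an illustration added here under that assumption; the class name SimpleMatchDemo is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class SimpleMatchDemo
{
    static void Main()
    {
        Match m = Regex.Match("abracadabra", "(a|b|r)+");

        // Walk through every match, printing its text and position.
        while (m.Success)
        {
            Console.WriteLine(m.Value + " at " + m.Index + ", length " + m.Length);
            m = m.NextMatch();
        }
        // Prints:
        // abra at 0, length 4
        // a at 5, length 1
        // abra at 7, length 4
    }
}
```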

The replacement string

Simple string replacement is very intuitive.

For example, the following statement:

string s = Regex.Replace("abracadabra", "abra", "zzzz");

returns the string zzzzcadzzzz: every matching substring is replaced with zzzz.

Now let's look at a more complex replacement example:

string s = Regex.Replace("  abra  ", @"^\s*(.*?)\s*$", "$1");

This statement returns the string abra, with its leading and trailing spaces removed.

The pattern above is useful for removing the leading and trailing spaces from any string.

In C# we often use string literals. In a regular string literal the compiler treats the character "\" as an escape character. Because a regular expression also uses "\" as its escape character, the verbatim form @"..." is very useful: the backslashes do not have to be doubled.

Also worth mentioning is the $1 in the replacement string. It shows that the replacement string can contain parts of the matched text, here the text captured by group 1.
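The points about verbatim strings and $1 can be checked with a short program. This is a sketch for illustration only; the helper name TrimWithRegex and the class name TrimDemo are not from the original article.

```csharp
using System;
using System.Text.RegularExpressions;

static class TrimDemo
{
    // Hypothetical helper: strips leading and trailing whitespace
    // by keeping only what group 1 captured.
    static string TrimWithRegex(string input)
    {
        return Regex.Replace(input, @"^\s*(.*?)\s*$", "$1");
    }

    static void Main()
    {
        // A regular literal needs doubled backslashes; a verbatim literal does not.
        string regular = "^\\s*(.*?)\\s*$";
        string verbatim = @"^\s*(.*?)\s*$";
        Console.WriteLine(regular == verbatim);                    // True

        Console.WriteLine("[" + TrimWithRegex("  abra  ") + "]");  // [abra]

        // $1 can be mixed with literal text in the replacement string.
        Console.WriteLine(Regex.Replace("abracadabra", "(cad)", "<$1>")); // abra<cad>abra
    }
}
```

For plain trimming, string.Trim() is simpler; the regex form is mainly a compact demonstration of group substitution.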

Details of the matching engine

Next, a more complex example helps explain how groups are structured:

string text = "abracadabra1abracadabra2abracadabra3";
string pat = @"
    (        # start of the first group
     abra    # match the string abra
     (       # start of the second group
     cad     # match the string cad
     )?      # end of the second group (optional)
    )        # end of the first group
    +        # match one or more times
    ";

// the x option (RegexOptions.IgnorePatternWhitespace) ignores comments and whitespace
Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);

// get the list of group numbers
int[] gnums = r.GetGroupNumbers();

// first match
Match m = r.Match(text);

while (m.Success)
{
    // start from group 1
    for (int i = 1; i < gnums.Length; i++)
    {
        Group g = m.Groups[gnums[i]];

        // print the text matched by the group
        Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");

        // print the start position and length of each capture in the group
        CaptureCollection cc = g.Captures;

        for (int j = 0; j < cc.Count; j++)
        {
            Capture c = cc[j];
            Console.WriteLine(" Capture" + j + "=[" + c.ToString()
                + "] Index=" + c.Index + " Length=" + c.Length);
        }
    }

    // next match
    m = m.NextMatch();
}
The output of this example is shown below:

Group1=[abra]
 Capture0=[abracad] Index=0 Length=7
 Capture1=[abra] Index=7 Length=4
Group2=[cad]
 Capture0=[cad] Index=4 Length=3
Group1=[abra]
 Capture0=[abracad] Index=12 Length=7
 Capture1=[abra] Index=19 Length=4
Group2=[cad]
 Capture0=[cad] Index=16 Length=3
Group1=[abra]
 Capture0=[abracad] Index=24 Length=7
 Capture1=[abra] Index=31 Length=4
Group2=[cad]
 Capture0=[cad] Index=28 Length=3

We start by examining the string pat, which contains the expression.

The first capturing group begins at the first parenthesis, and the expression then matches abra.

The second capturing group begins at the second parenthesis, but the first group has not ended yet. This means that the first group matches abracad, while the second group matches only cad.

Because the ? symbol makes cad an optional match, the first group may match either abra or abracad.

Then the first group ends, and the + symbol requires the expression to match one or more times.

Now let's see what happens during the matching process.

First, an instance of the expression is created by calling the Regex constructor, where you can specify various options.

In this case the expression contains comments and some extra whitespace, so the x option (RegexOptions.IgnorePatternWhitespace in the released .NET API) is used.

With the x option turned on, comments and unescaped whitespace in the expression are ignored.

Next, we get the list of group numbers defined in the expression.

You could of course use these numbers explicitly; what is shown here is the programmatic approach.

If the groups are named, this approach is also an effective way to build a quick index.

Next comes the first match.

A loop tests whether the current match succeeded, and then walks through the group list starting from group 1.

Group 0 is not used in this example because group 0 is the entire matched string. If you want to collect all of the matched text as a single string, use group 0.

We trace the CaptureCollection of each group.

Typically a group has only one capture per match, but in this example Group1 has two captures: Capture0 and Capture1.

If you ask only for the ToString of Group1, you get just abra, even though the group also matched abracad.

The value of a group's ToString is the value of the last Capture in its CaptureCollection, which is exactly what we need.

If you want matching to stop after a single occurrence of the group, remove the + sign from the expression; the regex engine then matches the group only once.
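The behaviour of group 0, of the last capture, and of named groups can be seen in a smaller program. This is a sketch under stated assumptions: the group names outer and inner and the class name GroupDemo are invented here, and the input is shortened to a single repetition of the test string.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    static void Main()
    {
        // Same expression as above, with hypothetical group names.
        Regex r = new Regex(@"(?<outer>abra(?<inner>cad)?)+");
        Match m = r.Match("abracadabra1");

        // Group 0 is the whole match.
        Console.WriteLine(m.Groups[0].Value);                    // abracadabra

        // A group's value is its last capture; earlier captures remain available.
        Console.WriteLine(m.Groups["outer"].Value);              // abra
        Console.WriteLine(m.Groups["outer"].Captures.Count);     // 2
        Console.WriteLine(m.Groups["outer"].Captures[0].Value);  // abracad
        Console.WriteLine(m.Groups["inner"].Value);              // cad

        // Named groups can be mapped back to numbers to build an index.
        foreach (string name in r.GetGroupNames())
            Console.WriteLine(name + " -> " + r.GroupNumberFromName(name));

        // Without the +, the group is matched only once.
        Console.WriteLine(Regex.Match("abracadabra1", "(abra(cad)?)").Value); // abracad
    }
}
```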

Procedure-based versus expression-based approaches

In general, users of regular expressions fall into the following two categories:

The first kind of user uses regular expressions as little as possible and relies on procedural code to perform the operations that have to be repeated;

The second kind of user takes full advantage of the power of the regular expression engine and uses as little procedural code as possible.

For most of us, the best solution is a combination of the two.

Hopefully this article explains the role of the regular expression classes in the .NET languages, as well as the trade-offs between performance and complexity.

Procedure-based patterns

A feature we often need in a program is to match part of a string or to do some other string processing. The following example matches the words in a string:

string text = "the quick red fox jumped over the lazy brown dog.";
System.Console.WriteLine("text=[" + text + "]");
string result = "";
string pattern = @"\w+|\W+";
foreach (Match m in Regex.Matches(text, pattern))
{
    // get the matched string
    string x = m.ToString();

    // if the first character is lowercase, capitalize it
    if (char.IsLower(x[0]))
        x = char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);

    // collect all of the text
    result += x;
}

System.Console.WriteLine("result=[" + result + "]");

As the example above shows, we use the C# foreach statement to process each match and apply the corresponding processing. In this example, we build a new result string.

The output of this example is shown below:

text=[the quick red fox jumped over the lazy brown dog.]
result=[The Quick Red Fox Jumped Over The Lazy Brown Dog.]

Expression-based patterns

Another way to accomplish the same task is with a MatchEvaluator. The new code is as follows:

using System.Text.RegularExpressions;

class Test
{
    static string CapText(Match m)
    {
        // get the matched string
        string x = m.ToString();

        // if the first character is lowercase, convert it to uppercase
        if (char.IsLower(x[0]))
            return char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);

        return x;
    }

    static void Main()
    {
        string text = "the quick red fox jumped over the lazy brown dog.";
        System.Console.WriteLine("text=[" + text + "]");

        string pattern = @"\w+";
        string result = Regex.Replace(text, pattern,
            new MatchEvaluator(Test.CapText));

        System.Console.WriteLine("result=[" + result + "]");
    }
}

Note also that the pattern is very simple here: only the words need to be modified, so the non-word text does not have to be matched at all.
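In later versions of C#, the evaluator does not have to be a named method: a lambda expression converts implicitly to the MatchEvaluator delegate type. The following is a minimal sketch of that variant, not the article's own code; the names LambdaEvaluatorDemo and Capitalize are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class LambdaEvaluatorDemo
{
    // Hypothetical helper: capitalizes the first character of every word.
    static string Capitalize(string text)
    {
        // The lambda is called once per match and returns the replacement text.
        return Regex.Replace(text, @"\w+",
            m => char.ToUpper(m.Value[0]) + m.Value.Substring(1));
    }

    static void Main()
    {
        Console.WriteLine(Capitalize("the quick red fox"));  // The Quick Red Fox
    }
}
```

Unlike CapText, this version calls char.ToUpper unconditionally, which leaves characters that are not lowercase letters unchanged.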

This article is from the wood village network blog > Reading in C# regular expressions

Origin www.cnblogs.com/muzhuang/p/11708125.html