Regular Expression Study

This chapter shares some basic usage of regular expressions. I hope it helps beginners, and it also serves as notes for me in case I forget these points later. Let's get started.

First, what a regular expression is

  1, A regular expression is a pattern made up of ordinary text characters (e.g. the letters a to z) and special characters (called metacharacters).
  2, A regular expression acts as a template: it describes a character pattern that is matched against the string being searched.
  3, When writing programs or web pages that process strings, you often need to find or replace strings that satisfy certain complex rules.
  4, In short, a regular expression is code that records a text rule.

  Uses:
    1, finding data
    2, replacing data

  What regex is for: matching strings, extracting substrings, and replacing strings.
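The two uses above, finding and replacing, can be sketched with Python's `re` module. The post does not name a language, so Python and the sample sentence are assumptions chosen only for illustration:

```python
import re

# Hypothetical sample text, used only for this illustration
text = "Order 1024 shipped on 2019-11-02, order 2048 pending."

# Finding data: collect every run of digits
numbers = re.findall(r"\d+", text)
print(numbers)  # ['1024', '2019', '11', '02', '2048']

# Replacing data: mask every run of digits with '#'
masked = re.sub(r"\d+", "#", text)
print(masked)  # Order # shipped on #-#-#, order # pending.
```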

Second, the components of a regular expression

  1, ordinary characters

    This includes all uppercase and lowercase letters, all digits, all punctuation marks and other symbols. An ordinary character simply matches itself.
    For example: Hello world xyh666
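A minimal sketch, assuming Python's `re` module, showing that ordinary characters match themselves literally and, by default, case-sensitively:

```python
import re

text = "Hello world xyh666"

# Ordinary characters have no special meaning: each one matches itself
m = re.search(r"xyh666", text)
print(m.group(), m.start())  # xyh666 12

# Matching is case-sensitive by default, so "hello" is not found...
print(re.search(r"hello", text))  # None
# ...unless the re.IGNORECASE flag is passed
print(re.search(r"hello", text, re.IGNORECASE).group())  # Hello
```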

  2, character sets (ranges) (a set matches a single character; combine it with a quantifier to match a string)
    [a-e]  : a single character from a to e
    [aeiou]  : any one of the five characters a, e, i, o, u
    [a-zA-Z]  : a single uppercase or lowercase letter
    [0-9]  : a single digit from 0 to 9

    ^ as the first character inside the brackets means "not":
    [^lsjd]  : any character not listed in the brackets
    [^a-f]   : any character outside the range a-f
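The character sets above can be tried out as follows. This is a sketch assuming Python's `re` module; the sample strings are made up for illustration:

```python
import re

s = "abcxyz EF 42"

lower_a_to_e = re.findall(r"[a-e]", s)              # one character from a..e
letters = re.findall(r"[a-zA-Z]", s)                # one upper- or lowercase letter
digits = re.findall(r"[0-9]", s)                    # one digit
vowels = re.findall(r"[aeiou]", "regular")          # one of the five vowels
not_lsjd = re.findall(r"[^lsjd]", "lsjd42")         # anything except l, s, j, d
not_a_to_f = re.findall(r"[^a-f]", "abcxyz")        # anything outside a..f

# A set matches ONE character; add a quantifier (+) to match a run of them
word = re.search(r"[a-z]+", s).group()

print(lower_a_to_e, letters, digits)
print(vowels, not_lsjd, not_a_to_f, word)
```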

  3, predefined character classes (the uppercase form is the negation of the lowercase one) (each matches a single character; combine it with a quantifier to match a string)

    \d   : matches a digit character. Equivalent to [0-9].
    \D   : matches a non-digit character. Equivalent to [^0-9].
    \w   : matches a letter, a digit or an underscore (in Unicode-aware engines, also Chinese and other non-Latin letters).
    \W   : matches any character that is not a letter, digit, underscore or Chinese character.
    \s   : matches any whitespace character, including spaces, tabs, form feeds and the like. In ASCII terms it is equivalent to [ \f\n\r\t\v].
    \S   : matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
    \b   : matches a word boundary, i.e. the position at the start or end of a word.
    \B   : matches a position that is not at the start or end of a word.
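A short sketch of the predefined classes, assuming Python 3's `re` module, where `\w` is Unicode-aware for `str` patterns (so Chinese characters count as word characters). Note that `\b` and `\B` match positions, not characters:

```python
import re

code = "id_7 = 42;\tname = 'ok'"
digits = re.findall(r"\d", code)            # \d: single digits
words = re.findall(r"\w+", code)            # \w+: runs of letters/digits/underscore
spaces = re.findall(r"\s", code)            # \s: the spaces and the tab
non_digits = re.findall(r"\D+", "a1b22c")   # \D+: runs of non-digits

# \w also matches Chinese characters in Python 3 str patterns
cjk = re.findall(r"\w+", "你好 world")

# \b / \B are zero-width: they test the position between characters
text = "cat concat cat's catalog"
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]   # "cat" as a whole word
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]    # "cat" not at a word start

print(digits, words, spaces, non_digits, cjk)
print(whole_word, inside_word)  # [0, 11] [7]
```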

  4, special characters

    $   : matches the end of the string.
    ^   : matches the start of the string (inside [...] it means "not" instead).
    .   : matches any single character except the newline \n (combine it with a quantifier to match a string).
    |   : "or", a choice between two alternatives. Unlike [...], which chooses one character, | chooses between whole subexpressions.
    \   : the escape character; \. or \* matches a literal . or *.
    ()  : marks the start and end of a group (subexpression).
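Each special character above in action, as a sketch assuming Python's `re` module with made-up inputs:

```python
import re

# ^ and $ anchor the pattern to the start and end of the string
all_digits = re.search(r"^\d+$", "12345") is not None   # True
mixed = re.search(r"^\d+$", "123a5") is not None        # False

# . matches any single character except a newline
dots = re.findall(r"a.c", "abc a-c a\nc")               # ['abc', 'a-c']

# | chooses between whole alternatives
animals = re.findall(r"cat|dog", "cat dog cow")         # ['cat', 'dog']

# \ escapes a metacharacter: \. is a literal dot, a bare . is "any character"
unescaped = re.findall(r"\d+.\d+", "3.14 2x0")          # ['3.14', '2x0']
escaped = re.findall(r"\d+\.\d+", "3.14 2x0")           # ['3.14']

# () groups a subexpression so a quantifier applies to the whole group
repeated = re.search(r"(ab)+", "xxababab").group()      # 'ababab'
```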

  5, common quantifiers
    ================= repetition counts =================
    {m}    : the preceding unit appears exactly m times
    {m,}   : the preceding unit appears at least m times
    {m,n}  : the preceding unit appears at least m and at most n times
    =====================================================
    ================= shorthand quantifiers =============
    *   : the preceding unit appears 0 or more times
    +   : the preceding unit appears 1 or more times
    ?   : the preceding unit appears 0 or 1 times (placed after another quantifier, ? makes it lazy: match as little as possible)
    =====================================================
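A sketch of the quantifiers, assuming Python's `re` module. Note there is no space inside `{m,n}`; in Python a braces expression with a space is not read as a quantifier:

```python
import re

exactly_4 = re.fullmatch(r"\d{4}", "2019") is not None   # exactly four digits
at_least_2 = re.findall(r"\d{2,}", "1 22 333")           # two or more digits
two_to_3 = re.findall(r"\d{2,3}", "1 22 4444")           # "4444" yields '444'; the leftover '4' is too short
star = re.findall(r"ab*", "a ab abbb")                   # b zero or more times
plus = re.findall(r"ab+", "a ab abbb")                   # b one or more times
optional = re.findall(r"colou?r", "color colour")        # u zero or one time

print(exactly_4, at_least_2, two_to_3, star, plus, optional)
```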
  6, greedy and lazy matching (greedy mode matches as much as possible; lazy, or non-greedy, mode matches as little as possible)
    *?      repeat any number of times, but as few times as possible
    +?      repeat one or more times, but as few times as possible
    ??      repeat 0 or 1 times, but as few times as possible
    {n,m}?  repeat n to m times, but as few times as possible
    {n,}?   repeat n or more times, but as few times as possible
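The classic illustration of greedy versus lazy matching is pulling tags out of HTML-like text. This is a sketch assuming Python's `re` module; the sample markup is hypothetical:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the end, then backs off only to the LAST '>'
greedy = re.findall(r"<.*>", html)

# Lazy: .*? stops at the FIRST '>' that lets the match succeed
lazy = re.findall(r"<.*?>", html)

lazy_plus = re.findall(r"a+?", "aaa")          # each match takes a single 'a'
lazy_range = re.findall(r"\d{2,}?", "12345")   # each match takes the minimum two digits

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```

Greedy mode is the default, so the lazy forms are worth reaching for whenever a delimiter can occur more than once in the text.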

  7, groups

    When () is used to define a group in a regular expression, the regex engine numbers the groups in order and stores what each group matched in a buffer.

    By default, every group automatically gets a group number. The rule is: count the opening parentheses from left to right; the first group is 1, the second is 2, and so on.

    A cached group can be referenced with "\number": \1 refers to what the first group matched, \2 to what the second group matched, and so on.

    The contents of the parentheses are treated as a single unit when matching.
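Group numbering and `\number` back-references can be sketched like this, assuming Python's `re` module (the date and sentences are made-up inputs):

```python
import re

m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Date: 2019-11-02")
year, month, day = m.group(1), m.group(2), m.group(3)   # numbered left to right
whole = m.group(0)                                      # group 0 is the whole match

# Nested groups are numbered by the position of their opening parenthesis
nested = re.match(r"((a)(b))c", "abc").groups()         # ('ab', 'a', 'b')

# \1 refers back to whatever group 1 matched: find doubled words
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")

# Group numbers can also be used in a replacement string
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")

print(year, month, day, whole, nested, doubled, swapped)
```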

  8, non-capturing groups and lookaround

    Non-capturing: the engine does not cache what the group matched, so the group cannot be referenced later with "\number".

    Lookaround: a lookaround does not consume characters. After a match, the search for the next match starts right where the previous match ended, not after the characters that the lookaround examined.

    (?:pattern)  Non-capturing group: matches pattern but does not capture the result or store it for later use. This is useful when combining parts of a pattern with the alternation character |. For example, "industr(?:y|ies)" is a more concise expression than "industry|industries".
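The difference between a capturing and a non-capturing group is easy to see with `re.findall`, which (in Python, the assumed language here) returns group contents instead of whole matches when the pattern has capturing groups:

```python
import re

text = "industry industries industrial"

non_capturing = re.findall(r"industr(?:y|ies)", text)     # whole matches
capturing = re.findall(r"industr(y|ies)", text)           # only what group 1 captured
no_groups = re.search(r"industr(?:y|ies)", text).groups() # nothing was captured

print(non_capturing)  # ['industry', 'industries']
print(capturing)      # ['y', 'ies']
print(no_groups)      # ()
```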

    (?=pattern)  Positive lookahead: matches at any position where the text that follows matches pattern; what pattern matched is not captured for later use. For example, "Windows(?=95|98|NT|2000)" matches the "Windows" in "Windows2000", but not the "Windows" in "Windows3.1".

    (?!pattern)  Negative lookahead: matches at any position where the text that follows does not match pattern; nothing is captured for later use. For example, "Windows(?!95|98|NT|2000)" matches the "Windows" in "Windows3.1", but not the "Windows" in "Windows2000".
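Both lookahead examples can be checked as follows, assuming Python's `re` module. The last line also illustrates the point that a lookaround consumes no characters: each digit is checked against its neighbour, yet the neighbour is still available for the next match:

```python
import re

ahead = re.compile(r"Windows(?=95|98|NT|2000)")
not_ahead = re.compile(r"Windows(?!95|98|NT|2000)")

m = ahead.search("Windows2000")
print(m.group(), m.end())                      # Windows 7 ("2000" was checked, not consumed)
print(ahead.search("Windows3.1"))              # None
print(not_ahead.search("Windows3.1").group())  # Windows
print(not_ahead.search("Windows2000"))         # None

# A digit followed by another digit; the following digit is not consumed
pairs = re.findall(r"\d(?=\d)", "12345")
print(pairs)  # ['1', '2', '3', '4']
```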

    (?<=pattern)  Positive lookbehind: like positive lookahead, but it checks the text before the current position. For example, "(?<=95|98|NT|2000)Windows" matches the "Windows" in "2000Windows", but not the "Windows" in "3.1Windows".

    (?<!pattern)  Negative lookbehind: like negative lookahead, but it checks the text before the current position. For example, "(?<!95|98|NT|2000)Windows" matches the "Windows" in "3.1Windows", but not the "Windows" in "2000Windows".
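One caveat when trying the lookbehind examples in Python (the assumed language of these sketches): the standard `re` module only accepts fixed-width lookbehind, and the alternatives 95, 98, NT and 2000 have different lengths. Engines such as .NET accept the pattern as written. A workaround, sketched here, is to give each alternative its own fixed-width assertion:

```python
import re

# The pattern as written in the text; Python's re is expected to reject it
try:
    re.compile(r"(?<=95|98|NT|2000)Windows")
    variable_width_lookbehind = True
except re.error:
    variable_width_lookbehind = False

# Positive: succeed if ANY of the fixed-width lookbehinds succeeds
behind = re.compile(r"(?:(?<=95)|(?<=98)|(?<=NT)|(?<=2000))Windows")
# Negative: fail if ANY of them would match, so chain them
not_behind = re.compile(r"(?<!95)(?<!98)(?<!NT)(?<!2000)Windows")

m1 = behind.search("2000Windows")      # matches the "Windows" at index 4
m2 = behind.search("3.1Windows")       # None
m3 = not_behind.search("3.1Windows")   # matches the "Windows" at index 3
m4 = not_behind.search("2000Windows")  # None

print(variable_width_lookbehind, m1.start(), m2, m3.start(), m4)
```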

Third, examples

Next, we will give some examples to illustrate the contents of Part Two:

To be continued...

 

Regex tester:

Link: https://pan.baidu.com/s/1CwyrLH2dwbBk1KVi2FCGDw
Extraction code: nwyc

 

Disclaimer: Parts of this article are compiled from material found online; any similarity is purely coincidental. If anything here infringes your rights, please contact me and I will revise it. Thank you!


Origin www.cnblogs.com/xyh9039/p/11780032.html