Python regular expressions: everything you need in one article

1. The regular expression syntax

1.1 Characters and character classes
    1. Special characters: \ . ^ $ ? + * { } [ ] ( ) |
      To use any of these characters as a literal, escape it with a backslash.
    2. Character classes
      1. One or more characters enclosed in [] form a character class. Without a quantifier, a character class matches exactly one of the characters it contains.
      2. A range can be specified inside a character class; for example, [a-zA-Z0-9] matches any character from a to z, A to Z, or 0 to 9.
      3. A ^ immediately after the left bracket negates the class; for example, [^0-9] matches any non-digit character.
      4. Inside a character class, apart from \, the special characters lose their special meaning and are treated as literals. ^ in the first position means negation and stands for itself anywhere else; - between two characters denotes a range and stands for itself when it is the first character of the class.
      5. Shorthand character classes such as \d, \s, and \w can be used inside a character class.
    3. Shorthands
      .   matches any character except a newline; with the re.DOTALL flag, matches any character including a newline
      \d  matches a Unicode digit; with re.ASCII, matches only 0-9
      \D  matches a Unicode non-digit
      \s  matches Unicode whitespace; with re.ASCII, matches one of [ \t\n\r\f\v]
      \S  matches Unicode non-whitespace
      \w  matches a Unicode word character; with re.ASCII, matches one of [a-zA-Z0-9_]
      \W  matches a Unicode non-word character
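The character class and shorthand rules above can be tried out with a short sketch like the following. The sample strings are made up for illustration, and the Arabic-Indic digit is just one convenient example of a non-ASCII Unicode digit.

```python
import re

# A character class without a quantifier matches exactly one character.
in_class = re.findall(r"[a-c]", "abcd")

# A leading ^ negates the class.
negated = re.findall(r"[^0-9]", "a1b2")

# Inside a class, special characters such as . and * are literals.
literals = re.findall(r"[.*]", "a.b*c")

# \d matches any Unicode digit by default, but only 0-9 with re.ASCII.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_digit = re.fullmatch(r"\d", arabic_three) is not None
ascii_digit = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None

# . skips newlines unless re.DOTALL is set.
no_dotall = re.findall(r".", "a\nb")
with_dotall = re.findall(r".", "a\nb", re.DOTALL)

print(in_class, negated, literals)
print(unicode_digit, ascii_digit)
print(no_dotall, with_dotall)
```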

1.2 Quantifiers
    1. ? matches the preceding expression 0 or 1 times
    2. * matches the preceding expression 0 or more times
    3. + matches the preceding expression 1 or more times
    4. {m} matches the preceding expression exactly m times
    5. {m,} matches the preceding expression at least m times
    6. {,n} matches the preceding expression at most n times
    7. {m,n} matches the preceding expression at least m and at most n times
    Note:
      Quantifiers are greedy by default and match as much as possible. To make a quantifier non-greedy, append a ? to it.
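The difference between greedy and non-greedy matching is easiest to see side by side. The sketch below uses a made-up HTML fragment and a few toy strings; it is only meant to illustrate the quantifiers listed above.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the last </b> it can reach.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first </b>.
lazy = re.findall(r"<b>.*?</b>", html)

# Counted quantifiers.
exact = re.fullmatch(r"\d{4}", "2024") is not None    # exactly 4 digits
at_least = re.fullmatch(r"a{2,}", "aaaa") is not None  # 2 or more
at_most = re.fullmatch(r"a{,2}", "aaa") is not None    # 3 exceeds the limit of 2
bounded = re.findall(r"a{2,3}", "aaaaa")               # takes 3 first, then the remaining 2

print(greedy)
print(lazy)
print(exact, at_least, at_most, bounded)
```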

1.3 Groups and capturing
    1. What () does:
      1. It captures the content matched by the expression inside the parentheses for later use. Putting ?: immediately after the left parenthesis turns capturing off.
      2. It groups part of the regular expression so that a quantifier or | can be applied to it.
    2. Backreferences to content captured earlier by ():
      1. Backreference by group number
        Every pair of parentheses that does not use ?: is assigned a group number, starting from 1 and increasing from left to right. The content a group captured can be referenced later in the expression as \i, where i is the group number.
      2. Backreference by group name
        Putting ?P<name> immediately after the left parenthesis gives the group an alias, with the group name placed inside the angle brackets. (?P=name) then refers to the content captured earlier. For example, (?P<word>\w+)\s+(?P=word) matches a repeated word.
    3. Note:
      Backreferences cannot be used inside a character class [].
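A small sketch of numbered and named backreferences, using the repeated-word pattern from the text. The sample sentence is invented, and the \b anchors are an addition here so that only whole words are compared.

```python
import re

text = "this is is a test test sentence"

# Numbered backreference: \1 must repeat whatever group 1 captured.
numbered = re.compile(r"\b(\w+)\s+\1\b")
repeated_by_number = numbered.findall(text)

# Named backreference: (?P=word) repeats the group named "word".
named = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")
repeated_by_name = [m.group("word") for m in named.finditer(text)]

# (?:...) groups without capturing, so only "c" is numbered.
m = re.match(r"(?:ab)+(c)", "ababc")
captured = m.groups()

print(repeated_by_number)
print(repeated_by_name)
print(captured)
```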

1.4 Assertions and anchors
    An assertion does not match any text itself; it only places a constraint on the text at the position where the assertion appears.
    1. Common assertions:
      1. \b matches a word boundary; inside a character class [] it stands for a backspace
      2. \B matches a non-word boundary; affected by the ASCII flag
      3. \A matches at the start of the string
      4. ^ matches at the start of the string; with the MULTILINE flag, also matches after each newline
      5. \Z matches at the end of the string
      6. $ matches at the end of the string; with the MULTILINE flag, also matches before each newline
      7. (?=e) positive lookahead
      8. (?!e) negative lookahead
      9. (?<=e) positive lookbehind
      10. (?<!e) negative lookbehind
    2. Lookahead and lookbehind explained
      Lookahead: exp1(?=exp2) matches exp1 only if the text that follows it matches exp2
      Negative lookahead: exp1(?!exp2) matches exp1 only if the text that follows it does not match exp2
      Lookbehind: (?<=exp2)exp1 matches exp1 only if the text before it matches exp2
      Negative lookbehind: (?<!exp2)exp1 matches exp1 only if the text before it does not match exp2
      Example: to find hello only when it is followed by world, the regular expression can be written as "(hello)\s+(?=world)". Applied to "hello wangxing" and "hello world", it matches only the hello in the latter.
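The hello/world example can be checked directly. The sketch below also adds a few invented strings (a price list and a two-line text) to illustrate negative lookahead, lookbehind, and the effect of MULTILINE on ^.

```python
import re

# Positive lookahead from the example: hello must be followed by world.
lookahead = re.compile(r"(hello)\s+(?=world)")
miss = lookahead.search("hello wangxing")
hit = lookahead.search("hello world")
hit_text = hit.group(1)   # the lookahead itself consumes no text
hit_end = hit.end()       # the match ends before "world"

# Negative lookahead: hello NOT followed by world.
both = "hello world, hello wangxing"
negative_starts = [m.start() for m in re.finditer(r"hello(?!\s+world)", both)]

# Lookbehind and negative lookbehind on a made-up price string.
prices = "cost $30, 40 units"
after_dollar = re.findall(r"(?<=\$)\d+", prices)
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)

# ^ matches only at the start unless MULTILINE is set.
lines = "one two\nthree four"
first_words_default = re.findall(r"^\w+", lines)
first_words_multiline = re.findall(r"^\w+", lines, re.MULTILINE)

print(miss, hit_text, hit_end)
print(negative_starts, after_dollar, not_after_dollar)
print(first_words_default, first_words_multiline)
```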

1.5 Conditional matching
    (?(id)yes_exp|no_exp): if the group identified by id (a group number or name) has matched, yes_exp must match at this position; otherwise no_exp must match.
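As an illustration of conditional matching, the sketch below uses a deliberately simplified email pattern of the kind shown in the Python re documentation: an address may be wrapped in angle brackets, but an opening < requires a closing >. The addresses are placeholders, and the pattern is not a real email validator.

```python
import re

# Group 1 optionally captures "<". If it matched, ">" is required
# after the address; otherwise the string must end there.
pattern = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

candidates = [
    "<user@host.com>",  # bracketed on both sides
    "user@host.com",    # no brackets at all
    "<user@host.com",   # opening bracket only
    "user@host.com>",   # closing bracket only
]

results = {c: pattern.match(c) is not None for c in candidates}

for candidate, ok in results.items():
    print(f"{candidate!r}: {ok}")
```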

1.6 Regular expression flags
    1. Flags can be applied in two ways
      1. Pass them as the flags argument of compile; several flags are combined with |, such as re.compile(r"#[\da-f]{6}", re.IGNORECASE | re.MULTILINE)
      2. Put (?flags) at the start of the regular expression itself, such as (?ms)#[\da-z]{6}
    2. Commonly used flags
      re.A or re.ASCII: makes \b, \B, \s, \S, \w, \W, \d, and \D assume that strings are ASCII
      re.I or re.IGNORECASE: makes the regular expression ignore case
      re.M or re.MULTILINE: multi-line matching; ^ also matches after each newline and $ also matches before each newline
      re.S or re.DOTALL: makes . match any character, including a newline
      re.X or re.VERBOSE: lets the regular expression span several lines and contain comments. Literal whitespace must then be written as \s or inside a character class such as [ ], because unescaped whitespace is ignored. For example:
        re.compile(r"""
          <img\s+                     # start of the tag
          [^>]*?                      # any attributes that come before src
          src=                        # start of the src attribute
          (?:
            (?P<quote>["'])           # opening quotation mark
            (?P<image_name>[^>]+?)    # image file name (assumed pattern)
            (?P=quote)                # closing quotation mark
          )
          """, re.VERBOSE | re.IGNORECASE)

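The two ways of setting flags described in section 1.6 can be compared with a short sketch. The color strings are invented, and the ^ and $ anchors are added here only so that the effect of MULTILINE is visible.

```python
import re

text = "#FFaa00\n#12ab9f"

# Way 1: flags passed to compile, combined with |.
by_argument = re.compile(r"^#[\da-f]{6}$", re.IGNORECASE | re.MULTILINE)

# Way 2: the same flags written inline at the start of the pattern.
inline = re.compile(r"(?im)^#[\da-f]{6}$")

found_by_argument = by_argument.findall(text)
found_inline = inline.findall(text)

# Without the flags, ^ only matches at the start and F is not in [\da-f].
found_without_flags = re.findall(r"^#[\da-f]{6}$", text)

# VERBOSE: whitespace is ignored and # starts a comment,
# so a literal # has to be escaped.
verbose = re.compile(r"""
    \#            # a literal hash sign
    [\da-f]{6}    # six hexadecimal digits
    """, re.VERBOSE | re.IGNORECASE)
found_verbose = verbose.findall("color: #FFaa00;")

print(found_by_argument, found_inline)
print(found_without_flags, found_verbose)
```
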
2. The Python regular expression module
2.1 Regular expressions serve four main purposes when handling strings
    1. Matching: check whether a string conforms to the regular expression; the result is usually true or false
    2. Extraction: pull out of a string the text that meets the requirements
    3. Replacement: find the text in a string that matches the regular expression and replace it with another string
    4. Splitting: split a string at the places the regular expression matches
2.2 Two ways to use regular expressions with Python's re module
    1. Call re.compile(r, f) to create a regular expression object, then call the appropriate method on that object. The advantage is that the object can be reused many times.
    2. For each method of a regular expression object, the re module provides a corresponding module-level function; the only difference is that its first argument is the regular expression string. This approach suits a regular expression that is used only once.
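The two approaches look like this in practice. This is a minimal sketch; the id=... lines are made-up sample data.

```python
import re

lines = ["id=42", "name=x", "id=7"]

# Way 1: compile once, then reuse the regular expression object.
id_rx = re.compile(r"id=(\d+)")
compiled_hits = []
for line in lines:
    m = id_rx.search(line)
    if m is not None:
        compiled_hits.append(m.group(1))

# Way 2: module-level function; the pattern string is the first argument.
module_hit = re.search(r"id=(\d+)", "id=42")
module_value = module_hit.group(1)

print(compiled_hits)
print(module_value)
```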
2.3 Methods of a regular expression object
    1. rx.findall(s, start, end):
      returns a list. If the regular expression has no groups, the list contains every matched string.
      If it has one group, the list contains what that group captured in each match; if it has several groups, each element is a tuple of what the groups captured. In both cases the text matched by the whole expression is not returned.
    2. rx.finditer(s, start, end):
      returns an iterator
      Each iteration yields a match object. Call the match object's group() method to see what a given group matched; 0 means the content matched by the whole regular expression.
    3. rx.search(s, start, end):
      returns a match object, or None if nothing matches
      search stops at the first match and does not go on to look for the next one
    4. rx.match(s, start, end):
      returns a match object if the regular expression matches at the beginning of the string, otherwise None
    5. rx.sub(x, s, m):
      returns a copy of the string in which each match is replaced by x. If m is given, at most m replacements are made. x may use \i or \g<id>, where id is a group number or group name, to refer to captured content.
      x may also be a function, both here and in the module function re.sub(r, x, s, m). The function receives the match object, so the captured content can be processed before it replaces the matched text.
    6. rx.subn(x, s, m):
      the same as sub(), except that it returns a tuple: the resulting string and the number of replacements made.
    7. rx.split(s, m): splits the string
      returns a list
      the string is split at each place the regular expression matches; if m is given, at most m splits are made
      if the regular expression contains groups, what each group captured is placed in the list between the two pieces it separates, for example:
      rx = re.compile(r"(\d)[a-z]+(\d)")
      s = "ab12dk3klj8jk9jks5"
      result = rx.split(s)
      returns ['ab1', '2', '3', 'klj', '8', '9', 'jks5']
    8. rx.flags: the flags that were set when the regular expression was compiled
    9. rx.pattern: the string from which the regular expression was compiled
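The methods above can be exercised on one small string. This is a sketch with invented key=value data; the variable names are illustrative only.

```python
import re

s = "a=1, b=22, c=333"
rx = re.compile(r"(\w+)=(\d+)")   # two groups: key and number
digits = re.compile(r"\d+")       # no groups

# findall: plain strings without groups, tuples with several groups.
all_digits = digits.findall(s)
all_pairs = rx.findall(s)

# finditer: one match object per match; group(0) is the whole match.
whole = [m.group(0) for m in rx.finditer(s)]

# search stops at the first match; match must succeed at the start position.
first = rx.search(s).group(0)
at_comma = rx.match(s, 3)          # s[3] is ",", so no match here
at_b = rx.match(s, 5).group(0)     # s[5] is "b"

# sub with backreferences, a replacement limit, and a function.
swapped = rx.sub(r"\2=\1", s)
only_first = rx.sub(r"\g<1>=0", s, count=1)
doubled = rx.sub(lambda m: f"{m.group(1)}={int(m.group(2)) * 2}", s)

# subn also reports how many replacements were made.
replaced, count = rx.subn("X", s)

# split: a group in the pattern puts the captured separator in the list.
plain = re.compile(r",\s*").split(s)
once = re.compile(r",\s*").split(s, maxsplit=1)
with_group = re.compile(r"(,)\s*").split(s)

print(all_digits, all_pairs, whole)
print(first, at_comma, at_b)
print(swapped, only_first, doubled, replaced, count)
print(plain, once, with_group)
print(rx.pattern)
```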

2.4 Attributes and methods of a match object
    01. m.group(g, ...)
      returns the content matched by the group with the given number or name. The default, 0, means the content matched by the whole expression. If several groups are specified, a tuple is returned.
    02. m.groupdict(default)
      returns a dictionary of all named groups. Each key is a group name and each value is the content that group captured.
      If default is given, it is used as the value for groups that did not take part in the match.
    03. m.groups(default)
      returns a tuple containing the content captured by every group, starting from group 1. If default is given, it is used as the value for groups that did not capture anything.
    04. m.lastgroup
      the name of the highest-numbered capturing group that matched, or None if no group matched or the group has no name
    05. m.lastindex
      the number of the highest-numbered capturing group that matched, or None if no group matched
    06. m.start(g)
      the position in the string where the match of group g starts; -1 if the group did not take part in the match
    07. m.end(g)
      the position in the string where the match of group g ends; -1 if the group did not take part in the match
    08. m.span(g)
      returns a tuple containing the return values of m.start(g) and m.end(g)
    09. m.re
      the regular expression object that produced this match object
    10. m.string
      the string that was passed to match or search
    11. m.pos
      the position where the search started: the beginning of the string, or the specified start
    12. m.endpos
      the position where the search ended: the end of the string, or the specified end
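The match object members listed above can be inspected as follows. This is a sketch; the key=value pattern and the sample strings are invented, and the second group is optional so that a group that did not take part in the match can be shown.

```python
import re

rx = re.compile(r"(?P<key>\w+)=(?P<value>\d+)?")

s = "  port=8080"
m = rx.search(s)

whole = m.group()                   # same as m.group(0)
pair = m.group("key", "value")      # several groups give a tuple
all_groups = m.groups()
as_dict = m.groupdict()
value_span = (m.start("value"), m.end("value"))
key_span = m.span("key")
whole_span = m.span()
last = (m.lastindex, m.lastgroup)
search_range = (m.pos, m.endpos)

# A group that did not take part in the match.
m2 = rx.search("host=")
missing = m2.groups()
with_default = m2.groups("n/a")
dict_default = m2.groupdict(default="n/a")
missing_start = m2.start("value")
last2 = (m2.lastindex, m2.lastgroup)

print(whole, pair, all_groups, as_dict)
print(value_span, key_span, whole_span, last, search_range)
print(missing, with_default, dict_default, missing_start, last2)
```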
2.5 Summary
    1. Matching: Python's methods do not return true or false. Instead, check whether the value returned by match or search is None.
    2. Searching: when only one result is needed, use the match object returned by match or search. When several results are needed, iterate over the iterator returned by finditer.
    3. Replacing: use the sub or subn method of a regular expression object, or the sub or subn function of the re module. In either form the replacement text can be produced by a function.
    4. Splitting: use the split method of a regular expression object. Note that if the regular expression contains groups, the content captured by the groups is also placed in the returned list.

Origin blog.51cto.com/14510224/2437136