Python25 regular expressions

Regular Expressions

  • Regular expressions are a powerful tool for matching strings. The idea is to describe a rule in a small descriptive language: any string that satisfies the rule is a "match", and any other string is rejected as invalid.
  • Common patterns:
    • \d matches any single digit, 0-9
      • '00\d' matches '007' but not '00A'
    • \w matches a single letter (a-z, A-Z), digit (0-9), or underscore
      • '\w\w\d' matches 'py3'
    • . matches any single character (except a newline, by default)
      • 'py.' matches 'pyc'
    • * means any number of the preceding character, including zero
    • + means at least one of the preceding character
    • ? means zero or one of the preceding character
    • {n} means exactly n of the preceding character
    • {n,m} means between n and m of the preceding character
    • \s matches a whitespace character (a space, a tab, and so on)
      • \d{3}\s+\d{3,8} reads, from left to right:
        1. \d{3} matches exactly three digits, such as 010
        2. \s+ matches at least one whitespace character, such as ' '
        3. \d{3,8} matches 3 to 8 digits, such as 123
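The patterns in the list above can be tried out directly. The following is a minimal sketch, not part of the original post: the helper name `full_match` and the non-matching sample strings are made up for illustration, and `re.fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

# Each entry: (pattern from the list above, a string that should match,
# a string that should not match). The "bad" samples are illustrative.
examples = [
    (r'00\d', '007', '00A'),
    (r'\w\w\d', 'py3', 'py!'),
    (r'py.', 'pyc', 'py'),
    (r'\d{3}\s+\d{3,8}', '010  12345', '010-12345'),
]

def full_match(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

for pattern, good, bad in examples:
    # Prints the pattern, then True for the good sample and False for the bad one
    print(pattern, full_match(pattern, good), full_match(pattern, bad))
```

Using `re.fullmatch` here is a choice made for the sketch; `re.match` only anchors at the start of the string, so it would also accept longer strings that merely begin with a match.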
  • Range matching:
    • [0-9a-zA-Z\_] matches a single digit, letter, or underscore
    • [0-9a-zA-Z\_]+ matches a string of at least one digit, letter, or underscore, such as 'A100' or '_123'
    • [a-zA-Z\_][0-9a-zA-Z\_]* matches a string that starts with a letter or underscore, followed by any number of digits, letters, or underscores
    • [a-zA-Z\_][0-9a-zA-Z\_]{0,19} limits a variable name more precisely to 1-20 characters (1 leading character plus at most 19 following characters)
    • A|B matches A or B, so (P|p)ython matches 'Python' or 'python'
    • ^ marks the start of a line; ^\d means the line must start with a digit
    • $ marks the end of a line; \d$ means the line must end with a digit
    • ^...$ (with a pattern in place of the dots) requires the whole line to match
    • 0[0-9] matches the strings 00 through 09
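As a rough illustration of the range patterns above, the sketch below checks a few strings against the 1-20 character variable-name rule. The names `VARIABLE_NAME` and `is_variable_name` are hypothetical, the underscore is written without a backslash (it needs no escaping inside a character class), and `re.fullmatch` stands in for the explicit ^ and $ anchors.

```python
import re

# Pattern taken from the list above; the constant name is made up for this sketch.
VARIABLE_NAME = r'[a-zA-Z_][0-9a-zA-Z_]{0,19}'

def is_variable_name(text):
    """Return True when the whole of text fits the 1-20 character rule."""
    return re.fullmatch(VARIABLE_NAME, text) is not None

print(is_variable_name('A100'))    # True
print(is_variable_name('_123'))    # True
print(is_variable_name('1abc'))    # False: starts with a digit
print(is_variable_name('a' * 21))  # False: 21 characters is too long

# Alternation, and a fixed character followed by a range
print(re.fullmatch(r'(P|p)ython', 'python') is not None)  # True
print(re.fullmatch(r'0[0-9]', '07') is not None)          # True
print(re.fullmatch(r'0[0-9]', '10') is not None)          # False
```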
  • re module
    • Code:
      ```python
      #!/usr/bin/env python3
      # -*- coding: utf-8 -*-

      # Regex
      # Python strings use \ as their own escape character
      s = 'ABC\\-001'  # Python string
      # The corresponding regular expression string becomes:
      # 'ABC\-001'

      # With Python's r prefix there is no need to worry about escaping
      s = r'ABC\-001'  # Python string
      # The corresponding regular expression string is unchanged:
      # 'ABC\-001'

      import re
      # Check whether the regular expression matches
      re.match(r'^\d{3}-\d{3,8}$', '010-12345')

      # match() checks for a match: on success it returns a Match object, otherwise None. The usual test is:

      if re.match(r'^\d{3}-\d{3,8}$', '010-12345678'):
          print('ok')
      else:
          print('no')

      # Splitting strings
      # Split on runs of whitespace with a regular expression (sample input is illustrative)
      relist = re.split(r'\s+', 'ab 12   cd 3')
      print(relist)
      # Split on runs of whitespace, commas, and semicolons (sample input is illustrative)
      reList1 = re.split(r'[\s,;]+', '1, 56 3; 34')
      print(reList1)

      # Groups
      # Besides testing for a match, regular expressions can extract substrings. Parentheses () mark each group to extract.
      m = re.match(r'^(\d{3})-(\d{3,8})$', '012-356789')  # (\d{3}) is one group, (\d{3,8}) is another
      print(m.group(0))  # the whole match: 012-356789
      print(m.group(1))  # 012
      print(m.group(2))  # 356789

      # Greedy matching: regular expressions are greedy by default, that is, they match as many characters as possible.
      # Because \d+ is greedy, it swallows the trailing zeros too, so 0* can only match the empty string.
      print(re.match(r'^(\d+)(0*)$', '102300').groups())  # ('102300', '')
      # To let 0* match the trailing zeros, \d+ must be non-greedy (match as little as possible); adding ? makes it so.
      print(re.match(r'^(\d+?)(0*)$', '102300').groups())  # ('1023', '00')

      # Try writing a regular expression to validate email addresses. Version one validates addresses like these:
      import re
      def is_valid_email(addr):
          # [\w.]+ matches at least one letter, digit, underscore, or dot; \.com$ requires a literal ".com" at the end
          return re.match(r'[\w.]+@\w+\.com$', addr)

      # Test (the addresses are placeholders):
      assert is_valid_email('someone@example.com')
      assert is_valid_email('first.last@example.com')
      assert not is_valid_email('bob#example.com')
      assert not is_valid_email('mr-bob@example.com')
      print('ok')

      # Version two also extracts the name from an email address:
      import re
      def name_of_email(addr):
          # '.*?' is the shortest possible match of any characters, possibly of length zero. The parenthesized part then matches a run of letters, digits, underscores, and whitespace, which is the name; the function returns that group.
          return re.match(r'.*?([\w\s]+)', addr).group(1)
      # Test (the addresses are placeholders):
      assert name_of_email('<Tom Paris> tom@example.org') == 'Tom Paris'
      assert name_of_email('tom@example.org') == 'tom'
      print('ok')
      ```
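Because `re.match` returns None when nothing matches, calling `.group()` on the result directly raises an AttributeError for bad input. The sketch below shows one way to guard against that, reusing the phone-number pattern from the listing above; the function name `split_phone` is hypothetical and not part of the original post.

```python
import re

def split_phone(text):
    """Return (area_code, number) when text matches, otherwise None."""
    m = re.match(r'^(\d{3})-(\d{3,8})$', text)
    if m is None:
        # No match: report it to the caller instead of failing on m.groups()
        return None
    return m.groups()

print(split_phone('012-356789'))  # ('012', '356789')
print(split_phone('012 356789'))  # None
```

Returning None mirrors what `re.match` itself does, so callers can keep using the same `if result:` test shown earlier.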

Origin www.cnblogs.com/thloveyl/p/11484962.html