1. Metacharacters
- \d matches any digit
- \w matches any letter, digit, or underscore
- \s matches any whitespace character
- \t matches a tab
- \n matches a newline (the Enter key)
- \D matches any non-digit
- \W matches any character that is not a letter, digit, or underscore
- \S matches any non-whitespace character
- . matches any character except \n
- [] matches any one character in the character set
- [^] matches any one character that is not in the character set
- ^ matches the start of the string
- $ matches the end of the string
- () matches the expression inside the parentheses and also forms a group
- a|b matches the character a or the character b
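To see the metacharacters in action, here is a small sketch that runs a few of them through re.findall and re.search. The sample strings are made up for illustration.

```python
import re

sample = "user_1 paid 42 dollars"  # made-up sample text

digits = re.findall(r"\d", sample)             # every single digit
words = re.findall(r"\w+", sample)             # runs of letters, digits, underscore
spaces = re.findall(r"\s", sample)             # whitespace characters
vowels = re.findall(r"[aeiou]", "regex")       # characters in the set
not_vowels = re.findall(r"[^aeiou]", "regex")  # characters not in the set
starts = re.search(r"^user", sample) is not None     # anchored at the start
ends = re.search(r"dollars$", sample) is not None    # anchored at the end
either = re.findall(r"paid|owed", sample)            # alternation
amount = re.search(r"(\d+) dollars", sample).group(1)  # text captured by the group

print(digits)      # ['1', '4', '2']
print(words)       # ['user_1', 'paid', '42', 'dollars']
print(vowels)      # ['e', 'e']
print(not_vowels)  # ['r', 'g', 'x']
print(amount)      # 42
```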
2. Quantifiers
- ? repeats zero or one time
- + repeats one or more times
- * repeats zero or more times
- {n} repeats exactly n times
- {n,} repeats n or more times
- {n,m} repeats n to m times
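The quantifiers can be compared side by side with a short sketch. The word list and the helper matching() are hypothetical and exist only for this illustration; re.fullmatch is used so that a word counts only when the whole word fits the pattern.

```python
import re

words = ["a", "ab", "abb", "abbb", "abbbb"]  # made-up test words


def matching(pattern):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]


print(matching(r"ab?"))      # ['a', 'ab']
print(matching(r"ab+"))      # ['ab', 'abb', 'abbb', 'abbbb']
print(matching(r"ab*"))      # ['a', 'ab', 'abb', 'abbb', 'abbbb']
print(matching(r"ab{2}"))    # ['abb']
print(matching(r"ab{2,}"))   # ['abb', 'abbb', 'abbbb']
print(matching(r"ab{2,3}"))  # ['abb', 'abbb']
```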
3. Greedy and non-greedy matching
- Greedy match: the quantifier matches as many characters as its range allows. .*x matches any character any number of times and stops only at the last x.
- Non-greedy match: a ? after the quantifier makes it match as few characters as possible. .*?x matches any character any number of times but stops at the first x it meets.
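The difference is easiest to see on a string that contains several x characters. A minimal sketch, with made-up input strings:

```python
import re

text = "axbxcx"  # made-up sample with three x characters

greedy = re.match(r".*x", text).group()   # runs to the last x
lazy = re.match(r".*?x", text).group()    # stops at the first x

print(greedy)  # axbxcx
print(lazy)    # ax

# The same effect on tag-like text
tags_greedy = re.findall(r"<.*>", "<a><b>")
tags_lazy = re.findall(r"<.*?>", "<a><b>")

print(tags_greedy)  # ['<a><b>']
print(tags_lazy)    # ['<a>', '<b>']
```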
4. Escape
- A character that normally has a special meaning must be escaped with \ when it should stand for itself.
- Some characters with a special meaning lose that meaning inside a character set.
- [()*+.?]: every character in this set loses its special meaning.
- [a-c]: inside a character set, - indicates a range. If it should not indicate a range, escape it, or place it at the very front or very end of the set.
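The escaping rules above can be checked with a few findall calls. This is only a sketch; the input strings are made up.

```python
import re

# Outside a character set, . means "any character" unless it is escaped.
any_char = re.findall(r".", "a.b")
literal_dot = re.findall(r"\.", "a.b.c")

# Inside a character set, these characters stand for themselves.
in_set = re.findall(r"[()*+.?]", "f(x)*2+1.5?")

# - is a range inside a set unless it is escaped or placed first or last.
as_range = re.findall(r"[a-c]", "a-c b")
escaped = re.findall(r"[a\-c]", "a-c b")
at_front = re.findall(r"[-ac]", "a-c b")
at_end = re.findall(r"[ac-]", "a-c b")

print(any_char)     # ['a', '.', 'b']
print(literal_dot)  # ['.', '.']
print(in_set)       # ['(', ')', '*', '+', '.', '?']
print(as_range)     # ['a', 'c', 'b']
print(escaped)      # ['a', '-', 'c']
```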
5. Commonly used methods of the re module
- findall(pattern, string, flags): returns all matches
    import re

    # findall returns every match that satisfies the pattern, as a list
    ret = re.findall(r"\d+", "19740ash93010uru")
    print(ret)  # ['19740', '93010']
- search: returns a match object; call group() on it to get the matched text
    import re

    # search scans the string for the pattern and returns a match object for
    # the first match only; call group() on it to get the matched text.
    # If nothing matches, search returns None.
    ret = re.search(r"\d+", "19740ash93010uru")
    if ret:
        print(ret.group())  # 19740
- finditer: returns an iterator; each item taken from it is a match object whose value is read with group() (this saves memory)
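A minimal sketch of finditer, reusing the sample string from the findall example. The variable names are chosen only for this illustration.

```python
import re

# finditer yields match objects one at a time instead of building a list
matches = re.finditer(r"\d+", "19740ash93010uru")

found = []
for m in matches:
    # group() is the matched text, span() is its (start, end) position
    found.append((m.group(), m.span()))

print(found)  # [('19740', (0, 5)), ('93010', (8, 13))]
```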
- match: matches only from the beginning of the string; otherwise it works the same as search
    import re

    # match works like search, but only matches at the beginning of the string
    ret = re.match("A", "ABC").group()
    print(ret)  # A

    # When validating user input, such as an 11-digit phone number, the
    # pattern has to cover the whole string, so anchor it with ^ and $:
    #   re.match("<phone number pattern>$", "123eva456taibai")
    #   re.search("^<phone number pattern>$", "123eva456taibai")
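To make the anchoring point concrete, here is a sketch that assumes a deliberately simple phone number pattern, \d{11} (any 11 digits). That pattern and the helper is_phone_number() are hypothetical; a real phone number rule would be stricter.

```python
import re

PHONE_PATTERN = r"\d{11}"  # assumption: any 11 digits counts as a phone number


def is_phone_number(text):
    """Return True if text matches the pattern between ^ and $."""
    return re.search("^" + PHONE_PATTERN + "$", text) is not None


# Without $, match accepts input that merely starts with 11 digits.
unanchored = re.match(PHONE_PATTERN, "13812345678abc") is not None
# With $, the trailing characters make the match fail.
anchored = re.match(PHONE_PATTERN + "$", "13812345678abc") is not None

print(unanchored)                          # True
print(anchored)                            # False
print(is_phone_number("13812345678"))      # True
print(is_phone_number("123eva456taibai"))  # False
```

re.fullmatch(PHONE_PATTERN, text) is another way to require that the entire string matches without writing the anchors by hand.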
- compile(pattern): when the same regular expression is used many times, compile it in advance to save time
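A minimal sketch of compile, reusing sample strings from the other examples in this section. The compiled object offers the same methods as the module-level functions, without the pattern argument.

```python
import re

# Compile once, then reuse the compiled pattern for several operations
digits = re.compile(r"\d+")

all_numbers = digits.findall("19740ash93010uru")
first_number = digits.search("eva3egon4yuan").group()
replaced = digits.sub("H", "aas123dfghj147")

print(all_numbers)   # ['19740', '93010']
print(first_number)  # 3
print(replaced)      # aasHdfghjH
```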
- split: splits the string at the parts matched by the regular expression
    import re

    ret = re.split(r"\d+", "eva3egon4yuan")
    print(ret)  # ['eva', 'egon', 'yuan']

    ret = re.split(r"(\d+)", "eva3egon4yuan")
    print(ret)  # ['eva', '3', 'egon', '4', 'yuan']
Wrapping the pattern in () changes the result of split. Without the parentheses the matched separators are dropped; with them the matched separators are kept in the result. This matters whenever the matched parts themselves need to be retained.
- sub: replaces the parts matched by the regular expression
    import re

    ret = re.sub(r"\d+", "H", "aas123dfghj147")
    print(ret)  # aasHdfghjH

    ret = re.sub(r"\d+", "H", "aas123dfghj147", count=1)  # replace only the first match
    print(ret)  # aasHdfghj147
- subn: works like sub, but returns a tuple; the first element is the string after replacement and the second is the number of replacements made
    import re

    ret = re.subn(r"\d+", "H", "alex123wusir456")
    print(ret)  # ('alexHwusirH', 2)