First, an introduction to re
A regular expression is a logical formula for operating on strings. It uses predefined characters, and combinations of those characters, to form a "rule string" that expresses a filtering logic for strings. Its building blocks are ordinary characters, non-printing characters, and universal characters (together referred to as "atoms"), plus special characters (referred to as "metacharacters").
A regular expression is a text pattern that describes one or more strings to match when searching text.
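As a minimal sketch of the "filtering logic" idea, the snippet below keeps only the strings that contain a match for a rule string. The sample list and the names `candidates`, `rule`, and `kept` are made up for illustration and do not come from the article.

```python
import re

# Hypothetical sample data, not from the article
candidates = ["abcd123", "xyz", "ABC", "zabc"]

# Rule string made of ordinary characters only
rule = "abc"

# re.search returns a Match object on success and None otherwise,
# so its result can be used directly as a filter condition
kept = [s for s in candidates if re.search(rule, s)]
print(kept)  # ['abcd123', 'zabc']
```

Note that "ABC" is filtered out: matching is case-sensitive unless a flag such as `re.IGNORECASE` is passed.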
Second, "atoms" explained
### Ordinary characters as atoms
Ordinary characters are literal characters such as A-Z, a-z, and 0-9.

    # Import the re module
    import re

    string = "abcd123456ABC"
    pat = "abc"
    ret = re.search(pat, string)
    print(ret)
    # Output: <re.Match object; span=(0, 3), match='abc'>
### Non-printing characters as atoms
Non-printing characters are characters that do exist in a computer but cannot be displayed or printed.
For example: 1, in the ASCII code table, the code values 0-31 are control characters that cannot be displayed or printed
2, escape characters such as \t and \n

    # Import the re module
    import re

    string = '''abcd12
    3456ABC'''
    pat = "\n"
    ret = re.search(pat, string)
    print(ret)
    # Output: <re.Match object; span=(6, 7), match='\n'>
### Universal characters as atoms

    \w  matches any letter, digit, or underscore; similar, but not equivalent, to "[A-Za-z0-9_]", because the "word" characters come from the Unicode character set.
    \W  matches any character other than a letter, digit, or underscore; the complement of \w, roughly "[^A-Za-z0-9_]".
    \d  matches any digit; equivalent to [0-9] for ASCII text.
    \D  matches any non-digit character; equivalent to [^0-9] for ASCII text.
    \s  matches any non-visible (whitespace) character, including spaces, tabs, form feeds, and so on; equivalent to [ \f\n\r\t\v] for ASCII text.
    \S  matches any visible (non-whitespace) character; equivalent to [^ \f\n\r\t\v] for ASCII text.
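The universal characters can be tried out with `re.findall`, which returns every non-overlapping match as a list. This is only an illustrative sketch: the sample text and variable names are assumptions, not part of the original article. The last two lines show why \w is only "similar" to [A-Za-z0-9_] for Python `str` patterns.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "id_42 costs $3.50\ttoday"

words = re.findall(r"\w+", text)    # runs of word characters
digits = re.findall(r"\d+", text)   # runs of digits
spaces = re.findall(r"\s", text)    # single whitespace characters
others = re.findall(r"\W", text)    # single non-word characters

print(words)   # ['id_42', 'costs', '3', '50', 'today']
print(digits)  # ['42', '3', '50']
print(spaces)  # [' ', ' ', '\t']
print(others)  # [' ', ' ', '$', '.', '\t']

# \w is Unicode-aware for str patterns, so it is not the same as [A-Za-z0-9_]
accented = "caf\u00e9"  # the word "café"
unicode_words = re.findall(r"\w+", accented)
ascii_words = re.findall(r"[A-Za-z0-9_]+", accented)
print(unicode_words)  # ['café']
print(ascii_words)    # ['caf']
```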
### Atom tables ==> [any character]

    # Import the re module
    import re

    string = '''abcd123456ABC'''
    pat = "abc[abcde]"  # any one of the characters listed in [] may match; if none matches, None is returned
    ret1 = re.search(pat, string)
    pat = "abc[^abc]"   # [^...] matches any one character that is NOT listed
    ret2 = re.search(pat, string)
    print(ret1)
    print(ret2)
    # Output:
    # <re.Match object; span=(0, 4), match='abcd'>
    # <re.Match object; span=(0, 4), match='abcd'>
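Atom tables also accept ranges such as 0-9 or a-z, and a search that finds nothing returns None. The following is a small sketch that reuses the sample string from the article; the variable names are chosen here for illustration.

```python
import re

string = "abcd123456ABC"

# A range inside [] stands for every character between its endpoints
first_digit = re.search("[0-9]", string)
print(first_digit)  # <re.Match object; span=(4, 5), match='1'>

# [^a-z] matches the first character that is not a lowercase letter
first_non_lower = re.search("[^a-z]", string)
print(first_non_lower.group())  # 1

# None of x, y, z follows "abc" in the string, so search returns None
missing = re.search("abc[xyz]", string)
print(missing)  # None
```

Because a failed search returns None, it is safer to check the result before calling methods such as `.group()` on it.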
Third, metacharacters
Special characters

    ^      # matches the start of the input string (the beginning of a line)
    $      # matches the end of the input string (the end of a line)
    \      # escape character
    {n}    # exactly n times
    {n,}   # at least n times
    {n,m}  # at least n times, at most m times
    .      # matches any character except \n
    *      # matches the previous character 0 or more times
    +      # matches the previous character 1 or more times
    ?      # matches the previous character 0 or 1 times
    |      # or. Matches either the expression on its left or the one on its right, trying them from left to right; if | is not enclosed in (), its scope is the whole regular expression
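The metacharacters above can be checked one by one against the sample string used earlier. This is a sketch for illustration only; the variable names and the extra test strings ("cost: $5" and "a\nb") are assumptions introduced here.

```python
import re

string = "abcd123456ABC"

start = re.search("^abc", string)          # anchored at the start
end = re.search("ABC$", string)            # anchored at the end
three = re.search(r"\d{3}", string)        # exactly three digits
two_to_four = re.search(r"\d{2,4}", string)  # two to four digits, as many as possible
seven_plus = re.search(r"\d{7,}", string)  # at least seven digits: only six exist
optional = re.search("abcx?d", string)     # the x may appear 0 or 1 times
greedy = re.search("d.+A", string)         # . repeated 1 or more times
either = re.search("xyz|123", string)      # left alternative fails, right one matches

print(start.span(), end.span())   # (0, 3) (10, 13)
print(three.group())              # 123
print(two_to_four.group())        # 1234
print(seven_plus)                 # None
print(optional.group())           # abcd
print(greedy.group())             # d123456A
print(either.group())             # 123

# \ turns a metacharacter into an ordinary character
dollar = re.search(r"\$", "cost: $5")
print(dollar.span())              # (6, 7)

# . does not match a newline by default
no_newline = re.search("a.b", "a\nb")
print(no_newline)                 # None
```

Raw strings (the `r"..."` prefix) are used wherever the pattern contains a backslash, so that Python passes the backslash through to the `re` module unchanged.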