[Learning in Python] Regular expressions

Regular expressions (REs) are a small, highly specialized programming language embedded in Python and made available through the re module. A regular expression pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.

First, ordinary characters
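
Ordinary characters (letters, digits, and other symbols with no special role) simply match themselves. A minimal sketch of that behaviour follows; the sample text is made up for illustration, and re.IGNORECASE is shown only as one optional way to relax case sensitivity.

```python
import re

sample = 'Hello World, hello regex'  # hypothetical sample text

# Ordinary characters match themselves, one for one.
literal_hits = re.findall('lo', sample)
print(literal_hits)  # ['lo', 'lo']

# Matching is case-sensitive by default.
exact_hits = re.findall('hello', sample)
print(exact_hits)  # ['hello']

# Passing re.IGNORECASE relaxes that.
ignore_case_hits = re.findall('hello', sample, flags=re.IGNORECASE)
print(ignore_case_hits)  # ['Hello', 'hello']
```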

Second, metacharacters

1. The metacharacters . ^ $ * + ? {}

# Introducing regular expressions: fuzzy matching
import re

# '.' matches any character except \n (newline) by default; with flags=re.DOTALL it matches any character, including newline
res = re.findall('W..l', 'Hello World !!')  # ['Worl']
ret = re.findall('W..l', 'Hello W\nrld !!', flags=re.DOTALL)  # ['W\nrl']

# '^' matches the start of the string; with flags=re.MULTILINE it also matches at the start of each line
res = re.findall('^h...o', 'hjaookhello')  # ['hjaoo']
ret = re.findall(r'^a', '\nabc\neee', flags=re.MULTILINE)  # ['a']

# '$' matches the end of the string; with flags=re.MULTILINE it also matches at the end of each line
res = re.findall('a..x$', 'aaaalexauex')  # ['auex']
ret = re.findall('foo$', 'bfoo\nsdfsf', flags=re.MULTILINE)  # ['foo']

# '*' matches the preceding character 0 or more times
res = re.findall("ab*", "cabb3abcbbac")  # ['abb', 'ab', 'a']

# '+' matches the preceding character 1 or more times
res = re.findall("ab+", "cabb3abcbbac")  # ['abb', 'ab']

# '?' matches the preceding character 0 or 1 times
res = re.findall("ab?", "cabb3abcbbac")  # ['ab', 'ab', 'a']

# '{m}' matches the preceding character exactly m times; '{m,n}' matches it between m and n times
res = re.findall("a{1,3}b", "caaabb3abcbbaabc")  # ['aaab', 'ab', 'aab']
res = re.findall("a{1,3}b", "aaaab")  # ['aaab']

# Conclusion: * is equivalent to {0,}, + to {1,}, and ? to {0,1}; the shorthand forms *, + and ? are preferred
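
The equivalence between the shorthand quantifiers and their {m,n} forms can be checked directly. The sketch below reuses the sample string from the listing above; the variable names are chosen here for illustration only.

```python
import re

text = "cabb3abcbbac"

# Each shorthand quantifier next to its explicit {m,n} spelling.
star = re.findall(r"ab*", text)
star_braces = re.findall(r"ab{0,}", text)
plus = re.findall(r"ab+", text)
plus_braces = re.findall(r"ab{1,}", text)
optional = re.findall(r"ab?", text)
optional_braces = re.findall(r"ab{0,1}", text)

print(star, star_braces)          # ['abb', 'ab', 'a'] ['abb', 'ab', 'a']
print(plus, plus_braces)          # ['abb', 'ab'] ['abb', 'ab']
print(optional, optional_braces)  # ['ab', 'ab', 'a'] ['ab', 'ab', 'a']
```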

Note: '*' and '+' above are greedy (they match as much as possible). Appending '?' makes them lazy (they match as little as possible).

import re
res = re.findall("ab+", "abbbbb")   # ['abbbbb']
res = re.findall("ab*", "abbbbb")   # ['abbbbb']
res = re.findall("ab+?", "abbbbb")  # ['ab']
res = re.findall("ab*?", "abbbbb")  # ['a']
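
The difference between greedy and lazy matching tends to matter most when a pattern has a closing delimiter. As a rough illustration (the tag-like sample text is hypothetical and not from the original post), the greedy form runs to the last '>' while the lazy form stops at the first one:

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # hypothetical sample text

# Greedy: '.*' grabs as much as it can, so the match ends at the last '>'.
greedy = re.findall(r"<.*>", html)
# Lazy: '.*?' grabs as little as it can, so each match ends at the nearest '>'.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```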

2. Metacharacters: the character set []

# The characters inside [] are alternatives: any one of them matches
res = re.findall('c[on]m', 'comaxcnm')  # ['com', 'cnm']
res = re.findall('[a-z]', 'comaxcn')  # ['c', 'o', 'm', 'a', 'x', 'c', 'n']

# Metacharacters placed inside [] lose their special meaning (except \, ^ and -)
res = re.findall('[w*+,$]', 'co,ma+wc$n*')  # [',', '+', 'w', '$', '*']

# '^' at the start of [] negates the set
res = re.findall('[^t]', 'atxmetu')  # ['a', 'x', 'm', 'e', 'u']
res = re.findall('[^tx]', 'atxmetu')  # ['a', 'm', 'e', 'u']

# '-' inside [] denotes a range
res = re.findall('[1-9a-z]', '13mawcCB')  # ['1', '3', 'm', 'a', 'w', 'c']
res = re.findall('[1-9a-zA-Z]', '13mawcCB')  # ['1', '3', 'm', 'a', 'w', 'c', 'C', 'B']

# A backslash followed by an ordinary character keeps its special meaning inside []
res = re.findall(r'[\w\d]', '13mawcCB')  # ['1', '3', 'm', 'a', 'w', 'c', 'C', 'B']
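
Because \, ^ and - stay special inside a set, matching them literally takes a little care. The following sketch shows the usual ways to do it; the sample strings and variable names are illustrative assumptions.

```python
import re

# A hyphen is literal when it is escaped, or when it is first or last in the set.
escaped_hyphen = re.findall(r'[a\-z]', 'a-mz')
leading_hyphen = re.findall(r'[-az]', 'a-mz')
print(escaped_hyphen)  # ['a', '-', 'z']
print(leading_hyphen)  # ['a', '-', 'z']

# A caret is literal anywhere except the first position of the set.
literal_caret = re.findall(r'[a^]', 'a^b')
print(literal_caret)  # ['a', '^']

# A backslash must itself be escaped to appear in a set.
separators = re.findall(r'[\\/]', r'C:\tmp/x')
print(separators)  # ['\\', '/']
```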

3. Metacharacters: the escape character \

# The equivalences below hold for ASCII text; str patterns also match the Unicode counterparts by default
# \d matches any decimal digit, equivalent to [0-9]
# \D matches any non-digit character, equivalent to [^0-9]
# \s matches any whitespace character, equivalent to [ \t\n\r\f\v]
# \S matches any non-whitespace character, equivalent to [^ \t\n\r\f\v]
# \w matches any alphanumeric character or underscore, equivalent to [a-zA-Z0-9_]
# \W matches any character that \w does not, equivalent to [^a-zA-Z0-9_]
# \b matches a word boundary: the position between a \w character and a \W character (or the edge of the string)

# A backslash followed by certain ordinary characters gives them a special meaning
print(re.findall(r'\d{5}', 'ae12345cw67890'))  # ['12345', '67890']
print(re.findall(r'\sasd', 'fak asd'))  # [' asd']
print(re.findall(r'\w', 'fak asd'))  # ['f', 'a', 'k', 'a', 's', 'd']
print(re.findall(r'I\b', 'I am a LI$T'))  # ['I', 'I']

# A backslash followed by a metacharacter removes its special meaning
print(re.findall(r'a\.', 'a.jk'))  # ['a.']
print(re.findall(r'a\+', 'a+jk'))  # ['a+']
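
Two details of the escape classes are easy to miss: \w also matches the underscore, and for str patterns it matches non-ASCII letters unless re.ASCII is given. The sketch below illustrates both, along with \b used to pick out whole words; the sample strings are made up for the example.

```python
import re

sample = "a_1-\u00e9"  # the characters a, _, 1, - and e-acute

# By default \w is Unicode-aware for str patterns and includes the underscore.
unicode_words = re.findall(r"\w", sample)
# With re.ASCII, \w is restricted to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w", sample, flags=re.ASCII)
print(unicode_words)  # ['a', '_', '1', 'é']
print(ascii_words)    # ['a', '_', '1']

# \b on both sides keeps only occurrences that stand alone as a word.
whole_word = re.findall(r"\bcat\b", "cat concat cat_1 cat!")
print(whole_word)  # ['cat', 'cat']
```

In the last call, "concat" and "cat_1" are skipped because the neighbouring characters are \w characters, so there is no boundary on that side.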

Matching a literal backslash "\" works as follows:

# Matching a literal backslash (\)
print(re.findall(r'c\l', r'abc\le'))  # raises re.error: the re module rejects the escape \l
print(re.findall('c\\l', r'abc\le'))  # raises re.error: Python turns '\\' into one backslash, so the pattern is still c\l
print(re.findall('c\\\\l', r'abc\le'))  # ['c\\l']
print(re.findall(r'c\\l', r'abc\le'))  # ['c\\l']

# '\b' in an ordinary string literal is the ASCII backspace character, so the pattern needs the r prefix
print(re.findall('\bblow', 'blow'))  # [], no match
print(re.findall(r'\bblow', 'blow'))  # ['blow']
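
The two layers of escaping (Python string literals first, then the regex engine) can be made visible by inspecting the strings themselves. This is a small sketch with assumed variable names; re.escape is shown as one possible way to build the escaped pattern without counting backslashes by hand.

```python
import re

# Layer 1: the Python string literal.
backspace_len = len('\b')   # 1: a single backspace character
raw_len = len(r'\b')        # 2: a backslash followed by 'b'
print(backspace_len, raw_len)  # 1 2

# '\\\\' and r'\\' are the same two-character string.
same_pattern = '\\\\' == r'\\'
print(same_pattern)  # True

# Layer 2: the regex engine needs an escaped backslash to match one literally.
path = r'abc\le'
manual = re.findall(r'c\\l', path)
print(manual)  # ['c\\l']

# re.escape produces the escaped form of a literal string.
escaped = re.escape('c\\l')
automatic = re.findall(escaped, path)
print(automatic)  # ['c\\l']
```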

4. Metacharacters: grouping ()

# () groups the characters inside the parentheses into a single unit
print(re.findall('(as)', 'jdkasas'))  # ['as', 'as']

res = re.search(r'(?P<id>\d{3})/(?P<name>\w{3})', 'weeew34ttt123/ooo')
print(res.group())  # 123/ooo
print(res.group('id'))  # 123
print(res.group('name'))  # ooo

# findall
res = re.findall(r'www\.(\w+)\.com', 'www.baidu.com')
print(res)  # ['baidu']: when the pattern contains a group, findall returns only the group's content
ret = re.findall(r'www\.(?:\w+)\.com', 'www.baidu.com')
print(ret)  # ['www.baidu.com']: ?: makes the group non-capturing, so the whole match is returned

# search
res = re.search(r'www\.(\w+)\.com', 'www.baidu.com')
print(res.group())  # www.baidu.com, unlike findall
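
When a pattern has more than one group, findall returns tuples, and a match object exposes the groups by position or by name. The sketch below assumes a made-up list of addresses for the first call and reuses the named-group pattern from the listing above.

```python
import re

# With several capturing groups, findall returns one tuple per match.
pairs = re.findall(r'(\w+)@(\w+)\.com', 'tom@example.com, ann@test.com')
print(pairs)  # [('tom', 'example'), ('ann', 'test')]

# A match object gives access to every group at once.
m = re.search(r'(?P<id>\d{3})/(?P<name>\w{3})', 'weeew34ttt123/ooo')
all_groups = m.groups()
named_groups = m.groupdict()
print(all_groups)    # ('123', 'ooo')
print(named_groups)  # {'id': '123', 'name': 'ooo'}

# Named groups can still be read by position.
print(m.group(1), m.group(2))  # 123 ooo
```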

5. Metacharacters: the pipe |

# | matches either the expression on its left or the one on its right
print(re.findall(r'(ab)|\d', 'rabhdg8sd'))  # ['ab', '']: the digit matches, but the group is empty for that match
print(re.search(r'(ab)|\d', 'rabhdg8sd').group())  # ab
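
The empty string in the findall result above comes from the capturing group, not from the alternation itself. As a sketch of one way around it, a non-capturing group returns the full text of each match; the second pair of calls (with made-up sample text) shows that | has low precedence and usually needs a group to limit its scope.

```python
import re

# Non-capturing group: findall returns whole matches, so the digit shows up.
digits_or_ab = re.findall(r'(?:ab)|\d', 'rabhdg8sd')
print(digits_or_ab)  # ['ab', '8']

# Without a group, | splits the entire pattern into two alternatives.
grouped = re.findall(r'gr(?:a|e)y', 'gray grey')
ungrouped = re.findall(r'gra|ey', 'gray grey')
print(grouped)    # ['gray', 'grey']
print(ungrouped)  # ['gra', 'ey']
```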

6. Common functions in the re module

# Regular expression functions
# re.findall()   # returns all matches as a list
# re.search()    # returns a match object for the first match; call its group() method to get the matched text
# re.match()     # matches only at the start of the string and returns a single match object; call its group() method to get the matched text
# re.split()     # splits the string into a list, using the matches as separators
# re.sub()       # replaces the matches
# re.subn()      # same as sub, but also returns the number of replacements made
# re.compile()   # compiles a pattern into an object for later reuse
# re.finditer()  # returns an iterator of match objects

# findall: returns all matches as a list
print(re.findall(r'\d', '12345'))  # ['1', '2', '3', '4', '5']

# search: returns the first match that satisfies the pattern
res = re.search('sb', 'adssbeeesb')
print(res)  # <re.Match object; span=(3, 5), match='sb'> (Python 3.7+)
print(res.group())  # sb

# match: same as search, but only matches at the start of the string
res = re.match('sb', 'sbaee')
print(res)  # <re.Match object; span=(0, 2), match='sb'>; None is returned if there is no match
print(res.group())  # sb

# split: the matched characters are used as separators
res = re.split('k', 'djksal')
print(res)  # ['dj', 'sal']
res = re.split('[j,s]', 'dsejksal')  # the comma is a literal member of the set
print(res)  # ['d', 'e', 'k', 'al']
res = re.split('[j,s]', 'sejksal')
print(res)  # ['', 'e', 'k', 'al']

# sub: replaces the matches
res = re.sub('a..x', 's..b', 'eealexbb')
print(res)  # ees..bbb
res = re.sub('ab', '123', 'ablexbab', count=1)  # count limits the number of replacements
print(res)  # 123lexbab

# subn: same as sub, but returns a tuple of the new string and the number of replacements made
res = re.subn('a..x', 's..b', 'eealexbb')
print(res)  # ('ees..bbb', 1)
res = re.subn('ab', '123', 'ablexbab')  # no count given, so every match is replaced
print(res)  # ('123lexb123', 2)

# compile: compiles a pattern into an object for later reuse
obj = re.compile(r'\.com')
res = obj.findall('fajs.comeee')
ret = obj.findall('aa.comss.com')
print(res)  # ['.com']
print(ret)  # ['.com', '.com']

# finditer: returns an iterator of match objects
res = re.finditer(r'\d', '12345')
print(res)  # <callable_iterator object at 0x000001E98FE4D7B8>
for i in res:
    print(i.group())
# 1
# 2
# 3
# 4
# 5
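
A few further behaviours of these functions are worth seeing side by side: match returning None where search succeeds, sub accepting backreferences or a function as the replacement, finditer exposing match positions, and split taking a maxsplit limit. The sketch below uses made-up inputs and variable names throughout.

```python
import re

# match anchors at the start of the string; search scans the whole string.
no_match = re.match('sb', 'asb')
found_span = re.search('sb', 'asb').span()
print(no_match)    # None
print(found_span)  # (1, 3)

# sub: \1 and \2 in the replacement refer back to the captured groups.
swapped = re.sub(r'(\d+)-(\d+)', r'\2-\1', '10-20')
print(swapped)  # 20-10

# sub: the replacement may also be a function that receives each match object.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1b22')
print(doubled)  # a2b44

# finditer: each match object carries its text and its position.
spans = [(m.group(), m.span()) for m in re.finditer(r'\d+', 'a12b345')]
print(spans)  # [('12', (1, 3)), ('345', (4, 7))]

# split: maxsplit limits how many splits are made.
limited = re.split(r'\s+', 'a b  c', maxsplit=1)
print(limited)  # ['a', 'b  c']
```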

Origin www.cnblogs.com/gtea/p/12715270.html