Python re module

First, an introduction to re

  A regular expression is a logical formula for operating on strings. It is built from ordinary characters, non-printing characters, and universal characters (together referred to as "atoms"), plus special characters (referred to as "metacharacters"). Predefined characters and combinations of them form a "rule string", and this rule string expresses a filtering logic for strings.

  A regular expression is a text pattern: it describes one or more strings to match when searching text.
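
  To make the idea of a "rule string" used as filtering logic concrete, here is a minimal sketch. The sample lines and the pattern are made up for illustration and do not come from the original post.

```python
import re

# Hypothetical sample data, used only for illustration.
lines = ["order 1001 shipped", "no digits here", "order 1002 pending"]

# The "rule string": the literal text "order " followed by one or more digits.
pattern = r"order \d+"

# Keep only the lines in which the pattern is found somewhere.
matching = [line for line in lines if re.search(pattern, line)]
print(matching)  # ['order 1001 shipped', 'order 1002 pending']
```

  `re.search` returns a match object when the pattern is found anywhere in the string and `None` otherwise, which is why it can be used directly as the filter condition.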

Second, atoms explained

  ### Ordinary characters as atoms

  Ordinary characters are plain literal characters, such as A-Z, a-z, and 0-9.

    # Import the re module
    import re

    string = "abcd123456ABC"
    pat = "abc"
    # search() returns the first place where the pattern matches
    ret = re.search(pat, string)
    print(ret)
    # Output: <re.Match object; span=(0, 3), match='abc'>

 

  ### Non-printing characters as atoms

  Non-printing characters are characters that really exist in the computer but cannot be displayed or printed.

  For example:

  1. In the ASCII table, the codes 0-31 are control characters, which cannot be displayed or printed.
  2. Escape characters such as \t and \n.

    # Import the re module
    import re

    # The triple-quoted string contains a real line break after "abcd12"
    string = '''abcd12
    3456ABC'''
    pat = "\n"
    ret = re.search(pat, string)
    print(ret)
    # Output: <re.Match object; span=(6, 7), match='\n'>
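
  The same approach works for other non-printing characters. The sketch below uses a made-up sample string and a character class covering ASCII codes 0-31 to pick out the control characters mentioned above. The `[\x00-\x1f]` range is my own choice for illustration.

```python
import re

# Hypothetical sample text containing a tab, a newline, and a NUL character.
text = "name\tvalue\nend\x00"

# \t as an atom: locate the tab character.
tab = re.search("\t", text)
print(tab.span())  # (4, 5)

# A character class covering ASCII codes 0-31 (the control characters).
controls = re.findall(r"[\x00-\x1f]", text)
print(controls)  # ['\t', '\n', '\x00']
```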

  ### Universal characters as atoms

    \w  matches any letter, digit, or underscore. Similar, but not equivalent, to "[A-Za-z0-9_]", because the "word" characters here come from the Unicode character set.
    \W  matches any character other than a letter, digit, or underscore. It is the complement of \w; equivalent to "[^A-Za-z0-9_]" for ASCII text.
    \d  matches a digit. Equivalent to [0-9] for ASCII text.
    \D  matches any non-digit character. Equivalent to [^0-9] for ASCII text.
    \s  matches any invisible (whitespace) character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v] for ASCII text.
    \S  matches any visible (non-whitespace) character. Equivalent to [^ \f\n\r\t\v] for ASCII text.
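
  A quick way to see what each of these atoms matches is to run `re.findall` over a short string. The sample strings below are made up for illustration. The last two lines show why `\w` is only "similar" to `[A-Za-z0-9_]`: with a `str` pattern it also matches non-ASCII letters unless the `re.ASCII` flag is set.

```python
import re

# Hypothetical sample text: letters, an underscore, digits, spaces, punctuation, a newline.
text = "id_42 = 3.5\n"

word_chars = re.findall(r"\w", text)      # letters, digits, underscore
non_word_chars = re.findall(r"\W", text)  # everything else
digits = re.findall(r"\d", text)
whitespace = re.findall(r"\s", text)

print(word_chars)      # ['i', 'd', '_', '4', '2', '3', '5']
print(non_word_chars)  # [' ', '=', ' ', '.', '\n']
print(digits)          # ['4', '2', '3', '5']
print(whitespace)      # [' ', ' ', '\n']

# Unicode versus ASCII behaviour of \w for str patterns.
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", flags=re.ASCII)
print(unicode_words)  # ['café']
print(ascii_words)    # ['caf']
```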

  ### Atom table ==> [any characters]

    # Import the re module
    import re

    string = '''abcd123456ABC'''
    # Any one of the characters inside [] may match at that position;
    # if none of them matches, search() returns None
    pat = "abc[abcde]"
    ret1 = re.search(pat, string)
    # [^abc] matches any one character except a, b, or c
    pat = "abc[^abc]"
    ret2 = re.search(pat, string)
    print(ret1)
    print(ret2)
    # Output:
    # <re.Match object; span=(0, 4), match='abcd'>
    # <re.Match object; span=(0, 4), match='abcd'>
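
  The case where nothing in the atom table matches is worth seeing as well. This is a small sketch reusing the same sample string. The `[xyz]` table is my own choice to force a failed match.

```python
import re

string = "abcd123456ABC"

# "abc" must be followed by x, y, or z; here it is followed by "d", so there is no match.
no_match = re.search("abc[xyz]", string)
print(no_match)  # None

# Check for None before using the match object.
hit = re.search("abc[^abc]", string)
if hit is not None:
    print(hit.group())  # abcd
```

  Calling `.group()` or `.span()` on `None` raises `AttributeError`, so the `is not None` check is a sensible habit whenever a pattern might fail to match.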

 Third, metacharacters

  Special characters

    ^      # matches the start of the input line
    $      # matches the end of the input line
    \      # escape character
    {n}    # matches the preceding atom exactly n times
    {n,}   # matches the preceding atom at least n times
    {n,m}  # matches the preceding atom at least n times and at most m times
    .      # matches any character except \n
    *      # matches the preceding atom 0 or more times
    +      # matches the preceding atom 1 or more times
    ?      # matches the preceding atom 0 or 1 time
    |      # or: tries the expressions to the left and right of |, from left to right; if | is not inside (), its scope is the whole regular expression
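
  The metacharacters above are easier to remember after trying each one on a tiny input. The following is a minimal sketch with made-up sample strings. It uses `re.fullmatch` where the whole string must match and `re.findall` where every match should be listed.

```python
import re

# Quantifiers apply to the atom just before them.
exactly_two = bool(re.fullmatch(r"a{2}", "aa"))          # True
at_least_two = bool(re.fullmatch(r"a{2,}", "aaaa"))      # True
two_to_three = bool(re.fullmatch(r"a{2,3}", "aaaa"))     # False: four a's is too many
star = re.findall(r"ab*", "a ab abbb")      # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")      # ['ab', 'abbb']
optional = re.findall(r"ab?", "a ab abbb")  # ['a', 'ab', 'ab']

# Anchors: ^ ties the match to the start, $ to the end.
starts = re.search(r"^abc", "xabc")  # None, "abc" is not at the start
ends = re.search(r"abc$", "xabc")    # matches at span (1, 4)

# The dot matches any character except \n; a backslash escapes it to a literal dot.
dots = re.findall(r".", "a\nb")           # ['a', 'b']
literal_dots = re.findall(r"\.", "a.b")   # ['.']

# Alternation: outside () the | splits the whole expression.
whole = re.fullmatch(r"gra|ey", "grey")       # None: the alternatives are "gra" and "ey"
grouped = re.fullmatch(r"gr(a|e)y", "grey")   # matches: only "a" or "e" alternate

print(exactly_two, at_least_two, two_to_three)
print(star, plus, optional)
print(starts, ends.span())
print(dots, literal_dots)
print(whole, grouped.group())
```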

 

Origin www.cnblogs.com/helloboke/p/11482175.html