Python re module

re.findall(): find all matches

findall(pattern, string) searches the string for every substring that matches the pattern (a regular expression) and returns them as a list of strings, for example:

import re

a = "Java|python12988"
b = re.findall("Java", a)
c = re.findall(r"\d", a)
print(b, c)

Result:

['Java'] ['1', '2', '9', '8', '8']


findall() takes an optional third parameter, flags, which sets the matching mode. Several modes can be combined by separating them with '|', and all of them are then active.
re.I:

import re

language = "PythonC#JavaPHP"
r = re.findall('c#', language)
print(r)


Result:

[]



Matching is case-sensitive by default, so 'c#' does not match the 'C#' in the original string. Passing re.I as the third parameter makes matching case-insensitive:

import re

language = "PythonC#JavaPHP"
r = re.findall('c#', language, re.I)
print(r)


Result:

['C#']


re.S:

import re

language = "PythonC#\nJavaPHP"
r = re.findall('c#.{1}', language, re.I)  # "." matches any character except a newline
print(r)


Result:

[]



Adding the re.S mode changes the behavior of '.' so that it also matches a newline:

import re

language = "PythonC#\nJavaPHP"
r = re.findall('c#.{1}', language, re.I | re.S)  # with re.S, "." also matches a newline
print(r)


Result:

['C#\n']


re.sub(): regex replacement

re.sub(pattern, repl, string, count) replaces the matches of pattern in string with repl and returns the new string; count is the maximum number of replacements. For example:

import re

language = 'PythonC#JavaC#PHPC#'
r = re.sub('C#', 'GO', language)
print(r)


Result:

PythonGOJavaGOPHPGO



In the example above, count is left at its default of 0, which replaces every match. Passing count limits the number of replacements, for example:

import re

language = 'PythonC#JavaC#PHPC#'
r = re.sub('C#', 'GO', language, count=1)
print(r)


Result:

PythonGOJavaC#PHPC#

result of only the C # to the replacement of a
built-in function and effect replace the python () as
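A minimal side-by-side comparison, reusing the same 'PythonC#JavaC#PHPC#' string, might look like the sketch below. Here str.replace() takes the maximum number of replacements as its third argument, playing the role of count.

```python
import re

language = 'PythonC#JavaC#PHPC#'

# re.sub with count=1 replaces only the first match of the pattern
by_regex = re.sub('C#', 'GO', language, count=1)

# str.replace replaces at most 1 occurrence of the literal text
by_str = language.replace('C#', 'GO', 1)

print(by_regex)  # PythonGOJavaC#PHPC#
print(by_str)    # PythonGOJavaC#PHPC#
```

str.replace() is enough for literal text; re.sub() becomes necessary once the thing to replace is a pattern such as \d+.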

The second parameter of re.sub() can also be a function, for example:

import re

def convert(value):
    print(value)

language = 'PythonC#JavaC#PHPC#'
r = re.sub('C#', convert, language)
print(r)


The output shows three match objects; span is the position of the matched characters (Python 3.7 and later print re.Match instead of _sre.SRE_Match):

<_sre.SRE_Match object; span=(6, 8), match='C#'>
<_sre.SRE_Match object; span=(12, 14), match='C#'>
<_sre.SRE_Match object; span=(17, 19), match='C#'>
PythonJavaPHP


'C#' is matched three times, and convert is called once for each match. Because convert returns nothing (None), each 'C#' is replaced with an empty string, that is, removed.
The example below rewrites convert: value is a match object, so its group() method returns the matched text 'C#'.

import re

def convert(value):
    matched = value.group()
    return '!!' + matched + '!!'

language = 'PythonC#JavaC#PHPC#'
r = re.sub('C#', convert, language)
print(r)


Result: each 'C#' is replaced with '!!C#!!':

Python!!C#!!Java!!C#!!PHP!!C#!!


Because the second parameter of re.sub() can be a function, re.sub() can perform complex operations on strings.
For example, to replace every digit less than or equal to 4 in the string 'ABC24525DEF22698' with 0, and every digit greater than 4 with 9, modify convert like this:

import re

def convert(value):
    matched = value.group()
    if int(matched) <= 4:
        return '0'
    else:
        return '9'

language = 'ABC24525DEF22698'
r = re.sub(r'\d', convert, language)
print(r)


Result:

ABC00909DEF00999


Passing a function as an argument to another function is a typical technique of functional programming.
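Since any callable works as the second argument, the convert function from the digit example could also be written inline as a lambda. A sketch using the same input string:

```python
import re

language = 'ABC24525DEF22698'

# m is the match object for one digit: 0-4 becomes '0', 5-9 becomes '9'
result = re.sub(r'\d', lambda m: '0' if int(m.group()) <= 4 else '9', language)
print(result)  # ABC00909DEF00999
```

A named function is easier to read once the logic grows; the lambda suits one-line rules like this one.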
re.match() and re.search()

re.match() matches from the start of the string; if the beginning of the string does not match the regular expression, it returns None.
re.search() scans the entire string and returns the first match.

If either function finds a match, it returns a match object.
Besides group(), a match object has a span() method, which returns the position of the match.

re.findall(), by contrast, returns all matches.
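To see the three functions side by side, here is a small sketch on a made-up string that does not start with a digit:

```python
import re

content = 'Tel: 123 and 456'  # sample string, made up for illustration

# match() only tries position 0; 'T' is not a digit, so it returns None
print(re.match(r'\d+', content))    # None

# search() scans the string and returns the first match as a match object
m = re.search(r'\d+', content)
print(m.group(), m.span())          # 123 (5, 8)

# findall() returns every match as a list of strings
print(re.findall(r'\d+', content))  # ['123', '456']
```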

1. match(): matching from the start of the string

import re

content = 'The 123456 is my one phone number.'
print(len(content))  # string length
result = re.match(r'^The\s\d+\s\w*', content)  # match(): the first argument is the regex, the second is the string to match
print(result)
print(result.group())  # the matched text
print(result.span())  # start and end index of the match

Result:

34
<_sre.SRE_Match object; span=(0, 13), match='The 123456 is'>
The 123456 is
(0, 13)


2. Extracting a target with groups

import re

content = 'The 123456 is my one phone number.'
print(len(content))  # string length
result = re.match(r'^The\s(\d+)\sis', content)  # match(): the first argument is the regex, the second is the string to match
print(result)
print(result.group())  # the whole match
print(result.group(1))  # the content captured by the first ()
print(result.span())  # start and end index of the match


Result:

34
<_sre.SRE_Match object; span=(0, 13), match='The 123456 is'>
The 123456 is
123456
(0, 13)


Any part of a regular expression enclosed in () can be retrieved with group(). If there are n pairs of parentheses, group(n) returns what the n-th pair matched.
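As a sketch of numbered groups, the pattern below adds a second pair of parentheses to the previous example; groups() returns all captured parts at once as a tuple.

```python
import re

content = 'The 123456 is my one phone number.'

# two groups: (1) the digits, (2) the word that follows them
result = re.match(r'^The\s(\d+)\s(\w+)', content)

print(result.group(0))  # The 123456 is  (group 0 is the whole match, same as group())
print(result.group(1))  # 123456
print(result.group(2))  # is
print(result.groups())  # ('123456', 'is')
```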
3. Generic matching

import re

content = 'The 123456 is my one phone number.'
result = re.match(r'^The.*number.$', content)  # match(): the first argument is the regex, the second is the string to match
print(result)
print(result.group())  # the matched text
print(result.span())  # start and end index of the match


Result:

<_sre.SRE_Match object; span=(0, 34), match='The 123456 is my one phone number.'>
The 123456 is my one phone number.
(0, 34)


Here '.' matches any character (except a newline) and '*' lets the preceding character repeat any number of times, so '.*' matches any sequence of characters.
4. Greedy and non-greedy matching

import re

content = 'The 123456 is my one phone number.'
print('Greedy match:')
result = re.match(r'^The.*(\d+).*', content)  # match(): the first argument is the regex, the second is the string to match
print(result.group())  # the whole match
print('result = %s' % result.group(1))  # the content captured by the first ()
print('-' * 20)
print('Non-greedy match:')
result = re.match(r'^The.*?(\d+).*', content)
print(result.group())
print('result = %s' % result.group(1))


Result:

Greedy match:
The 123456 is my one phone number.
result = 6
--------------------
Non-greedy match:
The 123456 is my one phone number.
result = 123456
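In the greedy case, .* consumes as much as it can and gives back only the final '6', the minimum that (\d+) needs; .*? stops as early as possible, so (\d+) gets the whole number. The same effect may be easier to see with findall() on a short, made-up string of tags:

```python
import re

text = '<b>bold</b> and <i>italic</i>'  # sample string, made up for illustration

# Greedy: .* runs as far as possible, so one match spans from the first '<' to the last '>'
greedy = re.findall(r'<.*>', text)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first '>' it reaches, giving one match per tag
lazy = re.findall(r'<.*?>', text)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```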


5. Modifiers: re.S

import re

content = '''The 123456 is
one of my phone.
'''
result = re.match(r'^The.*?(\d+).*?phone.', content, re.S)
if result:
    print(result.group(1))
else:
    print('result = None')
result2 = re.match(r'^The.*?(\d+).*?phone.', content)
if result2:
    print(result2.group(1))
else:
    print('result2 = None')


Result:

123456
result2 = None


Because re.S is passed when computing result, the wildcard '.' also matches line breaks, so result is not None; result2 is matched without re.S, so it is None. Besides re.S there are other such modifiers, for example re.I, which makes matching case-insensitive.
6. Matching special characters with escapes

import re

content = '(Baidu)www.baidu.com'
result = re.match('(Baidu)www.baidu.com', content)
result2 = re.match(r'\(Baidu\)www\.baidu\.com', content)
if result:
    print(result.group())
else:
    print('result = None')
if result2:
    print(result2.group())
else:
    print('result2 = None')


Result:

result = None
(Baidu)www.baidu.com


Because ( and ) are special characters in regular expressions (and '.' matches any character), matching a literal '(' or ')' requires escaping it with a backslash '\'.
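When the literal text is long, re.escape() can add the backslashes for you. A minimal sketch using the same string:

```python
import re

content = '(Baidu)www.baidu.com'

# re.escape() backslash-escapes regex metacharacters such as ( ) and .
pattern = re.escape('(Baidu)www.baidu.com')
print(pattern)         # \(Baidu\)www\.baidu\.com

result = re.match(pattern, content)
print(result.group())  # (Baidu)www.baidu.com
```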
7. search(): unlike match(), it does not have to match from the start of the string

import re

content = 'Other The 123456 is my one phone number.'
result = re.search(r'The.*?(\d+).*?number.', content)
print(result.group())


Result:

The 123456 is my one phone number.

8. findall(): match() and search() stop at the first match and return it, while findall() returns every substring that matches the pattern

import re

html = '''
<div id="songs-list">
<h2 class="title">Playlist</h2>
<p class="introduction">Playlist entries</p>
<ul id="list" class="list-group">
<li data-view="2">一路上有你</li>
<li data-view="7">
<a href="/2.mp3" singer="任贤齐">沧海一声笑</a>
</li>
<li data-view="4" class="active">
<a href="/3.mp3" singer="齐秦">往事随风</a>
</li>
<li data-view="6"><a href="/4.mp3" singer="beyond">光辉岁月</a></li>
<li data-view="5"><a href="/5.mp3" singer="程慧玲">记事本</a></li>
<li data-view="5">
<a href="/6.mp3" singer="邓丽君">但愿人长久</a>
</li>
</ul>
</div>
'''

result = re.findall('<li.*?href="(.*?)".*?singer="(.*?)">(.*?)</a>', html, re.S)
if result:
    print(result)
    for res in result:
        print(res[0], res[1], res[2])


Result:

[('/2.mp3', '任贤齐', '沧海一声笑'), ('/3.mp3', '齐秦', '往事随风'), ('/4.mp3', 'beyond', '光辉岁月'), ('/5.mp3', '程慧玲', '记事本'), ('/6.mp3', '邓丽君', '但愿人长久')]
/2.mp3 任贤齐 沧海一声笑
/3.mp3 齐秦 往事随风
/4.mp3 beyond 光辉岁月
/5.mp3 程慧玲 记事本
/6.mp3 邓丽君 但愿人长久


9. sub(): removing matched characters

The second parameter here is '' (an empty string), so whatever '\d+' matches is replaced with nothing. If you write sub('\d+', '-', content) instead, each match is replaced with '-'.

import re

content = '54abc59de335f7778888g'
content = re.sub(r'\d+', '', content)
print(content)

Result:

abcdefg
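A short sketch of the '-' variant on the same string, also showing how \d+ (one replacement per run of digits) differs from \d (one replacement per digit):

```python
import re

content = '54abc59de335f7778888g'

# each run of digits becomes a single '-'
one_dash = re.sub(r'\d+', '-', content)
print(one_dash)   # -abc-de-f-g

# each individual digit becomes its own '-'
per_digit = re.sub(r'\d', '-', content)
print(per_digit)  # --abc--de---f-------g
```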


10. compile()

import re

content1 = '2016-1-1 12:01'
content2 = '2017-1-1 12:02'  # times for content2/content3 assumed; any HH:MM gives the same output
content3 = '2018-1-1 12:03'

pattern = re.compile(r'\d{2}:\d{2}')
result1 = re.sub(pattern, '', content1)
result2 = re.sub(pattern, '', content2)
result3 = re.sub(pattern, '', content3)
print(result1, result2, result3)


Result:

2016-1-1  2017-1-1  2018-1-1

When the same regular expression is needed several times, compiling it once with compile() saves code, and compile() also accepts modifiers such as re.S.
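A minimal sketch combining the two ideas: the re.S example from section 5, with the flag supplied once to compile() and the compiled pattern's own match() method used afterwards.

```python
import re

content = '''The 123456 is
one of my phone.
'''

# the re.S flag is given once, at compile time
pattern = re.compile(r'^The.*?(\d+).*?phone.', re.S)

# compiled patterns have their own match/search/findall/sub methods
result = pattern.match(content)
print(result.group(1))  # 123456
```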


Source: www.cnblogs.com/lisa2016/p/11246191.html