Python regular expressions: learning notes

1. Regular expressions

  Regular expression: a pattern used to match strings; the string is scanned from start to finish.

  Character class: [] matches exactly one character from the set. Inside [], most special characters lose their special meaning and are matched literally. A ^ at the start of the set negates it: [^a] matches any character other than a.
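A small sketch of character classes. The sample string and variable names are made up for illustration:

```python
import re

text = "cat cot cut c.t c-t"

# [aou] matches exactly one character from the set
vowels = re.findall(r"c[aou]t", text)        # ['cat', 'cot', 'cut']

# Inside [], '.' is a literal dot, so only "c.t" matches
literal_dot = re.findall(r"c[.]t", text)     # ['c.t']

# A leading ^ negates the set: any single character except 'a'
not_a = re.findall(r"c[^a]t", text)          # ['cot', 'cut', 'c.t', 'c-t']

print(vowels, literal_dot, not_a)
```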

  Metacharacters:

    \d: matches a digit; \D: matches any non-digit character

    \w: matches a letter, digit, or underscore; \W: matches any other character

    \s: matches whitespace (space, \n, \t, etc.); \S: matches any non-whitespace character

    [\s\S], [\d\D], [\w\W]: match any character, including a newline

    .: matches any character except a newline

    ^: matches at the start of the string; usually appears at the beginning of the pattern

    $: matches at the end of the string; usually appears at the end of the pattern

    \b: matches a word boundary (the position between a \w character and a \W character, or between a \w character and the start or end of the string)

    \: escapes a special character so that it is matched literally

    a|b: matches a or b. Alternatives are tried from left to right and the first one that matches wins, so b is not tried once a has matched. When two alternatives overlap, put the longer one first; otherwise the longer text can never be matched.
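The metacharacters above can be tried out on one sample string. The string and variable names are illustrative assumptions. The comments show the results:

```python
import re

s = "id_42 costs 3.5\nnext line"

digits = re.findall(r"\d", s)              # ['4', '2', '3', '5']
words = re.findall(r"\w+", s)              # ['id_42', 'costs', '3', '5', 'next', 'line']
spaces = re.findall(r"\s", s)              # [' ', ' ', '\n', ' ']
dot_no_newline = re.findall(r".+", s)      # '.' stops at the newline: two pieces
any_char = re.findall(r"[\s\S]+", s)       # [\s\S] also matches '\n': one piece
starts = bool(re.search(r"^id", s))        # True: ^ anchors at the start
ends = bool(re.search(r"line$", s))        # True: $ anchors at the end
whole_word = re.findall(r"\bline\b", "line lines inline")   # ['line']
escaped = re.findall(r"3\.5", s)           # \. matches a literal dot

# Alternatives are tried left to right, so the order matters
short_first = re.findall(r"a|ab", "ab")    # ['a']
long_first = re.findall(r"ab|a", "ab")     # ['ab']
```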

  Quantifiers:

    A quantifier applies only to the single character or group immediately before it.

    {n}: the preceding token must match exactly n times

    {n,m}: matches at least n and at most m times, taking as many as possible (greedy)

    {n,}: matches at least n times, taking as many as possible (greedy)

    ?: matches zero or one time (greedy). When placed after another quantifier (*?, +?, ??, {n,m}?), it turns off greediness, so that quantifier matches as few times as possible

    *: matches zero or more times (greedy)

    +: matches one or more times (greedy)
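Here is a sketch of greedy and non-greedy quantifiers. All sample strings are made up for illustration:

```python
import re

s = "aaaa 12345"

exactly = re.findall(r"a{2}", s)            # ['aa', 'aa']
range_greedy = re.findall(r"\d{2,4}", s)    # ['1234']: takes 4, the leftover '5' is too short
at_least = re.findall(r"a{3,}", s)          # ['aaaa']
optional = re.findall(r"colou?r", "color colour")   # ['color', 'colour']
star = re.findall(r"ab*", "a ab abbb")      # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")      # ['ab', 'abbb']

# Adding ? after a quantifier makes it non-greedy
html = "<b>bold</b>"
greedy = re.findall(r"<.*>", html)          # ['<b>bold</b>']
lazy = re.findall(r"<.*?>", html)           # ['<b>', '</b>']
lazy_range = re.findall(r"\d{2,4}?", "12345")       # ['12', '34']
```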

  Grouping:

    (): groups a sub-pattern so that a quantifier applies to the whole group; for example, \d+(\.\d+)? matches an integer or a decimal number

    (?P<name>...): gives the group a name

    (?P=name): refers back to the named group; it must match exactly the same text that the group matched. A numbered back-reference such as \1 can be used instead
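The sketch below shows grouping, named groups, and back-references. The tag-matching pattern and the sample strings are illustrative assumptions:

```python
import re

# The trailing ? makes the whole group (\.\d+) optional
number = re.compile(r"\d+(\.\d+)?")
is_int = bool(number.fullmatch("42"))       # True
is_dec = bool(number.fullmatch("3.14"))     # True
is_bad = bool(number.fullmatch("3."))       # False: a dot needs digits after it

# (?P<tag>...) names a group; (?P=tag) must repeat exactly the same text
tag_pair = re.compile(r"<(?P<tag>\w+)>(?P<body>.*?)</(?P=tag)>")
m = tag_pair.search("<h1>Title</h1>")
tag, body = m.group("tag"), m.group("body")     # 'h1', 'Title'
mismatch = tag_pair.search("<h1>Title</h2>")    # None: closing tag differs

# The numbered back-reference \1 does the same for group 1
numbered = re.search(r"<(\w+)>.*?</\1>", "<p>text</p>").group()   # '<p>text</p>'
```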

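  Escaping:

    In Python string literals the backslash is also an escape character, so a backslash meant for the regex must be written as \\ in a normal string. Prefixing the string with r (a raw string, e.g. r'\d') avoids this double escaping.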
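This sketch shows why raw strings are recommended. The path and the sentence are made-up samples:

```python
import re

path = r"C:\new\table"   # raw string: the two backslashes are kept as-is

# The regex \\ matches one literal backslash
with_raw = re.findall(r"\\", path)        # raw string: written as \\
without_raw = re.findall("\\\\", path)    # normal string: needs \\\\ to produce \\

# Without r, "\b" is the backspace character, not a word boundary
boundary_raw = re.findall(r"\bnew\b", "a new day")    # ['new']
boundary_plain = re.findall("\bnew\b", "a new day")   # []: pattern contains '\x08'
```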

  

2. The Python re module

  Matching:

    findall: returns a list of all matches

    search: returns a match object for the first match, or None if nothing is found

    match: matches only at the beginning of the string (returns a match object or None)
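This sketch compares findall, search, and match on one sample string, which was chosen for illustration:

```python
import re

s = "abc 123 def 456"

all_nums = re.findall(r"\d+", s)                 # ['123', '456']
first = re.search(r"\d+", s)                     # match object for the first hit
first_text = first.group() if first else None    # '123'
at_start = re.match(r"\d+", s)                   # None: s does not start with a digit
at_start_ok = re.match(r"[a-z]+", s).group()     # 'abc'
missing = re.search(r"xyz", s)                   # None when nothing is found
```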

  Splitting:

    split: splits the string at every match of the pattern and returns a list of the pieces
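Here is a short sketch of re.split. The separator pattern and the inputs are illustrative:

```python
import re

# Split on any run of commas, semicolons, or whitespace
parts = re.split(r"[,;\s]+", "a, b;c  d")              # ['a', 'b', 'c', 'd']

# maxsplit limits how many splits are made
limited = re.split(r"\d", "a1b2c3", maxsplit=1)        # ['a', 'b2c3']
```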

  Replacing:

    sub: replaces every match with the replacement string and returns the new string

    subn: like sub, but returns a tuple (new string, number of replacements made)
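This sketch shows sub and subn. The sample text and the "#" replacement are made up:

```python
import re

s = "price: 10, tax: 2"

masked = re.sub(r"\d+", "#", s)                  # 'price: #, tax: #'
first_only = re.sub(r"\d+", "#", s, count=1)     # 'price: #, tax: 2'
new_s, n = re.subn(r"\d+", "#", s)               # ('price: #, tax: #', 2)
```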

  Advanced:

    compile: precompiles the pattern into a pattern object; saves time when the same regular expression is used many times

    finditer: returns an iterator of match objects instead of a list, so it saves memory when processing large amounts of data (it works like a generator)
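This sketch compiles a pattern once and reuses it, then consumes matches lazily with finditer. The word pattern and the input lines are illustrative assumptions:

```python
import re

word = re.compile(r"[a-z]+")   # compiled once, reused below

lines = ["alpha beta", "gamma", "delta epsilon"]
counts = [len(word.findall(line)) for line in lines]     # [2, 1, 2]

# finditer yields match objects one at a time instead of building a list
spans = [(m.group(), m.start()) for m in word.finditer("hi there")]
# [('hi', 0), ('there', 3)]
```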

  Special usage:

    findall: when the pattern contains groups, findall returns only the group contents. Write (?:...) to make a group non-capturing and get the whole match instead

    split: if the pattern contains a capturing group (), the matched separators are kept in the resulting list

    search: if the pattern contains groups, group(n) returns the text matched by group n
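This sketch illustrates the three special cases. The inputs, including the e-mail-like string, are made up:

```python
import re

s = "1.5 and 20"
grouped = re.findall(r"\d+(\.\d+)?", s)       # ['.5', '']: only the group is returned
ungrouped = re.findall(r"\d+(?:\.\d+)?", s)   # ['1.5', '20']

plain_split = re.split(r"\d", "a1b2c")         # ['a', 'b', 'c']
kept_split = re.split(r"(\d)", "a1b2c")        # ['a', '1', 'b', '2', 'c']

m = re.search(r"(\w+)@(\w+)\.com", "mail: bob@example.com")
whole, user, host = m.group(), m.group(1), m.group(2)
# 'bob@example.com', 'bob', 'example'
```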

 

3. Interview questions:

Big Data, statistics, machine learning, sklearn, high performance, high concurrency. 
</p></div>
"""
import re

with open('regular.txt', 'r') as f:
    # delete the listed tags and every whitespace character
    ret = re.compile(r"<p>|<div>|</div>|</p>|<br>|\s")
    content = ret.sub('', f.read())
print(content)
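The tag-stripping idea can be checked without a file. In this sketch, the sample HTML is a hypothetical stand-in for regular.txt. The second pattern, <[^>]+>, is a swapped-in, more general variant that removes any tag and keeps the spaces. It is not the original solution:

```python
import re

# Hypothetical sample standing in for the contents of regular.txt
html = "<div><p>Big Data, statistics,<br>machine learning.</p>\n</div>"

# Same approach as above: delete the listed tags plus all whitespace
listed = re.compile(r"<p>|<div>|</div>|</p>|<br>|\s")
compact = listed.sub("", html)        # 'BigData,statistics,machinelearning.'

# General variant (assumption): remove any <...> tag, keep the spaces
any_tag = re.compile(r"<[^>]+>")
readable = any_tag.sub("", html).strip()   # 'Big Data, statistics,machine learning.'
```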


"""
Extract the domain names from the following URLs:
"""

te = 'http://www.interoem.com/messageinfo.asp?id=35,http://3995503.com/class/class09/news_show.asp?id=14,' \
     'http://lib.wzmc.edu.cn/news/onews.asp?id=769,http://www.zy-ls.com/alfx.asp?newsid=377&id=6,' \
     'http://www.fincm.com/newslist.asp?id=415'

# non-greedy .*? stops at the first / after http://
ret = re.compile(r"http://.*?/")
res = ret.finditer(te)
for i in res:
    print(i.group())
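To get the bare domain without "http://" and the trailing slash, you can use a capturing group, because findall returns only the group contents. This sketch uses [^/]+ in place of .*? as an alternative, and the URL string is shortened for illustration:

```python
import re

urls = ("http://www.interoem.com/messageinfo.asp?id=35,"
        "http://lib.wzmc.edu.cn/news/onews.asp?id=769")

# [^/]+ grabs everything up to the next slash: the host name
domains = re.findall(r"http://([^/]+)/", urls)   # ['www.interoem.com', 'lib.wzmc.edu.cn']
```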

"""
Split the string into words:
"""

test_str = "hello world ha ha"
print(re.split(' ', test_str))
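Splitting on a single space breaks when words are separated by repeated spaces or punctuation. The input below is a hypothetical messier variant, used to compare splitting on one space, splitting on a character class, and extracting words with findall:

```python
import re

messy = "hello  world,ha ha"

by_space = re.split(" ", messy)          # ['hello', '', 'world,ha', 'ha']
by_regex = re.split(r"[\s,]+", messy)    # ['hello', 'world', 'ha', 'ha']
words = re.findall(r"\w+", messy)        # ['hello', 'world', 'ha', 'ha']
```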


Source: www.cnblogs.com/Laura-L/p/11268691.html