Crawler parsing with re (f) --- the re module

Regular Expressions

A regular expression is a special string used to search, validate and query text. It is a logical formula for string operations:

a combination of predefined special characters ("rule characters") and ordinary characters forms a "rule string" that expresses a filtering logic for text.

Python supports regular expressions through the re module, which must be imported before use.

Regular expression usage scenarios

Validation: checking passwords, user names, email addresses, phone numbers, etc.

Crawlers: searching for and extracting content
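The two scenarios can be sketched briefly. The patterns below are deliberately simplified assumptions made for illustration (an 11-digit number starting with 1, and href attributes written with double quotes); real validation rules and real HTML are messier.

```python
import re

# Hypothetical, simplified rule: 11 digits, the first of which is 1.
PHONE = re.compile(r"1\d{10}")

def is_valid_phone(text):
    """Validation: the whole string must satisfy the pattern."""
    return PHONE.fullmatch(text) is not None

# Hypothetical, simplified rule: href attributes written with double quotes.
HREF = re.compile(r'href="([^"]+)"')

def extract_links(html):
    """Crawling: pull every matching piece of content out of a page."""
    return HREF.findall(html)

page = '<a href="https://example.com/a">A</a> <a href="/b">B</a>'
print(is_valid_phone("13800138000"))   # True
print(is_valid_phone("1380013800x"))   # False
print(extract_links(page))             # ['https://example.com/a', '/b']
```

Validation uses fullmatch() because the entire input has to conform, while extraction uses findall() because any number of hits anywhere in the page is acceptable.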

Regular expression rules

Regular expressions are a powerful string-processing tool. They have their own syntax and are handled by a dedicated regular expression engine.

Regular expressions are less efficient than the string methods built into the language, so if a built-in method can do the job, do not use a regular expression.

Replacement ---- replace every lowercase b in a string with B --- the built-in string methods can do this directly

Replacement ---- replace the first b with B and the second with m --- the built-in methods cannot do this in one step, so use a regular expression
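A minimal sketch of the two replacement cases, reading the second one as "a different replacement for each occurrence" (an interpretation of the text, not something the original spells out). The helper name replace_in_turn is hypothetical.

```python
import re

text = "abcabcabc"

# Same replacement everywhere: a built-in string method is enough.
print(text.replace("b", "B"))          # aBcaBcaBc

def replace_in_turn(source, target, replacements):
    """Replace successive occurrences of target with successive replacements.

    Occurrences beyond the end of `replacements` are left unchanged.
    """
    remaining = iter(replacements)

    def pick(match):
        # Fall back to the matched text once the replacements run out.
        return next(remaining, match.group(0))

    # re.sub accepts a function that is called once per match.
    return re.sub(re.escape(target), pick, source)

print(replace_in_turn(text, "b", ["B", "m"]))   # aBcamcabc
```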

Common methods of the re module

1. re.compile(regular expression syntax)

  Generates a regular expression object from the given syntax, so that the same regex can easily be reused

2. regular expression object.match(string to be validated)

  If the pattern does not anchor both the head and the tail, match() checks whether the string begins with text that satisfies the regular expression

  If the head and tail are both anchored (^ ... $), the whole string is constrained, and match() checks whether its entire content satisfies the regular expression

  Returns a Match object if the string satisfies the pattern, otherwise returns None

3. regular expression object.search(string to be searched)

  Checks whether the string contains content that satisfies the regular expression. If it does, returns a Match object holding the first match found and its index; if nothing is found, returns None

4. regular expression object.findall(string to be searched)

  Finds every piece of content in the string that satisfies the regular expression and returns all of them in a list

5. regular expression object.sub(replacement string, original string), or re.sub(regex, replacement string, original string)

  Returns the string obtained by replacing every matched substring in the original string
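The five methods above can be tried side by side on one string. This is a small sketch; the sample text and variable names are made up for illustration.

```python
import re

# Compile once, reuse for every call below.
number = re.compile(r"\d+")

text = "order 66 shipped 3 boxes on day 12"

# match() only looks at the start of the string.
print(number.match(text))                 # None
print(number.match("66 boxes").group())   # 66

# search() scans forward and stops at the first hit.
first = number.search(text)
print(first.group(), first.span())        # 66 (6, 8)

# findall() collects every hit into a list.
print(number.findall(text))               # ['66', '3', '12']

# sub() on a compiled pattern takes (replacement, string).
print(number.sub("#", text))              # order # shipped # boxes on day #

# Anchoring both ends makes match() test the whole string.
whole = re.compile(r"^\d+$")
print(whole.match("2024") is not None)    # True
print(whole.match("2024x") is not None)   # False
```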

Regular expression syntax

\w        matches letters, digits and underscores
\W        matches anything that is not a letter, digit or underscore
\s        matches any whitespace character, equivalent to [ \t\n\r\f\v]
\S        matches any non-whitespace character
\d        matches any digit
\D        matches any non-digit
\A        matches the beginning of the string
\Z        matches the end of the string (in Python's re, only the very end; in some other engines it also matches just before a final newline)
\z        matches the very end of the string in other engines such as Perl; in Python, use \Z
\G        matches the position where the last match finished (Perl/PCRE; not supported by Python's re)
\n        matches a newline
\t        matches a tab
^         matches the beginning of the string
$         matches the end of the string
.         matches any character except a newline; when the re.DOTALL flag is specified, it matches any character including a newline
[...]     represents a set of characters listed individually: [amk] matches a, m or k
[^...]    characters not in the set: [^abc] matches any character other than a, b and c
*         matches zero or more of the preceding expression
+         matches one or more of the preceding expression
?         matches zero or one of the preceding expression; placed after another quantifier (e.g. *?), it makes that quantifier non-greedy
{n}       matches exactly n repetitions of the preceding expression
{n,m}     matches n to m repetitions of the preceding expression, greedily
a|b       matches a or b
()        groups the expression in parentheses and captures what it matches
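A few of these rules applied to one made-up sample string, as a quick way to check how each behaves:

```python
import re

sample = "tel: 010-1234 5678, id=a_1, colour or color?"

print(re.findall(r"\d+", sample))        # ['010', '1234', '5678', '1']
print(re.findall(r"\d{4}", sample))      # ['1234', '5678']
print(re.findall(r"\w+=\w+", sample))    # ['id=a_1']
print(re.findall(r"colou?r", sample))    # ['colour', 'color']
print(re.findall(r"[^\w\s]", sample))    # [':', '-', ',', '=', ',', '?']
print(re.findall(r"tel|id", sample))     # ['tel', 'id']

# Parentheses capture sub-matches that can be read back as groups.
print(re.search(r"(\d{3})-(\d{4})", sample).groups())   # ('010', '1234')

# . skips newlines unless re.DOTALL is given.
print(re.findall(r"a.b", "a\nb"))                # []
print(re.findall(r"a.b", "a\nb", re.DOTALL))     # ['a\nb']
```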

re.match()

re.match(pattern,string,flags=0)

Tries to match the pattern at the starting position of the string. If the pattern does not match at the starting position, match() returns None

import re
content = "Hello 123 4567 World_This is a Regex Demo"

# Each assignment below overwrites the previous one; they are three ways to match the same string.
result = re.match(r'^Hello\s\d\d\d\s\d{4}\s\w{10}.*Demo$', content)
result = re.match(r"^Hello.*Demo$", content)                 # generic match, simpler to write than the one above
result = re.match(r'^Hello\s(\d+)\sWorld.*Demo$', content)   # capture a specific target by enclosing it in ()

print(result)
print(result.group())    # the matched text
print(result.span())     # (start, end) range of the match within the string
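Because match() returns None when nothing matches, calling group() on the result directly can raise AttributeError. A hedged sketch of a safer pattern follows; the helper name first_number is hypothetical.

```python
import re

content = "Hello 123 4567 World_This is a Regex Demo"

def first_number(text):
    """Return the first number after 'Hello', or None if the text does not match."""
    result = re.match(r"^Hello\s(\d+)\s", text)
    if result is None:          # match() returns None rather than raising
        return None
    return result.group(1)      # group(1) is the first parenthesised group

print(first_number(content))            # 123
print(first_number("Say " + content))   # None: "Hello" is not at the start

# flags is the optional third argument of re.match(pattern, string, flags=0).
print(re.match(r"hello", content, flags=re.I).span())   # (0, 5)
```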

 


import re

string = '''If you have great talents, industry will improve them;
if you have but moderate abilities,
industry will supply their deficiency.'''

# 1) Metacharacters
# Ordinary characters: letters, underscores, digits and other ASCII characters
pat = r'a'
# Non-printing characters
pat = r'\n'
ret = re.findall(pattern=pat, string=string)

# 2) Wildcards
# Certain special characters stand for a whole class of characters
# \w           any letter, digit or underscore
# \W           any character that is not a letter, digit or underscore
# \d           any digit
# \D           any non-digit
# \s           whitespace
# \S           non-whitespace
# [abc]        matches a, b or c
# [a-fA-P1-5]  matches any character in a-f, A-P or 1-5
# [^abc]       any character other than a, b and c
pat = r'[^abc]'
pat = r'[^a-f]'
pat = r'\w'
ret = re.findall(pattern=pat, string=string)
print(ret)

# Special characters
# .      any character except a newline
# ^      match from the beginning of the string
# $      match at the end of the string
# +      repeat 1 or more times
# *      repeat 0 or more times
# ?      repeat 0 or 1 time
# {m}    repeat exactly m times; {m,n} m to n times; {,m} at most m times; {m,} at least m times
pat = r'^If.+\n.+\n.+'
# pat = r'^If.+$'
ret = re.findall(pattern=pat, string=string)
print(ret)

# 3) Modifiers (flags)
# Flags can be passed to re.compile() when creating a pattern object
# (the module-level functions accept a flags argument as well)
# re.S  . also matches newlines, so a multi-line string is treated as a single line
# re.M  multi-line mode: ^ and $ match at the start and end of every line
# re.I  ignore case
pat = re.compile(r'^If.+', re.S)
ret = pat.findall(string)
print(ret)

# 4) Greedy and lazy modes
string = "afadfasadfafapyasdfadsfapyafadpypyafasdfapyasfasdfdaspyafafdaspyrtyui"
pat = re.compile(r".*py")    # greedy mode: keep going to the last position that still satisfies the rule
pat = re.compile(r".*?py")   # lazy mode: stop as soon as a string satisfying the rule is found
ret = pat.findall(string)
print(ret)
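The flags and the greedy/lazy distinction are easier to see on short inputs. The strings below are made up for illustration; the expected results are noted beside each call.

```python
import re

text = "If one\nif two\nIF three"

# Without flags, . stops at a newline and ^ matches only at the very start.
print(re.findall(r"^If.+", text))                # ['If one']
# re.S: . also matches newlines, so .+ runs to the end of the string.
print(re.findall(r"^If.+", text, re.S))          # ['If one\nif two\nIF three']
# re.M: ^ matches at the start of every line (still case-sensitive).
print(re.findall(r"^If.+", text, re.M))          # ['If one']
# re.M | re.I: every line, ignoring case.
print(re.findall(r"^If.+", text, re.M | re.I))   # ['If one', 'if two', 'IF three']

tags = "xxpyyypyz"
# Greedy: .* takes as much as possible, so the match ends at the last "py".
print(re.findall(r".*py", tags))     # ['xxpyyypy']
# Lazy: .*? takes as little as possible, so each match ends at the next "py".
print(re.findall(r".*?py", tags))    # ['xxpy', 'yypy']
```

Flags are combined with the | operator, as in re.M | re.I.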

 


Origin www.cnblogs.com/TMMM/p/10815652.html