Chapter 7: finding a phone number in a string, comparing a program that does not use regular expressions with one that does. The regular-expression version is much more concise and easier to extend.
Pattern: three digits, a dash, three digits, a dash, then four digits. For example: 415-555-4242
```python
import re

# Without regular expressions: match three digits, a dash, three digits,
# a dash, four digits, e.g. 111-222-3334
def isPhoneNo(text):
    if len(text) != 12:
        return False
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False
    if text[3] != '-':
        return False
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    if text[7] != '-':
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True

# A regular expression matching the same pattern
def regPhoneNo(text):
    phoneNoReg = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
    res = phoneNoReg.search(text)
    if res is not None:
        print('phone No find by reg: ' + res.group())

print(isPhoneNo('123-122-9090'))  # True
print(isPhoneNo('1234123321'))    # False (only 10 characters)
msg = 'call me at 415-443-1111 tomorrow. 415-443-2222 is my office'
# Slide a 12-character window over the message and test each slice
for i in range(len(msg)):
    tmp = msg[i:i+12]
    if isPhoneNo(tmp):
        print('phone No find: ' + tmp)
        regPhoneNo(tmp)
print('msg find end')
```
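With a regular expression the sliding 12-character window is not needed at all: re.findall() scans the whole string once and returns every non-overlapping match. A minimal sketch using the same pattern, written with the equivalent {3} and {4} repetition counts; the helper name findAllPhoneNo is hypothetical, not from the original.

```python
import re

# Same pattern as regPhoneNo: \d{3} is shorthand for \d\d\d
phoneNoReg = re.compile(r'\d{3}-\d{3}-\d{4}')

def findAllPhoneNo(text):
    # Hypothetical helper: findall returns every match as a list of strings
    return phoneNoReg.findall(text)

msg = 'call me at 415-443-1111 tomorrow. 415-443-2222 is my office'
print(findAllPhoneNo(msg))  # ['415-443-1111', '415-443-2222']
```

Extending the search (say, to allow spaces instead of dashes) now means editing one pattern rather than rewriting the index checks in isPhoneNo.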
Python regular expressions are "greedy" by default: in ambiguous situations they match the longest string possible.
The "non-greedy" version of a brace quantifier matches the shortest possible string; you write it by putting a question mark right after the closing brace.
Example:
```python
import re

# Greedy vs. non-greedy matching in Python
def showGreedReg():
    greedReg = re.compile(r'(ha){3,5}')      # greedy: as many "ha" as possible (up to 5)
    nonGreedReg = re.compile(r'(ha){3,5}?')  # non-greedy: as few "ha" as possible (3)
    inp = 'hahahahahah'
    r1 = greedReg.search(inp)
    r2 = nonGreedReg.search(inp)
    print('greed reg res: ' + r1.group())     # hahahahaha
    print('nongreed reg res: ' + r2.group())  # hahaha

showGreedReg()
```
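The same trailing question mark also makes * and + non-greedy. A small sketch (the sample string is made up for illustration) comparing greedy .* with non-greedy .*? when the text contains two closing angle brackets:

```python
import re

text = '<To serve man> for dinner.>'

greedy = re.search(r'<.*>', text)      # .* runs to the LAST '>'
nongreedy = re.search(r'<.*?>', text)  # .*? stops at the FIRST '>'

print(greedy.group())     # <To serve man> for dinner.>
print(nongreedy.group())  # <To serve man>
```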
The Chapter 7 project: extracting phone numbers and email addresses with regular expressions. The clipboard part is omitted here.
```python
import re  # pyperclip is only needed for the clipboard part, which is omitted here

phoneReg = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code
    (\s|-|\.)?                      # separator
    (\d{3})                         # first 3 digits
    (\s|-|\.)?                      # separator
    (\d{4})                         # last 4 digits
    (\s*(ext|x|ext.)\s*(\d{2,5}))?  # extension
    )''', re.VERBOSE)

emailReg = re.compile(r'''(
    [a-zA-Z0-9._%+-]+   # username
    @                   # @ symbol
    [a-zA-Z0-9.-]+      # domain name
    (\.[a-zA-Z]{2,4})   # dot-something
    )''', re.VERBOSE)
```
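Since the clipboard part is skipped, here is a sketch that runs the phone pattern on an ordinary string instead. The pattern has nine groups, so findall() returns 9-tuples; indexes 1, 3, 5 and 8 hold the area code, the first three digits, the last four digits and the extension digits. The sample text and the helper extractPhones are assumptions for illustration; the pattern is copied from the listing above.

```python
import re

phoneReg = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code
    (\s|-|\.)?                      # separator
    (\d{3})                         # first 3 digits
    (\s|-|\.)?                      # separator
    (\d{4})                         # last 4 digits
    (\s*(ext|x|ext.)\s*(\d{2,5}))?  # extension
    )''', re.VERBOSE)

def extractPhones(text):
    # Hypothetical helper: rebuild each match as AAA-BBB-CCCC [xEXT]
    result = []
    for groups in phoneReg.findall(text):
        area = groups[1].strip('()')  # drop parentheses around the area code
        parts = [p for p in (area, groups[3], groups[5]) if p]  # skip a missing area code
        phoneNum = '-'.join(parts)
        if groups[8]:
            phoneNum += ' x' + groups[8]
        result.append(phoneNum)
    return result

sample = 'Call 415-555-4242 or (800) 555.1234 ext 99 today.'
print(extractPhones(sample))  # ['415-555-4242', '800-555-1234 x99']
```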
A phone number starts with an optional area code, so the area-code group is followed by a question mark.
Because the area code can be just three digits (\d{3}) or three digits inside parentheses (\(\d{3}\)), the two alternatives are joined with the pipe symbol.
In a multiline regular-expression string you can attach a comment such as # area code to this part, to help you remember what (\d{3}|\(\d{3}\))? is meant to match.
The parts of a phone number can be separated by a space (\s), a dash (-), or a period (.), so these alternatives are also joined with pipes: (\s|-|\.). The period is escaped as \. because a bare . matches any character.
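To see the area-code alternation and the separator group on their own, a small sketch can test just those two groups plus the first three digits with re.fullmatch(). The shortened pattern prefixReg and the test strings are made up for illustration.

```python
import re

# Optional area code (plain or in parentheses), optional separator, then 3 digits
prefixReg = re.compile(r'(\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}')

for s in ['415-555', '(415) 555', '415.555', '415555', '555', '(415-555']:
    # fullmatch requires the whole string to fit the pattern
    print(s, bool(prefixReg.fullmatch(s)))
# True for the first five strings; False for '(415-555' (unclosed parenthesis)
```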
The next parts of the regular expression are simple: three digits, followed by another separator, followed by four digits.
The last part is an optional extension: any number of spaces, then ext, x or ext., followed by 2 to 5 digits.
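The extension group can be tried in isolation. A sketch using only that part of the pattern (extReg is a name introduced here for illustration); note that the dot in ext. is unescaped, so it actually matches any character after "ext", and ext\. would be needed to require a literal period.

```python
import re

# Extension part only: spaces, then ext / x / ext., spaces, then 2-5 digits
extReg = re.compile(r'\s*(ext|x|ext.)\s*(\d{2,5})')

for s in [' ext 1234', 'x99', ' ext. 12345', ' x 7']:
    m = extReg.fullmatch(s)
    # group(2) holds the extension digits; None means no match
    print(repr(s), m.group(2) if m else None)
```

The last sample fails because a single digit is fewer than the two the {2,5} quantifier requires.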
The username part of an email address is one or more characters, which can be lowercase and uppercase letters, digits, dots, underscores, percent signs, plus signs, or dashes.
All of these can be put into one character class: [a-zA-Z0-9._%+-].
The domain is separated from the username by an @ symbol and allows a slightly smaller character class, only letters, digits, periods, and dashes: [a-zA-Z0-9.-].
The last part is the "dot-com" part (technically called the "top-level domain"), which can really be "dot-anything": a period followed by 2 to 4 letters.
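A sketch of the email pattern built from the character classes just described; the sample text is made up. Because the pattern has two groups, findall() returns (whole match, top-level domain) tuples, so index 0 is the address.

```python
import re

emailReg = re.compile(r'''(
    [a-zA-Z0-9._%+-]+   # username
    @                   # @ symbol
    [a-zA-Z0-9.-]+      # domain name
    (\.[a-zA-Z]{2,4})   # dot-something (top-level domain)
    )''', re.VERBOSE)

text = 'Contact info@example.com or john.doe+news@mail.example.org.'
emails = [groups[0] for groups in emailReg.findall(text)]
print(emails)  # ['info@example.com', 'john.doe+news@mail.example.org']
```

The trailing period after .org is left out because the top-level-domain group needs at least two letters after its dot.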
The re.VERBOSE flag tells re.compile() to ignore whitespace and # comments inside the regular-expression string.
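A short sketch of what the flag changes: the same commented multiline pattern compiled with and without re.VERBOSE. The pattern and sample text are illustrative assumptions.

```python
import re

source = r'''
    (\d{3})   # first 3 digits
    -         # dash
    (\d{4})   # last 4 digits
'''
verbose = re.compile(source, re.VERBOSE)  # behaves like r'(\d{3})-(\d{4})'
plain = re.compile(source)  # no flag: newlines, spaces and '# ...' are literal text

text = 'call 555-4242 now'
print(verbose.search(text).group())  # 555-4242
print(plain.search(text))            # None
```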
That concludes Chapter 7. The practice project, strong password detection, will be covered in the next blog post.