Practical Python Exercise: Phone Number and E-mail Address Extractor

The task:

Suppose you have the boring task of finding every phone number and email address in a long web page or article. Scrolling through the page by hand could take a long time. If you had a program that searched the text on the clipboard for phone numbers and email addresses, you could simply press Ctrl-A to select all the text, Ctrl-C to copy it to the clipboard, and then run the program. The program would replace the text on the clipboard with just the phone numbers and email addresses it found.

test text

Skip to main content
Home
Search form

Search

GO!
Catalog
Media
Write for Us
About Us
Topics
Arduino
Art & Design
General Computing
Hacking & Computer Security
Hardware / DIY
JavaScript
Kids
LEGO®
LEGO® MINDSTORMS®
Linux & BSD
Manga
Minecraft
Programming
Python
Science & Math
Scratch
System Administration
Early Access
Gift Certificates
Free ebook edition with every print book purchased from nostarch.com!
Shopping cart
3 Items    Total: $53.48
View cart Checkout
Contact Us

No Starch Press, Inc.
245 8th Street
San Francisco, CA 94103 USA
Phone: 800.420.7240 or +1 415.863.9900 (9 a.m. to 5 p.m., M-F, PST)
Fax: +1 415.863.9950

Reach Us by Email
General inquiries: [email protected]
Media requests: [email protected]
Academic requests: [email protected] (Please see this page for academic review requests)
Help with your order: [email protected]
Reach Us on Social Media
Twitter
Facebook
Navigation
My account
Log out
Manage your subscription preferences.


About Us  |  ★ Jobs! ★  |  Sales and Distribution  |  Rights  |  Media  |  Academic Requests  |  Conferences  |  Order FAQ  |  Contact Us  |  Write for Us  |  Privacy
Copyright 2018 No Starch Press, Inc

Results after running

Copied to clipboard:
800-420-7240
415-863-9900
415-863-9950 
[email protected]
[email protected]
[email protected]
[email protected]
Hit any key to close this window...

Ideas:

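When you start tackling a new project, it is tempting to dive straight into writing code. More often than not, though, it is better to take a step back and consider the bigger picture.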

I recommend first drafting a high-level plan for what the program needs to do. Don't think about the actual code yet; that can come later.
1. Create a regular expression for phone numbers and another for email addresses
2. Find all matches of both regular expressions in the clipboard text
3. Copy the matches back to the clipboard

Now start writing the program

#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.

import re, pyperclip
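# Create the regular expression for phone numbers
phoneRegex = re.compile(r'''(
   (\d{3}|\(\d{3}\))?  # area code (optional): 444 or (444)
   (\s|-|\.)?  # separator (optional): whitespace, - or .
   (\d{3})  # first three digits
   (\s|-|\.)?  # separator (optional): whitespace, - or .
   (\d{4})  # last four digits
   )''', re.VERBOSE)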

# Create the regular expression for email addresses
emailRegex = re.compile(r'''(
   [a-zA-Z0-9._%+-]+  # username
   @
   [a-zA-Z0-9.-]+  # domain name
   (\.[a-zA-Z]{2,4})  # dot-something
   )''', re.VERBOSE)

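# Find matches in the clipboard text
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
   # Each tuple holds six items (indexes 0-5): whole match, area code,
   # separator, first three digits, separator, last four digits
   phoneNum = '-'.join([groups[1], groups[3], groups[5]])
   matches.append(phoneNum)
for groups in emailRegex.findall(text):
   matches.append(groups[0])  # index 0 is the whole email address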

# Copy the results to the clipboard
if len(matches) > 0:
   pyperclip.copy('\n'.join(matches))
   print('Copied to clipboard:')
   print('\n'.join(matches))
else:
   print('No phone numbers or email addresses found.')
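
To try the two patterns without touching the clipboard, the same logic can be wrapped in a function that takes the text as an argument. This is only a sketch: extract_contacts is a hypothetical name, and someone@example.com is a made-up placeholder address, not one from the test text. Counting from 0, the last four digits sit at index 5 of each tuple that findall() returns for the phone pattern.

```python
import re

# Same patterns as in the script, kept as module-level constants.
PHONE_RE = re.compile(r'''(
    (\d{3}|\(\d{3}\))?   # area code (optional): 444 or (444)
    (\s|-|\.)?           # separator (optional)
    (\d{3})              # first three digits
    (\s|-|\.)?           # separator (optional)
    (\d{4})              # last four digits
    )''', re.VERBOSE)

EMAIL_RE = re.compile(r'''(
    [a-zA-Z0-9._%+-]+    # username
    @
    [a-zA-Z0-9.-]+       # domain name
    (\.[a-zA-Z]{2,4})    # dot-something
    )''', re.VERBOSE)


def extract_contacts(text):
    """Hypothetical helper: return phone numbers, then email addresses."""
    matches = []
    for groups in PHONE_RE.findall(text):
        # groups = (whole, area code, sep, first three, sep, last four)
        matches.append('-'.join([groups[1], groups[3], groups[5]]))
    for groups in EMAIL_RE.findall(text):
        matches.append(groups[0])  # whole email address
    return matches


sample = 'Phone: 800.420.7240 or +1 415.863.9900, mail: someone@example.com'
print(extract_contacts(sample))
# ['800-420-7240', '415-863-9900', 'someone@example.com']
```

One thing this makes easy to notice: because the area code is optional, a seven-digit number such as 555-1234 comes back as -555-1234, with an empty area code in front of the first hyphen.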

Analyze the code

re.VERBOSE is a flag that tells the regex engine to ignore whitespace and comments inside the pattern. "Verbose" here means the pattern can be spread over several lines and annotated with comments, which makes a long regular expression much easier to read.
For more information on regular expressions, see: Python Regular
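
As a small illustration of what re.VERBOSE changes, the sketch below compiles the same toy pattern twice, once compact and once verbose, and checks that both behave the same. The pattern and the sample strings are made up for this example. It also shows a side effect worth knowing: since whitespace is ignored in verbose mode, a literal space has to be written inside a character class (or escaped).

```python
import re

# The same pattern written compactly and in verbose form.
compact = re.compile(r'(\d{3})-(\d{4})')
verbose = re.compile(r'''
    (\d{3})   # first three digits
    -         # literal hyphen
    (\d{4})   # last four digits
    ''', re.VERBOSE)

text = 'call 555-1234 today'
print(compact.search(text).groups())   # ('555', '1234')
print(verbose.search(text).groups())   # ('555', '1234')

# In verbose mode a bare space in the pattern is dropped, so this pattern
# really means \d{3}\d{4} and does not match '555 1234'.
dropped = re.compile(r'\d{3} \d{4}', re.VERBOSE)
print(dropped.search('555 1234'))      # None

# Putting the space in a character class keeps it.
kept = re.compile(r'\d{3} [ ] \d{4}', re.VERBOSE)
print(kept.search('555 1234').group()) # 555 1234
```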

Another pitfall is groups. It turns out I had not understood the difference between group() and groups().
group() returns the text captured by one particular group. For example:

import re
a = "123abc456"
m = re.search("([0-9]*)([a-z]*)([0-9]*)", a)
print(m.group(0))   # 123abc456, the whole match
print(m.group(1))   # 123
print(m.group(2))   # abc
print(m.group(3))   # 456

groups() returns a tuple containing the text of every group, from group 1 up to however many groups the pattern contains.
In the phoneNum = '-'.join(...) line of the script, however, groups is just the loop variable (one tuple from the list returned by findall()), not the groups() method. Don't misread it.
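
To make the difference concrete, here is a short sketch that puts group(), groups() and findall() side by side. The pattern and the input string are made up for the example. The point to take away is that group() counts from 1 (with 0 meaning the whole match), while the tuples from groups() and findall() are ordinary tuples indexed from 0.

```python
import re

pattern = re.compile(r'([0-9]+)([a-z]+)([0-9]+)')
m = pattern.search('123abc456')

print(m.group(0))    # 123abc456  (the whole match)
print(m.group(1))    # 123        (first group)
print(m.groups())    # ('123', 'abc', '456')
print(m.groups()[0]) # 123        (same text as m.group(1))

# With several groups in the pattern, findall() returns a list of tuples,
# one tuple per match. Here "groups" is only a variable name.
for groups in pattern.findall('123abc456 78xy9'):
    print(groups[0], groups[2])
# 123 456
# 78 9
```

In the script, the whole pattern is wrapped in an extra pair of parentheses, which is why index 0 of each findall() tuple there is the complete match.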
