Regular expressions in practice: a program for quickly extracting phone numbers and email addresses


Foreword

This project is an exercise for practicing regular-expression pattern matching and consolidating what I have learned. It is based on a project from the book "Automate the Boring Stuff with Python" (Chinese edition: "Python编程快速上手——让繁琐工作自动化"), with personal modifications to the original.

——

First, the problem background

Consider a tedious task: finding every phone number, cell phone number, and email address in a long web page or article.

We need a program that quickly finds all the numbers and email addresses in a long document copied to the clipboard and prints them neatly.

The general workflow is:
Press Ctrl + A to select all the text, press Ctrl + C to copy it to the clipboard, and then run the program. The phone numbers, cell phone numbers, and email addresses are then printed neatly in the output window.

——

Second, the functional analysis

Step 1: Build the regular expressions

  • Build three regular expressions: one for phone numbers, one for cell phone numbers, and one for email addresses
  • We need to know what forms each of the three can take, so that the patterns cover as many number and email formats as possible
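
One way to see which forms a pattern covers is to run it against a list of sample spellings. The sketch below does this for a 3-3-4 phone pattern; the sample numbers, the `PHONE_FORMS` list, and the `coverage` helper are illustrative assumptions, not part of the original project.

```python
import re

# Hypothetical sample spellings of one 3-3-4 phone number (made-up data).
PHONE_FORMS = [
    "415-555-1234",
    "415 555 1234",
    "(415) 555-1234",
    "(415)555-1234",
    "4155551234",        # no separators at all
]

phone_pattern = re.compile(r'''
    (\d{3}|\(\d{3}\))   # area code, optionally in parentheses
    (\s|-)?             # optional separator: whitespace or -
    (\d{3})             # first 3 digits
    (\s|-)              # required separator: whitespace or -
    (\d{4})             # last 4 digits
    ''', re.VERBOSE)

def coverage(pattern, samples):
    """Map each sample to True if the pattern matches the whole sample."""
    return {s: pattern.fullmatch(s) is not None for s in samples}

for sample, ok in coverage(phone_pattern, PHONE_FORMS).items():
    print(f"{sample!r:20} {'covered' if ok else 'MISSED'}")
```

With these samples, only the form without any separator is reported as missed, because the second separator in the pattern is required. Whether that gap matters depends on the numbers you expect to see.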

Step 2: Run the matches

  • Use pyperclip to get the text content
  • Match with each of the three regular expressions, using findall() to find all matches
  • Normalize the matched content to a uniform format, then append it to the corresponding list so it can be printed later
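
When a pattern contains groups, `findall()` returns one tuple per match rather than one string, which is what makes the normalization step possible. The following is a minimal sketch on a made-up sample string (no clipboard involved); the `normalize` helper, and its choice to strip parentheses from the area code, are assumptions added for illustration.

```python
import re

pattern = re.compile(r'''(
    (\d{3}|\(\d{3}\))   # tuple index 1: area code, optionally in parentheses
    (\s|-)?             # tuple index 2: optional separator
    (\d{3})             # tuple index 3: first 3 digits
    (\s|-)              # tuple index 4: separator
    (\d{4})             # tuple index 5: last 4 digits
    )''', re.VERBOSE)

# Made-up sample text standing in for the clipboard contents.
sample_text = "Office: (415) 555-1234, fax 415 555 9876."

# Each item is a tuple; index 0 holds the whole match (the outer group).
raw = pattern.findall(sample_text)

def normalize(groups):
    """Join the three digit groups with '-' and drop any parentheses."""
    area = groups[1].strip("()")
    return "-".join([area, groups[3], groups[5]])

normalized = [normalize(g) for g in raw]
print(normalized)   # ['415-555-1234', '415-555-9876']
```

Both spellings end up in the same `415-555-1234` style, so the printed list looks uniform regardless of how the numbers were written in the source text.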

Step 3: Print the results

  • Neatly print the three kinds of content: phone numbers, cell phone numbers, and email addresses
  • If nothing is matched, print a not-found message
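
Since the three result sections are printed the same way, the print step can be sketched as one small helper. The function name `format_section`, the section titles, and the sample data below are hypothetical and only illustrate the idea.

```python
def format_section(title, matches):
    """Return the text for one result section, or a not-found message."""
    if matches:
        return f"{title}:\n" + "\n".join(matches)
    return f"No {title.lower()} found"

SEPARATOR = "-" * 53

# Made-up results standing in for the three match lists.
sections = [
    ("Phone numbers", ["415-555-1234"]),
    ("Cell phone numbers", []),
    ("Email addresses", ["alice@example.com"]),
]

for title, matches in sections:
    print(format_section(title, matches))
    print(SEPARATOR)
```

Returning a string instead of printing directly keeps the helper easy to check without capturing console output.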

The framework is as follows:

import re
import pyperclip

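# Step 1: build the regular expressions

# [01] Create the regular expression for phone numbers

# [02] Create the regular expression for cell phone numbers

# [03] Create the regular expression for email addresses

# Step 2: run the matches

# Step 3: print the results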

——

Third, code implementation

The code follows; the details are explained in the comments.

import re
import pyperclip

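# Step 1: build the regular expressions

# Regular expression for phone numbers
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))             # 3 digits, optionally in parentheses
    (\s|-)?                       # separator: whitespace or - (0 or 1 times)
    (\d{3})                       # 3 digits
    (\s|-)                        # separator: whitespace or -
    (\d{4})                       # 4 digits
    )''', re.VERBOSE)             # re.VERBOSE allows whitespace and comments in the pattern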

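# Regular expression for cell phone numbers
# Note: ^ and $ would only match when the whole clipboard text is a single number,
# so digit lookarounds are used to find numbers embedded in longer text.
cellphoneRegex = re.compile(r'''(
    (?<!\d)(\d{3})        # 3 digits, not preceded by another digit
    (-)?                  # separator: - (0 or 1 times), since it may be absent
    (\d{4})               # 4 digits
    (-)?                  # separator: - (0 or 1 times), since it may be absent
    (\d{4})(?!\d)         # 4 digits, not followed by another digit
    )''', re.VERBOSE)     # re.VERBOSE allows whitespace and comments in the pattern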

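# Regular expression for email addresses
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+        # account name (permitted characters)
    @                        # the @ sign
    [a-zA-Z0-9.-]+           # server name
    \.                       # a literal dot
    ([a-zA-Z]{2,4})          # top-level domain (2 to 4 letters)
    )''', re.VERBOSE)        # re.VERBOSE allows whitespace and comments in the pattern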


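# Step 2: run the matches

text = str(pyperclip.paste())         # read the clipboard contents as text

phonematches = []                     # list of phone numbers
cellphonematches = []                 # list of cell phone numbers
emailmatches = []                     # list of email addresses

# Find phone numbers and add them to the list.
# findall() returns one tuple per match: index 0 is the whole match,
# indexes 1, 3 and 5 are the digit groups.
for group in phoneRegex.findall(text):
    phoneNumber = '-'.join([group[1], group[3], group[5]])
    phonematches.append(phoneNumber)

# Find cell phone numbers and add them to the list
for group in cellphoneRegex.findall(text):
    cellphoneNumber = '-'.join([group[1], group[3], group[5]])
    cellphonematches.append(cellphoneNumber)

# Find email addresses and add them to the list (index 0 is the whole address)
for group in emailRegex.findall(text):
    emailmatches.append(group[0])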

# Step 3: print the results

if len(phonematches) > 0:
    print('Phone numbers found:')
    print('\n'.join(phonematches))
else:
    print('No phone numbers found')

print('\n-----------------------------------------------------')

if len(cellphonematches) > 0:
    print('Cell phone numbers found:')
    print('\n'.join(cellphonematches))
else:
    print('No cell phone numbers found')

print('\n-----------------------------------------------------')

if len(emailmatches) > 0:
    print('Email addresses found:')
    print('\n'.join(emailmatches))
else:
    print('No email addresses found')

print('\n-----------------------------------------------------')

We use the CSDN home page as the test subject: select all of it, copy it to the clipboard, and then run the program.
[Figure: the CSDN home page used as test input]
The final printed results are as follows:
[Figure: program output]
So far, no problems have shown up.
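
One detail that a single test page may not reveal is how the cell phone pattern is bounded. The sketch below, using a made-up sample sentence, compares a pattern anchored with `^` and `$` against one bounded by digit lookarounds. The names `anchored` and `bounded` are illustrative only.

```python
import re

# Made-up sample: a cell phone number embedded in a sentence.
text = "Call me on 138-1234-5678 after 6 pm."

# Anchored: without re.MULTILINE, ^ and $ refer to the start and end of the whole text.
anchored = re.compile(r'^(\d{3})(-)?(\d{4})(-)?(\d{4})$')

# Bounded: only requires that no digit sits directly before or after the number.
bounded = re.compile(r'(?<!\d)(\d{3})(-)?(\d{4})(-)?(\d{4})(?!\d)')

print(anchored.findall(text))             # []
print(anchored.findall("138-1234-5678"))  # [('138', '-', '1234', '-', '5678')]
print(bounded.findall(text))              # [('138', '-', '1234', '-', '5678')]
```

The anchored version only matches when the text is nothing but the number, so for text copied from a full page the lookaround form is the safer choice.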

——

Fourth, review and summary

  • This project deepened my understanding of how to use the re module; for reference, see "[Python] Re arsenal Basics: Regular Expressions"
  • The difficulty in building a regular expression lies in thinking of every form the target content can take and covering as many of them as possible, so that matches are neither wrong nor missed
  • Finally, normalizing the matched content to one format and printing it neatly is also important

——
Origin blog.csdn.net/nilvya/article/details/103828340