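1-16 Update the gendata.py code so that the data is written to redata.txt instead of to the screen.
from random import randrange, choice
from string import ascii_lowercase as lc
from time import ctime

tlds = ('com', 'edu', 'net', 'org', 'gov')

# 'w' truncates any existing file; the with-block closes it on exit
with open('redata.txt', 'w') as f:
    for i in range(randrange(5, 11)):        # 5 to 10 lines of output
        dtint = randrange(2**31 - 1)         # random timestamp in the 32-bit range
        dtstr = ctime(dtint)                 # timestamp as a date string
        llen = randrange(4, 8)               # login is 4 to 7 characters
        login = ''.join(choice(lc) for j in range(llen))
        dlen = randrange(llen, 13)           # domain is at least as long as the login
        dom = ''.join(choice(lc) for j in range(dlen))
        tmpstr = '%s::%s@%s.%s::%d-%d-%d\n' % (dtstr, login, dom, choice(tlds), dtint, llen, dlen)
        f.write(tmpstr)
        print(tmpstr, end='')                # echo the line; it already ends with '\n'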
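Since every later exercise depends on the `timestamp::email::int-len-len` layout, it can help to build one record in isolation and take it apart again. The following is only a sketch: `make_record` and the seeded `Random` instance are names introduced here for illustration, and the timestamp range of `2**31 - 1` is an assumption.

```python
from random import Random
from string import ascii_lowercase as lc
from time import ctime

TLDS = ('com', 'edu', 'net', 'org', 'gov')

def make_record(rng):
    """Build one line in the timestamp::email::int-len-len layout (hypothetical helper)."""
    dtint = rng.randrange(2**31 - 1)         # assumed timestamp range
    llen = rng.randrange(4, 8)
    login = ''.join(rng.choice(lc) for _ in range(llen))
    dlen = rng.randrange(llen, 13)
    dom = ''.join(rng.choice(lc) for _ in range(dlen))
    return '%s::%s@%s.%s::%d-%d-%d' % (
        ctime(dtint), login, dom, rng.choice(TLDS), dtint, llen, dlen)

rng = Random(42)                             # fixed seed, so the random draws repeat between runs
record = make_record(rng)

# The three fields are separated by '::'; the time itself only contains single colons.
stamp, email, tail = record.split('::')
print(stamp)
print(email)
print(tail)
```

Splitting on `'::'` is safe here because `ctime()` output never contains two adjacent colons.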
1-19 Extract the complete timestamp from each line.
import re

with open('redata.txt') as f:
    for line in f:
        # lazy .+? stops at the first '::', so only the timestamp is captured
        print(re.findall(r"^(.+?)::", line))
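The choice of quantifier matters for this exercise. A greedy `.+` followed by `:` runs to the last colon on the line, while a lazy `.+?` followed by `::` stops at the first field separator. The sketch below compares the two on one hand-written sample line; the line is made up for illustration and only mimics the gendata.py layout.

```python
import re

# Hypothetical sample line (not real generator output).
sample = "Thu Jan  1 08:00:05 1970::abcde@fghijk.com::5-5-6"

greedy = re.findall(r"(.+):", sample)      # backtracks from the end of the line
lazy = re.findall(r"^(.+?)::", sample)     # expands only until the first '::'

print(greedy)   # ['Thu Jan  1 08:00:05 1970::abcde@fghijk.com:']
print(lazy)     # ['Thu Jan  1 08:00:05 1970']
```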
1-20 Extract the complete e-mail address.
import re

with open('redata.txt') as f:
    for line in f:
        # the address sits between the two '::' separators
        print(re.findall(r"::(\w+@\w+\.\w+)::", line))
1-21 Extract the month from the timestamp.
import re

with open('redata.txt') as f:
    for line in f:
        # skip the three-letter weekday at the start, then capture the month
        print(re.findall(r"^\w{3}\s+(\w{3})\s", line))
1-22 Extract the year from the timestamp.
import re

with open('redata.txt') as f:
    for line in f:
        # the four-digit year is the last item before the first '::'
        print(re.findall(r"\s(\d{4})::", line))
1-23 Extract the time (HH:MM:SS) from the timestamp.
import re

with open('redata.txt') as f:
    for line in f:
        print(re.findall(r"(\d{2}:\d{2}:\d{2})", line))
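1-24 Extract only the login name and the domain name (main domain and top-level domain together) from the e-mail address.
import re

with open('redata.txt') as f:
    for line in f:
        # the dot is escaped so it matches a literal '.' rather than any character
        print(re.findall(r"(\w+)@(\w+\.\w+)::", line))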
1-25 Same as above.
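If this exercise is read as asking for the main domain and the top-level domain as separate pieces, one possible variant is to give each its own group. This is a sketch under that assumption; the sample line is made up for illustration.

```python
import re

# Hypothetical sample line (not real generator output).
sample = "Thu Jan  1 08:00:05 1970::abcde@fghijk.com::5-5-6"

# Three groups: login, main domain, top-level domain.
m = re.search(r"::(\w+)@(\w+)\.(\w+)::", sample)
login, domain, tld = m.groups()
print(login, domain, tld)   # abcde fghijk com
```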
1-26 Replace the e-mail address on each line with your own e-mail address.
import re

with open('redata.txt') as f:
    for line in f:
        # re.sub swaps the address directly; substitute your own address here
        print(re.sub(r"\w+@\w+\.\w+", '[email protected]', line), end='')
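1-27 Extract the month, day, and year from the timestamp and output them in "Month, Day, Year" format, iterating over each line only once.
import re

with open('redata.txt') as f:
    for line in f:
        # \d{1,2} captures one- and two-digit days; \d{4} before '::' is the year
        m = re.match(r"\w{3}\s+(\w{3})\s+(\d{1,2})\s.+\s(\d{4})::", line)
        if m:
            print('%s, %s, %s' % m.groups())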
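Named groups make this kind of extraction easier to read than positional indexes. The sketch below wraps the idea in a small function; `STAMP` and `mon_day_year` are names introduced here for illustration, and both sample lines are made up.

```python
import re

# One pass over the timestamp: weekday, month, day, time, year, then '::'.
STAMP = re.compile(
    r"^\w{3}\s+(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+[\d:]+\s+(?P<year>\d{4})::")

def mon_day_year(line):
    """Return 'Mon, Day, Year' for a data line, or None if the line does not match."""
    m = STAMP.match(line)
    if m is None:
        return None
    return "%s, %s, %s" % (m.group("mon"), m.group("day"), m.group("year"))

# Hypothetical sample lines covering one-digit and two-digit days.
print(mon_day_year("Thu Jan  1 08:00:05 1970::abcde@fghijk.com::5-5-6"))
# Jan, 1, 1970
print(mon_day_year("Sat Nov 15 03:12:44 1980::xyzab@qwertyu.org::343105964-5-7"))
# Nov, 15, 1980
```

Returning `None` for a non-matching line avoids the `IndexError` that indexing into an empty `findall()` result would raise.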
1-28 Area codes: the regular expression should match 800-555-1212 and also 555-1212.
import re

text = "800-555-1212 555-1213"
patt = r'(?:\d{3}-)?\d{3}-\d{4}'    # raw string; the area-code group is optional
print(re.findall(patt, text))
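1-29 Also support area codes enclosed in parentheses, e.g. (800)555-1212.
import re

text = "800-555-1212 555-1213 (800)555-1214 "
# the area code is either '(ddd)' or 'ddd-', never both
patt = r'(?:\(\d{3}\)|\d{3}-)?\d{3}-\d{4}'
print(re.findall(patt, text))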
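One way to keep the two area-code styles mutually exclusive is to put them in a single alternation, and `fullmatch` is a convenient way to check whole candidate strings. The sketch below is illustrative only; `PHONE` and the candidate list are made up here.

```python
import re

# Area code is optional and is either '(ddd)' or 'ddd-'.
PHONE = re.compile(r"(?:\(\d{3}\)|\d{3}-)?\d{3}-\d{4}")

candidates = [
    "800-555-1212",        # dashed area code
    "555-1212",            # no area code
    "(800)555-1212",       # parenthesized area code
    "(800)800-555-1212",   # two area codes: should be rejected
    "55-1212",             # too few digits: should be rejected
]

# fullmatch requires the whole string to fit the pattern.
results = {c: bool(PHONE.fullmatch(c)) for c in candidates}
for number, ok in results.items():
    print(number, ok)
```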