[Python] Chapter 10: Batteries Included


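The standard Python installation includes a set of modules known as the standard library.

10.1 Modules

>>> import math  # import the module
>>> math.sin(0)  # call a function in the module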
0.0

10.1.1 Modules Are Programs

Suppose the directory C:/Users/XXXX/Downloads/python contains a file hello.py with some executable code. That file can be imported into a program as a module.

# hello.py 
print("Hello, world!")

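First, add that directory to the module search path. This tells the interpreter to look there in addition to its usual locations. Note that the change lasts only for the current session.

>>> import sys
>>> sys.path.append('C:/Users/XXXX/Downloads/python')

Then import the module:

>>> import hello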
Hello, world!

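After a successful import, a subdirectory named __pycache__ appears in that directory. It can be deleted safely; Python recreates it when needed.
Importing the module a second time does nothing, even if the file has changed in the meantime. Modules are not meant to perform actions when imported; they define variables, functions, classes and so on, and those definitions only need to be made once.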

>>> import hello
>>>

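If the hello module has been modified while the program is running and really needs to be loaded again, use the reload function from the importlib module: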

>>> import importlib
>>> hello = importlib.reload(hello)
Hello, new world!

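If objects were already created from classes in the old version of the module, they remain instances of the old classes after the reload.

10.1.2 Modules Are Used to Define Things

Modules are worth creating because, like classes, they have their own scope. Every class and function defined in a module, and every variable assigned a value in it, becomes an attribute of the module.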

1. Defining a Function in a Module

Create a new .py file containing a function:

# hello2.py
def hello():
	print("Hello, world!")

Import the module:

>>> import hello2

The function can then be accessed like this:

>>> hello2.hello()
Hello, world!

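The point of using modules this way is code reuse: save the code as a module and access it whenever it is needed, instead of writing it again.

2. Adding Test Code to a Module

Create a new .py file containing a function and a call that tests it: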

# hello3.py
def hello():
	print("Hello, China!")
# a test:
hello()

This code runs as an ordinary program. When it is imported into another program as a module, the hello() call also runs:

>>> import hello3
Hello, China!
>>> hello3.hello()
Hello, China!

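The variable __name__ tells whether a module is being run as a program or imported into another program:

>>> __name__  # in the main program
'__main__'
>>> hello3.__name__  # in an imported module, __name__ is set to the module name
'hello3'

Put the test code inside an if statement:

# hello4.py
def hello():
    print("Hello, china")
def test():
    hello()
# test() runs when the file is executed as a program, not when it is imported
if __name__ == '__main__': test()

>>> import hello4  # the test code does not run automatically
>>> hello4.hello()  # call the function directly
Hello, china
>>> hello4.test()  # call the function through the test function
Hello, china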

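10.1.3 Making Modules Available

So far, import sys followed by sys.path.append('directory') was needed before the interpreter could find the module. There are two ways to make sys.path contain the right directory from the start:

1. Putting the Module in the Right Place

Just find out where the interpreter already looks: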

>>> import sys,pprint
>>> pprint.pprint(sys.path)
['',
 'G:\\Anaconda3\\python36.zip',
 'G:\\Anaconda3\\DLLs',
 'G:\\Anaconda3\\lib',
 'G:\\Anaconda3',
 'G:\\Anaconda3\\lib\\site-packages',
 'G:\\Anaconda3\\lib\\site-packages\\win32',
 'G:\\Anaconda3\\lib\\site-packages\\win32\\lib',
 'G:\\Anaconda3\\lib\\site-packages\\Pythonwin']

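Notice: pprint is a pretty-printing function. Compared with print, it wraps lines intelligently, which gives long data structures a more readable layout.
Each element of the printed list is a directory that the interpreter searches, so a module can go in any of them. The site-packages directory is the best choice, because that is what it is for.

2. Telling the Interpreter Where to Look

Placing modules directly in those directories is inconvenient when:
 You do not want the interpreter's directories cluttered with your own modules.
 You lack the permissions to save files in the interpreter's directories.
 You want to keep your modules somewhere else.
In these cases, tell the interpreter where the modules really are.
One way is to modify the path with sys.path.append(), as shown earlier. The standard way is to add the module directory to the PYTHONPATH environment variable (on Windows, through Computer > Properties; in the bash shell, with the command export PYTHONPATH=$PYTHONPATH:~/python).
Path configuration files (.pth) are another option.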
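10.1.4 Packages

Modules can be grouped into packages. A package is another kind of module, one that can contain other modules. A module is a .py file, whereas a package is a directory. For Python to treat a directory as a package, it must contain a file named __init__.py.
For example, given a package named constants whose file constants/__init__.py contains the statement PI = 3.14, the following works:

>>> import constants
>>> print(constants.PI)

To add a module to a package, simply put the module file in the package directory. Packages can also be nested inside other packages.
All of the following forms are valid:

import package
import package.module
from package import module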
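
The package layout described above can be tried out without touching any real project. The following is a minimal sketch that builds a throwaway package in a temporary directory and imports it; the package name demo_constants and the submodule units are hypothetical, chosen only to avoid clashing with real modules.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package in a temporary directory (all names are hypothetical).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_constants")
os.mkdir(pkg_dir)

# __init__.py marks the directory as a package and holds the package-level names.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("PI = 3.14\n")

# A module file placed in the package directory becomes demo_constants.units.
with open(os.path.join(pkg_dir, "units.py"), "w") as f:
    f.write("CM_PER_INCH = 2.54\n")

sys.path.append(root)          # tell the interpreter where to look
importlib.invalidate_caches()  # make sure the freshly written files are noticed

import demo_constants
from demo_constants import units

print(demo_constants.PI)
print(units.CM_PER_INCH)
```

Importing the package alone runs only __init__.py; the units module is loaded when it is imported explicitly.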

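10.2 Exploring Modules

10.2.1 What Is in a Module

This section explores the standard module copy.

1. Using the dir Function

>>> import copy
>>> dir(copy)

To list only the attributes without a leading underscore, which are the ones meant for outside use, filter with a list comprehension: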

>>> [n for n in dir(copy) if not n.startswith('_')] 
['Error', 'PyStringMap', 'copy', 'deepcopy', 'dispatch_table', 'error', 'name', 't', 'weakref']

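2. The __all__ Variable

The full list printed by dir(copy) contains __all__, a variable that holds a list:

>>> copy.__all__
['Error', 'copy', 'deepcopy']

It is set in the copy module like this (copied straight from copy.py):
__all__ = ["Error", "copy", "deepcopy"]
It defines the public interface of the module: it tells the interpreter what importing all names from the module means.
So from copy import * gives only the three names listed above.
To use another attribute such as PyStringMap, import it explicitly: either import copy and use copy.PyStringMap, or use from copy import PyStringMap.
Setting __all__ is also useful when writing your own modules. A module may contain many variables, functions and classes that other programs do not need, and it is good practice to filter them out. If __all__ is not set, import * brings in every global name that does not start with an underscore.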
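
To see the effect of __all__ in isolation, here is a small sketch that builds a module in memory and then runs a star import against it. The module name demo_mod and its two functions are hypothetical; registering the module in sys.modules is only a shortcut to avoid writing a file.

```python
import sys
import types

# Source of a small module (hypothetical) that exports only one of its two functions.
source = '''
__all__ = ["public"]

def public():
    return "exported"

def helper():
    return "not exported by import *"
'''

demo_mod = types.ModuleType("demo_mod")
exec(source, demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod  # register it so import statements can find it

namespace = {}
exec("from demo_mod import *", namespace)

print("public" in namespace)   # listed in __all__
print("helper" in namespace)   # not listed in __all__
print(demo_mod.helper())       # still reachable through the module itself
```

The star import copies only the names in __all__, while every attribute stays accessible through the module object.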

10.2.2 Getting Help with help

>>> help(copy.copy)  # get help on the function copy
Help on function copy in module copy:
copy(x)
    Shallow copy operation on arbitrary Python objects.
    See the module's __doc__ string for more info.

The help text above is in fact extracted from the docstring of the function copy:

>>> print(copy.copy.__doc__)
Shallow copy operation on arbitrary Python objects.
    See the module's __doc__ string for more info.

10.2.3 Documentation

The docstring of the module itself can be accessed directly as well:

>>> print(copy.__doc__)
Generic (shallow and deep) copying operations.
Interface summary:
        import copy
        x = copy.copy(y)        # make a shallow
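...

See also the Python Library Reference (https://docs.python.org/library).

10.2.4 Using the Source

To find the source code, one way is to search sys.path as the interpreter does, but a quicker way is to check the __file__ attribute of the module: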

>>> print(copy.__file__)
G:\Anaconda3\lib\copy.py

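Open the file at that path in an editor, and take care not to save any changes.

10.3 The Standard Library: A Few Favorites

10.3.1 sys

The sys module gives access to variables and functions closely tied to the Python interpreter.

Function/Variable	Description
argv	The command-line arguments, including the script name (see https://blog.csdn.net/sunny_580/article/details/78188716)
exit([arg])	Exits the current program, optionally with a given return value or error message (finally clauses are still executed)
modules	A dictionary mapping module names to loaded modules
path	A list of the directories in which modules are searched for
platform	A platform identifier such as sunos5 or win32
stdin	Standard input stream, a file-like object
stdout	Standard output stream, a file-like object
stderr	Standard error stream, a file-like object

Put simply, Python reads input from sys.stdin (as used by input, for example) and prints output to sys.stdout.
CASE: printing the command-line arguments in reverse order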

# reverseargs.py
import sys
args = sys.argv[1:]  # for the command below: ['this', 'is', 'a', 'test']
args.reverse()
print(' '.join(args))

Or, in a single line:

print(' '.join(reversed(sys.argv[1:])))

$ python reverseargs.py this is a test
test a is this

The words 'this is a test' typed after python reverseargs.py in the shell are what sys.argv[1:] holds; this is how the input is passed to the reverseargs script.
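
Because sys.argv is an ordinary list, the same logic can be exercised without a command line. The sketch below wraps it in a function; the name reverse_args is hypothetical and not part of the original script.

```python
import sys

def reverse_args(argv):
    """Return the arguments after the script name, joined in reverse order."""
    return " ".join(reversed(argv[1:]))

# Simulated command line: python reverseargs.py this is a test
print(reverse_args(["reverseargs.py", "this", "is", "a", "test"]))  # test a is this

# With the real command line, sys.argv[0] is the script name and is skipped.
print(reverse_args(sys.argv))
```

Passing the argument list in as a parameter keeps the function easy to test, since nothing depends on how the interpreter was started.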

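10.3.2 os

The os module gives access to several operating system services.

Function/Variable	Description
environ	Mapping with the environment variables
system(command)	Executes an operating system command in a subshell
sep	Separator used in paths
pathsep	Separator used to separate paths
linesep	Line separator ('\n', '\r' or '\r\n')
urandom(n)	Returns n bytes of cryptographically strong random data

CASE: opening a browser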

import os
# os.system can be used to run any external program
# open the Notepad program
os.system('notepad')
os.system(r'C:\"Program Files (x86)"\Google\Chrome\Application\chrome.exe')  # this form did not work
os.startfile(r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe')  # Windows only

A better solution:

import webbrowser
webbrowser.open('http://www.taobao.com')

Extend 1: getcwd() returns the current working directory, and chdir() changes it.

>>> import os
>>> localpath = os.getcwd()
>>> print(localpath)
C:\Users\xxxx\python_test

>>> os.chdir(r'C:\Users\xxxx\Downloads\python')
>>> print(os.getcwd())
C:\Users\xxxx\Downloads\python

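Extend 2: os.path.join(path, *paths) joins two or more path components.

>>> newpath = os.path.join(localpath, 'temp')
>>> print(newpath)
C:\Users\xxxx\python_test\temp

Extend 3: sys.exit(), os._exit() and exit()/quit()

sys.exit(n) exits the program by raising the SystemExit exception, which can be caught in order to do some cleanup. n defaults to 0, meaning a normal exit; any other value signals an abnormal exit. A message can be passed as well, as in sys.exit("sorry, goodbye!"). This is the usual way to exit from a main program.
os._exit(n) exits immediately, without raising an exception and without running any cleanup. It is typically used to exit a child process.
exit() and quit() also raise SystemExit. They are meant for leaving the interactive shell.
Source: https://blog.csdn.net/index20001/article/details/74294945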

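10.3.3 fileinput

The fileinput module makes it easy to iterate over all the lines of a series of text files.

Function	Description
input([files[, inplace[, backup]]])	Facilitates iteration over lines in multiple input streams
filename()	Returns the name of the current file
lineno()	Returns the current (cumulative) line number
filelineno()	Returns the line number within the current file
isfirstline()	Checks whether the current line is the first in its file
isstdin()	Checks whether the last line was read from sys.stdin
nextfile()	Closes the current file and moves to the next
close()	Closes the sequence

CASE: adding line numbers to a Python script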

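# numberlines.py
import fileinput
# inplace=True redirects standard output (what print writes) back into the file;
# without it the file is left unchanged and the result is printed to the console
for line in fileinput.input(inplace=True):
    line = line.rstrip()
    num = fileinput.lineno()
    # {:<50} left-aligns in a field 50 wide; {:2d} prints a decimal integer at least 2 wide
    print('{:<50} # {:2d}'.format(line, num))

Run the program with a file as its argument:

$ python numberlines.py text.txt

text.txt is picked up by fileinput.input(inplace=True). Several files can be given, as in python numberlines.py text.txt temp.txt, and the lines of all of them are processed in turn.
With just python numberlines.py or python numberlines.py -, the program reads from sys.stdin and processes whatever is typed.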
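
The difference between lineno() and filelineno() is easiest to see with more than one file. The following sketch passes an explicit list of files instead of relying on the command line; the two temporary files and their contents are made up for the illustration.

```python
import fileinput
import os
import tempfile

# Create two small throwaway files (names and contents are hypothetical).
tmp = tempfile.mkdtemp()
paths = []
for name, body in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
    path = os.path.join(tmp, name)
    with open(path, "w") as f:
        f.write(body)
    paths.append(path)

rows = []
with fileinput.input(files=paths) as stream:
    for line in stream:
        rows.append((os.path.basename(fileinput.filename()),
                     fileinput.lineno(),      # cumulative line number
                     fileinput.filelineno(),  # line number within the current file
                     line.rstrip()))

for row in rows:
    print(row)
```

The cumulative counter keeps increasing across files, while the per-file counter restarts at 1 when the second file begins.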

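10.3.4 Sets, Heaps, and Deques

Besides dictionaries (hash tables) and lists (dynamic arrays), Python offers a few other data structures that come in handy at times.

1. Sets

Sets are implemented by the built-in class set.

>>> set([0, 1, 2, 3, 4])
>>> set(range(5))
{0, 1, 2, 3, 4}

Notice: {} is an empty dictionary, not an empty set. To create an empty set, call set with no arguments, as in a = set().
Sets are mainly used for membership checks, so duplicates are ignored: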

>>> {0, 1, 2, 3, 0, 1, 2, 3, 4, 5} 
{0, 1, 2, 3, 4, 5}

The order of the elements in a set is arbitrary and must not be relied on.

>>> {'fee', 'fie', 'foe'} 
{'foe', 'fee', 'fie'}

Set operations
Union: .union or |

>>> a = {1, 2, 3}
>>> b = {2, 3, 4}
>>> a.union(b)
{1, 2, 3, 4}
>>> a | b
{1, 2, 3, 4}

Intersection: .intersection or &

>>> a.intersection(b)
{2, 3}
>>> a & b
{2, 3}
>>> c = a & b

Is c a subset of a: .issubset or <=

>>> c.issubset(a)
True
>>> c <= a
True

Is c a superset of a: .issuperset or >=

>>> c.issuperset(a)
False
>>> c >= a
False

Elements of a that are not in b: .difference or -

>>> a.difference(b)
{1}
>>> a - b
{1}

Elements that are in exactly one of a and b: .symmetric_difference or ^

>>> a.symmetric_difference(b)
{1, 4}
>>> a ^ b
{1, 4}

Copy: .copy()

>>> a.copy()
{1, 2, 3}
>>> a.copy() is a
False

To compute the union of several sets, the unbound version of the set method union can be passed around as a function:

>>> my_sets = [{3, 88, 99}]
>>> my_sets.append(set(range(0, 5)))
>>> my_sets
[{3, 88, 99}, {0, 1, 2, 3, 4}]
>>> import functools
>>> functools.reduce(set.union, my_sets)
{0, 1, 2, 3, 4, 88, 99}

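Sets are mutable, so they cannot be used as dictionary keys. A set can only contain immutable (hashable) values, so it cannot contain other sets. The frozenset type fills this gap: it represents an immutable (hashable) set.

>>> a = {1, 2, 3, 4}
>>> b = {2, 3, 4, 5}
>>> a.add(b)  # a set cannot contain another set
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>> a.add(frozenset(b))
>>> a
{1, 2, 3, 4, frozenset({2, 3, 4, 5})}

The frozenset constructor creates a copy of the given set. It is useful whenever a set is needed as a member of another set or as a dictionary key.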
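
As a short sketch of both uses mentioned above, the following stores frozensets as set members and as dictionary keys. The variable names and the labels are made up for the illustration.

```python
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}

# A frozenset is hashable, so it can be a set member or a dictionary key.
nested = {frozenset(a), frozenset(b)}
labels = {frozenset(a): "first", frozenset(b): "second"}

# Lookup works with any frozenset that has the same elements.
print(labels[frozenset([4, 3, 2, 1])])

# A plain set is still rejected, because it is mutable.
try:
    nested.add(a)
except TypeError as exc:
    print("TypeError:", exc)
```

Equality and hashing of a frozenset depend only on its elements, which is why a frozenset built from a differently ordered list finds the same dictionary entry.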

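2. Heaps (heapq)

Another well-known data structure is the heap, a kind of priority queue. A priority queue lets you add objects in any order and, at any time (possibly between additions), find and remove the smallest element.
The module is named heapq (the q stands for queue). It contains six functions, listed in the table below, of which the first four relate directly to heap manipulation. The heap object itself must be a list.

Function	Description
heappush(heap, x)	Pushes x onto the heap
heappop(heap)	Pops off the smallest element in the heap
heapify(heap)	Enforces the heap property on an arbitrary list
heapreplace(heap, x)	Pops off the smallest element and pushes x
nlargest(n, iter)	Returns the n largest elements of iter
nsmallest(n, iter)	Returns the n smallest elements of iter

heappush(heap, x) must not be used on an arbitrary list, only on a list that was built with the heap functions.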

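>>> from heapq import *
>>> from random import shuffle
>>> data = [8, 5, 6, 7, 1, 3, 4, 2, 0]
>>> shuffle(data)  # shuffle the sequence in place, so data is now in random order
>>> heap = []
>>> for n in data:
...     heappush(heap, n)
...
>>> heap  # the exact order depends on the shuffle
[0, 1, 2, 4, 3, 8, 7, 6, 5]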

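The elements are not strictly sorted, but one guarantee holds: with the zero-based indices of a Python list, the element at position i is never smaller than the element at position (i - 1) // 2 (or, put the other way round, it is never larger than the elements at positions 2 * i + 1 and 2 * i + 2).
This is the basis of the underlying heap algorithm and is called the heap property.
The function heappop pops off the smallest element, which is always at index 0, and makes sure that the smallest of the remaining elements ends up at index 0 (preserving the heap property).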

>>> heappop(heap) 
0 
>>> heappop(heap) 
1
>>> heappop(heap) 
2
>>> heap 
[3, 4, 6, 7, 5, 8]
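
The heap property can be checked directly. For a zero-based Python list it says that heap[(i - 1) // 2] <= heap[i] for every index i > 0. The sketch below tests this; the helper name is_heap is hypothetical and not part of heapq.

```python
from heapq import heappush, heappop

def is_heap(items):
    """Check the heap property of a zero-based list: every parent <= its child."""
    return all(items[(i - 1) // 2] <= items[i] for i in range(1, len(items)))

heap = []
for n in [8, 5, 6, 7, 1, 3, 4, 2, 0]:
    heappush(heap, n)

print(is_heap(heap))            # a list built with heappush has the property
print(is_heap([3, 1, 2]))       # an arbitrary list usually does not
print(heap[0] == min(heap))     # the smallest element is always at index 0

smallest = heappop(heap)
print(smallest, is_heap(heap))  # popping keeps the property intact
```

Only the relation between parents and children is guaranteed, so two heaps with the same elements can list them in different orders.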

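The function heapify turns an arbitrary list into a legal heap (one that has the heap property) using as few moves as possible. If the heap was not built with heappush, call heapify before using heappush and heappop.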

>>> heap = [5, 8, 0, 3, 6, 7, 9, 1, 4, 2] 
>>> heapify(heap) 
>>> heap 
[0, 1, 5, 3, 2, 7, 9, 8, 4, 6]

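The function heapreplace pops the smallest element off the heap and then pushes a new element onto it.
This is more efficient than calling heappop followed by heappush.

>>> heapreplace(heap, 0.5)  # returns the previous smallest value
0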
>>> heap 
[0.5, 1, 5, 3, 2, 7, 9, 8, 4, 6] 
>>> heapreplace(heap, 10) 
0.5 
>>> heap 
[1, 2, 5, 3, 6, 7, 9, 8, 4, 10]
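
The two remaining functions from the table, nlargest and nsmallest, work on any iterable and need no prior heapify. A brief sketch, with sample data chosen only for the illustration:

```python
from heapq import nlargest, nsmallest

data = [5, 8, 0, 3, 6, 7, 9, 1, 4, 2]

print(nlargest(3, data))    # [9, 8, 7]
print(nsmallest(3, data))   # [0, 1, 2]

# Both accept a key function, used here to pick the longest words.
words = ["heap", "queue", "deque", "set", "dictionary"]
print(nlargest(2, words, key=len))
```

For a small n this avoids sorting the whole sequence; when n is close to the length of the data, a plain sorted call followed by slicing is usually the simpler choice.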

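3. Deques (collections)

Double-ended queues, or deques, are useful when elements must be removed in the order in which they were added. The collections module contains the deque type along with several other collection types.

>>> from collections import deque
>>> q = deque(range(5))
>>> q.append(5)
>>> q.appendleft(6)  # append on the left
>>> q
deque([6, 0, 1, 2, 3, 4, 5])
>>> q.pop()
5
>>> q.popleft()  # remove from the left
6
>>> q.rotate(3)  # rotate all elements three steps to the right
>>> q
deque([2, 3, 4, 0, 1])
>>> q.rotate(-1)
>>> q
deque([3, 4, 0, 1, 2])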

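10.3.5 time

The time module contains functions for getting the current time, manipulating times and dates, reading dates from strings, and formatting dates as strings.
A date can be represented either as a real number (the seconds since 0 hours on January 1 of the epoch year) or as a tuple of nine integers.
For example, (2008, 1, 21, 12, 2, 56, 0, 21, 0) represents:
January 21, 2008, at 12:02:56. That day is a Monday and the 21st day of 2008 (no daylight saving time).

Index	Field	Value
0	Year	For example 2000, 2001
1	Month	In the range 1~12
2	Day	In the range 1~31
3	Hour	In the range 0~23
4	Minute	In the range 0~59
5	Second	In the range 0~61
6	Weekday	In the range 0~6, where 0 is Monday
7	Julian day	In the range 1~366
8	Daylight saving	0, 1 or -1

The range 0~61 for seconds allows for leap seconds and double leap seconds.
The daylight saving value is a Boolean (True or False), but if you use -1, mktime (the function that converts a time tuple to a timestamp, that is, seconds since the epoch) will probably get it right.

Function	Description
asctime([tuple])	Converts a time tuple to a string
localtime([secs])	Converts seconds to a date tuple in local time
mktime(tuple)	Converts a time tuple in local time to seconds since the epoch
sleep(secs)	Sleeps (does nothing) for secs seconds
strptime(string[, format])	Parses a string into a time tuple
time()	Current time (seconds since the epoch, UTC)

>>> import time
# time tuple -> string
>>> time.asctime(time.localtime())
>>> time.asctime()  # without an argument, the current time is used
'Fri Sep 28 07:44:58 2018'

# seconds -> time tuple
>>> time.localtime()  # local time
time.struct_time(tm_year=2018, tm_mon=9, tm_mday=28, tm_hour=7, tm_min=45, tm_sec=22, tm_wday=4, tm_yday=271, tm_isdst=0)
>>> type(time.gmtime())  # gmtime gives the time in UTC
<class 'time.struct_time'>

# time tuple -> seconds
>>> time.mktime((2018, 9, 28, 7, 48, 25, 4, 271, 0))  # date tuple to seconds since the epoch
1538092105.0
>>> time.mktime(time.localtime())  # the inverse of localtime
1543893840.0

>>> print('wait 3 seconds')
wait 3 seconds
>>> time.sleep(3)  # make the interpreter wait for the given number of seconds
>>> print('done!')
done!

# string -> time tuple
>>> time.strptime(time.asctime())  # the default format matches the string returned by asctime
>>> time.strptime("30 Nov 00", "%d %b %y")  # parse with an explicit format
time.struct_time(tm_year=2000, tm_mon=11, tm_mday=30, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=335, tm_isdst=-1)

>>> time.time()  # the current time as seconds since the epoch
1543894099.0952008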
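
The conversions above fit together as a round trip: seconds to a time tuple with localtime, and back to seconds with mktime. A minimal sketch, using only functions from the table:

```python
import time

now = time.time()             # seconds since the epoch, as a float
local = time.localtime(now)   # seconds -> struct_time in local time
back = time.mktime(local)     # struct_time in local time -> seconds

print(local.tm_year, local.tm_mon, local.tm_mday)
print(int(back) == int(now))  # the round trip loses only the fractional part

# strptime parses a string into a time tuple; asctime formats a time tuple.
parsed = time.strptime("30 Nov 00", "%d %b %y")
print(time.asctime(parsed))
```

The struct_time returned by localtime behaves like the nine-element tuple described earlier, and its fields can also be read by name, as with tm_year here.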

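10.3.6 random

The random module contains functions that return pseudorandom numbers, which help when writing simulations or programs that produce random output. (For cryptographic or other security-related purposes, which need unpredictable randomness, use the urandom function from the os module.)

Function	Description
random()	Returns a random real number n such that 0 <= n < 1
getrandbits(n)	Returns n random bits, in the form of an integer
uniform(a, b)	Returns a random real number n such that a <= n <= b
randrange([start], stop, [step])	Returns a random number from range(start, stop, step)
choice(seq)	Returns a random element from the sequence seq
shuffle(seq[, random])	Shuffles the sequence seq in place (the optional random argument was removed in Python 3.11)
sample(seq, n)	Chooses n random, unique elements from the sequence seq

>>> import random
>>> random.random()  # a pseudorandom float in the range [0.0, 1.0)
0.22018834179900648
>>> random.getrandbits(6)  # the given number of random bits, returned as an integer
33
>>> random.uniform(4, 55)  # a random (uniformly distributed) real number between a and b
29.405905331706407
>>> random.randrange(4, 20, 2)  # a random integer; 20 is excluded
14
>>> random.choice([0, 1, 2, 3, 4, 5])  # one element chosen at random (uniformly) from the sequence
4
>>> q = [0, 1, 2, 3, 4, 5]
>>> random.shuffle(q)  # shuffle a mutable sequence in place; every ordering is equally likely
>>> q
[5, 3, 2, 0, 1, 4]
>>> random.sample(q, 2)  # the given number of elements, chosen at random from distinct positions
[2, 0]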
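
Since the outputs above change on every run, a reproducible sketch is useful for checking the documented ranges. It uses a private generator created with random.Random and an arbitrary seed of 42; neither appears in the original notes.

```python
import random

rng = random.Random(42)   # a private generator; the seed 42 is arbitrary

x = rng.random()
print(0.0 <= x < 1.0)                   # random() never returns 1.0

n = rng.randrange(4, 20, 2)
print(n in range(4, 20, 2))             # the stop value 20 is excluded

deck = list(range(10))
rng.shuffle(deck)
print(sorted(deck) == list(range(10)))  # shuffle only reorders, in place

hand = rng.sample(deck, 3)
print(len(set(hand)) == 3)              # sample never picks the same position twice

# A generator seeded with the same value reproduces the same sequence.
print(random.Random(42).random() == x)
```

Seeding makes simulations repeatable, which is exactly why these generators are unsuitable for security purposes.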

CASE 1: generating a random time within a given interval

>>> from random import *
>>> from time import *
>>> date1 = (2016, 1, 1, 0, 0, 0, -1, -1, -1)
>>> time1 = mktime(date1)  # date tuple to seconds
>>> date2 = (2017, 1, 1, 0, 0, 0, -1, -1, -1)
>>> time2 = mktime(date2)
>>> random_time = uniform(time1, time2)
>>> print(asctime(localtime(random_time)))
Tue Mar  1 05:46:19 2016

CASE 2: rolling dice

from random import randrange
num = int(input('how many dice:'))  # number of dice to roll
sides = int(input('how many sides:'))  # number of sides per die
total = 0
for i in range(num):
    point = randrange(sides) + 1  # a random integer from 1 to sides
    print('NO.', (i + 1), 'die:', point)
    total += point  # add up the points of all dice
print('total points:', total)

how many dice:2
how many sides:6
NO. 1 die: 2
NO. 2 die: 3
total points: 5

CASE 3: dealing cards at random

# build a deck of cards
>>> values = list(range(1, 11)) + 'jack queen king'.split()  # the 13 card values
>>> suits = 'diamonds clubs hearts spades'.split()  # the 4 suits
>>> deck = ['{} of {}'.format(v, s) for v in values for s in suits]  # every value paired with every suit
>>> deck
['1 of diamonds',
 '1 of clubs',
 '1 of hearts',
 '1 of spades',
 '2 of diamonds',
...
 'king of clubs',
 'king of hearts',
 'king of spades']
>>> from random import shuffle
>>> shuffle(deck)  # shuffle the deck
>>> deck
['jack of spades',
 '8 of clubs',
 '10 of hearts',
 'jack of diamonds',
...
 'king of diamonds',
 '4 of hearts',
 '6 of spades',
 '7 of clubs']
>>> while deck: input(deck.pop())  # take cards from the end of the list; press Enter to deal the next one
7 of clubs
6 of spades
4 of hearts
king of diamonds
...
jack of diamonds
10 of hearts
8 of clubs
jack of spades

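10.3.7 shelve and json

The open function of shelve takes a file name as its argument and returns a shelf object in which data can be stored.
A shelf can be used like an ordinary dictionary (except that the keys must be strings). When you are done, and want the changes saved to disk, call its close method.

1. A Potential Trap

>>> import shelve
>>> s = shelve.open('test.dat')  # creates the backing files, here test.dat.dat and test.dat.dir
>>> s['x'] = [1, 2, 3]
>>> s['x'].append(4)  # fetches a new copy of the list and appends 4 to that copy, which is never stored
>>> s['x']
[1, 2, 3]

A new list was created but not saved. One fix is to use a temporary variable and store it again after changing it:

>>> import shelve
>>> s = shelve.open('test.dat')
>>> s['x'] = [1, 2, 3]
>>> temp = s['x']  # temporary variable
>>> temp.append(4)
>>> s['x'] = temp  # store it again
>>> s['x']
[1, 2, 3, 4]

Another fix is to set the writeback parameter of open to True:

>>> import shelve
>>> s = shelve.open('test.dat', writeback=True)
>>> s['x'] = [1, 2, 3]
>>> s['x'].append(4)
>>> s['x']
[1, 2, 3, 4]

With writeback set to True, every data structure read from or assigned to the shelf is kept in memory (cached) and written to disk only when the shelf is closed.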
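
The trap and the writeback fix can be compared side by side in a script. This sketch stores the shelf in a temporary directory so that it leaves nothing behind in the working directory; the file name demo_shelf is hypothetical.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_shelf")  # hypothetical file name

# Without writeback, mutating a fetched value changes only a temporary copy.
with shelve.open(path) as s:
    s["x"] = [1, 2, 3]
    s["x"].append(4)
    without_writeback = s["x"]

# With writeback=True, fetched values are cached and written back on close.
with shelve.open(path, writeback=True) as s:
    s["x"].append(4)

with shelve.open(path) as s:
    with_writeback = s["x"]

print(without_writeback)
print(with_writeback)
```

Using the shelf as a context manager closes it at the end of each with block, which is also the moment the cached values are written when writeback is enabled.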

2. A Simple Database Example

# code1
import sys, shelve
# store data
def store_person(db):
    pid = input('Enter ID: ')
    person = {}
    person['name'] = input('Enter name: ')
    person['age'] = input('Enter age: ')
    person['phone'] = input('Enter phone number: ')
    db[pid] = person  # pid is a string; db must behave like a dictionary (a shelf object in this example)
    print(db)

A quick test with an empty dictionary:

>>> store_person({})
Enter ID: 01
Enter name: kk
Enter age: 1
Enter phone number: 1234
{'01': {'name': 'kk', 'age': '1', 'phone': '1234'}}

Data can be entered. Next, a test with a dictionary that is not empty:

>>> store_person({'01': {'name': 'kk', 'age': '1', 'phone': '1234'}})
Enter ID: 02
Enter name: kk
Enter age: 21
Enter phone number: 5423
{'01': {'name': 'kk', 'age': '1', 'phone': '1234'}, '02': {'name': 'kk', 'age': '21', 'phone': '5423'}}

# code2
# look up data
def lookup_person(db):
    pid_v = input('Enter the ID to look up: ')
    field = input('Enter the field to look up (name, age, phone): ').strip().lower()
    print(field + ':', db[pid_v][field])

A lookup test:

>>> lookup_person({'01': {'name': 'kk', 'age': '1', 'phone': '1234'}, '02': {'name': 'kk', 'age': '21', 'phone': '5423'}})
Enter the ID to look up: 02
Enter the field to look up (name, age, phone):  AGE
age: 21

Create a shelf object to store the data:

# code3
database = shelve.open('C:\\database.dat')  # the backing file is created in that directory

Store the first record:

>>> store_person(database)
Enter ID: 001
Enter name: jack
Enter age: 21
Enter phone number: 2589
<shelve.DbfilenameShelf object at 0x0000000004BBAFD0>

Store the second record:

>>> store_person(database)
Enter ID: 002
Enter name: hellen
Enter age: 25
Enter phone number: 2590
<shelve.DbfilenameShelf object at 0x0000000004BBAFD0>

Look up data:

>>> lookup_person(database)
Enter the ID to look up: 001
Enter the field to look up (name, age, phone): phone
phone: 2589

Close the file:

database.close()

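Integration and improvements:
1. Instead of calling the store and lookup functions by hand, the program reads a command and runs the matching function.
2. The main program lives in the function main, which is called only if __name__ == '__main__'. This means the program can be imported as a module into another program, which can then call main.
3. try and finally make sure the database is closed properly.

# code4
def main():
    try:
        while True:
            cmd = input('Enter command: ').strip().lower()  # strip whitespace and convert to lowercase
            if cmd == 'store':
                store_person(database)
            elif cmd == 'lookup':
                lookup_person(database)
            elif cmd == 'quit':
                # break would leave the loop; return also works here because
                # the loop is inside main(), and returning ends the function
                return
    finally:
        database.close()
if __name__ == '__main__': main()

The complete integrated version is given in the appendix.
Because the data is stored in the .dat file in that directory, it survives after the program exits, and earlier entries are still there for the next lookup.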

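10.3.8 re

The re module provides support for regular expressions.

1. What Is a Regular Expression?

A regular expression is a pattern that can match pieces of text.
* The wildcard
A regular expression can match more than one string; such patterns are built with special characters.
For example, . matches any single character except a newline, so '.ython' matches 'python', '+ython' and so on.
A special character such as . is called a wildcard.
* Escaping special characters
The . is a wildcard that matches any single character except a newline. To strip it of that power and make it stand for a literal dot, escape it with a backslash.
'.ython.org' matches 'pythonworg' as well as 'python.org'.
'.ython\\.org', or the raw string r'.ython\.org', matches 'python.org' but not 'pythonworg'.
* Character sets
A character set matches one of a given set of characters rather than any character.
'[pj]ython' matches only 'python' and 'jython'.
'[a-z]' matches any letter from a to z.
'[a-zA-Z0-9]' matches uppercase letters, lowercase letters and digits.
'[^abc]' matches any character except a, b and c (^ is the caret).
* Alternatives and subpatterns
'p(ython|erl)' matches only the strings 'python' and 'perl'.
A single character also counts as a subpattern.
* Optional and repeated subpatterns
A question mark after a subpattern makes it optional.
I like (apple, )?(pear, )?banana matches
I like banana
I like apple, banana
I like apple, pear, banana
Each optional subpattern may or may not appear.
Repetition:
(apple)* repeats zero, one or more times
(apple)+ repeats one or more times
(apple){m,n} repeats from m to n times
* The beginning and end of a string
To mark the beginning of a string, use a caret (^); note that it means something different inside a character set.
For example, '^ht+p' matches 'http://python.org' and 'htttttp://python.org' but not 'www.http.org'.
To mark the end of a string, use a dollar sign ($).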
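
Each rule above can be checked with a one-line test. The sketch below uses re.fullmatch, which requires the whole string to match, and re.search for the anchor examples; the sample strings follow the ones in the text.

```python
import re

# Wildcard: '.' matches any single character except a newline.
print(bool(re.fullmatch('.ython', 'jython')))

# Escaping: r'\.' matches only a literal dot.
print(bool(re.fullmatch(r'.ython\.org', 'python.org')))
print(bool(re.fullmatch(r'.ython\.org', 'pythonworg')))

# Character sets and alternatives.
print(bool(re.fullmatch('[pj]ython', 'cython')))
print(bool(re.fullmatch('p(ython|erl)', 'perl')))

# Optional and repeated subpatterns.
print(bool(re.fullmatch('I like (apple, )?(pear, )?banana', 'I like pear, banana')))
print(bool(re.fullmatch('(apple){2,3}', 'appleapple')))

# Anchors: '^' ties the match to the start of the string.
print(bool(re.search('^ht+p', 'htttttp://python.org')))
print(bool(re.search('^ht+p', 'www.http.org')))
```

The third, fourth and last lines print False; all the others print True.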

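2. Contents of the re Module

Function	Description
compile(pattern[, flags])	Creates a pattern object from a string with a regular expression
search(pattern, string[, flags])	Searches for the pattern in the string
match(pattern, string[, flags])	Matches the pattern at the beginning of the string
split(pattern, string[, maxsplit=0])	Splits the string by occurrences of the pattern
findall(pattern, string)	Returns a list of all occurrences of the pattern in the string
sub(pat, repl, string[, count=0])	Substitutes occurrences of pat in the string with repl
escape(string)	Escapes all special regular expression characters in the string

The function re.compile() turns a regular expression written as a string into a pattern object (pat), which makes repeated matching more efficient.
Functions such as search and match accept either a pattern object or a plain string; a string pattern is compiled internally when it is used.

pat = re.compile(regular_expression)
re.search(pat, string) is equivalent to pat.search(string)

The function re.search() returns a match object for the first substring of the given string that matches the regular expression, or None if there is no match (in which case the interactive interpreter prints nothing).

>>> from re import *
>>> pat = compile('.ython')
>>> search(pat, 'i love python not jython')  # span=(7, 13) is the position of the match in the string
<_sre.SRE_Match object; span=(7, 13), match='python'>

The function re.match() looks for a match only at the beginning of the given string. It returns a match object if it finds one and None otherwise.

>>> match(pat, 'i love python not jython')  # no match: None is returned and nothing is printed
>>> match(pat, 'python is my favorite language')
<_sre.SRE_Match object; span=(0, 6), match='python'>

To require that the pattern matches the whole string, add the end-of-string anchor $ to the pattern passed to match:

>>> pat2 = compile('.ython$')
>>> match(pat2, 'python')
<_sre.SRE_Match object; span=(0, 6), match='python'>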

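The function re.split() splits a string at every occurrence of a regular expression.

>>> text = 'my,name,,is！lucy'
>>> split('[,！]', text)
['my', 'name', '', 'is', 'lucy']
>>> text = 'my,name,,is,！lucy'
>>> split('[,！]+', text)  # + also matches runs of several characters from the set
['my', 'name', 'is', 'lucy']

If the pattern contains parentheses, the text matched by the group is also kept in the result, at the position where the split happened. (Inside a character set, | would be a literal character, so the set is written '[yh]'.)

>>> text2 = 'ppytthon'
>>> split('[yh]', text2)  # the separators y and h are dropped
['pp', 'tt', 'on']
>>> split('(y|h)', text2)
['pp', 'y', 'tt', 'h', 'on']

The number of splits can be limited as well:

>>> text3 = 'ppytthonyssshppp'
>>> split('[yh]', text3)  # by default, split at every occurrence
['pp', 'tt', 'on', 'sss', 'ppp']
>>> split('[yh]', text3, 2)  # split only twice
['pp', 'tt', 'onyssshppp']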

The function re.findall() returns a list of all the substrings that match.

>>> pat = compile('.ython')
>>> findall(pat, 'i love python not jython')
['python', 'jython']
>>> text4 = 'hello!what is your name?jack,nice to see you.'
>>> findall('[a-zA-Z]+', text4)
['hello', 'what', 'is', 'your', 'name', 'jack', 'nice', 'to', 'see', 'you']
>>> findall('[,.!]+', text4)
['!', ',', '.']

The function re.sub() replaces the matching substrings, from left to right, with the given replacement.

>>> sub('name', 'lily', 'dear,name')
'dear,lily'
>>> import re
>>> pat = '{name}'
>>> text = 'Dear {name}...'
>>> re.sub(pat, 'Mr. Gumby', text)
'Dear Mr. Gumby...'

The function re.escape() is a utility function that escapes every character in a string that could be taken for a regular expression operator.

>>> escape('www.baidu.com')
'www\\.baidu\\.com'

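3. Match Objects and Groups

Method	Description
group([group1, ...])	Retrieves the substrings matched by the given subpatterns (groups). Groups are numbered 1~99; group 0 is the entire match and is the default. One group number (or none) returns a single string; several return a tuple
start([group])	Returns the start position of the match for the given group (default 0, the entire pattern)
end([group])	Returns the end position of the match for the given group (exclusive, as in slices: the index of the last character plus 1)
span([group])	Returns the start and end positions of the match for the given group

#  a(banana)(c)(dD(Ee))
#   1       2  3  4
>>> m = match('a(.*)(c|C)(d.(.e))', 'abananacdDEe')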
>>>m.group(0)
'abananacdDEe'
>>>m.group(1)
'banana'
>>>m.group(3)
'dDEe'
>>>m.group(0,1,2,3,4)
('abananacdDEe', 'banana', 'c', 'dDEe', 'Ee')
>>> m.start(1)
1
>>> m.end(1)
7
>>> m.span(1)
(1, 7)

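4. Group Numbers and Functions in Substitutions

>>> import re
>>> emphasis_pattern = re.compile(r'''
... \*      # opening emphasis tag, an asterisk
... (       # start of the group that matches the emphasized text
... [^\*]+  # any characters except asterisks (inside a set, the caret negates)
... )       # end of the group
... \*      # closing emphasis tag
... ''', re.VERBOSE)  # the verbose flag makes the pattern easier to read
>>> # the same pattern without comments: emphasis_pattern = r'\*([^\*]+)\*'
>>> re.sub(emphasis_pattern, r'<em>\1</em>', 'Hello, *world*!')
'Hello, <em>world</em>!'

>>> emphasis_pattern = r'\*([^\*]+)\*([^\*]+)\-'
>>> re.sub(emphasis_pattern, r'<em>\1<em>\2<ed>', 'hello,*world*python-!')  # \1 stands for the text matched by group 1
'hello,<em>world<em>python<ed>!'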

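Greedy patterns

>>> emphasis_pattern = r'\*(.+)\*'
>>> sub(emphasis_pattern, r'<em>\1<em>', 'hello,*wor*ld*!')
'hello,<em>wor*ld<em>!'

The pattern matched everything from the first asterisk to the last, including the asterisk in between. That is what greedy means: match as much as possible.
To avoid this, besides using a character set with a caret as before, you can use the nongreedy versions of the repetition operators. Any repetition operator becomes nongreedy when followed by a question mark.

>>> em_pa = r'\*(.+?)\*'
>>> sub(em_pa, r'<em>\1</em>', 'hello,*wor*ld*kk*!')
'hello,<em>wor</em>ld<em>kk</em>!'

Note that a substring must contain the whole em_pa pattern, here a pair of asterisks, to be replaced. Two pairs are both replaced, but with three asterisks the odd one out is left unmatched:

>>> em_pa = r'\*(.+?)\*'
>>> sub(em_pa, r'<em>\1</em>', 'hello,*wor*ld*!')  # the third * has no partner
'hello,<em>wor</em>ld*!'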

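5. Finding the Sender

CASE 1: the sender Foo Fie
Given an email saved as plain text, find the sender Foo Fie. The key line is:
From: Foo Fie <[email protected]>
Regular expression:
From: (.*) <.*?>$  # the ? makes the second repetition nongreedy
The part to print is what the parentheses capture, that is, group(1) of the match object.
The code:

# find_sender.py
import fileinput, re  # fileinput iterates over every line
pat = re.compile('From: (.*) <.*?>$')  # compiling the pattern makes repeated matching more efficient

for line in fileinput.input():
    m = pat.match(line)  # match starts at the beginning of the line
    if m:
        print(m.group(1))

Run it with:

$ python find_sender.py message.eml
Foo Fie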

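CASE 2: finding all email addresses
Basic form of an address:
[email protected]
Regular expression:
r'[a-z\-\.]+@[a-z\-\.]+'
Code:

import fileinput, re
pat = re.compile(r'[a-z\-\.]+@[a-z\-\.]+', re.IGNORECASE)

for line in fileinput.input():
    for address in pat.findall(line):
        print(address, fileinput.lineno())  # fileinput.lineno() gives the line number

This extracts the addresses line by line, with duplicates and in no particular order. To improve on that, collect the matches in a set and print the sorted set:

import fileinput, re
pat = re.compile(r'[a-z\-\.]+@[a-z\-\.]+', re.IGNORECASE)
addresses = set()
for line in fileinput.input():
    for address in pat.findall(line):
        addresses.add(address)
for address in sorted(addresses):
    print(address)

The addresses printed this way are free of duplicates and sorted (uppercase letters sort before lowercase ones).
Notice: a set is used here rather than a list because a set keeps only one copy of equal elements, which removes duplicates automatically.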
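
The same pattern can be tried on a few in-memory lines instead of a file. In this sketch the sample lines and the example.org addresses are made up for the illustration.

```python
import re

# Same pattern as above; the sample lines and addresses are invented.
pat = re.compile(r'[a-z\-\.]+@[a-z\-\.]+', re.IGNORECASE)

lines = [
    "From: Foo Fie <foo@example.org>",
    "To: bar@example.org, Baz@Example.org",
    "Cc: foo@example.org",
]

addresses = set()
for line in lines:
    addresses.update(pat.findall(line))

for address in sorted(addresses):
    print(address)
```

The repeated address is printed once, and Baz@Example.org comes first because uppercase letters sort before lowercase ones. Note that the pattern does not allow digits or underscores, so it would truncate addresses that contain them.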

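6. A Sample Template System

A template is a file into which specific values can be inserted to produce the final text.
* Regular expressions can be used to match the fields and extract their contents.
* eval can be used to evaluate an expression string, given a dictionary that holds the scope. Do this inside a try/except statement. If a SyntaxError is raised, the string is probably a statement (such as an assignment) rather than an expression, and exec should be used to run it.
* exec can be used to run the statement strings (and other statements), storing the template's scope in a dictionary.
* re.sub can be used to replace each processed string with the computed result.

# templates.py
import fileinput, re
# matches fields enclosed in square brackets
field_pat = re.compile(r'\[(.+?)\]')
# the variables are collected here:
scope = {}
# used by re.sub:
def replacement(match):  # match is a match object
    # the substring matched by group 1 of the pattern
    code = match.group(1)
    try:
        # if the field is an expression, return its value
        return str(eval(code, scope))  # e.g. time.asctime()
    except SyntaxError:
        # otherwise run the statement in the same scope
        exec(code, scope)  # e.g. import time
        # and return an empty string
        return ''
# read all the text and join it into one string:
lines = []
for line in fileinput.input():
    lines.append(line)
text = ''.join(lines)  # text holds the contents of all the files passed in
# replace everything that matches the field pattern:
print(field_pat.sub(replacement, text))
# equivalent: print(re.sub(field_pat, replacement, text))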

A definitions file, magnus.txt:

[name = 'Magnus Lie Hetland' ] 
[email = '[email protected]' ] 
[language = 'python' ]

A template file, template.txt:

[import time] 
Dear [name], 
I would like to learn how to program. I hear you use the [language] language a lot -- is it something I 
should consider? 
And, by the way, is [email] your correct email address? 
Fooville, [time.asctime()] 
Oscar Frozzbozz

Run:

$ python templates.py magnus.txt template.txt

Output:




Dear Magnus Lie Hetland, 
I would like to learn how to program. I hear you use the python language a lot -- is it something I 
should consider? 
And, by the way, is [email protected] your correct email address? 
Fooville, Mon Jul 18 15:24:10 2016 
Oscar Frozzbozz

Analysis:
1. The contents of text:
[name = 'Magnus Lie Hetland' ]
[email = '[email protected]' ]
[language = 'python' ] [import time]
Dear [name],
I would like to learn how to program. I hear you use the [language] language a lot -- is it something I should consider?
And, by the way, is [email] your correct email address?
Fooville, [time.asctime()]
Oscar Frozzbozz
The four [] fields on the first three lines are statements, so each is run by exec(code, scope) and replaced with ''. That is why the output begins with blank lines.
2. field_pat.sub(replacement, text)
text contains eight [] fields, so replacement is called eight times:
match: [name = 'Magnus Lie Hetland' ]
code: name = 'Magnus Lie Hetland'
match: [email = '[email protected]' ]
code: email = '[email protected]'
match: [language = 'python' ]
code: language = 'python'
match: [import time]
code: import time
match: [name]
code: name
match: [language]
code: language
match: [email]
code: email
match: [time.asctime()]
code: time.asctime()
sub calls replacement once per [] field and substitutes its return value; no field is processed twice.
3. field_pat
field_pat = re.compile(r'\[(.+?)\]')
For the replacement on the last line alone, the parentheses are not needed:

>>> field_pat = re.compile(r'\[.+?\]')
>>> re.sub(field_pat, '7:00', 'Fooville, [time.asctime()]')
'Fooville, 7:00'

In the replacement function, however, the parentheses matter: the group in r'\[(.+?)\]' is what lets match.group(1) return the text between the brackets so that it can be evaluated or executed. The function never mentions field_pat; sub passes it a match object for every match.
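
The whole mechanism can be condensed into a function that works on a string rather than on files. This is a minimal sketch under that assumption; the function name render and the sample template are hypothetical.

```python
import re

field_pat = re.compile(r'\[(.+?)\]')

def render(text, scope=None):
    """Replace every [field] in text: statements give '', expressions their value."""
    scope = {} if scope is None else scope

    def replacement(match):
        code = match.group(1)              # the text between the brackets
        try:
            return str(eval(code, scope))  # an expression such as x or x + y
        except SyntaxError:
            exec(code, scope)              # a statement such as x = 2
            return ''

    return field_pat.sub(replacement, text)

print(render("[x = 2][y = 3]The sum of [x] and [y] is [x + y]."))
# The sum of 2 and 3 is 5.
```

Because eval and exec run arbitrary code, a template system like this should only ever be fed templates from a trusted source.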

----------------------------------------- Appendix -----------------------------------------

# database.py
import sys, shelve

def store_person(db):
    """
    Query the user for data and store it in the shelf object.
    """
    pid = input('Enter unique ID number: ')
    person = {}
    person['name'] = input('Enter name: ')
    person['age'] = input('Enter age: ')
    person['phone'] = input('Enter phone number: ')
    db[pid] = person

def lookup_person(db):
    """
    Query the user for an ID and a field, and fetch the matching data from the shelf object.
    """
    pid = input('Enter ID number: ')
    field = input('What would you like to know? (name, age, phone) ')
    field = field.strip().lower()
    print(field.capitalize() + ':', db[pid][field])

def print_help():
    print('The available commands are:')
    print('store : Stores information about a person')
    print('lookup : Looks up a person from ID number')
    print('quit : Save changes and exit')
    print('? : Prints this message')

def enter_command():
    cmd = input('Enter command (? for help): ')
    cmd = cmd.strip().lower()
    return cmd

def main():
    database = shelve.open('C:\\database.dat')  # you may want to change this name
    try:
        while True:
            cmd = enter_command()
            if cmd == 'store':
                store_person(database)
            elif cmd == 'lookup':
                lookup_person(database)
            elif cmd == '?':
                print_help()
            elif cmd == 'quit':
                return
    finally:
        database.close()

if __name__ == '__main__': main()