Python Cookbook

Chapter 1: Data Structures and Algorithms

— Unpacking into multiple variables with `*` —

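If an iterable has more elements than there are variables to unpack into, Python raises a ValueError. So how do you unpack N elements from an iterable whose length may be larger?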

>>> record = ('Dave', '[email protected]', '773-555-1212', '847-555-1212')
>>> name, email, *phone_numbers = record
>>> name
'Dave'
>>> email
'[email protected]'
>>> phone_numbers
['773-555-1212', '847-555-1212']
>>>
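
The starred name can also sit in the middle or at the front of the target list, and it always receives a list (possibly empty). The sketch below illustrates this; the function name `drop_first_last` and the sample data are made up for the example.

```python
def drop_first_last(grades):
    """Average the middle grades, ignoring the first and the last one."""
    first, *middle, last = grades
    return sum(middle) / len(middle)

# The starred name always receives a list, even when nothing is left for it.
head, *tail = [10]

# Starred unpacking also works while iterating over tuples of varying length.
records = [('foo', 1, 2), ('bar', 'hello'), ('foo', 3, 4)]
tags = [tag for tag, *args in records]

print(drop_first_last([50, 80, 90, 100]))  # 85.0
print(head, tail)                          # 10 []
print(tags)                                # ['foo', 'bar', 'foo']
```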

— Building a fixed-size queue with deque(maxlen=N) —

During iteration or other processing, how do you keep a history of only the last few items?

>>> from collections import deque
>>> q = deque(maxlen=3)
>>> q.append(1)
>>> q.append(2)
>>> q.append(3)
>>> q
deque([1, 2, 3], maxlen=3)
>>> q.append(4)
>>> q
deque([2, 3, 4], maxlen=3)
>>> q.append(5)
>>> q
deque([3, 4, 5], maxlen=3)
>>> q = deque()
>>> q.append(1)
>>> q.append(2)
>>> q.append(3)
>>> q
deque([1, 2, 3])
>>> q.appendleft(4)
>>> q
deque([4, 1, 2, 3])
>>> q.pop()
3
>>> q
deque([4, 1, 2])
>>> q.popleft()
4
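
One way to put the bounded deque to work is to keep the lines that preceded a match while scanning text. This is a minimal sketch; the function name `search`, its parameters, and the sample lines are assumptions for illustration.

```python
from collections import deque

def search(lines, pattern, history=5):
    """Yield (matching_line, previous_lines) for each line containing pattern."""
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            # Copy the deque so later appends don't alter what was yielded.
            yield line, list(previous_lines)
        previous_lines.append(line)

text = ["alpha", "beta", "gamma python", "delta", "epsilon python"]
results = list(search(text, "python", history=2))
for line, prev in results:
    print(prev, "->", line)
```

Because the deque has maxlen set, the oldest line is discarded automatically on each append, so the generator never holds more than `history` lines of context.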

— Finding the largest or smallest N items in a collection —

How do you get a list of the N largest or smallest items in a collection?

import heapq
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(heapq.nlargest(3, nums)) # Prints [42, 37, 23]

portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]
cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
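# If you only need the single smallest or largest item, min() and max() are faster.
# Tip: why? Each makes one O(n) pass over the data with no heap to maintain.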

>>> nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
>>> import heapq
>>> heapq.heapify(nums) # reorders the list in place into a heap (not a full sort)
>>> nums                # the key property of a heap: heap[0] is always the smallest item
[-4, 2, 1, 23, 7, 2, 18, 23, 42, 37, 8]
>>>
>>> heapq.heappop(nums) # heappop() removes the smallest item (heap[0]), then
-4                      # moves the next-smallest item into its place
>>> heapq.heappop(nums)
1
>>> heapq.heappop(nums)
2

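>>> from heapq import heappush, heappop
>>> h = []
>>> heappush(h, (5, 'write code'))
>>> heappush(h, (7, 'release product'))
>>> heappush(h, (1, 'write spec'))
>>> heappush(h, (3, 'create tests'))
>>> heappop(h)
(1, 'write spec')
>>> class Item:          # minimal stand-in class; its instances do not support <
...     def __init__(self, name):
...         self.name = name
...
>>> # Tuples compare element by element, so the index in the second slot
>>> # breaks priority ties before the Item objects are ever compared
>>> a = (1, 0, Item('foo'))
>>> b = (5, 1, Item('bar'))
>>> c = (1, 2, Item('grok'))
>>> a < b
True
>>> a < c
True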
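
The (priority, index, item) tuples above are the building block of a simple priority queue. The class below is a sketch of that idea; the class name `PriorityQueue` and its method names are chosen for illustration and are not part of heapq.

```python
import heapq

class PriorityQueue:
    """Pops the item with the highest priority; ties pop in insertion order."""

    def __init__(self):
        self._queue = []
        self._index = 0   # insertion counter used as a tie-breaker

    def push(self, item, priority):
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1

    def pop(self):
        return heapq.heappop(self._queue)[-1]

q = PriorityQueue()
q.push('foo', 1)
q.push('bar', 5)
q.push('spam', 4)
q.push('grok', 1)
order = [q.pop() for _ in range(4)]
print(order)  # ['bar', 'spam', 'foo', 'grok']
```

The index guarantees that two entries never compare equal on the first two slots, so the items themselves never need to support comparison.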

— Mapping keys to multiple values in a dictionary —

How do you build a dictionary that maps each key to more than one value (a so-called multidict)?

from collections import defaultdict
d = defaultdict(list)
d['a'].append(1)
d['a'].append(2)
d['b'].append(4)    # defaultdict(<class 'list'>, {'a': [1, 2], 'b': [4]})
d = defaultdict(set)
d['a'].add(1)
d['a'].add(2)
d['b'].add(4)    # defaultdict(<class 'set'>, {'a': {1, 2}, 'b': {4}})

d = {} # A regular dictionary
d.setdefault('a', []).append(1)
d.setdefault('a', []).append(2)
d.setdefault('b', []).append(4)

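# pairs is any iterable of (key, value) tuples
d = {}
for key, value in pairs:
    if key not in d:
        d[key] = []
    d[key].append(value)    # append on every pass, not only when the key is new

d = defaultdict(list)
for key, value in pairs:
    d[key].append(value)
# The defaultdict version is clearly more concise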

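Tip: an OrderedDict takes about twice the memory of a regular dict, because it internally maintains a doubly linked list that keeps the keys in insertion order.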

— Calculating with dictionaries —

How do you perform calculations on dictionary data (such as finding the minimum or maximum, or sorting)?

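Tip 1: use zip() to swap the keys and values, then pass the result to sorted() to order the dictionary's data.

Tip 2: note that zip() creates an iterator that can only be consumed once.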

prices = {'ACME': 45.23,'AAPL': 612.78,'IBM': 205.55,'HPQ': 37.20,'FB': 10.75}

min_price = min(zip(prices.values(), prices.keys()))
# min_price is (10.75, 'FB')
max_price = max(zip(prices.values(), prices.keys()))
# max_price is (612.78, 'AAPL')

min(prices, key=lambda k: prices[k]) # Returns 'FB'
max(prices, key=lambda k: prices[k]) # Returns 'AAPL'
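
Both tips can be seen in a few lines. The sketch below reuses the `prices` data from this section; the helper variable names are made up for the example.

```python
prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75}

# Invert to (value, key) pairs so that sorting compares prices first.
prices_sorted = sorted(zip(prices.values(), prices.keys()))
print(prices_sorted[0])    # (10.75, 'FB')
print(prices_sorted[-1])   # (612.78, 'AAPL')

# A zip object is an iterator: the first consumer exhausts it.
prices_and_names = zip(prices.values(), prices.keys())
lowest = min(prices_and_names)
try:
    max(prices_and_names)          # nothing is left to iterate over
    second_pass_failed = False
except ValueError:
    second_pass_failed = True
print(lowest, second_pass_failed)  # (10.75, 'FB') True
```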

# A little-known feature of the keys() view is that it supports set operations; the values() view does not
a = {'x' : 1,'y' : 2,'z' : 3}
b = {'w' : 10,'x' : 11,'y' : 2}
# Find keys in common
a.keys() & b.keys() # { 'x', 'y' }
# Find keys in a that are not in b
a.keys() - b.keys() # { 'z' }
# Find (key,value) pairs in common
a.items() & b.items() # { ('y', 2) }

# Make a new dictionary with certain keys removed
c = {key:a[key] for key in a.keys() - {'z', 'w'}}
# c is {'x': 1, 'y': 2}

— Removing duplicates from a sequence while preserving order —

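How do you eliminate duplicate values in a sequence while keeping the remaining items in their original order?

set() removes duplicates but does not preserve order.

# When the sequence items are hashable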
def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)
            
>>> a = [1, 5, 2, 1, 9, 1, 5, 10]
>>> list(dedupe(a))
[1, 5, 2, 9, 10]

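# When the sequence items are not hashable (such as dicts), pass a key function that converts each item to a hashable value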
def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

>>> a = [ {'x':1, 'y':2}, {'x':1, 'y':3}, {'x':1, 'y':2}, {'x':2, 'y':4}]
>>> list(dedupe(a, key=lambda d: (d['x'],d['y'])))
[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
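
The key function also lets you deduplicate on a single field instead of the whole record. The sketch below repeats the `dedupe` generator so that it runs on its own. The `dict.fromkeys` shortcut at the end is a different technique, shown here only as an alternative for hashable items on Python 3.7+, where dicts keep insertion order.

```python
def dedupe(items, key=None):
    """Yield items in order, skipping any whose key value was already seen."""
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

a = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]

# Deduplicate on the 'x' field only: the first record for each x wins.
by_x = list(dedupe(a, key=lambda d: d['x']))
print(by_x)  # [{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]

# Alternative for hashable items: dict keys are unique and ordered.
unique = list(dict.fromkeys([1, 5, 2, 1, 9, 1, 5, 10]))
print(unique)  # [1, 5, 2, 9, 10]
```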

— Naming slices —

Your program has turned into an unreadable mess of hardcoded slice indices, and you want to clean it up.

#         0123456789012345678901234567890123456789012345678901234567890
record = '....................100 .......513.25 ..........'
cost = int(record[20:23]) * float(record[31:37])

SHARES = slice(20, 23)
PRICE = slice(31, 37)
cost = int(record[SHARES]) * float(record[PRICE])

If you have a slice object s, you can read its s.start, s.stop, and s.step attributes to learn more about it.
>>> s = slice(5, 50, 2)
>>> s.start
5
>>> s.stop
50
>>> s.step
2

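Call a slice's indices(size) method to map it onto a sequence of a given size. It returns a (start, stop, step) tuple whose values are clipped to fit within bounds, which avoids IndexError when you index with them.
>>> a = slice(5, 50, 2)
>>> s = 'HelloWorld'
>>> a.indices(len(s))
(5, 10, 2)
>>> for i in range(*a.indices(len(s))):
...     print(s[i])
...
W
r
d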

— Finding the most frequent items in a sequence —

How do you find the items that occur most often in a sequence?

words = [
'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
'my', 'eyes', "you're", 'under'
]
from collections import Counter
word_counts = Counter(words)    # Counter() returns a dict-like object mapping each word to its count
# The three most common words
top_three = word_counts.most_common(3)
print(top_three)
# Outputs [('eyes', 8), ('the', 5), ('look', 4)]

morewords = ['why','are','you','not','looking','in','my','eyes']

A little-known feature of Counter instances is that they combine easily with mathematical operations.
>>> a = Counter(words)
>>> b = Counter(morewords)
>>> a
Counter({'eyes': 8, 'the': 5, 'look': 4, 'into': 3, 'my': 3, 'around': 2,
"you're": 1, "don't": 1, 'under': 1, 'not': 1})
>>> b
Counter({'eyes': 1, 'looking': 1, 'are': 1, 'in': 1, 'not': 1, 'you': 1,
'my': 1, 'why': 1})
>>> # Combine counts
>>> c = a + b
>>> c
Counter({'eyes': 9, 'the': 5, 'look': 4, 'my': 4, 'into': 3, 'not': 2,
'around': 2, "you're": 1, "don't": 1, 'in': 1, 'why': 1,
'looking': 1, 'are': 1, 'under': 1, 'you': 1})
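
Addition is not the only operation. Counters can also be subtracted and updated in place, and a missing key reads as zero. This is a small sketch with made-up word lists, kept short so the results are easy to check by hand.

```python
from collections import Counter

a = Counter(['eyes', 'eyes', 'the', 'my'])
b = Counter(['eyes', 'my', 'why'])

# Subtraction keeps only the counts that stay positive.
diff = a - b
print(diff)                 # Counter({'eyes': 1, 'the': 1})

# update() adds new observations to an existing Counter in place.
a.update(['why', 'the'])
print(a['the'], a['why'])   # 2 1

# Looking up a key that was never counted returns 0 instead of raising KeyError.
print(a['missing'])         # 0
```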

— Sorting a list of dictionaries by a common key —

You have a list of dictionaries and want to sort it by one or more of the dictionary fields.

rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
    {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
    {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]
from operator import itemgetter
rows_by_fname = sorted(rows, key=itemgetter('fname'))
"""
[{'fname': 'Big', 'uid': 1004, 'lname': 'Jones'},
{'fname': 'Brian', 'uid': 1003, 'lname': 'Jones'},
{'fname': 'David', 'uid': 1002, 'lname': 'Beazley'},
{'fname': 'John', 'uid': 1001, 'lname': 'Cleese'}]
"""
rows_by_lfname = sorted(rows, key=itemgetter('lname','fname'))
"""
[{'fname': 'David', 'uid': 1002, 'lname': 'Beazley'},
{'fname': 'John', 'uid': 1001, 'lname': 'Cleese'},
{'fname': 'Big', 'uid': 1004, 'lname': 'Jones'},
{'fname': 'Brian', 'uid': 1003, 'lname': 'Jones'}]
"""
# itemgetter() can sometimes be replaced by a lambda expression, but the itemgetter() version tends to run slightly faster (the same applies to attrgetter()).
rows_by_lfname = sorted(rows, key=lambda r: (r['lname'],r['fname']))
# The same technique also works with functions such as min() and max()
>>> min(rows, key=itemgetter('uid'))
{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}
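
For objects rather than dictionaries, `operator.attrgetter()` plays the same role as `itemgetter()`. The sketch below assumes a small hypothetical `User` class invented for the example.

```python
from operator import attrgetter

class User:
    """Hypothetical record type used only for this illustration."""
    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name

    def __repr__(self):
        return f'User({self.user_id}, {self.name!r})'

users = [User(23, 'Cleese'), User(3, 'Jones'), User(99, 'Beazley')]

# Sort by a single attribute.
by_id = sorted(users, key=attrgetter('user_id'))
# Sort by several attributes: name first, then user_id to break ties.
by_name = sorted(users, key=attrgetter('name', 'user_id'))
# The same key works with min() and max().
smallest = min(users, key=attrgetter('user_id'))

print(by_id)     # [User(3, 'Jones'), User(23, 'Cleese'), User(99, 'Beazley')]
print(smallest)  # User(3, 'Jones')
```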

— Grouping records by a field —

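You have a sequence of dictionaries or instances, and you want to iterate over it in groups based on the value of a particular field, such as date.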

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '2122 N CLARK', 'date': '07/03/2012'},
    {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
    {'address': '1060 W ADDISON', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    {'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]
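from operator import itemgetter
from itertools import groupby    # groupby() only groups consecutive items with equal keys
# Sort by the desired field first
rows.sort(key=itemgetter('date'))
# Iterate in groups
for date, items in groupby(rows, key=itemgetter('date')):
    print(date)
    # The inner loop must be nested: each group iterator is only valid
    # until groupby() advances to the next group
    for i in items:
        print(' ', i)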

07/01/2012
  {'date': '07/01/2012', 'address': '5412 N CLARK'}
  {'date': '07/01/2012', 'address': '4801 N BROADWAY'}
07/02/2012
  {'date': '07/02/2012', 'address': '5800 E 58TH'}
  {'date': '07/02/2012', 'address': '5645 N RAVENSWOOD'}
  {'date': '07/02/2012', 'address': '1060 W ADDISON'}
07/03/2012
  {'date': '07/03/2012', 'address': '2122 N CLARK'}
07/04/2012
  {'date': '07/04/2012', 'address': '5148 N CLARK'}
  {'date': '07/04/2012', 'address': '1039 W GRANVILLE'}
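
If the goal is random access to each group rather than a single sorted pass, one alternative is to collect the records into a `defaultdict(list)`. This is a sketch of that swapped-in technique on a shortened copy of the data; the name `rows_by_date` is an assumption for the example. No sorting is needed, at the cost of holding every group in memory.

```python
from collections import defaultdict

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
]

# Build a multidict: one list of records per date, in input order.
rows_by_date = defaultdict(list)
for row in rows:
    rows_by_date[row['date']].append(row)

# Any group can now be looked up directly by its date.
for r in rows_by_date['07/01/2012']:
    print(r)
```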

— Filtering sequence elements —

You have a sequence of data and want to extract the values you need, or shorten the sequence, according to some criteria.

values = ['1', '2', '-3', '-', '4', 'N/A', '5']
def is_int(val):
	try:
		x = int(val)
		return True
	except ValueError:
		return False
ivals = list(filter(is_int, values))
print(ivals)
# Outputs ['1', '2', '-3', '4', '5']

addresses = [
    '5412 N CLARK',
    '5148 N CLARK',
    '5800 E 58TH',
    '2122 N CLARK',
    '5645 N RAVENSWOOD',
    '1060 W ADDISON',
    '4801 N BROADWAY',
    '1039 W GRANVILLE',
]
counts = [ 0, 3, 10, 4, 1, 7, 6, 1]

Suppose you want to output every address whose corresponding count is greater than 5. You can do it like this:
>>> from itertools import compress
>>> more5 = [n > 5 for n in counts]
>>> more5
[False, False, True, False, False, True, True, False]
>>> list(compress(addresses, more5))
['5800 E 58TH', '1060 W ADDISON', '4801 N BROADWAY']
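
For simple criteria, a list comprehension or generator expression is often enough, and a conditional expression can replace unwanted values instead of dropping them. The sample list below is made up for illustration.

```python
mylist = [1, 4, -5, 10, -7, 2, 3, -1]

# Keep only the values that satisfy the condition.
positives = [n for n in mylist if n > 0]

# Replace failing values instead of discarding them (clip negatives to 0).
clipped = [n if n > 0 else 0 for n in mylist]

# A generator expression filters lazily, without building an intermediate list.
total_pos = sum(n for n in mylist if n > 0)

print(positives)  # [1, 4, 10, 2, 3]
print(clipped)    # [1, 4, 0, 10, 0, 2, 3, 0]
print(total_pos)  # 20
```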

— Extracting a subset of a dictionary —

You want to build a dictionary that is a subset of another dictionary.

prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}
# Make a dictionary of all prices over 200
p1 = {key: value for key, value in prices.items() if value > 200}
# Make a dictionary of tech stocks
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}
p2 = {key: value for key, value in prices.items() if key in tech_names}

p1 = dict((key, value) for key, value in prices.items() if value > 200)
# The dict comprehension states the intent more clearly and also runs faster (in this example it tested almost twice as fast as the dict() version)
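
Because the keys() view supports set operations, the tech-stock subset can also be written by intersecting the keys with the set of wanted names first. This is only a sketch of an alternative spelling, with no claim about its speed relative to the versions above.

```python
prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75}
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}

# Intersect the key view with the wanted names, then look each price up.
# 'MSFT' is silently skipped because it is not a key of prices.
p2 = {key: prices[key] for key in prices.keys() & tech_names}
print(p2 == {'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20})  # True
```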

— Mapping names to sequence elements —

You have code that accesses list or tuple elements by index, which sometimes makes it hard to read, so you want to access the elements by name instead.

# To illustrate, here is code that uses ordinary tuples
def compute_cost(records):
	total = 0.0
	for rec in records:
		total += rec[1] * rec[2]
	return total

# Index access tends to obscure intent and ties the code closely to the record layout. Here is a version that uses a named tuple
from collections import namedtuple
Stock = namedtuple('Stock', ['name', 'shares', 'price'])
def compute_cost(records):
	total = 0.0
	for rec in records:
		s = Stock(*rec)
		total += s.shares * s.price
	return total

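Tip: another use of a named tuple is as a replacement for a dictionary, since a dictionary needs more memory to store. Unlike a dictionary, however, a named tuple is immutable.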

>>> s = Stock('ACME', 100, 123.45)
>>> s
Stock(name='ACME', shares=100, price=123.45)
>>> s.shares = 75
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

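If you really do need to change an attribute, use the named tuple instance's _replace() method. It creates a brand-new named tuple in which the given fields are replaced by new values.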
>>> s = s._replace(shares=75)
>>> s
Stock(name='ACME', shares=75, price=123.45)

Another handy use of _replace() is populating named tuples that have optional or missing fields.
from collections import namedtuple
Stock = namedtuple('Stock', ['name', 'shares', 'price', 'date', 'time'])
# Create a prototype instance
stock_prototype = Stock('', 0, 0.0, None, None)
# Function to convert a dictionary to a Stock
def dict_to_stock(s):
	return stock_prototype._replace(**s)

>>> a = {'name': 'ACME', 'shares': 100, 'price': 123.45}
>>> dict_to_stock(a)
Stock(name='ACME', shares=100, price=123.45, date=None, time=None)
>>> b = {'name': 'ACME', 'shares': 100, 'price': 123.45, 'date': '12/17/2012'}
>>> dict_to_stock(b)
Stock(name='ACME', shares=100, price=123.45, date='12/17/2012', time=None)
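
On Python 3.7 and later, `namedtuple()` also accepts a `defaults` argument, which can stand in for the prototype instance when the optional fields are the trailing ones. The sketch below is an alternative to the prototype approach, not the method used above; it assumes the same field names.

```python
from collections import namedtuple

# defaults applies to the rightmost fields: here, date and time.
Stock = namedtuple('Stock', ['name', 'shares', 'price', 'date', 'time'],
                   defaults=[None, None])

def dict_to_stock(d):
    """Build a Stock from a dict; omitted trailing fields fall back to None."""
    return Stock(**d)

s = dict_to_stock({'name': 'ACME', 'shares': 100, 'price': 123.45})
print(s)  # Stock(name='ACME', shares=100, price=123.45, date=None, time=None)

# _asdict() converts back to a mapping when dictionary access is needed.
print(s._asdict()['shares'])  # 100
```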

— Combining multiple dictionaries or mappings —

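You have several dictionaries or mappings and want to logically combine them into a single mapping for operations such as looking up values or checking whether keys exist.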

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}
from collections import ChainMap    # ChainMap returns the value from the first mapping that has the key
c = ChainMap(a,b)
print(c['x']) # Outputs 1 (from a)
print(c['y']) # Outputs 2 (from b)
print(c['z']) # Outputs 3 (from a)

>>> len(c)
3
>>> list(c.keys())
['x', 'y', 'z']
>>> list(c.values())
[1, 2, 3]
>>> c['z'] = 10
>>> c['w'] = 40
>>> del c['x']
>>> a
{'w': 40, 'z': 10}
>>> del c['y']
Traceback (most recent call last):
...
KeyError: "Key not found in the first mapping: 'y'"
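
A ChainMap is a view over the original dictionaries, whereas merging with update() makes a copy. The sketch below contrasts the two and also shows `new_child()`, which layers a fresh mapping on top so writes do not touch the originals. The variable names are made up for the example.

```python
from collections import ChainMap

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}

# Merging with update() copies the data: later changes to a are not seen.
merged = dict(b)
merged.update(a)

# A ChainMap only references a and b, so it reflects later changes.
chained = ChainMap(a, b)

a['x'] = 42
print(merged['x'], chained['x'])   # 1 42

# new_child() adds an empty mapping in front; writes land there, not in a.
scoped = chained.new_child()
scoped['x'] = 100
print(scoped['x'], scoped.parents['x'], a['x'])   # 100 42 42
```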
