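How do you remove duplicate values from a sequence while preserving the order of the elements?
- Deduplicate a list, preserving order
Example: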
def dedupe(items):
    S = set()  # values that have already been yielded
    for item in items:
        if item not in S:
            yield item
            S.add(item)

a = [1, 5, 2, 1, 9, 1, 5, 10]
print(list(dedupe(a)))  # [1, 5, 2, 9, 10]
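For hashable items there is also a shorter idiom, sketched here as an alternative to the generator rather than a replacement for it. Since Python 3.7 a dict keeps insertion order, so `dict.fromkeys` can deduplicate in one pass. The helper name `dedupe_hashable` is hypothetical and not part of the original notes.

```python
def dedupe_hashable(items):
    """Return the unique items in first-seen order (items must be hashable)."""
    # dict keys are unique and, since Python 3.7, keep insertion order
    return list(dict.fromkeys(items))


a = [1, 5, 2, 1, 9, 1, 5, 10]
print(dedupe_hashable(a))  # [1, 5, 2, 9, 10]
```

Unlike the generator, this builds the whole result at once, so it suits small in-memory sequences better than large or streaming input.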
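- Deduplicate dictionaries, preserving order (dicts are not hashable, so a key function maps each item to a hashable value)
Example: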
def dedupe(items, key=None):
    S = set()  # hashable values that have already been seen
    for item in items:
        # Without a key, the item itself must be hashable
        val = item if key is None else key(item)
        if val not in S:
            yield item
            S.add(val)

a = [
    {'x': 1, 'y': 2},
    {'x': 1, 'y': 3},
    {'x': 1, 'y': 2},
    {'x': 2, 'y': 4}
]
# Drop elements whose 'x' and 'y' values are both the same
mlist = list(dedupe(a, key=lambda d: (d['x'], d['y'])))
print(mlist)
# [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
# Drop elements whose 'x' value is the same
ylist = list(dedupe(a, key=lambda d: d['x']))
print(ylist)  # [{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
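Because dedupe is a generator, it should work on any iterable, not only on lists. The sketch below applies it to the lines of a file-like object. `io.StringIO` and its sample contents are stand-ins chosen for illustration, assuming a real file opened with `open(...)` would behave the same way.

```python
import io


def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)


# io.StringIO stands in for a real file object here
fake_file = io.StringIO("alpha\nbeta\nalpha\ngamma\nbeta\n")

# Lines are consumed one at a time, so the file is never fully loaded
unique_lines = [line.rstrip("\n") for line in dedupe(fake_file)]
print(unique_lines)  # ['alpha', 'beta', 'gamma']
```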
- Simple deduplication when element order does not matter
mlist = [1, 5, 3, 6, 8, 0, 3, 6, 8, 2, 4]
print(set(mlist))
# {0, 1, 2, 3, 4, 5, 6, 8}
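A set makes no promise about iteration order, so the printed order above should not be relied on. If a predictable order is wanted and the original order is irrelevant, one option is to sort the set, as in this small sketch (the name `unique_sorted` is illustrative):

```python
mlist = [1, 5, 3, 6, 8, 0, 3, 6, 8, 2, 4]

# set() removes the duplicates, sorted() returns a list in ascending order
unique_sorted = sorted(set(mlist))
print(unique_sorted)  # [0, 1, 2, 3, 4, 5, 6, 8]
```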
Named slices
Example:
word = '9683647163821230739592020257202'
countdays = int(word[11])*int(word[12:14])*int(word[14:16])
print(countdays) #720
With hard-coded indices like these, it is hard to tell what is being multiplied by what
- Improve it with named slices
Example:
word = '9683647163821230739592020257202'
years = slice(11,12)
month = slice(12,14)
days = slice(14,16)
countdays = int(word[years])*int(word[month])*int(word[days])
print(countdays) #720
Tip: if you have a slice object s, you can read its s.start, s.stop, and s.step attributes to get more information about it. For example:
s = slice(5, 50, 2)
print(s.start)  # 5
print(s.stop)   # 50
print(s.step)   # 2
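A slice object also has an `indices(length)` method, which maps the slice onto a sequence of a given length and clips the bounds so they stay in range. The sketch below uses an arbitrary sample string, `'HelloWorld'`, chosen only for illustration.

```python
s = slice(5, 50, 2)
text = 'HelloWorld'  # sample sequence of length 10

# indices(length) returns (start, stop, step) clipped to that length
start, stop, step = s.indices(len(text))
print(start, stop, step)  # 5 10 2

# The clipped values can be passed straight to range() without an IndexError
for i in range(start, stop, step):
    print(text[i])  # prints W, r, d on separate lines
```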
Finding the most frequent elements in a sequence
- The collections.Counter class is designed for exactly this kind of problem; it even has a handy most_common() method
Example:
words = [
    'look', 'into', 'my', 'eyes', 'look',
    'into', 'my', 'eyes', 'the', 'eyes', 'the',
    'eyes', 'the', 'eyes', 'not', 'around',
    'the', 'eyes', "don't", 'look', 'around',
    'the', 'eyes', 'look', 'into', 'my', 'eyes',
    "you're", 'under'
]
from collections import Counter
word_counts = Counter(words)
# The 3 most frequent words
top_three = word_counts.most_common(3)
print(top_three)  # [('eyes', 8), ('the', 5), ('look', 4)]
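A Counter behaves like a dictionary that maps each element to its count, and two Counters can be combined with `+` and `-`. The sketch below uses two short word lists invented for illustration, not the list from the example above.

```python
from collections import Counter

words = ['look', 'into', 'my', 'eyes', 'look', 'eyes']
more_words = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']

a = Counter(words)
b = Counter(more_words)

print(a['eyes'])     # 2
print(a['missing'])  # 0, a missing key counts as zero instead of raising KeyError

combined = a + b     # add the counts together
print(combined['eyes'])  # 3

diff = a - b         # subtract counts; entries that reach zero or below are dropped
print(diff['eyes'])  # 1
print('my' in diff)  # False
```

To add to the counts in place rather than build a new Counter, `a.update(more_words)` can be used instead.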