Data Structures and Algorithms 05: Deduplication / Named Slices / Most Frequent Items in a Sequence

How do you remove duplicate values from a sequence while preserving the order of the remaining elements?

Deduplicating a list while preserving order (this works when the elements are hashable)
Example:

def dedupe(items):
    seen = set()  # values that have already been yielded
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)


a = [1, 5, 2, 1, 9, 1, 5, 10]
print(list(dedupe(a)))  # [1, 5, 2, 9, 10]
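
For hashable items, a shorter alternative to the generator above is to lean on dict key ordering. This is only a sketch: the helper name dedupe_with_dict is made up for illustration, and it assumes Python 3.7 or later, where dicts are guaranteed to keep insertion order. Unlike the generator, it builds the whole result at once instead of yielding lazily.

```python
def dedupe_with_dict(items):
    """Return the unique items as a list, in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so the first occurrence of each value decides its position.
    return list(dict.fromkeys(items))


a = [1, 5, 2, 1, 9, 1, 5, 10]
result = dedupe_with_dict(a)
print(result)  # [1, 5, 2, 9, 10]
```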

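Deduplicating a list of dictionaries while preserving order
Dictionaries are not hashable, so a key function is used to turn each item into a hashable value for the duplicate check.
Example: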

def dedupe(items, key=None):
    seen = set()  # hashable keys that have already been yielded
    for item in items:
        # Compare the item itself unless a key function is given
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)


a = [
    {'x': 1, 'y': 2},
    {'x': 1, 'y': 3},
    {'x': 1, 'y': 2},
    {'x': 2, 'y': 4}
]
# Remove items whose 'x' and 'y' values are both the same
mlist = list(dedupe(a, key=lambda d: (d['x'], d['y'])))
print(mlist)
# [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]

# Remove items that have the same 'x' value
ylist = list(dedupe(a, key=lambda d: d['x']))
print(ylist)  # [{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
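
The key argument is not limited to dictionaries. As a rough illustration, the same function can deduplicate strings case-insensitively or handle unhashable lists by converting them to tuples. The sample data below (names, pairs) is invented for this sketch.

```python
def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)


# Case-insensitive dedupe: the first spelling that appears is kept.
names = ['Alice', 'alice', 'Bob', 'ALICE', 'bob']
unique_names = list(dedupe(names, key=str.lower))
print(unique_names)  # ['Alice', 'Bob']

# Lists are unhashable, so compare them through their tuple form.
pairs = [[1, 2], [3, 4], [1, 2]]
unique_pairs = list(dedupe(pairs, key=tuple))
print(unique_pairs)  # [[1, 2], [3, 4]]
```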

Simple deduplication when element order does not matter

mlist = [1,5,3,6,8,0,3,6,8,2,4]
print(set(mlist))
# {0, 1, 2, 3, 4, 5, 6, 8}  (a set has no guaranteed order)
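
Because a set does not promise any particular order, one way to get a reproducible result is to sort the unique values. A minimal sketch, assuming the elements can be compared with each other:

```python
mlist = [1, 5, 3, 6, 8, 0, 3, 6, 8, 2, 4]

# set() drops the duplicates, sorted() returns a list in ascending order.
unique_sorted = sorted(set(mlist))
print(unique_sorted)  # [0, 1, 2, 3, 4, 5, 6, 8]
```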

Named slices

Example:

word = '9683647163821230739592020257202'

countdays = int(word[11])*int(word[12:14])*int(word[14:16])
print(countdays)  #720

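With the hard-coded indices above, it is hard to tell which values are being multiplied together.

Improving the code with named slices
Example: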

word = '9683647163821230739592020257202'
years = slice(11,12)
month = slice(12,14)
days = slice(14,16)

countdays = int(word[years])*int(word[month])*int(word[days])
print(countdays)  #720
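
Named slices are especially handy for fixed-width records. The record layout below (a 10-character name, a 5-character share count, a 10-character price) is a made-up example, not something from the original text:

```python
# Build a hypothetical fixed-width record: name(10) + shares(5) + price(10)
record = f"{'ACME':<10}{100:>5}{513.25:>10.2f}"

# Name each column once instead of repeating magic numbers
NAME = slice(0, 10)
SHARES = slice(10, 15)
PRICE = slice(15, 25)

name = record[NAME].strip()
# int() and float() ignore the surrounding padding spaces
cost = int(record[SHARES]) * float(record[PRICE])
print(name, cost)  # ACME 51325.0
```

If the layout changes, only the three slice definitions need to be updated.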

Tip: if you have a slice object s, you can read its s.start, s.stop, and s.step attributes to get more information about it. For example:

s = slice(5, 50, 2)
print(s.start)  #5
print(s.stop)  #50
print(s.step)  #2
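
A slice object also has an indices(length) method, which clips the slice to a sequence of the given length and returns a (start, stop, step) tuple. The sketch below uses an arbitrary sample string to show how a slice that runs past the end is brought back in range:

```python
s = slice(5, 50, 2)
text = 'HelloWorld'  # sample data, only 10 characters long

# indices() clips stop=50 down to the length of the sequence
start, stop, step = s.indices(len(text))
print((start, stop, step))  # (5, 10, 2)

# The clipped values are safe to pass to range()
chars = [text[i] for i in range(start, stop, step)]
print(chars)  # ['W', 'r', 'd']
```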

Most frequent items in a sequence

The collections.Counter class is designed for exactly this kind of problem, and it even has a handy most_common() method.
Example:

words = [
    'look', 'into', 'my', 'eyes', 'look',
    'into', 'my', 'eyes','the', 'eyes', 'the',
    'eyes', 'the', 'eyes', 'not', 'around',
    'the','eyes', "don't", 'look', 'around',
    'the', 'eyes', 'look', 'into','my', 'eyes',
    "you're", 'under'
]

from collections import Counter

word_counts = Counter(words)
# The 3 most frequent words
top_three = word_counts.most_common(3)
print(top_three)      #  [('eyes', 8), ('the', 5), ('look', 4)]
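
A Counter behaves like a dictionary that maps each item to its count, so it can also be queried, updated, and combined. The short word lists below are invented for this sketch:

```python
from collections import Counter

words = ['look', 'into', 'my', 'eyes', 'look', 'eyes']
more_words = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']

word_counts = Counter(words)
print(word_counts['eyes'])     # 2
print(word_counts['missing'])  # 0, missing keys count as zero

# update() adds the counts from another iterable in place
word_counts.update(more_words)
print(word_counts['eyes'])     # 3

# Counters also support + and -; results keep only positive counts
a = Counter(words)
b = Counter(more_words)
combined = a + b
diff = a - b
print(diff)  # Counter({'look': 2, 'into': 1, 'eyes': 1})
```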
