Removing duplicates from a sequence while preserving order

Sequences often contain duplicate values, and sometimes we need to eliminate them.

Solution: build a set from the sequence, since a set holds no duplicate entries. For example:

>>> a = [1, 5, 2, 1, 9, 1, 5, 10]
>>> set(a)
{1, 2, 10, 5, 9}

Building a set loses the original relative order of the elements. A plain set also has a second limitation: its elements must be hashable. If the sequence contains unhashable objects such as list or dict, passing it to the set constructor raises the following error:

>>> c = [{'x': 1, 'y': 2}, {'x': 2, 'y': 3}]
>>> set(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

To solve the ordering problem, we can write a function that eliminates duplicates from a sequence of hashable items while preserving their relative order.

def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

>>> a = [1, 5, 2, 1, 9, 1, 5, 10]
>>> list(dedupe(a))
[1, 5, 2, 9, 10]

The function is a generator that walks the original items in order. On each iteration it checks whether the element is already in the seen set. If it is not, the element is yielded and then added to the set. If it is already in the set, it is skipped and the traversal continues.
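Because dedupe is a generator, it produces results lazily and works on any iterable, not only lists. The following is a minimal sketch of that behavior, assuming the dedupe definition above; the endless input built with itertools.cycle and the cutoff via itertools.islice are illustrative choices, not part of the original recipe.

```python
from itertools import cycle, islice

def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

# cycle() repeats its input forever, so list(dedupe(endless)) would never finish.
# islice() stops pulling from the generator once three unique values have appeared.
endless = cycle([3, 1, 3, 2, 1])
first_three = list(islice(dedupe(endless), 3))
print(first_three)  # [3, 1, 2]
```

Note that the seen set grows with the number of distinct items, so memory use is proportional to the unique values encountered.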

This version only works when every item in the sequence is hashable. Otherwise the set operations fail: the membership test item not in seen (and likewise seen.add) raises a TypeError for an unhashable item.
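The hashability requirement can be seen by passing a list of lists to the first version of dedupe. This is a small sketch; the rows data is made up for illustration, and the exact wording of the error message may vary between Python versions.

```python
def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

# Lists are unhashable, so they cannot be looked up in or added to a set.
rows = [[1, 2], [1, 2], [3, 4]]
error_message = None
try:
    list(dedupe(rows))
except TypeError as exc:
    # Raised as soon as the first item is checked against the set.
    error_message = str(exc)
    print(error_message)
```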

We can improve the implementation above so that it also handles unhashable objects (such as dict), as follows:

def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

>>> a = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
>>> list(dedupe(a, key=lambda d: (d['x'], d['y'])))
[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
>>> list(dedupe(a, key=lambda d: d['x']))
[{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
>>>

The key parameter works like the key argument of max and min. In practice it is usually supplied as a lambda. When key is not None, val is the value returned by that function, and it is val, not the item itself, that is checked against and stored in the seen set. The key function must therefore return a hashable value.
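The key argument does not have to be a lambda; any callable that returns a hashable value will do. A brief sketch with two built-in callables follows, assuming the key-aware dedupe above; the rows and names data are hypothetical.

```python
def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

# tuple() turns each unhashable list into a hashable tuple for comparison.
rows = [[1, 2], [3, 4], [1, 2]]
unique_rows = list(dedupe(rows, key=tuple))
print(unique_rows)  # [[1, 2], [3, 4]]

# str.lower makes the comparison case-insensitive; the first spelling seen is kept.
names = ['Alice', 'bob', 'ALICE', 'Bob']
unique_names = list(dedupe(names, key=str.lower))
print(unique_names)  # ['Alice', 'bob']
```

In both cases the original items are yielded unchanged; the key only decides which items count as duplicates.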

Origin www.cnblogs.com/jeffrey-yang/p/11290841.html