List deduplication in Python: methods and a performance comparison

Copyright notice: this is an original article by the author and may not be reproduced without the author's permission. https://blog.csdn.net/u010339879/article/details/82291878
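I. List deduplication methods in Python and how they perform
1. Below is a summary of ways to deduplicate a list; some of them are adapted from other blog posts.

Out of curiosity, I measured how efficient each one is.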


#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""
@author: Frank 
@contact: [email protected]
@file: test_duration_time.py
@time: 2018/8/26 5:23 PM

Discusses ways to deduplicate a list.

"""
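import random

import pandas as pd

# fn_timer is a timing decorator from the author's own tools module (not shown here).
from tools import fn_timer

# Length of the generated list; 10_0000 is the same number as 100_000.
Count = 10_0000

# Random integers in [-5, 5], so the list holds at most 11 distinct values.
l = [random.randint(-5, 5) for i in range(Count)]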


# print(l)


@fn_timer
def fun_set(l):
    return list(set(l))


@fn_timer
def fun_comprehension(l):
    """
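    Deduplicate with a list comprehension.
    :param l: the list to deduplicate
    :return: a new list holding the first occurrence of each value
    """
    ret = []

    # The comprehension is used only for its side effect of appending to ret.
    [ret.append(i) for i in l if i not in ret]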

    return ret


@fn_timer
def fun_dict_fromkeys(l):
    l2 = dict.fromkeys(l).keys()

    return list(l2)


@fn_timer
def fun_dict_comprehension(l):
    l2 = dict.fromkeys(l)

    return [key for key in l2]




@fn_timer
def fun_series(l):
    s = pd.Series(l)
    s.drop_duplicates(inplace=True)
    return s.tolist()


def test_duration_time():
    ret1 = fun_set(l)
    ret2 = fun_comprehension(l)
    ret3 = fun_dict_fromkeys(l)

    ret4 = fun_dict_comprehension(l)

    ret5 = fun_series(l)

    print(ret1)
    print(ret2)
    print(ret3)
    print(ret4)
    print(ret5)


if __name__ == '__main__':
    test_duration_time()
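A brief description of these functions:

fun_set calls set() to remove duplicates, then converts the result back to a list.

fun_comprehension uses a list comprehension to walk the list and append each value it has not seen yet to a second list.

fun_dict_fromkeys builds a dictionary with dict.fromkeys(), then takes its keys and converts them to a list.

fun_dict_comprehension is a variation on fun_dict_fromkeys: instead of calling list() on the keys, it builds the list with a list comprehension.

fun_series deduplicates through a pandas Series (drop_duplicates) and converts the resulting Series to a list.

The test data is a list of random integers generated with random; Count controls the length of that list.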
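The script imports fn_timer from a local tools module that the post does not show. A minimal sketch of what such a decorator might look like is given below. It is an assumption, not the author's code: the message format is copied from the output in the results, and time.perf_counter() is used here although the original may well have used time.time().

```python
import functools
import time


def fn_timer(func):
    """Hypothetical stand-in for tools.fn_timer: print how long each call takes."""

    @functools.wraps(func)  # keep the wrapped function's name for the report
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__}  total running time {elapsed} seconds")
        return result

    return wrapper


@fn_timer
def fun_set(items):
    return list(set(items))


unique = fun_set([1, 2, 2, 3, 3, 3])
```

Each decorated call prints one line and still returns the function's normal result, so the decorated functions can be used exactly like the undecorated ones.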

The test results are as follows:
Count = 10000


fun_set  total running time 0.00026798248291015625 seconds
fun_comprehension  total running time 0.0012478828430175781 seconds
fun_dict_fromkeys  total running time 0.0003228187561035156 seconds
fun_dict_comprehension  total running time 0.0003199577331542969 seconds
fun_series  total running time 0.008519887924194336 seconds
[0, 1, 2, 3, 4, 5, -1, -5, -4, -3, -2]
[0, -4, 1, -2, -1, -3, 4, 5, -5, 2, 3]
[0, -4, 1, -2, -1, -3, 4, 5, -5, 2, 3]
[0, -4, 1, -2, -1, -3, 4, 5, -5, 2, 3]
[0, -4, 1, -2, -1, -3, 4, 5, -5, 2, 3]

Count = 100000

fun_set  total running time 0.002073049545288086 seconds
fun_comprehension  total running time 0.009258031845092773 seconds
fun_dict_fromkeys  total running time 0.002582073211669922 seconds
fun_dict_comprehension  total running time 0.0025658607482910156 seconds
fun_series  total running time 0.028805017471313477 seconds
[0, 1, 2, 3, 4, 5, -1, -5, -4, -3, -2]
[-2, 1, 2, -3, -1, 5, 3, -4, -5, 0, 4]
[-2, 1, 2, -3, -1, 5, 3, -4, -5, 0, 4]
[-2, 1, 2, -3, -1, 5, 3, -4, -5, 0, 4]
[-2, 1, 2, -3, -1, 5, 3, -4, -5, 0, 4]

Count = 200000

fun_set  total running time 0.0038759708404541016 seconds
fun_comprehension  total running time 0.018622875213623047 seconds
fun_dict_fromkeys  total running time 0.005755186080932617 seconds
fun_dict_comprehension  total running time 0.0052568912506103516 seconds
fun_series  total running time 0.06122875213623047 seconds

Count = 500000

fun_set  total running time 0.009798765182495117 seconds
fun_comprehension  total running time 0.04983782768249512 seconds
fun_dict_fromkeys  total running time 0.021397829055786133 seconds
fun_dict_comprehension  total running time 0.01348114013671875 seconds
fun_series  total running time 0.1094961166381836 seconds
[0, 1, 2, 3, 4, 5, -2, -5, -4, -3, -1]
[1, 4, -3, -1, -5, 3, 0, 5, -4, -2, 2]
[1, 4, -3, -1, -5, 3, 0, 5, -4, -2, 2]
[1, 4, -3, -1, -5, 3, 0, 5, -4, -2, 2]
[1, 4, -3, -1, -5, 3, 0, 5, -4, -2, 2]

Count = 1000000

fun_set  total running time 0.018161773681640625 seconds
fun_comprehension  total running time 0.09214401245117188 seconds
fun_dict_fromkeys  total running time 0.026532888412475586 seconds
fun_dict_comprehension  total running time 0.024698734283447266 seconds
fun_series  total running time 0.21193814277648926 seconds
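2. Analysis of the results

The timings make the picture fairly clear: fun_set is the fastest. Next comes fun_dict_comprehension, then fun_dict_fromkeys (the two dict variants are close to each other in most runs), then fun_comprehension, and fun_series is the slowest.

Time taken, from least to most:
fun_set < fun_dict_comprehension < fun_dict_fromkeys < fun_comprehension < fun_series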
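One caveat about this ranking: the test list holds random integers between -5 and 5, so it never has more than 11 distinct values, and the `i not in ret` check in fun_comprehension scans at most 11 items. The rough sketch below is one way to check how the number of distinct values affects the comparison. The names dedup_set, dedup_list_scan and best_time are illustrative and do not come from the script above; the list-scan version is written as a plain loop, and timeit.repeat is swapped in for the fn_timer decorator. No timings are quoted because they depend on the machine.

```python
import random
import timeit


def dedup_set(items):
    return list(set(items))


def dedup_list_scan(items):
    # Same idea as fun_comprehension: scan the result list for every element.
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def best_time(func, data, repeat=3):
    # The best of several runs is less noisy than a single measurement.
    return min(timeit.repeat(lambda: func(data), number=1, repeat=repeat))


random.seed(0)
n = 5_000
few_distinct = [random.randint(-5, 5) for _ in range(n)]   # at most 11 values
many_distinct = [random.randint(0, n) for _ in range(n)]   # thousands of values

for name, data in [("few distinct", few_distinct), ("many distinct", many_distinct)]:
    print(name, "-", len(set(data)), "unique values")
    print("  set       ", best_time(dedup_set, data))
    print("  list scan ", best_time(dedup_list_scan, data))
```

The list scan does work proportional to the number of unique values for every element, so its cost should grow quickly as the data becomes more varied, while the set-based version stays roughly linear in the length of the list.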

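3. Summary

Deduplicating with set() and then converting back to a list performs well: a list of one million elements takes about 20 ms. None of the other methods tested here was faster than set() followed by list().
If you know a better, more efficient way to deduplicate a list, please let me know so we can learn together.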
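One thing the timings do not show is ordering: in the printed results, fun_set returns the values in a different order from the other four functions. If the order of first appearance matters, a dict-based or "seen set" approach may be the better fit. The sketch below illustrates both; dedup_keep_order and dedup_keep_order_seen are illustrative names, and the first one relies on dictionaries keeping insertion order, which the language guarantees from Python 3.7 onward.

```python
def dedup_keep_order(items):
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(items))


def dedup_keep_order_seen(items):
    # Track seen values in a set so each membership test is O(1) on average.
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


data = [3, -1, 3, 2, -1, 0, 2]
print(dedup_keep_order(data))       # [3, -1, 2, 0]
print(dedup_keep_order_seen(data))  # [3, -1, 2, 0]
```

Both versions need hashable elements, just like set(); the second one is easy to adapt, for example to deduplicate by a key computed from each element.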


Share the joy, keep the memories. 2018-09-01 23:13:58 - frank
