"True" list deduplication: removing every duplicate element

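I recently ran into a problem: removing duplicates from a list while iterating over it did not remove all of them. Now that I have solved it, here is my thinking.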

The actual scenario

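I was looking at the records of one table (considering only a few of its fields). At a glance most of the rows were duplicates, and full deduplication should have left just three. But after my script ran the method I had written, many rows still remained, and I was confused, with no idea why.

The code I actually used:

        for i in abc:
            if abc.count(i) != 1:
                abc.remove(i)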

To avoid leaking company data, I will use a list named abc to illustrate:

    def test_012(self):
        abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               66666, 66666, 66666, 66666,
               7878, 7878]
        print(len(abc), 'initial length')

        for i in abc:  # first version: removes from the very list being iterated
            if abc.count(i) != 1:
                abc.remove(i)
        print(len(abc), 'length after processing')

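This was my first version, and it looked fine to me. But it printed an initial length of 15 and a length of 7 after processing.
That is obviously wrong: with every duplicate removed there should be exactly 4 elements, so why 7?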

Solution

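The idea above was: iterate over abc, and if the current element's count in the list is not 1, remove it. But each removal changes the list itself (its length and the positions of its elements), and the loop carries on over the changed list. After a removal the remaining elements shift one position to the left while the loop still advances to the next index, so the element that slid into the current position is never examined. The loop also ends early, as soon as its index reaches the end of the shortened list.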
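
The skipping can be made visible with a small trace. The sketch below is an illustration rather than the code from the script: `trace_buggy_loop` is a hypothetical helper that spells out, with an explicit index, what `for i in abc` does on a list, and the strings 'A' and 'B' are stand-ins for the two long tuples.

```python
def trace_buggy_loop(items):
    """Replay the first version with an explicit index and log every step."""
    items = list(items)          # work on a copy so the caller's list is untouched
    steps = []
    index = 0
    while index < len(items):    # a for loop over a list advances an index like this
        current = items[index]
        if items.count(current) != 1:
            items.remove(current)    # removes the first occurrence, shifting the rest left
        steps.append((index, current, len(items)))
        index += 1
    return items, steps


# 'A' and 'B' are stand-ins for the two long tuples in abc.
sample = ['A'] * 5 + ['B'] * 4 + [66666] * 4 + [7878] * 2
result, steps = trace_buggy_loop(sample)

for index, current, length in steps:
    print(f'index {index}: looked at {current!r}, list length is now {length}')
print(result)
```

Only eight positions are ever visited before the index runs past the end of the shrinking list, and seven elements survive, which matches the 7 reported by the first version.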

So how can it be done properly?

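Approach 1a: iterate over a list that has the same elements but does not change, in place of abc. A slice such as abc[:] builds a new list (a shallow copy), whereas plain assignment would only give the same list a second name, so: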

    def test_234(self):
        abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               66666, 66666, 66666, 66666,
               7878, 7878]
        print(len(abc), 'initial length')
        for i in abc[:]:  # iterate over a shallow copy, remove from the original
            if abc.count(i) != 1:
                abc.remove(i)
        print(len(abc), 'length after processing')
        print(abc)
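
Why the slice matters can be checked in a few lines. This is a minimal sketch with made-up names (`original`, `alias`, `snapshot`), assuming nothing beyond built-in list behavior: plain assignment shares the list, while slicing produces an independent one.

```python
original = [1, 1, 2]

alias = original         # plain assignment: a second name for the same list
snapshot = original[:]   # slicing: a new list holding the same elements

original.remove(1)

print(alias)      # the alias sees the removal
print(snapshot)   # the snapshot does not
print(alias is original, snapshot is original)
```

`list(original)` and `original.copy()` produce the same kind of shallow copy as the slice, so any of the three would serve as the unchanging list to iterate over.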

A variant of 1a: iterate over a reversed copy of abc instead:

    def test_345(self):
        abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               66666, 66666, 66666, 66666,
               7878, 7878]
        print(len(abc), 'initial length')
        for i in abc[::-1]:  # abc[::-1] is a reversed copy, so abc can shrink safely
            if abc.count(i) != 1:
                abc.remove(i)
        print(len(abc), 'length after processing')
        print(abc)
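
Walking backwards also suggests a related technique that needs no copy at all: visit the indexes from last to first and delete by index. This is a swapped-in alternative, not the method used above; `dedupe_backwards` is a hypothetical name, and 'A' and 'B' again stand in for the long tuples.

```python
def dedupe_backwards(items):
    """Delete duplicates in place, walking the indexes from last to first."""
    for index in range(len(items) - 1, -1, -1):
        if items.count(items[index]) != 1:
            # Deleting here only shifts positions that were already visited.
            del items[index]
    return items


sample = ['A'] * 5 + ['B'] * 4 + [66666] * 4 + [7878] * 2
print(dedupe_backwards(sample))
```

Because a deletion only moves elements that sit after the current index, nothing that is still to be visited changes position, and the first occurrence of each value is the one that remains.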

Approach 1b: iterate over a tuple with the same elements in place of abc

    def test_45678(self):
        abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               66666, 66666, 66666, 66666,
               7878, 7878]
        print(len(abc), 'initial length')
        for i in tuple(abc):  # the tuple is an immutable snapshot of abc
            if abc.count(i) != 1:
                abc.remove(i)
        print(len(abc), 'length after processing')
        print(abc)

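Approach 2: if I insist on iterating over the changing list abc itself, my idea is to keep repeating the pass until one whole pass leaves the length of abc unchanged, then break out of the loop.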

    def test_123(self):
        abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               66666, 66666, 66666, 66666,
               7878, 7878]
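        print(len(abc), 'initial length')
        while True:
            length1 = len(abc)  # length before this pass
            for i in abc:
                if abc.count(i) != 1:
                    abc.remove(i)
            # measure after the whole pass, so length2 is always defined,
            # even when the pass removed nothing
            length2 = len(abc)
            if length1 == length2:
                break
        print(len(abc), 'length after processing')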
        print(abc)
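
Approach 2 can also be sketched as a standalone function. The name `dedupe_until_stable` is hypothetical; the function compares the length before a pass with the length after the whole pass, so it also stops cleanly when the input has no duplicates at all.

```python
def dedupe_until_stable(items):
    """Repeat the removing pass until a whole pass changes nothing."""
    while True:
        length_before = len(items)
        for item in items:
            if items.count(item) != 1:
                items.remove(item)
        if len(items) == length_before:   # nothing was removed in this pass
            return items


# 'A' and 'B' stand in for the two long tuples in abc.
with_duplicates = ['A'] * 5 + ['B'] * 4 + [66666] * 4 + [7878] * 2
print(dedupe_until_stable(with_duplicates))
print(dedupe_until_stable([1, 2, 3]))   # no duplicates: one pass, then done
```

Each pass examines elements in order until its first removal, so a pass that removes nothing proves that no duplicates are left. The price is repeated full scans, which is why the copy-based approaches are usually simpler.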

Approach 3: collect the elements of abc whose count is 1 into a new list

    def test_012(self):
        abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               66666, 66666, 66666, 66666,
               7878, 7878]
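        print(len(abc), 'initial length')

        ABC = list()
        for i in abc[:]:
            if abc.count(i) != 1:
                abc.remove(i)  # drop earlier copies until only one is left
            else:
                ABC.append(i)  # the last remaining copy goes into the new list

        print(len(ABC), 'length after processing')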
        print(ABC)
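
A variant of approach 3: copy abc to a temporary list, clear abc, then append back only the elements it does not already contain.

    def test_7892(self):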
        abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               66666, 66666, 66666, 66666,
               7878, 7878]
        print(len(abc), 'initial length')
        temp = abc[:]
        abc.clear()
        for e in temp:
            if e not in abc:
                abc.append(e)

        print(len(abc), 'length after processing')
        print(abc, 'new')

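Both versions above print 15 as the initial length and 4 as the length after processing, and the four remaining elements keep their original relative order.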
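
When every element is hashable, as the tuples and integers here are, the same order-preserving result can be had without an explicit loop. The sketch below uses `dict.fromkeys`, a swapped-in idiom rather than the approach above; `abc_sample` is a shortened, made-up stand-in for abc.

```python
# A shortened stand-in for abc: two-field tuples instead of four-field ones.
abc_sample = [('115', 'C2019093015002'), ('115', 'C2019093015002'),
              ('66666', 'C2019093015002'), 66666, 66666, 7878, 7878]

# Dict keys are unique and keep insertion order (guaranteed since Python 3.7),
# so the first occurrence of each element survives, in its original position.
unique = list(dict.fromkeys(abc_sample))
print(len(abc_sample), 'initial length')
print(len(unique), 'length after processing')
print(unique)
```

Every `count` or `in` call in the loops above scans the list again, so those loops do quadratic work on long lists, while the dictionary makes a single pass.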

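Approach 4: if the order of the elements does not matter, use set(). This works only when every element is hashable, as the tuples and integers here are.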

    def test_789(self):
        abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('115', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               ('66666', 'C2019093015002', 1569826868000, 1569826868000),
               66666, 66666, 66666, 66666,
               7878, 7878]
        print(len(abc), 'initial length')
        set_abc = set(abc)
        print(len(set_abc), 'length after processing')
        print(list(set_abc), 'new')

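This prints 15 and then 4. The order of the four elements in the printed list may differ from their order in abc, because a set is unordered.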
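
If the original order is wanted back after going through a set, one option is to sort the unique elements by the position of their first occurrence. This is a minimal sketch with 'A' and 'B' standing in for the long tuples; `unique_in_order` is a made-up name.

```python
sample = ['A'] * 5 + ['B'] * 4 + [66666] * 4 + [7878] * 2

# set() drops the duplicates; sorting by sample.index restores the order
# in which each element first appears in the original list.
unique_in_order = sorted(set(sample), key=sample.index)
print(unique_in_order)
```

The sort key is an integer position, so the mix of strings, tuples, and integers in the list is never compared directly. Each `sample.index` call scans the list, so this suits short lists better than long ones.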

For technical discussion, feel free to add me on QQ: 153132336 (zy)
Personal blog: https://blog.csdn.net/zyooooxie

Reposted from blog.csdn.net/zyooooxie/article/details/102768653