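Recently I ran into a problem: removing duplicates from a list while iterating over it did not remove all of them. Now that I have solved it, I would like to share my approach.
Actual scenario
The figure below shows the records of one table (only these fields matter here). At a glance most of them are duplicates, and after full deduplication only three should remain. Yet after my script ran the method I had written, many records were still left, which confused me and left me without a clue.
The code I actually used was: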
for i in abc:
    if abc.count(i) != 1:
        abc.remove(i)
To avoid leaking company data, I will use a list named abc to illustrate:
def test_012(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # First version: removes elements from abc while iterating over abc itself
    for i in abc:
        if abc.count(i) != 1:
            abc.remove(i)
    print(len(abc), 'length after processing')
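This was my first version, and it looked fine to me. Yet running it reported a length of 15 before processing and 7 after.
That is clearly wrong at a glance: full deduplication should leave 4 entries, so why were there 7?
Solution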
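The idea above was: iterate over the list abc, and if the current element occurs in the list a number of times other than 1, remove it. But every removal changes the list itself (its length and the positions of its elements), and the loop carries on over that changed list. Each removal shifts the later elements one position to the left while the for loop still advances to the next position, so the element that slid into the current slot is never examined. The figure below illustrates this.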
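The skipping can also be reproduced in a few lines. The sketch below is only an illustration: the helper name trace_removal and the sample data are made up for this purpose. It reruns the first-version loop on a copy and records which positions the loop actually examined.

```python
def trace_removal(items):
    """Run the first-version loop on a copy and record the positions it visits."""
    items = list(items)  # copy, so the caller's list stays untouched
    visited = []         # positions the for loop actually examined
    for pos, item in enumerate(items):
        visited.append(pos)
        if items.count(item) != 1:
            items.remove(item)  # later elements shift one slot to the left
    return items, visited


result, visited = trace_removal([1, 1, 1, 1])
print(result)   # [1, 1]
print(visited)  # [0, 1]
```

Four identical elements go in, yet only positions 0 and 1 are examined. By the time the loop asks for position 2, the list has only two elements left, so it stops with a duplicate still in place.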
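So how can it be done properly?
Approach 1a: iterate over a list that does not change but holds the same elements, in place of abc. A slice such as abc[:] creates a new list (a shallow copy), whereas plain assignment would only give the same list a second name, hence: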
def test_234(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # Iterate over a slice copy; removals only affect the original abc
    for i in abc[:]:
        if abc.count(i) != 1:
            abc.remove(i)
    print(len(abc), 'length after processing')
    print(abc)
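To show why the slice matters, here is a small sketch with made-up sample values. It contrasts plain assignment, which only adds a second name for the same list, with abc[:], which builds a separate list object. As far as copying goes, list(abc) and abc.copy() would behave the same way as the slice.

```python
abc = [7878, 7878, 66666]
alias = abc      # second name for the very same list object
snapshot = abc[:]  # new list object with the same elements (shallow copy)

abc.remove(7878)

print(alias)     # [7878, 66666], the removal is visible through the alias
print(snapshot)  # [7878, 7878, 66666], the copy is unaffected
print(alias is abc, snapshot is abc)  # True False
```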
Approach 1b: iterate over a reversed copy of abc in place of the list abc:
def test_345(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # abc[::-1] is also a new list (reversed), so abc can be modified safely
    for i in abc[::-1]:
        if abc.count(i) != 1:
            abc.remove(i)
    print(len(abc), 'length after processing')
    print(abc)
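A related technique, which is not what the code above does, is to walk the positions of the list itself from the end to the start and delete by index. Deleting position idx only shifts elements that sit after idx, and those have already been visited, so nothing gets skipped and no copy is needed. The function name below is hypothetical and the data is made up; treat it as a sketch.

```python
def dedupe_in_place_reverse(items):
    """Delete duplicates in place, visiting positions from last to first."""
    for idx in range(len(items) - 1, -1, -1):
        if items.count(items[idx]) != 1:
            del items[idx]  # only already-visited elements shift left
    return items


print(dedupe_in_place_reverse([1, 1, 2, 1]))  # [1, 2]
```

One difference worth noting: this variant keeps the first occurrence of each value, whereas list.remove() drops the first match, so the loops based on remove() keep the last occurrence.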
Approach 1c: iterate over a tuple with the same elements in place of the list abc
def test_45678(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # tuple(abc) is an immutable snapshot of the elements to iterate over
    for i in tuple(abc):
        if abc.count(i) != 1:
            abc.remove(i)
    print(len(abc), 'length after processing')
    print(abc)
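Approach 2: what if I insist on iterating over the changing list abc itself? My idea is to keep repeating the loop until a full pass leaves the length of abc unchanged, and then break out.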
def test_123(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    while True:
        length1 = len(abc)  # length before this pass
        for i in abc:
            if abc.count(i) != 1:
                abc.remove(i)
        length2 = len(abc)  # length after this pass
        # A pass that removed nothing means no duplicates are left
        if length1 == length2:
            break
    print(len(abc), 'length after processing')
    print(abc)
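To get a feel for how much extra work the repeated passes do, the sketch below wraps the same loop in a hypothetical function that also counts the passes. The integers 1 to 4 are stand-ins for the four distinct values in abc, repeated 5, 4, 4 and 2 times as in the original list.

```python
def dedupe_until_stable(items):
    """Repeat the remove-while-iterating pass until a pass changes nothing."""
    passes = 0
    while True:
        before = len(items)
        for item in items:
            if items.count(item) != 1:
                items.remove(item)
        passes += 1
        if len(items) == before:
            return items, passes


data = [1] * 5 + [2] * 4 + [3] * 4 + [4] * 2  # same shape as abc
result, passes = dedupe_until_stable(data)
print(result, passes)  # [1, 2, 3, 4] 3
```

For this data the length goes from 15 to 7 to 4, and a third pass is needed only to confirm that nothing changed.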
Approach 3: put the elements of abc whose count is 1 into a new list
def test_012(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    ABC = list()
    for i in abc[:]:
        if abc.count(i) != 1:
            # Still duplicated in abc: drop one occurrence
            abc.remove(i)
        else:
            # Only one occurrence left in abc: keep it in the new list
            ABC.append(i)
    print(len(ABC), 'length after processing')
    print(ABC)
def test_7892(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # Variant: keep a copy, empty the original, then re-add each element once
    temp = abc[:]
    abc.clear()
    for e in temp:
        if e not in abc:
            abc.append(e)
    print(len(abc), 'length after processing')
    print(abc, 'new list')
Result of the above: both versions end with 4 elements, one for each distinct value.
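As a side note, when every element is hashable (tuples and integers are), the same "first seen wins" result can be written more compactly with dict.fromkeys, since dict keys are unique and keep insertion order from Python 3.7 on. This is an alternative sketch rather than the code used above; the function name is hypothetical and the sample list is a shortened stand-in for abc.

```python
def dedupe_keep_order(items):
    """Return a new list without duplicates, keeping first occurrences in order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


abc = [('115', 'C2019093015002'), ('115', 'C2019093015002'), 66666, 66666, 7878]
print(dedupe_keep_order(abc))  # [('115', 'C2019093015002'), 66666, 7878]
```

Unlike the count()-based loops, this does not rescan the list for every element, which starts to matter once the list gets long.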
Approach 4: if the order of the elements in the list does not matter much, use set()
def test_789(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # A set keeps one copy of each hashable element, in no particular order
    set_abc = set(abc)
    print(len(set_abc), 'length after processing')
    print(list(set_abc), 'new list')
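The output of this part may differ from run to run: the length is always 4, but a set does not guarantee the order of its elements.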
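If the original order turns out to matter after all, one possible follow-up is to sort the set by each element's first position in the original list. The snippet below is a sketch with made-up sample values; calling abc.index for every element is fine for short lists but gets slow for long ones.

```python
abc = [7878, 66666, 66666, ('115', 'C2019093015002'), 7878]

# Sort the unique elements by where each one first appears in abc
unique_in_order = sorted(set(abc), key=abc.index)
print(unique_in_order)  # [7878, 66666, ('115', 'C2019093015002')]
```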
For technical discussion, feel free to add me on QQ: 153132336 (zy)
Personal blog: https://blog.csdn.net/zyooooxie