Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/muumian123/article/details/81942266
First, a few ways to deduplicate a list:
1. Clear and explicit version (order preserved):
ids = [1,2,3,3,4,2,3,4,5,6,1]
news_ids = []
for id in ids:
    if id not in news_ids:
        news_ids.append(id)
print(news_ids)
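The loop above tests membership against a list, so each check scans the results collected so far. A minimal sketch of the same idea with a set for the membership test is shown here; the function name `dedupe_keep_order` is made up for illustration, and it assumes every item is hashable.

```python
def dedupe_keep_order(items):
    """Return a new list with duplicates removed, keeping first occurrences."""
    seen = set()    # values already emitted; average O(1) membership test
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


ids = [1, 2, 3, 3, 4, 2, 3, 4, 5, 6, 1]
print(dedupe_keep_order(ids))  # [1, 2, 3, 4, 5, 6]
```

For short lists the difference is negligible; the set only starts to pay off as the list grows.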
2. Concise and fast version
Use the automatic deduplication of set:
li=[1,2,3,4,5,1,2,3]
li=list(set(li))
print(li)
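This changes the original order of the list. To keep the order, sort the result by each element's first position in the original list: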
li=[1,2,3,4,5,1,2,3]
new_li=list(set(li))
new_li.sort(key=li.index)
print(new_li)
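Sorting with `key=li.index` searches the original list once per element, which can get slow on long lists. One possible alternative, assuming Python 3.7 or later (where dict is guaranteed to keep insertion order) and hashable items, is `dict.fromkeys`:

```python
li = [1, 2, 3, 4, 5, 1, 2, 3]

# dict keys are unique and keep insertion order,
# so the first occurrence of each value decides its position.
new_li = list(dict.fromkeys(li))
print(new_li)  # [1, 2, 3, 4, 5]
```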
3. Anonymous function (lambda) version
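from functools import reduce  # reduce is not a builtin in Python 3
ids = [1,4,3,3,4,2,3,4,5,6,1]
# x is the list built so far; append y only if it is not already in x
func = lambda x,y:x if y in x else x + [y]
print(reduce(func, [[], ] + ids))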
4. Standard-library module version (itertools)
import itertools
ids = [1,4,3,3,4,2,3,4,5,6,1]
ids.sort()  # sorts in place, so the original order is lost
it = itertools.groupby(ids)
for k, g in it:
    print(k)
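The sort in this version is not optional: `itertools.groupby` only merges runs of adjacent equal items. A small sketch of the difference, collecting the keys into lists instead of printing them (the variable names are just for illustration):

```python
import itertools

ids = [1, 4, 3, 3, 4, 2, 3, 4, 5, 6, 1]

# Without sorting, only the adjacent pair of 3s is merged.
unsorted_keys = [k for k, _ in itertools.groupby(ids)]
print(unsorted_keys)  # [1, 4, 3, 4, 2, 3, 4, 5, 6, 1]

# sorted() returns a new list with equal values next to each other,
# leaving ids itself unchanged.
sorted_keys = [k for k, _ in itertools.groupby(sorted(ids))]
print(sorted_keys)  # [1, 2, 3, 4, 5, 6]
```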
5. Fast deduplication of text files on the order of a GB
#coding=utf-8
def quchong(infile, outfile):
    # Reads the whole file at once, so the file must fit in memory.
    with open(infile, 'r', encoding='utf-8') as inopen:
        data = inopen.read()
    # set() removes duplicate lines but does not keep their order.
    list_1 = list(set(data.split('\n')))
    with open(outfile, 'w', encoding='utf-8') as outopen:
        for line in list_1:
            if line != '':  # skip empty lines
                outopen.write(line + '\n')
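If loading the whole file at once is a concern, a line-by-line variant is one option. The sketch below is an assumption-laden alternative, not the method above: the name `dedupe_lines` is hypothetical, it keeps the first occurrence of each line in its original order, and it still holds every unique line in a set, so memory grows with the number of distinct lines rather than with the file size.

```python
def dedupe_lines(infile, outfile):
    """Copy infile to outfile, keeping the first occurrence of each non-empty line."""
    seen = set()  # unique lines written so far
    with open(infile, 'r', encoding='utf-8') as src, \
         open(outfile, 'w', encoding='utf-8') as dst:
        for line in src:              # iterates lazily, one line at a time
            line = line.rstrip('\n')
            if line and line not in seen:
                seen.add(line)
                dst.write(line + '\n')
```

It would be called the same way as `quchong`, for example `dedupe_lines('in.txt', 'out.txt')`, where both file names are placeholders.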
If you know a better method, feel free to share it and point out any mistakes!