Chops :
I am trying to replace a number of elements (around 40k) from a much larger list (3 million elements) with a tag. The code below works however it is extremely slow.
def UNKWords(words):
words = Counter(words)
wordsToBeReplaced = []
for key, value in words.items():
if(value == 1):
wordsToBeReplaced.append(key)
return wordsToBeReplaced
wordsToBeReplaced = UNKWords(trainingWords)
replacedWordsList = ["<UNK>" if word in wordsToBeReplaced else word for word in trainingWords]
Is there a more efficient way of replacing elements in such a large list?
blhsing :
You can make wordsToBeReplaced
a set instead so that lookups can be done in constant time on average rather than linear time:
def UNKWords(words):
return {word for word, count in Counter(words).items() if count == 1}