How to remove strings containing certain words from list FASTER

hyhno01 :

There is a list of sentences, sentences = ['Ask the swordsmith', 'He knows everything']. The goal is to remove those sentences that contain a word from a word list, lexicon = ['word', 'every', 'thing']. This can be achieved with the following list comprehension:

newlist = [sentence for sentence in sentences if not any(word in sentence.split(' ') for word in lexicon)]

Note that if not word in sentence is not a sufficient condition, as it would also remove sentences containing words in which a lexicon word is embedded as a substring, e.g. word is embedded in swordsmith, and every and thing are embedded in everything.
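For instance, a plain substring check wrongly flags the first sample sentence, while checking against the list of split words does not:

'word' in 'Ask the swordsmith'             # True:  'word' is a substring of 'swordsmith'
'word' in 'Ask the swordsmith'.split(' ')  # False: no whole word matches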

However, my list consists of 1,000,000 sentences and my lexicon of 200,000 words. Applying the list comprehension above takes hours! Because of that, I'm looking for a faster method to remove strings from a list that contain words from another list. Any suggestions? Maybe using regex?

Mad Physicist :

Do your lookup in a set. This makes it fast, and alleviates the containment issue because you only look for whole words in the lexicon.

lexicon = set(lexicon)  # build the set once; membership tests are then O(1)
newlist = [s for s in sentences if not any(w in lexicon for w in s.split())]
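As a quick check with the sample data from the question: once matching is restricted to whole words, neither sentence contains an exact lexicon entry, so both are kept.

sentences = ['Ask the swordsmith', 'He knows everything']
lexicon = set(['word', 'every', 'thing'])
newlist = [s for s in sentences if not any(w in lexicon for w in s.split())]
print(newlist)  # ['Ask the swordsmith', 'He knows everything']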

This is pretty efficient because w in lexicon is an O(1) operation, and any short-circuits. The main issue is splitting your sentence into words properly. A regular expression is inevitably going to be slower than a customized solution, but may be the best choice, depending on how robust you want to be against punctuation and the like. For example:

import re

lexicon = set(lexicon)
pattern = re.compile(r'\w+')  # runs of word characters; punctuation is skipped
newlist = [s for s in sentences if not any(m.group() in lexicon for m in pattern.finditer(s))]
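As a rough illustration of the punctuation point (with a made-up sentence, not from the question): split(' ') leaves the period attached to the word and misses the match, while the regex catches it.

import re

lexicon = {'word', 'every', 'thing'}
pattern = re.compile(r'\w+')

s = 'I know one thing.'  # hypothetical sentence; the trailing period sticks to 'thing'
print('thing' in s.split(' '))                                 # False
print(any(m.group() in lexicon for m in pattern.finditer(s)))  # True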
