Determining whether a text string is in a dictionary, i.e. whether an element belongs to a set

Determine whether a text contains a word in a dictionary

Bloom filter

When is a Bloom filter needed? When storing the whole set would take too much memory.

First, look at some common examples:

  • Word processing software needs to check whether an English word is spelled correctly
  • The FBI needs to check whether a suspect's name is already on a list of suspects
  • A web crawler needs to check whether a URL has already been visited
  • Yahoo, Gmail, and other mail services need to filter spam

These examples have one thing in common: how do we determine whether an element is in a set?

Conventional approaches

  • Array
  • List
  • Tree, balanced binary tree, Trie
  • Map (red-black tree)
  • Hash table
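
As a rough illustration of how these structures differ for membership tests, here is a minimal Python sketch. The sample words and the helper names (`in_list`, `in_sorted`, `in_hash`) are made up for this example. A sorted list with binary search stands in for the tree-based options, since Python has no built-in balanced tree.

```python
from bisect import bisect_left

# Hypothetical sample data, for illustration only.
words = ["apple", "banana", "cherry", "grape"]


def in_list(items, x):
    """Array/list: linear scan, O(n) per lookup."""
    return x in items


def in_sorted(sorted_items, x):
    """Sorted sequence with binary search: O(log n) per lookup,
    comparable to a lookup in a balanced tree."""
    i = bisect_left(sorted_items, x)
    return i < len(sorted_items) and sorted_items[i] == x


def in_hash(item_set, x):
    """Hash table (Python set): O(1) per lookup on average."""
    return x in item_set


sorted_words = sorted(words)
word_set = set(words)
```

All three store every element in full, so memory grows with the size of the set. That is the cost a Bloom filter is meant to avoid.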

When the dictionary is small enough to fit in memory, a plain set is enough:

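    import jieba

    def check(s):
        huangfan_path = 'path/to/dict.txt'
        # Register the dictionary words with jieba so they are kept as single tokens
        jieba.load_userdict(huangfan_path)
        # Load the dictionary into a set for fast membership tests
        huangfan_words_dict = set()
        with open(huangfan_path, 'rb') as fr:
            for line in fr.readlines():
                huangfan_words_dict.add(line.strip().decode('utf-8'))
        # Segment the text and return the words that also appear in the dictionary
        return set(jieba.lcut(s)) & huangfan_words_dict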
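
When the dictionary is too large to hold in a set, a Bloom filter can answer the same membership question with a fixed-size bit array. The price is occasional false positives; it never gives false negatives. The following is only a minimal sketch, not the implementation of any particular library. The class name `BloomFilter`, the parameters `capacity` and `error_rate`, and the use of SHA-256 with double hashing are all assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus k hash positions per item."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing formulas for n items and false-positive rate p:
        #   bits   m = -n * ln(p) / (ln 2)^2
        #   hashes k = (m / n) * ln 2
        self.size = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so steps vary
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Usage with made-up URLs, as in the crawler example.
seen = BloomFilter(capacity=1000)
seen.add("http://example.com/a")
seen.add("http://example.com/b")
```

A crawler would test `url in seen` before fetching a page. A `True` result may occasionally be wrong, so a few unvisited URLs could be skipped, but a `False` result is always reliable.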

 


Origin www.cnblogs.com/cupleo/p/11410564.html