[Introduction to DFA Algorithm for Sensitive Word Filtering]

1. The project needs to filter sensitive words. There are several options to choose from:

a. Hold the sensitive words directly as Strings and search with the indexOf method.

b. Store the sensitive words in a database and look them up with SQL queries, the traditional approach.

c. Use Lucene to build a word-segmentation index and query it.

d. Use the DFA algorithm.

First, the project has collected thousands of sensitive words, so scheme a is clearly impractical. Second, to keep future extension easy and minimize dependence on the database, scheme b is dropped. Third, Lucene works as a local index that must be updated whenever sensitive words are added, and to keep the project lightweight we do not want to introduce more libraries, so scheme c is dropped as well. That leaves scheme d as the subject of this study.
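To make the objection to scheme a concrete, here is a minimal sketch of one way that scheme might look. The class name NaiveFilter and the method containsAny are hypothetical and not taken from the project. The point is only that every word triggers its own indexOf scan of the text, so the work grows with the size of the word list.

```java
import java.util.List;

// Hypothetical illustration of scheme a; names are assumptions, not project code.
final class NaiveFilter {
    // Returns true if the text contains any word from the list.
    // Each indexOf call rescans the text, so thousands of words mean thousands of scans.
    static boolean containsAny(String text, List<String> words) {
        for (String word : words) {
            if (text.indexOf(word) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

With a handful of words this is perfectly adequate; with thousands of words and frequent checks, the repeated scans become the bottleneck.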

 

2. Introduction to DFA Algorithm

DFA stands for Deterministic Finite Automaton. It has a finite set of states and edges leading from one state to another, each edge labeled with a symbol; one state is the initial state, and some states are final states. Unlike a nondeterministic finite automaton (NFA), a DFA never has two edges with the same symbol leaving the same state.
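The definition above maps naturally onto a small data structure. The following is only a sketch under assumptions: the names DfaWordFilter, State, addWord, matchLength, contains and mask are all hypothetical, and the scan simply restarts the automaton at each text position rather than using any more advanced matching strategy. Each State keeps its outgoing edges in a Map keyed by character, so a state can have at most one edge per symbol, which is the determinism property; the isFinal flag marks final states, that is, the end of a sensitive word.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a DFA-style sensitive word filter; all names are hypothetical.
final class DfaWordFilter {
    // One state: outgoing edges keyed by symbol, plus a final-state flag.
    private static final class State {
        final Map<Character, State> next = new HashMap<>();
        boolean isFinal;
    }

    // The initial state of the automaton.
    private final State start = new State();

    // Adds a word by following or creating one edge per character.
    void addWord(String word) {
        if (word.isEmpty()) {
            return;
        }
        State s = start;
        for (int i = 0; i < word.length(); i++) {
            s = s.next.computeIfAbsent(word.charAt(i), c -> new State());
        }
        s.isFinal = true;
    }

    // Length of the longest sensitive word starting at index 'from', or 0 if none.
    int matchLength(String text, int from) {
        State s = start;
        int longest = 0;
        for (int i = from; i < text.length(); i++) {
            s = s.next.get(text.charAt(i));
            if (s == null) {
                break; // no edge for this symbol: the automaton rejects
            }
            if (s.isFinal) {
                longest = i - from + 1;
            }
        }
        return longest;
    }

    // True if any sensitive word occurs anywhere in the text.
    boolean contains(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (matchLength(text, i) > 0) {
                return true;
            }
        }
        return false;
    }

    // Replaces every character of each matched word with the given character.
    String mask(String text, char replacement) {
        StringBuilder out = new StringBuilder(text);
        int i = 0;
        while (i < text.length()) {
            int len = matchLength(text, i);
            if (len == 0) {
                i++;
                continue;
            }
            for (int j = i; j < i + len; j++) {
                out.setCharAt(j, replacement);
            }
            i += len;
        }
        return out.toString();
    }
}
```

A filter would be built once by calling addWord for every sensitive word and then reused for each text. Unlike repeated indexOf calls, the cost of checking one position depends on the length of the longest word, not on how many words are loaded.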
