Implementing sensitive-word filtering in iOS with the DFA algorithm

I recently needed to implement a feature that filters sensitive words out of strings, so I am writing it down here for future reference.

1. Introduction to DFA

Among the algorithms for implementing text filtering, DFA is a particularly good choice. DFA stands for Deterministic Finite Automaton. It obtains the next state from the current state and an event, that is, event + state = next state. The following figure shows its state transitions.
[Figure: DFA state transition diagram]
In the figure, the uppercase letters (S, U, V, Q) are states and the lowercase letters a and b are events (actions). The figure gives the following transitions:

S --a--> U, S --b--> V, U --b--> V

A sensitive-word filter has to keep computation to a minimum, and DFA involves almost none: each step is just a state transition.
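To make "event + state = next state" concrete, here is a minimal Objective-C sketch of the three transitions listed above, stored as a lookup table. The function names `TransitionTable`, `NextState`, and `RunEvents` are hypothetical illustrations, not part of the article's code. Each step is a single dictionary lookup, which is why a DFA needs so little computation.

```objc
#import <Foundation/Foundation.h>

// Transition table for the edges S --a--> U, S --b--> V, U --b--> V.
// Each entry maps a state to {event: nextState}; a missing entry means "no transition".
// (Rebuilt on every call for brevity; a real implementation would build it once.)
static NSDictionary *TransitionTable(void) {
    return @{
        @"S": @{@"a": @"U", @"b": @"V"},
        @"U": @{@"b": @"V"}
    };
}

// event + state = nextState: one lookup, no arithmetic. Returns nil if undefined.
static NSString *NextState(NSString *state, NSString *event) {
    NSDictionary *row = TransitionTable()[state];
    return row[event];
}

// Feed a string of one-character events to the automaton, starting from `start`.
// Returns the final state, or nil as soon as an event has no transition.
static NSString *RunEvents(NSString *start, NSString *events) {
    NSString *state = start;
    for (NSUInteger i = 0; i < events.length && state != nil; i++) {
        NSString *event = [events substringWithRange:NSMakeRange(i, 1)];
        state = NextState(state, event);
    }
    return state;
}
```

For example, `RunEvents(@"S", @"ab")` returns `@"V"` (S to U on a, then U to V on b), while `RunEvents(@"S", @"ba")` returns nil because the listed transitions give V no outgoing edge.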

Reference: http://www.iteye.com/topic/336577

2. Implementing sensitive-word filtering in iOS with the DFA algorithm

The key to sensitive-word filtering in iOS is implementing the DFA algorithm. Starting from the figure above, suppose our sensitive vocabulary contains the words Douluo Dalu, Tangmen, and Tang Sanxiaowu. The structure to build is then as follows:
[Figure: the sensitive vocabulary built into a tree]
In this way, the sensitive vocabulary is built into a set of trees, one per first character, which greatly narrows the range searched when checking whether a word is sensitive. For example, to check Tangmen, the first character tells us which tree to search, and the search then continues only within that tree.

But how do we know where a sensitive word ends? With a flag: each branch gets an end identifier on the node for the word's last character.
[Figure: the tree with an end identifier on each branch]
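A minimal sketch of the tree and its end identifier, assuming nested `NSMutableDictionary` nodes keyed by single characters (the representation the article's code uses) and assuming the three example words are the Chinese originals 斗罗大陆 (Douluo Dalu), 唐门 (Tangmen), and 唐三小舞 (Tang Sanxiaowu). The helper names `BuildTrie` and `IsSensitiveWord` and the key value `@"isExists"` are hypothetical. It shows why the end identifier is needed: 唐三 walks successfully through the tree but is not itself a sensitive word.

```objc
#import <Foundation/Foundation.h>

// Hypothetical end-identifier key (the article's code names it EXIST but does not show its value).
static NSString *const EXIST = @"isExists";

// Build a trie of nested mutable dictionaries: one level per character,
// with EXIST = @1 on the node that ends each sensitive word.
static NSMutableDictionary *BuildTrie(NSArray *words) {
    NSMutableDictionary *root = [NSMutableDictionary dictionary];
    for (NSString *w in words) {
        if (w.length == 0) continue;
        NSMutableDictionary *node = root;
        for (NSUInteger i = 0; i < w.length; i++) {
            NSString *ch = [w substringWithRange:NSMakeRange(i, 1)];
            if (node[ch] == nil) {
                node[ch] = [NSMutableDictionary dictionary];
            }
            node = node[ch];
        }
        node[EXIST] = @1;
    }
    return root;
}

// A string is a sensitive word only if every character has a branch
// AND the last node carries the end identifier; reaching an inner node is not enough.
static BOOL IsSensitiveWord(NSDictionary *root, NSString *text) {
    NSDictionary *node = root;
    for (NSUInteger i = 0; i < text.length; i++) {
        node = node[[text substringWithRange:NSMakeRange(i, 1)]];
        if (node == nil) return NO;   // no branch for this character
    }
    return [node[EXIST] boolValue];
}
```

With these three words the root has only two branches, 斗 and 唐, so a lookup for 唐门 never touches the 斗罗大陆 branch.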
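Drawback: when one sensitive word is a prefix of another, stopping as soon as an end identifier is reached masks only the shorter word. The nextWord lookahead in the program below addresses this; see the code for details.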
[Figure: the drawback of stopping at the first end identifier]
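A one-character lookahead on its own still leaves a gap, illustrated here with the hypothetical words "ab" and "abcd": in the text "abcx", the c after "ab" can extend the match, so the walk continues and then fails at x, and nothing is masked unless the last complete match was remembered. The sketch below compares stopping at the first end identifier with walking as far as the tree allows while remembering the last end identifier seen. `MatchLength`, `DemoTrie`, and the key value `@"isExists"` are illustrative names, not from the article.

```objc
#import <Foundation/Foundation.h>

// Hypothetical end-identifier key.
static NSString *const EXIST = @"isExists";

// Trie for the words "ab" and "abcd", written out as literals.
// "ab" is a prefix of "abcd", so its end identifier sits on an inner node.
static NSDictionary *DemoTrie(void) {
    return @{@"a": @{@"b": @{EXIST: @1,
                             @"c": @{@"d": @{EXIST: @1}}}}};
}

// Length of the sensitive word starting at `start` in `text`, or 0 if none.
// longest == NO : stop at the first end identifier (shortest match).
// longest == YES: keep walking while branches exist and return the last end
//                 identifier seen, so a failed longer word falls back to a shorter one.
static NSUInteger MatchLength(NSDictionary *root, NSString *text,
                              NSUInteger start, BOOL longest) {
    NSDictionary *node = root;
    NSUInteger matched = 0;
    for (NSUInteger j = start; j < text.length; j++) {
        node = node[[text substringWithRange:NSMakeRange(j, 1)]];
        if (node == nil) break;                  // no branch for this character
        if ([node[EXIST] boolValue]) {
            matched = j - start + 1;             // remember the last complete word
            if (!longest) break;
        }
    }
    return matched;
}
```

Here `MatchLength(DemoTrie(), @"abcd", 0, NO)` is 2, the same call with `YES` is 4, and for `@"abcx"` with `YES` it falls back to 2.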

The key methods are implemented as follows:

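```objc
// EXIST (the end-identifier key) is used below but not defined in the snippet.
// Assumed definition, at file scope above @implementation:
static NSString *const EXIST = @"isExists";

// Build the tree (root node) from the list of sensitive words
- (void)addFilterWords:(NSArray *)filterWords {
    self.root = [NSMutableDictionary dictionary];
    for (NSString *str in filterWords) {
        if (str.length > 0) {
            [self insertWords:str];
        }
    }
}

// Insert one sensitive word into the tree, one character per level
- (void)insertWords:(NSString *)words {
    NSMutableDictionary *node = self.root;
    /*
     When i == 0, node == self.root;
     when i >= 1, node is the child reached by the previous character
     (e.g. self.root[first character] when i == 1).
     */
    for (NSUInteger i = 0; i < words.length; i++) {
        NSString *word = [words substringWithRange:NSMakeRange(i, 1)];
        if (node[word] == nil) {
            node[word] = [NSMutableDictionary dictionary];
        }
        node = node[word]; // move down to the child node
    }
    // Mark the last character of the sensitive word
    node[EXIST] = @1;
}

// Replace the sensitive words in str with '*'
- (NSString *)filter:(NSString *)str {
    if (self.isFilterClose || !self.root) {
        return str;
    }

    NSMutableString *result = [str mutableCopy];

    for (NSUInteger i = 0; i < str.length; i++) {
        NSDictionary *node = self.root; // read-only walk, no copy needed
        NSUInteger matchLength = 0;     // longest complete sensitive word starting at i

        for (NSUInteger j = i; j < str.length; j++) {
            NSString *word = [str substringWithRange:NSMakeRange(j, 1)];
            node = node[word];
            if (node == nil) {
                break; // no branch continues with this character
            }
            if ([node[EXIST] integerValue] == 1) {
                matchLength = j - i + 1;
                // Fixes the drawback in the figure above: stop early only if the
                // next character cannot extend the match (nextWord lookahead)
                NSString *nextWord = (j + 1 < str.length)
                    ? [str substringWithRange:NSMakeRange(j + 1, 1)] : nil;
                if (nextWord == nil || node[nextWord] == nil) {
                    break;
                }
            }
        }

        // If a longer word was attempted but failed, the last complete match is still masked
        if (matchLength > 0) {
            NSString *symbolStr = [@"" stringByPaddingToLength:matchLength
                                                    withString:@"*"
                                               startingAtIndex:0];
            [result replaceCharactersInRange:NSMakeRange(i, matchLength) withString:symbolStr];
            i += matchLength - 1; // skip the masked characters; the loop adds the final 1
        }
    }

    return result;
}
```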
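The methods above rely on `self.root`, `self.isFilterClose`, and an `EXIST` key whose declarations the article does not show. Below is a minimal self-contained version for experimenting, assuming a hypothetical class name `SensitiveWordFilter` and the key value `@"isExists"`. It folds `insertWords:` into `addFilterWords:` and drops the nextWord lookahead: walking until the tree has no branch for the next character, while remembering the last end identifier, already yields the longest match and falls back to a shorter word when a longer one fails.

```objc
#import <Foundation/Foundation.h>

// Hypothetical end-identifier key.
static NSString *const EXIST = @"isExists";

// Hypothetical class name; the article shows only the methods.
@interface SensitiveWordFilter : NSObject
@property (nonatomic, strong) NSMutableDictionary *root;
@property (nonatomic, assign) BOOL isFilterClose;
- (void)addFilterWords:(NSArray *)filterWords;
- (NSString *)filter:(NSString *)str;
@end

@implementation SensitiveWordFilter

// Rebuild the tree from scratch: one level per character, EXIST on each word's last node.
- (void)addFilterWords:(NSArray *)filterWords {
    self.root = [NSMutableDictionary dictionary];
    for (NSString *word in filterWords) {
        if (word.length == 0) continue;
        NSMutableDictionary *node = self.root;
        for (NSUInteger i = 0; i < word.length; i++) {
            NSString *ch = [word substringWithRange:NSMakeRange(i, 1)];
            if (node[ch] == nil) node[ch] = [NSMutableDictionary dictionary];
            node = node[ch];
        }
        node[EXIST] = @1;
    }
}

// Mask the longest sensitive word found at each position with '*'.
- (NSString *)filter:(NSString *)str {
    if (self.isFilterClose || !self.root) return str;
    NSMutableString *result = [str mutableCopy];
    for (NSUInteger i = 0; i < str.length; i++) {
        NSDictionary *node = self.root;
        NSUInteger matchLength = 0;   // last complete word seen on this walk
        for (NSUInteger j = i; j < str.length; j++) {
            node = node[[str substringWithRange:NSMakeRange(j, 1)]];
            if (node == nil) break;   // the walk ends by itself; no lookahead needed
            if ([node[EXIST] boolValue]) matchLength = j - i + 1;
        }
        if (matchLength > 0) {
            NSString *stars = [@"" stringByPaddingToLength:matchLength
                                                withString:@"*"
                                           startingAtIndex:0];
            [result replaceCharactersInRange:NSMakeRange(i, matchLength) withString:stars];
            i += matchLength - 1;     // the for loop adds the final 1
        }
    }
    return result;
}

@end
```

For example, after `addFilterWords:@[@"ab", @"abcd"]`, `filter:@"xxabcxabcd"` returns `@"xx**cx****"`: the failed attempt at "abcd" in "abcx" still masks "ab". Matching always reads the original `str`, and each replacement has the same length as the text it replaces, so the indices in `result` stay aligned.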

Demo: https://www.jianshu.com/p/6921a550bf3a

Original post: blog.csdn.net/haifangnihao/article/details/97277811