sensitive-word
In day-to-day work, whenever users can post content freely (blogs, documents, forums), we have to think about how to handle sensitive content.
sensitive-word is a high-performance sensitive-word tool built on the DFA algorithm. It is written in Java and helps solve these common problems.
Features
A lexicon of 60,000+ words, continually optimized and updated
Based on the DFA algorithm for good performance
Fluent API that is simple and elegant to use
Supports common operations such as detecting, returning and masking sensitive words
Supports treating full-width and half-width characters as equivalent
Supports case-insensitive matching of English letters
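The DFA approach mentioned in the feature list can be illustrated with a small trie: each node is an automaton state, each character is a transition, and a flag marks the end of a word. The following is a minimal sketch under that assumption, not the library's actual code; the class `DfaSketch`, its methods and its longest-match policy are illustrative choices. It sticks to Java 7 syntax to match the JDK 1.7+ requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trie-based DFA matcher; NOT sensitive-word's actual implementation.
public class DfaSketch {

    // One automaton state: transitions keyed by character, plus an end-of-word flag.
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end;
    }

    private final Node root = new Node();

    public DfaSketch(List<String> words) {
        for (String word : words) {
            Node cur = root;
            for (char c : word.toCharArray()) {
                Node child = cur.next.get(c);
                if (child == null) {
                    child = new Node();
                    cur.next.put(c, child);
                }
                cur = child;
            }
            cur.end = true; // the path from the root spells a complete word
        }
    }

    // Walks the automaton from `start`; returns the length of the longest word found, or 0.
    private int matchLength(String text, int start) {
        Node cur = root;
        int longest = 0;
        for (int i = start; i < text.length(); i++) {
            cur = cur.next.get(text.charAt(i));
            if (cur == null) {
                break; // no transition: no longer word can start at `start`
            }
            if (cur.end) {
                longest = i - start + 1;
            }
        }
        return longest;
    }

    public boolean contains(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (matchLength(text, i) > 0) {
                return true;
            }
        }
        return false;
    }

    // Collects matches left to right, jumping past each match (longest match wins).
    public List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = matchLength(text, i);
            if (len > 0) {
                found.add(text.substring(i, i + len));
                i += len;
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        DfaSketch dfa = new DfaSketch(Arrays.asList("bad", "badly", "evil"));
        System.out.println(dfa.contains("a good day"));             // false
        System.out.println(dfa.findAll("badly done, evil and bad")); // [badly, evil, bad]
    }
}
```

Each start position walks the automaton at most as far as the longest lexicon word, so the scanning cost does not grow with the number of words in the lexicon.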
Quick Start
Prerequisites
JDK 1.7+
Maven 3.x+
Maven dependency
<dependency>
<groupId>com.github.houbb</groupId>
<artifactId>sensitive-word</artifactId>
<version>0.0.4</version>
</dependency>
API Overview
SensitiveWordBs
SensitiveWordBs is the bootstrap class for working with sensitive words. Its core methods are:
method | parameter | return value | description |
---|---|---|---|
newInstance() | none | bootstrap instance | Creates a new bootstrap instance |
contains(String) | text to check | boolean | Checks whether the text contains any sensitive word |
findFirst(String) | text to check | String | Returns the first sensitive word in the text |
findAll(String) | text to check | List<String> | Returns all sensitive words in the text |
replace(String, char) | text to mask; replacement char | String | Returns the text with each character of every sensitive word replaced by the given char |
replace(String) | text to mask | String | Returns the text with each character of every sensitive word replaced by * |
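Usage Examples
For the full set of test cases, see SensitiveWordBsTest.
The examples below use the sample text "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。" ("The five-star red flag flutters in the wind, and Chairman Mao's portrait stands in front of Tiananmen.").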
Check whether text contains sensitive words
final String text = "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。";
Assert.assertTrue(SensitiveWordBs.newInstance().contains(text));
Return the first sensitive word
final String text = "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。";
String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals("五星红旗", word);
Return all sensitive words
final String text = "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。";
List<String> wordList = SensitiveWordBs.newInstance().findAll(text);
Assert.assertEquals("[五星红旗, 毛主席, 天安门]", wordList.toString());
Default replacement strategy
final String text = "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。";
String result = SensitiveWordBs.newInstance().replace(text);
Assert.assertEquals("****迎风飘扬,***的画像屹立在***前。", result);
Specify the replacement character
final String text = "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。";
String result = SensitiveWordBs.newInstance().replace(text, '0');
Assert.assertEquals("0000迎风飘扬,000的画像屹立在000前。", result);
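To show what masking means in practice, here is a hedged sketch of the two replace variants: every character of a matched word is overwritten by the mask character, so the output keeps the original length, as in the outputs above. `ReplaceSketch` is a hypothetical name, and a naive longest-match scan stands in for the DFA to keep the example short.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical masking sketch; NOT sensitive-word's actual implementation.
public class ReplaceSketch {

    // Replaces every character of each matched word with `mask`.
    // Naive longest-match scan over the word list, used instead of a DFA for brevity.
    public static String replace(String text, List<String> words, char mask) {
        StringBuilder out = new StringBuilder(text);
        int i = 0;
        while (i < text.length()) {
            int len = 0;
            for (String w : words) {
                if (w.length() > len && text.startsWith(w, i)) {
                    len = w.length();
                }
            }
            if (len > 0) {
                for (int j = i; j < i + len; j++) {
                    out.setCharAt(j, mask);
                }
                i += len; // continue after the masked word
            } else {
                i++;
            }
        }
        return out.toString();
    }

    // Default strategy: mask with '*'.
    public static String replace(String text, List<String> words) {
        return replace(text, words, '*');
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("bad", "evil");
        System.out.println(replace("a bad and evil day", words));      // a *** and **** day
        System.out.println(replace("a bad and evil day", words, '0')); // a 000 and 0000 day
    }
}
```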
More Features
The following features handle the many tricks used to evade detection, so that sensitive words are caught as reliably as possible.
It is a long-running battle between attack and defense.
Ignore case
final String text = "fuCK the bad words.";
String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals("fuCK", word);
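One plausible way to ignore case is to normalize each character before matching and then return the span from the original text, which is how "fuCK" can come back unchanged. This is a sketch under that assumption; `IgnoreCaseSketch` is hypothetical and uses a naive scan instead of a DFA.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical case-insensitive matching sketch; NOT sensitive-word's actual implementation.
public class IgnoreCaseSketch {

    // Lower-cases char by char so the normalized text has exactly the same length
    // as the original (String.toLowerCase() can change the length for some characters).
    public static String normalize(String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = Character.toLowerCase(chars[i]);
        }
        return new String(chars);
    }

    // Matches on the normalized text but returns the ORIGINAL spelling, or null if none.
    // Assumes the words are non-empty and stored in lower case.
    public static String findFirst(String text, List<String> words) {
        String normalized = normalize(text);
        for (int i = 0; i < normalized.length(); i++) {
            for (String w : words) {
                if (normalized.startsWith(w, i)) {
                    return text.substring(i, i + w.length());
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findFirst("fuCK the bad words.", Arrays.asList("fuck"))); // fuCK
    }
}
```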
Ignore full-width and half-width differences
final String text = "ｆｕｃｋ the bad words.";
String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals("ｆｕｃｋ", word);
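Full-width Latin letters, digits and punctuation (U+FF01 to U+FF5E) sit at a fixed offset of 0xFEE0 from their ASCII counterparts (U+0021 to U+007E), and the ideographic space U+3000 corresponds to the ASCII space. The sketch below folds them before matching, assuming a normalize-then-match approach; `WidthSketch` is hypothetical and uses a naive scan instead of a DFA. Unicode escapes keep the source file ASCII-only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical full-width/half-width folding sketch; NOT sensitive-word's actual implementation.
public class WidthSketch {

    // Maps full-width ASCII variants to half-width and the ideographic space to ' '.
    static char toHalfWidth(char c) {
        if (c == '\u3000') {
            return ' ';
        }
        if (c >= '\uFF01' && c <= '\uFF5E') {
            return (char) (c - 0xFEE0);
        }
        return c;
    }

    // One char in, one char out, so indices in the folded text match the original.
    public static String normalize(String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = toHalfWidth(chars[i]);
        }
        return new String(chars);
    }

    // Matches on the folded text but returns the ORIGINAL characters, or null if none.
    // Assumes the words are non-empty and written in half-width form.
    public static String findFirst(String text, List<String> words) {
        String folded = normalize(text);
        for (int i = 0; i < folded.length(); i++) {
            for (String w : words) {
                if (folded.startsWith(w, i)) {
                    return text.substring(i, i + w.length());
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "\uFF46\uFF55\uFF43\uFF4B the bad words."; // full-width "fuck"
        System.out.println(normalize(text)); // fuck the bad words.
        System.out.println(text.substring(0, 4).equals(findFirst(text, Arrays.asList("fuck")))); // true
    }
}
```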
Roadmap
Number/digit normalization
Traditional/Simplified Chinese conversion
Repeated characters
Stop words
Pinyin conversion
User-defined sensitive-word lists and whitelists
Mirrored (flipped) text
Sensitive-word tagging
Further Reading
Implementation ideas behind the sensitive-word tool