A Java sensitive-word filtering tool (sensitive-word): detailed usage guide

sensitive-word

In day-to-day work, wherever users are allowed to post freely (blogs, documents, forums), we have to think about how to handle sensitive content.

sensitive-word is a high-performance sensitive-word tool built on the DFA (deterministic finite automaton) algorithm. It is implemented in Java and is meant to solve this common problem.

Features

  • A dictionary of 60,000+ words, continually optimized and updated

  • Based on the DFA algorithm for better performance

  • Based on a fluent API, so usage is concise and elegant

  • Supports common operations on sensitive words: detection, retrieval, and masking

  • Supports full-width and half-width characters interchangeably

  • Supports English letters regardless of case
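To make the DFA-based matching in the feature list more concrete, here is a minimal sketch of the general technique: the dictionary is stored as a trie whose nodes act as automaton states, and the text is scanned for the longest word starting at each position. The class `DfaSketch` and its methods are hypothetical names used only for illustration; this is not the library's actual implementation, which may differ in structure and in how it resolves overlapping words.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: dictionary matching with a trie used as a DFA. */
final class DfaSketch {

    /** One automaton state: outgoing transitions plus an accepting flag. */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end; // true if the path from the root to here spells a whole word
    }

    private final Node root = new Node();

    /** Adds one dictionary word, creating states along its characters as needed. */
    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            Node child = node.next.get(c);
            if (child == null) {
                child = new Node();
                node.next.put(c, child);
            }
            node = child;
        }
        node.end = true;
    }

    /** Length of the longest dictionary word starting at index start, or 0 if none. */
    private int matchLength(String text, int start) {
        Node node = root;
        int best = 0;
        for (int i = start; i < text.length(); i++) {
            node = node.next.get(text.charAt(i));
            if (node == null) {
                break; // no transition for this character: stop scanning
            }
            if (node.end) {
                best = i - start + 1; // remember the longest accepted prefix so far
            }
        }
        return best;
    }

    /** True if any dictionary word occurs anywhere in the text. */
    boolean contains(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (matchLength(text, i) > 0) {
                return true;
            }
        }
        return false;
    }

    /** Collects all non-overlapping matches from left to right, preferring the longest. */
    List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = matchLength(text, i);
            if (len > 0) {
                hits.add(text.substring(i, i + len));
                i += len; // continue after the matched word
            } else {
                i++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        DfaSketch dfa = new DfaSketch();
        dfa.add("bad");
        dfa.add("badge");
        dfa.add("evil");
        System.out.println(dfa.findAll("a badge, an evil bad idea"));
    }
}
```

Each character of the text costs one map lookup per active scan, so the cost of a lookup depends on the length of the text and of the matched words, not on the size of the dictionary. That is the usual reason this approach scales to large word lists.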

Quick Start

Prerequisites

  • JDK 1.7+

  • Maven 3.x+

Maven dependency

<dependency>
    <groupId>com.github.houbb</groupId>
    <artifactId>sensitive-word</artifactId>
    <version>0.0.4</version>
</dependency>

API Overview

SensitiveWordBs is the bootstrap class for working with sensitive words. Its core methods are as follows:

Method | Parameter | Return value | Description
newInstance() | none | Bootstrap class | Initializes the bootstrap class
contains(String) | String to be checked | Boolean | Checks whether the string contains sensitive words
findAll(String) | String to be checked | List of strings | Returns all sensitive words found in the string
replace(String, char) | Replaces sensitive words with the specified char | String | Returns the masked string
replace(String) | Replaces sensitive words with * | String | Returns the masked string

Usage Examples

For all test cases, see SensitiveWordBsTest.

Checking whether text contains sensitive words

final String text = "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。";

Assert.assertTrue(SensitiveWordBs.newInstance().contains(text));

Returning the first sensitive word

final String text = "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。";

String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals("五星红旗", word);

Returning all sensitive words

final String text = "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。";

List<String> wordList = SensitiveWordBs.newInstance().findAll(text);
Assert.assertEquals("[五星红旗, 毛主席, 天安门]", wordList.toString());

Default replacement strategy

final String text = "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。";
String result = SensitiveWordBs.newInstance().replace(text);
Assert.assertEquals("****迎风飘扬,***的画像屹立在***前。", result);

Specifying the replacement character

final String text = "五星红旗迎风飘扬,毛主席的画像屹立在天安门前。";
String result = SensitiveWordBs.newInstance().replace(text, '0');
Assert.assertEquals("0000迎风飘扬,000的画像屹立在000前。", result);
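The two replacement examples above both keep the text length unchanged: every character of a matched word is swapped for the mask character. A minimal sketch of that masking step, assuming a small in-memory word list and a naive prefix check in place of the DFA, might look like the following. `MaskSketch` and `mask` are hypothetical names for illustration and are not part of the library.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: length-preserving masking of dictionary words. */
final class MaskSketch {

    /** Replaces every character of each matched word with maskChar. */
    static String mask(String text, List<String> words, char maskChar) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Find the longest word that starts at position i (naive check, not a DFA).
            int len = 0;
            for (String w : words) {
                if (w.length() > len && text.startsWith(w, i)) {
                    len = w.length();
                }
            }
            if (len == 0) {
                out.append(text.charAt(i)); // no match here: copy the character through
                i++;
            } else {
                for (int k = 0; k < len; k++) {
                    out.append(maskChar);   // one mask character per matched character
                }
                i += len;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("bad", "badge");
        System.out.println(mask("a bad idea", words, '*'));
        System.out.println(mask("a badge", words, '0'));
    }
}
```

Keeping one mask character per original character means positions in the masked string still line up with positions in the input, which helps if the result is later highlighted or audited.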

More Features

Many of the later features handle special cases of one kind or another, with the aim of raising the hit rate for sensitive words as far as possible.

This is a long battle of offense and defense.

Ignoring case

final String text = "fuCK the bad words.";

String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals("fuCK", word);

Ignoring full-width vs. half-width characters

// The first word is written in full-width characters.
final String text = "ｆｕｃｋ the bad words.";

String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals("ｆｕｃｋ", word);
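One common way to get this kind of tolerance is to normalize each character before it is looked up in the dictionary while still reporting the original text, as the case example does when it returns "fuCK". The sketch below assumes that approach; `NormalizeSketch` is a hypothetical name, and the library's real handling may differ. It relies on the fact that the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above their ASCII counterparts, and that U+3000 is the full-width space.

```java
/** Hypothetical sketch: per-character normalization before dictionary lookup. */
final class NormalizeSketch {

    /** Maps a full-width character to half-width, then lowercases it. */
    static char normalize(char c) {
        if (c == '\u3000') {
            c = ' ';                    // full-width (ideographic) space
        } else if (c >= '\uFF01' && c <= '\uFF5E') {
            c = (char) (c - 0xFEE0);    // full-width ASCII variants U+FF01..U+FF5E
        }
        return Character.toLowerCase(c);
    }

    /** Normalizes a whole string; the result has the same length as the input. */
    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            out.append(normalize(text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Mixed full-width and upper-case input.
        System.out.println(normalize("\uFF46\uFF55CK the bad words."));
    }
}
```

Because the mapping is one character to one character, a match found at some index range in the normalized text covers the same index range in the original, so the original spelling can be returned or masked directly.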

Roadmap

  • Handling of digit conversions

  • Traditional/Simplified Chinese conversion

  • Repeated characters

  • Stop words

  • Pinyin conversion

  • User-defined sensitive-word lists and whitelists

  • Mirrored (reversed) text

  • Sensitive-word tagging support

Further Reading

Thoughts on implementing a sensitive-word tool

The DFA algorithm explained

Optimizing the sensitive-word dictionary

Notes on handling stop words

Origin www.cnblogs.com/houbbBlogs/p/12171455.html