【HanLP】-- Natural Language Processing Scenario Applications

1. Introduction

HanLP is a toolkit composed of a series of models and algorithms. Its main functions include word segmentation, part-of-speech tagging, keyword extraction, automatic summarization, dependency parsing, named entity recognition, phrase extraction, pinyin conversion, Simplified-Traditional Chinese conversion, and more.
The sections below cover how to integrate HanLP locally and how to apply some of its common functions in a project.

2. Integrating HanLP with Spring Boot

HanLP requires some data files and configuration (see HanLP's GitHub project). Download the data, prepare hanlp.properties, and place both in the /resources directory.
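A minimal hanlp.properties sketch is shown below. The paths are examples for a classpath layout and the authoritative key names come from the template shipped with the HanLP data release; adjust `root` to the parent directory of your `data/` folder.

```properties
# Parent directory that contains the data/ folder (example path)
root=src/main/resources/
# Custom dictionary files, relative to root
CustomDictionaryPath=data/dictionary/custom/CustomDictionary.txt
```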
Add the Maven dependency:

<dependency>
    <groupId>com.hankcs</groupId>
    <artifactId>hanlp</artifactId>
    <version>portable-1.7.5</version>
</dependency>

With the configuration above in place, HanLP is ready to use.

3. HanLP word segmentation

Example code:

import java.util.Arrays;
import java.util.List;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.collection.trie.bintrie.BinTrie;
import com.hankcs.hanlp.dictionary.CoreDictionary;
import com.hankcs.hanlp.dictionary.CustomDictionary;
import com.hankcs.hanlp.seg.common.Term;

// Register custom words under a custom nature tag "cus"
String[] wordStr = new String[]{"中华", "华夏", "中国", "炎黄"};
List<String> wordList = Arrays.asList(wordStr);
wordList.forEach(item -> CustomDictionary.insert(item, "cus"));
BinTrie<CoreDictionary.Attribute> trie = CustomDictionary.getTrie();

// Segment a sentence; custom words are recognized alongside the core dictionary
List<Term> termList1 = HanLP.segment("你好,欢迎使用HanLP汉语处理包!中华人.华夏无敌");
System.out.println("1-->" + termList1);
for (Term term : termList1) {
    String word = term.word;
    String nature = term.nature.toString();
    System.out.println("word-->" + word + "    nature--->" + nature);
}
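Conceptually, the custom dictionary feeds a trie (the `BinTrie` above) that the segmenter consults for longest-prefix matches. As a pure-Java illustration of that idea, here is a minimal forward-maximum-matching sketch over a plain word set. This is illustrative only; HanLP's actual segmenter combines dictionary matching with statistical models, and the class and method names here are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FmmDemo {
    // Greedy forward maximum matching: at each position, take the longest
    // dictionary word starting there; fall back to a single character.
    static List<String> segment(String text, Set<String> dict, int maxWordLen) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            String match = text.substring(i, i + 1); // single-char fallback
            int end = Math.min(text.length(), i + maxWordLen);
            for (int j = end; j > i + 1; j--) {      // try longest candidate first
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            result.add(match);
            i += match.length();
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("中华", "华夏", "无敌"));
        System.out.println(segment("中华人华夏无敌", dict, 4));
        // -> [中华, 人, 华夏, 无敌]
    }
}
```

Note how "中华人" splits as 中华 + 人 under greedy matching, which mirrors why ambiguous spans need more than dictionary lookup in a real segmenter.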

The output is:

1-->[你好/vl, ,/w, 欢迎/v, 使用/v, HanLP/nx, 汉语/gi, 处理/vn, 包/v, !/w, 中/f, 华人/n, ./w, 华夏/cus, 无敌/vi]
word-->你好    nature--->vl
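Each printed item pairs the surface word with its nature (part-of-speech) tag, separated by a slash; the custom nature "cus" marks words inserted via CustomDictionary. If you ever need to recover the pair from the printed form, a plain split at the last slash works. This helper class is hypothetical and only for illustration; in real code, read `term.word` and `term.nature` directly.

```java
import java.util.Arrays;

public class TermParse {
    // Split a printed "word/nature" token at its last slash.
    static String[] parse(String token) {
        int slash = token.lastIndexOf('/');
        return new String[]{token.substring(0, slash), token.substring(slash + 1)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("华夏/cus"))); // [华夏, cus]
    }
}
```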

Origin: blog.csdn.net/xunmengyou1990/article/details/131768960