StanfordCoreNLP: A Simple Guide to Lemmatizing English Sentences (Java)

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/quiet_girl/article/details/79974788

I. Overview

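StanfordCoreNLP is a natural language processing toolkit developed by Stanford. It provides many features, including tokenization, lemmatization, and part-of-speech tagging. For details, see the official site: https://stanfordnlp.github.io/CoreNLP/. This post covers a simple use of its lemmatization feature.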

II. Download and Setup

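1. Download the package from https://stanfordnlp.github.io/CoreNLP/. The download page looks like this:
[Image: screenshot of the download page]
2. After downloading, unzip the archive, find the following six jar files in the extracted folder, and add them to your Java project:
[Image: screenshot listing the six jar files]
3. You can now call the toolkit directly from code.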
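
Before running the example, it can help to confirm that the jars were actually added to the project. The following is a minimal sketch that checks whether a class can be located on the classpath, using the pipeline class name edu.stanford.nlp.pipeline.StanfordCoreNLP that the code in this post imports. The names `ClasspathCheck` and `isOnClasspath` are hypothetical helpers for illustration, not part of CoreNLP.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be found on the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String pipelineClass = "edu.stanford.nlp.pipeline.StanfordCoreNLP";
        System.out.println(pipelineClass + " found: " + isOnClasspath(pipelineClass));
    }
}
```

Note that this only checks for the main library class. The model files ship in a separate jar, which this sketch does not verify.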

III. Code

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.List;
import java.util.Properties;

/**
 * Purpose: lemmatization (reducing each word to its base form)
 * Jar download: https://stanfordnlp.github.io/CoreNLP/
 * API documentation: https://stanfordnlp.github.io/CoreNLP/api.html
 */
public class StemmerTest {

    public static void main(String[] args) {
        Properties props = new Properties();  // set up pipeline properties
        // tokenization, sentence splitting, POS tagging, and lemmatization
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Franklin said, If a man empties his purse into his head,no man can take it away from him,an investment in knowledge always pays the best interest.";  // text to process
        Annotation document = new Annotation(text);
        pipeline.annotate(document);
        // the annotated document is a list of sentences, each holding its tokens
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String word = token.get(CoreAnnotations.TextAnnotation.class);    // original token text
                String lemma = token.get(CoreAnnotations.LemmaAnnotation.class);  // lemma of that token, i.e. the lemmatized word we want
                System.out.println(word + " " + lemma);
            }
        }
    }
}

The output is shown below. Each line contains a token followed by its lemma:

Franklin Franklin
said say
, ,
If if
a a
man man
empties empty
his he
purse purse
into into
his he
head head
, ,
no no
man man
can can
take take
it it
away away
from from
him he
, ,
an a
investment investment
in in
knowledge knowledge
always always
pays pay
the the
best best
interest interest
. .
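
One common reason to lemmatize is to group inflected forms before counting them. As a minimal sketch, the class below takes `word lemma` lines in the format printed above and counts how often each lemma occurs, skipping punctuation-only tokens. `LemmaCounter` and `countLemmas` are hypothetical names chosen for illustration, and the code uses only the Java standard library, not CoreNLP itself.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class LemmaCounter {

    /**
     * Counts lemmas from lines of the form "word lemma".
     * Lines whose lemma contains no letter or digit (e.g. "," or ".") are skipped.
     */
    public static Map<String, Integer> countLemmas(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();  // keeps first-seen order
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue;  // ignore blank or malformed lines
            }
            // lower-case so that differently capitalized lemmas are merged
            String lemma = parts[1].toLowerCase(Locale.ROOT);
            if (!lemma.chars().anyMatch(Character::isLetterOrDigit)) {
                continue;  // punctuation-only token
            }
            counts.merge(lemma, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few lines copied from the output above
        List<String> lines = Arrays.asList(
                "his he", "purse purse", "into into", "his he", "head head", ", ,", "him he");
        System.out.println(countLemmas(lines));  // prints {he=3, purse=1, into=1, head=1}
    }
}
```

Because "his" and "him" both map to the lemma "he", they are counted together, which would not happen if the raw tokens were counted.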


Appendix:
For part-of-speech tagging and other features, see the official API documentation: https://stanfordnlp.github.io/CoreNLP/api.html


