Most of this content comes from: https://github.com/Yuzhen-Li/yuzhenli.github.io/wiki/Stanford-CoreNLP%E5%9C%A8Ubuntu%E4%B8%8B%E7%9A%84%E5%AE%89%E8%A3%85%E4%B8%8E%E4%BD%BF%E7%94%A8
1, Install the Java runtime environment
sudo apt-get install default-jre
sudo apt-get install default-jdk
2, Download the Stanford CoreNLP package
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
cd stanford-corenlp-full-2018-02-27/
3, Configure environment variables
for file in `find . -name "*.jar"`; do export CLASSPATH="$CLASSPATH:`realpath $file`"; done
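For readers who would rather assemble the same value from Python, here is a minimal sketch of what the shell loop above does: it collects every .jar under a directory and joins the absolute paths with the platform's path separator. The function name build_classpath is hypothetical and not part of CoreNLP or the wrapper.

```python
import os
from pathlib import Path


def build_classpath(corenlp_dir):
    """Return a CLASSPATH-style string listing every .jar under corenlp_dir."""
    # rglob walks subdirectories too, like `find . -name "*.jar"`.
    jars = sorted(str(p.resolve()) for p in Path(corenlp_dir).rglob("*.jar"))
    # os.pathsep is ":" on Linux/macOS and ";" on Windows.
    return os.pathsep.join(jars)


# Example use (assumes the script runs inside the unpacked CoreNLP directory):
#   os.environ["CLASSPATH"] = build_classpath(".")
```

Setting os.environ only affects the current Python process and its children, so this complements the shell export and does not replace it.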
4, Install the Python wrapper
sudo pip3 install stanfordcorenlp
5, Download Chinese language support
wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar
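As far as I know, the wrapper expects the language models jar to sit inside the CoreNLP directory itself, so it can be worth checking that the download landed there before going further. The sketch below only looks for a file name matching the pattern of the jar downloaded above. The helper name find_models_jars and the assumption that other languages follow the same naming are mine.

```python
from pathlib import Path


def find_models_jars(corenlp_dir, lang="chinese"):
    """List models jars for `lang` found directly inside corenlp_dir."""
    # Pattern modelled on stanford-chinese-corenlp-2018-02-27-models.jar;
    # other language names are assumed to follow the same scheme.
    pattern = f"stanford-{lang}-corenlp-*-models.jar"
    return sorted(p.name for p in Path(corenlp_dir).glob(pattern))


# Example use:
#   if not find_models_jars("stanford-corenlp-full-2018-02-27"):
#       print("Chinese models jar not found in the CoreNLP directory")
```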
6, Usage
from stanfordcorenlp import StanfordCoreNLP
# Path to the unpacked CoreNLP directory; for English, drop lang='zh'
nlp = StanfordCoreNLP(r'/mnt/f/CMBNLP/stanford-corenlp-full-2018-02-27/', lang='zh')
Usage 1: the wrapper
sentence = '中国科学院大学位于北京。'
print(nlp.word_tokenize(sentence))
print(nlp.pos_tag(sentence))
print(nlp.ner(sentence))
print(nlp.parse(sentence))
print(nlp.dependency_parse(sentence))
text = '中国科学院大学位于北京'
# Note: in testing so far, the openie annotator does not behave correctly
output = nlp.annotate(text, properties={
    'annotators': 'tokenize, ssplit, pos, depparse, parse, openie',
    'outputFormat': 'json'
})
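With 'outputFormat': 'json', the annotate call returns a JSON string, not a Python object, so it has to be decoded before use. The following is a minimal sketch of pulling tokens and OpenIE triples out of it, assuming the usual CoreNLP JSON layout (a top-level "sentences" list whose items carry "tokens" and, when the annotator ran, "openie"). The helper names are hypothetical, and the sample string is hand-written for illustration, not real CoreNLP output.

```python
import json


def extract_tokens(annotate_output):
    """Return (word, pos) pairs from a CoreNLP JSON string."""
    doc = json.loads(annotate_output)
    return [(tok["word"], tok.get("pos"))
            for sent in doc.get("sentences", [])
            for tok in sent.get("tokens", [])]


def extract_triples(annotate_output):
    """Return (subject, relation, object) triples, if openie produced any."""
    doc = json.loads(annotate_output)
    return [(t["subject"], t["relation"], t["object"])
            for sent in doc.get("sentences", [])
            for t in sent.get("openie", [])]


# Hand-written sample in the assumed shape (not real CoreNLP output).
sample = json.dumps({"sentences": [{
    "index": 0,
    "tokens": [{"index": 1, "word": "北京", "pos": "NR"}],
    "openie": [],
}]})
print(extract_tokens(sample))
print(extract_triples(sample))
```

Using .get with empty defaults means a response without an "openie" key yields an empty list without raising, which is convenient while the annotator is misbehaving.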
Usage 2: connect to a running server, which is reportedly somewhat faster
from stanfordcorenlp import StanfordCoreNLP
# Example taken from https://blog.csdn.net/Hallywood/article/details/80154146 (untested)
nlp = StanfordCoreNLP('http://localhost', port=9000)
sentence = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply"
print('Coref:', nlp.coref(sentence))
nlp.close()
Usage 3: command-line invocation
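Usage 2 presupposes that a CoreNLP server is already listening on port 9000, which the example does not show. Here is a minimal sketch of building the launch command from Python, assuming the standard server class edu.stanford.nlp.pipeline.StanfordCoreNLPServer with its -port and -timeout options. The function name server_command and the default values are illustrative assumptions.

```python
import os


def server_command(corenlp_dir, port=9000, timeout_ms=15000, memory="4g"):
    """Build the argument list that launches a CoreNLP server."""
    return [
        "java", f"-Xmx{memory}",
        # "*" lets the JVM pick up every jar in the CoreNLP directory.
        "-cp", os.path.join(corenlp_dir, "*"),
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


# Example use (starts the server in the background; not run here):
#   import subprocess
#   server = subprocess.Popen(
#       server_command("/mnt/f/CMBNLP/stanford-corenlp-full-2018-02-27"))
#   ...
#   server.terminate()
```

Passing the command as a list avoids shell quoting problems with the "*" in the classpath, since no shell expands it.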
<code class="language-html">import subprocess  # Same source as the example above; untested
subprocess.call(['java','-cp','F:/Program Files/jars/stanford-corenlp-full-2018-02-27/*','-Xmx4g',
'edu.stanford.nlp.pipeline.StanfordCoreNLP',"-annotators",
"tokenize,ssplit,pos,lemma,ner",'-file','subprocesstest.txt'])</code>