Fundamental NLP Tools: spaCy

Copyright notice: please credit the source when reposting. Thank you. https://blog.csdn.net/m0_37306360/article/details/85872718

For more regularly updated personal study notes, follow:
Zhihu: https://www.zhihu.com/people/yuquanle/columns
WeChat subscription account: AI小白入门
ID: StudyForAI



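Introduction to spaCy

  • spaCy is an industrial-strength natural language processing library that its developers describe as the fastest in the world. It supports many basic NLP tasks.
  • Official site: https://spacy.io/
  • Its main features include tokenization, part-of-speech tagging, lemmatization, named entity recognition, and noun phrase extraction.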

spaCy

Install: pip install spacy

Install from a mirror in China: pip install spacy -i https://pypi.tuna.tsinghua.edu.cn/simple

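import spacy
# The 'en' shortcut was removed in spaCy v3; load the pipeline by its full name.
# Download it first with: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
doc = nlp('This is a sentence.')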

1. Tokenization

for token in doc:
    print(token)
This
is
a
sentence
.
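
Each item yielded by the loop is a Token object rather than a plain string. As a minimal sketch (assuming spaCy is installed; `spacy.blank("en")` builds a tokenizer-only English pipeline, so no model download is needed for this step), the following lists a few token attributes. The variable name `rows` is only illustrative.

```python
import spacy

# Tokenizer-only English pipeline: no trained model required.
nlp = spacy.blank("en")
doc = nlp("This is a sentence.")

# token.i is the token index, token.idx the character offset in the text.
rows = [(token.i, token.idx, token.text, token.is_punct) for token in doc]
for row in rows:
    print(row)
```

The character offsets make it possible to map each token back to its position in the original string.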

2. Lemmatization (Lemmatize)

for token in doc:
    print(token, token.lemma_, token.lemma)
This this 1995909169258310477
is be 10382539506755952630
a a 11901859001352538922
sentence sentence 18108853898452662235
. . 12646065887601541794
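
In the output above, token.lemma_ is the readable string, while token.lemma is the integer hash under which spaCy stores that string. A small sketch of this mapping, using a standalone StringStore instead of a loaded pipeline (the variable names are illustrative, not from the original post):

```python
from spacy.strings import StringStore

# A standalone string table; a loaded pipeline exposes one as nlp.vocab.strings.
store = StringStore(["sentence", "be"])

lemma_hash = store["sentence"]   # string -> integer hash
lemma_text = store[lemma_hash]   # integer hash -> string
print(lemma_hash, lemma_text)
```

The same pattern applies to the other attribute pairs in this post: the name with a trailing underscore is the string, and the name without it is the integer ID.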

3. Part-of-speech tagging (POS Tagging)

for token in doc:
    print(token, token.pos_, token.pos)
This DET 89
is VERB 99
a DET 89
sentence NOUN 91
. PUNCT 96
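
The string tags come from the Universal POS tag set, and spacy.explain returns a short description for a tag. The integer IDs and some tag assignments shown above reflect the author's installed version and may differ in other releases, so the string tags are the safer thing to rely on. A minimal sketch, which needs no trained model:

```python
import spacy

# Look up human-readable descriptions for the coarse-grained tags seen above.
tags = ["DET", "VERB", "NOUN", "PUNCT"]
descriptions = {tag: spacy.explain(tag) for tag in tags}
for tag, text in descriptions.items():
    print(tag, "=", text)
```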

4. Named entity recognition (NER)

# The sample sentence contains no named entities, so this loop prints nothing.
for entity in doc.ents:
    print(entity, entity.label_, entity.label)
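
To show what the loop prints when a document does contain entities, here is a sketch under a loud assumption: the entity spans are set by hand on a tokenizer-only pipeline, purely to illustrate the attributes. The example sentence and its labels are made up for illustration; a trained pipeline would predict the spans itself and might label them differently.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # tokenizer only, no trained NER component
doc = nlp("Apple opened an office in Beijing.")

# Hand-made annotations (illustration only): token 0 is "Apple", token 5 is "Beijing".
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 5, 6, label="GPE")]

for entity in doc.ents:
    # label_ is the string label, label is its integer ID
    print(entity.text, entity.label_, entity.label)
```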

5. Noun phrase extraction

for nounc in doc.noun_chunks:
    print(nounc)
a sentence

Reposted from blog.csdn.net/m0_37306360/article/details/85872718