NLP Part-of-Speech Tagging
Unlike their word segmentation functions, the part-of-speech tagging interfaces of the jieba and pyltp libraries differ considerably in form.
jieba's part-of-speech tagging function resembles its word segmentation function. jieba.posseg.cut(sentence, HMM=True)
takes two parameters: sentence is a piece of text, and HMM controls whether the hidden Markov model is used to discover words not in the dictionary.
pyltp's part-of-speech tagging function, pyltp.Postagger.postag(words),
takes one parameter: words, which is the return value of the word segmentation module or a native Python list.
nltk's part-of-speech tagging function is similar to pyltp's in that it also takes a list as input: nltk.pos_tag(tokens, tagset=None, lang='eng')
Here, tokens is a list of words; tagset selects a tag set, such as "universal", "wsj", or "brown", and different tag sets use different part-of-speech labels; lang is the language, currently "eng" and "rus" are supported, while support for "zho" still needs improvement.
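To illustrate what passing tagset="universal" does, the sketch below coarsens fine-grained Penn Treebank tags into universal tags with a hand-written mapping. The mapping dictionary here is a small illustrative subset of my own choosing; nltk ships a complete mapping table internally.

```python
# A minimal sketch of the kind of coarsening tagset="universal" applies.
# PTB_TO_UNIVERSAL covers only a few tags for illustration.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON", "DT": "DET",
}

def to_universal(tagged):
    """Map (word, PTB tag) pairs to (word, universal tag) pairs."""
    return [(w, PTB_TO_UNIVERSAL.get(t, "X")) for w, t in tagged]

print(to_universal([("the", "DT"), ("cat", "NN"), ("sat", "VBD")]))
# → [('the', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB')]
```

With nltk itself, the same effect comes from nltk.pos_tag(tokens, tagset='universal'), which returns coarse tags such as NOUN and VERB instead of NN and VBD.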
#coding:utf-8
import os
import nltk
import jieba
import jieba.posseg as pseg
from pyltp import Segmentor, Postagger

text = '你是我的眼'

# jieba: segment and tag parts of speech in one pass
segs = pseg.cut(text)
for word, pos in segs:
    print('%s %s' % (word, pos))

# pyltp: tag parts of speech, taking a list of words as input
data_dir = r"D:\ltp_data"
segmentor = Segmentor()
segmentor.load(os.path.join(data_dir, 'cws.model'))
postagger = Postagger()
postagger.load(os.path.join(data_dir, 'pos.model'))
segs2 = list(segmentor.segment(text))
poses2 = postagger.postag(segs2)
for i in range(len(segs2)):
    print('%s %s' % (segs2[i], poses2[i]))
segmentor.release()
postagger.release()

# nltk: also takes a list as input; Chinese ('zho') support is incomplete
segs3 = nltk.pos_tag(segs2, lang='zho')
for word, pos in segs3:
    print('%s %s' % (word, pos))
The results are as follows:
你 r
是 v
我 r
的 uj
眼 n
你 r
是 v
我 r
的 u
眼 n
你 JJ
是 NNP
我 NNP
的 NNP
眼 NN