nlp part-of-speech tagging

Unlike their word segmentation functions, the part-of-speech tagging interfaces of the jieba and pyltp libraries differ considerably in form.

jieba's part-of-speech tagging function mirrors its word segmentation function. jieba.posseg.cut(sentence, HMM=True) takes two parameters: sentence is the text to tag, and HMM toggles the hidden Markov model used to handle unknown words.

The part-of-speech tagging function of pyltp, pyltp.Postagger.postag(words), takes a single parameter: words, which is the return value of the word segmentation module or a native Python list.
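The pattern pyltp follows can be sketched in plain Python: the tagger receives a list of words and returns a same-length list of tags, which you then pair up position by position. The tiny tag table and lookup below are hypothetical stand-ins, not pyltp code; the tags themselves match pyltp's output for this sentence.

```python
# Hypothetical stand-in for pyltp's Postagger.postag: a list of words in,
# a parallel list of tags out. TAG_TABLE is a toy lookup, not a real model.
TAG_TABLE = {'你': 'r', '是': 'v', '我': 'r', '的': 'u', '眼': 'n'}

def postag(words):
    """Return one part-of-speech tag per input word, in order."""
    return [TAG_TABLE.get(w, 'n') for w in words]

words = ['你', '是', '我', '的', '眼']
tags = postag(words)
# The two lists are parallel, so zip pairs each word with its tag.
for word, pos in zip(words, tags):
    print('%s %s' % (word, pos))
```

Because the output list is parallel to the input, zip is the natural way to iterate over both at once.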

The part-of-speech tagging function in nltk is similar to pyltp's in that it also takes a list as input: nltk.pos_tag(tokens, tagset=None, lang='eng'). Here tokens is a list of words; tagset selects the tag set, such as "universal", "wsj", or "brown", each of which labels parts of speech differently; lang is the language type, and currently "eng" and "rus" are supported, while support for "zho" (Chinese) still needs improvement.
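To make the tagset parameter concrete: choosing "universal" coarsens fine-grained Penn Treebank tags into a small universal set. The mapping dict below is a short illustrative excerpt I wrote for this sketch, not nltk's actual mapping table.

```python
# Illustration of what tagset="universal" does: collapse fine-grained
# Penn Treebank tags into a coarse universal set. PTB_TO_UNIVERSAL here
# is a small hand-written excerpt, not nltk's full mapping.
PTB_TO_UNIVERSAL = {
    'NN': 'NOUN', 'VBP': 'VERB', 'PRP': 'PRON', 'PRP$': 'PRON', 'JJ': 'ADJ',
}

# "you are my eye" tagged with Penn Treebank tags, then coarsened.
tagged = [('you', 'PRP'), ('are', 'VBP'), ('my', 'PRP$'), ('eye', 'NN')]
universal = [(word, PTB_TO_UNIVERSAL.get(tag, 'X')) for word, tag in tagged]
print(universal)
# → [('you', 'PRON'), ('are', 'VERB'), ('my', 'PRON'), ('eye', 'NOUN')]
```

The same token list therefore carries different labels depending on which tag set you ask for, which is why results from different tag sets are not directly comparable.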

#coding:utf-8
import os
import nltk
import jieba
import jieba.posseg as pseg
from pyltp import Segmentor,Postagger

text='你是我的眼'

#jieba: segment and tag parts of speech in one pass
segs=pseg.cut(text)
for word,pos in segs:
    print('%s %s'%(word,pos))

#pyltp: tag parts of speech, taking a list as input
data_dir=r"D:\ltp_data"
segmentor=Segmentor()
segmentor.load(os.path.join(data_dir,'cws.model'))
postagger=Postagger()
postagger.load(os.path.join(data_dir,'pos.model'))
segs2=segmentor.segment(text)
segs2=list(segs2)
poses2=postagger.postag(segs2)
for word,pos in zip(segs2,poses2):
    print('%s %s'%(word,pos))

segmentor.release()
postagger.release()

#nltk: tag the same word list ("zho" support is incomplete)
segs3=nltk.pos_tag(segs2,lang='zho')
for word,pos in segs3:
    print('%s %s'%(word,pos))

The results are as follows (jieba first, then pyltp, then nltk):

你 r
是 v
我 r
的 uj
眼 n
你 r
是 v
我 r
的 u
眼 n
你 JJ
是 NNP
我 NNP
的 NNP
眼 NN
