python之NLP词性标注

1、知识点

包括中文和英文的词性标注
主要使用的库是nltk和jieba

2、代码

# -*- coding: utf-8 -*-

import nltk
from nltk.corpus import stopwords
from nltk.corpus import brown
import numpy as np
"""
标注步骤:
    1、清洗,分词
    2、标注
    
FAQ:
    1、 Resource punkt not found.
        请安装punkt模块 
    2、安装average_perceptron tagger
    3、Resource sinica_treebank not found
        请安装sinica_treebank模块
"""
def english_label():
    """POS-tag an English sentence with NLTK.

    Pipeline: lowercase -> word-tokenize -> strip punctuation tokens ->
    strip English stopwords -> nltk.pos_tag.

    Requires the NLTK resources ``punkt``, ``stopwords`` and
    ``averaged_perceptron_tagger`` to be installed (see module FAQ).

    :return: list of ``(word, tag)`` tuples produced by ``nltk.pos_tag``.
    """
    # Tokenize the (lowercased) sample sentence.
    text = ("Sentiment analysis is a challenging subject in machine learning."
            " People express their emotions in language that is often obscured"
            " by sarcasm, ambiguity, and plays on words, all of which could be"
            " very misleading for both humans and computers.").lower()
    tokens = nltk.word_tokenize(text)

    # Drop standalone punctuation tokens.
    english_punctuations = {',', '.', ':', ';', '?', '(', ')', '[', ']',
                            '&', '!', '*', '@', '#', '$', '%'}
    tokens = [word for word in tokens if word not in english_punctuations]

    # Drop English stopwords (set lookup is O(1) per token).
    stops = set(stopwords.words("english"))
    tokens = [word for word in tokens if word not in stops]

    # NOTE: original bound the result to a local named `list`, shadowing the
    # builtin, and discarded it; we now also return it for callers.
    tagged = nltk.pos_tag(tokens)
    print(tagged)
    return tagged


def chineses_label():
    """POS-tag a Chinese sentence with jieba's posseg module.

    Replaces full-width commas with spaces, then runs ``jieba.posseg.cut``
    and prints each token as ``word/tag``.

    (Alternatives mentioned in the original article: fool, HanLP tag set.)

    :return: list of ``(word, tag)`` pairs.
    """
    import jieba.posseg as pseg
    import re

    # NOTE: original named this variable `str`, shadowing the builtin.
    text = "我爱你,是粉色,舒服 ,舒服,士大夫"
    # Replace full-width commas with spaces so jieba treats them as breaks.
    cleaned = re.sub(r'[,]', " ", text)

    # Materialize the generator: the original printed the generator object's
    # repr (useless output) and could only iterate it once.
    pairs = [(word, tag) for word, tag in pseg.cut(cleaned)]
    print(' '.join('%s/%s' % (word, tag) for (word, tag) in pairs))
    return pairs

猜你喜欢

转载自www.cnblogs.com/ywjfx/p/11026712.html