Keyword extraction

These notes are my own write-up of material taught in Han Xiaoyang's course; the content is not original.
(The code runs on Python 3.)
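- TF-IDF keyword extraction
- jieba.analyse.extract_tags(sentence, topK = 20, withWeight = False, allowPOS = ())
- sentence is the text to be analyzed
- topK is the number of keywords with the largest TF-IDF weight to return; the default is 20
- withWeight controls whether each keyword is returned together with its weight (default False)
- allowPOS restricts the returned words to the given part-of-speech tags; the empty default () applies no filter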

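import jieba.analyse as analyse

# Read the whole article into one string
with open('...\\NBA.txt', encoding='utf-8') as f:
    lines_1 = f.read()

# Print the 20 keywords with the highest TF-IDF weight, with no part-of-speech filter
print(' '.join(analyse.extract_tags(lines_1, topK=20, allowPOS=())))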
>>> 时间 建议 特别 过程 选择 机会 期待 一家 介绍 很大 交流 韦少 全明星 杜兰特 MVP 全明星赛 威少 指导 两次 周末
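
To make the TF-IDF score concrete, here is a from-scratch sketch. It makes several assumptions. A word's score in a document is its term frequency times its inverse document frequency, IDF(w) = log(N / df(w)). As a result, words that are frequent in this article but rare across the corpus rank highest. The corpus, `idf_table` and `tfidf_keywords` are hypothetical illustrations, not jieba's API. jieba does not estimate IDF from your input; it reads IDF values from a precomputed table shipped with the library. The tokens here are assumed to be already segmented, and no part-of-speech filtering is done.

```python
import math
from collections import Counter

# Hypothetical mini-corpus of already-segmented documents (English stand-ins);
# it is used only to estimate IDF values for this illustration.
CORPUS = [
    ["warriors", "win", "game", "curry", "shooting"],
    ["clippers", "lose", "game", "coach", "timeout"],
    ["all-star", "game", "mvp", "westbrook", "westbrook"],
]


def idf_table(corpus):
    """IDF(w) = log(N / df(w)), where df(w) is the number of documents containing w."""
    n_docs = len(corpus)
    df = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / count) for word, count in df.items()}


def tfidf_keywords(doc, idf, top_k=20, with_weight=False):
    """Hypothetical helper: top_k words of doc ranked by TF * IDF."""
    counts = Counter(doc)
    total = len(doc)
    # Words missing from the table fall back to the median IDF (an assumed policy).
    median_idf = sorted(idf.values())[len(idf) // 2]
    scores = {w: (c / total) * idf.get(w, median_idf) for w, c in counts.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # Like withWeight=True, optionally return (word, weight) pairs
    return ranked if with_weight else [w for w, _ in ranked]


idf = idf_table(CORPUS)
print(tfidf_keywords(CORPUS[2], idf, top_k=3))  # ['westbrook', 'all-star', 'mvp']
```

Because "game" occurs in every document, its IDF is log(3/3) = 0. It therefore drops to the bottom even though it appears in the article. This is the same reason very common words rarely show up in the jieba output above.
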
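- TextRank keyword extraction
- jieba.analyse.textrank(sentence, topK = 20, withWeight = False, allowPOS = ('ns', 'n', 'vn', 'v'))
- sentence is the text to be analyzed
- topK is the number of keywords with the largest TextRank weight to return; the default is 20
- withWeight controls whether each keyword is returned together with its weight (default False)
- allowPOS specifies the parts of speech of the returned words; unlike extract_tags, textrank filters by default, keeping only place names (ns), nouns (n), verbal nouns (vn) and verbs (v)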

import jieba.analyse as analyse
with open('D:\\study\\NLP\\01_NLPbasis_txt\\Lecture_1\\NBA.txt', encoding='utf-8') as f:
    lines_1 = f.read()
# Keep place names, nouns, verbal nouns and verbs (the same set textrank uses by default)
print(' '.join(analyse.textrank(lines_1, topK=20, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v'))))
>>> 全明星赛 勇士 正赛 指导 对方 投篮 球员 没有 出现 时间 威少 认为 看来 结果 相隔 助攻 现场 三连庄 介绍 嘉宾
# Restrict the result to place names and nouns only
print(' '.join(analyse.textrank(lines_1, topK=20, withWeight=False, allowPOS=('ns', 'n'))))
>>> 勇士 正赛 全明星赛 指导 投篮 玩命 时间 对方 现场 结果 球员 嘉宾 时候 全队 主持人 照片 全程 目标 快船队 肥皂剧
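
TextRank works differently from TF-IDF: it needs no corpus statistics. Words become nodes of a graph, and words that co-occur within a small window are linked. A PageRank-style iteration then lets words connected to many well-connected words accumulate the highest scores. The following is a minimal from-scratch sketch on pre-segmented tokens. `textrank_keywords` is a hypothetical helper, not jieba's implementation. It does no part-of-speech or stop-word filtering, whereas jieba applies its allowPOS filter when building the graph. The window size, damping factor 0.85 and iteration count are illustrative assumptions.

```python
from collections import defaultdict


def textrank_keywords(words, window=5, damping=0.85, iterations=50, top_k=20):
    """Hypothetical helper: rank words by weighted PageRank over a co-occurrence graph."""
    # 1. Undirected graph: connect distinct words that occur fewer than `window`
    #    positions apart; the edge weight counts how often that happens.
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            other = words[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0

    # 2. PageRank iteration: each neighbour u passes on a share of its score
    #    proportional to the weight of the edge (u, w).
    out_weight = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(graph[u][w] / out_weight[u] * scores[u] for u in graph[w])
            for w in graph
        }

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [w for w, _ in ranked[:top_k]]


tokens = ["curry", "warriors", "shooting", "warriors", "coach"]
print(textrank_keywords(tokens, window=2, top_k=2))  # ['warriors', 'shooting']
```

With window=2 only adjacent words are linked, so "warriors" becomes the hub of the graph and ranks first. "shooting" sits next to it twice and therefore receives the larger share of the hub's score.
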
