jieba: Keyword Extraction

Code:

import jieba.analyse

sentence = "我爱北京天安门"


# Extract keywords
keywords = jieba.analyse.extract_tags(sentence, topK=20, withWeight=False, allowPOS=())
# extract_tags extracts keywords from the sentence using the TF-IDF algorithm.
# Parameters:
#     - topK: how many top keywords to return; `None` returns all candidate words.
#     - withWeight: if True, return a list of (word, weight) pairs;
#                   if False, return a list of words.
#     - allowPOS: the allowed POS tags, e.g. ['ns', 'n', 'vn', 'v', 'nr'].
#                 Words whose POS is not in this list are filtered out.
#     - withFlag: only takes effect when allowPOS is not empty.
#                 if True, return words as pair(word, flag) objects, as posseg.cut does;
#                 if False, return a list of words.
print("===" * 20)
print(keywords)

# With weights
keywords = jieba.analyse.extract_tags(sentence, topK=20, withWeight=True, allowPOS=())
print("===" * 20)
for tup in keywords:
    print("%s %.4f"%tup)

Output:

============================================================
['天安门', '北京']
============================================================
天安门 4.4977
北京 2.3337
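
The weights above can be checked by hand. The following is a minimal sketch, assuming each word is scored as its term frequency times its IDF and that only tokens of at least two characters are counted, so 北京 and 天安门 each have a frequency of 1/2. The IDF values are back-calculated from the printed weights (weight / 0.5), not read from jieba's dictionary, and `tfidf_weights` is a hypothetical helper, not part of jieba.

```python
from collections import Counter

def tfidf_weights(tokens, idf, default_idf):
    """Score tokens as (count / total) * idf; a simplified sketch, not jieba's code."""
    # Keep only tokens of two or more characters (stop-word filtering is omitted here).
    kept = [t for t in tokens if len(t.strip()) >= 2]
    counts = Counter(kept)
    total = sum(counts.values())
    weights = {w: c / total * idf.get(w, default_idf) for w, c in counts.items()}
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)

# Assumed segmentation of the example sentence.
tokens = ["我", "爱", "北京", "天安门"]
# Assumed IDF values, inferred from the output above.
assumed_idf = {"天安门": 8.9954, "北京": 4.6674}

for word, weight in tfidf_weights(tokens, assumed_idf, default_idf=0.0):
    print("%s %.4f" % (word, weight))
```

With these assumed inputs the sketch gives 0.5 * 8.9954 = 4.4977 for 天安门 and 0.5 * 4.6674 = 2.3337 for 北京, the same figures as in the output above.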

Code:

# Instantiate the TF-IDF extractor
Tfidf = jieba.analyse.TFIDF()
keywords = Tfidf.extract_tags(sentence, topK=20, withWeight=False, allowPOS=())
print("===" * 20)
print(keywords)

# With weights
keywords2 = Tfidf.extract_tags(sentence, topK=20, withWeight=True, allowPOS=())
print("===" * 20)
for tup in keywords2:
    print("%s %.4f"%tup)

Output:

============================================================
['天安门', '北京']
============================================================
天安门 4.4977
北京 2.3337

Code:

# Give 天安门 a very low IDF
# Load the custom IDF file
Tfidf.set_idf_path('idf.txt')
keywords3 = Tfidf.extract_tags(sentence, topK=20, withWeight=False, allowPOS=())
print("===" * 20)
print(keywords3)

# With weights
keywords3 = Tfidf.extract_tags(sentence, topK=20, withWeight=True, allowPOS=())
print("===" * 20)
for tup in keywords3:
    print("%s %.4f"%tup)

Output:

============================================================
['北京', '天安门']
============================================================
北京 0.0500
天安门 0.0500
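
The contents of `idf.txt` are not shown here. One hypothetical file that would produce these numbers is a single line, `天安门 0.1`: 天安门 then gets an IDF of 0.1, and 北京, which is missing from the file, would fall back to the file's median IDF, which is also 0.1. With a term frequency of 1/2 each, both weights come out as 0.05. The sketch below illustrates that fallback in plain Python; the file contents, the one `word idf` pair per line format, and the helper name `load_idf` are all assumptions.

```python
def load_idf(text):
    """Parse 'word idf' lines and return (idf table, median idf)."""
    idf = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        word, value = line.strip().split(" ")
        idf[word] = float(value)
    # Take the middle element of the sorted values as the fallback IDF.
    ordered = sorted(idf.values())
    median = ordered[len(ordered) // 2]
    return idf, median

# Hypothetical contents of idf.txt.
idf_text = "天安门 0.1\n"
idf, median_idf = load_idf(idf_text)

# Each keyword occurs once among the two candidate words.
term_frequency = 1 / 2
for word in ["北京", "天安门"]:
    weight = term_frequency * idf.get(word, median_idf)
    print("%s %.4f" % (word, weight))
```

Because both words end up with the same weight, their relative order in the result no longer reflects how rare each word is.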

Code:

# Extract keywords from the text with TextRank
keywords = jieba.analyse.textrank(sentence, topK=20, withWeight=False, allowPOS=('ns','n','vn','v'))
print("===" * 20)
print(keywords)

# With weights
keywords = jieba.analyse.textrank(sentence, topK=20, withWeight=True, allowPOS=('ns','n','vn','v'))
print("===" * 20)
for tup in keywords:
    print("%s %.4f"%tup)

Output:

============================================================
['天安门', '北京']
============================================================
天安门 1.0000
北京 0.9961
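
To see where scores like these can come from, here is a minimal sketch of the TextRank idea: candidate words that co-occur within a sliding window are linked in an undirected graph, and scores are propagated along the edges for a fixed number of rounds. The window of 5, the damping factor of 0.85, the 10 rounds, and the final rescaling are assumed values, not confirmed against jieba's source. `textrank_sketch` is a hypothetical helper, and the POS filter is assumed to have already reduced the sentence to 北京 and 天安门.

```python
from collections import defaultdict

def textrank_sketch(words, window=5, damping=0.85, rounds=10):
    """Rank words by co-occurrence; a simplified sketch, not jieba's code."""
    # Link every pair of distinct words that appear within the window.
    graph = defaultdict(lambda: defaultdict(float))
    for i, first in enumerate(words):
        for second in words[i + 1:i + window]:
            if first != second:
                graph[first][second] += 1.0
                graph[second][first] += 1.0
    if not graph:
        return []
    nodes = sorted(graph)
    score = {node: 1.0 / len(nodes) for node in nodes}
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    # Propagate scores in place for a fixed number of rounds.
    for _ in range(rounds):
        for node in nodes:
            incoming = sum(
                weight / out_weight[other] * score[other]
                for other, weight in graph[node].items()
            )
            score[node] = (1 - damping) + damping * incoming
    # Rescale so that the top word scores exactly 1.0 (assumed normalization).
    low, high = min(score.values()), max(score.values())
    scaled = {n: (s - low / 10.0) / (high - low / 10.0) for n, s in score.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)

for word, weight in textrank_sketch(["北京", "天安门"]):
    print("%s %.4f" % (word, weight))
```

Under these assumptions the sketch prints 天安门 1.0000 followed by 北京 0.9961, in line with the output above. The two words are linked only to each other, so their scores are almost identical; the small gap comes from updating the scores in place within each round.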

Code:

# Instantiate the TextRank extractor
Textrank = jieba.analyse.TextRank()
keywords = Textrank.textrank(sentence)
print(keywords)

Output:

['天安门', '北京']

Reposted from blog.csdn.net/wangsiji_buaa/article/details/80265097