Keyword extraction with pyhanlp and adding a custom vocabulary

# Before adding the vocabulary
from pyhanlp import HanLP

if __name__ == '__main__':
    text = "基于知识融合的数据挖掘与分析技术"
    # Extract up to 5 keywords from the text
    keyword_list = HanLP.extractKeyword(text, 5)
    print(keyword_list)
    # Output reported by the author:
    # ["融合","知识","数据挖掘","技术"]
• 1. Define your own vocabulary file, new_add.txt. The format is one word per line, with no part of speech or frequency.
• 2. Put the file in pyhanlp's custom dictionary folder: lib/python3.7/site-packages/pyhanlp/static/data/dictionary/custom
• 3. Edit the hanlp.properties file, located at lib/python3.7/site-packages/pyhanlp/static/hanlp.properties.
    Specifically, add new_add.txt to the custom dictionary path on line 20 of the file:
    CustomDictionaryPath =data/dictionary/custom/CustomDictionary.txt; new_add.txt; 现代汉语补充词库.txt…
• 4. Delete the cache file lib/python3.7/site-packages/pyhanlp/static/data/dictionary/custom/CustomDictionary.txt.bin so that the dictionary is rebuilt on the next run.
• 5. Re-run the program.
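Steps 1 to 4 above are plain file operations, so they can be scripted. The following is a minimal sketch using only the Python standard library, not part of pyhanlp itself. The function name install_custom_vocabulary and its parameters are hypothetical. It assumes that hanlp.properties is UTF-8 encoded, that CustomDictionaryPath sits on a single line, and that static_dir points to the pyhanlp static folder mentioned above.

```python
import re
from pathlib import Path


def install_custom_vocabulary(static_dir, words, vocab_name="new_add.txt"):
    """Apply steps 1-4 to a pyhanlp static directory (hypothetical helper)."""
    static_dir = Path(static_dir)
    custom_dir = static_dir / "data" / "dictionary" / "custom"

    # Steps 1-2: write one word per line, no part of speech or frequency.
    (custom_dir / vocab_name).write_text("\n".join(words) + "\n", encoding="utf-8")

    # Step 3: register the file right after the first CustomDictionaryPath entry.
    # Assumption: the properties file is UTF-8 and the setting is on one line.
    props = static_dir / "hanlp.properties"
    lines = props.read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines):
        if re.match(r"\s*CustomDictionaryPath\s*=", line) and vocab_name not in line:
            head, sep, tail = line.partition(";")
            lines[i] = f"{head}; {vocab_name};{tail}" if sep else f"{line}; {vocab_name};"
    props.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Step 4: remove the cached binary dictionary if it is present.
    cache = custom_dir / "CustomDictionary.txt.bin"
    if cache.exists():
        cache.unlink()
```

A call such as install_custom_vocabulary("lib/python3.7/site-packages/pyhanlp/static", ["数据挖掘与分析"]) would then cover steps 1 to 4, leaving only step 5. The sketch skips a CustomDictionaryPath line that already mentions the file, so running it twice does not add a duplicate entry.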
# After adding the vocabulary
'''
The vocabulary file is named new_add.txt.
It contains a single word:
数据挖掘与分析
The result is shown below.
'''
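from pyhanlp import HanLP

if __name__ == '__main__':
    text = "基于知识融合的数据挖掘与分析技术"
    # Same call as before; the custom word is now kept as a single keyword
    keyword_list = HanLP.extractKeyword(text, 5)
    print(keyword_list)
    # Output reported by the author:
    # ["融合","知识","数据挖掘与分析","技术"]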

Origin blog.csdn.net/tailonh/article/details/112666350