Python third-party library: jieba

Jieba: a third-party library for Chinese word segmentation

  • Chinese text has no spaces between words, so individual words must be obtained through word segmentation
  • Jieba is a third-party library for Chinese word segmentation; it is not part of the standard library and must be installed separately (for example, with pip install jieba)
  • Jieba provides three word segmentation modes: precise mode, full mode, and search engine mode

--- Precise mode: splits the text exactly, so the resulting words do not overlap and nothing is redundant

--- Full mode: scans out every possible word in the text, so the result contains redundant (overlapping) words

--- Search engine mode: starts from the precise-mode result and segments the long words again

Common functions

  • jieba.lcut(s): precise mode; returns the segmentation result of string s as a list
  • jieba.lcut(s, cut_all=True): full mode; returns the segmentation result as a list, with redundancy
  • jieba.lcut_for_search(s): search engine mode; returns the segmentation result as a list, with redundancy
  • jieba.add_word(w): adds the new word w to the segmentation dictionary

For example:

 

 

Origin www.cnblogs.com/technicist/p/12725617.html