Overview of the jieba library
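- Chinese text has no spaces between words, so individual words must be obtained through word segmentation
- jieba is a widely used third-party Python library for Chinese word segmentation; it is not part of the standard library and must be installed separately (for example, with pip install jieba)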
- jieba provides three word segmentation modes: precise mode, full mode, and search engine mode
--- Precise mode: cuts the text into one non-overlapping sequence of words, with no redundancy
--- Full mode: scans out every possible word in the text, so the results overlap (redundancy)
--- Search engine mode: starts from the precise-mode result and segments the long words again into shorter ones
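To make the difference between the three modes concrete, here is a toy sketch in plain Python. It is not jieba's implementation: the tiny dictionary VOCAB and the helper names full_mode, precise_mode, and search_mode are hypothetical, and precise mode is approximated here with simple forward maximum matching, whereas jieba relies on a large frequency dictionary and a more elaborate algorithm. The point is only to show where the redundancy comes from.

```python
# Toy dictionary (hypothetical); jieba ships a far larger one with word frequencies.
VOCAB = {"中国", "科学", "学院", "科学院", "中国科学院", "计算", "计算所"}


def full_mode(text, vocab=VOCAB):
    """Return every dictionary word found at any position; overlaps are kept."""
    return [
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
        if text[i:j] in vocab
    ]


def precise_mode(text, vocab=VOCAB):
    """Forward maximum matching: one non-overlapping cover of the text."""
    max_len = max(len(w) for w in vocab)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


def search_mode(text, vocab=VOCAB):
    """Precise mode, plus the shorter dictionary words inside each long word."""
    words = []
    for word in precise_mode(text, vocab):
        if len(word) > 2:
            words.extend(w for w in full_mode(word, vocab) if len(w) < len(word))
        words.append(word)
    return words


text = "中国科学院计算所"
print(precise_mode(text))  # ['中国科学院', '计算所']
print(full_mode(text))     # ['中国', '中国科学院', '科学', '科学院', '学院', '计算', '计算所']
print(search_mode(text))   # ['中国', '科学', '科学院', '学院', '中国科学院', '计算', '计算所']
```

In the precise result the pieces join back into the original text exactly, while the full and search results contain overlapping words, which is the redundancy mentioned above.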
Common functions
- jieba.lcut(str): precise mode; returns the segmentation result as a list
- jieba.lcut(str, cut_all=True): full mode; returns the segmentation result as a list, with redundancy
- jieba.lcut_for_search(str): search engine mode; returns the segmentation result as a list, with redundancy
- jieba.add_word(word): adds a new word to the segmentation dictionary
For example: