Python third-party library: jieba

jieba is an excellent third-party library for Chinese word segmentation

        Chinese text must be split into individual words through word segmentation, because written Chinese has no spaces between words

        jieba is a third-party library, so it must be installed separately (pip install jieba)

        The jieba library provides three word segmentation modes; for basic use, mastering a single function is enough

The principle of jieba word segmentation

        jieba uses a Chinese dictionary to determine the association probability between Chinese characters

        Characters that have a high probability of occurring together are grouped into words, and these words form the segmentation result

        Beyond the built-in dictionary, users can also add custom words
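The dictionary-and-probability idea above can be illustrated with a toy segmenter. This is only a sketch of the general technique (pick the split whose words have the highest combined probability); it is not jieba's actual code. The dictionary TOY_FREQ, its frequency counts, and the function name toy_segment are all made up for illustration.

```python
import math

# Hypothetical toy dictionary: word -> made-up frequency count.
TOY_FREQ = {
    "我": 50, "来": 30, "到": 30, "来到": 20, "北京": 30,
    "清华": 10, "清华大学": 15, "华大": 2, "大学": 25,
}
TOTAL = sum(TOY_FREQ.values())


def toy_segment(text):
    """Return the split of `text` with the highest total log-probability."""
    n = len(text)
    # best[i] = (best log-probability of text[i:], end index of its first word)
    best = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, n + 1):
            word = text[i:j]
            # Unknown single characters get frequency 1 so a path always exists;
            # unknown multi-character strings are not allowed as words.
            freq = TOY_FREQ.get(word, 1 if j == i + 1 else 0)
            if freq:
                candidates.append((math.log(freq / TOTAL) + best[j][0], j))
        best[i] = max(candidates)

    # Walk forward through the stored end indices to recover the words.
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words


print(toy_segment("我来到北京清华大学"))
```

With these made-up counts, "清华大学" wins over "清华" + "大学" because one fairly frequent long word scores higher than two shorter words multiplied together.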

Three modes of jieba word segmentation

        Exact mode, full mode, search engine mode

        Exact mode: cuts the text precisely, with no redundant words (the most commonly used mode)

        Full mode: scans out all possible words in the text, so the result contains redundancy

        Search engine mode: building on exact mode, splits long words again

Common functions of the jieba library

        jieba.lcut(s): exact mode; returns the segmentation result as a list (the "l" in lcut stands for list)

        jieba.lcut(s, cut_all=True): full mode; returns the segmentation result as a list, with redundancy

        jieba.lcut_for_search(s): search engine mode; returns the segmentation result as a list, with redundancy

        jieba.add_word(w): adds the new word w to the segmentation dictionary

 
