Sharing a Ten-Million-Entry Giant Chinese Lexicon

This corpus is arguably the NLP world's largest lexicon. After working on HanLP for so long, I have gradually come to realize that algorithms cannot solve every problem; the lexicon matters just as much. An algorithm typically handles 80% of the cases, and no amount of tuning or optimization will recover the remaining 20%. Take the example I mentioned last time: "District People's Insurance" was wrongly matched by the HMM person-name recognition module. From the HMM's point of view, taking "District" (区) as a surname and "People" (人) and "Insurance" (保) as given-name characters is indeed quite plausible for a two- or three-character name, but no real person would be named that. Yet if I lowered the frequency of "人" and "保" as given-name characters, or deleted them outright, genuine names containing those characters would no longer be recognized. My blog's "Natural Language Processing" category has long had a "Corpus" subcategory...
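The trade-off described above can be sketched with a toy character-role scorer. This is a minimal illustration, not HanLP's actual model: the role probabilities, the threshold, and the rendering of the phrase as "区人保" are all hypothetical assumptions made for the example.

```python
# Toy sketch of HMM-style person-name scoring (hypothetical numbers,
# NOT HanLP's real model or parameters).
import math

# Assumed emission probabilities P(char | role) for surname / given-name roles.
P_SURNAME = {"区": 0.001, "王": 0.05, "李": 0.06}
P_GIVEN = {"人": 0.002, "保": 0.003, "明": 0.01, "伟": 0.02}

def name_score(candidate: str) -> float:
    """Log-score a 3-char candidate as surname + two given-name chars."""
    score = math.log(P_SURNAME.get(candidate[0], 1e-12))
    for ch in candidate[1:]:
        score += math.log(P_GIVEN.get(ch, 1e-12))
    return score

THRESHOLD = math.log(1e-10)  # hypothetical decision threshold

# Each character of "区人保" *can* play its role, so the model emits a
# false positive even though no real person carries this name.
assert name_score("区人保") > THRESHOLD

# Deleting 人/保 from the given-name table suppresses the false positive...
for ch in ("人", "保"):
    del P_GIVEN[ch]
assert name_score("区人保") < THRESHOLD
# ...but any genuine name containing those characters would now be
# rejected too — the exact trade-off the text describes.
```

The point of the sketch: tuning the model (here, editing role frequencies) trades one error for another, which is why a large, high-quality lexicon is needed alongside the algorithm.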

Continue reading: 码农场 (Code Farm) >> Sharing a Ten-Million-Entry Giant Chinese Lexicon

Original link: http://www.hankcs.com/nlp/corpus/tens-of-millions-of-giant-chinese-word-library-share.html

Reposted from: https://my.oschina.net/hankcs/blog/342303


Origin: https://blog.csdn.net/weixin_33756418/article/details/91780446