VicWord: a word segmenter written in pure PHP
Major updates:
1. Added a default dictionary path.
2. A missing dictionary now throws an exception instead of returning false.
Gitee: https://gitee.com/jitog/phpfenci
GitHub: https://github.com/lizhichao/VicWord
Segmentation methods
VicWord provides three segmentation methods:
getWord: length-first segmentation (prefers longer words). The fastest.
getShortWord: fine-grained segmentation. Slightly slower than the fastest.
getAutoWord: automatic segmentation. The best results.
Comparison of the results of the three methods
$fc = new VicWord('igb');
$arr = $fc->getWord('北京大学生喝进口红酒,在北京大学生活区喝进口红酒');
//北京大学|生喝|进口|红酒|,|在|北京大学|生活区|喝|进口|红酒
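// $arr is an array; each element has the structure [word, word position, part of speech, whether the word is in the dictionary]. Only the words are listed here.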
$arr = $fc->getShortWord('北京大学生喝进口红酒,在北京大学生活区喝进口红酒');
//北京|大学|生喝|进口|红酒|,|在|北京|大学|生活|区喝|进口|红酒
$arr = $fc->getAutoWord('北京大学生喝进口红酒,在北京大学生活区喝进口红酒');
//北京|大学生|喝|进口|红酒|,|在|北京大学|生活区|喝|进口|红酒
// For comparison:
// QQ's segmenter: http://nlp.qq.com/semantic.cgi#page2
// Baidu's segmenter: http://ai.baidu.com/tech/nlp/lexical
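The pipe-separated lines in the comments above show only the words. A minimal sketch of how such a line could be produced from the returned array, assuming each element has the word at index 0 as described; the sample data and the helper name `joinWords` are hypothetical and not part of VicWord:

```php
<?php
// Sample data shaped like the structure described above:
// [word, position, part of speech, in dictionary].
// The positions and part-of-speech tags are illustrative, not real VicWord output.
$arr = [
    ['北京', 0, 'ns', true],
    ['大学生', 2, 'n', true],
    ['喝', 5, 'v', true],
];

/**
 * Join the first field (the word) of every element with a separator.
 * Hypothetical helper, not part of the library.
 */
function joinWords(array $segments, string $sep = '|'): string
{
    // array_column(..., 0) collects the value at index 0 of each element.
    return implode($sep, array_column($segments, 0));
}

echo joinWords($arr), PHP_EOL; // 北京|大学生|喝
```

With a real VicWord instance, the result of getWord, getShortWord, or getAutoWord would be passed in place of the sample array.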
Segmentation speed
Machine: Alibaba Cloud, Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
getWord: 1.4 million characters per second
getShortWord: 1.38 million characters per second
getAutoWord: 400,000 characters per second
Test text: a 5,000-character passage copied from Baidu Encyclopedia.
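The figures above are throughput measurements. A rough sketch of how a comparable number could be obtained on another machine, assuming a simple wall-clock loop; this is not the author's benchmark script, `charsPerSecond` is a hypothetical helper, and the stand-in segmenter only splits text into single characters so that the example runs without the library:

```php
<?php
/**
 * Measure how many characters per second a segmenter processes.
 * $segment is any callable that takes a string and returns an array.
 * Hypothetical helper, not part of VicWord.
 */
function charsPerSecond(callable $segment, string $text, int $rounds = 100): float
{
    // Count UTF-8 characters (not bytes) and scale by the number of rounds.
    $chars = preg_match_all('/./su', $text) * $rounds;
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        $segment($text);
    }
    $elapsed = microtime(true) - $start;
    // Guard against a zero interval on very fast runs.
    return $chars / max($elapsed, 1e-9);
}

// Stand-in segmenter: one "word" per character. Replace it with
// [$fc, 'getWord'] (or another method) once a VicWord instance exists.
$standIn = function (string $text): array {
    return preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
};

$text = str_repeat('北京大学生喝进口红酒', 500); // 5,000 characters
printf("%.0f characters per second\n", charsPerSecond($standIn, $text));
```

Results depend heavily on the CPU, the PHP version, and the test text, so numbers from such a loop are only comparable when measured under the same conditions.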