gse is an efficient text segmentation library for Go that supports English, Chinese, Japanese, and other languages.
The dictionary is implemented as a double-array trie, and the segmentation algorithm finds the shortest path based on word frequency using dynamic programming.
It supports two segmentation modes (normal and search engine), user dictionaries, and part-of-speech tagging, and it can run as a JSON-RPC service.
    package main

    import (
        "fmt"

        "github.com/go-ego/gse"
    )

    func main() {
        // Load the dictionaries.
        var seg gse.Segmenter
        seg.LoadDict("zh,testdata/test_dict.txt,testdata/test_dict1.txt")

        // Segment the text and print the result in normal (non-search) mode.
        text1 := []byte("Hello world, Hello world")
        segments := seg.Segment(text1)
        fmt.Println(gse.ToString(segments, false))
    }
Danube River
Add
[NEW] Added error line detection for loading dictionary
[NEW] Added dictionary abbreviations in different languages
[NEW] Added pattern segmentation method
[NEW] Added custom dictionary support
[NEW] More tests
[NEW] Update test tool
Update
[NEW] Update tool and benchmark code
[NEW] Update cedar code
[NEW] Simplified code name
[NEW] Update README.md
[NEW] Segmentation code method
[NEW] Update version and manage packages with dep
[NEW] Optimize dictionary loading
[NEW] Update log print and filename
Fix
[FIX] Format some code and fix godoc