Gse v0.10.0 released: high-performance word segmentation in Go

  

Gse is an efficient word segmentation library for Go, with support for English, Chinese, Japanese, and other languages.

The dictionary is implemented with a double-array trie, and the segmenter uses a shortest-path algorithm based on word frequency, computed with dynamic programming.

It supports two segmentation modes, ordinary and search engine, as well as user dictionaries and part-of-speech tagging, and it can run as a JSON RPC service.

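package main

import (
	"fmt"

	"github.com/go-ego/gse"
)

func main() {
	// Load the Chinese dictionary ("zh") together with two user dictionaries.
	var seg gse.Segmenter
	seg.LoadDict("zh,testdata/test_dict.txt,testdata/test_dict1.txt")

	text1 := []byte("Hello world, Hello world")

	// Segment the text and print the result in ordinary (non-search) mode.
	segments := seg.Segment(text1)
	fmt.Println(gse.ToString(segments, false))
}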

Danube River

Add  

  • [NEW] Added detection of erroneous lines when loading a dictionary

  • [NEW] Added abbreviations for the dictionaries of different languages

  • [NEW] Added a segmentation-mode method

  • [NEW] Added custom dictionary

  • [NEW] More tests

  • [NEW] Update test tool

Update

  • [NEW] Update tool and benchmark code

  • [NEW] Update cedar code

  • [NEW] Simplified naming in the code

  • [NEW] Update README.md

  • [NEW] Split up code methods

  • [NEW] Update version and manage packages with dep

  • [NEW] Optimize dictionary loading

  • [NEW] Update log output and file names

Fix

  • [FIX] Format some code and fix godoc
