Introduction to HanLP, a Java Word Segmentation Tool


A few days ago (June 28), at the 23rd China International Software Expo, the natural language processing tool HanLP won the "Outstanding Product of the 23rd China International Software Expo (2019)" award.


HanLP is a toolkit made up of a series of models and algorithms that combines deep neural networks with distributed natural language processing. It is full-featured, efficient, clearly structured, trained on up-to-date corpora, and customizable. It provides lexical analysis, syntactic analysis, text analysis, sentiment analysis, and other functions. Among natural language processing projects on GitHub, it is the most popular, with the largest user base (more than 13,000 stars) and the most active community.

HanLP is completely open source, including its dictionaries. It does not depend on any other JAR and is built on a set of high-speed underlying data structures, such as the double-array trie, DAWG, and AhoCorasickDoubleArrayTrie, all of which are open source as well. The official models are trained on the 2014 People's Daily corpus, and you can also train your own models with the built-in tools.
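To give a feel for why trie-style structures suit dictionary-based segmentation, here is a minimal sketch of forward longest-match segmentation over a trie. It is not HanLP's code: it uses a plain hash-map trie in place of the double-array trie, and the class name `TrieSegmenter` and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative only: a hash-map trie with forward longest-match segmentation.
 * This is an assumption-laden sketch, not HanLP's double-array trie.
 */
class TrieSegmenter {
    /** One trie node: child links keyed by character, plus an end-of-word flag. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    /** Builds the trie from the dictionary words; empty strings are skipped. */
    TrieSegmenter(Iterable<String> dictionary) {
        for (String word : dictionary) {
            if (word.isEmpty()) {
                continue;
            }
            Node node = root;
            for (int i = 0; i < word.length(); i++) {
                node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
            }
            node.isWord = true;
        }
    }

    /**
     * Splits text greedily from left to right. At each position the longest
     * dictionary word starting there is emitted; if no word matches, a single
     * char is emitted. Works on UTF-16 chars, so supplementary characters
     * outside the dictionary would be split into their surrogate halves.
     */
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            Node node = root;
            int longestEnd = start + 1; // fallback: one unmatched char
            for (int i = start; i < text.length(); i++) {
                node = node.children.get(text.charAt(i));
                if (node == null) {
                    break; // no dictionary word continues along this path
                }
                if (node.isWord) {
                    longestEnd = i + 1; // remember the longest match so far
                }
            }
            tokens.add(text.substring(start, longestEnd));
            start = longestEnd;
        }
        return tokens;
    }
}
```

With a dictionary containing 自然, 语言, 自然语言, and 处理, calling `segment("自然语言处理")` on this sketch yields 自然语言 followed by 处理, because the longest match wins at each position. A double-array trie stores the same transitions in two flat integer arrays, which is generally more compact and faster to traverse than the per-node hash maps used here.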

Through the HanLP utility class, you can call any function with a single statement. The documentation is detailed, and the library works out of the box. The underlying algorithms have been optimized: in the fastest segmentation mode, throughput reaches up to 20 million characters per second with only 120 MB of memory. In terms of I/O, dictionaries load quickly, and startup takes only about 500 ms. After several rounds of refactoring, HanLP has now reached version 1.7, which adds and improves Chinese word segmentation, named entity recognition, information extraction, text classification, text clustering, syntactic analysis, and other functions. Efficiency and applicability have both improved considerably.




Origin blog.51cto.com/13636660/2416506