1, HanLP introduction
HanLP is a Java toolkit composed of a series of models and algorithms. Its goal is to popularize the application of natural language processing in production environments. HanLP is feature-complete, efficient, clearly structured, trained on up-to-date corpora, and customizable.
HanLP provides the following features:
- Chinese word segmentation
- Part-of-speech tagging
- Named entity recognition
- Dependency parsing
- Keyword extraction
- New word discovery
- Phrase extraction
- Automatic summarization
- Text classification
- Pinyin conversion and Simplified/Traditional Chinese conversion
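As a rough illustration of how two of these features can be reached from Python, the sketch below assumes the pyhanlp wrapper described in section 2 and uses the HanLP.extractKeyword and HanLP.extractSummary calls shown in the pyhanlp README. The helper name keywords_and_summary and its parameters are made up for this example. It takes the HanLP object as an argument, so nothing is imported until you call it yourself.

```python
def keywords_and_summary(hanlp, text, n_keywords=2, n_sentences=3):
    """Collect keywords and summary sentences for `text`.

    `hanlp` is assumed to be the object obtained with
    `from pyhanlp import HanLP` (hypothetical helper, not part of pyhanlp).
    """
    # Both calls return Java lists; convert every element to a Python str.
    keywords = [str(k) for k in hanlp.extractKeyword(text, n_keywords)]
    summary = [str(s) for s in hanlp.extractSummary(text, n_sentences)]
    return {"keywords": keywords, "summary": summary}


# Usage (needs pyhanlp, a Java runtime, and the data package from section 2):
#   from pyhanlp import HanLP
#   print(keywords_and_summary(HanLP, some_chinese_text))
```

Passing the object in, rather than importing it inside the helper, keeps the example usable with a stand-in object when pyhanlp is not installed yet.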
2, HanLP installation
Step one: HanLP provides a Python library module, pyhanlp. Open a command prompt (Win + R, then enter cmd) and run the following command to install it:
pip install pyhanlp
Step two: pyhanlp depends on a data package, so to use it successfully you also need to download that package. The file needed is data-for-1.7.7.zip (the latest version at the time of writing).
Data download: https://github.com/hankcs/HanLP/releases
Once downloaded, put the file into the pyhanlp static directory under your Python installation. For example, my directory is E:\tool\python\Lib\site-packages\pyhanlp\static. If you cannot find your path, run the installation command again in the command prompt and pip will report where the package is already installed. Note that the downloaded data package does not need to be unzipped; put it into the directory as-is. Then enter the following code:
from pyhanlp import *
Run it; the data package is extracted automatically. Once that succeeds, you can start testing.
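If the static directory is hard to find by hand, a small script can report it. The following is a minimal sketch using only the standard library; the helper name find_pyhanlp_static is made up for this example, and it assumes the data package belongs in the static subdirectory of the installed pyhanlp package, as described above. It looks the package up without importing it.

```python
import importlib.util
from pathlib import Path


def find_pyhanlp_static():
    """Return the pyhanlp/static directory, or None if pyhanlp is not installed."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec("pyhanlp")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    return package_dir / "static"


static_dir = find_pyhanlp_static()
if static_dir is None:
    print("pyhanlp is not installed; run: pip install pyhanlp")
else:
    print("Copy the data zip (without unzipping it) into:", static_dir)
```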
3, HanLP function test
Enter the following simple test code to try out HanLP's word segmentation:
from pyhanlp import *
sentence = "I like being a writer, to write the kind of book to your favorite writer, writing allows the writer of the book many readers seem unable to stop, write the kind of book writer hearty"
# Segment the sentence into words with part-of-speech tags
terms = HanLP.segment(sentence)
print(terms)
The result is a list of word/tag pairs:
[I/rr, like/vi, when/p, a/q, writer/nnt, ,/w, sort/r, write/v, own/rr, watch/v, a/ude1, book/n, the/ude1, writer/nnt, ,/w, write/v, can/v, let/v, lot/m, the reader/n, it seems/v, unable to stop/vl, the/ude1, book/n, the/ude1, writer/nnt, ,/w, write/v, that/r, hearty/al, the/ude1, books/n, the/ude1, writer/nnt]
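To work with this output in a program, each word/tag pair can be split into its two parts. The sketch below is plain Python and assumes each term prints as word/tag, as in the result above; the helper names split_term and words_with_tag are made up for this example. The pyhanlp README also shows reading term.word and term.nature directly, which avoids string parsing when the real term objects are available.

```python
def split_term(term):
    """Split a 'word/tag' item into (word, tag)."""
    text = str(term)
    if "/" not in text:
        return text, ""
    # Split at the last slash so a word that contains '/' stays intact.
    word, _, tag = text.rpartition("/")
    return word, tag


def words_with_tag(terms, prefix):
    """Return the words whose tag starts with `prefix` (for example 'n' for nouns)."""
    return [word for word, tag in map(split_term, terms) if tag.startswith(prefix)]


sample = ["I/rr", "like/vi", "writer/nnt", ",/w", "book/n"]
print(words_with_tag(sample, "n"))  # ['writer', 'book']
```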
4, HanLP reference documentation
pyhanlp reference documentation: https://github.com/hankcs/pyhanlp
HanLP reference documentation: https://github.com/hankcs/HanLP/blob/master/README.md
5, Notes
pyhanlp and HanLP both provide HanLP's word segmentation and part-of-speech tagging. HanLP is the Java toolkit, and pyhanlp is the Python toolkit built on top of it. If you write Python in PyCharm, installing pyhanlp is enough.