Downloading and using the source package of HanLP, a Chinese natural language processing tool

Recently I decided to study Chinese natural language processing. My work last year left me with some experience from speech recognition projects, and I originally planned to study source code in that area. However, speech data is hard to collect, and there is very little free, open-source speech data online, so I turned to text-based natural language processing instead. Compared with speech, there are far more open-source text corpora online, and it is also easier to collect material yourself; after all, a simple script can gather a lot of text from the Internet.

Here are two books I recommend as references for others who also want to learn Chinese natural language processing. The first is "NLP Chinese Natural Language Processing Principles and Practice", which introduces the technologies involved in Chinese natural language processing and walks through some source code; my thanks to the author. The second is "Python Natural Language Processing". My network disk links are below:

"NLP Chinese Natural Language Processing Principles and Practice":

Link: https://pan.baidu.com/s/13g-KRw2XPCvqXeHZ87cawA Password: 91dr

"Python Natural Language Processing":

Link: https://pan.baidu.com/s/1BW94LgXl5SsxJCp4Mpi9Ag Password: 0e29

Now to the main topic. This article is mainly a memo and does not cover principles or code; those will be written up as a series after I have studied them in depth. Today I mainly record how to download the HanLP source code and the data sets, and how to get the demos in the source code to run. I had been reading the source code for a few days and had picked up some clues, but my overall understanding was still hazy and progress was slow. So I changed my approach: first run the demos provided with the source code, then read the source code following each demo's calling sequence, which makes the structure much clearer.

In short, you only need to download the source code, the dictionary and model data files, and the configuration file, make a small change to the configuration file, and then open the source code in an IDE and run it. The whole process is not complicated.

Source code, dictionary and model, configuration file download address:

https://github.com/hankcs/HanLP/releases

The web page provides detailed instructions, and you can simply follow them. After downloading, extract the dictionary and model files to a directory. I recommend putting them in the project directory:


The data directory holds the model files and dictionary data files:


The archive obtained from GitHub's source code download link does not contain the hanlp.properties configuration file. For that you need to download a release package; after decompressing it, you will find a hanlp.properties file inside.
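
The small change to the configuration file can be sanity-checked before running anything. The sketch below assumes that hanlp.properties contains a root entry pointing at the directory that holds data (as in the HanLP 1.x releases), and that data contains dictionary and model subdirectories; the class and method names are made up for illustration and are not part of HanLP. Forward slashes are the safer choice in the root value, because java.util.Properties treats a backslash as an escape character.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class HanLPConfigCheck {

    /** Reads the "root" entry from a hanlp.properties file (empty string if missing). */
    static String readRoot(Path propertiesFile) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(propertiesFile, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props.getProperty("root", "").trim();
    }

    /** True if root/data/dictionary and root/data/model both exist (assumed layout). */
    static boolean hasDataLayout(Path root) {
        Path data = root.resolve("data");
        return Files.isDirectory(data.resolve("dictionary"))
                && Files.isDirectory(data.resolve("model"));
    }

    public static void main(String[] args) throws IOException {
        // Path to the configuration file; defaults to the current directory.
        Path file = Paths.get(args.length > 0 ? args[0] : "hanlp.properties");
        String root = readRoot(file);
        System.out.println("root = " + root);
        System.out.println("data layout found: " + hasDataLayout(Paths.get(root)));
    }
}
```

If the second line prints false, the root value probably does not point at the directory into which the data archive was extracted.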


Copy this file into both the target/classes and target-classes directories of the decompressed source code.
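
The copy can of course be done by hand; for anyone who prefers to script it, here is a minimal sketch using only the Java standard library. The class name, the method name, and the location of hanlp.properties are assumptions made for illustration, and the two directory names are taken as written above, so adjust them to whatever your build actually produces.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CopyHanLPConfig {

    /** Copies the config file into each target directory, creating directories as needed. */
    static void copyInto(Path config, Path... targetDirs) throws IOException {
        for (Path dir : targetDirs) {
            Files.createDirectories(dir);
            // Overwrite an older copy so the directories never disagree.
            Files.copy(config, dir.resolve(config.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path config = Paths.get("hanlp.properties"); // hypothetical location: current directory
        Path project = Paths.get(".");               // hypothetical: run from the source root
        copyInto(config,
                project.resolve("target/classes"),
                project.resolve("target-classes"));  // directory names as given in this article
    }
}
```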


Finally, open the source code in an IDE. I use IDEA (IntelliJ); other IDEs should work similarly. One copy of the configuration file may well be enough, with no need to put it in both directories, but I have not verified this. To be on the safe side I copied it into both; readers can try it for themselves. My main goal here was just to get things running.

Once the steps above are done, open a demo under src/test/java/com.hankcs/demo and run it to see the results. After that, just follow the clues to read the source code.


