The best way to install pyhanlp on Python 3

1. Introduction to HanLP

HanLP is a Chinese natural language processing toolkit. It supports a variety of tasks, including word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, sentiment analysis, and text classification. Its main advantages include:

  1. High accuracy: HanLP uses CRF-based word segmentation, which gives some of the best results among non-deep-learning methods and handles ambiguous and out-of-vocabulary words well, improving both segmentation accuracy and speed.
  2. Wide coverage: the range of supported tasks, from word segmentation to text classification, meets most natural language processing needs.
  3. Multilingual support: HanLP can process multiple languages, such as Chinese, English, and Japanese, so it can be used in multilingual environments.
  4. Easy to integrate: HanLP provides a rich API and out-of-the-box models, is easy to integrate into Java projects, and can be used from other programming languages such as Python, Go, and C++.

In short, HanLP is a powerful, easy-to-integrate Chinese natural language processing toolkit with a wide range of applications.

HanLP official website

2. Problem background

Recently I needed the hanlp package at work, and that is where the trouble started...

The first error was ModuleNotFoundError: No module named 'hanlp'.

Running pip install pyhanlp then failed with an installation error.
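Before working through the fixes, it can help to see which of the relevant packages the current interpreter can actually locate. The following is a minimal sketch using only the standard library; the helper name find_missing is made up for illustration.

```python
import importlib.util


def find_missing(modules):
    """Return the names in `modules` that the current interpreter cannot locate."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# The three module names that come up in this article
print(find_missing(["jpype", "pyhanlp", "hanlp"]))
```

An empty list means all three modules can be found; any name that is printed still needs to be installed.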

3. Solution

Step 1: Install JPype1. Installing it with the pip install JPype1 command reported an error.

Solution: Find the JPype1 whl file that matches your Python version. Download link: https://www.lfd.uci.edu/~gohlke/pythonlibs/

Download the package that matches your Python version, then install it: pip install JPype1-1.2.0-cp36-cp36m-win_amd64.whl

Note: The file name passed to pip must include the .whl suffix.
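To pick the right file, you need to know the interpreter's version tag and whether it is a 32-bit or 64-bit build. The sketch below prints both using the standard library; it assumes a CPython interpreter, and the helper name wheel_hints is made up for illustration.

```python
import platform
import struct
import sys


def wheel_hints():
    """Return the CPython tag, pointer width, and OS name a matching wheel should target."""
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    bits = struct.calcsize("P") * 8  # 64 for a 64-bit interpreter, 32 otherwise
    return python_tag, bits, platform.system()


print(wheel_hints())
```

On a 64-bit Python 3.6 for Windows, the tag is cp36 and the width is 64, which corresponds to the cp36 and win_amd64 parts of the file name above.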

Check whether the installation succeeded (the following ran without errors):

import jpype
jvmPath=jpype.getDefaultJVMPath()
print(jvmPath)   # D:\jdk\bin\server\jvm.dll

With JPype1 installed, I ran pip install pyhanlp again, but it still reported an error.

Step 2: Compile and install the source code

Download the source code zip from the official site and unzip it into Python's site-packages directory. Link: mirrors / hankcs / HanLP · GitCode

Then, in the unzipped directory

D:\python3.6.6\Lib\site-packages\HanLP-doc-zh (the installation directory in my case), run:

python setup.py install

This starts the build and installation.

The installation reports an error because torch is required; download and install it from the Python package collection.

Python package collection: https://www.lfd.uci.edu/~gohlke/pythonlibs/

Run python setup.py install again. A few minor problems remain, but running the code no longer raises ModuleNotFoundError: No module named 'hanlp'.

The pitfalls continue...

Step 3: HanLP installation

Download the hanlp.jar package and the data package from: Releases · hankcs/HanLP · GitHub


After downloading, unzip the hanlp-1.8.4-release package locally and rename the folder to hanlp_package (any name works), then put the data folder extracted from data-for-1.7.5.zip into the hanlp_package folder.

Next, edit the hanlp.properties configuration file and change the default path in it to your local path.

Note: In this example HanLP is located at "D:\software\hannlp" (it is best to avoid Chinese characters in the path).
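The path change can also be scripted. The sketch below rewrites the value of a root key in the text of a properties file. It assumes that hanlp.properties contains a root= entry pointing at the folder that holds data, as in the HanLP 1.x releases I am aware of; check your own file first. The helper name set_root and the sample text are made up for illustration.

```python
def set_root(properties_text, new_root):
    """Return the properties text with the value of the `root` key replaced."""
    # Backslash is an escape character in Java properties files, so use forward slashes
    value = new_root.replace("\\", "/")
    if not value.endswith("/"):
        value += "/"
    lines = []
    for line in properties_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "root" and "=" in line:
            line = "root=" + value
        lines.append(line)
    return "\n".join(lines) + "\n"


# Sample content for illustration only, not the real file
sample = "# comment\nroot=D:/JavaProjects/HanLP/\nother.key=value\n"
print(set_root(sample, r"D:\software\hannlp"))
```

To apply it to the real file, read hanlp.properties as UTF-8 text, pass it through set_root, and write the result back.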

4. HanLP code test

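from jpype import JClass, getDefaultJVMPath, shutdownJVM, startJVM

# Start the JVM. The class path lists the HanLP jar and the directory that holds
# hanlp.properties; adjust the jar name to the version you downloaded.
# On Linux, replace the semicolon (;) separator with a colon (:).
startJVM(getDefaultJVMPath(),
         r"-Djava.class.path=D:\software\hannlp\hanlp-1.7.2.jar;D:\software\hannlp",
         "-Xms1g",
         "-Xmx1g")

print("=" * 30 + "HanLP分词" + "=" * 30)
HanLP = JClass('com.hankcs.hanlp.HanLP')
# Chinese word segmentation
print(HanLP.segment('小明毕业于北京理工大学,后就职与中国科学院大数据研究所。'))
print("-" * 70)

shutdownJVM()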

Output:

==============================HanLP分词==============================
[小明/nz, 毕业/v, 于/p, 北京理工大学/ntu, ,/w, 后/f, 就职/vi, 与/cc, 中国科学院/nt, 大/a, 数据/n, 研究所/nis, 。/w]
----------------------------------------------------------------------
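The class path in the test above hard-codes the Windows separator. A small helper can build the same option with the separator of the current platform, since os.pathsep is ";" on Windows and ":" on Linux and macOS. This is only a sketch; the helper name build_classpath is made up for illustration.

```python
import os


def build_classpath(jar_path, config_dir, sep=os.pathsep):
    """Join the HanLP jar and the directory containing hanlp.properties into a JVM option."""
    return "-Djava.class.path=" + sep.join([jar_path, config_dir])


# Paths taken from the example in this article; replace them with your own
print(build_classpath(r"D:\software\hannlp\hanlp-1.7.2.jar", r"D:\software\hannlp"))
```

The returned string can be passed to startJVM in place of the hard-coded "-Djava.class.path=..." argument.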

At this point, however, pyhanlp still could not be imported, and the program reported an error when run. Running pip install pyhanlp again still failed.

After a restart, pip install pyhanlp finally succeeded. Solved!

Code test:

from pyhanlp import HanLP

# Dependency parsing; printing the result gives one tab-separated row per word
content_list = HanLP.parseDependency("小明毕业于北京理工大学,后就职与中国科学院大数据研究所。")
print(content_list)

Output:

1	小明	小明	nh	nr	_	2	主谓关系	_	_
2	毕业	毕业	v	v	_	0	核心关系	_	_
3	于	于	p	p	_	2	动补结构	_	_
4	北京理工大学	北京理工大学	ni	ntu	_	3	介宾关系	_	_
5	,	,	wp	w	_	2	标点符号	_	_
6	后	后	nd	f	_	7	状中结构	_	_
7	就职	就职	v	v	_	2	并列关系	_	_
8	与	与	p	p	_	11	左附加关系	_	_
9	中国科学院	中国科学院	ni	nt	_	10	定中关系	_	_
10	大数据	大数据	n	n	_	11	定中关系	_	_
11	研究所	研究所	n	n	_	7	并列关系	_	_
12	。	。	wp	w	_	2	标点符号	_	_
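Each row of the output above is tab-separated, with the word index in the first column, the word in the second, the index of its head word in the seventh, and the relation label in the eighth. If you want to work with the result as data, a minimal sketch like the following extracts those four fields from the printed text; the helper name parse_conll is made up for illustration.

```python
def parse_conll(text):
    """Parse tab-separated dependency rows into (id, word, head_id, relation) tuples."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        # Columns as read from the output above: 0 = id, 1 = word, 6 = head id, 7 = relation
        rows.append((int(fields[0]), fields[1], int(fields[6]), fields[7]))
    return rows


# First two rows of the output above
sample = "1\t小明\t小明\tnh\tnr\t_\t2\t主谓关系\t_\t_\n2\t毕业\t毕业\tv\tv\t_\t0\t核心关系\t_\t_"
for row in parse_conll(sample):
    print(row)
```

With pyhanlp, the input would be the string form of the parse result, for example str(HanLP.parseDependency(...)), assuming it yields the same text that print shows above.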

5. Summary

In summary, the steps above resolve both the pip install pyhanlp installation error and the ModuleNotFoundError: No module named 'hanlp' error.

Origin blog.csdn.net/weixin_40547993/article/details/130853697