Usage notes from comparing the jieba and LAC word segmentation tools

I spent a whole day comparing jieba and LAC, so I am recording the results here. Due to project needs, I plan to use LAC as the main word segmentation tool.

jieba

Let me talk about jieba first. Installation and use are relatively simple, and it is widely recommended; for example, the book "Python Chinese Natural Language Processing Basics and Practice" recommends jieba.

Important dependency package (for paddle mode): paddlepaddle-tiny. However, paddlepaddle-tiny was last updated in 2019, so it is rather old.

Compared with LAC, jieba's advantage is that installation is a little easier; the installation pitfalls of LAC left me speechless.

Compared with LAC, jieba's weakness is that it has no word importance labeling (at least I have not found this feature).

LAC

Installation notes (important):

1. It depends on paddlepaddle, which currently does not support the latest Python versions. For example, Python 3.10 is not supported at the time of writing. For details on which versions are supported, see the paddlepaddle page on PyPI.

2. You must use 64-bit Python; 32-bit builds are not supported.

Because I did not notice these two points, I wasted half a day debugging installation errors.
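To avoid the same half-day of debugging, both constraints can be checked up front with the standard library alone. A small sketch (the 3.10 cap reflects the situation described above and may change, so PyPI remains the authority):

```python
# Check the two install constraints mentioned above before
# attempting `pip install paddlepaddle` / `pip install lac`.
import struct
import sys

# 64-bit Python: pointer size is 8 bytes on a 64-bit build.
is_64bit = struct.calcsize("P") * 8 == 64
print("64-bit Python:", is_64bit)

# Interpreter version: at the time of writing, Python 3.10 was
# not yet supported by paddlepaddle (check PyPI for the current range).
major, minor = sys.version_info[:2]
print(f"Python {major}.{minor}")

if not is_64bit:
    print("Warning: 32-bit Python is not supported by paddlepaddle.")
if (major, minor) >= (3, 10):
    print("Warning: this Python version may not be supported yet.")
```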

LAC does not need an Internet connection: word segmentation, part-of-speech tagging, and word importance tagging all run locally (I verified this with the network disconnected). Processing a single sentence takes about 0.3 seconds; the time overhead is slightly large, but still within an acceptable range. The time overhead is shown in the figure below.

Origin blog.csdn.net/chenggong2dm/article/details/122566977