NLP toolkit installation and configuration (with a requirements.txt for one-click installation)

Disclaimer: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_29027865/article/details/89814471

One-click installation
pip install -r requirements.txt

Activate (enter) your virtual environment before running any of the pip commands below.
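A quick way to confirm from Python that the virtual environment really is active before installing anything. This is a minimal sketch using only the standard library; the function name in_virtualenv is hypothetical.

```python
import sys

def in_virtualenv() -> bool:
    """Return True if this interpreter runs inside a virtual environment."""
    # venv (and virtualenv >= 20) leave sys.base_prefix pointing at the base
    # interpreter while sys.prefix points at the environment directory;
    # legacy virtualenv (< 20) instead defined sys.real_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("virtual environment active:", in_virtualenv())
print("python executable:", sys.executable)
```

If it prints False, the packages would be installed into the system interpreter instead of the environment.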

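pip mirror address (append this option to any pip install command to download from the Tsinghua University mirror instead of PyPI):

-i https://pypi.tuna.tsinghua.edu.cn/simple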
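The -i option (short for --index-url) replaces PyPI with the given package index. As a sketch, the helpers below (hypothetical names) build the one-click install command with the Tsinghua mirror and run it through the current interpreter; requirements.txt is assumed to be in the working directory.

```python
import subprocess
import sys
from typing import List

TSINGHUA_MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"

def pip_install_cmd(*args: str, index_url: str = TSINGHUA_MIRROR) -> List[str]:
    """Build a pip install command that downloads from index_url."""
    # "python -m pip" installs into the interpreter running this script,
    # i.e. into the active virtual environment.
    return [sys.executable, "-m", "pip", "install", *args, "-i", index_url]

def install_requirements(path: str = "requirements.txt") -> int:
    """Install everything listed in a requirements file; returns pip's exit code."""
    return subprocess.call(pip_install_cmd("-r", path))
```

For a single package, subprocess.call(pip_install_cmd("jieba")) works the same way.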

NumPy

NumPy: matrix operations

pip install numpy
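To check that NumPy works after installation, here is a small matrix example: a matrix product, a transpose, and solving a linear system.

```python
import numpy as np

# Two small 2x2 integer matrices
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

product = a @ b                              # matrix product, same as np.dot(a, b)
transposed = a.T                             # transpose of a
x = np.linalg.solve(a, np.array([5, 11]))    # solve a @ x = [5, 11]

print(product)      # [[19 22]
                    #  [43 50]]
print(transposed)
print(x)
```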

NLTK

NLTK: Natural Language Toolkit, a natural language processing toolkit

pip install nltk

Gensim

Gensim: automatically extracts semantic topics from documents

  1. pip install gensim

  2. Alternatively, download the matching .whl file from http://www.lfd.uci.edu/~gohlke/pythonlibs/ , then install it with pip install <file>.whl.
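With option 2, the .whl file must match your Python version and platform: a name containing cp37 and win_amd64, for example, is meant for 64-bit CPython 3.7 on Windows (the site above hosts Windows builds). The sketch below (hypothetical helper names, standard library only) prints the tags to look for. It only approximates pip's own tag matching and ignores details such as the ABI tag and manylinux platforms.

```python
import sys
import sysconfig

def expected_wheel_tags():
    """Return (python_tag, platform_tag) to look for in a .whl filename."""
    python_tag = "cp{}{}".format(*sys.version_info[:2])          # e.g. "cp37"
    # e.g. "win-amd64" -> "win_amd64"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag

def wheel_matches(filename: str) -> bool:
    """Rough check that a wheel filename fits the running interpreter."""
    python_tag, platform_tag = expected_wheel_tags()
    # Wheel names end with -{python tag}-{abi tag}-{platform tag}.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        return False
    py_tags = parts[-3].split(".")                               # e.g. "py2.py3"
    plat = parts[-1]
    py_ok = python_tag in py_tags or "py{}".format(sys.version_info[0]) in py_tags
    return py_ok and plat in ("any", platform_tag)

print("look for wheels tagged:", expected_wheel_tags())
```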

TensorFlow

TensorFlow: an open-source software library for numerical computation using data flow graphs.

pip install tensorflow
pip install tf-nightly-gpu/cpu

jieba

jieba: a Chinese word segmentation library with three segmentation modes (accurate, full, and search-engine mode); custom dictionaries can be added.

pip install jieba
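After the pip installs above, a quick standard-library check that the current interpreter can find each package. This is a sketch with a hypothetical function name; importlib.util.find_spec only locates a module without importing it, so even TensorFlow is checked quickly.

```python
import importlib.util

# Import names of the packages installed above (they match the pip names here).
PACKAGES = ["numpy", "nltk", "gensim", "tensorflow", "jieba"]

def check_installed(names=PACKAGES):
    """Map each import name to True if the current interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, ok in check_installed().items():
    print(f"{name:<12}{'OK' if ok else 'MISSING'}")
```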

Stanford NLP

Stanford NLP:

  1. Install the stanfordcorenlp package (a Python wrapper around Stanford CoreNLP):
    pip install stanfordcorenlp
  2. Download Stanford CoreNLP:
    https://stanfordnlp.github.io/CoreNLP/download.html
  3. Download the Chinese model jar:
    http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar
  4. Unzip the Stanford CoreNLP archive and put the downloaded stanford-chinese-corenlp-2018-02-27-models.jar in that same folder
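  5. Load the model in Python (replace path with the path of the unzipped Stanford CoreNLP folder):
from stanfordcorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP(r'path', lang='zh')  # starts a CoreNLP server with the Chinese models
print(nlp.word_tokenize('清华大学位于北京。'))  # quick segmentation test
nlp.close()  # shut down the background Java server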
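Step 4 is easy to get wrong. A small standard-library sketch (the function name is hypothetical) can confirm that the Chinese models jar sits in the CoreNLP folder before StanfordCoreNLP is constructed.

```python
import glob
import os

def find_chinese_models_jar(corenlp_dir: str):
    """Return the path of the Chinese models jar inside corenlp_dir, or None.

    Matches names like stanford-chinese-corenlp-2018-02-27-models.jar; if
    several versions are present, the last one in sorted order is returned.
    """
    pattern = os.path.join(corenlp_dir, "stanford-chinese-corenlp-*-models.jar")
    matches = sorted(glob.glob(pattern))
    return matches[-1] if matches else None
```

Call it with the same folder you pass to StanfordCoreNLP; a result of None means the jar is not in that folder.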

HanLP

Brief introduction

HanLP: Chinese word segmentation, part-of-speech tagging, and named entity recognition (the 1.x version used here is a Java library)
HanLP consists of three parts: the hanlp.jar library, the data (model) package, and the hanlp.properties configuration file. Once the JVM test below passes, these three parts need to be configured.
GitHub: https://github.com/hankcs/HanLP (see section 3, the configuration file)

JVM environment installation

  1. First, install Java; I am using JDK 1.8.

  2. Then install JPype, a toolkit for calling Java code from Python:
    pip install JPype1

  3. Test whether the JVM starts normally from Python:

   from jpype import *
   # Start the JVM found by JPype, with Java assertions enabled (-ea)
   startJVM(getDefaultJVMPath(), "-ea")
   java.lang.System.out.println("Hello World")
   shutdownJVM()
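If startJVM fails, JPype usually could not locate a JVM; getDefaultJVMPath() looks in places such as the JAVA_HOME environment variable. This standard-library sketch (hypothetical function name) reports the two settings most often misconfigured.

```python
import os
import shutil

def java_environment():
    """Collect the Java locations that are most often misconfigured."""
    return {
        # JDK install directory (e.g. the jdk1.8 folder); None if unset
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        # Full path of the java executable found on PATH, or None
        "java_on_PATH": shutil.which("java"),
    }

for key, value in java_environment().items():
    print(f"{key}: {value or 'NOT FOUND'}")
```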

HanLP installation

Set up the three parts of HanLP in turn: the hanlp.jar library, the data (model) package, and the hanlp.properties configuration file.

  1. Download the hanlp.jar package:
    https://github.com/hankcs/HanLP

  2. Download data.zip: https://github.com/hankcs/HanLP/releases
    or directly: http://hanlp.linrunsoft.com/release/data-for-1.7.0.zip

  3. Configuration file

The configuration file hanlp.properties tells HanLP where its data package is located. Only the first line needs to change: set root to the parent directory of the data folder, for example:
root=/usr/home/HanLP/
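Editing root by hand is fine; as a scripted alternative, here is a minimal sketch (hypothetical function name) that rewrites only the root= line. It assumes the entry is written exactly as root=... at the start of a line, as in the article's example, and converts backslashes to forward slashes because a backslash is an escape character in Java .properties files.

```python
from pathlib import Path

def set_hanlp_root(properties_file: str, root: str) -> None:
    """Point root= in hanlp.properties at the parent directory of data/."""
    root = root.replace("\\", "/")      # E:\NLP\hanlp -> E:/NLP/hanlp
    if not root.endswith("/"):
        root += "/"
    path = Path(properties_file)
    lines = path.read_text(encoding="utf-8").splitlines()
    if any(line.startswith("root=") for line in lines):
        # Replace the existing root entry, keep every other line unchanged
        lines = [f"root={root}" if line.startswith("root=") else line for line in lines]
    else:
        lines.insert(0, f"root={root}")  # no root entry yet: add it as the first line
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, set_hanlp_root(r"E:\NLP\hanlp\hanlp.properties", r"E:\NLP\hanlp") would write root=E:/NLP/hanlp/ (paths taken from the article's Windows example).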

Once it is configured, hanlp.properties must be on the classpath. Being "on the classpath" means the JVM searches those locations for classes and resource files such as hanlp.properties when it starts. The following option sets the classpath to the HanLP jar plus the folder containing hanlp.properties:
"-Djava.class.path=E:\NLP\hanlp\hanlp-1.5.0.jar;E:\NLP\hanlp"
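The separator between classpath entries depends on the operating system: ; on Windows, as in the example above, and : on Linux and macOS. Python exposes it as os.pathsep, so a portable sketch (hypothetical function name; the paths are the article's Windows example) could build the option like this:

```python
import os

def classpath_option(jar_path: str, config_dir: str) -> str:
    """Build the -Djava.class.path option for startJVM.

    jar_path provides the HanLP classes; config_dir is the folder that holds
    hanlp.properties. os.pathsep is ';' on Windows and ':' elsewhere.
    """
    return "-Djava.class.path=" + os.pathsep.join([jar_path, config_dir])

opt = classpath_option(r"E:\NLP\hanlp\hanlp-1.5.0.jar", r"E:\NLP\hanlp")
print(opt)
```

On Windows this produces exactly the string shown above.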

Common errors

  1. Class com.hankcs.hanlp.HanLP not found
    The cause lies in the startJVM settings. Check two things: first, is the path correct? Second, does the version number in the jar name match the file you downloaded? (It is best to use a path containing only English characters.)
  2. 'unicodeescape' codec can't decode bytes
    This error is caused by backslash escapes in the path string (for example \N in E:\NLP). Add r before the string to make it a raw string so the backslashes are kept literally.
    Test code:
import jpype
from jpype import *

jvmPath = jpype.getDefaultJVMPath()
# Classpath = HanLP jar + folder containing hanlp.properties; r"..." keeps the backslashes literal
jpype.startJVM(jvmPath, r"-Djava.class.path=E:\NLP\hanlp\hanlp-1.5.0.jar;E:\NLP\hanlp",
               "-Xms1g",
               "-Xmx1g")
jpype.java.lang.System.out.println("hello world!")
HanLP = JClass('com.hankcs.hanlp.HanLP')
java.lang.System.out.println(HanLP.segment("你好,欢迎使用HanLP汉语处理包!"))
jpype.shutdownJVM()
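To see why the r prefix matters: in a normal Python string, \N starts a named Unicode escape such as "\N{BULLET}", so a Windows path like "E:\NLP\hanlp" fails to compile with the unicodeescape error from item 2. The sketch below (hypothetical helper name) compiles three spellings as source text; forward slashes are another option, since Windows generally accepts them in paths too.

```python
def compiles(source: str) -> bool:
    """Return True if the given Python expression source compiles."""
    try:
        compile(source, "<path-demo>", "eval")
        return True
    except SyntaxError as err:
        # For "\N" not followed by {...}, the message mentions 'unicodeescape'
        print("SyntaxError:", err)
        return False

plain = '"E:\\NLP\\hanlp"'     # the source text   "E:\NLP\hanlp"
raw = 'r"E:\\NLP\\hanlp"'      # the source text  r"E:\NLP\hanlp"
forward = '"E:/NLP/hanlp"'     # forward slashes need no escaping

print(compiles(plain), compiles(raw), compiles(forward))
```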
