Getting started with CoreNLP for lemmatization (and Pattern, from getting started to giving up)

My complaints about Pattern (feel free to skip)

1. For work I needed a lemmatization tool. I usually use PHP, so I found some PHP code and checked how well it lemmatized. I was not very satisfied, mainly because many common words could not be reduced to their base form. After all, it is not a professional NLP tool.
2. Then I found tools such as NLTK, Pattern, and TextBlob online. Pattern looked relatively simple, and it does not require tagging the part of speech first.
3. So I started installing python3 on Windows. (I am not sure whether Pattern supports python3; I did not see anything saying it does not, so I used python3. My English is poor and I did not read the official documentation carefully.) I found that python3 on Windows is a real waste of life. I installed step by step following a tutorial, and libraries kept turning out to be missing. After a long time installing, it turned out that it actually needs VS2015. Damn, that thing is several GB and takes a long time to install (about 1 hour). I had installed it once before when compiling PHP and then uninstalled it because of the space it took up. After half a day I gave up.
4. Would Linux be better? I switched to a virtual machine running CentOS 7, installed python3 and pip3, changed the pip source to the Alibaba Cloud mirror, and then installed Pattern. Then came all kinds of missing libraries... Wasn't Linux supposed to install things automatically? Wasn't it supposed to detect missing libraries by itself? First, the mysql_config file could not be found; that was solved by installing mysql-devel. The other problems got solved along the way (it took half a day). Finally I ran one line of Python code and it reported an error. Damn it, this is a popular open source library, nothing was missing this time, and it still threw an error. I could not make sense of it. If you only want to use it casually, all you can do is give up.
5. Conclusion: Python really does not suit me. I have failed to install other open source Python programs before. Also, people say Linux is simple, but apart from very common programs I have almost never had a smooth installation on Linux. Whatever happened to "one command solves it"? It is all errors and missing libraries, and many programs cannot be installed successfully in one go with yum.

CoreNLP is an NLP tool written in Java that can be called through an HTTP API.

Advantages

It is written in Java and called through an HTTP API, so there is basically no restriction on the programming language you call it from.
It is suitable as a ready-made tool for people who are not NLP professionals.

References:

Download and install: https://blog.csdn.net/quiet_girl/article/details/79974788
http server startup: https://blog.csdn.net/u014033218/article/details/89301572

Install

1. CoreNLP requires JDK 1.8 or later; install the 64-bit JDK 1.8. Then add the JDK directory that contains java.exe to the PATH environment variable.
2. Make sure that running java in cmd does not report that the program cannot be found. I ran into a quirk here (cause unknown): when cmd is opened by right-clicking at the current location, java cannot be found. Opening cmd directly and then changing to the target directory works.
3. Download the program from https://stanfordnlp.github.io/CoreNLP/.
If you only need lemmatization, you do not need to download a language file (it seems you must download the language file to analyze Chinese).

Start web server

1. Unzip to a directory whose path contains only English characters (recommended).
Open cmd, change to the CoreNLP root directory (the one with many jar files), and run:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

This command starts the web service on port 9000 with a timeout of 15000 milliseconds; -mx4g lets Java use up to 4 GB of heap memory. If a firewall prompt appears, click Allow.

# Load the Chinese language model
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -serverProperties StanfordCoreNLP-chinese.properties -port 9000 -timeout 15000


2. You can test it by opening http://localhost:9000/ in your browser.
Enter a sentence to test, click "submit", and wait patiently. The first initialization can take several minutes and uses about 4 GB of memory. If there is no response for a long time (more than 5 minutes) while CPU usage stays high, consider killing the java process and starting again. Also, try not to run other Java programs at the same time; I found that finalshell (an SSH tool written in Java) conflicts with it.

3. If you only need lemmatization, deselect the other annotators and keep only "lemmas". Initialization is then faster and needs only about 200 MB of memory.
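Since the server can take a while to come up, a script that calls it may want to wait until port 9000 accepts connections before sending anything. The following is only a rough sketch using Python's standard library; the function name wait_for_port and its default values are my own illustrative choices, not part of CoreNLP. Note that an open port only means the server is listening; as described above, the models may still be loading on the first request.

```python
import socket
import time


def wait_for_port(host="localhost", port=9000, timeout_s=300.0, interval_s=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the defaults mirror the port used in this article.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Calling wait_for_port() with no arguments would poll localhost:9000 for up to five minutes, which matches the "more than 5 minutes" rule of thumb from step 2.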

API call

1. The easiest way to learn the API is not to read the official documentation (if you read English well, pretend I did not say that).
2. Press F12 to open the browser's developer tools, run the feature you want on the test page, and inspect the HTTP request. CoreNLP's requests need no security authentication by default, so this is very simple.
3. Below is the URL for lemmatization. It contains a formatted timestamp, and the text to analyze is sent in the POST body.

# Decoded URL
http://localhost:9000/?properties={"annotators": "tokenize,ssplit,lemma", "date": "2019-11-11T15:10:57"}&pipelineLanguage=en
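The decoded URL is easy to read, but what actually goes over the wire is percent-encoded. As a small sketch of how that encoding could be produced and reversed with Python's standard library: the function names below are hypothetical, and the "date" field is made optional on the assumption that lemmatization does not depend on it.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_corenlp_url(base="http://localhost:9000/",
                      annotators="tokenize,ssplit,lemma",
                      language="en", date=None):
    """Build a percent-encoded request URL like the one seen in the browser."""
    properties = {"annotators": annotators}
    if date is not None:
        # Optional timestamp, e.g. "2019-11-11T15:10:57" as in the captured URL.
        properties["date"] = date
    # The properties value is a JSON string; urlencode escapes it for the query.
    query = urlencode({"properties": json.dumps(properties),
                       "pipelineLanguage": language})
    return base + "?" + query


def decode_corenlp_url(url):
    """Recover the properties dict and the language from an encoded URL."""
    params = parse_qs(urlsplit(url).query)
    return json.loads(params["properties"][0]), params["pipelineLanguage"][0]
```

Printing build_corenlp_url() shows the encoded form, and decode_corenlp_url() turns a URL copied from the F12 network panel back into something readable.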

To call the CoreNLP interface, all we need is a programming language that can send POST requests (languages that cannot do this can call wget from the command line). I looked at the official PHP API and found it hard to use, and there are basically no relevant Chinese tutorials.
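As an illustration of how little is needed, here is a minimal sketch in Python using only the standard library. It assumes the server from this article is running on localhost:9000 and that the JSON reply has the layout "sentences" → "tokens" → "word"/"lemma"; the function names are hypothetical, and the Content-Type header is set to plain text on the assumption that this keeps the server from URL-decoding the body.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def extract_lemmas(response):
    """Collect (word, lemma) pairs from a parsed CoreNLP JSON response."""
    return [
        (token["word"], token["lemma"])
        for sentence in response["sentences"]
        for token in sentence["tokens"]
    ]


def lemmatize(text, base="http://localhost:9000/", timeout_s=15.0):
    """POST text to a running CoreNLP server and return (word, lemma) pairs."""
    properties = {"annotators": "tokenize,ssplit,lemma", "outputFormat": "json"}
    query = urlencode({"properties": json.dumps(properties),
                       "pipelineLanguage": "en"})
    request = Request(
        base + "?" + query,
        data=text.encode("utf-8"),  # the text to analyze goes in the POST body
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urlopen(request, timeout=timeout_s) as reply:
        return extract_lemmas(json.loads(reply.read().decode("utf-8")))
```

With the server running, lemmatize("The cats were running") should return one (word, lemma) pair per token. The same request can be reproduced in PHP or any other language by sending the text as the POST body to the same URL.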

Other functions: user authentication

Please refer to https://blog.csdn.net/u014033218/article/details/89301572


Origin blog.csdn.net/tangshangkui/article/details/103009466