Configuration of ICTCLAS50 Chinese word segmentation

2014/6/5

First of all, Chinese word segmentation is really difficult.

In fact, every time I configured Chinese word segmentation in this project, I ran into errors. At first I tested it in small batches on my own machine. When I moved it to the server, many more errors appeared, and the same error message often needed a different fix each time. It was rather maddening.

Where Chinese word segmentation is used:

The text entered in the query box is segmented to generate the topic words.

Configure.xml: Configuration management file

ICTCLAS50.dll: ICTCLAS5.0 dynamic link library

ICTCLAS50.h: ICTCLAS5.0 header file

ICTCLAS50.lib: ICTCLAS5.0 library (Windows)

libICTCLAS50.so: ICTCLAS5.0 shared library (Linux)

user.lic: User license file that identifies the user. It is essential and must not be changed.



Configuration:

Copy the ICTCLAS folder to the src directory, and copy all other folders and files to the project directory: Data, Configure.xml, ICTCLAS_I3C_AC_ICTCLAS50.h, ICTCLAS50.dll, ICTCLAS.h, ICTCLAS50.lib, libICTCLAS50.so, user.lic.
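Since most of the errors below come down to a missing file, a quick check of the target directory can save time. The following is only a sketch: the class IctclasFileCheck and its method are hypothetical helpers, not part of ICTCLAS, and the list of required names is an assumption based on the files named in this post.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of ICTCLAS): reports which of the
// files named in this post are missing from a given directory.
final class IctclasFileCheck {
    // Assumed minimum set, taken from the file list in this post.
    static final List<String> REQUIRED =
            Arrays.asList("Data", "Configure.xml", "user.lic");

    // Returns the required names (plus the native library) not found in dir.
    static List<String> missing(File dir, String nativeLibName) {
        List<String> names = new ArrayList<>(REQUIRED);
        names.add(nativeLibName);
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (!new File(dir, name).exists()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        boolean windows = System.getProperty("os.name").toLowerCase().contains("win");
        String lib = windows ? "ICTCLAS50.dll" : "libICTCLAS50.so";
        System.out.println("Missing in " + dir.getAbsolutePath() + ": " + missing(dir, lib));
    }
}
```

Running it with the project directory as the only argument prints the names that still need to be copied.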

References:

http://blog.csdn.net/heyu158/article/details/12680183 (Chinese Academy of Sciences word segmentation: ICTCLAS5.0 JNI usage)

http://blog.csdn.net/caimo/article/details/7686872 (ICTCLAS2011 Chinese word segmentation: use in Java web projects)



Put the word segmentation configuration files in the galagosearch-core directory.

The handleSearch() function renders the result list; q is the content of the query box.

SplitWord splitword=new SplitWord();

String displayQuery0 = scrub(request.getParameter("q")); // before word segmentation

String displayQuery = splitword.testICTCLAS_ParagraphProcess(displayQuery0);//After word segmentation

SearchResult() is the query result.

Configuration errors:

1. Init Fail!

The word segmentation data was not found. The fix is to place the Data folder in the project root directory.

2. After switching workspaces, Chinese word segmentation often fails. These are basically path problems, such as "no ICTCLAS50 in java.library.path": the library file, the Data folder, or the user license file user.lic cannot be loaded.

Following the method found online, edit ICTCLAS.I3S.AC.ICTCLAS50.java and change System.loadLibrary("ICTCLAS50") to

System.load("E:/douban/workspaces/SocialBook2/ICTCLAS50.dll");

Then modify the value of argu in the testICTCLAS_ParagraphProcess() method of the test class, to tell ICTCLAS that the project directory has changed.
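A hard-coded absolute path breaks again the next time the workspace moves. One possible variation, sketched below, tries java.library.path first and only then falls back to an explicit directory. The class NativeLoader and its methods are hypothetical names introduced for illustration; only System.loadLibrary, System.load and the two library file names come from this post.

```java
import java.io.File;

// Hypothetical helper: loads the ICTCLAS native library, falling back
// to an explicit directory when java.library.path does not contain it.
final class NativeLoader {
    // Picks the library file name for the given os.name value.
    static String libraryFileName(String osName) {
        return osName.toLowerCase().contains("win") ? "ICTCLAS50.dll" : "libICTCLAS50.so";
    }

    // Builds the absolute path that System.load() requires.
    static String libraryPath(File baseDir, String osName) {
        return new File(baseDir, libraryFileName(osName)).getAbsolutePath();
    }

    // Tries the normal lookup first, then the explicit directory.
    static void load(File baseDir) {
        try {
            System.loadLibrary("ICTCLAS50");
        } catch (UnsatisfiedLinkError e) {
            // Print the search path to make the failure easier to diagnose.
            System.err.println("java.library.path = " + System.getProperty("java.library.path"));
            System.load(libraryPath(baseDir, System.getProperty("os.name")));
        }
    }
}
```

The load has to happen in the same class loader as the ICTCLAS50 class, which is why this post edits ICTCLAS50.java directly; the body of load() could be placed there in the same way.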


If that still does not work, create a new folder named config in the root directory, and put Data, Configure.xml and the other files in it.

Reference: http://summerbell.iteye.

Sometimes it still does not work. When I tested on the server today, it reported this same error.

So I reconfigured Tomcat.

In Window > Preferences > Tomcat I added the path. I also found that Tomcat had no temp directory, so I created one manually.

whereis tomcat7

The output showed two directories: /etc/tomcat7 and /usr/share/tomcat7.

Then copy the files needed for word segmentation: everything except the ICTCLAS folder (the other seven items) goes into the tomcat/bin directory.

Another error: libstdc++.so.6: cannot open shared object file: No such file or directory, along with errors mentioning the ELF class.

apt-get install libstdc++5

After that it ran successfully.
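Errors that mention the ELF class usually mean that a shared library and the process loading it differ in bitness (32-bit versus 64-bit). That is a general fact about ELF loading, not something confirmed for this particular server. As a rough diagnostic, the sketch below reads the EI_CLASS byte of an ELF header; the class ElfClass is a hypothetical helper.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical diagnostic: reports whether a shared object is 32-bit or 64-bit.
final class ElfClass {
    // Returns 32 or 64 for an ELF file, or -1 if the header is not recognized.
    // Throws an IOException (EOFException) if the file is shorter than 5 bytes.
    static int bits(String path) throws IOException {
        byte[] header = new byte[5];
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            in.readFully(header);
        }
        // ELF files start with the magic bytes 0x7F 'E' 'L' 'F'.
        boolean isElf = header[0] == 0x7F && header[1] == 'E'
                && header[2] == 'L' && header[3] == 'F';
        if (!isElf) {
            return -1;
        }
        // Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
        if (header[4] == 1) {
            return 32;
        }
        if (header[4] == 2) {
            return 64;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(args[0] + ": " + bits(args[0]) + "-bit");
        System.out.println("JVM os.arch: " + System.getProperty("os.arch"));
    }
}
```

Running it on libICTCLAS50.so and comparing the result with the printed os.arch shows whether the library and the JVM match.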

3. Init Fail!

Cannot Open Configure file .\Configure.xml

This happens because the .\Configure.xml file cannot be found. The root directory of the configuration files needs to be set to new File("").getAbsolutePath()+"\\ICTCLASConf". When initializing with ICTCLAS_Init, pass (new File("").getAbsolutePath()+"\\ICTCLASConf").getBytes("GB2312") as the parameter so that it runs correctly. Note the parentheses: without them, getBytes() applies only to the "\\ICTCLASConf" literal.

When initializing in the SplitWord class:

String argu = new File("").getAbsolutePath(); still produces the error Cannot Open Configure file.

String argu = "/home/zzj/Workspaces/SocialBook2"; (the directory containing Configure.xml) succeeds.
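Because new File("").getAbsolutePath() depends on the working directory of the process (under Tomcat this is often not the project directory), it helps to verify the directory before handing it to ICTCLAS_Init. The sketch below does that; IctclasConfigDir and initArgument are hypothetical names, and only the Configure.xml file name and the GB2312 encoding come from this post.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.UnsupportedEncodingException;

// Hypothetical helper: validates the configuration directory and
// encodes its path the way this post passes it to ICTCLAS_Init.
final class IctclasConfigDir {
    // Returns the GB2312-encoded absolute path of dir,
    // or throws if Configure.xml is not inside it.
    static byte[] initArgument(File dir)
            throws FileNotFoundException, UnsupportedEncodingException {
        File configure = new File(dir, "Configure.xml");
        if (!configure.isFile()) {
            throw new FileNotFoundException(
                    "Configure.xml not found in " + dir.getAbsolutePath());
        }
        return dir.getAbsolutePath().getBytes("GB2312");
    }
}
```

With this in place, a wrong directory fails with a message that names the path actually searched, instead of the bare "Cannot Open Configure file". Using new File(dir, name) also avoids hard-coding "\\" as the separator, which does not work on Linux.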

Reference: http://gdhapple.blog.163.com/blog/static/12685791720122832029133/ Chinese Academy of Sciences word segmentation ICTCLAS5.0 configuration error handling

4. When galago is called to display results on the web page, as a complete web project, the first query returns results. When a second query is entered, word segmentation fails and the segmented content is empty.

The word segmentation function called: testICTCLAS_ParagraphProcess(String sInput)

Initially, because of an out-of-bounds error, I had placed the initialization outside the function. That left the second and subsequent queries with no results, because initialization is needed on every call, so I moved the initialization inside the function.

ICTCLAS50 testICTCLAS50 = new ICTCLAS50();
String argu = ".";
// initialization
if (testICTCLAS50.ICTCLAS_Init(argu.getBytes("GB2312")) == false)
{
    System.out.println("Init Fail!");
    return;
}
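The per-call pattern can be sketched as follows. To keep the example runnable without the native library, it uses a hypothetical Segmenter interface standing in for the ICTCLAS50 class; the exit() step is an assumption (this post does not show how the library is released), and the method names are illustrative only.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical stand-in for the native ICTCLAS50 class, so that the
// sketch compiles and runs without the native library.
interface Segmenter {
    boolean init(byte[] configDir);   // plays the role of ICTCLAS_Init
    byte[] process(byte[] text);      // plays the role of the paragraph call
    void exit();                      // assumed release step, not shown in this post
}

final class PerCallSegmentation {
    // Initializes, segments and releases on every call, so that a
    // second query does not depend on state left over from the first.
    static String segment(Segmenter seg, String configDir, String input)
            throws UnsupportedEncodingException {
        if (!seg.init(configDir.getBytes("GB2312"))) {
            System.out.println("Init Fail!");
            return "";
        }
        try {
            byte[] out = seg.process(input.getBytes("GB2312"));
            return new String(out, "GB2312");
        } finally {
            // Always release, even if processing throws.
            seg.exit();
        }
    }
}
```

Initializing on every query costs time, since the dictionaries are reloaded each time. It is the simple fix described here rather than an optimized one.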
