First of all, Chinese word segmentation is really difficult.
In fact, configuring Chinese word segmentation produced many errors throughout the project. At first I tested it in small batches on my own machine, but after moving it to the server many new errors appeared, and the same error message often needed a different fix. I nearly gave up...
Where Chinese word segmentation is used:
The text entered in the query box is segmented, and the resulting words are used as the query topics.
Files in the ICTCLAS5.0 package:
Configure.xml: Configuration management file
ICTCLAS50.dll: ICTCLAS5.0 dynamic link library
ICTCLAS50.h: ICTCLAS5.0 header file
ICTCLAS50.lib: ICTCLAS5.0 import library (Windows)
libICTCLAS50.so: ICTCLAS5.0 shared library (Linux)
user.lic: user license file that identifies the user; required, and must not be modified.
Configuration:
Copy the ICTCLAS folder to the src directory, and copy all the other folders and files to the project directory: Data, Configure.xml, ICTCLAS_I3S_AC_ICTCLAS50.h, ICTCLAS50.dll, ICTCLAS.h, ICTCLAS50.lib, libICTCLAS50.so, user.lic.
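Since most of the errors below come down to one of these files not being where ICTCLAS looks, it can help to check the directory before initializing. The following is a minimal sketch, not part of ICTCLAS: the class name IctclasFileCheck is hypothetical, and the list covers only the runtime files (Data, Configure.xml, user.lic and the platform's native library, whose file name System.mapLibraryName derives from "ICTCLAS50").

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: checks that the ICTCLAS runtime files are present in a directory. */
public class IctclasFileCheck {
    // Runtime files from the list above; mapLibraryName gives ICTCLAS50.dll on Windows
    // and libICTCLAS50.so on Linux.
    static final List<String> REQUIRED = Arrays.asList(
            "Data", "Configure.xml", "user.lic", System.mapLibraryName("ICTCLAS50"));

    /** Returns the required entries that are missing from projectDir. */
    static List<String> missingFiles(Path projectDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.exists(projectDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults to the JVM working directory, which is what a relative path like "." means.
        Path dir = Paths.get(args.length > 0 ? args[0] : "").toAbsolutePath();
        List<String> missing = missingFiles(dir);
        System.out.println(missing.isEmpty()
                ? "All ICTCLAS files found in " + dir
                : "Missing in " + dir + ": " + missing);
    }
}
```

Running it with no argument from the same place the web server is started shows which directory "." really refers to, which is the root of several of the problems described below.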
Reference: http://blog.csdn.net/heyu158/article/details/12680183 Chinese Academy of Sciences word segmentation ICTCLAS5.0_JNI usage
http://blog.csdn.net/caimo/article/details/7686872 ICTCLAS2011 Chinese word segmentation: use in Java web projects
Put the word segmentation configuration files in the galagosearch-core directory.
The handleSearch() function displays the result list; q is the content of the query box:
SplitWord splitword = new SplitWord();
String displayQuery0 = scrub(request.getParameter("q")); // before word segmentation
String displayQuery = splitword.testICTCLAS_ParagraphProcess(displayQuery0); // after word segmentation
SearchResult() returns the query result.
Configuration errors:
1. Init Fail!
The word segmentation dictionary cannot be found. Fix: put the Data folder in the project root directory.
2. After changing the workspace, Chinese word segmentation often fails. These are basically path problems, reported as "no ICTCLAS50 in java.library.path": the library file, the Data folder, or the user license file user.lic cannot be loaded.
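Before changing any code, it can be useful to see what java.library.path actually contains and whether the native library is in any of those directories. This is a minimal diagnostic sketch (the class LibraryPathCheck is hypothetical); it only approximates System.loadLibrary, because a class loader can add its own search logic through ClassLoader.findLibrary.

```java
import java.io.File;
import java.util.Optional;

/** Diagnostic sketch: where would System.loadLibrary("ICTCLAS50") look? */
public class LibraryPathCheck {
    /** Searches each entry of a java.library.path-style string for the platform library file. */
    static Optional<File> findLibrary(String libraryPath, String libName) {
        String fileName = System.mapLibraryName(libName); // e.g. ICTCLAS50.dll or libICTCLAS50.so
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        System.out.println("java.library.path = " + path);
        System.out.println(findLibrary(path, "ICTCLAS50")
                .map(f -> "Found " + f.getAbsolutePath())
                .orElse("Not found: loadLibrary would likely throw UnsatisfiedLinkError"));
    }
}
```

If the library is not found, either add its directory to java.library.path (for example with -Djava.library.path=...) or load it by absolute path as described next.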
Following a method found online, edit ICTCLAS.I3S.AC.ICTCLAS50.java and change System.loadLibrary("ICTCLAS50") to
System.load("E:/douban/workspaces/SocialBook2/ICTCLAS50.dll");
Then, in the test class, change the value of argu in the testICTCLAS_ParagraphProcess() method so that ICTCLAS knows the project directory has changed.
If this still does not work, create a new folder named config in the project root directory and put Data, Configure.xml and the other files in it.
System.load("E:/douban/workspaces/SocialBook2/ICTCLAS50.dll");
Reference: http://summerbell.iteye.
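Hardcoding E:/douban/... in ICTCLAS50.java means editing the source again for every workspace and server. A minimal alternative sketch, assuming you are free to change how the library is loaded: read the directory from a system property (ictclas.home is a hypothetical name, not something ICTCLAS defines) and build the absolute file name with System.mapLibraryName, since System.load only accepts absolute paths.

```java
import java.io.File;

/** Sketch: load the ICTCLAS native library by absolute path instead of via java.library.path. */
public class IctclasLoader {
    /** Absolute path of the platform-specific library file (ICTCLAS50.dll / libICTCLAS50.so) in dir. */
    static String libraryFile(String dir) {
        return new File(dir, System.mapLibraryName("ICTCLAS50")).getAbsolutePath();
    }

    /** System.load throws UnsatisfiedLinkError if the file is missing or cannot be loaded. */
    static void load(String dir) {
        System.load(libraryFile(dir));
    }

    public static void main(String[] args) {
        // Hypothetical property, e.g. -Dictclas.home=E:/douban/workspaces/SocialBook2;
        // falls back to the JVM working directory.
        String dir = System.getProperty("ictclas.home", new File("").getAbsolutePath());
        System.out.println("Loading " + libraryFile(dir));
        try {
            load(dir);
            System.out.println("Loaded.");
        } catch (UnsatisfiedLinkError e) {
            System.out.println("Load failed: " + e.getMessage());
        }
    }
}
```

The same directory can then be reused as the argu value passed to ICTCLAS_Init, so the library, Data, Configure.xml and user.lic are all looked up in one place.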
Sometimes it still fails. When I tested it on the server today, it reported an error like this.
Then I reconfigured Tomcat: under Window > Preferences > Tomcat, I added the path. I also found that Tomcat had no temp folder, so I created one manually.
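The missing temp folder can also be detected from inside the application. Tomcat's startup scripts normally point java.io.tmpdir at CATALINA_BASE/temp, so a small check at startup can create it if it is missing. This is a minimal sketch with a hypothetical class name, not something the original project did.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: make sure the JVM temp directory exists. */
public class TempDirCheck {
    /** Creates dir and any missing parents; returns true if it is a directory afterwards. */
    static boolean ensureDir(Path dir) {
        try {
            Files.createDirectories(dir); // no-op if the directory already exists
            return Files.isDirectory(dir);
        } catch (IOException e) {
            return false; // e.g. no permission, or a regular file is in the way
        }
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(tmp + (ensureDir(tmp) ? " is available" : " could not be created"));
    }
}
```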
Running whereis tomcat7 lists two directories, /etc/tomcat7 and /usr/share/tomcat7.
Then copy the files needed for word segmentation (all seven except the ICTCLAS folder) into the tomcat/bin directory.
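Copying Data and the other files by hand after every redeploy is easy to get wrong. The sketch below shows one way to script it with java.nio.file; the class CopyToTomcat and the file list in main are illustrative assumptions, so adjust the names to match what your setup actually needs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: copy the ICTCLAS runtime files from the project into Tomcat's bin directory. */
public class CopyToTomcat {
    /** Recursively copies source (a file or a directory) into targetDir, overwriting existing files. */
    static void copyInto(Path source, Path targetDir) throws IOException {
        Path dest = targetDir.resolve(source.getFileName().toString());
        List<Path> all;
        try (Stream<Path> walk = Files.walk(source)) {
            all = walk.collect(Collectors.toList()); // parents are listed before their children
        }
        for (Path p : all) {
            Path out = dest.resolve(source.relativize(p).toString());
            if (Files.isDirectory(p)) {
                Files.createDirectories(out);
            } else {
                Files.createDirectories(out.getParent());
                Files.copy(p, out, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java CopyToTomcat <project-dir> <tomcat-bin-dir>");
            return;
        }
        Path project = Paths.get(args[0]);
        Path tomcatBin = Paths.get(args[1]);
        // Example names only; extend this array to match your file list.
        String[] names = {"Data", "Configure.xml", "user.lic", "libICTCLAS50.so"};
        for (String name : names) {
            Path src = project.resolve(name);
            if (Files.exists(src)) {
                copyInto(src, tomcatBin);
                System.out.println("Copied " + src);
            } else {
                System.out.println("Skipped (not found): " + src);
            }
        }
    }
}
```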
Another error: libstdc++.so.6: cannot open shared object file: No such file or directory, along with errors mentioning ELF class. Installing the C++ runtime package fixed it:
apt-get install libstdc++5
After that, the program ran successfully.
3. Init Fail!
Cannot Open Configure file .\Configure.xml
This happens because .\Configure.xml cannot be found. Set the configuration root directory to new File("").getAbsolutePath() + "\\ICTCLASConf", and when calling ICTCLAS_Init, pass (new File("").getAbsolutePath() + "\\ICTCLASConf").getBytes("GB2312") as the argument. The parentheses matter: without them, getBytes applies only to the literal "\\ICTCLASConf".
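The precedence problem is easy to reproduce. In the unparenthesized form, Java encodes only the literal and then concatenates the String with a byte[], which appends the array's default toString (something like "[B@1b6d3586") rather than the path. The sketch below (hypothetical class name) builds the whole path first and then encodes it; it also uses File instead of a hardcoded "\\", because on a Linux server a backslash is an ordinary file-name character, not a separator.

```java
import java.io.File;
import java.nio.charset.Charset;

/** Sketch: build the byte[] path argument for ICTCLAS_Init correctly. */
public class InitArgument {
    private static final Charset GB2312 = Charset.forName("GB2312");

    /** Concatenate first, then encode the complete absolute path as GB2312. */
    static byte[] initArgument(String baseDir, String subDir) {
        String path = new File(baseDir, subDir).getAbsolutePath();
        return path.getBytes(GB2312);
    }

    public static void main(String[] args) {
        String base = new File("").getAbsolutePath();
        // Wrong: getBytes binds to the literal only; String + byte[] appends "[B@...".
        String wrong = base + "\\ICTCLASConf".getBytes(GB2312);
        // Right: the whole path is encoded, with the platform's separator.
        byte[] right = initArgument(base, "ICTCLASConf");
        System.out.println("wrong: " + wrong);
        System.out.println("right: " + new String(right, GB2312));
    }
}
```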
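When initializing in the SplitWord class,
String argu = new File("").getAbsolutePath(); still produced the error Cannot Open Configure file, because new File("") resolves against the JVM working directory, which under Tomcat is not the project directory.
String argu = "/home/zzj/Workspaces/SocialBook2"; (the directory containing Configure.xml) succeeded.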
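Hardcoding /home/zzj/... works but breaks again on the next machine. One compromise is to try a few candidate directories in order and pick the first that actually contains Configure.xml. This is a minimal sketch; ConfigDirFinder and the ictclas.home property are hypothetical names, not part of ICTCLAS or galago.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Sketch: find the directory that contains Configure.xml, to use as ICTCLAS_Init's argument. */
public class ConfigDirFinder {
    /** Returns the first candidate directory containing Configure.xml, as an absolute path. */
    static Optional<Path> findConfigDir(List<Path> candidates) {
        for (Path dir : candidates) {
            if (dir != null && Files.isRegularFile(dir.resolve("Configure.xml"))) {
                return Optional.of(dir.toAbsolutePath());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String prop = System.getProperty("ictclas.home"); // hypothetical property
        List<Path> candidates = Arrays.asList(
                prop == null ? null : Paths.get(prop),
                Paths.get("").toAbsolutePath(),                 // JVM working directory (user.dir)
                Paths.get("/home/zzj/Workspaces/SocialBook2")); // the directory that worked above
        System.out.println(findConfigDir(candidates)
                .map(p -> "Configure.xml found in " + p)
                .orElse("Configure.xml not found; ICTCLAS_Init would likely fail"));
    }
}
```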
Reference: http://gdhapple.blog.163.com/blog/static/12685791720122832029133/ Chinese Academy of Sciences word segmentation ICTCLAS5.0 configuration error handling
4. When galago is called to display results on the web page (running as a complete web project), the first query returns results, but when another query is entered, word segmentation fails and the segmented text is empty.
Word segmentation function called: testICTCLAS_ParagraphProcess(String sInput)
Initially, because of an out-of-bounds error, initialization was placed outside the function, but then the second and subsequent queries returned no results. Initialization is needed on every call, so it was moved inside the function.
ICTCLAS50 testICTCLAS50 = new ICTCLAS50();
String argu = ".";
// initialization
if (!testICTCLAS50.ICTCLAS_Init(argu.getBytes("GB2312"))) {
    System.out.println("Init Fail!");
    return ""; // the function returns the segmented String, so return an empty result on failure
}
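The "initialize on every call" fix can be written so that each initialization is always paired with a release, even when segmentation throws. The sketch below is an assumption-laden illustration, not the real ICTCLAS API: Segmenter is a hypothetical stand-in for the native methods (check ICTCLAS50.java for the actual names and signatures, including whether an exit method exists), and the method is synchronized on the assumption that the native engine should not serve two web requests at once. A fake engine lets the control flow run without the DLL.

```java
import java.nio.charset.Charset;

/** Sketch: initialize, segment and release on every call. */
public class SplitWordSketch {
    /** Hypothetical stand-in for the native ICTCLAS50 methods; not the library's real API. */
    interface Segmenter {
        boolean init(byte[] configDir);
        byte[] process(byte[] input);
        boolean exit();
    }

    private static final Charset GB2312 = Charset.forName("GB2312");

    /** Synchronized because the native engine is assumed not to be safe for concurrent requests. */
    static synchronized String segment(Segmenter engine, String configDir, String input) {
        if (!engine.init(configDir.getBytes(GB2312))) {
            System.out.println("Init Fail!");
            return ""; // empty result instead of a null query
        }
        try {
            return new String(engine.process(input.getBytes(GB2312)), GB2312);
        } finally {
            engine.exit(); // always release, so the next query starts from a clean state
        }
    }

    public static void main(String[] args) {
        // Fake engine that echoes its input, so the flow can be run without the native library.
        Segmenter fake = new Segmenter() {
            public boolean init(byte[] dir) { return true; }
            public byte[] process(byte[] in) { return in; }
            public boolean exit() { return true; }
        };
        System.out.println(segment(fake, ".", "first query"));
        System.out.println(segment(fake, ".", "second query"));
    }
}
```

Re-initializing per query costs some time on each search; if that becomes a problem, initializing once at application startup and releasing at shutdown is the usual alternative, provided the out-of-bounds error that originally forced this change is resolved.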