Using Java to Search the Content of Word, Excel, PPT, and PDF Files

The problem solved:
       Our company has thousands of documents in Word, Excel, PPT, and other formats. They are currently stored on the file system, managed with SVN and organized into directories, which makes finding a particular file very inconvenient.

       Building on an algorithm that others shared online, and extending it, I wrote a tool that finds the matching files from the content you search for.

First, how to use it:
1. Configure the index storage directory, Constants.DIRECTORY_INDEX_PATH. The default is D:\\doc\\index; if the directory does not exist, the program creates it automatically.
2. Specify the directory containing the files to be searched: Constants.DIRECTORY_FILE_PATH.
3. Build the index: run the main method in Indexer.java.
4. Search for content: run the main method in Searcher.java.
Features:
A. Searches Excel, Word, and PPT files from Office 2003-2010, as well as PDF files.
B. Searches directories recursively when building the index.
C. Can exclude specified directories from indexing, for directories you do not want added.
D. Supports directory search.
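The article does not include the source of Indexer.java, so the following is only a minimal sketch, using nothing but java.nio, of how the directory handling in steps 1 to 3 and features B and C might look: create the index directory if it is missing, walk the document directory recursively, skip excluded directory names, and keep only the supported extensions. The class name FileCollector, its methods, and the extension list are hypothetical; the real tool presumably also extracts the text of each file and writes it to the index, which is left out here.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not the article's Indexer.java.
final class FileCollector {
    // Mirrors the default named in the article (D:\doc\index as a Java string literal).
    static final String DIRECTORY_INDEX_PATH = "D:\\doc\\index";

    // Assumed extension list for "Office 2003-2010 plus PDF".
    static final Set<String> SUPPORTED =
            Set.of("doc", "docx", "xls", "xlsx", "ppt", "pptx", "pdf");

    /** Creates the index directory (and any missing parents); does nothing if it already exists. */
    static Path ensureIndexDir(String path) throws IOException {
        return Files.createDirectories(Paths.get(path));
    }

    /** Returns the lower-cased extension of a file name, or "" if it has none. */
    static String extension(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Recursively collects supported files under root, skipping directories whose name is excluded. */
    static List<Path> collect(Path root, Set<String> excludedDirs) throws IOException {
        List<Path> result = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                Path name = dir.getFileName();
                // Never descend into an excluded subtree (the root itself is always walked).
                if (!dir.equals(root) && name != null && excludedDirs.contains(name.toString())) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (SUPPORTED.contains(extension(file))) {
                    result.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        result.sort(null); // natural Path ordering, for a stable result
        return result;
    }
}
```

Returning SKIP_SUBTREE from preVisitDirectory means an excluded directory is never entered at all, which is cheaper than filtering its files afterwards. For example, passing Set.of(".svn") as excludedDirs would keep SVN's metadata directory out of the index.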
Second, known problems:
1. More file types, such as TXT, SQL, and Visio, are not yet supported and can be added when time allows; the first two are very simple.
2. The search algorithm (the searcher method in Searcher.java) has some problems: certain search terms cannot be found. For example, in the provided test file "51CTO download-ORACLE__SQL statement teaching.pdf",
    searching for "subquery Use the data" finds nothing.
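For the two open items above, here is a minimal JDK-only sketch of two hypothetical helpers; the names are illustrative and not taken from Searcher.java. `extract` reads a TXT or SQL file as plain text, which is why those two formats are easy to add (it assumes UTF-8; files saved in GBK would need that charset instead). `containsNormalized` shows one possible workaround for the search problem, based on an assumption the article does not confirm: PDF text extraction often inserts line breaks or extra spaces inside a phrase, so collapsing whitespace in both the extracted text and the query before matching may recover some misses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helpers; not part of the article's code.
final class PlainTextSupport {
    /** Reads a TXT or SQL file as text, assuming UTF-8 encoding. */
    static String extract(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** Collapses every run of whitespace (spaces, tabs, line breaks) to one space, trims, and lower-cases. */
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    /** True if the query occurs in the text once both are normalized the same way. */
    static boolean containsNormalized(String text, String query) {
        return normalize(text).contains(normalize(query));
    }
}
```

For Chinese text, where a line break can fall between two characters of the same word, removing whitespace entirely rather than collapsing it to a single space may work better.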

Source:
http://ishare.iask.sina.com.cn/f/69219507.html