Lucene Notes 33 - Lucene Extensions - Creating an Index with Tika and Searching It

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_36059561/article/details/83890536

1. Creating an Index with Tika

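So far, every document we indexed was a txt file. With Tika we can also handle pdf, Word, html and other formats: Tika extracts their text, and we then index that text. The indexing code is largely the same as before; only the value of the content field changes. Previously it was read with a FileReader; now it comes from Tika's parse() method, which also returns a java.io.Reader over the extracted text, so it can still be passed to the Field(String, Reader) constructor. A field created this way is tokenized and indexed but not stored.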

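import java.io.File;
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.apache.tika.Tika;
import org.apache.tika.metadata.Metadata;
import org.wltea.analyzer.lucene.IKAnalyzer;

public void index(boolean update) {
    IndexWriter indexWriter = null;
    try {
        Directory directory = FSDirectory.open(new File("E:\\Lucene\\IndexLibrary"));
        indexWriter = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, new IKAnalyzer()));
        if (update) {
            // Rebuild from scratch: drop every previously indexed document
            indexWriter.deleteAll();
        }
        File[] files = new File("E:\\Lucene\\SearchSource\\TikaSource").listFiles();
        if (files == null) {
            // Source directory missing or unreadable; the finally block still closes the writer
            return;
        }
        // One Tika facade instance can be reused for every file
        Tika tika = new Tika();
        for (File file : files) {
            Document document = new Document();
            // Tika can report document metadata through this object; add it to the Document if needed
            Metadata metadata = new Metadata();
            // Let Tika extract the text; the Reader it returns feeds the content field
            document.add(new Field("content", tika.parse(file, metadata)));
            document.add(new Field("fileName", file.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.add(new Field("path", file.getAbsolutePath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.add(new NumericField("date", Field.Store.YES, true).setLongValue(file.lastModified()));
            // Size in whole kilobytes (integer division)
            document.add(new NumericField("size", Field.Store.YES, true).setIntValue((int) (file.length() / 1024)));
            indexWriter.addDocument(document);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (indexWriter != null) {
            try {
                indexWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}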
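The size field stores file.length() / 1024 cast to int. Because both operands are integers this is integer division: the value is the number of whole kilobytes rounded down, so any file under 1024 bytes is indexed with size 0. The JDK-only sketch below (the class and method names are hypothetical; no Lucene or Tika needed) reproduces just that expression on two temporary files.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical demo class: reproduces only the "size" expression from index()
public class SizeFieldDemo {

    // Same arithmetic as the size field: long division by 1024, then cast to int,
    // so the result is whole kilobytes rounded down
    static int sizeInKb(File file) {
        return (int) (file.length() / 1024);
    }

    public static void main(String[] args) throws IOException {
        File small = File.createTempFile("tika-demo", ".bin");
        File larger = File.createTempFile("tika-demo", ".bin");
        small.deleteOnExit();
        larger.deleteOnExit();
        Files.write(small.toPath(), new byte[500]);   // 500 bytes
        Files.write(larger.toPath(), new byte[1500]); // 1500 bytes
        System.out.println(sizeInKb(small));  // prints 0
        System.out.println(sizeInKb(larger)); // prints 1
    }
}
```

If range queries on size need to tell small files apart from empty ones, storing the raw byte count as a long is a simple alternative; whether that matters depends on how the field will be queried.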

2. Searching the Index Built with Tika

Once the index has been built, searching is simple and works exactly as before; the real work is in building the index. Here is the code.

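public void search() {
    IndexReader indexReader = null;
    try {
        Directory directory = FSDirectory.open(new File("E:\\Lucene\\IndexLibrary"));
        indexReader = IndexReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        // TermQuery is not analyzed: "必须" must equal an indexed token exactly
        TermQuery termQuery = new TermQuery(new Term("content", "必须"));
        TopDocs topDocs = indexSearcher.search(termQuery, 20);
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            Document document = indexSearcher.doc(scoreDoc.doc);
            // content was not stored, so print a stored field instead
            System.out.println(document.get("fileName"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (indexReader != null) {
            try {
                // IndexSearcher does not close a reader it was given, so close it here
                indexReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}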
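TermQuery does not run the analyzer over its argument; it looks the term up exactly as written in the content field's postings. So 必须 only matches documents for which IKAnalyzer emitted exactly that token when the Tika-extracted text was indexed. The sketch below is a toy, JDK-only model of that lookup, not Lucene's implementation; the class name and the pre-tokenized inputs are hypothetical stand-ins for what IKAnalyzer would produce.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical, not Lucene code): an inverted index from term to the
// fileName values of the documents containing it, plus an exact-match lookup
// that behaves like TermQuery on an already-analyzed field.
public class ToyTermQuery {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index one document whose text has already been split into tokens
    // (in the real code this segmentation is IKAnalyzer's job).
    public void addDocument(String fileName, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(fileName);
        }
    }

    // Exact term lookup: the query term is NOT analyzed, so it must equal
    // an indexed token character for character.
    public Set<String> termQuery(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyTermQuery index = new ToyTermQuery();
        index.addDocument("a.pdf", Arrays.asList("用户", "必须", "登录"));
        index.addDocument("b.docx", Arrays.asList("可以", "登录"));
        index.addDocument("c.html", Arrays.asList("必须", "注意"));
        System.out.println(index.termQuery("必须")); // prints [a.pdf, c.html]
        System.out.println(index.termQuery("必"));   // prints []
    }
}
```

For free-text input, a QueryParser built with the same IKAnalyzer is the usual way to make the query text go through the same segmentation as the indexed content.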
