Lucene: Commonly Used Concepts

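Lucene: an open-source full-text search toolkit. It is not a complete, ready-to-run search engine but a framework for building one, and it provides both a full query engine and a full indexing engine.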

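Index store: comparable to a database.

Index: comparable to a table of contents built for that database.

Document: a trimmed-down version of one database record (a single entry in the index store).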
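
The "table of contents" analogy can be made concrete with a toy inverted index: a map from each term to the IDs of the documents that contain it. This is only a minimal sketch of the idea, not how Lucene stores data internally; the class name `TinyInvertedIndex` and the whitespace tokenizer are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class TinyInvertedIndex {
    // term -> IDs of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Tokenizes on whitespace only; a real analyzer does far more than this.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Looks the term up directly instead of scanning every document.
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Lucene is a search library");
        index.add(2, "A database stores records");
        index.add(3, "Lucene builds an index");
        System.out.println(index.search("lucene")); // prints [1, 3]
    }
}
```

A search becomes a single map lookup, which is why building the index up front pays off when the same data is queried many times.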

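Query types:

(1) Basic query: search by field.

(2) Term query: the query text is treated as a single term and is not split (analyzed) any further.

(3) Fuzzy (error-tolerant) query: the match may differ from the query by 1 or 2 characters.

(4) Wildcard query: `?` matches exactly one character, `*` matches zero or more characters.

(5) Numeric range query: matches documents whose numeric field value falls within a given range.

(6) Combined (Boolean) query: combines several queries with MUST, SHOULD, and MUST_NOT clauses.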
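
To illustrate what "differing by 1 or 2 characters" means for the fuzzy query, here is a minimal sketch based on plain Levenshtein edit distance. It is an illustration of the concept only: the class `FuzzyMatch` and its methods are hypothetical names, and Lucene's own fuzzy matching is implemented differently (with automata) rather than by comparing terms one by one.

```java
public class FuzzyMatch {
    // Levenshtein distance: minimum number of single-character
    // insertions, deletions, or substitutions that turn a into b.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when the indexed term is within maxEdits of the query term.
    public static boolean matches(String indexedTerm, String queryTerm, int maxEdits) {
        return editDistance(indexedTerm, queryTerm) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("lucene", "lucine"));  // prints 1
        System.out.println(matches("lucene", "lusena", 1));    // prints false
        System.out.println(matches("lucene", "lusena", 2));    // prints true
    }
}
```

With a tolerance of 1, "lucine" still finds "lucene"; "lusena" needs a tolerance of 2 because two characters differ.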

Highlighting: the core idea is to wrap every matched keyword in an HTML tag, then style that tag to produce the highlight.
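
The wrapping step can be sketched in plain Java, independent of Lucene. This is a simplified illustration under stated assumptions: `SimpleHighlighter` is a hypothetical class, it matches one literal keyword rather than analyzed query terms, and it does not HTML-escape the surrounding text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleHighlighter {
    // Wraps every literal occurrence of keyword in preTag ... postTag.
    public static String highlight(String text, String keyword,
                                   String preTag, String postTag) {
        if (keyword.isEmpty()) {
            return text; // nothing to highlight
        }
        // Pattern.quote treats the keyword as literal text, not as a regex.
        Pattern pattern = Pattern.compile(Pattern.quote(keyword));
        // $0 is the matched text; quoteReplacement protects '$' and '\' in the tags.
        String replacement = Matcher.quoteReplacement(preTag) + "$0"
                + Matcher.quoteReplacement(postTag);
        return pattern.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(highlight("人工智能总动员", "人工智能", "<em>", "</em>"));
        // prints <em>人工智能</em>总动员
    }
}
```

The page's stylesheet can then give `em` a background color or bold weight so the matched keyword stands out.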

Using Lucene:

(1) Dependencies to add:

Lucene core library: lucene-core

Query parser: lucene-queryparser

Default analyzers: lucene-analyzers-common

IK analyzer (Chinese word segmentation): ikanalyzer

Highlighting: lucene-highlighter

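<dependencies>
    <!-- Versions were omitted in the original. Each dependency needs a <version>
         unless one is managed elsewhere. The code below appears to target the
         Lucene 4.10.x API (e.g. Version.LATEST, FSDirectory.open(File)). -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-highlighter</artifactId>
    </dependency>
    <dependency>
        <groupId>com.janeluo</groupId>
        <artifactId>ikanalyzer</artifactId>
    </dependency>
</dependencies>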

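(2) Creating an index:

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class IndexCreateTest {

    @Test
    public void testCreateIndex() throws Exception {
        // Create the document object
        Document document = new Document();
        // Add fields (field name, field value, whether to store the original value).
        // StringField is indexed as one token and is not analyzed.
        document.add(new StringField("id", "1", Store.YES));
        // TextField is analyzed (split into terms) before indexing.
        document.add(new TextField("title", "中国工博会老铁上演“人工智能总动员”", Store.YES));

        // Open the index directory
        Directory directory = FSDirectory.open(new File("f:\\indexDir"));
        // Create the analyzer (the default one would be StandardAnalyzer)
        // Analyzer analyzer = new StandardAnalyzer();
        Analyzer analyzer = new IKAnalyzer();
        // Create the writer configuration
        IndexWriterConfig conf = new IndexWriterConfig(Version.LATEST, analyzer);
        // Create the index writer
        IndexWriter indexWriter = new IndexWriter(directory, conf);

        // Add the document
        indexWriter.addDocument(document);
        // Commit
        indexWriter.commit();
        // Close
        indexWriter.close();
    }
}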
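(3) Basic query:

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class BasicQueryDemo {

    public static void main(String[] args) throws Exception {
        // Open the index directory
        Directory directory = FSDirectory.open(new File("f:\\indexDir"));
        // Index reader
        IndexReader reader = DirectoryReader.open(directory);
        // Index searcher
        IndexSearcher searcher = new IndexSearcher(reader);

        // Create the query parser (default field, analyzer)
        QueryParser parser = new QueryParser("title", new IKAnalyzer());
        // Create the query object from the search keywords
        Query query = parser.parse("人工智能");

        // Run the search (query, maximum number of results)
        TopDocs topDocs = searcher.search(query, 10);
        // Total number of hits
        System.out.println("Found " + topDocs.totalHits + " matching documents");
        // Scored hits
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            // Internal document number
            int docID = scoreDoc.doc;
            // Load the stored document by its number
            Document doc = reader.document(docID);
            System.out.println("id: " + doc.get("id"));
            System.out.println("title: " + doc.get("title"));
        }
        // Release the reader when done
        reader.close();
    }
}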

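(4) Highlighting:

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Formatter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.Scorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class HighlightDemo {

    public static void main(String[] args) throws Exception {
        // Open the index directory
        Directory directory = FSDirectory.open(new File("F:\\indexDir"));
        // Create the reader
        IndexReader reader = DirectoryReader.open(directory);
        // Create the searcher
        IndexSearcher searcher = new IndexSearcher(reader);

        Analyzer analyzer = new IKAnalyzer();
        QueryParser parser = new QueryParser("title", analyzer);
        Query query = parser.parse("人工智能");

        // Formatter: the prefix and suffix wrapped around each matched keyword
        Formatter formatter = new SimpleHTMLFormatter("<em>", "</em>");
        // The highlighter extracts the best-scoring fragment, so it needs a scorer
        Scorer scorer = new QueryScorer(query);
        Highlighter highlighter = new Highlighter(formatter, scorer);

        // Search
        TopDocs topDocs = searcher.search(query, 10);
        System.out.println("Found " + topDocs.totalHits + " matching documents");

        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            // Internal document number
            int docID = scoreDoc.doc;
            Document doc = reader.document(docID);

            String title = doc.get("title");
            // Highlight the stored value (analyzer, field name, field value)
            String hTitle = highlighter.getBestFragment(analyzer, "title", title);
            if (hTitle == null) {
                // No fragment matched in this field; fall back to the plain title
                hTitle = title;
            }

            // Replace the title in the loaded document with the highlighted version
            doc.removeField("title");
            doc.add(new TextField("title", hTitle, Field.Store.YES));
            System.out.println(doc.get("id") + " " + doc.get("title"));
        }
        reader.close();
    }
}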

szhwwjava
