Search engine series, part 2: Lucene (introduction, architecture, integration)

1. Introduction to Lucene

1. What is Lucene

  Lucene is one of the most popular open-source full-text search toolkits for Java. It provides a complete query engine and indexing engine, plus part of a text analysis (word segmentation) engine covering two Western languages, English and German. Its goal is to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. Lucene is an Apache project; URL: http://lucene.apache.org/

2. Purpose of Lucene

  Lucene gives software developers a simple, easy-to-use toolkit for adding full-text retrieval to a target system, or for building a complete full-text retrieval engine on top of it.

3. Lucene applicable scenarios

  Adding full-text retrieval over data that an application keeps in a database.

  Developing standalone search engine services and systems.

4. Features of Lucene

  1. Stable, high-performance indexing

    Indexes more than 150 GB of data per hour.

    Small memory requirements: only 1 MB of heap memory is required.

    Incremental indexing is as fast as bulk indexing.

    The index is roughly 20% to 30% of the size of the indexed text.

  2. Efficient, accurate, high-performance search algorithms

    Ranked searching: the best results are returned first.

    Many powerful query types: phrase queries, wildcard queries, proximity queries, range queries, and more.

    Fielded searching (e.g. title, author, content).

    Sorting by any field.

    Searching multiple indexes and merging the results.

    Simultaneous updating and searching.

    Highlighting, joins, and result grouping.

    Fast query execution.

    Pluggable ranking models; the built-in ones include the Vector Space Model and BM25.

    Configurable storage engine.
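To make the BM25 ranking model mentioned above more concrete, here is a minimal sketch of the textbook scoring formula for a single query term. It is not Lucene's BM25Similarity code: the class name `Bm25Sketch` and its method names are invented for illustration, and the parameter values k1 = 1.2 and b = 0.75 are the commonly used defaults, assumed here.

```java
// Illustrative only: a hand-written BM25 term score, not Lucene's implementation.
final class Bm25Sketch {

    // Inverse document frequency: rare terms get a higher weight.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score of one term in one document.
    // tf        = occurrences of the term in the document
    // docFreq   = number of documents containing the term
    // docCount  = total number of documents
    // docLen    = length of this document in terms
    // avgDocLen = average document length in the collection
    static double score(double tf, long docFreq, long docCount,
                        double docLen, double avgDocLen, double k1, double b) {
        // Length normalization: long documents are penalized, controlled by b.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        // Term frequency saturates: each extra occurrence adds less than the last.
        return idf(docCount, docFreq) * tf * (k1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75; // assumed default parameters
        System.out.println(score(1, 3, 100, 50, 50, k1, b)); // one occurrence, average length
        System.out.println(score(5, 3, 100, 50, 50, k1, b)); // more occurrences, higher score
        System.out.println(score(1, 3, 100, 500, 50, k1, b)); // much longer document, lower score
    }
}
```

A full query score would sum this value over all query terms. In the vector space model the term-frequency component grows without an upper bound, whereas here it saturates, which is the main practical difference between the two.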

  3. Cross-platform

    Written in pure Java.

    Released as an open-source project under the Apache License, so it can be used in both commercial and open-source projects.

    Implementations of Lucene are available in other programming languages (such as C, C++, and Python), not only Java.

2. Lucene Architecture

 

1. Data collection

2. Create an index

3. Index Storage

4. Search (using index)
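The four stages above can be pictured with a toy inverted index written in plain Java. This is only a sketch of the idea, not Lucene's API: the class `TinyIndex` and all of its methods are hypothetical, the "collected" data is hard-coded, and the index lives in memory rather than on disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration of collect -> index -> store -> search; NOT Lucene's API.
final class TinyIndex {
    // Stage 3 (index storage): term -> sorted ids of the documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Stage 2 (create index): analyze the text and record each term's postings.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Stage 4 (search): look the term up in the index instead of scanning documents.
    List<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    String document(int docId) {
        return documents.get(docId);
    }

    // Naive analysis: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Stage 1 (data collection): hard-coded here; normally read from files, a database, or the web.
        String[] collected = {
            "Lucene is a full-text search library",
            "Relational databases store rows and columns",
            "Full-text search engines are often built on Lucene"
        };
        TinyIndex index = new TinyIndex();
        for (String text : collected) {
            index.add(text);
        }
        for (int id : index.search("Lucene")) {
            System.out.println(id + ": " + index.document(id));
        }
    }
}
```

Lucene follows the same outline, but with far more capable analysis, a compact on-disk index format, and relevance-ranked results rather than a plain list of matching ids.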

 3. Lucene integration

 1. Choosing a Lucene version

 This article uses 7.3.0, the latest version at the time of writing: https://lucene.apache.org/

 2. System Requirements

 JDK 1.8 or later

 3. Integration: add the Lucene core jar to your application

Method 1: Download the zip from the official website, unpack it, and copy the jar into your project.

Method 2: Add the dependency with Maven.

4. Lucene Module Description

core: Lucene core library, containing the core modules for analysis (word segmentation), indexing, and querying

analyzers-*: analyzers and tokenizers

facet: Faceted (categorized) indexing and search capabilities

grouping: Collectors for grouping search results

highlighter: Highlights search keywords in results

join: Index-time and query-time joins for normalized content

queries: Additional filter and query implementations that supplement core Lucene

queryparser: Query parsers and the framework for parsing query expressions

spatial: Geospatial search

suggest: Auto-suggest and spellchecking support

5. First, add the Lucene core module as a dependency

<!-- Lucene core module -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>7.3.0</version>
</dependency>

 6. Understand the composition of core modules

 
