1. Introduction to Lucene
1. Introduction to Lucene
Lucene is a widely used open-source full-text search toolkit written in Java. It provides a complete query engine and indexing engine, plus a partial text word segmentation engine (covering two Western languages, English and German). The purpose of Lucene is to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. Lucene is an Apache project, URL: http://lucene.apache.org/
2. Use of Lucene
Lucene gives software developers a simple, easy-to-use toolkit for implementing full-text retrieval in a target system, or for building a complete full-text retrieval engine on top of it.
3. Lucene applicable scenarios
Adding full-text retrieval over database data inside an application.
Developing standalone search engine services and systems.
4. Features of Lucene
1. Stable, high-performance indexing
Able to index more than 150GB of data per hour.
Small memory requirements: only 1MB of heap memory is required.
Incremental indexing is as fast as bulk indexing.
The index size is roughly 20% to 30% of the size of the indexed text.
2. Efficient, accurate and high-performance search algorithm
Good relevance ranking of search results.
Powerful query types: phrase queries, wildcard queries, proximity queries, range queries, and more.
Fielded search (e.g. title, author, content) is supported.
Sorting by any field.
Merging of results from queries over multiple indexes.
Update and query operations can run at the same time.
Highlighting, joins, and result grouping are supported.
High speed.
Extensible ranking module; the built-in options include the vector space model and the BM25 model.
Configurable storage engine.
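To make the BM25 model mentioned in the list above more concrete, here is a minimal sketch of the textbook BM25 formula for a single query term in a single document. It is plain Java with no Lucene dependency and is not Lucene's own implementation. The class name `Bm25Sketch`, its method names, and the parameter values `K1 = 1.2` and `B = 0.75` (commonly cited defaults) are assumptions chosen for illustration.

```java
// Illustrative sketch of BM25 scoring for one term in one document.
// Hypothetical helper class; not part of Lucene and not its actual implementation.
final class Bm25Sketch {
    static final double K1 = 1.2;  // assumed value: controls term-frequency saturation
    static final double B = 0.75;  // assumed value: strength of document-length normalization

    // Inverse document frequency: terms that occur in fewer documents weigh more.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // tf: occurrences of the term in the document
    // docLen / avgDocLen: length of this document and average length, in terms
    // docCount: documents in the index, docFreq: documents containing the term
    static double score(double tf, double docLen, double avgDocLen,
                        long docCount, long docFreq) {
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docCount, docFreq) * (tf * (K1 + 1.0)) / (tf + norm);
    }

    public static void main(String[] args) {
        // Same term frequency, but the second document is four times longer.
        System.out.println(score(2, 100, 100, 1000, 10));
        System.out.println(score(2, 400, 100, 1000, 10));
    }
}
```

The score grows with term frequency but flattens out as the frequency rises, and a longer-than-average document is penalized, which is the main difference from a plain term-frequency weighting.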
3. Cross-platform
Written in pure Java.
Released as open source under the Apache License, so it can be used in both commercial and open-source projects.
Implementations of Lucene are available in other programming languages as well (such as C, C++, and Python), not only Java.
2. Lucene Architecture
1. Data collection
2. Create an index
3. Index Storage
4. Search (using index)
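The four stages above can be sketched with a toy inverted index in plain Java. This is only an illustration of the idea (collect text, build a term-to-document mapping, keep it in a store, answer queries from it), not how Lucene is implemented. The class `ToyInvertedIndex` and all of its methods are hypothetical names, and the in-memory map stands in for real index storage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration of collect -> index -> store -> search; NOT Lucene's implementation.
final class ToyInvertedIndex {
    // Index storage: term -> sorted ids of the documents that contain the term.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Collected documents; a document's id is its position in this list.
    private final List<String> documents = new ArrayList<>();

    // Word segmentation: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index creation: record the document id under every term the document contains.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Search: intersect the posting lists of all query terms (AND semantics).
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        if (result == null) {
            return new ArrayList<>(); // empty query matches nothing
        }
        return new ArrayList<>(result);
    }

    String document(int docId) {
        return documents.get(docId);
    }

    public static void main(String[] args) {
        // Data collection: a real system would read files, database rows, or crawled pages.
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Lucene is a full-text search library");
        index.add("A database stores rows and columns");
        index.add("Full-text search over database rows");
        for (int id : index.search("full-text search")) {
            System.out.println(id + ": " + index.document(id));
        }
    }
}
```

The point of the structure is that a query never scans the document text: it only looks up terms in the stored mapping, which is why the index has to be built before any search can run.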
3. Lucene integration
1. Selected version of Lucene
This guide uses version 7.3.0, the latest release at the time of writing: https://lucene.apache.org/
2. System Requirements
JDK 1.8 or later
3. Integration: add the Lucene core jar to your application
Method 1: Download the zip from the official website, unzip it, and copy the jar into your project
Method 2: Declare the dependency with Maven
4. Lucene Module Description
core: Lucene core library, containing the core modules for word segmentation, indexing, and querying
analyzers-*: tokenizers (word segmentation components)
facet: faceted (classified) indexing and search capabilities
grouping: collectors for grouping search results
highlighter: highlights search keywords in results
join: index-time and query-time joins for normalized content (join support)
queries: filters and queries that supplement those in core Lucene
queryparser: query parsers and parsing framework (query expression parsing)
spatial: geospatial search
suggest: auto-suggest and spellchecking support
5. First, add the Lucene core module
<!-- Lucene core module -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>7.3.0</version>
</dependency>
6. Understand the composition of the core module