Lucene: Basic Installation and the Sample Program

  • About this article
  • About the sample program
  • Setting the CLASSPATH environment variable
  • Indexing and searching
  • About the code
  • Core code
  • Indexing files
  • Searching the index

About this article

The goal of this article is to walk you through the quick-start sample program so that you understand the basic installation and configuration process.

About the sample program

The Lucene sample program demonstrates various Lucene features and shows how to embed Lucene in your own application.

Setting the CLASSPATH environment variable

First, download the latest release of the Lucene package and extract it to a directory.

You will need four JAR files from it: lucene-core-{version}.jar, lucene-queryparser-{version}.jar, lucene-analyzers-common-{version}.jar, and lucene-demo-{version}.jar. Add all four to the CLASSPATH.
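
A quick way to confirm the CLASSPATH is set up is to probe for one class from each JAR. The following is only a minimal sketch: ClasspathCheck is a hypothetical helper, not part of Lucene, and the probed class names are an assumption based on the Lucene 7.x package layout (other versions may place classes differently).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): reports which demo JARs are visible.
class ClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One representative class per JAR (assumed names, Lucene 7.x layout).
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("lucene-core", "org.apache.lucene.index.IndexWriter");
        probes.put("lucene-queryparser", "org.apache.lucene.queryparser.classic.QueryParser");
        probes.put("lucene-analyzers-common", "org.apache.lucene.analysis.en.EnglishAnalyzer");
        probes.put("lucene-demo", "org.apache.lucene.demo.IndexFiles");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getValue()) ? "found" : "MISSING";
            System.out.println(probe.getKey() + ": " + status);
        }
    }
}
```

Run it with the same CLASSPATH you intend to use for the demo; any line that reports MISSING points at the JAR that still needs to be added.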

Indexing and searching

Assuming you have configured the CLASSPATH environment variable correctly, index the files with the following command:

java org.apache.lucene.demo.IndexFiles -docs {path-to-lucene}

The command above creates a subdirectory called index, which contains an index of all the Lucene source code.

Once the index has been built, use the following command to search it:

java org.apache.lucene.demo.SearchFiles

Try searching for supercalifragilisticexpialidocious: you will get no results, because that word does not appear in the Lucene source code. Try another word, such as string, and you will get many results. Results are shown ten per page, and you can ask for more results if you need them.

About the code

This section goes behind the command line and into the source code. It explains which code is the core code, where it is, and what it does, so that you know how to integrate Lucene into your own application.

Core code

  • IndexFiles.java: creates the index
  • SearchFiles.java: searches the index

Indexing files

As mentioned above, the IndexFiles class creates the index. Let's look at how it does that.

How IndexFiles works:

The main() method parses the command-line arguments. Then, in preparation for initializing the IndexWriter, it opens a Directory and instantiates StandardAnalyzer and IndexWriterConfig.

IndexFiles command-line arguments:

  • -index: the path of the directory where the index files are stored. If -index is not given, the default relative path index is used, and the directory is created under the current working directory if it does not already exist. On some platforms the index directory may be created in a different directory, such as the user's home directory.
  • -docs: the path of the documents to be indexed.
  • -update: tells IndexFiles not to delete an index that already exists. Without this parameter, IndexFiles clears the existing index before indexing.
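
The argument handling described above can be sketched as follows. This is an illustrative stand-in rather than the demo's actual source: the IndexArgs class and its field names are hypothetical, and only the three flags and the default path index come from the description above.

```java
// Hypothetical holder for the three IndexFiles options described above.
class IndexArgs {
    String indexPath = "index"; // default: relative path "index"
    String docsPath = null;     // path of the documents to index
    boolean create = true;      // true: start from an empty index; false: -update was given

    static IndexArgs parse(String[] args) {
        IndexArgs parsed = new IndexArgs();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if ("-update".equals(arg)) {
                // -update takes no value: keep the existing index instead of clearing it.
                parsed.create = false;
            } else if ("-index".equals(arg) || "-docs".equals(arg)) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("Missing value for " + arg);
                }
                String value = args[++i]; // consume the value that follows the flag
                if ("-index".equals(arg)) {
                    parsed.indexPath = value;
                } else {
                    parsed.docsPath = value;
                }
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        IndexArgs parsed = parse(args);
        if (parsed.docsPath == null) {
            System.err.println("Usage: [-index INDEX_PATH] -docs DOCS_PATH [-update]");
            return;
        }
        System.out.println("index=" + parsed.indexPath
                + " docs=" + parsed.docsPath
                + " create=" + parsed.create);
    }
}
```

Keeping create as the default and letting -update switch it off mirrors the behavior described in the list: the index is cleared unless the user explicitly asks to keep it.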

IndexWriter stores the index in a Directory. Directory is an abstraction: it is not necessarily a directory on the file system, and may be backed by RAM or a database instead. The sample program uses FSDirectory, which stores the index on the file system.

An Analyzer is a processing pipeline: it breaks the input text into individual tokens and then processes each token further, for example by lowercasing, inserting synonyms, or filtering. The sample program uses StandardAnalyzer, which splits text using the Unicode text segmentation algorithm (defined in Unicode Standard Annex #29), converts the tokens to lowercase, and filters out stop words. Lucene also provides analyzers for several specific languages, defined under lucene/analysis/common/src/java/org/apache/lucene/analysis.
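
To make the pipeline idea concrete, here is a toy version written in plain Java. It is not Lucene's StandardAnalyzer: ToyAnalyzer is a hypothetical class, the regular-expression split is only a crude stand-in for the UAX #29 algorithm, and the three-word stop list is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analysis pipeline: tokenize, lowercase, drop stop words.
class ToyAnalyzer {
    // Assumed stop list for illustration only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Step 1: split on runs of characters that are neither letters nor digits.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when text starts with a delimiter
            }
            // Step 2: lowercase the token.
            String term = token.toLowerCase(Locale.ROOT);
            // Step 3: keep the term only if it is not a stop word.
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick, brown fox!"));
        System.out.println(analyze("QUICK"));
    }
}
```

Because both calls pass through the same steps, the document text and the query text end up with the same lowercase term quick, which is why indexing and searching need to agree on the analyzer.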

IndexWriterConfig encapsulates all the IndexWriter configuration.

After the IndexWriter is initialized, main() calls indexDocs(), which recursively walks the document path, builds a Document object for each file, and has the IndexWriter write it to the index. A Document is a simple data object that represents the content of an indexed file. If the -update command-line parameter is given, the OpenMode of IndexWriterConfig is set to OpenMode.CREATE_OR_APPEND, and IndexWriter uses an identifier (here, the file path) to look for an existing entry for the file: if one exists, it is updated; if not, a new one is added.
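
The walk-and-update flow can be sketched without Lucene at all. In this hypothetical ToyIndexer, a map keyed by file path stands in for the index and a plain string stands in for a Document; the create flag plays the role of the open mode. It only illustrates the control flow and is not how IndexWriter stores data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for the indexing step: file path -> file content.
class ToyIndexer {
    final Map<String, String> docsByPath = new TreeMap<>();

    // create=true mimics starting from an empty index; create=false mimics -update.
    void indexDocs(Path root, boolean create) throws IOException {
        if (create) {
            docsByPath.clear();
        }
        // Visit every regular file below root, descending into subdirectories.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                // put() replaces an earlier entry with the same path and adds one otherwise,
                // which is the "update if present, add if not" rule keyed by identifier.
                docsByPath.put(file.toString(), content);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        ToyIndexer indexer = new ToyIndexer();
        indexer.indexDocs(Paths.get(args.length > 0 ? args[0] : "."), true);
        System.out.println("Indexed " + indexer.docsByPath.size() + " files");
    }
}
```

Using the path as the key is what keeps a second run in update mode from producing duplicate entries for the same file.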

Searching the index

The SearchFiles class works with three components: IndexSearcher, StandardAnalyzer, and QueryParser. The analyzer used for searching must be the same as the one used for indexing, so that query text is broken into terms in the same way as the documents and the results are correct. QueryParser parses the search text into a Query object, which is then passed to IndexSearcher to run the search.

SearchFiles calls the search(query, n) method of IndexSearcher. The matching documents are sorted by score in descending order, and at most the n highest-scoring documents are returned.
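
The "sort by score, keep the best n" step can be illustrated with plain collections. ToyTopDocs below is a hypothetical class and the scores are made-up inputs; it shows only the ranking rule, not how Lucene computes scores or collects hits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy ranking step: highest score first, at most n results.
class ToyTopDocs {

    static List<String> topN(Map<String, Double> scoresByDoc, int n) {
        List<Map.Entry<String, Double>> hits = new ArrayList<>(scoresByDoc.entrySet());
        // Sort by score in descending order.
        hits.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        // Keep no more than n hits (and never a negative count).
        int limit = Math.max(0, Math.min(n, hits.size()));
        List<String> top = new ArrayList<>();
        for (Map.Entry<String, Double> hit : hits.subList(0, limit)) {
            top.add(hit.getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up scores for three hypothetical documents.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("a.txt", 1.5);
        scores.put("b.txt", 3.0);
        scores.put("c.txt", 0.2);
        System.out.println(topN(scores, 2));
    }
}
```

Asking for a page of ten results, as the demo does, corresponds to calling topN with n set to 10.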

Reference

Lucene 7.4.0 demo API

Origin blog.csdn.net/qq_34017326/article/details/94738269