Apache Lucene 5.x Example

http://blog.csdn.net/isea533/article/details/48791309

Most Lucene material available online covers version 4.x or earlier, and 5.x changed a great deal. To make learning 5.x easier, this article records the official example with a few simple modifications for 5.x.

This article is based on the official documentation, specifically core/overview-summary.html.

The version used here is 5.3.1, but the content applies to Lucene 5.x in general.

Simple example

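Apache Lucene is a high-performance, full-featured text search engine library. The following simple example shows how to use Lucene for indexing and searching. It needs the lucene-core, lucene-analyzers-common and lucene-queryparser artifacts on the classpath.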

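import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class LuceneDemo {

    public static void main(String[] args) throws IOException, ParseException {
        Analyzer analyzer = new StandardAnalyzer();

        // Store the index in memory
        Directory directory = new RAMDirectory();
        // To store the index on disk instead, use the following line
        // (also import org.apache.lucene.store.FSDirectory and java.nio.file.Paths)
        //Directory directory = FSDirectory.open(Paths.get("/tmp/testindex"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);

        String[] texts = new String[]{
            "Mybatis Pagination Plugin - Example",
            "Mybatis Post Bar Q&A Issue 1",
            "Mybatis example complex properties (property)",
            "Mybatis is an extremely (most) simple (good) single (used) paging plugin",
            "Problems with Log4j log output for Mybatis - and all about logging",
            "Mybatis example foreach (Part 2)",
            "Mybatis example foreach (Part 1)",
            "SelectKey of Mybatis Example",
            "Association (2) of Mybatis Example",
            "Association of Mybatis Example"
        };

        // Index each text as a tokenized, stored field named "fieldname"
        for (String text : texts) {
            Document doc = new Document();
            doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
            iwriter.addDocument(doc);
        }
        iwriter.close();   // commits the documents and releases the write lock

        // Open the index for reading and search it
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        // Parse a simple query that searches "fieldname" for "foreach"
        QueryParser parser = new QueryParser("fieldname", analyzer);
        Query query = parser.parse("foreach");
        // Top 1000 hits, best score first (the Filter overload is deprecated in 5.x)
        ScoreDoc[] hits = isearcher.search(query, 1000).scoreDocs;
        // Print the stored text of each hit
        for (int i = 0; i < hits.length; i++) {
            Document hitDoc = isearcher.doc(hits[i].doc);
            System.out.println(hitDoc.get("fieldname"));
        }
        ireader.close();
        directory.close();
    }
}

Code output:
Mybatis example foreach (Part 2)
Mybatis example foreach (Part 1)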

The Lucene API is divided into the following packages:

org.apache.lucene.analysis

Defines the abstract Analyzer API for converting text from a Reader into a TokenStream; in practice, this is the tokenizer. Some default implementations are provided, including StopAnalyzer and the grammar-based StandardAnalyzer. For Chinese word segmentation, see the IKAnalyzer library.
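To see what an Analyzer actually produces, the sketch below runs text through StandardAnalyzer and collects the emitted terms using the standard TokenStream workflow (reset, incrementToken, end, close). It assumes Lucene 5.x with lucene-analyzers-common on the classpath; the class name AnalyzerDemo and the helper tokenize are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerDemo {

    // Hypothetical helper: runs text through an Analyzer and collects the terms it emits.
    static List<String> tokenize(Analyzer analyzer, String field, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // try-with-resources closes the TokenStream when done
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                     // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();                       // records end-of-stream state
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(tokenize(analyzer, "fieldname", "Mybatis example foreach (Part 1)"));
            System.out.println(tokenize(analyzer, "fieldname", "SelectKey of Mybatis Example"));
        }
    }
}
```

With the 5.x StandardAnalyzer, the text is split on punctuation, lowercased, and English stop words such as "of" are dropped, so the second title yields only selectkey, mybatis and example.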

org.apache.lucene.codecs

Provides an abstraction for encoding and decoding the inverted index structure, along with several implementations suited to different application requirements.
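To make "inverted index" concrete, here is a toy, in-memory sketch in plain Java with no Lucene dependency. It is not how Lucene's codecs work (real codecs write compressed on-disk postings with frequencies, positions and more); it only shows the term-to-document-list mapping that a codec has to encode and decode. The class ToyInvertedIndex and its tokenization rule (lowercased runs of letters and digits) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration only: NOT Lucene's codec implementation.
public class ToyInvertedIndex {
    // term -> ascending ids of the documents containing it (a "postings list")
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Adds a document and returns its id; terms are lowercased runs of letters/digits.
    public int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue;   // leading punctuation yields an empty token
            List<Integer> docs = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // ids grow monotonically, so only the last entry can be a duplicate
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    // Returns the ids of documents containing the term (empty if none).
    public List<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Mybatis Pagination Plugin - Example");
        index.addDocument("Mybatis example foreach (Part 1)");
        index.addDocument("SelectKey of Mybatis Example");
        System.out.println(index.lookup("example"));   // [0, 1, 2]
        System.out.println(index.lookup("foreach"));   // [1]
    }
}
```

Answering a single-term query is then a map lookup rather than a scan of every document, which is the core reason search engines store text this way.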

org.apache.lucene.document

Provides a simple Document class. A document is simply a set of named fields whose values can be strings or Reader instances.
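The following sketch builds one Document from the common 5.x field classes, including a TextField whose value comes from a Reader. It assumes Lucene 5.x lucene-core on the classpath; the field names id, title, body and url, their values, and the class name DocumentDemo are hypothetical.

```java
import java.io.StringReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class DocumentDemo {

    static Document buildDocument() {
        Document doc = new Document();
        // StringField: indexed as one un-tokenized term (suits ids); also stored here
        doc.add(new StringField("id", "doc-42", Field.Store.YES));
        // TextField with a String value: tokenized by the Analyzer; also stored here
        doc.add(new TextField("title", "Mybatis example foreach (Part 1)", Field.Store.YES));
        // TextField with a Reader value: tokenized at index time but never stored
        doc.add(new TextField("body", new StringReader("Long body text read from a Reader")));
        // StoredField: stored for retrieval only, not searchable
        doc.add(new StoredField("url", "http://example.com/hypothetical"));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = buildDocument();
        System.out.println(doc.get("title"));        // the stored string value
        System.out.println(doc.getFields().size());  // number of fields added
    }
}
```

Note that doc.get("body") returns null here, because a Reader-valued field has no string value to return.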

org.apache.lucene.index

Provides two main classes: IndexWriter, which creates indexes and adds documents to them, and IndexReader, which accesses the indexed data.
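Besides addDocument(), IndexWriter can replace and delete documents by term, and an IndexReader opened after a commit sees the result. The sketch below assumes Lucene 5.x; the "id" key field, the sample titles, and the class name WriterReaderDemo are hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class WriterReaderDemo {

    // Hypothetical helper: a document with an exact-match "id" and a tokenized "title".
    static Document doc(String id, String title) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.YES));
        d.add(new TextField("title", title, Field.Store.YES));
        return d;
    }

    // Adds three documents, replaces one, deletes one, and returns the live document count.
    static int liveDocsAfterEdits() throws IOException {
        try (Directory dir = new RAMDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(doc("1", "Mybatis example foreach (Part 1)"));
            writer.addDocument(doc("2", "Association of Mybatis Example"));
            writer.addDocument(doc("3", "SelectKey of Mybatis Example"));
            // updateDocument: delete all docs whose "id" term is "1", then add the new one
            writer.updateDocument(new Term("id", "1"), doc("1", "Mybatis example foreach (Part 1, revised)"));
            // deleteDocuments: remove every doc whose "id" term is "3"
            writer.deleteDocuments(new Term("id", "3"));
            writer.commit();   // make the changes visible to newly opened readers
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs();   // counts live documents; deleted ones are excluded
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(liveDocsAfterEdits());
    }
}
```

Using a StringField for the key matters here: it is indexed as a single exact term, so new Term("id", "1") matches it without going through the Analyzer.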

org.apache.lucene.search

Provides data structures that represent queries (e.g. TermQuery for single terms, PhraseQuery for phrases, and BooleanQuery for Boolean combinations of queries), and IndexSearcher, which turns queries into TopDocs. Several QueryParsers are provided for building query structures from strings or XML.
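Queries do not have to come from QueryParser; they can also be built directly from the classes named above. This sketch indexes three of the sample titles and counts hits for a TermQuery, a PhraseQuery and a BooleanQuery. It assumes Lucene 5.x and uses the constructor-plus-add() style that works across 5.x (later releases deprecate it in favor of builder classes); the class name QueryDemo is hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class QueryDemo {
    static final String FIELD = "fieldname";

    // Returns hit counts for {TermQuery, PhraseQuery, BooleanQuery} over three sample titles.
    static int[] hitCounts() throws IOException {
        try (Directory dir = new RAMDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                for (String title : new String[]{
                        "Mybatis example foreach (Part 1)",
                        "SelectKey of Mybatis Example",
                        "Association of Mybatis Example"}) {
                    Document doc = new Document();
                    doc.add(new TextField(FIELD, title, Field.Store.YES));
                    writer.addDocument(doc);
                }
            } // closing the writer commits the documents

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);

                // TermQuery: one exact term, which must match the analyzed (lowercased) form
                TermQuery term = new TermQuery(new Term(FIELD, "foreach"));

                // PhraseQuery: "mybatis" immediately followed by "example"
                PhraseQuery phrase = new PhraseQuery();
                phrase.add(new Term(FIELD, "mybatis"));
                phrase.add(new Term(FIELD, "example"));

                // BooleanQuery: must match the phrase, must not contain the term "1"
                BooleanQuery bool = new BooleanQuery();
                bool.add(phrase, BooleanClause.Occur.MUST);
                bool.add(new TermQuery(new Term(FIELD, "1")), BooleanClause.Occur.MUST_NOT);

                return new int[]{
                    searcher.search(term, 10).totalHits,
                    searcher.search(phrase, 10).totalHits,
                    searcher.search(bool, 10).totalHits
                };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(java.util.Arrays.toString(hitCounts()));
    }
}
```

Because query terms bypass the Analyzer here, they are written in lowercase to match what StandardAnalyzer stored in the index.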

org.apache.lucene.store

Defines an abstract class for storing persistent data, Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Several implementations are provided, including FSDirectory, which stores files in a file system directory, and RAMDirectory, which keeps files as in-memory data structures.
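The in-memory RAMDirectory disappears when the program exits; an FSDirectory persists the index so that a later, separate Directory instance can reopen it. This sketch assumes Lucene 5.x, where FSDirectory.open takes a java.nio.file.Path; the temporary-directory prefix and the class name FSDirectoryDemo are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class FSDirectoryDemo {

    // Writes one document to an on-disk index, then reopens it and returns the document count.
    static int writeAndReopen(Path indexDir) throws IOException {
        try (Directory dir = FSDirectory.open(indexDir);
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new TextField("fieldname", "SelectKey of Mybatis Example", Field.Store.YES));
            writer.addDocument(doc);
        } // the writer closes (and commits) first, then the directory

        // A fresh Directory instance sees the committed files on disk
        try (Directory dir = FSDirectory.open(indexDir);
             DirectoryReader reader = DirectoryReader.open(dir)) {
            for (String file : dir.listAll()) {
                System.out.println(file);   // index files, such as segments_N
            }
            return reader.numDocs();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("lucene-demo");   // hypothetical location
        System.out.println(writeAndReopen(tmp));
    }
}
```

In a real application the path would be a fixed location such as the /tmp/testindex used in the commented-out line of the first example.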

org.apache.lucene.util

Contains a few useful data structures and utility classes, such as FixedBitSet and PriorityQueue.
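Lucene's PriorityQueue is a fixed-capacity heap whose ordering is defined by overriding lessThan(), and insertWithOverflow() makes it a natural top-k collector. FixedBitSet is a fixed-size bit set. The sketch below assumes Lucene 5.x lucene-core; the class names UtilDemo and IntMinQueue and the helpers topK and countMarked are hypothetical.

```java
import org.apache.lucene.util.FixedBitSet;
import org.apache.lucene.util.PriorityQueue;

public class UtilDemo {

    // Min-heap on Integer: lessThan defines the order, so top() is the smallest element kept.
    static final class IntMinQueue extends PriorityQueue<Integer> {
        IntMinQueue(int maxSize) {
            super(maxSize);
        }

        @Override
        protected boolean lessThan(Integer a, Integer b) {
            return a < b;
        }
    }

    // Keeps the k largest values: once the queue is full, insertWithOverflow evicts the smallest.
    static int[] topK(int[] values, int k) {
        IntMinQueue pq = new IntMinQueue(k);
        for (int v : values) {
            pq.insertWithOverflow(v);
        }
        int[] result = new int[pq.size()];
        // pop() removes the smallest element first, so fill from the back for descending order
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = pq.pop();
        }
        return result;
    }

    // Sets the given bit positions and returns how many distinct bits are set.
    static int countMarked(int numBits, int... bits) {
        FixedBitSet set = new FixedBitSet(numBits);
        for (int b : bits) {
            set.set(b);
        }
        return set.cardinality();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(topK(new int[]{5, 1, 9, 3, 7}, 3)));
        System.out.println(countMarked(10, 1, 3, 3, 7));
    }
}
```

The top-k pattern is the same idea Lucene uses when collecting the best-scoring hits: a small min-heap of size k, where each new candidate only has to beat the current minimum.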

To use Lucene, an application should:

1. Create Documents by adding Fields;
2. Create an IndexWriter and add documents to it with the addDocument() method;
3. Call the QueryParser.parse() method to build a query object from a string;
4. Create an IndexSearcher and pass the query to its search() method.

Finally

The content above covers the most basic parts of Lucene. Each package listed above also has its own detailed documentation, and future articles may briefly introduce some of it. If you plan to use Lucene, it is recommended to download the official distribution, which includes the complete documentation.
