This is the official Lucene FAQ

Reposted from: https://wiki.apache.org/lucene-java/LuceneFAQ
If you have a question about using Java Lucene, please do not add it directly to this FAQ. Join the Java User mailing list and email your question there.

Questions should only be added to this Wiki page when they already have an answer that can be added at the same time.

Contents
Lucene FAQ
General
How do I start using Lucene?
Are there any mailing lists available?
What Java version is required to run Lucene?
Will Lucene work with my Java application?
How can I get the latest greatest development code?
Where can I get the javadocs for the org.apache.lucene classes?
Where does the name Lucene come from?
Are there any alternatives to Lucene?
How does Zend Search Lucene (and other non-ASF ports) relate to Lucene?
Does Lucene have a web crawler?
Why am I getting an IOException that says “Too many open files”?
When I compile Lucene x.y.z from source, the version number in the jar file name and MANIFEST.MF is different. What’s up with that?
How do I contribute an improvement?
Why hasn’t patch FOO been committed?
What are the backwards compatibility commitments?
How do I get code written for Lucene 1.4.x to work with Lucene 2.x?
I am having a performance issue. How do I ask for help on the Java User mailing list?
What do l.a.o and o.a.l.xxxx stand for?
What is the difference between field (or document) boosting and query boosting?
Searching
Does Lucene allow searching and indexing simultaneously?
Why am I getting no hits / incorrect hits?
Why am I getting a TooManyClauses exception?
How can I search over multiple fields?
What wildcard search support is available from Lucene?
Can I combine wildcard and phrase search, e.g. “foo ba*”?
Is the QueryParser thread-safe?
How do I restrict searches to only return results from a limited subset of documents in the index (e.g. for privacy reasons)? What is the best way to approach this?
What is the order of fields returned by Document.fields()?
How does one determine which documents do not have a certain term?
How do I get the last document added that has a particular term?
Does MultiSearcher do anything particularly efficient to search multiple indices or does it simply search one after the other?
Is there a way to use a proximity operator (like near or within) with Lucene?
Are Wildcard, Prefix, and Fuzzy queries case sensitive?
Why does IndexReader’s maxDoc() return an ‘incorrect’ number of documents sometimes?
Is there a way to get a text summary of an indexed document with Lucene (a.k.a. a “snippet” or “fragment”) to display along with the search result?
Can I cache search results with Lucene?
Is the IndexSearcher thread-safe?
Is there a way to retrieve the original term positions during the search?
How do I retrieve all the values of a particular field that exists within an index, across all documents?
Can Lucene do a “search within search”, so that the second search is constrained by the results of the first query?
Does the position of the matches in the text affect the scoring?
How do I make sure that a match in a document title has greater weight than a match in a document body?
How do I find similar documents?
Can I filter by score?
How can I cluster results, i.e. create groups of similar documents?
How do I implement paging, i.e. showing results 1-10, 11-20, etc.?
How do I speed up searching?
Indexing
Can I use Lucene to crawl my site or other sites on the Internet?
How can I use Lucene to index a database?
How do I perform a simple indexing of a set of documents?
How can I add document(s) to the index?
Where does Lucene store the index it builds?
Can I store the Lucene index in a relational database?
Can I store the Lucene index in a BerkeleyDB?
I get “No tvx file”. What does that mean?
Does Lucene store a full copy of the indexed documents?
What is the difference between Stored, Tokenized, Indexed, and Vector?
What happens when you IndexWriter.add() a document that is already in the index? Does it overwrite the previous document?
How do I delete documents from the index?
Is there a way to limit the size of an index?
Why is it important to use the same analyzer type during indexing and search?
What are Segments?
Is the Lucene index database platform-independent?
When I recreate an index from scratch, do I have to delete the old index files?
How can I index and search digits and other non-alphabetic characters?
Is the IndexWriter class, and especially the method addIndexes(Directory[]), thread-safe?
When is it possible for document IDs to change?
What is the purpose of write.lock file, when is it used, and by which classes?
What is the purpose of the commit.lock file, when is it used, and by which classes?
My program crashed and now I get a “Lock obtain timed out.” error. Where is the lock and how can I delete it?
Is there a maximum number of segment infos whose summary (name and document count) is stored in the segments file?
How do I update a document or a set of documents that are already indexed?
How do I write my own Analyzer?
How do I index non-Latin characters?
How can I index HTML documents?
How can I index XML documents?
How can I index file formats like OpenDocument (aka OpenOffice.org), RTF, Microsoft Word, Excel, PowerPoint, Visio, etc?
How can I index email (from MS Exchange or another IMAP server)?
How can I index PDF documents?
How can I index JSP files?
How can I index Java source files?
What is the difference between IndexWriter.addIndexes(IndexReader[]) and IndexWriter.addIndexes(Directory[]), besides them taking different arguments?
Can I use Lucene to index text in Chinese, Japanese, Korean, and other multi-byte character sets?
Why do I have a deletable file (and old segment files remain) after merging?
How do I speed up indexing?