Solr: Configuring Solr

--------------------------------------------------------------------------------------------------------------------------------------
Query request handling

------------------------------------------------------------------------------------------------------------------------------------

Managing searchers

In Solr, queries are processed by a component called a searcher. There is only one “active” searcher in Solr at any given time. All query components for all search request handlers execute queries against the active searcher.

The active searcher has a read-only view of a snapshot of the underlying Lucene index. It follows that if you add a new document to Solr, then it is not visible in search results from the current searcher. This raises the question: How do new documents become visible in search results? The answer is to close the current searcher and open a new one that has a read-only view of the updated index. A commit creates a new searcher to make new documents and updates visible.

Because precomputed data, such as a cached query result set, must be invalidated and recomputed, it stands to reason that opening a new searcher on your index is potentially an expensive operation. This can have a direct impact on user experience.Solr supports the concept of warming a new searcher in the back-ground and keeping the current searcher active until the new one is fully warmed.

--------------------------------------------------------------------------------------------------------------------------------------

Cache management

Cache fundamentals

There are four main concerns when working with Solr caches:

Cache sizing and eviction policy
Hit ratio and evictions
Cached-object invalidation
Autowarming new caches

Filter cache

In Solr, a filter restricts search results to documents that meet the filter criteria, but it does not affect scoring.Autowarming the filter cache requires Solr to re-execute the filter query with the new searcher. Consequently,autowarming the filter cache can be a source of performance and resource utilization problems in Solr.

Query result cache

The query result cache holds result sets for a query. If you execute a query more than once, subsequent results are served from the query result cache,rather than re-executing the same query against the index.

<queryResultCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>

QUERY RESULT WINDOW SIZE

Imagine your application shows 10 documents per page and in most cases your users only look at the first and second pages. You can set <queryResultWindowSize> to 20 to avoid having to re-execute the query to retrieve the second page of results.

QUERY RESULT MAX DOCS CACHED

You have to set a maximum size for Solr caches; but this doesn’t affect the size of each individual entry in the cache. As you can imagine, a result set holding millions of documents in the cache would greatly impact available memory in Solr. The <queryResultMaxDocsCached> element allows you to limit the
number of documents cached for each entry in the query result cache. Users only look at the first couple of pages in most search applications, so you should generally set this to no more than two or three times the page size.

ENABLE LAZY FIELD LOADING

A common design pattern in Solr is to have a query return a subset of fields for each document. For instance, in our example query, we requested the name,price, features, and score fields. But the documents in the index have many more fields, such as category, popularity, manufacturedate, and so forth. If your application adopts this common design pattern, you may want to set <enableLazyFieldLoading> to true to avoid loading unwanted fields.

Document cache

The query result cache holds a list of internal document IDs that match a query, so even if the query results are cached, Solr still needs to load the documents from disk to produce search results. The document cache is used to store documents loaded from disk in memory keyed by their internal document IDs. It follows that the query result cache uses the document cache to find cached versions of documents in the cached result set.

This raises the question of whether it makes sense to warm the document cache.There’s a good argument to be made against autowarming this cache, because there’s no way to ensure the documents you are warming have any relation to queries and filters being autowarmed from the query result and filter caches. If you are constantly indexing, and the documents a query returns are constantly changing, you could be spending time recreating documents that may not benefit your warmed filter and query result caches. Onthe other hand, if most of your index is relatively static, the document cache may provide value.

Field value cache

The last cache we’ll mention is field value, which is strictly used by Lucene and is not managed by Solr. The field value cache provides fast access to stored field values by internal document ID. It is used during sorting and when building documents for the response.