Reading the Solr source code: when filterCache is used

Solr has four caches: queryResultCache, documentCache, filterCache and fieldValueCache. Today I want to talk about filterCache. It is usually described as the cache for the doc IDs of fq: after all doc IDs matching the query of an fq have been found, the result is cached for later reuse, which saves I/O. To get a more accurate picture, I read the code carefully again, set up a Solr node with the 4.10.4 version my company uses, and ran a lot of experiments.

 

Under what circumstances is filterCache used? Start with SolrIndexSearcher.getDocListC and the case where the cache is hit (the cache here means queryResultCache). Here is the code:

if (queryResultCache != null && cmd.getFilter() == null
		&& (flags & (NO_CHECK_QCACHE | NO_SET_QCACHE)) != (NO_CHECK_QCACHE | NO_SET_QCACHE)) { // queryResultCache may be used
	// Build the key used to look up queryResultCache.
	key = new QueryResultKey(q, cmd.getFilterList(), cmd.getSort(), flags);
	if ((flags & NO_CHECK_QCACHE) == 0) { // checking the cache is allowed
		superset = queryResultCache.get(key); // look up queryResultCache
		if (superset != null) { // cache hit
			// Enter if this request does not need scores, or the cached result has them.
			if ((flags & GET_SCORES) == 0 || superset.hasScores()) {
				// Take the window given by start and rows from the cached result.
				// The result is null when the cached list does not cover that window.
				out.docList = superset.subset(cmd.getOffset(), cmd.getLen());
			}
		}
		if (out.docList != null) { // the cached result covers this request
			// GET_DOCSET is the key flag here. From my code search, it is set when
			// faceting is used, because faceting needs the DocSet to be returned.
			if (out.docSet == null && ((flags & GET_DOCSET) != 0)) {
				if (cmd.getFilterList() == null) { // no fq in this request
					// Get the DocSet of the query parsed from q: filterCache is checked first;
					// on a miss, Lucene is searched and the result is put into filterCache.
					// So filterCache also holds the DocSet of q.
					out.docSet = getDocSet(cmd.getQuery());
				} else { // there is at least one fq
					List<Query> newList = new ArrayList<>(cmd.getFilterList().size() + 1);
					newList.add(cmd.getQuery());
					newList.addAll(cmd.getFilterList());
					// Also reads from filterCache (through getPositiveDocSet), then intersects
					// or subtracts the DocSets. Afterwards the DocSets of q and of every fq
					// are in filterCache.
					out.docSet = getDocSet(newList);
				}
			}
			return;
		}
	}
	// ... rest of the block omitted
}
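The subset(offset, len) step can be illustrated with a small sketch. This is a simplified model and not Solr's own DocList code; the class CachedWindow and its fields are hypothetical names introduced only for illustration. It shows why a queryResultCache hit can still produce no docList: the cached entry may not reach far enough to cover start + rows.

```java
import java.util.Arrays;

// Hypothetical stand-in for a cached result list; not Solr's DocSlice.
class CachedWindow {
    final int[] docs;     // doc IDs that were actually cached, in rank order
    final int totalHits;  // total number of matches for the query

    CachedWindow(int[] docs, int totalHits) {
        this.docs = docs;
        this.totalHits = totalHits;
    }

    // Returns the requested page, or null when the page extends past what was
    // cached while more matches exist (the caller must then re-run the query).
    int[] subset(int offset, int len) {
        int requestedEnd = offset + len;
        if (requestedEnd > docs.length && totalHits > docs.length) {
            return null;
        }
        int end = Math.min(requestedEnd, docs.length);
        int start = Math.min(offset, end);
        return Arrays.copyOfRange(docs, start, end);
    }
}
```

With four cached IDs out of ten total hits, asking for offset 0 and length 2 is served from the cached list, while offset 2 and length 5 yields null and falls through to the cache-miss path.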

(First, how I found that faceting makes (flags & GET_DOCSET) != 0. The method org.apache.solr.search.SolrIndexSearcher.QueryCommand.setNeedDocSet(boolean) sets or clears the GET_DOCSET bit in flags. It is called in org.apache.solr.handler.component.ResponseBuilder.getQueryCommand(), with the value of org.apache.solr.handler.component.ResponseBuilder.isNeedDocSet() as the argument. That value is set by org.apache.solr.handler.component.ResponseBuilder.setNeedDocSet(boolean), which is called in org.apache.solr.handler.component.FacetComponent.prepare(ResponseBuilder) with true. So when faceting is enabled, (flags & GET_DOCSET) != 0.)
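The flag handling described here is ordinary bit masking. The sketch below is only an illustration: the class QueryFlags is hypothetical, and the constant values are assumptions chosen for the example, not copied from SolrIndexSearcher.

```java
// Illustrative bit-flag handling. The constant values are assumptions made
// for this sketch, not taken from the Solr source.
class QueryFlags {
    static final int GET_SCORES = 0x01;
    static final int GET_DOCSET = 0x40000000;

    private int flags;

    // Same idea as QueryCommand.setNeedDocSet(boolean): set or clear one bit.
    QueryFlags setNeedDocSet(boolean needDocSet) {
        if (needDocSet) {
            flags |= GET_DOCSET;
        } else {
            flags &= ~GET_DOCSET;
        }
        return this;
    }

    // The test used in getDocListC: is the GET_DOCSET bit set?
    boolean needsDocSet() {
        return (flags & GET_DOCSET) != 0;
    }

    int getFlags() {
        return flags;
    }
}
```

Calling setNeedDocSet(true), as the facet component does on the response builder, is what later makes the (flags & GET_DOCSET) != 0 test succeed.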

 

The code above shows that when queryResultCache is hit and faceting is enabled, getDocSet is called, either with a single Query or with a List<Query>, to get all the doc IDs that faceting needs. The single-argument getDocSet first looks up the DocSet in filterCache. On a miss it calls getDocSetNC (NC means "no cache") to search the Lucene index and then puts the result into filterCache, so at this point the DocSet of q ends up in filterCache. The List<Query> variant also reads from filterCache, but it looks up each query separately. The work is done in getProcessedFilter, which calls getPositiveDocSet to get each DocSet from filterCache and then computes the intersection or difference; the DocSet of every query in the list is put into filterCache. So in this case the DocSets of all fq and the DocSet of q are all in filterCache. In short: when the cache is hit (again, this means queryResultCache) and faceting is enabled, DocSets are looked up in filterCache, and the DocSets of all queries formed by fq and q are put into filterCache. (This also shows that the name filterCache is not quite accurate, because the DocSet of q is stored in it as well.)
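The lookup pattern described above can be modeled in a few lines. This is a toy model under stated assumptions, not Solr code: ToyFilterCache, its String query keys and the index function are hypothetical stand-ins, and negative queries (the "difference" case) are left out.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model of the filterCache lookups; all names here are hypothetical.
class ToyFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> index; // stands in for a Lucene search
    int indexSearches = 0;

    ToyFilterCache(Function<String, BitSet> index) {
        this.index = index;
    }

    // Single query: check the cache, fall back to the index, then cache the result.
    BitSet getDocSet(String query) {
        BitSet cached = cache.get(query);
        if (cached == null) {
            indexSearches++;
            cached = index.apply(query);
            cache.put(query, cached);
        }
        return (BitSet) cached.clone(); // callers may modify their copy
    }

    // List of queries (q plus every fq): look each one up separately, then intersect.
    BitSet getDocSet(List<String> queries) {
        BitSet answer = null;
        for (String query : queries) {
            BitSet docs = getDocSet(query);
            if (answer == null) {
                answer = docs;
            } else {
                answer.and(docs);
            }
        }
        return answer;
    }

    boolean isCached(String query) {
        return cache.containsKey(query);
    }
}
```

After one call with a list containing q and one fq, both entries are cached individually, which matches the observation that the DocSet of q lands in filterCache too.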

 

If queryResultCache is not hit, the relevant code is this part of SolrIndexSearcher.getDocListC:

if (useFilterCache) { // ignore this branch for now; it is described separately below
	// now actually use the filter cache.
	// for large filters that match few documents, this may be
	// slower than simply re-executing the query.
	if (out.docSet == null) {
		out.docSet = getDocSet(cmd.getQuery(), cmd.getFilter());
		DocSet bigFilt = getDocSet(cmd.getFilterList());
		if (bigFilt != null)
			out.docSet = out.docSet.intersection(bigFilt);
	}
	// todo: there could be a sortDocSet that could take a list of
	// the filters instead of anding them first...
	// perhaps there should be a multi-docset-iterator
	sortDocSet(qr, cmd);
} else {
	// do it the normal way, that is, search Lucene.
	// GET_DOCSET is set when faceting is enabled (see above); otherwise it is not.
	if ((flags & GET_DOCSET) != 0) {
		// this currently conflates returning the docset for the base query vs the base query and all filters.
		// getDocListAndSetNC also calls getProcessedFilter, passing the queries of all fq
		// as the second argument, so the DocSets of all fq are put into filterCache.
		DocSet qDocSet = getDocListAndSetNC(qr, cmd);
		// When there is no filter, the DocSet of the query itself is also put into
		// filterCache; in that case the DocSet corresponds exactly to the query.
		if (qDocSet != null && filterCache != null && !qr.isPartialResults())
			filterCache.put(cmd.getQuery(), qDocSet);
	} else {
		// Without faceting, fq still goes through getProcessedFilter: filterCache is
		// checked first; on a miss Lucene is searched and the result is cached.
		getDocListNC(qr, cmd);
	}
}

Both getDocListAndSetNC and getDocListNC call getProcessedFilter with the queries that represent fq, and the result is the intersection of all fq. So even when faceting is off, merging the inverted lists of fq uses filterCache. In other words, when queryResultCache is missed, filterCache is used whether or not faceting is enabled, and its job is to merge the inverted lists of fq. With faceting, however, the DocSet is still obtained by querying Lucene first (because nothing was found in the cache).

 

From the code above, whether queryResultCache is hit or missed, we can conclude that filterCache has two functions. The first is merging inverted lists, that is, computing the intersection of multiple fq. The second is supplying the DocSet that faceting needs. More abstractly, filterCache stores the DocSet of a query; the query does not have to be an fq, because the DocSet of q is stored there too.

 

filterCache has one more function, which is the if (useFilterCache) branch in the code above. Its logic is simple. Here is the code:

boolean useFilterCache = false;
// Conditions: this request does not need scores, useFilterForSortedQuery=true is
// configured in solrconfig, the request has a sort, and filterCache is not null.
if ((flags & (GET_SCORES | NO_CHECK_FILTERCACHE)) == 0 && useFilterForSortedQuery && cmd.getSort() != null && filterCache != null) {
	useFilterCache = true;
	SortField[] sfields = cmd.getSort().getSort();
	for (SortField sf : sfields) {
		if (sf.getType() == SortField.Type.SCORE) { // any sort field that uses the score rules out filterCache
			useFilterCache = false;
			break;
		}
	}
}
if (useFilterCache) { // serve the request from filterCache
	if (out.docSet == null) {
		// DocSet of q intersected with cmd.getFilter(). Note that cmd.getFilter() is not fq
		// (fq is cmd.getFilterList()). This call is cache-aware as well: the DocSet of q is
		// looked up in filterCache before Lucene is searched.
		out.docSet = getDocSet(cmd.getQuery(), cmd.getFilter());
		// Look up the fq DocSets in filterCache; on a miss, search Lucene and cache the result.
		DocSet bigFilt = getDocSet(cmd.getFilterList());
		if (bigFilt != null)
			out.docSet = out.docSet.intersection(bigFilt); // intersection of the two
	}
	sortDocSet(qr, cmd); // sort the resulting DocSet
} else {
	// same as the else branch shown earlier, omitted
}

Why does the code insist that no sort field may use the score? The reason is simple: sorting by score may need term frequencies, position information or payloads, and filterCache has none of these. It contains only doc IDs, so a request that sorts by score cannot be served from filterCache. If the sort does not use the score, that is, it sorts by a field or a function, the sort values can be looked up in the FieldCache by doc ID, and the IDs provided by filterCache are enough.
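A short sketch makes the point that doc IDs alone are enough for a field sort. It is a simplified illustration, not Solr's sortDocSet: the class DocSetSorter and the long[] array indexed by doc ID (playing the role of a FieldCache-style lookup) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: rank a score-less set of doc IDs by a per-document field value.
class DocSetSorter {
    // fieldValues[docId] plays the role of a FieldCache-style array.
    static List<Integer> topByField(BitSet docSet, long[] fieldValues, int rows) {
        List<Integer> ids = new ArrayList<>();
        for (int doc = docSet.nextSetBit(0); doc >= 0; doc = docSet.nextSetBit(doc + 1)) {
            ids.add(doc);
        }
        // Ascending by field value; ties are broken by doc ID for determinism.
        ids.sort(Comparator.<Integer>comparingLong(d -> fieldValues[d])
                .thenComparingInt(d -> d));
        return new ArrayList<>(ids.subList(0, Math.min(rows, ids.size())));
    }
}
```

Nothing in this sort needs term frequencies, positions or payloads, which is why a cached DocSet is sufficient here and would not be for a sort by score.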

 

So besides the functions described earlier, filterCache can also serve sorted requests that do not involve scores. This only applies when useFilterForSortedQuery=true is configured, and in practice it is unlikely to be used.

 

 

 

 

 

 
