The use of Solr's caches in SolrIndexSearcher

First, a note: the version of Solr I use is 5.5.3.

 

The previous post covered the caches in Solr, but only their implementation and configuration; it did not look at how the program actually uses them. This post is about how the caches are used in SolrIndexSearcher. First, a word on SolrIndexSearcher itself: it wraps Lucene's IndexSearcher. When a query comes in, it checks the cache first; on a miss it searches through Lucene and then puts the result into the cache.
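
The check-the-cache-first flow described above can be sketched roughly as follows. This is only a minimal illustration and not Solr code: the class CachingSearcher, the Function that stands in for the Lucene search, and the plain HashMap used as the cache are all assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a searcher that wraps a slower index lookup.
final class CachingSearcher {
    private final Map<String, List<Integer>> cache = new HashMap<>();
    private final Function<String, List<Integer>> indexSearch; // plays the role of Lucene
    int indexHits = 0; // how many times we had to go to the "index"

    CachingSearcher(Function<String, List<Integer>> indexSearch) {
        this.indexSearch = indexSearch;
    }

    List<Integer> search(String query) {
        List<Integer> cached = cache.get(query);          // 1. check the cache first
        if (cached != null) {
            return cached;
        }
        indexHits++;
        List<Integer> result = indexSearch.apply(query);  // 2. miss: search the index
        cache.put(query, result);                         // 3. remember the result
        return result;
    }

    public static void main(String[] args) {
        CachingSearcher s = new CachingSearcher(q -> Arrays.asList(1, 2, 3));
        s.search("name:james");
        s.search("name:james"); // the second call is served from the cache
        System.out.println(s.indexHits);
    }
}
```

The real class does the same thing in more detail: several caches, keys richer than a string, and flags that can switch the lookup or the update off.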

The caches in this class mean the following:

1. filterCache: the key is a Query and the value is a DocSet, that is, the set of documents matching that query (in effect its posting list), unordered. The key may also be a BooleanQuery made up of several sub-queries; the value is still the unordered set of matching documents. It can serve as the result cache when no sorting by score is needed (rarely used), but mostly it is a source of DocSets (useful for faceting).

2. queryResultCache: the key is an object built from the query, the filter queries and the sort; the value is the result, that is, what will be returned to the front end.

3. documentCache: corresponds to the return value of Lucene's doc(int id) method, except that the document is looked up in the cache first.

4. fieldValueCache: I do not understand what it is for yet. In the SolrConfig constructor you can see that one is created automatically even if none is configured.

 

Let's look at its search method:

  public QueryResult search(QueryResult qr, QueryCommand cmd) throws IOException {
    getDocListC(qr,cmd); // Delegates to getDocListC; the C stands for cache, i.e. look for the result in the cache first.
    return qr;
  }

So the method that does the real work is getDocListC:

private void getDocListC(QueryResult qr, QueryCommand cmd) throws IOException {
    // Some queries need not only a DocList (the sorted docs) but also a DocSet (the unordered set of all matches), for example when faceting.
    DocListAndSet out = new DocListAndSet();
    qr.setDocListAndSet(out);
    QueryResultKey key=null;
    int maxDocRequested = cmd.getOffset() + cmd.getLen(); // Total number of docs needed, i.e. offset + rows.
    // check for overflow, and check for # docs in index
    if (maxDocRequested < 0 || maxDocRequested > maxDoc()) maxDocRequested = maxDoc();
    int supersetMaxDoc= maxDocRequested;
    DocList superset = null;

    int flags = cmd.getFlags(); // The flags are set according to the parameters of the request.
    Query q = cmd.getQuery();
    if (q instanceof ExtendedQuery) { // Ordinary queries do not enter this branch
      ExtendedQuery eq = (ExtendedQuery)q;
      if (!eq.getCache()) {
        flags |= (NO_CHECK_QCACHE | NO_SET_QCACHE | NO_CHECK_FILTERCACHE);
      }
    }
    // Enter only if a queryResultCache is configured, the command carries no DocSet filter (cmd.getFilter(), which is not the fq of a request), and the flags do not forbid both checking and updating the cache.
    if (queryResultCache != null && cmd.getFilter()==null
        && (flags & (NO_CHECK_QCACHE|NO_SET_QCACHE)) != ((NO_CHECK_QCACHE|NO_SET_QCACHE)))
    {
        // all of the current flags can be reused during warming,
        // so set all of them on the cache key.
        // Wrap the query, filter list, sort and flags into a QueryResultKey, the key used in the cache.
        key = new QueryResultKey(q, cmd.getFilterList(), cmd.getSort(), flags);
        if ((flags & NO_CHECK_QCACHE)==0) { // Checking the cache is not prohibited
          // Look up queryResultCache by the query, filters and sort of this request.
          superset = queryResultCache.get(key);

          if (superset != null) { // Cache hit
            // Usable if the request does not need scores, or the cached result already has them.
            if ((flags & GET_SCORES)==0 || superset.hasScores()) {
              // Slice out the part to return according to offset (start) and rows. This gives the docList only, not yet the docSet.
              out.docList = superset.subset(cmd.getOffset(),cmd.getLen());
            }
          }
          // docList may still be null here: either the cache missed (or the entry lacked the scores we need), or the cached superset does not reach far enough for the requested start + rows (the key does not include start and rows).

          if (out.docList != null) { // The cache gave us a usable result.
            if (out.docSet==null && ((flags & GET_DOCSET)!=0) ) { // The docSet is needed as well
              if (cmd.getFilterList()==null) {
                out.docSet = getDocSet(cmd.getQuery()); // getDocSet uses filterCache; we will look at it later.
              } else {
                List<Query> newList = new ArrayList<>(cmd.getFilterList().size()+1);
                newList.add(cmd.getQuery());
                newList.addAll(cmd.getFilterList());
                out.docSet = getDocSet(newList);
              }
            }
            return;
          }
        }
      if ((flags & NO_SET_QCACHE) == 0) {
        // queryResultWindowSize is configured in solrconfig.xml. For caching purposes it sets the minimum number of docs fetched from the index on each search, so that browsing the first few pages does not hit the index again.
        if (maxDocRequested < queryResultWindowSize) {
          supersetMaxDoc=queryResultWindowSize;
        } else {
          // Round the number of docs to fetch up to a multiple of queryResultWindowSize.
          supersetMaxDoc = ((maxDocRequested -1)/queryResultWindowSize + 1)*queryResultWindowSize;
          if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
        }
      } else {
        key = null;  // we won't be caching the result
      }
    }
    cmd.setSupersetMaxDoc(supersetMaxDoc);

    // Reaching this point means the cache missed, or it hit but did not cover the requested part (for example start is too large), so we must search the index.
    // Can filterCache supply the result? Only when no sort by score is needed, because filterCache entries are unordered.
    boolean useFilterCache=false;
    // useFilterForSortedQuery is off by default, and most requests sort by score, so this condition is rarely met.
    if ((flags & (GET_SCORES|NO_CHECK_FILTERCACHE))==0 && useFilterForSortedQuery && cmd.getSort() != null && filterCache != null) {
      useFilterCache=true;
      SortField[] sfields = cmd.getSort().getSort();
      for (SortField sf : sfields) {
        if (sf.getType() == SortField.Type.SCORE) {
          useFilterCache=false;
          break;
        }
      }
    }
    if (useFilterCache) { // Rarely taken
      if (out.docSet == null) {
        out.docSet = getDocSet(cmd.getQuery(),cmd.getFilter());
        DocSet bigFilt = getDocSet(cmd.getFilterList());
        if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt);
      }
      sortDocSet(qr, cmd);
    } else {
      // do it the normal way...
      if ((flags & GET_DOCSET)!=0) {
        DocSet qDocSet = getDocListAndSetNC(qr,cmd); // NC means "no cache": get the docSet and docList from the index
        // Cache the docSet of the main query only, without any filters. filterCache maps a query to its unordered docSet.
        if (qDocSet!=null && filterCache!=null && !qr.isPartialResults()) filterCache.put(cmd.getQuery(),qDocSet);
      } else {
        getDocListNC(qr,cmd);
      }
      assert null != out.docList : "docList is null";
    }

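    // The original source has a block here that handles cursorMark; I have omitted it. When no cursor is used, that block sets superset = out.docList and then slices out.docList with subset(offset, len).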

    // lastly, put the superset in the cache if the size is less than or equal to queryResultMaxDocsCached
    // queryResultMaxDocsCached is also configured in solrconfig.xml: the maximum number of docs a cached result may hold. Larger results are not cached.
    if (key != null && superset.size() <= queryResultMaxDocsCached && !qr.isPartialResults()) {
      queryResultCache.put(key, superset); // The key is the object wrapping query, filters and sort.
    }
  }
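
To make the window rounding in the middle of the method concrete, here is the same arithmetic pulled out into a standalone helper. The class and method names are made up for this sketch, and it leaves out the overflow check and the clamp to maxDoc() that the real method performs.

```java
final class WindowRounding {
    // Mirrors the supersetMaxDoc calculation from getDocListC:
    // fetch at least one window, otherwise round up to a whole number of windows.
    static int supersetMaxDoc(int offset, int rows, int queryResultWindowSize) {
        int maxDocRequested = offset + rows;
        if (maxDocRequested < queryResultWindowSize) {
            return queryResultWindowSize;
        }
        return ((maxDocRequested - 1) / queryResultWindowSize + 1) * queryResultWindowSize;
    }

    public static void main(String[] args) {
        // Window size 20, 10 rows per page.
        System.out.println(supersetMaxDoc(0, 10, 20));   // page 1 -> 20
        System.out.println(supersetMaxDoc(10, 10, 20));  // page 2 -> 20
        System.out.println(supersetMaxDoc(20, 10, 20));  // page 3 -> 40
        System.out.println(supersetMaxDoc(40, 10, 20));  // page 5 -> 60
    }
}
```

With a window of 20 and 10 rows per page, pages 1 and 2 ask for the same 20 docs, so the second page can be sliced out of the superset that was cached for the first.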

 

There is one more method to look at, getDocSet:

  public DocSet getDocSet(Query query) throws IOException {
    if (query instanceof ExtendedQuery) { // Ordinary queries do not enter this branch
      ExtendedQuery eq = (ExtendedQuery)query;
      if (!eq.getCache()) {
        if (query instanceof WrappedQuery) {
          query = ((WrappedQuery)query).getWrappedQuery();
        }
        query = QueryUtils.makeQueryable(query);
        return getDocSetNC(query, null);
      }
    }

    // Convert the query to its positive form: a purely negative query such as -name:james (name is not james) becomes name:james.
    Query absQ = QueryUtils.getAbs(query);
    boolean positive = query==absQ;
    if (filterCache != null) {
      DocSet absAnswer = filterCache.get(absQ); // Try filterCache first
      if (absAnswer!=null) {
        if (positive) return absAnswer;
        else return getPositiveDocSet(matchAllDocsQuery).andNot(absAnswer);
      }
    }

    DocSet absAnswer = getDocSetNC(absQ, null); // Not in the cache: look it up in the index.
    DocSet answer = positive ? absAnswer : getPositiveDocSet(matchAllDocsQuery).andNot(absAnswer);

    // Cache what was found. Note that filterCache is keyed by the query alone; no filter is involved.
    if (filterCache != null) {
      filterCache.put(absQ, absAnswer);
    }
    return answer;
  }
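
The handling of negative queries in getDocSet can be modelled in a few lines. This is a rough sketch, not Solr code: queries are plain strings, a leading '-' marks a purely negative query (a convention invented for this example), and java.util.BitSet stands in for DocSet.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class AbsQueryCache {
    private final Map<String, BitSet> filterCache = new HashMap<>();
    private final int maxDoc;

    AbsQueryCache(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // index: hypothetical function that finds the matching doc ids for a positive query.
    BitSet getDocSet(String query, Function<String, BitSet> index) {
        boolean positive = !query.startsWith("-");
        String absQ = positive ? query : query.substring(1);

        BitSet absAnswer = filterCache.get(absQ);
        if (absAnswer == null) {
            absAnswer = index.apply(absQ);
            filterCache.put(absQ, absAnswer); // only the positive form is ever cached
        }
        if (positive) {
            return (BitSet) absAnswer.clone(); // copy so callers cannot modify the cached set
        }
        BitSet all = new BitSet(maxDoc);
        all.set(0, maxDoc);     // all documents ...
        all.andNot(absAnswer);  // ... minus those matching the positive form
        return all;
    }

    int cachedEntries() {
        return filterCache.size();
    }

    public static void main(String[] args) {
        AbsQueryCache cache = new AbsQueryCache(5);
        Function<String, BitSet> index = q -> {
            BitSet b = new BitSet();
            b.set(1);
            b.set(3);
            return b;
        };
        System.out.println(cache.getDocSet("-name:james", index)); // {0, 2, 4}
        System.out.println(cache.getDocSet("name:james", index));  // {1, 3}
        System.out.println(cache.cachedEntries());                 // 1
    }
}
```

Because only the positive form is stored, name:james and -name:james share a single cache entry, and the negative answer is derived from it on demand.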

 

One cache has not come up yet: documentCache. It is simpler. SolrIndexSearcher has a doc(int id) method whose implementation looks in documentCache first and, if the document is not found there, calls Lucene's IndexSearcher to read it from the index.

 

At this point the role of the three caches is clear:

1. filterCache: holds the DocSets used to process the fq parameters (it is also used by faceting and grouping). If the request has fq, filterCache is consulted (there is a lot of code in this part that I did not post).

2. queryResultCache: caches the result of a query identified by query + filter + sort + flags, regardless of start and rows.

3. documentCache: caches the documents that are finally returned.
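
The point that the queryResultCache key ignores start and rows can be illustrated with a small model. It is a simplified sketch and not Solr's actual classes: ResultKey only imitates the idea of QueryResultKey, a List of doc ids stands in for DocList, and the page helper is a made-up approximation of slicing a cached superset.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class ResultCacheDemo {
    // Returns the requested page from a cached superset, or null when the
    // superset is too short and more matches exist in the index.
    static List<Integer> page(List<Integer> superset, int totalMatches, int start, int rows) {
        int end = start + rows;
        if (end > superset.size() && totalMatches > superset.size()) {
            return null; // must go back to the index
        }
        int realEnd = Math.min(end, superset.size());
        int realStart = Math.min(start, realEnd);
        return superset.subList(realStart, realEnd);
    }

    public static void main(String[] args) {
        Map<ResultKey, List<Integer>> cache = new HashMap<>();
        cache.put(new ResultKey("name:james", Arrays.asList("type:book"), "price asc", 0),
                  Arrays.asList(7, 3, 9, 4));
        // A later request for another page of the same query builds an equal key.
        List<Integer> superset =
            cache.get(new ResultKey("name:james", Arrays.asList("type:book"), "price asc", 0));
        System.out.println(page(superset, 10, 2, 2)); // [9, 4], served from the cache
        System.out.println(page(superset, 10, 4, 2)); // null, cached window too short
    }
}

// Simplified stand-in for the cache key. Note that it has no start/rows fields.
final class ResultKey {
    private final String query;
    private final List<String> filters;
    private final String sort;
    private final int flags;

    ResultKey(String query, List<String> filters, String sort, int flags) {
        this.query = query;
        this.filters = filters;
        this.sort = sort;
        this.flags = flags;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ResultKey)) return false;
        ResultKey k = (ResultKey) o;
        return flags == k.flags && query.equals(k.query)
            && filters.equals(k.filters) && sort.equals(k.sort);
    }

    @Override
    public int hashCode() {
        return Objects.hash(query, filters, sort, flags);
    }
}
```

Requests for different pages of the same query land on the same entry, and only a page beyond the cached window forces another trip to the index.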

 

 

