Solr's facet source code interpretation (2) - facet.field

facet.field is considerably more complicated than facet.query and takes more parameters. Let's look at the code. The entry point is SimpleFacets.getFacetFieldCounts():

public NamedList<Object> getFacetFieldCounts() throws IOException, SyntaxError {
	NamedList<Object> res = new SimpleOrderedMap<>();
	String[] facetFs = params.getParams(FacetParams.FACET_FIELD);
	if (null == facetFs) {
		return res;
	}
	// facet.threads enables multi-threaded faceting. Parallelism is per facet.field, so the setting has no effect when only one facet.field is requested.
	int maxThreads = req.getParams().getInt(FacetParams.FACET_THREADS, 0);
	// 0 runs every task directly in the calling thread; any other value uses the shared thread pool.
	Executor executor = maxThreads == 0 ? directExecutor : facetExecutor;
	// A negative value means "no limit". The pool itself is unbounded (similar to a CachedThreadPool); the semaphore below is what enforces the limit when maxThreads > 0.
	final Semaphore semaphore = new Semaphore((maxThreads <= 0) ? Integer.MAX_VALUE : maxThreads);
	List<Future<NamedList>> futures = new ArrayList<>(facetFs.length); // one facet result per field
	try {
		for (String f : facetFs) { // loop over every facet.field
			parseParams(FacetParams.FACET_FIELD, f); // parse the parameters of this facet.field
			final String termList = localParams == null ? null : localParams.get(CommonParams.TERMS); // explicit term list from the local params; I have not seen it used, so everything below assumes termList == null
			final String workerKey = key; // output key; the field name when there are no local params
			final String workerFacetValue = facetValue; // what is being faceted on, i.e. the field name
			final DocSet workerBase = this.docs; // all doc ids matched by q and fq in the main query
			Callable<NamedList> callable = new Callable<NamedList>() { // the task that processes one facet.field, submitted to the executor
				@Override
				public NamedList call() throws Exception {
					try {
						NamedList<Object> result = new SimpleOrderedMap<>();
						if (termList != null) { // counts only the listed terms; not used in my work, so not covered here
							List<String> terms = StrUtils.splitSmart(termList, ",", true);
							result.add(workerKey, getListedTermCounts(workerFacetValue, workerBase, terms));
						} else {
							result.add(workerKey, getTermCounts(workerFacetValue, workerBase)); // the usual path; this method does the real work
						}
						return result;
					} catch (SolrException se) {
						throw se;
					} catch (Exception e) {
						throw new SolrException(ErrorCode.SERVER_ERROR,
								"Exception during facet.field: " + workerFacetValue, e);
					} finally {
						semaphore.release(); // give the permit back so the next task can start
					}
				}
			};
			RunnableFuture<NamedList> runnableFuture = new FutureTask<>(callable);
			semaphore.acquire();// may block and/or interrupt
			executor.execute(runnableFuture);// releases semaphore when done
			futures.add(runnableFuture);
		} // facetFs loop
		// Loop over futures to get the values. The order is the same as facetFs but shouldn't matter.
		for (Future<NamedList> future : futures) {
			res.addAll(future.get());
		}
		assert semaphore.availablePermits() >= maxThreads;
	} catch (InterruptedException e) {
		throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,"Error while processing facet fields: InterruptedException", e);
	} catch (ExecutionException ee) {
		Throwable e = ee.getCause();// unwrap
		if (e instanceof RuntimeException) {
			throw (RuntimeException) e;
		}
		throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,"Error while processing facet fields: " + e.toString(), e);
	}
	return res;
}

As the code shows, when several facet.field parameters are given, each one becomes a separate task that can be submitted to a thread pool, so the fields can be processed in parallel on different CPU cores. The facet.threads parameter controls this. With only one facet.field there is only one task, so the parameter has no effect.
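The submit-and-collect pattern used above can be tried outside Solr. The following is a minimal sketch, not Solr code: the class name BoundedFieldTasks, the method runAll, and the use of String::length as a stand-in for the per-field counting work are all assumptions made for illustration. It keeps the same three ideas: a direct executor when maxThreads is 0, an unbounded pool otherwise, and a semaphore that caps how many tasks run at once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RunnableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.ToIntFunction;

// Hypothetical sketch (not Solr code): one task per field, bounded by a semaphore.
class BoundedFieldTasks {

    static Map<String, Integer> runAll(List<String> fields, int maxThreads,
                                       ToIntFunction<String> work)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newCachedThreadPool(); // unbounded pool
        // maxThreads == 0: run each task in the calling thread, like directExecutor
        Executor executor = maxThreads == 0 ? Runnable::run : pool;
        // maxThreads <= 0: effectively no limit on concurrent tasks
        Semaphore permits = new Semaphore(maxThreads <= 0 ? Integer.MAX_VALUE : maxThreads);
        List<Future<Integer>> futures = new ArrayList<>(fields.size());
        try {
            for (String field : fields) {
                Callable<Integer> task = () -> {
                    try {
                        return work.applyAsInt(field);
                    } finally {
                        permits.release(); // always hand the permit back
                    }
                };
                RunnableFuture<Integer> future = new FutureTask<>(task);
                permits.acquire();         // blocks while maxThreads tasks are running
                executor.execute(future);
                futures.add(future);
            }
            // Collect results in submission order, one entry per field.
            Map<String, Integer> result = new LinkedHashMap<>();
            for (int i = 0; i < fields.size(); i++) {
                result.put(fields.get(i), futures.get(i).get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(List.of("cat", "manu", "price"), 2, String::length));
    }
}
```

With a single field in the list there is only one task, so raising maxThreads changes nothing, which is the same limitation described above for facet.threads.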

The real work is done in the getTermCounts method, shown below.

/**
 * Facets within the scope of base. <br/>
 * Term counts for use in field faceting that respects the appropriate mincount
 * @see FacetParams#FACET_MINCOUNT
 */
public NamedList<Integer> getTermCounts(String field, DocSet base) throws IOException {
	Integer mincount = params.getFieldInt(field, FacetParams.FACET_MINCOUNT); // minimum number of matching docs a term needs in order to be returned
	return getTermCounts(field, mincount, base);
}

 

/**
 * @param field the field to facet on
 * @param mincount minimum number of matching docs a term needs in order to be returned
 * @param base the doc set determined by q and fq
 */
private NamedList<Integer> getTermCounts(String field, Integer mincount, DocSet base) throws IOException {
	
	int offset = params.getFieldInt(field, FacetParams.FACET_OFFSET, 0); // offset into the list of facet values
	int limit = params.getFieldInt(field, FacetParams.FACET_LIMIT, 100); // how many facet values to return
	if (limit == 0)
		return new NamedList<>();
	
	if (mincount == null) {
		// facet.mincount was not given: use facet.zeros to decide whether terms with no matching docs are returned
		Boolean zeros = params.getFieldBool(field, FacetParams.FACET_ZEROS);
		mincount = (zeros != null && !zeros) ? 1 : 0;
	}
	// facet.missing: some docs have no value in this field. If true, also return the number of such docs (reported under a null key).
	boolean missing = params.getFieldBool(field, FacetParams.FACET_MISSING, false);
	// Default sort when facet.sort is not given: by count (number of matching docs) if limit > 0, otherwise by the index order of the term values
	String sort = params.getFieldParam(field, FacetParams.FACET_SORT, limit > 0 ? FacetParams.FACET_SORT_COUNT : FacetParams.FACET_SORT_INDEX);
	String prefix = params.getFieldParam(field, FacetParams.FACET_PREFIX); // only terms starting with this prefix are counted

	NamedList<Integer> counts;
	SchemaField sf = searcher.getSchema().getField(field);
	FieldType ft = sf.getType();

	// Determine the faceting method
	final String methodStr = params.getFieldParam(field, FacetParams.FACET_METHOD);
	FacetMethod method = null;
	if (FacetParams.FACET_METHOD_enum.equals(methodStr)) {
		method = FacetMethod.ENUM; // I have not run into this case, so it is not covered here
	} else if (FacetParams.FACET_METHOD_fcs.equals(methodStr)) {
		method = FacetMethod.FCS;
	} else if (FacetParams.FACET_METHOD_fc.equals(methodStr)) {
		method = FacetMethod.FC;
	}
	if (method == FacetMethod.ENUM && TrieField.getMainValuePrefix(ft) != null) {
		method = sf.multiValued() ? FacetMethod.FC : FacetMethod.FCS;
	}
	if (method == null && ft instanceof BoolField) {
		// Always use filters for booleans... we know the number of values is very small.
		method = FacetMethod.ENUM;
	}
	// true if the field is multi-valued or is tokenized into several terms per doc
	final boolean multiToken = sf.multiValued() || ft.multiValuedFieldCache();
	if (method == null && ft.getNumericType() != null && !sf.multiValued()) { // no method given and a single-valued numeric field: prefer FCS
		// the per-segment approach is optimal for numeric field types since there are no global ords to merge and no need to create an expensive top-level reader
		method = FacetMethod.FCS; // FCS only handles single-valued fields
	}
	if (ft.getNumericType() != null && sf.hasDocValues()) { // numeric field with docValues: switch to FCS
		// only fcs is able to leverage the numeric field caches
		method = FacetMethod.FCS;
	}
	if (method == null) { // none of the rules above applied: default to FC
		method = FacetMethod.FC;
	}
	if (method == FacetMethod.FCS && multiToken) { // FCS cannot handle multi-valued fields, so fall back to FC
		method = FacetMethod.FC;
	}
	if (method == FacetMethod.ENUM && sf.hasDocValues()) {
		method = FacetMethod.FC;
	}
	if (params.getFieldBool(field, GroupParams.GROUP_FACET, false)) { // group.facet; I do not use this feature, so it is not covered here
		counts = getGroupedCounts(searcher, base, field, multiToken, offset, limit, mincount, missing, sort, prefix);
	} else {
		assert method != null;
		switch (method) {
		case ENUM:
			assert TrieField.getMainValuePrefix(ft) == null;
			counts = getFacetTermEnumCounts(searcher, base, field, offset, limit, mincount, missing, sort, prefix);
			break;
		case FCS: // can only handle single-valued, untokenized fields
			assert !multiToken;
			if (ft.getNumericType() != null /* && !sf.multiValued() */) { // my own note: the commented-out check is redundant, because a numeric field that reaches FCS is never multi-valued
				if (prefix != null && !prefix.isEmpty()) {
					throw new SolrException(ErrorCode.BAD_REQUEST, FacetParams.FACET_PREFIX + " is not supported on numeric types");
				}
				// Avoids reading the term dictionary unless too few results were collected and mincount == 0
				counts = NumericFacets.getCounts(searcher, base, field, offset, limit, mincount, missing, sort);
			} else { // single-valued, non-numeric field
				PerSegmentSingleValuedFaceting ps = new PerSegmentSingleValuedFaceting(searcher, base, field, offset, limit, mincount, missing, sort, prefix);
				Executor executor = threads == 0 ? directExecutor : facetExecutor;
				ps.setNumThreads(threads);
				counts = ps.getFacetCounts(executor);
			}
			break;
		case FC:
			if (sf.hasDocValues()) { // the field has docValues
				counts = DocValuesFacets.getCounts(searcher, base, field, offset, limit, mincount, missing, sort, prefix);
			} else if (multiToken || TrieField.getMainValuePrefix(ft) != null) { // no docValues, and the field is multi-valued (or a Trie field with a main value prefix)
				UnInvertedField uif = UnInvertedField.getUnInvertedField(field, searcher);
				counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix);
			} else {
				counts = getFieldCacheCounts(searcher, base, field, offset, limit, mincount, missing, sort, prefix);
			}
			break;
		default:
			throw new AssertionError();
		}
	}
	return counts;
}
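
The defaults resolved at the top of this method (mincount from facet.zeros, and the sort order from limit) are easy to get wrong when reading quickly, so here is a minimal sketch that restates them in isolation. It is not Solr code: the class FacetDefaults and its methods are hypothetical, a plain Map<String, String> stands in for SolrParams, and per-field overrides (the f.<field>. prefix that getFieldInt and getFieldParam honor) are deliberately left out.

```java
import java.util.Map;

// Hypothetical sketch (not Solr code): default resolution for mincount and sort.
class FacetDefaults {

    // facet.mincount wins if present; otherwise facet.zeros=false means mincount 1, else 0.
    static int resolveMinCount(Map<String, String> params) {
        String mincount = params.get("facet.mincount");
        if (mincount != null) {
            return Integer.parseInt(mincount);
        }
        String zeros = params.get("facet.zeros");
        return (zeros != null && !Boolean.parseBoolean(zeros)) ? 1 : 0;
    }

    // facet.sort wins if present; otherwise "count" when limit > 0, else "index".
    static String resolveSort(Map<String, String> params) {
        int limit = Integer.parseInt(params.getOrDefault("facet.limit", "100"));
        return params.getOrDefault("facet.sort", limit > 0 ? "count" : "index");
    }

    public static void main(String[] args) {
        System.out.println(resolveMinCount(Map.of("facet.zeros", "false")));
        System.out.println(resolveSort(Map.of("facet.limit", "-1")));
    }
}
```

Note that with no parameters at all the result is mincount 0 and sort by count, because the limit defaults to 100.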

From getTermCounts we can see that when no method is specified, a single-valued numeric field uses FCS and other fields use FC (BoolField is the exception: it defaults to ENUM). If FCS is requested explicitly but the field is multi-valued, FC is used instead. So what matters is how FCS handles numeric fields, how FCS handles non-numeric fields, and how FC works. In the next few posts I will look more closely at the numeric and non-numeric cases of FCS and at FC with docValues. I will skip the paths without docValues, because every field I work with has docValues.
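
To check these selection rules against concrete field types, the chain of if statements in getTermCounts can be restated as a pure function. This is a sketch under stated assumptions, not Solr code: the class FacetMethodChooser, its nested enum, and the boolean parameters are hypothetical stand-ins for FacetMethod, TrieField.getMainValuePrefix(ft) != null, ft instanceof BoolField, ft.getNumericType() != null, sf.multiValued(), ft.multiValuedFieldCache() and sf.hasDocValues().

```java
// Hypothetical sketch (not Solr code): the method-selection rules as a pure function.
class FacetMethodChooser {

    enum Method { ENUM, FC, FCS }

    static Method choose(Method requested, boolean triePrefix, boolean bool, boolean numeric,
                         boolean multiValued, boolean multiValuedFieldCache,
                         boolean hasDocValues) {
        Method m = requested;                       // null when facet.method is not given
        if (m == Method.ENUM && triePrefix) {
            m = multiValued ? Method.FC : Method.FCS;
        }
        if (m == null && bool) {
            m = Method.ENUM;                        // booleans have very few values
        }
        boolean multiToken = multiValued || multiValuedFieldCache;
        if (m == null && numeric && !multiValued) {
            m = Method.FCS;                         // single-valued numeric: prefer FCS
        }
        if (numeric && hasDocValues) {
            m = Method.FCS;                         // numeric with docValues: switch to FCS
        }
        if (m == null) {
            m = Method.FC;                          // default
        }
        if (m == Method.FCS && multiToken) {
            m = Method.FC;                          // FCS cannot handle multi-valued fields
        }
        if (m == Method.ENUM && hasDocValues) {
            m = Method.FC;
        }
        return m;
    }

    public static void main(String[] args) {
        // single-valued numeric field, no method requested
        System.out.println(choose(null, false, false, true, false, false, false));
        // FCS requested on a multi-valued string field
        System.out.println(choose(Method.FCS, false, false, false, true, false, false));
    }
}
```

The order of the checks matters: a multi-valued numeric field with docValues is first switched to FCS and then falls back to FC a few lines later.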

  
