Code entry point
In ES 5.6.4, the code entry point for search is TransportSearchAction#doExecute:
@Override
protected void doExecute(Task task, SearchRequest searchRequest, ActionListener<SearchResponse> listener) {
final long absoluteStartMillis = System.currentTimeMillis();
final long relativeStartNanos = System.nanoTime();
final SearchTimeProvider timeProvider =
new SearchTimeProvider(absoluteStartMillis, relativeStartNanos, System::nanoTime);
// Get the current cluster state
final ClusterState clusterState = clusterService.state();
// Group the requested indices by the cluster they belong to
final Map<String, OriginalIndices> remoteClusterIndices = remoteClusterService.groupIndices(searchRequest.indicesOptions(),
searchRequest.indices(), idx -> indexNameExpressionResolver.hasIndexOrAlias(idx, clusterState));
OriginalIndices localIndices = remoteClusterIndices.remove(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY);
if (remoteClusterIndices.isEmpty()) {
executeSearch((SearchTask)task, timeProvider, searchRequest, localIndices, remoteClusterIndices, Collections.emptyList(),
(clusterName, nodeId) -> null, clusterState, Collections.emptyMap(), listener, clusterState.getNodes()
.getDataNodes().size());
} else {
remoteClusterService.collectSearchShards(searchRequest.indicesOptions(), searchRequest.preference(), searchRequest.routing(),
remoteClusterIndices, ActionListener.wrap((searchShardsResponses) -> {
List<SearchShardIterator> remoteShardIterators = new ArrayList<>();
Map<String, AliasFilter> remoteAliasFilters = new HashMap<>();
BiFunction<String, String, DiscoveryNode> clusterNodeLookup = processRemoteShards(searchShardsResponses,
remoteClusterIndices, remoteShardIterators, remoteAliasFilters);
int numNodesInvovled = searchShardsResponses.values().stream().mapToInt(r -> r.getNodes().length).sum()
+ clusterState.getNodes().getDataNodes().size();
executeSearch((SearchTask) task, timeProvider, searchRequest, localIndices, remoteClusterIndices, remoteShardIterators,
clusterNodeLookup, clusterState, remoteAliasFilters, listener, numNodesInvovled);
}, listener::onFailure));
}
}
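This method does two things:
1. It groups the indices in the request by cluster name.
A single ES search request can span multiple clusters (cross-cluster search), so before the request is executed, all indices it names must be grouped by cluster name. The request is then executed per cluster.
2. It executes the search request.
If the request targets indices in other clusters, the shard information for those remote indices is fetched first (collectSearchShards), and executeSearch is then called to run the query.
Index grouping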
public Map<String, List<String>> groupClusterIndices(String[] requestIndices, Predicate<String> indexExists) {
Map<String, List<String>> perClusterIndices = new HashMap<>();
Set<String> remoteClusterNames = getRemoteClusterNames();
for (String index : requestIndices) {
int i = index.indexOf(RemoteClusterService.REMOTE_CLUSTER_INDEX_SEPARATOR);
if (i >= 0) {
String remoteClusterName = index.substring(0, i);
List<String> clusters = clusterNameResolver.resolveClusterNames(remoteClusterNames, remoteClusterName);
if (clusters.isEmpty() == false) {
if (indexExists.test(index)) {
// we use : as a separator for remote clusters. might conflict if there is an index that is actually named
// remote_cluster_alias:index_name - for this case we fail the request. the user can easily change the cluster alias
// if that happens
throw new IllegalArgumentException("Can not filter indices; index " + index +
" exists but there is also a remote cluster named: " + remoteClusterName);
}
String indexName = index.substring(i + 1);
for (String clusterName : clusters) {
perClusterIndices.computeIfAbsent(clusterName, k -> new ArrayList<>()).add(indexName);
}
} else {
perClusterIndices.computeIfAbsent(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY, k -> new ArrayList<>()).add(index);
}
} else {
perClusterIndices.computeIfAbsent(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY, k -> new ArrayList<>()).add(index);
}
}
return perClusterIndices;
}
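The method iterates over the requested index names. A remote index name has a different format from a local one: cluster_name:index_name. Local indices are stored in the Map under the key LOCAL_CLUSTER_GROUP_KEY. Each remote cluster gets its own entry, whose key is the cluster name and whose value is the list of index names for that cluster. A name that contains the separator but whose prefix matches no configured remote cluster is treated as a local index.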
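The grouping rule can be sketched in a few lines of plain Java. This is a simplified illustration, not the ES implementation: the class name, the `LOCAL_KEY` constant and the `group` method are hypothetical, cluster names are matched exactly (no wildcard resolution of cluster aliases), and the conflict check for a local index that is literally named `alias:index` is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.Collections;

class IndexGroupingSketch {
    // Hypothetical stand-in for RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY.
    static final String LOCAL_KEY = "(local)";

    // Groups index expressions by cluster alias. An expression whose prefix is
    // not a known remote cluster is kept whole and treated as a local index.
    static Map<String, List<String>> group(String[] requested, Set<String> remoteClusters) {
        Map<String, List<String>> perCluster = new LinkedHashMap<>();
        for (String index : requested) {
            int i = index.indexOf(':');
            String prefix = i >= 0 ? index.substring(0, i) : null;
            if (prefix != null && remoteClusters.contains(prefix)) {
                // Remote index: strip the "cluster:" prefix before storing it.
                perCluster.computeIfAbsent(prefix, k -> new ArrayList<>()).add(index.substring(i + 1));
            } else {
                perCluster.computeIfAbsent(LOCAL_KEY, k -> new ArrayList<>()).add(index);
            }
        }
        return perCluster;
    }

    public static void main(String[] args) {
        // "cluster_a" is an assumed remote cluster alias; "unknown" is not configured.
        String[] requested = {"logs-2017", "cluster_a:metrics", "unknown:idx"};
        System.out.println(group(requested, Collections.singleton("cluster_a")));
    }
}
```

In this sketch `unknown:idx` ends up in the local group with its full name, which mirrors the else branch of the original method.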
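Once the indices have been grouped by cluster name, the search itself begins.
Search flow
1. Check the cluster state for a global READ-level block. If such a block exists, an exception is thrown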
clusterState.blocks().globalBlockedRaiseException(ClusterBlockLevel.READ);
2. The index names in the request may be expressions containing wildcards, for example hik_mac*, so they must be resolved to concrete indices
indices = indexNameExpressionResolver.concreteIndices(clusterState, searchRequest.indicesOptions(),
timeProvider.getAbsoluteStartMillis(), localIndices.indices());
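ES has two resolvers for index name expressions: DateMathExpressionResolver resolves index names written as date math expressions, and WildcardExpressionResolver resolves index names written with wildcards.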
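To make the wildcard case concrete, here is a minimal sketch of expanding an expression such as hik_mac* against a list of existing index names. It only illustrates the idea with `java.util.regex`. The class and method names are hypothetical, the index names are made up, and the real WildcardExpressionResolver also has to deal with index options, aliases and exclusions that are not modelled here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WildcardResolveSketch {
    // Expands a simple '*' wildcard expression into the matching concrete names.
    static List<String> resolve(String expression, List<String> existingIndices) {
        // Quote every literal piece and join the pieces with ".*" for each '*'.
        String regex = Arrays.stream(expression.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        Pattern pattern = Pattern.compile(regex);
        return existingIndices.stream()
            .filter(name -> pattern.matcher(name).matches())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical index names, used only for illustration.
        List<String> existing = Arrays.asList("hik_mac_2017", "hik_mac_2018", "hik_ip_2017");
        System.out.println(resolve("hik_mac*", existing));
    }
}
```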
3. Resolve the routing values for each index
Map<String, Set<String>> routingMap = indexNameExpressionResolver.resolveSearchRouting(clusterState, searchRequest.routing(),
searchRequest.indices());
In routingMap, the key is the index name and the value is the set of routing values for that index.
4. Use routingMap to find all target shards for this request
GroupShardsIterator<ShardIterator> localShardsIterator = clusterService.operationRouting().searchShards(clusterState,
concreteIndices, routingMap, searchRequest.preference());
The algorithm that maps a routing value to a shard number differs from the one used in 2.x.
ES 5.6.4
private static int calculateScaledShardId(IndexMetaData indexMetaData, String effectiveRouting, int partitionOffset) {
final int hash = Murmur3HashFunction.hash(effectiveRouting) + partitionOffset;
// we don't use IMD#getNumberOfShards since the index might have been shrunk such that we need to use the size
// of original index to hash documents
// The computation no longer takes the modulo of NumberOfShards, because of the shard shrink feature added in 5.x.
return Math.floorMod(hash, indexMetaData.getRoutingNumShards()) / indexMetaData.getRoutingFactor();
}
ES 2.3.5
if (routing == null) {
if (!useType) {
hash = hash(hashFunction, id);
} else {
hash = hash(hashFunction, type, id);
}
} else {
hash = hash(hashFunction, routing);
}
if (createdVersion.onOrAfter(Version.V_2_0_0_beta1)) {
return MathUtils.mod(hash, indexMetaData.getNumberOfShards());
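Formula in 5.x: (hash % routingNumShards) / routingFactor
Formula in 2.x: hash % numberOfShards
ES 5.x changed the routing algorithm because 5.x added the ability to shrink an index to fewer shards. Take an index with 4 shards that is shrunk to 2, and assume for illustration that doc0..doc3 hash to 0..3:
Before shrinking            After shrinking to 2 shards
doc0 --------> shard0       shard0 -----> shard'0
doc1 --------> shard1       shard1 -----> shard'0
doc2 --------> shard2       shard2 -----> shard'1
doc3 --------> shard3       shard3 -----> shard'1
Shrink algorithm: targetShardId = sourceShardId / factor, where factor = sourceNum / targetNum
Document-to-shard mapping after shrinking:
doc0 --------> shard'0
doc1 --------> shard'0
doc2 --------> shard'1
doc3 --------> shard'1
If routing after the shrink still used hash % numberOfShards (numberOfShards is now 2), the result would be:
doc0 --------> shard'0
doc1 --------> shard'1   wrong, the document is on shard'0
doc2 --------> shard'0   wrong, the document is on shard'1
doc3 --------> shard'1
With (hash % routingNumShards) / routingFactor, where routingNumShards = 4 and routingFactor = routingNumShards / numberOfShards = 4 / 2 = 2:
doc0 --------> shard'0
doc1 --------> shard'0
doc2 --------> shard'1
doc3 --------> shard'1   correct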
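The comparison above can be reproduced with a small standalone program. It is a sketch under the same simplifying assumption as the example, namely that doc0..doc3 hash to 0..3. The class and method names are hypothetical, and the real code hashes the routing value with Murmur3 and also adds a partition offset.

```java
class ShrinkRoutingSketch {
    // 2.x-style routing: plain modulo on the current number of shards.
    static int moduloShard(int hash, int numberOfShards) {
        return Math.floorMod(hash, numberOfShards);
    }

    // 5.x-style routing: modulo on the pre-shrink shard count, then scale down.
    static int scaledShard(int hash, int routingNumShards, int numberOfShards) {
        int routingFactor = routingNumShards / numberOfShards;
        return Math.floorMod(hash, routingNumShards) / routingFactor;
    }

    public static void main(String[] args) {
        int routingNumShards = 4; // shard count before shrinking
        int numberOfShards = 2;   // shard count after shrinking
        int factor = routingNumShards / numberOfShards;
        for (int hash = 0; hash < 4; hash++) { // assumes hash(doc_i) == i
            int storedOn = hash / factor; // shard the shrink operation moved the document to
            System.out.printf("doc%d: stored on shard'%d, modulo -> shard'%d, scaled -> shard'%d%n",
                hash, storedOn,
                moduloShard(hash, numberOfShards),
                scaledShard(hash, routingNumShards, numberOfShards));
        }
    }
}
```

Only the scaled variant agrees with the physical location for all four documents, which is why the routing shard count is kept separate from the current shard count.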
5. Check whether the number of target shards exceeds the limit, which is controlled by the action.search.shard_count.limit setting
failIfOverShardCountLimit(clusterService, shardIterators.size());
6. Decide whether the target shards need to be pre-filtered before the query runs
boolean preFilterSearchShards = shouldPreFilterSearchShards(searchRequest, shardIterators);
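The result is true only when all three conditions hold:
searchRequest.searchType() == QUERY_THEN_FETCH && // 1. the search type is QUERY_THEN_FETCH
SearchService.canRewriteToMatchNone(source) && // 2. query rewriting may be able to determine in advance that the result is empty
searchRequest.getPreFilterShardSize() < shardIterators.size(); // 3. the number of shards to query is greater than preFilterShardSize (default 128)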
7. If shard pre-filtering is needed, the request goes through the CAN_MATCH phase
public boolean canMatch(ShardSearchRequest request) throws IOException {
assert request.searchType() == SearchType.QUERY_THEN_FETCH : "unexpected search type: " + request.searchType();
try (DefaultSearchContext context = createSearchContext(request, defaultSearchTimeout, null)) {
SearchSourceBuilder source = context.request().source();
if (canRewriteToMatchNone(source)) {
QueryBuilder queryBuilder = source.query();
return queryBuilder instanceof MatchNoneQueryBuilder == false;
}
return true; // null query means match_all
}
}
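createSearchContext is first used to rewrite the query, and the shard is then filtered according to the rewritten result: if the query has been rewritten to a MatchNoneQueryBuilder, canMatch returns false and the shard can be skipped.
8. Execute the query request on each shard, starting at most maxConcurrentShardRequests requests up front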
if (shardsIts.size() > 0) {
int maxConcurrentShardRequests = Math.min(this.maxConcurrentShardRequests, shardsIts.size());
final boolean success = shardExecutionIndex.compareAndSet(0, maxConcurrentShardRequests);
assert success;
for (int index = 0; index < maxConcurrentShardRequests; index++) {
final SearchShardIterator shardRoutings = shardsIts.get(index);
assert shardRoutings.skip() == false;
performPhaseOnShard(index, shardRoutings, shardRoutings.nextOrNull());
}
}
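The snippet above only shows how the first batch of shard requests is started. The sketch below illustrates the general pattern of bounding concurrency with a shared AtomicInteger: an initial batch is claimed with compareAndSet, and each completed request claims the next unstarted shard. The second half is an assumption about how such a scheme is usually completed, not code taken from the post. All names are hypothetical, and the demo is single-threaded, so the `started` list is deliberately not thread-safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ShardFanOutSketch {
    private final AtomicInteger nextIndex = new AtomicInteger();
    private final List<String> shards;
    private final int maxConcurrent;
    final List<String> started = new ArrayList<>(); // order in which requests were started

    ShardFanOutSketch(List<String> shards, int maxConcurrent) {
        this.shards = shards;
        this.maxConcurrent = maxConcurrent;
    }

    // Starts the first batch, at most maxConcurrent requests.
    void start() {
        int initial = Math.min(maxConcurrent, shards.size());
        if (!nextIndex.compareAndSet(0, initial)) {
            throw new IllegalStateException("already started");
        }
        for (int i = 0; i < initial; i++) {
            started.add(shards.get(i)); // stands in for sending the shard request
        }
    }

    // Called when one shard request finishes: claims the next unstarted shard, if any.
    boolean onShardDone() {
        int index = nextIndex.getAndIncrement();
        if (index < shards.size()) {
            started.add(shards.get(index));
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ShardFanOutSketch fanOut = new ShardFanOutSketch(Arrays.asList("s0", "s1", "s2", "s3", "s4"), 2);
        fanOut.start();
        while (fanOut.onShardDone()) {
            // each completion triggers the next request until all shards are claimed
        }
        System.out.println(fanOut.started);
    }
}
```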
9. Load the result from the cache or execute the query
private void loadOrExecuteQueryPhase(final ShardSearchRequest request, final SearchContext context) throws Exception {
final boolean canCache = indicesService.canCache(request, context);
context.getQueryShardContext().freezeContext();
if (canCache) {
indicesService.loadIntoContext(request, context, queryPhase);
} else {
queryPhase.execute(context);
}
}
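The cache can be used only when the search type is QUERY_THEN_FETCH.
The cache is used when the cache parameter in the request is true. If the request does not set it, the index setting index.requests.cache.enable (default true) must be true.
If caching is allowed and a cached entry is found, the cached data is deserialized into the query result. Otherwise the query is executed.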
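The rules just listed amount to a small decision function. The sketch below encodes only those rules. The class, method and parameter names are hypothetical, and the actual canCache method may apply further checks that are not covered here.

```java
class RequestCacheSketch {
    // requestCacheFlag is null when the request does not set the cache parameter.
    static boolean mayUseCache(boolean isQueryThenFetch, Boolean requestCacheFlag, boolean indexSettingEnabled) {
        if (!isQueryThenFetch) {
            return false; // only QUERY_THEN_FETCH results are cacheable
        }
        if (requestCacheFlag != null) {
            return requestCacheFlag; // an explicit request parameter wins
        }
        return indexSettingEnabled; // fall back to index.requests.cache.enable
    }

    public static void main(String[] args) {
        System.out.println(mayUseCache(true, null, true));   // flag unset, setting enabled
        System.out.println(mayUseCache(true, false, true));  // request opts out
        System.out.println(mayUseCache(false, true, true));  // wrong search type
    }
}
```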
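10. Check whether the request contains only suggestions. If so, only the suggest phase runs. Otherwise the search query runs first. Example of a suggestion: for the input "unexpacted", the result returns unexpected. By returning terms within a small edit distance of the input, this feature infers what the user actually meant to search for.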
if (searchContext.hasOnlySuggest()) {
suggestPhase.execute(searchContext);
// TODO: fix this once we can fetch docs for suggestions
searchContext.queryResult().topDocs(
new TopDocs(0, Lucene.EMPTY_SCORE_DOCS, 0),
new DocValueFormat[0]);
return;
}
The suggest flow:
- Use the analyzer to turn the text into a stream of source tokens.
- Using the suggest field, fetch all target tokens of that field from the Lucene index.
- For each source token, select the target tokens that are similar to it.
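The last step of this flow, picking the target tokens that are similar to a source token, can be illustrated with plain Levenshtein edit distance. This is a stand-in for the similarity measure, not the suggester's actual implementation. The class name, the `maxEdits` threshold and the candidate terms are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SuggestSketch {
    // Classic dynamic-programming Levenshtein distance using two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps the target tokens whose distance to the source token is within maxEdits.
    static List<String> suggest(String sourceToken, List<String> targetTokens, int maxEdits) {
        List<String> result = new ArrayList<>();
        for (String candidate : targetTokens) {
            if (editDistance(sourceToken, candidate) <= maxEdits) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical terms of the suggest field.
        List<String> terms = Arrays.asList("unexpected", "expected", "unrelated");
        System.out.println(suggest("unexpacted", terms, 2));
    }
}
```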
11. If the request is not suggest-only, run the search query first, then rescore the query results, then run suggestions, then run aggregations
final ContextIndexSearcher searcher = searchContext.searcher();
boolean rescore = execute(searchContext, searcher, searcher::setCheckCancelled);
if (rescore) { // only if we do a regular search
rescorePhase.execute(searchContext);
}
suggestPhase.execute(searchContext);
aggregationPhase.execute(searchContext);
12. If profiling was requested, collect the profile results for the query
if (searchContext.getProfilers() != null) {
ProfileShardResult shardResults = SearchProfileShardResults
.buildShardResults(searchContext.getProfilers());
searchContext.queryResult().profileResults(shardResults);
}