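In SolrCloud, most query requests are fanned out to multiple shards. Sometimes, though, I already know which shard I want to query, so how do I send the request to only that one shard? This post looks at how SolrCloud decides whether a request is distributed. The code is in HttpShardHandler; let's look at the checkDistributed method: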
/**
 * Decide whether this request is a distributed one, which by default depends on whether ZooKeeper is in use.
 * If it is, find the shards the Router says the request should go to,
 * and put the URLs of each shard's replicas, separated by |, into rb.shards and rb.slices.
 */
@Override
public void checkDistributed(ResponseBuilder rb) {
  SolrQueryRequest req = rb.req;
  SolrParams params = req.getParams();

  // Check the distrib parameter first; if it is set, use it, otherwise default to whether ZooKeeper is enabled.
  rb.isDistrib = params.getBool("distrib", req.getCore().getCoreDescriptor().getCoreContainer().isZooKeeperAware());
  String shards = params.get(ShardParams.SHARDS); // the shards parameter given in the request

  // for back compat, a shards param with URLs like localhost:8983/solr will mean that this
  // search is distributed.
  boolean hasShardURL = shards != null && shards.indexOf('/') > 0;
  // Three things decide whether the request is distributed (i.e. forwarded to multiple shards):
  // the distrib parameter, whether ZooKeeper is in use, and whether shards contains a URL.
  rb.isDistrib = hasShardURL | rb.isDistrib;

  if (rb.isDistrib) { // the request is distributed
    // since the cost of grabbing cloud state is still up in the air, we grab it only if we need it.
    ClusterState clusterState = null;
    Map<String,Slice> slices = null;
    CoreDescriptor coreDescriptor = req.getCore().getCoreDescriptor();
    CloudDescriptor cloudDescriptor = coreDescriptor.getCloudDescriptor();
    ZkController zkController = coreDescriptor.getCoreContainer().getZkController();

    if (shards != null) { // shards was given in the request parameters, so use it
      // Several shards can be listed, separated by commas.
      List<String> lst = StrUtils.splitSmart(shards, ",", true);
      rb.shards = lst.toArray(new String[lst.size()]);
      rb.slices = new String[rb.shards.length];

      if (zkController != null) {
        // figure out which shards are slices
        for (int i = 0; i < rb.shards.length; i++) {
          if (rb.shards[i].indexOf('/') < 0) {
            // this is a logical shard
            rb.slices[i] = rb.shards[i];
            rb.shards[i] = null;
          }
        }
      }
    } else if (zkController != null) { // shards was not given and ZooKeeper is in use
      // we weren't provided with an explicit list of slices to query via "shards", so use the cluster state
      clusterState = zkController.getClusterState();
      // shardKeys is the _route_ request parameter. It names the shard to route to and works with any
      // Router (the Implicit router, for example, can also use a field's name to pick the shard).
      String shardKeys = params.get(ShardParams._ROUTE_);

      // This will be the complete list of slices we need to query for this request.
      slices = new HashMap<>();

      // we need to find out what collections this request is for.

      // A comma-separated list of specified collections.
      // Eg: "collection1,collection2,collection3"
      String collections = params.get("collection"); // may name several collections, separated by commas
      if (collections != null) {
        // If there were one or more collections specified in the query, split
        // each parameter and store as a separate member of a List.
        List<String> collectionList = StrUtils.splitSmart(collections, ",", true);
        // In turn, retrieve the slices that cover each collection from the
        // cloud state and add them to the Map 'slices'.
        for (String collectionName : collectionList) { // assume there is only one collection
          // The original code produced <collection-name>_<shard-name> when the collections
          // parameter was specified (see ClientUtils.appendMap)
          // Is this necessary if ony one collection is specified?
          // i.e. should we change multiCollection to collectionList.size() > 1?
          // Find every shard to query from this collection's routing strategy and the request parameters.
          // The implementation involves the DocRouter; see http://suichangkele.iteye.com/blog/2363305
          addSlices(slices, clusterState, params, collectionName, shardKeys, true);
        }
      } else {
        // just this collection
        String collectionName = cloudDescriptor.getCollectionName();
        addSlices(slices, clusterState, params, collectionName, shardKeys, false);
      }

      // Store the logical slices in the ResponseBuilder and create a new
      // String array to hold the physical shards (which will be mapped
      // later).
      rb.slices = slices.keySet().toArray(new String[slices.size()]);
      rb.shards = new String[rb.slices.length];
    }
    // ... (rest of the method not shown)
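To make the first few lines of checkDistributed easier to follow, here is a small stand-alone sketch of the same decision. It is not Solr code: the class DistribCheckSketch and its method names are made up for illustration, plain String.split stands in for StrUtils.splitSmart, and the ZooKeeper check is reduced to a boolean argument.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the decision in checkDistributed. Hypothetical class, not part of Solr. */
class DistribCheckSketch {

    /**
     * @param distribParam value of the "distrib" parameter, or null if it was not given
     * @param zkAware      whether the node runs with ZooKeeper (the default for distrib)
     * @param shards       raw value of the "shards" parameter, or null if it was not given
     */
    static boolean isDistributed(Boolean distribParam, boolean zkAware, String shards) {
        // distrib wins if present, otherwise fall back to ZooKeeper awareness
        boolean distrib = (distribParam != null) ? distribParam : zkAware;
        // a shards value containing a URL forces a distributed request (back compat)
        boolean hasShardUrl = shards != null && shards.indexOf('/') > 0;
        return hasShardUrl || distrib;
    }

    /** Entries without '/' are treated as logical shard names; entries with '/' are replica URLs. */
    static List<String> logicalSlices(String shards) {
        List<String> result = new ArrayList<>();
        for (String entry : shards.split(",")) { // simplification of StrUtils.splitSmart
            String trimmed = entry.trim();
            if (!trimmed.isEmpty() && trimmed.indexOf('/') < 0) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isDistributed(null, true, null));
        System.out.println(isDistributed(false, true, "localhost:8983/solr/collection1"));
        System.out.println(isDistributed(false, true, "shard1"));
        System.out.println(logicalSlices("shard1,localhost:8983/solr/collection1"));
    }
}
```

One thing the sketch makes visible: with distrib=false, a shards value that holds only logical names (no '/') does not turn the request into a distributed one, while a value containing a URL does.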
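After reading this code, the rules SolrCloud uses to route distributed requests become clear: if the shards parameter is specified, exactly those shards are queried; if it is not, the collection's DocRouter decides which shards to route to, based on the _route_ parameter. How the DocRouter works is covered in http://suichangkele.iteye.com/blog/2363305.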
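As a rough illustration of the two options, the sketch below builds the query string for a request aimed at a single shard, once through shards and once through _route_. Only the parameter names come from the code above; the class SingleShardQuerySketch, its methods, and the values shard1 and *:* are assumptions for the example, and it uses only the JDK (Java 10 or later for URLEncoder.encode with a Charset), not SolrJ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds query strings that target one shard. Hypothetical helper, not part of Solr or SolrJ. */
class SingleShardQuerySketch {

    /** Joins parameters into a URL-encoded query string, keeping insertion order. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    /** Option 1: name the shard explicitly via the shards parameter. */
    static String byShardName(String query, String shardName) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("shards", shardName);
        return toQueryString(params);
    }

    /** Option 2: leave shards out and let the DocRouter pick the shard from _route_. */
    static String byRouteKey(String query, String routeKey) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("_route_", routeKey);
        return toQueryString(params);
    }

    public static void main(String[] args) {
        System.out.println(byShardName("*:*", "shard1"));
        System.out.println(byRouteKey("*:*", "shard1"));
    }
}
```

The resulting string would be appended to the collection's select URL. Which value makes sense for _route_ depends on the collection's router, so treat shard1 here as a placeholder.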