SolrCloud 5.0 Routing, Collection Creation and Data Migration

    SolrCloud is designed to provide highly available, fault-tolerant indexing and query handling in a distributed environment.

        In Solr 5.0 the bundled startup script has been improved, so starting SolrCloud is now very simple. Run:

 

 
    $ bin/solr -e cloud

        Answer the interactive prompts (such as the number of nodes, their ports, and the number of shards and replicas) and SolrCloud starts; deployment is then complete, as shown in the following figure.



 

 

 

SolrCloud Concepts

 

        SolrCloud has four key terms: core, collection, shard, and node.

        core: In a standalone Solr environment, a core is essentially a single index; multiple indexes require multiple cores. In SolrCloud, a single index can span multiple Solr instances, so one index is made up of multiple cores on different machines.

        collection: A logical index composed of cores is called a collection. Because a collection spans multiple cores, the index is both scalable and redundant.

        shard: SolrCloud can hold multiple collections, and each collection is split into one or more shards, each a logical slice of the collection's documents. Each shard can have multiple replicas (Replica), i.e., copies of the same slice stored as cores on different nodes. Among the replicas of a shard, one acts as the leader, chosen by a leader election coordinated through ZooKeeper.

        node: In SolrCloud, a node is a Java virtual machine instance running Solr, i.e., a server such as Jetty or Tomcat.

        It is important to understand the difference between a core and a collection. In traditional single-node Solr, the two are equivalent: both represent one logical index. In SolrCloud, cores on multiple nodes together form a collection.
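        To make these relationships concrete, here is a hypothetical Java data model (these are not SolrJ classes): a collection holds shards, each shard holds replicas, and each replica is one core on a node, with exactly one replica per shard acting as leader. The node and core names are made up for illustration.

```java
import java.util.List;

public class SolrCloudModel {
    // Hypothetical model for illustration only; not SolrJ's API.
    // A replica is one physical core hosted on a node (a JVM running Solr).
    record Replica(String coreName, String node, boolean leader) {}

    // A shard is a logical slice of a collection, backed by one or more replicas.
    record Shard(String name, List<Replica> replicas) {
        Replica leader() {
            return replicas.stream()
                    .filter(Replica::leader)
                    .findFirst()
                    .orElseThrow(() -> new IllegalStateException("no leader for " + name));
        }
    }

    // A collection is the logical index that spans all of its shards.
    record SolrCollection(String name, List<Shard> shards) {
        int coreCount() {
            return shards.stream().mapToInt(s -> s.replicas().size()).sum();
        }
    }

    public static void main(String[] args) {
        SolrCollection c = new SolrCollection("collection1", List.of(
                new Shard("shard1", List.of(
                        new Replica("collection1_shard1_replica1", "node1:8983", true),
                        new Replica("collection1_shard1_replica2", "node2:8983", false))),
                new Shard("shard2", List.of(
                        new Replica("collection1_shard2_replica1", "node2:8983", true),
                        new Replica("collection1_shard2_replica2", "node3:8983", false)))));
        System.out.println(c.coreCount());                     // 4 cores back one collection
        System.out.println(c.shards().get(0).leader().node()); // node1:8983
    }
}
```

        Two shards with two replicas each means four cores spread over three nodes, yet clients still see a single collection.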

 

SolrCloud Routing

        SolrCloud provides two routing algorithms:

  • compositeId
  • implicit

        When creating a collection, the routing strategy is set with the router.name parameter; the default is compositeId.
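        As a minimal sketch, the Java helper below assembles Collections API CREATE URLs for each router. The base URL is taken from the examples in this article; the class and method names are hypothetical. Values are appended without URL encoding, which is only safe for simple names like these.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CreateCollectionUrls {
    // Base URL from this article's examples; adjust host, port and context path.
    static final String BASE = "http://10.21.17.200:9580/solr-5.0.0-web/admin/collections";

    // Join parameters in insertion order (no URL encoding in this sketch).
    static String url(Map<String, String> params) {
        StringJoiner q = new StringJoiner("&", BASE + "?", "");
        params.forEach((k, v) -> q.add(k + "=" + v));
        return q.toString();
    }

    // compositeId: the shard count is fixed up front through numShards.
    static String createCompositeId(String name, int numShards, int replicationFactor) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "CREATE");
        p.put("name", name);
        p.put("router.name", "compositeId");
        p.put("numShards", String.valueOf(numShards));
        p.put("replicationFactor", String.valueOf(replicationFactor));
        return url(p);
    }

    // implicit: shards are named explicitly in a comma-separated list.
    static String createImplicit(String name, String... shards) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "CREATE");
        p.put("name", name);
        p.put("router.name", "implicit");
        p.put("shards", String.join(",", shards));
        return url(p);
    }

    public static void main(String[] args) {
        System.out.println(createCompositeId("collection1", 2, 1));
        System.out.println(createImplicit("testimplicit", "shard1", "shard2", "shard3"));
    }
}
```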

 

 

compositeId

        This is a hash-based router: the hash space, 80000000 to 7fffffff (the full signed 32-bit range, in hex), is divided among the shards. numShards must be specified when the collection is first created, and the compositeId router computes each shard's hash range from it. Because the whole hash space is already assigned, shards cannot simply be added under this routing policy.
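        The even split of the hash space can be sketched in Java. This is an illustration of the idea described above, not Solr's code: divide 0x80000000..0x7fffffff into numShards contiguous ranges, with the last range ending exactly at 0x7fffffff. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CompositeIdRanges {
    // Split the signed 32-bit hash space into numShards contiguous ranges.
    // Each range is {min, max}, inclusive; the last range absorbs any remainder.
    static List<int[]> partition(int numShards) {
        long min = Integer.MIN_VALUE; // 0x80000000
        long max = Integer.MAX_VALUE; // 0x7fffffff
        long step = (max - min) / numShards;
        List<int[]> ranges = new ArrayList<>();
        long start = min;
        for (int i = 0; i < numShards; i++) {
            long end = (i == numShards - 1) ? max : start + step;
            ranges.add(new int[] {(int) start, (int) end});
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // %08x prints negative ints as their unsigned two's-complement hex form.
        for (int[] r : partition(2)) {
            System.out.printf("%08x-%08x%n", r[0], r[1]);
        }
    }
}
```

        For numShards=2 this prints 80000000-ffffffff and 00000000-7fffffff. Since every hash value is already owned by some shard, a shard added later has no range of its own.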

 

implicit

        With this router, the client specifies which shard each document is routed to, whereas compositeId spreads documents evenly across shards automatically. In addition, new shards can be created only for collections that use the implicit routing policy.

        When indexing with SolrJ, the code must specify which shard each document lands on, by adding:

 

 
    doc.addField("_route_", "shard_X");

        and also add the field to schema.xml:

 

 

 
    <field name="_route_" type="string"/>

        To create a collection that uses implicit routing through a URL:

 

        http://10.21.17.200:9580/solr-5.0.0-web/admin/collections?action=CREATE&name=testimplicit&router.name=implicit&shards=shard1,shard2,shard3
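        The behavior of implicit routing can be simulated in memory. In this hypothetical Java sketch each shard is just a list of documents, and a document goes to whichever shard its _route_ field names. Rejecting documents without a _route_ value is a simplification chosen for the sketch, not a claim about Solr's behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImplicitRoutingSketch {
    // Hypothetical in-memory stand-in for an implicit-router collection:
    // shard name -> documents (each document is a field -> value map).
    final Map<String, List<Map<String, String>>> shards = new LinkedHashMap<>();

    ImplicitRoutingSketch(String... shardNames) {
        for (String s : shardNames) {
            shards.put(s, new ArrayList<>());
        }
    }

    // Route on the _route_ field, as with doc.addField("_route_", "shard_X").
    void add(Map<String, String> doc) {
        String target = doc.get("_route_");
        if (target == null) {
            throw new IllegalArgumentException("no _route_ value");
        }
        List<Map<String, String>> shard = shards.get(target);
        if (shard == null) {
            throw new IllegalArgumentException("unknown shard: " + target);
        }
        shard.add(doc);
    }

    // Adding capacity only means registering another named shard.
    void createShard(String name) {
        shards.putIfAbsent(name, new ArrayList<>());
    }

    // A collection-wide query sees documents from every shard.
    int totalDocs() {
        return shards.values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        ImplicitRoutingSketch c = new ImplicitRoutingSketch("shard1", "shard2", "shard3");
        c.add(Map.of("id", "1", "_route_", "shard1"));
        c.createShard("shard4");
        c.add(Map.of("id", "2", "_route_", "shard4"));
        System.out.println(c.shards.get("shard4").size() + " doc(s) in shard4, " + c.totalDocs() + " total");
    }
}
```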

 

SolrRouter Source Code

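        In the Solr source code, the base class for routing is the abstract class DocRouter. HashBasedRouter and ImplicitDocRouter extend DocRouter, and CompositeIdRouter in turn extends the abstract class HashBasedRouter; document routing relies on the utility class Hash.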
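        The shape of that hierarchy can be mirrored in a few classes. This is a hypothetical, simplified Java sketch: the class names deliberately end in "Sketch" because they are not Solr's classes, and the method signatures are invented. Solr hashes ids with MurmurHash3 through its Hash utility class; String.hashCode stands in here only to keep the sketch dependency-free. The null-range check shows why a force-added shard with "range":null never receives documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RouterHierarchySketch {
    // Hypothetical base class: choose the target shard for a document.
    abstract static class DocRouterSketch {
        abstract String targetShard(String id, Map<String, String> doc);
    }

    // Hash-based routing: hash the id, then find the shard whose range contains it.
    abstract static class HashBasedSketch extends DocRouterSketch {
        final Map<String, int[]> ranges; // shard name -> {min, max}, or null for no range

        HashBasedSketch(Map<String, int[]> ranges) {
            this.ranges = ranges;
        }

        abstract int hash(String id);

        @Override
        String targetShard(String id, Map<String, String> doc) {
            int h = hash(id);
            for (Map.Entry<String, int[]> e : ranges.entrySet()) {
                int[] r = e.getValue();
                // A shard whose range is null can never be selected.
                if (r != null && h >= r[0] && h <= r[1]) {
                    return e.getKey();
                }
            }
            throw new IllegalStateException("no shard covers hash " + h);
        }
    }

    // compositeId stand-in; String.hashCode replaces Solr's MurmurHash3.
    static class CompositeIdSketch extends HashBasedSketch {
        CompositeIdSketch(Map<String, int[]> ranges) {
            super(ranges);
        }

        @Override
        int hash(String id) {
            return id.hashCode();
        }
    }

    // implicit stand-in: the document names its own shard in _route_.
    static class ImplicitSketch extends DocRouterSketch {
        @Override
        String targetShard(String id, Map<String, String> doc) {
            return doc.get("_route_");
        }
    }

    public static void main(String[] args) {
        Map<String, int[]> ranges = new LinkedHashMap<>();
        ranges.put("shard1", new int[] {Integer.MIN_VALUE, -1});
        ranges.put("shard2", new int[] {0, Integer.MAX_VALUE});
        ranges.put("shard3", null); // force-added shard with "range":null
        DocRouterSketch hashed = new CompositeIdSketch(ranges);
        DocRouterSketch implicit = new ImplicitSketch();
        System.out.println(hashed.targetShard("doc-1", Map.of()));
        System.out.println(implicit.targetShard("doc-1", Map.of("_route_", "shard3")));
    }
}
```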



 

 

Creating a Collection

        There are two ways to create a collection in Solr:

 

  • Through the Add Core page of the admin UI


 

 

 



 

 

        Because -DnumShards=7 was set in Tomcat's setenv.sh, this collection has 7 shards.

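        Note: when a collection is created with compositeId routing and numShards is specified, shards cannot be added afterwards. Even if you force a new shard in, newly indexed documents will not land on it; clusterstate.json shows the new shard with "range":null.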

 

  • Through a Collections API URL

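        Creating a collection through a URL requires that the number of cores to be created, numShards × replicationFactor, not exceed maxShardsPerNode × (number of live nodes). maxShardsPerNode defaults to 1.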

 

        In a test environment with 3 Solr machines, the collection was created with this URL:

        http://10.21.17.200:9580/solr-4.10.0/admin/collections?action=CREATE&name=collection1&router.name=compositeId&numShards=5&replicationFactor=1

        The request failed with this error:

 

        <str name="Operation createcollection caused exception:">

              org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:Cannot create collection collection1. Value of maxShardsPerNode is 1, and the number of live nodes is 3. This allows a maximum of 3 to be created. Value of numShards is 5 and value of replicationFactor is 1. This requires 5 shards to be created (higher than the allowed number)

        </str>

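        The request fails because numShards × replicationFactor = 5 × 1 = 5 exceeds maxShardsPerNode × live nodes = 1 × 3 = 3.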
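        The capacity rule stated in the error message reduces to one comparison. This hypothetical Java helper mirrors that arithmetic; the second call assumes the request is retried with a larger maxShardsPerNode (for example, adding &maxShardsPerNode=2 to the CREATE URL), which is one way to make 5 cores fit on 3 nodes.

```java
public class CreateCapacityCheck {
    // Every shard replica is a core, and each live node may host at most
    // maxShardsPerNode of them for this collection.
    static boolean canCreate(int numShards, int replicationFactor, int maxShardsPerNode, int liveNodes) {
        int required = numShards * replicationFactor;
        int allowed = maxShardsPerNode * liveNodes;
        return required <= allowed;
    }

    public static void main(String[] args) {
        // The failing request: 5 shards x 1 replica, maxShardsPerNode=1, 3 live nodes.
        System.out.println(canCreate(5, 1, 1, 3)); // false: requires 5, allows 3
        // With maxShardsPerNode=2, up to 6 cores are allowed, enough for 5.
        System.out.println(canCreate(5, 1, 2, 3)); // true
    }
}
```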

 

Data Migration

        In some scenarios, a SolrCloud cluster needs to be expanded or its data migrated.

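        Of the two routing algorithms discussed above, implicit makes this straightforward: simply create a new shard and index new documents into it. Queries that specify the collection name still return results from the whole cluster.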

        With compositeId routing this is somewhat more involved and is done with the SPLITSHARD operation. In the figure below, shard1 is split using this URL:

        http://10.21.17.200:9580/solr-4.10.0-web/admin/collections?action=SPLITSHARD&collection=log4j201503&shard=shard1
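        What the split does to the hash space can be sketched as cutting the parent's range into two contiguous halves, to our understanding the default two-way split SPLITSHARD performs when it creates shard1_0 and shard1_1. The example assumes shard1 belongs to a two-shard compositeId collection and therefore covers 80000000-ffffffff; the class name is hypothetical.

```java
public class SplitRangeSketch {
    // Split an inclusive hash range [min, max] into two contiguous halves.
    // Long arithmetic avoids int overflow for ranges spanning the full space.
    static int[][] split(int min, int max) {
        long mid = (long) min + ((long) max - (long) min) / 2;
        return new int[][] {{min, (int) mid}, {(int) (mid + 1), max}};
    }

    public static void main(String[] args) {
        // Assumed parent range: shard1 of a two-shard collection, 80000000-ffffffff.
        int[][] halves = split(Integer.MIN_VALUE, -1);
        System.out.printf("shard1_0: %08x-%08x%n", halves[0][0], halves[0][1]);
        System.out.printf("shard1_1: %08x-%08x%n", halves[1][0], halves[1][1]);
    }
}
```

        This prints shard1_0: 80000000-bfffffff and shard1_1: c0000000-ffffffff, so each sub-shard receives the documents whose hashes fall in its half.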



 



 

        After the split, shard1's data is divided between shard1_0 and shard1_1, each taking half of shard1's hash range. Deleting shard1 with the DELETESHARD API then ensures the data is not stored twice.
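        Before deleting the parent, it is worth confirming that the two sub-shard ranges cover the parent's range exactly, with no gap (unroutable hashes) and no overlap. As far as we know, Solr marks the parent shard inactive once the split completes, and DELETESHARD only removes inactive shards (or shards without a range). The hypothetical Java sketch below performs the coverage check and then builds the DELETESHARD URL, assuming shard1 of log4j201503 covers 80000000-ffffffff.

```java
public class DeleteParentShardSketch {
    // True if left and right are contiguous, non-empty, and together span exactly
    // the parent's inclusive range.
    static boolean coversExactly(int[] parent, int[] left, int[] right) {
        return left[0] == parent[0]
                && left[0] <= left[1]
                && (long) left[1] + 1 == right[0]
                && right[0] <= right[1]
                && right[1] == parent[1];
    }

    // Collections API DELETESHARD request, same base URL as the SPLITSHARD example.
    static String deleteShardUrl(String base, String collection, String shard) {
        return base + "/admin/collections?action=DELETESHARD&collection=" + collection + "&shard=" + shard;
    }

    public static void main(String[] args) {
        int[] parent = {Integer.MIN_VALUE, -1};       // assumed shard1: 80000000-ffffffff
        int[] left = {Integer.MIN_VALUE, 0xbfffffff}; // shard1_0
        int[] right = {0xc0000000, -1};               // shard1_1
        if (coversExactly(parent, left, right)) {
            System.out.println(deleteShardUrl("http://10.21.17.200:9580/solr-4.10.0-web", "log4j201503", "shard1"));
        }
    }
}
```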


Origin http://10.200.1.11:23101/article/api/json?id=326759932&siteId=291194637