Quick start with Solr: build a Solr cluster, create a core, and set up data synchronization (11)

0 Preface

In the previous chapters we covered the basic use of the stand-alone version of Solr. In actual production, however, we generally use cluster mode to ensure high availability and high performance, so next we continue with building and operating a Solr cluster.

1. Cluster mode

1.1 Sharding

Before explaining Solr's cluster mode, we first need to understand the concept of "sharding".

Once we scale from one node to several, problems of data storage and synchronization follow. If the data simply lives on a single node, we get no high availability; if every node stores a full copy, we waste space. This is where sharding comes in.

Sharding means splitting the data into multiple parts; each part is a shard, and the shards are stored on different nodes, which scales out storage. And because different data lives on different nodes, query performance improves as well.

Shards come in two kinds: primary shards and replica shards. Each primary shard holds a distinct portion of the data, and each replica shard is a backup of a primary. Distributing both kinds across the nodes gives us both data storage and data backup.

To actually achieve high availability, a primary shard and its replicas must not sit on the same node; otherwise, when that node goes down, the replicas go down with it.
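As a toy illustration (this is not Solr's actual placement algorithm), the following shell sketch places 4 primary shards and one replica each across 4 nodes, shifting every replica one node to the right so that a primary and its replica never share a node:

```shell
#!/bin/sh
# Toy placement sketch (not Solr code): shift each replica one node
# to the right so a shard's primary and replica never colocate.
NODES=4
PLACEMENT=""
for shard in 0 1 2 3; do
  primary=$((shard % NODES))
  replica=$(((shard + 1) % NODES))   # next node over
  PLACEMENT="$PLACEMENT shard$shard:primary=node$primary,replica=node$replica"
done
echo "${PLACEMENT# }"
```

When a real deployment has more replicas per shard (as in this article), the same idea applies: each copy lands on a different node.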

1.2 Node governance

As the number of nodes grows, coordination between them becomes a problem. Solr has no built-in service governance, so a third-party component must be introduced; we generally use ZooKeeper as the registry to coordinate the cluster.

To keep the registry itself highly available, ZooKeeper also needs to be deployed as a cluster. Some readers may wonder: if zk runs as a cluster, who coordinates zk's own nodes? ZooKeeper itself does — its cluster mode comes with its own membership management.

1.3 Deployment Architecture

With these basic concepts in place, let's lay out the Solr cluster architecture we are going to deploy.

First, a cluster needs a minimum of 3 nodes. To simulate a production environment, I build 4 primary shards, each with 3 replica shards, spread across 4 nodes. Choose the number of nodes to suit your own server environment, but use no fewer than 3.

Second, ZooKeeper is also built as a cluster, with a minimum of 3 nodes. The resulting deployment architecture is shown in the figure below.

[Figure: deployment architecture]

2. Build

2.1 Build a zookeeper cluster

For building the ZooKeeper cluster, see my other article: Build a zookeeper cluster and set it to start automatically.

Note that because I am using Solr version 8.2.0, I chose the matching ZooKeeper version 3.4.14. A mismatched version can cause connection problems, resulting in an error such as: TimeoutException: Could not connect to ZooKeeper

2.2 Building a solr cluster

1. We built a single Solr node earlier; copy that installation to the other 3 servers

2. Modify the Solr configuration file solr.xml

vim server/solr/solr.xml 

Content: adjust the host to this server's Solr IP; if you changed the port, update it here as well
[Screenshot: solr.xml host/port settings]
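For reference, the relevant part of solr.xml typically looks like the fragment below. The IP and port here are this article's example values, and the exact layout may differ slightly in your Solr version:

```xml
<solr>
  <solrcloud>
    <!-- advertise this node's own IP to the rest of the cluster -->
    <str name="host">192.168.244.41</str>
    <!-- the port this Solr node listens on -->
    <int name="hostPort">8983</int>
    <str name="hostContext">${hostContext:solr}</str>
  </solrcloud>
</solr>
```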

3. Modify the startup script file solr.in.sh to configure the zk addresses

vim bin/solr.in.sh

Content:

ZK_HOST="192.168.244.42:2181,192.168.244.43:2181,192.168.244.44:2181"

# Set the ZooKeeper client timeout (for SolrCloud mode)
ZK_CLIENT_TIMEOUT="15000"

4. Apply the same two changes on the other 3 nodes

5. Restart the four solr nodes

# The service command below was configured separately; see the first article in this column
service solr restart

If solr-admin is accessible as normal, the cluster deployment succeeded!
[Screenshot: solr-admin page]

If you see the error SolrException: ruok is not executed because it is not in the whitelist. Check 4lw.commands.whitelist setting in zookeeper configuration file

This happens because ZooKeeper's four-letter commands (for example ruok, which checks whether zk is up) can only be used remotely, without logging into the zk client, if they are on zk's command whitelist.

Add the configuration item 4lw.commands.whitelist=stat,ruok,conf,isro to the ZooKeeper configuration file conf/zoo.cfg so those four-letter commands can be called remotely; setting it to * allows all commands.

[Screenshot: zoo.cfg whitelist setting]

After changing the configuration, restart zk and then Solr. In a zk cluster, remember to modify every zk node.

Once everything starts normally, you can check the status of the cluster nodes under the Cloud menu

[Screenshot: Cloud menu showing cluster nodes]

6. Because ZooKeeper manages the cluster, we need to upload Solr's configuration files to ZooKeeper, using it as the configuration center

First copy the configuration of the orders core we created on the stand-alone Solr to one of the cluster's Solr nodes:

scp -r orders [email protected]:/data/solr-8.2.0/server/solr

7. To upload to zk, Solr provides a script: server/scripts/cloud-scripts/zkcli.sh

It mainly uploads solr.xml and the per-core (index) configuration files managed-schema and solrconfig.xml.

Execute on any solr node:

# Set the path of the solr configuration files
sh /data/solr-8.2.0/server/scripts/cloud-scripts/zkcli.sh -zkhost 192.168.244.44:2181,192.168.244.43:2181,192.168.244.42:2181 --cmd upconfig -solrhome /data/solr-8.2.0/server/solr
# Upload the core configuration file directory
sh /data/solr-8.2.0/server/scripts/cloud-scripts/zkcli.sh -zkhost 192.168.244.44:2181,192.168.244.43:2181,192.168.244.42:2181 --cmd upconfig -confdir /data/solr-8.2.0/server/solr/orders -confname orders

If you need to upload the configuration files of another core (index) later, just run:

sh /data/solr-8.2.0/server/scripts/cloud-scripts/zkcli.sh -zkhost 192.168.244.44:2181,192.168.244.43:2181,192.168.244.42:2181 --cmd upconfig -confdir /data/solr-8.2.0/server/solr/collection_name -confname collection_name
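Since the upconfig call differs only in the config name, a small helper function can build it. This is a hypothetical wrapper using the article's example paths and zk addresses, shown as a dry run that only prints the command rather than executing it against a live cluster:

```shell
#!/bin/sh
# Hypothetical helper around the zkcli.sh call above; paths and zk
# addresses are the example values used in this article.
ZK_HOSTS="192.168.244.44:2181,192.168.244.43:2181,192.168.244.42:2181"
SOLR_HOME="/data/solr-8.2.0"

# Build (but do not run) the upconfig command for a given config name,
# assuming the config dir lives at $SOLR_HOME/server/solr/<name>.
build_upconfig() {
  echo "sh $SOLR_HOME/server/scripts/cloud-scripts/zkcli.sh -zkhost $ZK_HOSTS --cmd upconfig -confdir $SOLR_HOME/server/solr/$1 -confname $1"
}

CMD=$(build_upconfig orders)
echo "$CMD"
```

Replace the final `echo` with `eval` (or run the printed command by hand) once you have verified it.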

[Screenshot: upconfig command output]

Connect to zk and you can find the corresponding data there. I am connecting here with PrettyZoo, a GUI tool; if you don't know how to install it, see my earlier blog:
Install the zookeeper visualization tools PrettyZoo and ZooKeeperAssistant

[Screenshot: configuration files shown in PrettyZoo]
8. Log in to any solr-admin and add a core whose name matches the orders configuration uploaded earlier. Since we have 4 nodes, the number of primary shards is generally set equal to the number of nodes (and must not exceed it), and a primary shard and its replicas cannot share a node; so we create 4 primary shards in total, each with 3 replica shards.

Solr's maxShardsPerNode defaults to 1, meaning each node may host only 1 shard (primary or replica). That clearly does not fit the architecture above: each node must host 1 primary shard and 3 replica shards, i.e. 4 shards per node, so we need to raise maxShardsPerNode to 4.
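The arithmetic behind maxShardsPerNode=4, using this deployment's numbers:

```shell
#!/bin/sh
# Shard math for this article's deployment: 4 primary shards, each
# with 3 additional replicas, spread evenly over 4 nodes.
NUM_SHARDS=4            # primary shards
COPIES_PER_SHARD=4      # 1 primary + 3 replicas
NUM_NODES=4
TOTAL_CORES=$((NUM_SHARDS * COPIES_PER_SHARD))
MAX_SHARDS_PER_NODE=$((TOTAL_CORES / NUM_NODES))
echo "$TOTAL_CORES cores total -> maxShardsPerNode=$MAX_SHARDS_PER_NODE"
```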

[Screenshot: core creation dialog]

After saving, the created core is synchronized to the other nodes

The shard layout can be viewed under Collections

[Screenshot: shard layout under Collections]

9. Run a full synchronization. If you are not familiar with synchronization operations, see the earlier articles in this column

[Screenshot: full synchronization (data import)]

10. Query the data and confirm that the query succeeds

[Screenshot: query results]

Summary

With that, we have covered building the Solr cluster, creating the core, and synchronizing data. Note that the client connection code must change as well: it needs to be adjusted to cluster mode, i.e. connecting through zk (for example, SolrJ's CloudSolrClient takes the zk addresses instead of a single Solr URL).


Origin blog.csdn.net/qq_24950043/article/details/131374754