MongoDB cluster building: shard + replica set

This article is reproduced from Pure Smile: https://yq.aliyun.com/articles/319079?spm=a2c4e.11155435.0.0.7ba57edap00Mg1

Abstract: MongoDB is the most commonly used NoSQL database and has risen into the top six of the database rankings. This article introduces how to build a highly available MongoDB (shard + replica) cluster. Before building the cluster, you need to understand several concepts: routing, sharding, replica sets, config servers, and so on.


Related concepts

First look at a picture:

(Figure: sharded cluster production architecture)

As you can see from the figure, there are four components: mongos, config server, shard, and replica set.

mongos is the entry point for all cluster requests: the application talks only to mongos, so no router has to be added on the application side. mongos itself is a request dispatcher, responsible for forwarding each request to the appropriate shard server. A production environment usually runs several mongos instances as entry points, so that one of them going down does not take all MongoDB requests with it.

config server, as the name implies, is the configuration server: it stores all of the cluster's metadata (routing and sharding configuration). mongos does not persist the shard and routing information itself; it only caches it in memory, while the config server holds the authoritative copy. When mongos starts for the first time, or restarts after being shut down, it loads this configuration from the config server, and whenever the configuration changes, all mongos instances are notified to update their cached state so routing stays accurate. Production environments usually run multiple config servers, because losing the shard-routing metadata would be disastrous.
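The relationship between mongos's in-memory cache and the config server's authoritative copy can be sketched as a tiny simulation (illustrative Python only; the class and field names are invented and are not MongoDB APIs):

```python
# Illustrative sketch: mongos caches routing metadata in memory and
# reloads it from the config server when the metadata version changes.
class ConfigServer:
    def __init__(self):
        self.version = 1
        self.routing = {"shard1": "192.168.0.75:27001"}

    def update(self, shard, host):
        self.routing[shard] = host
        self.version += 1  # every metadata change bumps the version

class Mongos:
    def __init__(self, config_server):
        self.config = config_server
        self.cache = None
        self.cache_version = 0

    def route(self, shard):
        # reload from the config server when the cached copy is stale
        if self.cache_version != self.config.version:
            self.cache = dict(self.config.routing)
            self.cache_version = self.config.version
        return self.cache[shard]

cfg = ConfigServer()
router = Mongos(cfg)
print(router.route("shard1"))   # loads metadata on first use
cfg.update("shard2", "192.168.0.84:27002")
print(router.route("shard2"))   # stale cache detected, reloaded
```

The point of the sketch is only the direction of the dependency: mongos never owns the metadata, it merely caches it.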

shard: sharding is the process of splitting the database and spreading it across different machines, so you can store more data and handle a heavier load without needing a single powerful server. The basic idea is to cut a collection into small chunks, scatter those chunks across the shards so that each shard is responsible for only part of the total data, and finally keep the shards balanced through the balancer (by migrating chunks between them).
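The chunk-range routing described above can be sketched in a few lines (illustrative Python, not MongoDB internals; the chunk bounds and shard assignments are invented):

```python
import bisect

# Illustrative sketch of range sharding: a collection is cut into
# chunks by shard-key ranges, and each chunk lives on exactly one shard.
chunk_bounds = [0, 1000, 2000]            # lower bound of each chunk
chunk_owner = ["shard1", "shard2", "shard3"]

def owning_shard(key):
    # find the last lower bound <= key; that chunk owns the document
    i = bisect.bisect_right(chunk_bounds, key) - 1
    return chunk_owner[max(i, 0)]

print(owning_shard(42))    # -> shard1
print(owning_shard(1500))  # -> shard2
print(owning_shard(9999))  # -> shard3
```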

replica set: a replica set is effectively the backup of a shard, protecting against data loss when a shard node goes down. Replication provides redundancy by storing copies of the data on multiple servers, which improves availability and keeps the data safe.

An arbiter is a MongoDB instance in a replica set that stores no data. Arbiter nodes use minimal resources and need no dedicated hardware; an arbiter should not run on a node that already holds the replica set's data, but it can run on an application server, a monitoring server, or a separate virtual machine. To keep the number of voting members (including the primary) odd, an arbiter is added as a voter; otherwise, when the primary fails, a new primary cannot be elected automatically.
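The voting arithmetic behind the odd-member rule is simple enough to sketch (illustrative Python; `can_elect_primary` is an invented helper, not a MongoDB API):

```python
# Illustrative sketch: a primary can only be elected when a strict
# majority of voting members (data nodes plus arbiters) is reachable.
def can_elect_primary(total_voters, reachable_voters):
    return reachable_voters > total_voters // 2

# two data nodes only: losing one leaves 1 of 2 voters -> no majority
print(can_elect_primary(2, 1))   # False
# add an arbiter: losing one data node leaves 2 of 3 voters -> majority
print(can_elect_primary(3, 2))   # True
```

This is why each shard below runs two data-bearing members plus one arbiter: a single node failure still leaves a majority.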

To summarize: the application sends all of its create, read, update, and delete requests to mongos; the config servers store the database metadata and keep mongos in sync; the data itself is stored on the shards, each of which keeps a synchronized copy in its replica set to prevent data loss; and the arbiter votes in elections to decide which replica set member becomes primary.

Environmental preparation

OS: CentOS 6.5
Three servers: 192.168.0.75 / 84 / 86
Installation package: mongodb-linux-x86_64-3.4.6.tgz

Server Planning

server 75            server 84            server 86
mongos               mongos               mongos
config server        config server        config server
shard1 primary       shard1 secondary     shard1 arbiter
shard2 arbiter       shard2 primary       shard2 secondary
shard3 secondary     shard3 arbiter       shard3 primary

Port allocation:

mongos:20000
config:21000
shard1:27001
shard2:27002
shard3:27003

Cluster setup

1. Install MongoDB

# extract
tar -xzvf mongodb-linux-x86_64-3.4.6.tgz -C /usr/local/
# rename
mv /usr/local/mongodb-linux-x86_64-3.4.6 /usr/local/mongodb

On each machine, create the six directories conf, mongos, config, shard1, shard2, and shard3. Since mongos stores no data, it only needs a log directory.

mkdir -p /usr/local/mongodb/conf
mkdir -p /usr/local/mongodb/mongos/log
mkdir -p /usr/local/mongodb/config/data
mkdir -p /usr/local/mongodb/config/log
mkdir -p /usr/local/mongodb/shard1/data
mkdir -p /usr/local/mongodb/shard1/log
mkdir -p /usr/local/mongodb/shard2/data
mkdir -p /usr/local/mongodb/shard2/log
mkdir -p /usr/local/mongodb/shard3/data
mkdir -p /usr/local/mongodb/shard3/log

Configure the environment variables

vim /etc/profile
# add the following lines
export MONGODB_HOME=/usr/local/mongodb
export PATH=$MONGODB_HOME/bin:$PATH
# make the change take effect immediately
source /etc/profile

2. Config server

Starting with MongoDB 3.4, the config servers must themselves form a replica set; otherwise the cluster cannot be built.

Add the configuration file

vi /usr/local/mongodb/conf/config.conf

## configuration file content
pidfilepath = /usr/local/mongodb/config/log/configsrv.pid
dbpath = /usr/local/mongodb/config/data
logpath = /usr/local/mongodb/config/log/configsrv.log
logappend = true

bind_ip = 0.0.0.0
port = 21000
fork = true

# declare this is a config db of a cluster
configsvr = true

# replica set name
replSet=configs

# maximum number of connections
maxConns=20000

Start the config server on all three machines

mongod -f /usr/local/mongodb/conf/config.conf

Log in to any one of the config servers and initialize the config replica set

# connect
mongo --port 21000
# define the config variable
config = {
    _id : "configs",
    members : [
        {_id : 0, host : "192.168.0.75:21000" },
        {_id : 1, host : "192.168.0.84:21000" },
        {_id : 2, host : "192.168.0.86:21000" }
    ]
}

# initialize the replica set
rs.initiate(config)

Here "_id" : "configs" must match the replication.replSetName value configured in the configuration file, and the "host" entries in "members" are the IPs and ports of the three nodes.

3. Configure the shard replica sets (on all three machines)

Set up the first shard replica set

Configuration file

vi /usr/local/mongodb/conf/shard1.conf

# configuration file content
# ----------------------------------------
pidfilepath = /usr/local/mongodb/shard1/log/shard1.pid
dbpath = /usr/local/mongodb/shard1/data
logpath = /usr/local/mongodb/shard1/log/shard1.log
logappend = true

bind_ip = 0.0.0.0
port = 27001
fork = true

# enable the web monitoring interface
httpinterface=true
rest=true

# replica set name
replSet=shard1

# declare this is a shard db of a cluster
shardsvr = true

# maximum number of connections
maxConns=20000

Start the shard1 server on all three machines

mongod -f /usr/local/mongodb/conf/shard1.conf

Log in to any one of the servers and initialize the replica set

mongo --port 27001
# use the admin database
use admin
# define the replica set configuration; "arbiterOnly": true marks the third node as the arbiter
config = {
    _id : "shard1",
    members : [
        {_id : 0, host : "192.168.0.75:27001" },
        {_id : 1, host : "192.168.0.84:27001" },
        {_id : 2, host : "192.168.0.86:27001", arbiterOnly: true }
    ]
}
# initialize the replica set
rs.initiate(config);

Set up the second shard replica set

Configuration file

vi /usr/local/mongodb/conf/shard2.conf

# configuration file content
# ----------------------------------------
pidfilepath = /usr/local/mongodb/shard2/log/shard2.pid
dbpath = /usr/local/mongodb/shard2/data
logpath = /usr/local/mongodb/shard2/log/shard2.log
logappend = true

bind_ip = 0.0.0.0
port = 27002
fork = true

# enable the web monitoring interface
httpinterface=true
rest=true

# replica set name
replSet=shard2

# declare this is a shard db of a cluster
shardsvr = true

# maximum number of connections
maxConns=20000

Start the shard2 server on all three machines

mongod -f /usr/local/mongodb/conf/shard2.conf

Log in to any one of the servers and initialize the replica set

mongo --port 27002
# use the admin database
use admin
# define the replica set configuration; "arbiterOnly": true marks the first node as the arbiter
config = {
    _id : "shard2",
    members : [
        {_id : 0, host : "192.168.0.75:27002", arbiterOnly: true },
        {_id : 1, host : "192.168.0.84:27002" },
        {_id : 2, host : "192.168.0.86:27002" }
    ]
}

# initialize the replica set
rs.initiate(config);

Set up the third shard replica set

Configuration file

vi /usr/local/mongodb/conf/shard3.conf

# configuration file content
# ----------------------------------------
pidfilepath = /usr/local/mongodb/shard3/log/shard3.pid
dbpath = /usr/local/mongodb/shard3/data
logpath = /usr/local/mongodb/shard3/log/shard3.log
logappend = true

bind_ip = 0.0.0.0
port = 27003
fork = true

# enable the web monitoring interface
httpinterface=true
rest=true

# replica set name
replSet=shard3

# declare this is a shard db of a cluster
shardsvr = true

# maximum number of connections
maxConns=20000

Start the shard3 server on all three machines

mongod -f /usr/local/mongodb/conf/shard3.conf

Log in to any one of the servers and initialize the replica set

mongo --port 27003
# use the admin database
use admin
# define the replica set configuration; "arbiterOnly": true marks the second node as the arbiter
config = {
    _id : "shard3",
    members : [
        {_id : 0, host : "192.168.0.75:27003" },
        {_id : 1, host : "192.168.0.84:27003", arbiterOnly: true },
        {_id : 2, host : "192.168.0.86:27003" }
    ]
}

# initialize the replica set
rs.initiate(config);

4. Configure the mongos routers

Start the config servers and shard servers first, then start the mongos instances (on all three machines):

vi /usr/local/mongodb/conf/mongos.conf

# content
pidfilepath = /usr/local/mongodb/mongos/log/mongos.pid
logpath = /usr/local/mongodb/mongos/log/mongos.log
logappend = true

bind_ip = 0.0.0.0
port = 20000
fork = true

# config servers to connect to (there must be exactly 1 or 3); "configs" is the config server replica set name
configdb = configs/192.168.0.75:21000,192.168.0.84:21000,192.168.0.86:21000

# maximum number of connections
maxConns=20000

Start the mongos server on all three machines

mongos -f /usr/local/mongodb/conf/mongos.conf

5. Enable sharding

At this point the config servers, mongos routers, and shard servers are all running, but an application connecting to mongos cannot use sharding yet; sharding still has to be configured explicitly for it to take effect.

Log in to any mongos

mongo --port 20000
# use the admin database
use admin
# link the router to the shard replica sets
sh.addShard("shard1/192.168.0.75:27001,192.168.0.84:27001,192.168.0.86:27001")
sh.addShard("shard2/192.168.0.75:27002,192.168.0.84:27002,192.168.0.86:27002")
sh.addShard("shard3/192.168.0.75:27003,192.168.0.84:27003,192.168.0.86:27003")
# check the cluster status
sh.status()

6. Testing

The config servers, routers, shards, and replica sets are now all wired together, but our goal is for inserted data to be sharded automatically. Connect to mongos and enable sharding for a specific database and collection.

# enable sharding for the testdb database
db.runCommand( { enablesharding : "testdb" } );
# specify the collection to shard and its shard key
db.runCommand( { shardcollection : "testdb.table1", key : { id : 1 } } )

Here we configure testdb's table1 collection to be sharded, distributing documents across shard1, shard2, and shard3 by id. This has to be set explicitly because not every MongoDB database and collection needs to be sharded.
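As a rough illustration of how a range shard key like { id : 1 } partitions the key space (illustrative Python only; the chunk boundaries below are invented, and real splits are chosen by MongoDB's splitter and balancer, which is why the actual per-shard counts are uneven):

```python
from collections import Counter

# Illustrative only: divide 100000 increasing ids into three
# contiguous key ranges, one per shard, and count documents per shard.
N = 100000
bounds = {
    "shard1": range(1, 40001),        # ids 1..40000 (hypothetical chunk)
    "shard2": range(40001, 80001),    # ids 40001..80000
    "shard3": range(80001, N + 1),    # ids 80001..100000
}

counts = Counter()
for i in range(1, N + 1):
    for shard, r in bounds.items():
        if i in r:            # range membership is an O(1) check
            counts[shard] += 1
            break

print(dict(counts))  # -> {'shard1': 40000, 'shard2': 40000, 'shard3': 20000}
```

Each shard ends up owning one contiguous slice of the id space; in the real cluster the balancer migrates chunks over time to even out the distribution.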

Test the sharding result

mongo 127.0.0.1:20000
# switch to testdb
use testdb;
# insert test data
for (var i = 1; i <= 100000; i++)
    db.table1.save({id: i, "test1": "testval1"});
# check the shard distribution; irrelevant fields are omitted below
db.table1.stats();

{
        "sharded" : true,
        "ns" : "testdb.table1",
        "count" : 100000,
        "numExtents" : 13,
        "size" : 5600000,
        "storageSize" : 22372352,
        "totalIndexSize" : 6213760,
        "indexSizes" : {
                "_id_" : 3335808,
                "id_1" : 2877952
        },
        "avgObjSize" : 56,
        "nindexes" : 2,
        "nchunks" : 3,
        "shards" : {
                "shard1" : {
                        "ns" : "testdb.table1",
                        "count" : 42183,
                        "size" : 0,
                        ...
                        "ok" : 1
                },
                "shard2" : {
                        "ns" : "testdb.table1",
                        "count" : 38937,
                        "size" : 2180472,
                        ...
                        "ok" : 1
                },
                "shard3" : {
                        "ns" : "testdb.table1",
                        "count" : 18880,
                        "size" : 3419528,
                        ...
                        "ok" : 1
                }
        },
        "ok" : 1
}

You can see that the data is spread across the 3 shards, with per-shard counts of shard1 "count": 42183, shard2 "count": 38937, and shard3 "count": 18880. Done!

Operations and maintenance

Startup and shutdown

The MongoDB startup order is: start the config servers first, then the shards, and finally mongos.

mongod -f /usr/local/mongodb/conf/config.conf
mongod -f /usr/local/mongodb/conf/shard1.conf
mongod -f /usr/local/mongodb/conf/shard2.conf
mongod -f /usr/local/mongodb/conf/shard3.conf
mongos -f /usr/local/mongodb/conf/mongos.conf

To shut down, simply kill all the processes:

killall mongod
killall mongos
