Introduction to MongoDB Sharding
Sharding is the method MongoDB uses to split a large collection across multiple servers (a cluster). Although the idea originated in relational database partitioning, MongoDB sharding is a different thing entirely.
Compared with MySQL's partitioning scheme, the biggest difference is that MongoDB does almost everything automatically: once you tell it how to distribute the data, it maintains the balance of data across the servers on its own.
The purpose of sharding
Database applications with large data volumes and high throughput put heavy pressure on a single machine: a large query volume exhausts the CPU, while a large data set strains the storage, eventually exhausts the system's memory, and shifts the pressure to disk I/O.
There are two basic ways to solve these problems: vertical scaling and horizontal scaling.
- Vertical scaling: add more CPU and storage resources to a single machine.
- Horizontal scaling: distribute the data set across multiple servers. Sharding is horizontal scaling.
Several basic concepts of MongoDB
The core concepts, from smallest to largest:
- Shard key: a field in the document used to distribute the data
- Document: a row of data, which contains the shard key
- Chunk: a contiguous range of documents (contains n documents)
- Shard: a server holding n chunks
- Cluster: the whole deployment, containing n shards
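To make the document → chunk → shard relationship concrete, here is a toy sketch in plain JavaScript (this is an illustration, not MongoDB internals; the ranges and shard names are hypothetical) of how a range shard key routes a document:

```javascript
// Toy model: each chunk covers a disjoint range of shard key values
// and lives on exactly one shard. (Hypothetical ranges for illustration.)
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard1" },
  { min: 1000, max: Infinity, shard: "shard2" },
];

// Route a document to a shard by its shard key (here, the "id" field).
function routeDocument(doc) {
  const chunk = chunks.find(c => doc.id >= c.min && doc.id < c.max);
  return chunk.shard;
}

console.log(routeDocument({ id: 42, name: "a" }));   // → shard1
console.log(routeDocument({ id: 5000, name: "b" })); // → shard2
```

In the real system, mongos performs this lookup using the chunk metadata stored on the config servers.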
Let me focus on chunks. Within a shard server, MongoDB further divides the data into chunks, each representing a portion of that shard's data. Chunks exist for two purposes:
- Splitting: when a chunk grows beyond the configured chunk size, a MongoDB background process splits it into smaller chunks, preventing any single chunk from becoming too large.
- Balancing: the balancer is a background process responsible for migrating chunks, thereby balancing the load across the shard servers.

The system starts with a single chunk, and the default chunk size is 64 MB. On a production database it is best to choose a chunk size suited to the workload. MongoDB splits and migrates chunks automatically.
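The splitting rule can be sketched as follows. This is a simplified illustration in plain JavaScript, not MongoDB's actual algorithm (which chooses split points from the real key distribution and measures chunk size in bytes, not document counts):

```javascript
// Toy chunk split: when a chunk holds more keys than a limit, cut it at
// the median shard key value. The ranges and the limit are hypothetical.
function splitIfNeeded(chunk, limit) {
  if (chunk.keys.length <= limit) return [chunk];
  const sorted = [...chunk.keys].sort((a, b) => a - b);
  const mid = sorted[Math.floor(sorted.length / 2)]; // median key = split point
  return [
    { min: chunk.min, max: mid, keys: sorted.filter(k => k < mid) },
    { min: mid, max: chunk.max, keys: sorted.filter(k => k >= mid) },
  ];
}

const chunk = { min: 0, max: 100, keys: [5, 40, 12, 77, 60] };
console.log(splitIfNeeded(chunk, 4).map(c => [c.min, c.max]));
// → [ [ 0, 40 ], [ 40, 100 ] ]
```

After a split, the balancer may then migrate one of the resulting chunks to a less-loaded shard.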
Sharded cluster architecture
Component | Description |
---|---|
Config Server | Stores the metadata and routing information for all nodes and shards in the cluster. Deployed as a replica set, normally with three members. |
Mongos | The query router that provides the access point for applications; all operations go through mongos, and data migration and automatic balancing are coordinated from here. Production deployments generally run multiple mongos nodes. |
Mongod | Stores the application data records. There are generally multiple mongod nodes so that the data can be partitioned across shards. |
The official architecture diagram is as follows:
Configuration process
Planning: 10 instances, ports 38017-38026.

- config server: a 3-member replica set (1 primary and 2 secondaries; arbiters are not supported), ports 38018-38020, replica set name configReplSet
- shard node sh1: ports 38021-38023 (1 primary, 1 secondary, and 1 arbiter), replica set name sh1
- shard node sh2: ports 38024-38026 (1 primary, 1 secondary, and 1 arbiter), replica set name sh2
- router (mongos) node: port 38017

Shard replica set configuration:
- Directory creation
```shell
mkdir -p /mongodb/38021/conf /mongodb/38021/log /mongodb/38021/data
mkdir -p /mongodb/38022/conf /mongodb/38022/log /mongodb/38022/data
mkdir -p /mongodb/38023/conf /mongodb/38023/log /mongodb/38023/data
mkdir -p /mongodb/38024/conf /mongodb/38024/log /mongodb/38024/data
mkdir -p /mongodb/38025/conf /mongodb/38025/log /mongodb/38025/data
mkdir -p /mongodb/38026/conf /mongodb/38026/log /mongodb/38026/data
```
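The six mkdir commands can also be written as a single loop. The sketch below uses a `base` variable set to `/tmp/mongodb` so it can be dry-run safely; substitute `/mongodb` on the real hosts:

```shell
# Create the conf/log/data directories for every shard instance port.
# base=/tmp/mongodb is used here for a safe dry run; the guide uses /mongodb.
base=/tmp/mongodb
for port in 38021 38022 38023 38024 38025 38026; do
  mkdir -p "$base/$port/conf" "$base/$port/log" "$base/$port/data"
done
ls "$base/38021"   # → conf  data  log
```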
- Modify the configuration files
sh1:

```shell
cat > /mongodb/38021/conf/mongodb.conf <<EOF
systemLog:
  destination: file
  path: /mongodb/38021/log/mongodb.log
  logAppend: true
storage:
  journal:
    enabled: true
  dbPath: /mongodb/38021/data
  directoryPerDB: true
  #engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
      directoryForIndexes: true
    collectionConfig:
      blockCompressor: zlib
    indexConfig:
      prefixCompression: true
net:
  bindIp: 11.111.24.4,127.0.0.1
  port: 38021
replication:
  oplogSizeMB: 2048
  replSetName: sh1        # replica set name
sharding:
  clusterRole: shardsvr   # fixed value
processManagement:
  fork: true
EOF
cp /mongodb/38021/conf/mongodb.conf /mongodb/38022/conf/
cp /mongodb/38021/conf/mongodb.conf /mongodb/38023/conf/
sed -i 's#38021#38022#g' /mongodb/38022/conf/mongodb.conf
sed -i 's#38021#38023#g' /mongodb/38023/conf/mongodb.conf
```
sh2:
```shell
cat > /mongodb/38024/conf/mongodb.conf <<EOF
systemLog:
  destination: file
  path: /mongodb/38024/log/mongodb.log
  logAppend: true
storage:
  journal:
    enabled: true
  dbPath: /mongodb/38024/data
  directoryPerDB: true
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
      directoryForIndexes: true
    collectionConfig:
      blockCompressor: zlib
    indexConfig:
      prefixCompression: true
net:
  bindIp: 11.111.24.4,127.0.0.1
  port: 38024
replication:
  oplogSizeMB: 2048
  replSetName: sh2
sharding:
  clusterRole: shardsvr
processManagement:
  fork: true
EOF
cp /mongodb/38024/conf/mongodb.conf /mongodb/38025/conf/
cp /mongodb/38024/conf/mongodb.conf /mongodb/38026/conf/
sed -i 's#38024#38025#g' /mongodb/38025/conf/mongodb.conf
sed -i 's#38024#38026#g' /mongodb/38026/conf/mongodb.conf
```
- Start all nodes and build the replica sets
```shell
# Start the nodes
mongod -f /mongodb/38021/conf/mongodb.conf
mongod -f /mongodb/38022/conf/mongodb.conf
mongod -f /mongodb/38023/conf/mongodb.conf
mongod -f /mongodb/38024/conf/mongodb.conf
mongod -f /mongodb/38025/conf/mongodb.conf
mongod -f /mongodb/38026/conf/mongodb.conf

# Configure replica set sh1
mongo --port 38021 admin
config = {_id: 'sh1', members: [
          {_id: 0, host: '11.111.24.4:38021'},
          {_id: 1, host: '11.111.24.4:38022'},
          {_id: 2, host: '11.111.24.4:38023', "arbiterOnly": true}]
         }
rs.initiate(config)

# Configure replica set sh2
mongo --port 38024 admin
config = {_id: 'sh2', members: [
          {_id: 0, host: '11.111.24.4:38024'},
          {_id: 1, host: '11.111.24.4:38025'},
          {_id: 2, host: '11.111.24.4:38026', "arbiterOnly": true}]
         }
rs.initiate(config)
```
Config server configuration:
- Directory creation
```shell
mkdir -p /mongodb/38018/conf /mongodb/38018/log /mongodb/38018/data
mkdir -p /mongodb/38019/conf /mongodb/38019/log /mongodb/38019/data
mkdir -p /mongodb/38020/conf /mongodb/38020/log /mongodb/38020/data
```
- Modify the configuration file
```shell
cat > /mongodb/38018/conf/mongodb.conf <<EOF
systemLog:
  destination: file
  path: /mongodb/38018/log/mongodb.log
  logAppend: true
storage:
  journal:
    enabled: true
  dbPath: /mongodb/38018/data
  directoryPerDB: true
  #engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
      directoryForIndexes: true
    collectionConfig:
      blockCompressor: zlib
    indexConfig:
      prefixCompression: true
net:
  bindIp: 11.111.24.4,127.0.0.1
  port: 38018
replication:
  oplogSizeMB: 2048
  replSetName: configReplSet
sharding:
  clusterRole: configsvr   # fixed value
processManagement:
  fork: true
EOF
cp /mongodb/38018/conf/mongodb.conf /mongodb/38019/conf/
cp /mongodb/38018/conf/mongodb.conf /mongodb/38020/conf/
sed -i 's#38018#38019#g' /mongodb/38019/conf/mongodb.conf
sed -i 's#38018#38020#g' /mongodb/38020/conf/mongodb.conf
```
- Start the nodes and configure the replica set
```shell
mongod -f /mongodb/38018/conf/mongodb.conf
mongod -f /mongodb/38019/conf/mongodb.conf
mongod -f /mongodb/38020/conf/mongodb.conf

mongo --port 38018 admin
config = {_id: 'configReplSet', members: [
          {_id: 0, host: '11.111.24.4:38018'},
          {_id: 1, host: '11.111.24.4:38019'},
          {_id: 2, host: '11.111.24.4:38020'}]
         }
rs.initiate(config)
```
mongos node configuration
- Create a directory
```shell
mkdir -p /mongodb/38017/conf /mongodb/38017/log
```
- Configuration file
```shell
cat > /mongodb/38017/conf/mongos.conf <<EOF
systemLog:
  destination: file
  path: /mongodb/38017/log/mongos.log
  logAppend: true
net:
  bindIp: 11.111.24.4,127.0.0.1
  port: 38017
sharding:
  configDB: configReplSet/11.111.24.4:38018,11.111.24.4:38019,11.111.24.4:38020
processManagement:
  fork: true
EOF
```
- Start mongos
```shell
mongos -f /mongodb/38017/conf/mongos.conf
```
In production it is recommended to run multiple routers to avoid a single point of failure; the mongos configuration is identical on all router nodes.
Sharded cluster operation
Connect to one of the mongos instances (11.111.24.4) and perform the following configuration.
(1) Connect to the admin database of mongos:
# su - mongod
$ mongo 11.111.24.4:38017/admin
(2) Add the shards
db.runCommand( { addshard : "sh1/11.111.24.4:38021,11.111.24.4:38022,11.111.24.4:38023",name:"shard1"} )
db.runCommand( { addshard : "sh2/11.111.24.4:38024,11.111.24.4:38025,11.111.24.4:38026",name:"shard2"} )
(3) List shards
mongos> db.runCommand( { listshards : 1 } )
(4) View the overall status
mongos> sh.status();
At this point, the MongoDB sharded cluster configuration is complete.
Using the sharded cluster
Range sharding configuration and testing
Manually shard the vast collection in the test database.
1. Enable the sharding function for the database
mongo --port 38017 admin
admin> db.runCommand( { enablesharding : "<database name>" } )
e.g.:
admin> db.runCommand( { enablesharding : "test" } )
2. Shard the collection on a specified shard key
e.g., a range shard key:
-- create the index
use test
> db.vast.ensureIndex( { id: 1 } )
-- enable sharding for the collection
use admin
> db.runCommand( { shardcollection : "test.vast", key : { id: 1 } } )
3. Insert test data to verify sharding
admin> use test
test> for(i=1;i<1000000;i++){ db.vast.insert({"id":i,"name":"shenzheng","age":70,"date":new Date()}); }
test> db.vast.stats()
4. Verify the sharding result
shard1:
mongo --port 38021
db.vast.count();
shard2:
mongo --port 38024
db.vast.count();
Hash sharding example:
Shard the vast collection in the test2 database on a hashed index.
(1) Enable the sharding function for test2
mongo --port 38017 admin
use admin
admin> db.runCommand( { enablesharding : "test2" } )
(2) Create a hash index for the vast table under the test2 library
use test2
test2> db.vast.ensureIndex( { id: "hashed" } )
(3) Enable sharding for the collection
use admin
admin > sh.shardCollection( "test2.vast", { id: "hashed" } )
(4) Insert 100,000 rows of test data
use test2
for(i=1;i<100000;i++){ db.vast.insert({"id":i,"name":"shenzheng","age":70,"date":new Date()}); }
(5) Verify the hash sharding result (each shard should hold roughly half of the documents)
mongo --port 38021
use test2
db.vast.count();
mongo --port 38024
use test2
db.vast.count();
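Why does a hashed shard key spread these sequential ids so evenly? Because routing is done on hash(id) rather than on id itself. The plain-JavaScript sketch below illustrates the idea; the hash function is a made-up integer mixer for demonstration only (MongoDB actually uses an md5-based 64-bit hash):

```javascript
// Toy illustration of hashed sharding: consecutive ids are scattered
// because routing uses hash(id), not id. toyHash is a hypothetical mixer.
function toyHash(id) {
  const h = Math.imul(id ^ 0x9e3779b9, 2654435761) >>> 0;
  return h >>> 16; // take high-ish bits so the low bits of id don't leak through
}

function routeHashed(id, shardCount) {
  return "shard" + (toyHash(id) % shardCount + 1);
}

// Sequential ids scatter across both shards instead of filling one range:
const counts = { shard1: 0, shard2: 0 };
for (let id = 1; id <= 1000; id++) counts[routeHashed(id, 2)]++;
console.log(counts); // each shard receives roughly half of the 1000 documents
```

This even spread is the trade-off of hashed sharding: inserts with monotonically increasing keys no longer hammer a single chunk, but range queries on id must now hit every shard.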
Sharding operations

- Determine whether the instance is part of a sharded cluster:
admin> db.runCommand({ isdbgrid: 1 })
- List all shard information:
admin> db.runCommand({ listshards: 1 })
- List the databases with sharding enabled:
admin> use config
config> db.databases.find( {"partitioned": true} )
or list the sharding state of all databases:
config> db.databases.find()
- View the shard key of a sharded collection:
config> db.collections.find().pretty()
{
"_id" : "test.vast",
"lastmodEpoch" : ObjectId("58a599f19c898bbfb818b63c"),
"lastmod" : ISODate("1970-02-19T17:02:47.296Z"),
"dropped" : false,
"key" : {
"id" : 1
},
"unique" : false
}
- View detailed shard information:
admin> db.printShardingStatus()
or
admin> sh.status()
- Delete a shard node (use with caution):
(1) Confirm whether the balancer is working:
sh.getBalancerState()
(2) Delete the shard2 node (caution):
mongos> db.runCommand( {removeShard: "shard2"} )
Note: the delete operation immediately triggers the balancer, which migrates the shard's chunks to the remaining shards; re-run the removeShard command to check the draining progress until the removal completes.
Reference: https://www.cnblogs.com/duanxz/p/10730121.html
Official document: https://docs.mongodb.com/manual/sharding/