MongoDB sharding: principles and detailed architecture

What is MongoDB sharding


MongoDB sharding splits a database into multiple parts and distributes them across different machines, so that the cluster can store more data and handle more requests without relying on a single powerful server.

The basic idea of MongoDB sharding is to divide a collection into small pieces (chunks) and scatter these pieces across several shards, so that each shard is responsible for only part of the total data.
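
The idea of cutting a collection into pieces and spreading them over shards can be sketched in Python. This is a toy model, not MongoDB's actual chunk-splitting or balancing logic: the helper names (split_into_chunks, distribute_chunks), the fixed chunk size, and the round-robin placement are assumptions made purely for illustration.

```python
# Toy model: cut a collection into ordered chunks and spread them over shards.
# Helper names, fixed chunk size and round-robin placement are assumptions.


def split_into_chunks(docs, key, chunk_size):
    """Sort documents by `key` and cut them into consecutive chunks."""
    ordered = sorted(docs, key=lambda d: d[key])
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]


def distribute_chunks(chunks, shard_names):
    """Assign chunks to shards round-robin; returns {shard_name: [chunk, ...]}."""
    placement = {name: [] for name in shard_names}
    for i, chunk in enumerate(chunks):
        placement[shard_names[i % len(shard_names)]].append(chunk)
    return placement


if __name__ == "__main__":
    collection = [{"_id": n} for n in range(10)]
    chunks = split_into_chunks(collection, "_id", chunk_size=3)
    placement = distribute_chunks(chunks, ["shard0", "shard1", "shard2"])
    for shard, owned in placement.items():
        # Each shard ends up holding only some of the chunks.
        print(shard, [[d["_id"] for d in c] for c in owned])
```

No shard holds the whole collection; each one owns a subset of the chunks, which is the property the paragraph above describes.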

The application does not need to know which shard holds which data, or even that the data is sharded at all. To query data, it simply connects to a front-end router, which uses the metadata held by the config server to find the shard where the target data resides and then retrieves the data from that shard.

The purpose of MongoDB sharding


Database applications with large data volumes and high throughput put heavy pressure on a single server. A high query rate can exhaust its CPU, and a large data set strains its storage and eventually exhausts system memory, shifting the pressure onto disk I/O.

To solve these problems, there are two basic approaches: vertical scaling and horizontal scaling.

Vertical scaling: add more CPU and storage resources to a single server to increase its capacity.

Horizontal scaling: distribute the data set across multiple servers. Sharding is this kind of horizontal scaling.

Sharding provides a way to handle high throughput and large data volumes. Because each shard handles only a subset of the requests, scaling horizontally increases the cluster's overall storage capacity and throughput. For example, when inserting a document, the application only needs to access the shard responsible for storing it.

Sharding also reduces the amount of data each shard stores. For example, with a 1 TB data set and 4 shards, each shard might hold only 256 GB; with 40 shards, each shard might hold only about 25 GB.
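
The per-shard figures above come from simple division. The snippet below assumes the data is spread perfectly evenly and that 1 TB is counted as 1024 GB; real clusters are rarely this balanced, so treat the numbers as best-case estimates.

```python
# Best-case per-shard storage, assuming perfectly even distribution
# and 1 TB = 1024 GB.
TOTAL_GB = 1024


def per_shard_gb(total_gb, num_shards):
    """Data each shard holds if the total is split evenly."""
    return total_gb / num_shards


for shards in (4, 40):
    print(f"{shards} shards -> {per_shard_gb(TOTAL_GB, shards):.1f} GB per shard")
# 4 shards -> 256.0 GB per shard
# 40 shards -> 25.6 GB per shard
```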

MongoDB sharding architecture


In the MongoDB sharding architecture, there are three roles:

  • Mongos: the router mentioned above, i.e., the component that clients interact with. Mongos holds no data itself and does not inherently know how the data is distributed; it obtains that information from the Config Server;

  • Config Server: the configuration server. It stores information about all shard nodes together with the sharding configuration, which can be thought of as metadata about the actual data;

  • Shard: where the actual data is stored, in chunks.

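Mongos itself does not persist any data. All metadata of the sharded cluster is stored on the Config Server, while user data is distributed across the shards. When Mongos starts, it loads the metadata from the Config Server and then begins serving requests, routing each user request to the correct shard.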
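
The interaction between the three roles can be modelled in a few lines of Python. This is a toy sketch, not MongoDB's implementation: the class and method names (ConfigServer, Shard, Mongos, load_metadata, _target) are hypothetical, chunks are assumed to be contiguous ranges of a numeric shard key, and the router loads the metadata once at construction time to mirror the startup behaviour described above.

```python
# Toy model of Mongos / Config Server / Shard; all names are hypothetical.
from bisect import bisect_right


class ConfigServer:
    """Holds cluster metadata: which shard owns each shard-key range (chunk)."""

    def __init__(self, chunks):
        # chunks: (lower_bound, shard_name) pairs; each chunk covers
        # [lower_bound, next lower_bound), and the last chunk is open-ended.
        self._chunks = sorted(chunks)

    def load_metadata(self):
        return list(self._chunks)


class Shard:
    """Stores the actual documents."""

    def __init__(self, name):
        self.name = name
        self.docs = []

    def insert(self, doc):
        self.docs.append(doc)

    def find(self, field, value):
        return [d for d in self.docs if d.get(field) == value]


class Mongos:
    """Router: persists no data, only a copy of the metadata loaded at startup."""

    def __init__(self, config_server, shards, shard_key):
        self._chunks = config_server.load_metadata()
        self._bounds = [lower for lower, _ in self._chunks]
        self._shards = {s.name: s for s in shards}
        self._shard_key = shard_key

    def _target(self, key_value):
        # Locate the chunk whose range contains key_value.
        idx = bisect_right(self._bounds, key_value) - 1
        if idx < 0:
            raise ValueError(f"no chunk covers {key_value!r}")
        return self._shards[self._chunks[idx][1]]

    def insert(self, doc):
        self._target(doc[self._shard_key]).insert(doc)

    def find(self, key_value):
        return self._target(key_value).find(self._shard_key, key_value)


if __name__ == "__main__":
    config = ConfigServer([(float("-inf"), "shard0"), (1000, "shard1"), (2000, "shard2")])
    shards = [Shard("shard0"), Shard("shard1"), Shard("shard2")]
    router = Mongos(config, shards, shard_key="user_id")  # metadata loaded here
    for uid in (42, 1500, 2500):
        router.insert({"user_id": uid})
    print([len(s.docs) for s in shards])  # [1, 1, 1]
    print(router.find(1500))              # [{'user_id': 1500}]
```

The client code at the bottom only ever talks to the router; it never needs to know that user 1500 lives on shard1.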

Shard Key


It is fair to say that the shard key is what MongoDB's sharding relies on!

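In MongoDB, data is sharded on a per-collection basis: the data in a collection is divided into multiple parts according to the shard key. In essence, the shard key is a field chosen from the collection, and its values serve as the basis for splitting the data.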

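For example, suppose a collection stores documents with information about people. If "name" is chosen as the shard key, the first shard might store documents whose names start with A-F, the second shard those starting with G-P, and the third shard those starting with Q-Z.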
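
The A-F / G-P / Q-Z example can be expressed as a small range lookup. This is a simplified sketch: the shard names and the rule of looking only at the upper-cased first letter of "name" are assumptions for illustration, whereas real MongoDB chunk boundaries are full shard-key values managed by the cluster.

```python
# Range-based placement by the "name" shard key, following the A-F / G-P / Q-Z
# example. Shard names and the first-letter rule are illustrative assumptions.

RANGES = [
    ("A", "F", "shard1"),
    ("G", "P", "shard2"),
    ("Q", "Z", "shard3"),
]


def shard_for(doc, shard_key="name"):
    """Return the shard whose letter range contains the key's first letter."""
    first = doc[shard_key][0].upper()
    for low, high, shard in RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no range covers {doc[shard_key]!r}")


people = [{"name": "Alice"}, {"name": "Hannah"}, {"name": "Zack"}, {"name": "bob"}]
for person in people:
    print(person["name"], "->", shard_for(person))
# Alice -> shard1
# Hannah -> shard2
# Zack -> shard3
# bob -> shard1
```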

A good shard key is crucial to sharding.

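One point to note: an auto-incrementing shard key does not spread writes and data evenly, because new values always land on the same shard, and writes may move to another shard only after some threshold is reached. On the other hand, queries (reads) by shard key are very efficient.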
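
A short simulation shows why. Assuming range-based chunks, every new auto-incremented key is larger than all existing keys, so it always falls into the last, open-ended chunk. The chunk bounds and shard names are made up for this sketch, and it deliberately ignores the splitting and migration that would eventually move data elsewhere.

```python
# Simulated hotspot from a monotonically increasing shard key under range
# chunks. Bounds and shard names are illustrative assumptions.
from bisect import bisect_right
from collections import Counter

CHUNK_LOWER_BOUNDS = [0, 1000, 2000]  # last chunk is open-ended
CHUNK_OWNERS = ["shard0", "shard1", "shard2"]


def owner(key):
    """Shard owning the chunk whose range contains `key` (key >= 0)."""
    return CHUNK_OWNERS[bisect_right(CHUNK_LOWER_BOUNDS, key) - 1]


# 500 inserts with an auto-incrementing key: all of them hit one shard.
writes = Counter(owner(k) for k in range(5000, 5500))
print(writes)  # Counter({'shard2': 500})

# The flip side: a query over consecutive key values touches a single shard.
print({owner(k) for k in range(1200, 1300)})  # {'shard1'}
```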

Origin blog.csdn.net/am_Linux/article/details/129677666