A production incident taught me the essence of MongoDB


Hello everyone, I am Nezha. Recently, our project has been using MongoDB as the storage database for pictures and documents. Why not store them directly in MySQL? Isn't setting up a separate MongoDB cluster a lot of trouble?

Let's find out together. We will go through the theory and practice of MongoDB sharding so that you can get started quickly, enrich your resume, sharpen your interview skills, and have more to talk about. Landing a job at BAT (Baidu, Alibaba, Tencent) is no longer a dream.

MongoDB refuses connection? Obviously the MongoDB service is down again.

Connect to the MongoDB server to find out.

Use ps -aef | grep mongo to check whether the mongo processes are still there. As expected, they are gone.

Most likely it is because the disk is full.

Check the disk space with df -TH.

How do we fix a disk that is 100% full?

cd to the log directory, delete all the logs with rm -rf * (make sure you really are in the log directory first), and then restart MongoDB.

MongoDB hangs on startup: about to fork child process, waiting until server is ready for connections

Since MongoDB is deployed as a cluster, it synchronizes data at startup, which can take a while. I am impatient and could not stand the wait, so I pressed Ctrl+C to force it to stop and then restarted.

Looking at the processes with ps -aef | grep mongo, two identical processes are listed.

Force-stop all mongo processes with ps -aef | grep mongo | grep -v grep | awk '{print $2}' | xargs kill -9.

Delete the mongod.lock file and the diagnostic.data directory in the data directory, then restart MongoDB with the startup script mongos_start.sh (mongod --config data/mongodb.conf). Problem solved.

What do the directories in the MongoDB server deployment mean, and how do they relate to each other? The following is a brief introduction to MongoDB sharding.

1. What is MongoDB sharding?

Sharding refers to the process of splitting data across machines, also known as partitioning.

MongoDB supports manual sharding. With this approach, the application maintains connections to several different database servers, each of which is completely independent. The application manages not only which data is stored on which server, but also which server to query for a given piece of data. This approach is difficult to maintain when nodes are added to or removed from the cluster, or when the data distribution or load pattern changes.

MongoDB also supports automatic sharding, which tries to abstract the architecture away from the application and simplify system administration. MongoDB automatically balances data across shards, making it easier to add and remove nodes.

MongoDB's sharding mechanism allows you to create a cluster of many shards and spread the data in the collection across the cluster, placing a subset of the data on each shard. This allows applications to exceed the resource limits of a stand-alone server or replica set.

To the application, a cluster composed of shards looks like a stand-alone server. One or more routing processes called mongos run in front of the shards. mongos maintains a "table of contents" that records which data each shard contains. Applications connect to this routing server and issue requests as usual. The routing server knows which data is on which shard and forwards each request to the appropriate shard. If there are responses to the request, the routing server collects them, merges them, and returns them to the application. As far as the application knows, it is connected to a single mongod.

2. How does MongoDB shard?

Let's quickly set up a cluster on a single machine. First, start the mongo shell with the --nodb and --norc options: mongo --nodb --norc.

Create a cluster using the ShardingTest class. Run the following code:

st = ShardingTest({
	name:"one-min-shards",
	chunkSize:1,	// chunk size in MB
	shards:2,
	rs:{
		nodes:3,
		oplogSize:10	// oplog size in MB
	},
	other:{
		enableBalancer:true
	}
});
  • name: the label of the shard cluster;
  • shards: specifies that the cluster consists of two shards;
  • rs: define each shard as a replica set of 3 nodes;
  • enableBalancer: enable the balancer after the cluster starts;

ShardingTest is designed to support the server's test suites. It makes it convenient to build a sharded cluster with a relatively complex architecture while keeping resource usage as low as possible. When ShardingTest runs, it creates a cluster with two shards, each of which is a replica set. It configures the replica sets and starts each node with the options needed to establish the replication protocol. It starts a mongos to manage requests across the shards so that clients can interact with the cluster as if they were talking to a standalone mongod. Finally, it starts an additional replica set for the config servers, which maintain the routing table information needed to direct queries to the correct shard.

The primary use case for sharding is to split datasets to address hardware and cost constraints, or to provide better performance to applications.

When ShardingTest finishes setting up the cluster, 10 processes are up and running that you can connect to: two replica sets (3 nodes each), a config server replica set (3 nodes), and a mongos. By default, these processes use ports starting at 20000, and mongos runs on port 20009.
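The process count and port numbers above can be checked with a little arithmetic. The sketch below is only an illustration: clusterLayout is a hypothetical helper, not part of MongoDB, and it assumes that ports are handed out consecutively with mongos last, which matches the numbers quoted above but is not taken from the ShardingTest source.

```javascript
// Hypothetical helper (not a MongoDB API): lists the processes that a
// ShardingTest-style layout would start, with one port per process.
function clusterLayout({ shards, nodesPerShard, configNodes = 3, mongosCount = 1, basePort = 20000 }) {
  const roles = [];
  for (let s = 0; s < shards; s++) {
    for (let n = 0; n < nodesPerShard; n++) roles.push(`shard${s}-node${n}`);
  }
  for (let c = 0; c < configNodes; c++) roles.push(`config-node${c}`);
  for (let m = 0; m < mongosCount; m++) roles.push(`mongos${m}`);
  // Assumption: ports are assigned consecutively in the order above.
  return roles.map((role, i) => ({ role, port: basePort + i }));
}

const layout = clusterLayout({ shards: 2, nodesPerShard: 3 });
console.log(layout.length);             // 10
console.log(layout[layout.length - 1]); // { role: 'mongos0', port: 20009 }
```

Two shards of 3 nodes give 6 processes, the config server replica set adds 3, and mongos is the tenth.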

3. When to shard?

Typically, sharding is used for:

  • Increase available RAM;
  • Increase available disk space;
  • Reduce server load;
  • Handle throughput that a single MongoDB server cannot sustain;

4. Build a MongoDB sharded cluster

1. Config server process

The config servers are the brain of the cluster: they hold all the metadata about which data each server contains, so they must be created first. Because they are so important, they must run with journaling enabled, and their data must be stored on non-ephemeral drives.

The config servers must be started (mongod -f config.conf) before any mongos process, because mongos pulls its configuration information from the config servers.

When writing to the config servers, MongoDB uses the "majority" writeConcern level;
when reading from the config servers, MongoDB uses the "majority" readConcern level.

This ensures that sharded cluster metadata is not committed to the config server replica set until it can no longer be rolled back. It also ensures that only metadata that would survive a config server failure is ever read. Together, these guarantee that all mongos routers have a consistent view of how data is organized in the sharded cluster.
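To make "majority" concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical and this is not how MongoDB is implemented. It only shows how many voting members must acknowledge a write before it counts as majority-committed.

```javascript
// Hypothetical helpers (not MongoDB APIs).
// Smallest number of voting members that forms a majority.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A write counts as majority-committed once enough members acknowledged it.
function isMajorityCommitted(acks, votingMembers) {
  return acks >= majority(votingMembers);
}

console.log(majority(3));               // 2
console.log(isMajorityCommitted(1, 3)); // false
console.log(isMajorityCommitted(2, 3)); // true
```

For a 3-node config server replica set, a metadata write needs 2 acknowledgements. Any newly elected primary also needs a majority of votes, so at least one voter has seen the write, which is why such a write is not rolled back.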

In terms of server resources, the config servers should have sufficient network and CPU resources. They only hold a table of contents of the data in the cluster, so they need very little disk storage.

Due to the importance of the configuration server, the data of the configuration server should be backed up before any cluster maintenance.

2. mongos process

mongos is the routing server that applications connect to. Start it with mongos -f config.conf. The mongos process needs to know the addresses of the config servers, so set configdb=configReplSet/<addresses of the three config servers> in config.conf. You can also set logpath there to save the MongoDB log.

Start a reasonable number of mongos processes and place them as close to the shards as possible, which improves query performance.

3. Convert the replica set to shards

After starting the config servers and the routing server in that order, you can add shards. If a replica set already exists, it becomes the first shard.

Starting from MongoDB 3.4, every mongod instance that serves as a shard in a sharded cluster must be configured with the --shardsvr option, which can be set in config.conf as shardsvr=true. When converting a replica set to a shard, repeat this for every member of the replica set.

After adding the replica set to the cluster as a shard, switch the application's connection from the replica set to the mongos routing server, and use a firewall to cut off direct connections between the application and the shard.

4. Data sharding

(1) How to shard data

Suppose you have a test database and want to shard the worker collection on the name key.

  1. Enable sharding on the database first: sh.enableSharding("test");
  2. Then shard the collection: sh.shardCollection("test.worker",{"name":1});

If the worker collection already exists, there must be an index on the name field, otherwise, shardCollection will return an error. If the sharded collection does not exist, mongos will automatically create an index on the name shard key.

The shardCollection command splits the collection into multiple data blocks (chunks), and MongoDB spreads the data in the collection evenly across the shards in the cluster.
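The steps above can be put together as one small function. This is a sketch, not the author's script: shardWorkerByName is a hypothetical name, and sh and db are the helpers the mongo shell provides when it is connected to mongos. They are passed in as parameters only so the function can also be tried against stubs outside the shell.

```javascript
// Sketch of the sharding steps from the text (hypothetical function name).
// `sh` and `db` are the mongo shell helpers available when connected to mongos.
function shardWorkerByName(sh, db) {
  const testDb = db.getSiblingDB("test");

  // 1. Enable sharding on the database.
  sh.enableSharding("test");

  // 2. An existing collection needs an index on the shard key,
  //    otherwise shardCollection returns an error.
  testDb.worker.createIndex({ name: 1 });

  // 3. Shard the collection on the name key.
  return sh.shardCollection("test.worker", { name: 1 });
}
```

In the shell this would be called as shardWorkerByName(sh, db). Creating the index explicitly is harmless for a collection that does not exist yet, and it covers the case where worker already holds data.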

5. How does MongoDB track cluster data?

1. Data blocks (chunks)

Because a cluster can hold a huge amount of data, MongoDB groups documents into data blocks, also called chunks. Each chunk consists of the documents within a given range of the shard key. MongoDB then uses a fairly small table to maintain the mapping between chunks and shards.

Points to note:

  1. Chunks cannot overlap;
  2. When a chunk grows too large, it is automatically split into two chunks;
  3. A document always belongs to one and only one chunk;

2. Chunk ranges

  1. A newly sharded collection has only one chunk, whose range runs from negative infinity to positive infinity;
  2. As the chunk grows, MongoDB automatically splits it into two chunks, one ranging from negative infinity to value and one from value to positive infinity. The chunk with the lower range contains values smaller than value, and the chunk with the higher range contains value and values larger than value;

Therefore, mongos can easily find which block the document is in.
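The range rules can be illustrated with a toy chunk table. This is a sketch under stated assumptions, not MongoDB's implementation: MIN_KEY and MAX_KEY are stand-ins for negative and positive infinity, each chunk covers the half-open range [min, max), and the shard name and all function names are made up for the example.

```javascript
// Toy model of a chunk table (all names here are hypothetical).
const MIN_KEY = Symbol("MinKey"); // stands in for negative infinity
const MAX_KEY = Symbol("MaxKey"); // stands in for positive infinity

// Compare two shard key values, treating the sentinels as -inf / +inf.
function compareKeys(a, b) {
  if (a === b) return 0;
  if (a === MIN_KEY || b === MAX_KEY) return -1;
  if (a === MAX_KEY || b === MIN_KEY) return 1;
  return a < b ? -1 : 1;
}

// Find the single chunk whose range [min, max) contains the key.
function findChunk(chunks, key) {
  return chunks.find(
    (c) => compareKeys(c.min, key) <= 0 && compareKeys(key, c.max) < 0
  );
}

// Split the chunk containing `value` into [min, value) and [value, max).
function splitAt(chunks, value) {
  const i = chunks.findIndex(
    (c) => compareKeys(c.min, value) < 0 && compareKeys(value, c.max) < 0
  );
  if (i === -1) return chunks; // value is already a chunk boundary
  const c = chunks[i];
  return [
    ...chunks.slice(0, i),
    { min: c.min, max: value, shard: c.shard },
    { min: value, max: c.max, shard: c.shard },
    ...chunks.slice(i + 1),
  ];
}

// A newly sharded collection starts with one chunk covering everything.
let chunks = [{ min: MIN_KEY, max: MAX_KEY, shard: "shard0000" }];
chunks = splitAt(chunks, "m");

console.log(chunks.length);                  // 2
console.log(findChunk(chunks, "alice").max); // m
console.log(findChunk(chunks, "m").min);     // m
```

Because the ranges never overlap and together cover every possible key, exactly one chunk matches any shard key value, which is what makes the lookup cheap.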

3. Splitting chunks

The primary mongod of each shard keeps track of its current chunks. Once a certain threshold is reached, it checks whether a chunk needs to be split. If so, mongod requests the global chunk size setting from the config servers, then performs the split and updates the metadata on the config servers. The config servers create the new chunk documents and modify the range of the old chunk.

When a client writes to a chunk, mongod checks the split threshold for that chunk.

If the split threshold has been reached, mongod sends a request to the balancer to migrate the topmost chunk, otherwise the chunk stays on the shard.

Because two documents with the same shard key are bound to be in the same chunk, splits can only be made between documents with different shard key values.

If the following documents are sharded on readTime, splitting is possible.

However, if I read faster and finish all the books within one month, every readTime will be the same, and the chunk can no longer be split.

Therefore, it is especially important that the shard key has a variety of values.

{"name":"哪吒编程","book":"Java核心技术","readTime":"October"}
{"name":"哪吒编程","book":"Java编程思想","readTime":"October"}
{"name":"哪吒编程","book":"深入理解Java虚拟机","readTime":"October"}
{"name":"哪吒编程","book":"effective java","readTime":"November"}
{"name":"哪吒编程","book":"重构 改善既有代码的设计","readTime":"November"}
{"name":"哪吒编程","book":"高性能MySQL","readTime":"December"}
{"name":"哪吒编程","book":"Spring技术内幕","readTime":"December"}
{"name":"哪吒编程","book":"重学Java设计模式","readTime":"December"}
{"name":"哪吒编程","book":"深入理解高并发编程","readTime":"January"}
{"name":"哪吒编程","book":"Redis设计与实现","readTime":"January"}

A prerequisite for splitting is that all config servers are up and reachable. If they are not, and mongod keeps receiving write requests for a chunk, it will keep trying and failing to split that chunk, and these attempts slow mongod down. This cycle of mongod repeatedly trying to split without success is called a split storm.

6. Balancer

The balancer is responsible for data migration. It periodically checks for imbalances between shards and, if it finds one, migrates chunks. On MongoDB 3.4+, the balancer runs on the primary member of the config server replica set.

It is a background process that monitors the number of chunks on each shard, and it only becomes active when the number of chunks on a shard reaches a certain migration threshold.
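A rough sketch of the migration threshold check is shown below. It is a toy model, not the balancer's real code. The function names are hypothetical, and the threshold values (2, 4, 8 depending on the total number of chunks) and the comparison between the fullest and emptiest shard are assumptions based on how the MongoDB manual describes older releases, so check the documentation for your version.

```javascript
// Assumed thresholds, as described for older MongoDB releases:
// fewer than 20 chunks -> 2, 20 to 79 chunks -> 4, 80 or more -> 8.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// Hypothetical check: balance when the gap between the shard with the most
// chunks and the shard with the fewest chunks reaches the threshold.
function needsBalancing(chunkCounts) {
  const counts = Object.values(chunkCounts);
  const total = counts.reduce((sum, n) => sum + n, 0);
  const gap = Math.max(...counts) - Math.min(...counts);
  return gap >= migrationThreshold(total);
}

console.log(needsBalancing({ shard0000: 12, shard0001: 4 }));  // true
console.log(needsBalancing({ shard0000: 41, shard0001: 39 })); // false
```

In the first case there are 16 chunks in total, so a gap of 8 is well over the threshold of 2. In the second case there are 80 chunks, the threshold is 8, and a gap of 2 leaves the balancer idle.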


Origin blog.csdn.net/guorui_java/article/details/128424399