Practical tips for improving MongoDB operations and maintenance efficiency

1. Introduction to MongoDB Cluster

MongoDB is a document database built on distributed file storage that aims to provide scalable, high-performance data storage for web applications. The most common cluster layout, built from 3 machines, is introduced below. For a detailed introduction, see the official documentation at https://docs.mongodb.com/v3.4/introduction/.

1. Introduction to Cluster Components

  • mongos (router): The entry point for client requests to the MongoDB cluster. All user requests are coordinated by mongos, which routes each request to the appropriate shard (mongod) server, merges the results, and returns them to the client.

  • config server (configuration node): The configuration server stores the cluster's metadata, including the data distribution (sharding) layout and data structure. After receiving a request from a client, mongos loads the configuration information from the config server and caches it in memory. In production, more than one config server is usually deployed, because the metadata it holds is critical: if it is damaged, the operation of the entire cluster is affected.

  • Shard (shard instance, stores the data): MongoDB uses sharding to distribute data storage and processing across nodes and thereby scale horizontally. By default, data is automatically moved between shards to keep them balanced; this is done by a component called the balancer.

  • Replica set: A replica set provides high availability for the database. Without one, data is lost as soon as the server node storing it goes down. With a replica set, the same data is also saved on replica (secondary) nodes. A typical replica set contains one primary node and several secondary nodes; if needed, an arbiter is added as a voting-only member for elections when a node goes down.

  • Arbiter (arbitration node): The arbiter holds no data. It only monitors the other members and takes part in electing a new primary when the current primary fails; this is done through heartbeats exchanged between the primary, the secondaries, and the arbiter. (A minimal startup sketch for such a 3-machine cluster follows this list.)
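
The sketch below is not from the original article; it is a minimal illustration, under stated assumptions, of how the components above could be started for a 3-machine cluster. The hostnames (host1/host2/host3), ports, replica set names, and data paths are all illustrative.

# Config server member (3.4 requires the config servers to form a replica set); run one per machine
mongod --configsvr --replSet configRS --port 21000 --dbpath /data/configdb
# Shard replica set member; run one per machine
mongod --shardsvr --replSet rs1 --port 20001 --dbpath /data/shard1
# mongos router, pointed at the config server replica set
mongos --configdb configRS/host1:21000,host2:21000,host3:21000 --port 20000

Each replica set still needs rs.initiate() on one of its members, after which the shard can be registered on mongos with sh.addShard("rs1/host1:20001,host2:20001,host3:20001").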

2. MongoDB application scenarios

  • Website data: suitable for real-time insertion, update and query, and has the replication and high scalability required for real-time data storage on the website.

  • Cache: Due to its high performance, it is also suitable as a cache layer for information infrastructure. After the system is restarted, the persistent cache built can prevent the underlying data source from being overloaded.

  • Large-volume, low-value data: storing this kind of data in a traditional relational database can be relatively expensive; previously, many programmers simply stored it in flat files instead.

  • High scalability scenarios: very suitable for databases consisting of dozens or hundreds of servers.

  • Object and JSON data storage: MongoDB's BSON data format is well suited to storing and querying document-formatted data.

3. Reasons for choosing MongoDB

MongoDB stores data in the BSON format, whose flexible schema makes it easy to extend; horizontal scaling is straightforward, massive data volumes are supported, and performance is strong.
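
As a small illustration of the document model described above (the collection and field names here are invented for the example), nested fields and arrays can be stored and queried without any schema change:

// Insert a document with an array and a nested sub-document
db.articles.insertOne({
  title: "MongoDB ops notes",
  tags: ["ops", "sharding"],
  stats: { views: 100, likes: 3 }
});
// Query on a nested field
db.articles.find({ "stats.views": { $gt: 50 } });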

2. Cluster monitoring

1. Monitor database storage statistics

Enter the mongos or shard instance's Docker container and run the following commands:

docker exec -it mongos bash;
mongo --port 20001;
use admin;
db.auth("root","XXX");

Description: with the following command you can query statistics such as the number of collections and the number of indexes across the cluster members for the current database.

db.stats();

2. View server status statistics

Description: the following command shows operation counts, memory usage, network I/O, and more.

db.runCommand( { serverStatus: 1 } );
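
If only part of the output is needed, individual sections can be read with the db.serverStatus() shell helper; the section names below (opcounters, mem, network) are taken from the standard serverStatus document:

// Read individual sections instead of the whole serverStatus document
db.serverStatus().opcounters;   // operation counts: insert, query, update, delete, ...
db.serverStatus().mem;          // memory usage of the mongod process
db.serverStatus().network;      // bytes in/out and number of requests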

3. Check the status of replica set members

rs.status();

3. Basic operation and maintenance operations

1. Set and view slow queries

# Set the slow-query threshold (level 1, operations slower than 200 ms are profiled)
db.setProfilingLevel(1,200);
# View the current profiling level
db.getProfilingLevel();
# Query the slow-query log; profiling is configured per database
db.system.profile.find({ ns : 'dbName.collectionName'}).limit(10).sort( { ts : -1 } ).pretty();

2. View operations that have been running for a long time

db.currentOp({"active" : true,"secs_running" : { "$gt" : 2000 }});

3. Dynamically adjust the log level and set the cache size

# View the current log verbosity level
db.adminCommand( { "getParameter": 1, "logLevel": 1 } );
# Set the log verbosity level (e.g. raise it to 2)
db.adminCommand( { "setParameter": 1, "logLevel": 2 } );
# Set the WiredTiger cache size (takes effect without a restart)
db.adminCommand( { "setParameter": 1, "wiredTigerEngineRuntimeConfig": "cache_size=4G"});

4. Add and remove replica set members

# View the replica set members
rs.status().members;
# Add a member
rs.add('127.0.0.1:20001');
# Remove a member
rs.remove('127.0.0.1:20001');

5. Enable sharding for a database and collection

# In the admin database on mongos, enable sharding for the database
sh.enableSharding("dbName");
# In the admin database on mongos, set the shard key for the collection
sh.shardCollection("dbName.collectionName", { fieldName: 1 } );

6. Add and remove shards

# Check the sharding status
sh.status();
# On mongos, add a shard (it can be a single instance or a replica set)
db.runCommand({addshard:"rs1/ip-1:20001,ip-2:20001,ip-3:20001"});
# On mongos, remove a shard
db.runCommand( { removeShard: "shard3" } );
# On mongos, flush the mongos routing configuration
db.runCommand("flushRouterConfig");

Note: the removeShard command must be executed at least twice before the shard is actually removed; the shard is not gone until the returned state is "completed". Until then it stays in the {"draining" : true} state, which also blocks the removal of other shards, so when you see this state simply run removeShard again, and keep repeating the command until the state is "completed". Another point to note: once a shard has been removed, if you want it to rejoin the cluster you must clean up its data directory first; otherwise, even though it joins successfully, no data will be stored on it and no collections will be created there. In addition, a removal can get stuck indefinitely in the {"draining" : true} state, with none of the shard's chunks moving to other shards even after a long time. The fix is to find the shard's document in the shards collection of the config database and change its draining field from true to false, then retry the removal. Note also that removeShard returns immediately and the removal actually runs in the background, so watch the instance logs while data is being moved off: a chunk in migration may never find its boundary, causing the migration to fail and retry forever; the fix is to delete the boundary data and restart the instance. If the shard being removed is a database's primary shard, that primary shard must be moved first with db.runCommand( { movePrimary: "XXX", to: "other" } ). After the removal is complete, run flushRouterConfig on every mongos before serving traffic again, or simply restart all mongos instances.
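
As a minimal sketch of handling a stuck removal as described above (the shard name "shard3" is illustrative, matching the earlier example):

// Check removal progress; repeat until the returned "state" is "completed"
use admin
db.runCommand({ removeShard: "shard3" });
// If the shard is stuck in draining, inspect and reset the flag in the config database
use config
db.shards.find({ _id: "shard3" });
db.shards.update({ _id: "shard3" }, { $set: { draining: false } });
// Then retry the removal from the admin database
use admin
db.runCommand({ removeShard: "shard3" });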

7. Data import and export

# Export (export conditions and fields can also be specified)
mongoexport -h 127.0.0.1 --port 20001 -uxxx -pxxx -d xxx -c mobileIndex -o XXX.txt 
# Import
mongoimport -h 127.0.0.1 --port 20001 -uxxx -pxxx -d xxx -c mobileIndex --file XXX.txt
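
If export conditions and fields are needed, as noted in the comment above, mongoexport accepts a JSON query via -q and a comma-separated field list via -f; the filter and field names below are illustrative:

# Export only matching documents and selected fields (illustrative filter and field names)
mongoexport -h 127.0.0.1 --port 20001 -uxxx -pxxx -d xxx -c mobileIndex -q '{"status": 1}' -f "phone,createTime" -o XXX.txt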

4. MongoDB data migration

1. Migrate members in the replica set

  • Shut down the mongod instance; to ensure a clean shutdown, use the shutdown command;

  • Transfer the data directory (ie dbPath) to the new machine;

  • Start mongod on the new machine, pointing its data directory at the copied files;

  • Connect to the current primary node of the replica set;

If the node's address has changed, use rs.reconfig() to update the replica set configuration; for example, the following commands update the address of the member at index 2:

cfg = rs.conf()
cfg.members[2].host = "127.0.0.1:27017"
rs.reconfig(cfg)

Use rs.conf() to confirm that the new configuration is in effect. Wait for all members to return to normal, using rs.status() to check member status.

2. Migrate the primary node of the replica set

Migrating the primary node requires the replica set to elect a new primary. During the election the replica set cannot serve normal writes; this usually lasts only a short time, but the primary should still be migrated during a window where the impact is as small as possible.

  • Step down the primary so that a normal failover takes place. To do this, connect to the primary and use the replSetStepDown command or the rs.stepDown() method; the following example uses rs.stepDown():

rs.stepDown()
  • After the old primary has stepped down to a secondary and another member has become PRIMARY, the stepped-down node can be migrated following "Migrate members in the replica set" above. Use rs.status() to confirm the state change.

3. Restore data from other nodes in the replica set

MongoDB provides highly reliable data storage through replica sets. A 3-node replica set is generally recommended in production, so that even if one node crashes and cannot be started, we can simply clear its data and restart it, add a brand-new secondary node to the replica set, or copy the data files from another node and restart it; the node will then synchronize data automatically, achieving data recovery.

  • Shut down the node that needs data synchronization

docker stop node;  # in a Docker environment
db.shutdownServer({timeoutSecs: 60}); # in a non-Docker environment
  • Copy the data storage directory (dbPath) from the target node's machine to the corresponding directory on the current machine, for example:

scp -r <target node>:/dbPath/shard/data <current node>:/dbPath/shard/data   # copy the data directory from the target node to the current node
  • Start the node on the current machine using the copied data files

  • Add new nodes to the replica set

# On the replica set's primary node, add the new node
rs.add("hostNameNew:portNew"); 
# Wait for all members to return to normal and check member status
rs.status();
# Remove the original node
rs.remove("hostNameOld:portOld"); 

5. MongoDB online problem scenarios and solutions

1. Creating a new index causes the database to be locked

Problem description: to optimize queries on an online collection with tens of millions of documents, an index-creation command was executed directly in the foreground, locking the entire database and making the application service unavailable.

Solution: find the index-build operation and kill it, then rebuild the index in the background. Background builds are much slower, but they do not block the business; the index only takes effect once the build completes.

# Find operations that have been running for too long (secs_running is in seconds)
db.currentOp({"active" : true,"secs_running" : { "$gt" : 2000 }}) ;
# Kill an operation that has run for too long
db.killOp(opid)
# Rebuild the index in the background
db.collectionName.ensureIndex({fieldName:1}, {background:true});

2. MongoDB memory is not limited, causing the instance to be killed

Problem description: a production machine runs multiple mongod instances; after running for a while, one of the processes is inexplicably killed.

Solution: MongoDB now uses WiredTiger as the default storage engine and uses both the WiredTiger internal cache and the file system cache. Since 3.4, the WiredTiger internal cache defaults to the larger of 50% of (RAM - 1 GB) or 256 MB. For example, on a system with 4 GB of RAM, the WiredTiger cache will use 1.5 GB, since 0.5 * (4 GB - 1 GB) = 1.5 GB. Conversely, a system with 1.25 GB of RAM will allocate 256 MB for the WiredTiger cache, because 0.5 * (1.25 GB - 1 GB) = 128 MB < 256 MB. If a machine runs multiple instances, the operating system will kill some of the processes when memory runs short.

# Adjust the WiredTiger internal cache size; resizing the cache does not require a restart and can be done dynamically:
db.adminCommand( { "setParameter": 1, "wiredTigerEngineRuntimeConfig": "cache_size=xxG"})

3. MongoDB deletes data without releasing disk space

Problem description: when deleting a large amount of data (in my case 20 million+ documents) while the production request volume is high, the machine's CPU load becomes very high and the machine may even freeze and stop responding; such operations should be done carefully in batches. After the delete command finishes, the disk usage does not shrink.

Solution:

  • Solution 1: use MongoDB's online data compaction: db.collectionName.runCommand("compact") compacts a single collection and removes the file fragmentation it occupies. Although compact can be run online, it does affect live traffic. To work around this, run the compact command on a secondary node first; once it finishes, step the primary down so that the original primary becomes a secondary, then run compact on it as well.

  • Solution 2: resynchronize a secondary node: wipe the data on the secondary and let it perform a fresh initial sync from the primary. Resynchronization can also be used when a replica set member's data is too stale. Unlike copying data files directly, resynchronization only transfers the documents themselves, so after it completes there are no half-empty data files, and the disk space is reclaimed.

    In special cases where the secondary cannot be taken offline, you can instead add a new node to the replica set and it will start syncing data automatically. In general, resynchronization is the better approach: first, it hardly blocks reads and writes on the replica set; second, it usually takes less time than the previous options. The steps are listed below, and a command-level sketch follows the list.

    1. If the node is the primary, force it to become a secondary first (otherwise skip this step): rs.stepDown(120);

    2. On the primary, remove the secondary from the replica set: rs.remove("IP:port");

    3. Delete all files under the secondary's dbPath

    4. Re-add the node to the cluster; it will then synchronize the data automatically: rs.add("IP:port");

    5. Once data synchronization completes, repeat steps 1-4 for each node to reclaim disk space on every node in the cluster
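
A minimal command-level sketch of the steps above; the address 127.0.0.1:20002 is illustrative, and step 3 happens on the node's machine, outside the mongo shell:

// 1. If the node being rebuilt is currently the primary, step it down first
rs.stepDown(120);
// 2. On the (new) primary, remove the node from the replica set
rs.remove("127.0.0.1:20002");
// 3. On the removed node's machine, stop mongod and delete everything under its dbPath
// 4. Re-add the node; it performs an initial sync automatically
rs.add("127.0.0.1:20002");
// 5. Watch the member state until it returns to SECONDARY
rs.status();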

4. MongoDB machine load is extremely high

Problem description: this scenario occurred under heavy customer traffic. The machine hosting MongoDB carried both a primary and a secondary node; disk IO hit 100%, the database was blocked, and large numbers of slow queries appeared, driving the machine load extremely high and making the application service completely unavailable.

Solution: when more machines cannot be added in time, the first priority is to reduce the machine's IO. When one machine hosts both a primary and a secondary, heavy writes make the two nodes compete for IO. We therefore temporarily gave up MongoDB's high-availability feature and removed the secondary from the replica set, so that only one node per machine used the disk. The load dropped immediately and the service became available again. However, MongoDB can no longer guarantee data safety in this state: if the primary goes down, data will be lost. This is only a stopgap; the fundamental fix is to add memory, use SSDs, or add shards to spread the read and write pressure across machines.

# On the primary node, run the command to remove the member
rs.remove("127.0.0.1:20001");
# Note: do not simply shut the instance down instead

5. Improper selection of MongoDB shard key leads to hot read and hot write

Problem description: in production, a collection's shard key was generated in a way similar to _id, using a field containing a time series as an ascending shard key. As a result, all writes went to a single chunk; as the data volume grew, chunks had to be migrated to other shards, consuming system resources and causing occasional slow queries.

Solution: the temporary fix is to restrict the balancer's migration window to a low-traffic period so that migrations do not affect the business; the fundamental fix is to replace the shard key.

# Connect to a mongos instance, switch to the config database, and run:
db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "4:00" } } }, true );
# View the balancing window
sh.getBalancerWindow();

6. MongoDB optimization suggestions

1. Application level optimization

Query optimization: confirm that your queries make full use of indexes, use the explain command to check how a query executes, add the indexes that are needed, and avoid full collection scans.
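
A small sketch of the explain workflow suggested above; the collection name, field, and index are illustrative:

// Inspect how the query runs; look for IXSCAN (index used) rather than COLLSCAN (full scan)
db.users.find({ age: { $gt: 30 } }).explain("executionStats");
// If the query scans the whole collection, add a suitable index
db.users.createIndex({ age: 1 });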

Reasonable shard key design: an incremental (ascending) shard key suits fields that can be divided into ranges, such as integer, float, and date types, and keeps range queries fast. A random (hashed) shard key suits write-heavy workloads: with a single ascending key all writes land on one shard, whose load becomes higher than the others and the cluster is unbalanced, so hashing the key is preferred to spread writes across multiple shards. A compound shard key can also be considered; the general principles are fast queries, as few cross-shard queries as possible, and a small number of balancing migrations. A single ascending shard key puts all newly written data on the last chunk, so write pressure and data volume build up there and data later has to be migrated to the other shards. MongoDB limits a single document to 16 MB by default, and the shard key design needs particular care when using GridFS: with an unreasonable shard key, many documents end up in one chunk, and because GridFS often stores large files, MongoDB cannot split those documents onto different shards during balancing; it then keeps reporting errors and can eventually bring MongoDB down. The fixes are to increase the chunk size (treats the symptom) and to design a reasonable shard key (treats the root cause). A hashed shard key example follows.
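
A hedged example of the hashed shard key mentioned above; the database, collection, and field names are illustrative:

// Shard on a hashed key so writes spread across shards instead of piling onto the last chunk
sh.enableSharding("dbName");
sh.shardCollection("dbName.collectionName", { userId: "hashed" });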

Monitor with the profiler: check whether profiling is currently enabled with db.getProfilingLevel(), which returns 0, 1, or 2, meaning respectively: 0 off, 1 record slow operations, 2 record everything. Profiling is enabled with db.setProfilingLevel(level). When level is 1, the default slow-operation threshold is 100 ms; use db.setProfilingLevel(level, slowms), e.g. db.setProfilingLevel(1, 50), to change it to 50 ms. View the current profiling log with db.system.profile.find().
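
For example, the profile log can be filtered by duration and namespace (the 100 ms threshold and the namespace below are illustrative; millis and ns are standard fields of system.profile documents):

// Slowest recent operations for one collection, newest first
db.system.profile.find({ millis: { $gt: 100 }, ns: "dbName.collectionName" }).sort({ ts: -1 }).limit(5).pretty();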

2. Hardware level optimization

2.1 Determine the size of the hot data: your full data set may be very large, but what matters more is the size of your hot data set, i.e. the data that is accessed frequently plus all index data. With MongoDB you should make sure the hot data fits within the machine's memory, so that RAM can hold all of it (a rough sizing sketch follows below).

2.2 Choose the right file system: MongoDB's data files are pre-allocated, and in replication the non-arbiter members of master/slave setups and replica sets pre-create large empty files to store the operation log. On some file systems these allocation operations are very slow and block the process, so choose a file system that allocates space quickly; in short, avoid ext3 and use ext4 or xfs.
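
As a rough starting point for the hot-data estimate in 2.1 (this reports the current database's total data and index sizes, not the truly hot subset), a small shell sketch:

// db.stats() reports sizes in bytes by default
var s = db.stats();
print("dataSize (GB):  " + (s.dataSize  / 1024 / 1024 / 1024).toFixed(2));
print("indexSize (GB): " + (s.indexSize / 1024 / 1024 / 1024).toFixed(2));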

3. Architecture optimization

Distribute primary and secondary nodes across different machines as much as possible, and avoid running other IO-intensive applications on the same machine as MongoDB;

7. Summary

MongoDB is high-performance, easy to scale, and easy to use, and when used correctly its performance is very strong. A few key factors, such as the choice of shard key, the amount of memory, and disk IO, are often its biggest bottlenecks. For the shard key, a collection does not have to be sharded in the early stage of a business system, because once a shard key is chosen it cannot be changed; the field can be carefully selected later based on how the business develops.

In general, an ascending shard key (a field that grows steadily over time, such as an auto-incrementing primary key) is not recommended, because it causes localized hot reads and hot writes and prevents the sharded cluster from showing its real strength. A hashed or randomly distributed shard key is recommended so that data is spread evenly across the shard nodes. For memory, it should be large enough to hold the hot data plus the indexes, so that all hot data fits in RAM. For disks, MongoDB's fast reads and writes depend on disk IO; to keep performance up, separate primary and secondary nodes and other high-IO applications onto different machines so that they do not compete for IO resources.

Original link: https://www.jianshu.com/p/f05f65d3a1dc
