Using MongoDB as a pure in-memory database (Redis style)

The basic idea

Using MongoDB as an in-memory database, i.e. not letting MongoDB save any data to disk, is drawing more and more interest. This usage is extremely practical for the following kinds of applications:

  • A write-intensive cache placed in front of a slower RDBMS
  • Embedded systems
  • PCI-compliant systems where no data should be persisted
  • Unit testing, where you need a lightweight database whose data can be wiped easily

If all this could be achieved, it would be really elegant: we could use MongoDB's query/retrieval features without involving disk operations at all. As you probably know, in 99% of cases disk IO (especially random IO) is the bottleneck of a system, and if you want to write data, disk operations cannot normally be avoided.

MongoDB has a very cool design decision: it uses memory-mapped files to handle read and write requests against the data files on disk. In other words, MongoDB does not treat RAM and disk differently at all; it simply treats each file as one huge array, accesses the data byte by byte, and leaves the rest to the operating system (OS)! It is this design decision that allows MongoDB to run in RAM without any modification.
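
As a quick sanity check (a sketch of mine, not part of the original walkthrough), a 2.x mongod reports how much data the process currently has memory-mapped in the mem section of serverStatus, so you can watch this mechanism at work:

# the "mem" section shows resident, virtual and mapped sizes (in MB)
mongo --eval "printjson(db.serverStatus().mem)"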


Implementation

All of this is achieved by using a special type of file system called tmpfs. Under Linux it looks just like a regular file system (FS), but it lives entirely in RAM (unless its size exceeds the available RAM, in which case it can swap to disk, which is quite useful!). My server has 32GB of RAM, so let's create a 16GB tmpfs:

# mkdir /ramdata
# mount -t tmpfs -o size=16000M tmpfs /ramdata/
# df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvde1       5905712 4973924    871792  86% /
none            15344936       0  15344936   0% /dev/shm
tmpfs           16384000       0  16384000   0% /ramdata
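
Note that a mount created this way does not survive a reboot. If you want the tmpfs recreated automatically at boot time, a line like the following in /etc/fstab should do it (a sketch; adjust the size to your system):

tmpfs  /ramdata  tmpfs  size=16000M  0  0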

Then start MongoDB with the appropriate settings. To reduce the amount of wasted RAM, smallfiles and noprealloc should be set to true. Since everything is RAM-based now, these settings do not sacrifice any performance. A journal would be pointless in this setup, so nojournal should be set to true as well.

dbpath=/ramdata
nojournal = true
smallFiles = true
noprealloc = true 
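
For what it's worth, the same settings can also be passed as command-line flags instead of a config file; with a 2.x mongod the following should be equivalent:

mongod --dbpath /ramdata --nojournal --smallfiles --noprealloc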

After starting MongoDB, you will find that it runs perfectly well, and the files show up in the file system as expected:

# mongo
MongoDB shell version: 2.3.2
connecting to: test
> db.test.insert({a:1})
> db.test.find()
{ "_id" : ObjectId("51802115eafa5d80b5d2c145"), "a" : 1 }

# ls -l /ramdata/
total 65684
-rw-------. 1 root root 16777216 Apr 30 15:52 local.0
-rw-------. 1 root root 16777216 Apr 30 15:52 local.ns
-rwxr-xr-x. 1 root root        5 Apr 30 15:52 mongod.lock
-rw-------. 1 root root 16777216 Apr 30 15:52 test.0
-rw-------. 1 root root 16777216 Apr 30 15:52 test.ns
drwxr-xr-x. 2 root root       40 Apr 30 15:52 _tmp

Now let's add some data and verify that everything works properly. We first create a 1KB document, then insert 4 million of them into MongoDB:

> str = ""

> aaa = "aaaaaaaaaa"
aaaaaaaaaa
> for (var i = 0; i < 100; ++i) { str += aaa; }
> for (var i = 0; i < 4000000; ++i) { db.foo.insert({a: Math.random(), s: str}); }
> db.foo.stats()
{
        "ns" : "test.foo",
        "count" : 4000000,
        "size" : 4544000160,
        "avgObjSize" : 1136.00004,
        "storageSize" : 5030768544,
        "numExtents" : 26,
        "nindexes" : 1,
        "lastExtentSize" : 536600560,
        "paddingFactor" : 1,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 129794000,
        "indexSizes" : {
                "_id_" : 129794000
        },
        "ok" : 1
}
As you can see, the average document size is 1136 bytes, and the data takes up about 5GB of storage in total. The _id index is about 130MB. Now we need to verify something very important: is the data duplicated in RAM, kept once in the file system and once more by MongoDB? Remember that MongoDB does not cache any data inside its own process; it only caches data in the file system cache. So let's clear the file system cache and see what is left in RAM:
# echo 3 > /proc/sys/vm/drop_caches 
# free
             total       used       free     shared    buffers     cached
Mem:      30689876    6292780   24397096          0       1044    5817368
-/+ buffers/cache:     474368   30215508
Swap:            0          0          0

We can see that 6.3GB of RAM is in use, of which 5.8GB is file system cache (buffers/cached). Why does the system still hold 5.8GB of file system cache even after all caches were dropped?? The reason is that Linux is smart: it does not keep duplicate copies of the same pages in tmpfs and the page cache. Awesome! This means you have only a single copy of the data in RAM. Now let's access all the documents and verify that RAM usage does not change:

> db.foo.find().itcount()
4000000

# free
             total       used       free     shared    buffers     cached
Mem:      30689876    6327988   24361888          0       1324    5818012
-/+ buffers/cache:     508652   30181224
Swap:            0          0          0
# ls -l /ramdata/
total 5808780
-rw-------. 1 root root  16777216 Apr 30 15:52 local.0
-rw-------. 1 root root  16777216 Apr 30 15:52 local.ns
-rwxr-xr-x. 1 root root         5 Apr 30 15:52 mongod.lock
-rw-------. 1 root root  16777216 Apr 30 16:00 test.0
-rw-------. 1 root root  33554432 Apr 30 16:00 test.1
-rw-------. 1 root root 536608768 Apr 30 16:02 test.10
-rw-------. 1 root root 536608768 Apr 30 16:03 test.11
-rw-------. 1 root root 536608768 Apr 30 16:03 test.12
-rw-------. 1 root root 536608768 Apr 30 16:04 test.13
-rw-------. 1 root root 536608768 Apr 30 16:04 test.14
-rw-------. 1 root root  67108864 Apr 30 16:00 test.2
-rw-------. 1 root root 134217728 Apr 30 16:00 test.3
-rw-------. 1 root root 268435456 Apr 30 16:00 test.4
-rw-------. 1 root root 536608768 Apr 30 16:01 test.5
-rw-------. 1 root root 536608768 Apr 30 16:01 test.6
-rw-------. 1 root root 536608768 Apr 30 16:04 test.7
-rw-------. 1 root root 536608768 Apr 30 16:03 test.8
-rw-------. 1 root root 536608768 Apr 30 16:02 test.9
-rw-------. 1 root root  16777216 Apr 30 15:52 test.ns
drwxr-xr-x. 2 root root        40 Apr 30 16:04 _tmp

# df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvde1       5905712 4973960    871756  86% /
none            15344936       0  15344936   0% /dev/shm
tmpfs           16384000 5808780  10575220  36% /ramdata

As predicted! :)


What about replication?

Since the data lives in RAM, it will be lost when the server reboots, so you will probably want replication. A standard replica set gives you automatic failover and also increases read capacity. If a server is restarted, it can read the data from another server in the same replica set and rebuild its own copy (resynchronization, or resync). Even with large amounts of data and indexes this process is fast enough, because the index operations are all performed in RAM :)
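
As a minimal sketch (the replica set name inmem and the hostnames db1/db2 are made up), each in-memory node would be started with the same --replSet name, and the set initiated once from the shell:

# start each member with the same replica set name
mongod --dbpath /ramdata --replSet inmem --nojournal --smallfiles --noprealloc

# then initiate the set once, from a mongo shell connected to one member;
# a two-member set cannot elect a primary on its own, so add a third
# member or an arbiter (rs.addArb) for real failover
mongo --eval 'rs.initiate({_id: "inmem", members: [
    {_id: 0, host: "db1:27017"},
    {_id: 1, host: "db2:27017"}
]})'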

One important point: write operations go to a special collection called the oplog, which lives in the local database. By default its size is 5% of the total data size. In my case the oplog would occupy 5% of 16GB, i.e. 800MB of space. When in doubt, the safer approach is to pick a fixed oplog size using the oplogSize option. If a secondary is down for longer than the oplog covers, it has to be resynced from scratch. To set the oplog size to 1GB, you can use:

oplogSize = 1000
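
To check the actual oplog size and the time window it currently covers, the standard shell helper can be used:

mongo --eval "db.printReplicationInfo()"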

 


What about sharding?

Now that you have all of MongoDB's query features available, how do you use it to implement a large-scale service? Feel free to use sharding to build a large, scalable in-memory store. Still, the config servers (which hold the chunk distribution) should remain disk-based: there are only a few of them, they see little activity, and rebuilding a cluster from scratch is never fun.
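
As a rough sketch of what that could look like (all hostnames and paths below are made up, and a production cluster needs more members of everything), the shards would run on tmpfs while the config server keeps an ordinary disk-backed dbpath:

# shards live on tmpfs; the config server stays on disk
mongod --shardsvr --dbpath /ramdata --nojournal --smallfiles --noprealloc
mongod --configsvr --dbpath /var/lib/mongo-configdb
mongos --configdb cfg1:27019

# register the shard and enable sharding for a database, from a mongos
mongo --host mongos1 --eval 'sh.addShard("shard1:27018"); sh.enableSharding("test")'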

Precautions

RAM is a scarce resource, and in this case you really want the entire data set to fit in RAM. Although tmpfs is able to swap to disk, the performance degradation would be very significant. To make the most of the RAM, you should consider:

  • Using the usePowerOf2Sizes option to normalize the storage buckets (see the sketch after this list)
  • Running the compact command periodically, or resyncing the nodes
  • Designing the schema to be fairly normalized (to avoid large numbers of fairly large documents)
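
For the first two items, both are ordinary database commands in MongoDB 2.2+; a quick sketch against the foo collection used above:

# usePowerOf2Sizes is set per collection via collMod
mongo --eval 'printjson(db.runCommand({collMod: "foo", usePowerOf2Sizes: true}))'

# compact defragments a collection in place (it blocks the database while it runs)
mongo --eval 'printjson(db.runCommand({compact: "foo"}))'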

Conclusion

Baby, you can now use MongoDB as an in-memory database, with all of its features available! As for performance, it should be quite amazing: in my tests with a single thread/core I was able to reach 20K writes per second, and the write speed should scale up again with the number of cores.
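
If you want to reproduce a rough number on your own hardware, a simple micro-benchmark can be run straight from the shell (a sketch; the collection name bench is made up, and the result depends heavily on your machine):

mongo --eval '
var n = 100000;                          // number of one-field documents to insert
var t0 = Date.now();
for (var i = 0; i < n; ++i) {
    db.bench.insert({a: Math.random()});
}
var secs = (Date.now() - t0) / 1000;
print("inserts/sec: " + Math.round(n / secs));
'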

Origin www.cnblogs.com/ExMan/p/10951552.html