MongoDB: Introduction 1

MongoDB is a database management system designed for web applications and internet infrastructure. The data model and persistence strategies are built for high read and write throughput and the ability to scale easily with automatic failover.

Because a document-based data model can represent rich, hierarchical data structures, it’s often possible to do without the complicated multi-table joins imposed by relational databases.
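As a rough illustration of what such a hierarchical structure might look like, here is a minimal sketch of a blog post held as one document. The field names (title, author, tags, comments, vote_count) and values are assumptions made up for this example, not taken from any real schema.

```javascript
// A hypothetical blog post stored as a single document.
// All field names and values here are illustrative assumptions.
const post = {
  title: "Election coverage",
  author: { name: "alice", email: "alice@example.com" },
  tags: ["politics", "news"],
  vote_count: 12,
  comments: [
    { user: "bob", text: "Interesting read." },
    { user: "carol", text: "I disagree." }
  ]
};

// Nested data is reached by walking the structure, not by joining tables.
const commenters = post.comments.map((c) => c.user);
console.log(commenters); // [ 'bob', 'carol' ]
```

In a relational schema, the tags and comments would typically live in separate tables and be joined back to the post at query time.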

Key features

The document data model

Ad hoc queries

To say that a system supports ad hoc queries is to say that it’s not necessary to define in advance what sorts of queries the system will accept. For instance, key-value stores are queryable on one axis only: the value’s key. Like many other systems, key-value stores sacrifice rich query power in exchange for a simple scalability model. One of MongoDB’s design goals is to preserve most of the query power that’s been so fundamental to the relational database world.

For example, a SQL query to find all posts tagged 'politics' with more than 10 votes would look like this:

SELECT * FROM posts
INNER JOIN posts_tags ON posts.id = posts_tags.post_id
INNER JOIN tags ON posts_tags.tag_id = tags.id
WHERE tags.text = 'politics' AND posts.vote_count > 10;

The equivalent query in MongoDB is:

db.posts.find({'tags': 'politics', 'vote_count': {'$gt': 10}});
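To make the matching rules of that query concrete, the following is a toy in-memory sketch written in plain JavaScript. The matches helper and the sample data are hypothetical; it mimics only two behaviours (an equality condition matching any element of an array field, and the $gt operator) and says nothing about how MongoDB actually evaluates queries.

```javascript
// Toy in-memory data; field names follow the query above, values are made up.
const posts = [
  { title: "A", tags: ["politics", "news"], vote_count: 12 },
  { title: "B", tags: ["politics"], vote_count: 3 },
  { title: "C", tags: ["sports"], vote_count: 40 }
];

// Hypothetical helper: supports only equality and the $gt operator.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return value > cond.$gt;
    }
    // An equality condition on an array field matches if any element equals it.
    return Array.isArray(value) ? value.includes(cond) : value === cond;
  });
}

const result = posts.filter((p) =>
  matches(p, { tags: "politics", vote_count: { $gt: 10 } })
);
console.log(result.map((p) => p.title)); // [ 'A' ]
```

Because the tags are stored inside each post, the filter needs no join: both conditions are checked against a single document.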

Secondary indexes

Once your data set grows to a certain size, indexes become necessary for query efficiency. Proper indexes will increase query and sort speeds by orders of magnitude; consequently, any system that supports ad hoc queries should also support secondary indexes.

Secondary indexes in MongoDB are implemented as B-trees. With MongoDB, you can create up to 64 indexes per collection.
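The sketch below hints at why an ordered index speeds up range queries and sorts. It deliberately swaps in a sorted array with binary search as a stand-in for a B-tree, so it shows the general idea only; the collection, the firstGreaterThan helper, and the data are all assumptions for illustration.

```javascript
// Toy collection; array positions stand in for on-disk record locations.
const docs = [
  { _id: 1, vote_count: 7 },
  { _id: 2, vote_count: 42 },
  { _id: 3, vote_count: 15 },
  { _id: 4, vote_count: 3 }
];

// Build a secondary index on vote_count: (key, position) pairs sorted by key.
// A sorted array is used here instead of a real B-tree.
const index = docs
  .map((doc, pos) => ({ key: doc.vote_count, pos }))
  .sort((a, b) => a.key - b.key);

// Binary search for the first index entry whose key is greater than bound.
function firstGreaterThan(entries, bound) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].key > bound) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Range query vote_count > 10, answered from the index in key order.
const start = firstGreaterThan(index, 10);
const hits = index.slice(start).map((e) => docs[e.pos]._id);
console.log(hits); // [ 3, 2 ]
```

Without the index, the same query would have to inspect every document and then sort the matches; with it, the search takes a logarithmic number of steps and the results come back already ordered by vote_count.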

Replication

MongoDB provides database replication via a topology known as a replica set. Replica sets distribute data across machines for redundancy and automate failover in the event of server and network outages. Additionally, replication is used to scale database reads. If you have a read-intensive application, as is commonly the case on the web, it’s possible to spread database reads across machines in the replica set cluster.

Replica sets consist of exactly one primary node and one or more secondary nodes. A replica set’s primary node can accept both reads and writes, but the secondary nodes are read-only. What makes replica sets unique is their support for automated failover: if the primary node fails, the cluster will pick a secondary node and automatically promote it to primary. When the former primary comes back online, it’ll do so as a secondary.
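The failover behaviour described above can be sketched with a toy model. The ReplicaSet class below is hypothetical and greatly simplified: it promotes the first healthy member it finds, whereas a real replica set chooses its new primary through an election among the members.

```javascript
// Hypothetical toy model of a replica set; not how MongoDB elects a primary.
class ReplicaSet {
  constructor(names) {
    // The first member starts as primary; the rest are secondaries.
    this.members = names.map((name, i) => ({
      name,
      role: i === 0 ? "primary" : "secondary",
      up: true
    }));
  }

  // Return the healthy primary, or null if there is none.
  primary() {
    return this.members.find((m) => m.up && m.role === "primary") || null;
  }

  // Mark a node as down; if it was the primary, promote a healthy secondary.
  fail(name) {
    const node = this.members.find((m) => m.name === name);
    node.up = false;
    if (node.role === "primary") {
      node.role = "secondary";
      const next = this.members.find((m) => m.up);
      if (next) next.role = "primary";
    }
  }

  // A recovered node always rejoins as a secondary.
  recover(name) {
    const node = this.members.find((m) => m.name === name);
    node.up = true;
    node.role = "secondary";
  }
}

const rs = new ReplicaSet(["a", "b", "c"]);
rs.fail("a");
const afterFailover = rs.primary().name;
rs.recover("a");
const afterRecovery = rs.primary().name;
console.log(afterFailover, afterRecovery); // b b
```

Note that node "a" does not reclaim the primary role when it returns; it stays a secondary, as the text describes.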

Speed and durability

Write speed can be understood as the volume of inserts, updates, and deletes that a database can process in a given time frame. Durability refers to the level of assurance that these write operations have been made permanent.

In MongoDB’s case, users control the speed and durability trade-off by choosing write semantics and deciding whether to enable journaling. All writes, by default, are fire-and-forget, which means that these writes are sent across a TCP socket without requiring a database response. If users want a response, they can issue a write using a special safe mode provided by all drivers. This forces a response, ensuring that the write has been received by the server with no errors. Safe mode is configurable; it can also be used to block until a write has been replicated to some number of servers. For high-volume, low-value data (like clickstreams and logs), fire-and-forget-style writes can be ideal. For important data, a safe-mode setting is preferable.
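The difference between the two write styles can be pictured with a small in-memory sketch. Everything here is an assumption for illustration: the server object, the insert function, and the safe option are hypothetical stand-ins, no network or real driver is involved, and the sketch shows only the visibility of errors, not the speed difference.

```javascript
// Hypothetical in-memory stand-in for a server; rejects duplicate _id values.
const server = {
  stored: new Map(),
  apply(doc) {
    if (this.stored.has(doc._id)) return { ok: false, error: "duplicate _id" };
    this.stored.set(doc._id, doc);
    return { ok: true };
  }
};

// Hypothetical driver call.
// safe: false -> fire-and-forget: the caller never sees the server's reply.
// safe: true  -> the reply is returned, so errors become visible.
function insert(doc, { safe = false } = {}) {
  const reply = server.apply(doc);
  return safe ? reply : undefined;
}

insert({ _id: 1, text: "first" });
const unacked = insert({ _id: 1, text: "dup" });               // failure goes unnoticed
const acked = insert({ _id: 1, text: "dup" }, { safe: true }); // failure is reported
console.log(unacked, acked);
```

With the fire-and-forget call, the duplicate is silently dropped; with the safe call, the caller learns that the write failed and can react.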

Scaling

The technique of augmenting a single node’s hardware for scale is known as vertical scaling, or scaling up. The alternative is to scale horizontally, or scale out: scaling horizontally means distributing the database across multiple machines.

MongoDB has been designed to make horizontal scaling manageable. It does so via a range-based partitioning mechanism, known as auto-sharding, which automatically manages the distribution of data across nodes. The sharding system handles the addition of shard nodes, and it also facilitates automatic failover. Individual shards are made up of a replica set consisting of at least two nodes, ensuring automatic recovery with no single point of failure.
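Range-based partitioning can be sketched as a lookup from a shard key to the range that contains it. The chunk boundaries, shard names, and the shardFor helper below are hypothetical; a real cluster maintains and rebalances these ranges automatically, which this sketch does not attempt.

```javascript
// Hypothetical chunk table: each shard owns a half-open range [min, max)
// of a numeric shard key (for example, a user ID).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard-a" },
  { min: 1000, max: 5000, shard: "shard-b" },
  { min: 5000, max: Infinity, shard: "shard-c" }
];

// Route a shard key to the shard whose range contains it.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

console.log(shardFor(42), shardFor(1000), shardFor(99999));
// shard-a shard-b shard-c
```

Because the ranges are contiguous and non-overlapping, every key maps to exactly one shard, and a query on a range of keys only needs to visit the shards whose ranges overlap it.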

The difference between various databases