8 Mainstream NoSQL Database System Features Comparison and Best Application Scenarios

Kristóf Kovács, a software architect and consultant who has worked for several large companies, made a comprehensive comparison of the mainstream NoSQL databases (Cassandra, Mongodb, CouchDB, Redis, Riak, Membase, Neo4j and HBase) in his blog.

While SQL databases are very useful tools, the monopoly is about to be broken after 15 years of monopoly. It's only a matter of time: there are countless situations where you're forced to use a relational database, only to find it doesn't fit your needs.

But the difference between NoSQL databases is far more than the difference between the two SQL databases. This means that software architects should choose a suitable NoSQL database at the beginning of the project. For this situation, here is a comparison of Cassandra, Mongodb, CouchDB, Redis, Riak, Membase, Neo4j, and HBase:

Note: NoSQL is a brand-new database revolution, and NoSQL advocates advocate the use of non-relational data stores. Today's computer architectures require massive horizontal scalability in terms of data storage, and NoSQL aims to change that. Currently Google's BigTable and Amazon's Dynamo use NoSQL databases.

1. CouchDB

Language used: Erlang
Features: DB consistency, easy to
use License: Apache
protocol: HTTP/REST
bidirectional data replication,
continuous or temporary processing,
with conflict checking during processing,
therefore, master-master replication is used (see editor's note 2)
MVCC - Write operations do not block read operations
Can save previous versions of files
Crash-only (reliable) design
Requires data compression from time to time
Views: Embedded map/reduce
Formatted views: List display
Support for server-side document validation
support Authentication
Real-time updates based on changes
Support for attachment handling
Therefore, CouchApps (standalone js apps)
require the jQuery library

Best Application Scenario: It is suitable for applications with less data changes, executing predefined queries, and performing data statistics. For applications that need to provide data version support.

For example: CRM, CMS system. Master-master replication is very useful for multi-site deployments.

Note: master-master replication: is a database synchronization method that allows data to be shared among a group of computers, and data can be updated within the group by any member of the group.

2. Redis

Language used: C/C++
Features: Running very fast
License: BSD
protocol: Telnet-like
in-memory database supported by hard disk storage,
but data can be exchanged to hard disk since version 2.0 (note that this feature is not supported in versions after 2.4!)
Master-slave replication (see Note 3)
uses simple data or a hash table indexed by key values, but also supports complex operations such as ZREVRANGEBYSCORE.
INCR & co (suitable for calculating limits or statistics)
supports sets (also supports union/diff/inter)
supports lists (also supports queues; blocking pop operations)
supports hash tables (objects with multiple fields)
Support sorting sets (high score table, suitable for range queries)
Redis supports transactions
Supports setting data as expired data (similar to fast buffer design)
Pub/Sub allows users to implement message mechanism

Best application scenario: It is suitable for applications where data changes rapidly and the database size can be met (suitable for memory capacity).

For example: stock prices, data analysis, real-time data collection, real-time communication.

Note: Master-slave replication: If only one server handles all replication requests at the same time, this is called Master-slave replication and is usually used in server clusters that need to provide high availability.

3. MongoDB

Language used: C++
Features: Some friendly features of SQL (query, index) are retained.
License: AGPL (Originator: Apache)
Protocol: Custom, binary (BSON)
Master/slave replication (supports automatic error recovery, using sets replication)
Built-in sharding mechanism
Supports javascript expression queries
to execute arbitrary javascript on the server side Function
update-in-place support is better than CouchDB
Uses memory-to-file mapping
for data storage Performance concerns over functionality requirements It is
recommended that journaling be turned on (parameter --journal)
On 32-bit operating systems, the database size limit Empty database at about 2.5Gb
accounts for about 192Mb
using GridFS to store big data or metadata (not a real file system)

Best application scenarios: Suitable for applications that require dynamic query support; need to use indexes instead of map/reduce functions; need performance requirements for large databases; need to use CouchDB but fill up memory because data changes too frequently.

For example: You were going to use MySQL or PostgreSQL, but you were discouraged by their own predefined columns.

4. Ripples

Languages ​​used: Erlang and C, and some Javascript
Features: Fault tolerance
License: Apache
Protocol: HTTP/REST or custom binary
Adjustable distribution and reproduction (N, R, W)
with JavaScript or Erlang before or after operation Validation and security support.
Use JavaScript or Erlang for Map/reduce
connection and connection traversal: can be used as a graph database
Index: input metadata for search (version 1.0 will be supported)
Big data object support (Luwak)
provides both "open source" and "enterprise" versions
Full text This search, index, query via Riak search server (beta)
supports Masterless multi-site replication and SNMP monitoring for commercial licenses
  Best use case: For those who want to use a Cassandra-like (Dynamo-like) database but cannot handle bloat and complexity . It is suitable for the situation where you plan to do multi-site replication, but need the scalability, availability and error handling of a single site.

For example: sales data collection, factory control system; has strict requirements on downtime; can be used as a web server that is easy to update.

5. Base

Languages ​​used: Erlang and C
Features: Compatible with Memcache, but both persistence and cluster support
License: Apache 2.0
Protocol: Distributed cache and expansion
Very fast (200k+/sec), data
can be persisted through key-value indexing to Hard Disk
All nodes are unique (master-master replication)
In-memory also supports distributed cache-like cache units
Reduces IO by deduplicating data when writing data
Provides very good cluster management Web interface
Soft does not need to stop the database when updating software Services
Connection brokers that support connection pooling and multiplexing

Best use case: For applications that require low-latency data access, high concurrency support, and high availability

For example: low-latency data access such as advertising-targeted applications, high-concurrency web applications such as online games (eg Zynga)

6. Neo4j

Language used: Java
Features: Relational based graph database
License: GPL, some features use AGPL/Commercial License
Agreement: HTTP/REST (or embedded in Java) Nodes and edges
that can be used standalone or embedded in Java applications
graph All can have metadata
Good built-in web management capabilities
Supports path search using multiple algorithms
Indexes using key-value and relational
Optimized for read operations Supports
transactions (with Java api) using Gremlin
graph traversal language Backup, Advanced Monitoring and High Reliability Supported with AGPL/Commercial License

Best application scenario: suitable for data such as graphics. This is the most significant difference between Neo4j and other nosql databases

For example: social relations, public transport networks, maps and network topology

7. Cassandra

Language used: Java
Features: Best support for large tables and Dynamo
License: Apache
Protocol: Custom, binary (thrifty)
Adjustable distribution and replication (N, R, W)
support via a range of key values Column query
Large table-like functionality: columns, a set of columns for a feature
Write operations are faster than read operations
Map/reduce as much as possible based on the Apache distributed platform
I admit to being biased against Cassandra, partly because of its bloat and complexity properties, also because of Java issues (configuration, exceptions, etc.)

Best use case: when using more writes than reads (logging) if every system component has to be written in Java (no one gets fired for choosing Apache software)

For example: banking, finance (although not necessary for financial transactions, but these industries will have bigger database requirements than they are) write is faster than read, so a natural feature is real-time data analysis

8. HBase

(use with ghshephard)

Language used: Java
Features: Support billions of rows X millions of columns
License: Apache
protocol: HTTP/REST (support Thrift, see Note 4)
Modeled after BigTable
Use distributed architecture Map/reduce
for real-time query Optimized
High-performance Thrift gateway Predicts
query operations by scanning and filtering on the server side
Supports XML, Protobuf, and binary HTTP
Cascading, hive, and pig source and sink modules
Jruby (JIRB)-based shell
for configuration changes and less The upgrade will be rolled back again, there will
be no single point of failure
, comparable to the random access performance of MySQL

Best Application Scenario: Suitable for occasions where BigTable is preferred :) and random, real-time access to big data is required.

For example: Facebook message database (more general use cases coming soon)

Note: Thrift is an interface definition language that provides definition and creation services for many other languages, developed and open sourced by Facebook.


Of course, all systems have more than the characteristics listed above. Here I just list some of the important features that I think are important from my own perspective. At the same time, technological progress is rapid, so the above will definitely need to be updated constantly. I will do my best to update this list.

<!--endmain-->

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=326487270&siteId=291194637