35+ Use Cases for Choosing Your Next NoSQL Database

Monday, June 20, 2011 at 8:50AM

We've asked "What The Heck Are You Actually Using NoSQL For?" We've asked "101 Questions To Ask When Considering A NoSQL Database." We've even had a webinar, "What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications."

Now we get to the point of considering use cases and which systems might be appropriate for those use cases.

What are your options?

First, let's cover the various data models. These have been adapted from Emil Eifrem and NoSQL databases.

Document Databases

Lineage: Inspired by Lotus Notes.
Data model: Collections of documents, which contain key-value collections.
Example: CouchDB, MongoDB
Good at: Natural data modeling. Programmer friendly. Rapid development. Web friendly, CRUD.
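
A minimal sketch of the document model, using plain Python dicts as a stand-in for a document store. The `orders` collection and the `find` and `order_total` helpers are hypothetical illustrations, not CouchDB or MongoDB APIs. An order embeds its line items, so reading it back needs no join, and two documents in the same collection need not share a schema.

```python
from typing import Any

# A "collection" is a list of documents; each document is a nested dict.
orders: list[dict[str, Any]] = [
    {
        "_id": "order-1",
        "customer": {"name": "Ada", "email": "ada@example.com"},
        "items": [  # line items are embedded, not stored in a separate table
            {"sku": "book-42", "qty": 1, "price": 12.50},
            {"sku": "pen-7", "qty": 3, "price": 1.25},
        ],
        "status": "shipped",
    },
    {
        "_id": "order-2",
        "customer": {"name": "Bob"},  # no email field: documents need not share a schema
        "items": [{"sku": "mug-1", "qty": 2, "price": 8.00}],
        "status": "pending",
    },
]


def find(collection: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Return documents whose top-level fields equal every given criterion."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]


def order_total(doc: dict[str, Any]) -> float:
    # The whole order lives in one document, so the total needs no join.
    return sum(item["qty"] * item["price"] for item in doc["items"])


shipped = find(orders, status="shipped")
print(shipped[0]["_id"], order_total(shipped[0]))  # order-1 16.25
```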

Graph Databases

Lineage: Euler and graph theory.
Data model: Nodes & relationships, both of which can hold key-value pairs.
Example: AllegroGraph, InfoGrid, Neo4j
Good at: Rock complicated graph problems. Fast.
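
The node-and-relationship model can be sketched with adjacency lists. This assumes a purely in-memory structure rather than any particular graph database's API; the `Graph` class is hypothetical. Nodes and relationships both carry key-value properties, and a breadth-first traversal answers a "friends of friends" question by following relationships instead of joining tables.

```python
from collections import deque
from typing import Any


class Graph:
    """Toy property graph: nodes and relationships both hold key-value pairs."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict[str, Any]] = {}
        # node id -> list of (relationship type, target id, relationship properties)
        self.edges: dict[str, list[tuple[str, str, dict[str, Any]]]] = {}

    def add_node(self, node_id: str, **props: Any) -> None:
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def relate(self, src: str, rel: str, dst: str, **props: Any) -> None:
        self.edges[src].append((rel, dst, props))

    def reachable(self, start: str, rel: str, max_depth: int) -> dict[str, int]:
        """Breadth-first search over `rel` relationships up to max_depth hops.

        Returns each reachable node (excluding start) with its hop distance.
        """
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if seen[node] == max_depth:
                continue  # do not expand past the depth limit
            for r, dst, _props in self.edges.get(node, []):
                if r == rel and dst not in seen:
                    seen[dst] = seen[node] + 1
                    queue.append(dst)
        del seen[start]
        return seen


g = Graph()
for name in ["alice", "bob", "carol", "dave"]:
    g.add_node(name, name=name.title())
g.relate("alice", "KNOWS", "bob", since=2009)
g.relate("bob", "KNOWS", "carol", since=2010)
g.relate("carol", "KNOWS", "dave", since=2011)

# Friends of friends: everyone within two KNOWS hops of alice.
print(g.reachable("alice", "KNOWS", max_depth=2))  # {'bob': 1, 'carol': 2}
```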

Relational Databases

Lineage: E. F. Codd in "A Relational Model of Data for Large Shared Data Banks"
Data Model: a set of relations
Example: VoltDB, Clustrix, MySQL
Good at: High performing, scalable OLTP. SQL access. Materialized views. Transactions matter. Programmer friendly transactions.

Object Oriented Databases

Lineage: Graph Database Research
Data Model: Objects
Example: Objectivity, Gemstone

Key-Value Stores

Lineage: Amazon's Dynamo paper and Distributed Hash Tables.
Data model: A global collection of KV pairs.
Example: Membase, Riak
Good at: Handles size well. Processing a constant stream of small reads and writes. Fast. Programmer friendly.
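
Dynamo-style key-value stores typically spread keys across machines with consistent hashing, the technique behind many distributed hash tables. A minimal sketch, assuming MD5 as the hash and a fixed number of virtual nodes per server; the `HashRing` class and the `kv-*` node names are hypothetical, not Membase or Riak code. The property it demonstrates is that adding a server only moves the keys that now land on that server.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any well-distributed hash works; MD5 is used here only for stable output.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each server gets several positions ("virtual nodes") for smoother balance.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first position at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["kv-a", "kv-b", "kv-c"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add("kv-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Only keys that now belong to the new server move; the rest stay put.
print(len(moved), all(after[k] == "kv-d" for k in moved))
```

A plain `hash(key) % number_of_servers` scheme would instead reshuffle most keys whenever a server is added, which is why the ring is the usual choice for live addition and removal of machines.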

BigTable Clones

Lineage: Google's BigTable paper.
Data model: Column family, i.e. a tabular model where each row at least in theory can have an individual configuration of columns.
Example: HBase, Hypertable, Cassandra
Good at: Handles size well. Stream massive write loads. High availability. Multiple data centers. MapReduce.
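
In the column-family model each row maps column families to its own set of columns, and rows are kept sorted by key so range scans are cheap. A minimal in-memory sketch, assuming a single `info` family and hypothetical row keys; HBase, Hypertable, and Cassandra persist this layout to disk and distribute it across servers, which this toy version does not attempt.

```python
import bisect
from typing import Any

# row key -> column family -> column name -> value
Table = dict[str, dict[str, dict[str, Any]]]

table: Table = {}
row_keys: list[str] = []  # kept sorted, like rows within a BigTable tablet


def put(row: str, family: str, column: str, value: Any) -> None:
    if row not in table:
        bisect.insort(row_keys, row)
        table[row] = {}
    table[row].setdefault(family, {})[column] = value


def scan(start: str, stop: str) -> list[tuple[str, dict[str, dict[str, Any]]]]:
    """Return rows with start <= key < stop, in key order."""
    lo = bisect.bisect_left(row_keys, start)
    hi = bisect.bisect_left(row_keys, stop)
    return [(key, table[key]) for key in row_keys[lo:hi]]


# Each row can carry a different set of columns within the same family.
put("user:001", "info", "name", "Ada")
put("user:001", "info", "email", "ada@example.com")
put("user:002", "info", "name", "Bob")
put("user:002", "info", "twitter", "@bob")
put("user:100", "info", "name", "Zed")

for key, families in scan("user:000", "user:050"):
    print(key, sorted(families["info"]))
```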

Data Structure Servers

Lineage: ?
Data model: Operations over dictionaries, lists, sets and string values.
Example: Redis
Good at: Quirky stuff you never thought of using a database for before.
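
Redis exposes operations on lists, sets, and other structures directly. A rough in-memory imitation of two common idioms, assuming Python stand-ins rather than a running Redis server: a capped log (Redis `LPUSH` followed by `LTRIM`) and set intersection (`SINTER`). The `MiniStore` class is hypothetical; with the redis-py client the corresponding calls would be `r.lpush`, `r.ltrim`, `r.sadd`, and `r.sinter`.

```python
from collections import defaultdict


class MiniStore:
    """Toy stand-in for a data structure server; not the Redis protocol."""

    def __init__(self) -> None:
        self.lists: defaultdict[str, list[str]] = defaultdict(list)
        self.sets: defaultdict[str, set[str]] = defaultdict(set)

    def lpush(self, key: str, *values: str) -> int:
        # Like Redis LPUSH: each value is pushed onto the head in turn.
        for value in values:
            self.lists[key].insert(0, value)
        return len(self.lists[key])

    def ltrim(self, key: str, start: int, stop: int) -> None:
        # Like Redis LTRIM: keep elements start..stop inclusive.
        # Unlike Redis, this sketch only handles non-negative indexes.
        self.lists[key] = self.lists[key][start:stop + 1]

    def sadd(self, key: str, *members: str) -> None:
        self.sets[key].update(members)

    def sinter(self, *keys: str) -> set[str]:
        return set.intersection(*(self.sets[k] for k in keys))


db = MiniStore()

# Capped log: keep only the 3 most recent events.
for event in ["login", "view", "click", "logout"]:
    db.lpush("log:user42", event)
    db.ltrim("log:user42", 0, 2)
print(db.lists["log:user42"])  # ['logout', 'click', 'view']

# Mutual friends via set intersection.
db.sadd("friends:ann", "bob", "cat", "dan")
db.sadd("friends:eve", "cat", "dan", "fay")
print(sorted(db.sinter("friends:ann", "friends:eve")))  # ['cat', 'dan']
```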

Grid Databases

Lineage: Data Grid and Tuple Space research.
Data Model: Space Based Architecture
Example: GigaSpaces, Coherence
Good at: High performance and scalable transaction processing.
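
Space-based architectures descend from Linda-style tuple spaces, where processes coordinate by writing tuples into a shared space and taking them out by pattern. A minimal single-process sketch, assuming `None` acts as a wildcard in templates; the `TupleSpace` class is hypothetical, and products such as GigaSpaces or Coherence add distribution, transactions, and blocking reads that this omits.

```python
from typing import Any, Optional

Template = tuple[Optional[Any], ...]


class TupleSpace:
    def __init__(self) -> None:
        self._tuples: list[tuple[Any, ...]] = []

    def write(self, tup: tuple[Any, ...]) -> None:
        self._tuples.append(tup)

    @staticmethod
    def _matches(tup: tuple[Any, ...], template: Template) -> bool:
        # None in a template field matches any value in that position.
        return len(tup) == len(template) and all(
            t is None or t == v for t, v in zip(template, tup))

    def read(self, template: Template) -> Optional[tuple[Any, ...]]:
        """Return a matching tuple without removing it, or None."""
        return next((t for t in self._tuples if self._matches(t, template)), None)

    def take(self, template: Template) -> Optional[tuple[Any, ...]]:
        """Remove and return a matching tuple, or None if nothing matches."""
        for i, t in enumerate(self._tuples):
            if self._matches(t, template):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.write(("order", 1, "new"))
space.write(("order", 2, "new"))

# A worker claims one new order; take() removes it so no other worker gets it.
claimed = space.take(("order", None, "new"))
space.write(("order", claimed[1], "processed"))
print(claimed, space.read(("order", None, "processed")))
```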

What should your application use?

The key point is to rethink how your application could work differently in terms of the different data models and the different products. Right data model for the right problem. Right product for the right problem.
To see what models might help your application, take a look at "What The Heck Are You Actually Using NoSQL For?" In that article I tried to pull together many unconventional use cases showing the different qualities and features developers have used in building systems.
Match what you need to do with these use cases. From there you can backtrack to the products you may want to include in your architecture. NoSQL, SQL, it doesn't matter.
Look at Data Model + Product Features + Your Situation. Products have such different feature sets it's almost impossible to recommend by pure data model alone.
Which option is best is determined by your priorities.

If your application needs...

complex transactions because you can't afford to lose data, or you would like a simple transaction programming model, then look at a Relational or Grid database.
- Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they later said it was out of stock. I did not want a compensated transaction. I wanted my item!
to scale, then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.
to always be able to write to a database because you need high availability, then look at Bigtable Clones, which feature eventual consistency.
to handle lots of small continuous reads and writes that may be volatile, then look at Document or Key-value databases, or databases offering fast in-memory access. Also consider SSDs.
to implement social network operations, then you may want, first, a Graph database or, second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too.
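
The inventory example calls for an all-or-nothing transaction. A small sketch using Python's built-in `sqlite3` module as a stand-in relational database; the `stock` and `orders` tables and the `buy` function are hypothetical. The conditional decrement and the order insert commit together or not at all, so a customer is never sold an item that is already out of stock.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('widget', 1)")
conn.commit()


class OutOfStock(Exception):
    pass


def buy(sku: str, qty: int) -> None:
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        # Check and decrement in one statement, so two buyers cannot both pass the check.
        cur = conn.execute(
            "UPDATE stock SET qty = qty - ? WHERE sku = ? AND qty >= ?",
            (qty, sku, qty))
        if cur.rowcount == 0:
            raise OutOfStock(sku)
        conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))


buy("widget", 1)          # succeeds: stock goes from 1 to 0
try:
    buy("widget", 1)      # fails: nothing is decremented, no order is recorded
except OutOfStock:
    print("out of stock, order rejected")

print(conn.execute("SELECT qty FROM stock").fetchone(),
      conn.execute("SELECT COUNT(*) FROM orders").fetchone())  # (0,) (1,)
```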

If your application needs...

to operate over a wide variety of access patterns and data types, then look at a Document database; they are generally flexible and perform well.
powerful offline reporting with large datasets, then look first at Hadoop and second at products that support MapReduce. Supporting MapReduce isn't the same as being good at it.
to span multiple data centers, then look at Bigtable Clones and other products that offer a distributed option that can handle long latencies and is partition tolerant.
to build CRUD apps, then look at a Document database; it makes it easy to access complex data without joins.
built-in search, then look at Riak.
to operate on data structures like lists, sets, queues, and publish-subscribe, then look at Redis. It is useful for distributed locking, capped logs, and a lot more.
programmer friendliness in the form of programmer-friendly formats and interfaces like JSON, HTTP, REST, and JavaScript, then look first at Document databases and then at Key-value databases.
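
Offline reporting with MapReduce splits a job into a map phase that emits key-value pairs, a shuffle that groups them by key, and a reduce phase that aggregates each group. A single-process sketch of that flow, assuming a hypothetical page-view log with one "<date> <url>" entry per line; Hadoop runs the same phases across many machines and handles the distribution and failures, which this does not.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Each log line is "<date> <url>"; emit (url, 1) per page view.
    _date, url = line.split()
    yield url, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group every emitted value under its key, as the framework would.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


log = [
    "2011-06-20 /home",
    "2011-06-20 /about",
    "2011-06-21 /home",
    "2011-06-21 /home",
]

pairs = (pair for line in log for pair in map_phase(line))
report = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(report)  # {'/home': 3, '/about': 1}
```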

If your application needs...

transactions combined with materialized views for real-time data feeds, then look at VoltDB. It is great for data rollups and time windowing.
enterprise-level support and SLAs, then look for a product that makes a point of catering to that market. Membase is an example.
to log continuous streams of data that need no consistency guarantees at all, then look at Bigtable Clones, because they generally work on distributed file systems that can handle a lot of writes.
to be as simple as possible to operate, then look for a hosted or PaaS solution, because it will do all the work for you.
to be sold to enterprise customers, then consider a Relational Database, because those customers are used to relational technology.
to dynamically build relationships between objects that have dynamic properties, then consider a Graph Database, because it often does not require a schema and models can be built incrementally through programming.
to support large media, then look at storage services like S3. NoSQL systems tend not to handle large BLOBs, though MongoDB has a file service.
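
A materialized view for a real-time feed can be maintained incrementally: each incoming event updates a pre-aggregated rollup instead of re-scanning raw data at query time. A minimal sketch of per-minute tumbling-window counts, assuming events arrive as (Unix timestamp, metric) pairs; the `Rollup` class is hypothetical, and a product such as VoltDB would typically declare this as a SQL view rather than Python code.

```python
from collections import Counter


class Rollup:
    """Incrementally maintained count per (window start, metric)."""

    def __init__(self, window_seconds: int = 60) -> None:
        self.window = window_seconds
        self.counts: Counter[tuple[int, str]] = Counter()

    def ingest(self, ts: int, metric: str) -> None:
        # Align the timestamp to the start of its tumbling window.
        window_start = ts - ts % self.window
        self.counts[(window_start, metric)] += 1

    def query(self, window_start: int, metric: str) -> int:
        # Reads are a dictionary lookup; no raw events are scanned.
        return self.counts[(window_start, metric)]


view = Rollup(window_seconds=60)
for ts, metric in [(0, "signup"), (30, "signup"), (59, "login"), (61, "signup")]:
    view.ingest(ts, metric)

print(view.query(0, "signup"), view.query(60, "signup"))  # 2 1
```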

If your application needs...

to bulk upload lots of data quickly and efficiently, then look for a product that supports that scenario. Most will not, because they don't support bulk operations.
an easier upgrade path, then use a fluid-schema system like a Document Database or a Key-value Database, because it supports optional fields, adding fields, and deleting fields without the need to build an entire schema migration framework.
to implement integrity constraints, then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.
a very deep join depth, then use a Graph Database, because it supports blisteringly fast navigation between entities.
to move behavior close to the data so the data doesn't have to be moved over the network, then look at stored procedures of one kind or another. These can be found in Relational, Grid, Document, and even Key-value databases.
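
Declaring integrity constraints in SQL DDL lets the database reject bad writes itself, rather than relying on every piece of application code to check. A small sketch with Python's built-in `sqlite3` module and hypothetical `customers` and `orders` tables; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and the exact error messages vary by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL CHECK (amount > 0)
    );
""")
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'ada@example.com')")
conn.commit()


def try_insert(sql: str, params: tuple) -> str:
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Valid order, unknown customer (FOREIGN KEY), negative amount (CHECK), duplicate email (UNIQUE).
print(try_insert("INSERT INTO orders (customer_id, amount) VALUES (?, ?)", (1, 9.99)))
print(try_insert("INSERT INTO orders (customer_id, amount) VALUES (?, ?)", (2, 5.00)))
print(try_insert("INSERT INTO orders (customer_id, amount) VALUES (?, ?)", (1, -3.0)))
print(try_insert("INSERT INTO customers (email) VALUES (?)", ("ada@example.com",)))
```

The first call prints `ok`; the other three are rejected by the database, and none of the rejected rows is stored.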

If your application needs...

to cache or store BLOB data, then look at a Key-value store. Caching can be for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on.
a proven track record, like not corrupting data and just generally working, then pick an established product, and when you hit scaling (or other) issues, use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc.).
fluid data types because your data isn't tabular in nature, requires a flexible number of columns, has a complex structure, or varies by user (or whatever), then look at Document, Key-value, and Bigtable Clone databases. Each has a lot of flexibility in its data types.
other business units to run quick relational queries so you don't have to reimplement everything, then use a database that supports SQL.
to operate in the cloud and automatically take full advantage of cloud features, then we may not be there yet.

If your application needs...

support for secondary indexes so you can look up data by different keys, then look at relational databases and Cassandra's new secondary index support.
to create an ever-growing set of data (really BigData) that rarely gets accessed, then look at a Bigtable Clone, which will spread the data over a distributed file system.
to integrate with other services, then check whether the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.
fault tolerance, then check how durable writes are in the face of power failures, partitions, and other failure scenarios.
to push the technological envelope in a direction nobody seems to be going, then build it yourself, because that's what it takes to be great sometimes.
to work on a mobile platform, then look at CouchDB/Mobile Couchbase.
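
A secondary index is a second mapping, from some attribute back to primary keys, that the database keeps in step with every write so data can be found by something other than its key. A minimal sketch over a plain key-value dict, assuming a single indexed field `city`; the `IndexedStore` class is hypothetical and does not reflect how relational databases or Cassandra implement their indexes internally.

```python
from collections import defaultdict
from typing import Any, Optional


class IndexedStore:
    def __init__(self, indexed_field: str) -> None:
        self.field = indexed_field
        self.rows: dict[str, dict[str, Any]] = {}                  # primary key -> row
        self.index: defaultdict[Any, set[str]] = defaultdict(set)  # field value -> keys

    def put(self, key: str, row: dict[str, Any]) -> None:
        old = self.rows.get(key)
        if old is not None and self.field in old:
            # Remove the stale index entry before overwriting the row.
            self.index[old[self.field]].discard(key)
        self.rows[key] = row
        if self.field in row:
            self.index[row[self.field]].add(key)

    def get(self, key: str) -> Optional[dict[str, Any]]:
        return self.rows.get(key)

    def find_by(self, value: Any) -> list[dict[str, Any]]:
        # Look up by the secondary key without scanning every row.
        return [self.rows[k] for k in sorted(self.index.get(value, ()))]


users = IndexedStore("city")
users.put("u1", {"name": "Ada", "city": "London"})
users.put("u2", {"name": "Bob", "city": "Paris"})
users.put("u3", {"name": "Cat", "city": "London"})
users.put("u2", {"name": "Bob", "city": "London"})  # Bob moves; the index follows

print([u["name"] for u in users.find_by("London")])  # ['Ada', 'Bob', 'Cat']
print(users.find_by("Paris"))  # []
```

The cost of the faster reads is extra work on every write, which is the same trade-off real secondary indexes make.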

Which is Better?

Moving for a 25% performance improvement is probably not a reason to go NoSQL.
Benchmark relevancy depends on the use case. Does it match your situation(s)? Test with your own workload before you commit.
Are you a startup that needs to release a product as soon as possible while you are playing around with ideas? Both SQL and NoSQL can make an argument.
Performance may be equal on one box, but what happens when you need N?
Everything has problems. If you look at the Amazon forums, it's "EBS is slow" or "my instances won't reply", etc. For GAE it's "the datastore is slow" or X. Every product that people are using will have problems. Are you OK with the problems of the system you've selected, and are you prepared to deal with them when they hit?

Original article: http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html