Database Storage Model - Data Storage

By storage model, these databases can be divided into the following four categories.

key-value model

The key-value data model takes its main idea from the hash table: a specific key maps to a value pointer that points to the actual data. The biggest advantage of the key-value model for mass data storage is that the model is simple and easy to implement, and it is very well suited to looking up and modifying data by key. However, if a mass storage system needs to focus on batch queries and batch updates, the key-value model is clearly inefficient. Likewise, key-value stores do not support operations with particularly complex logic.
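
As a minimal sketch of the model (not tied to any particular product), a key-value store behaves like a hash table; the Python example below uses only the standard library:

# A toy key-value store: the model is just a hash table
# mapping an opaque key to an opaque value.
store = {}

def put(key, value):
    store[key] = value

def get(key):
    # Lookup by key is the primary (and often only) access path.
    return store.get(key)

put("user:1001", b'{"name": "alice"}')
print(get("user:1001"))  # fast point lookup by key
# A batch query such as "all users named alice" would require
# scanning every value; exactly the weakness described above.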

1)Redis

Redis is essentially an in-memory database with a key-value model: the entire dataset is held in memory for data operations, and the data is periodically written back to disk through asynchronous operations for persistence. Because operations run purely in memory, Redis performance is excellent, handling more than 100,000 reads and writes per second. Performance is not its only strength: its biggest feature is support for complex data structures such as linked lists and sets, together with a rich set of operations on them.
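
A brief sketch of those operations using the third-party redis-py client; the connection parameters and key names are assumptions for illustration:

import redis  # third-party client: pip install redis

r = redis.Redis(host="localhost", port=6379)  # assumed local server

r.set("counter", 0)
r.incr("counter")                # atomic in-memory update

# Beyond plain keys, Redis offers lists, sets, hashes, and more.
r.rpush("tasks", "t1", "t2")     # list (linked-list) operations
r.sadd("tags", "db", "cache")    # set operations
print(r.lrange("tasks", 0, -1))  # [b't1', b't2']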

The main disadvantages are that database capacity is limited by physical memory, so Redis cannot simply be used for high-performance reading and writing of massive data, and that it has no native scaling mechanism, relying instead on the client to implement distributed reads and writes. Redis is therefore best suited to high-performance operations on relatively small data sets.
For more on the underlying implementation of the Redis data structures, see: The underlying principle of redis

2)Dynamo

Dynamo is a distributed key-value storage system proposed by Amazon, built as a highly available and scalable distributed data store. Dynamo is dynamically self-adapting: storage nodes can simply be added to and removed from Dynamo without any manual partitioning or reallocation of data.
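
The ability to add and remove nodes without manual repartitioning rests on consistent hashing, the partitioning scheme described in Amazon's Dynamo paper. The toy Python ring below only illustrates the idea and is not Dynamo code:

import bisect, hashlib

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    """Toy consistent-hash ring: each key is owned by the first
    node clockwise from the key's position on the ring."""
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def owner(self, key):
        positions = [p for p, _ in self.ring]
        i = bisect.bisect(positions, h(key)) % len(self.ring)
        return self.ring[i][1]

    def add(self, node):
        # Only keys between the new node and its predecessor move.
        bisect.insort(self.ring, (h(node), node))

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:1001"))
ring.add("node-d")  # no global repartitioning required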

As a commercial system, Dynamo's technical documentation and source code are not public. For more information, see: Amazon Dynamo System Architecture

columnar storage

Columnar storage largely keeps the traditional table-like data model, but it does not support multi-table operations such as joins. Its main characteristic is that data is stored around columns, whereas a traditional relational database stores data on a row basis. That is, data belonging to the same column is kept on the same disk page as much as possible, rather than keeping the data of the same row together, which saves a great deal of I/O for queries that touch only a few columns.
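
To make the row-versus-column layout concrete, here is a small Python sketch of the same table stored both ways; an aggregate over one column reads far less data in the columnar layout:

# The same table, laid out two ways.
rows = [  # row-oriented: values of one record stored together
    {"id": 1, "name": "a", "amount": 10},
    {"id": 2, "name": "b", "amount": 20},
]

columns = {  # column-oriented: values of one column stored together
    "id":     [1, 2],
    "name":   ["a", "b"],
    "amount": [10, 20],
}

# An analytical query like SUM(amount) reads only one column here,
# instead of paging in every full row as a row store would.
print(sum(columns["amount"]))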

Most columnar databases also support a "column family" feature, grouping several columns into one unit. In short, the advantage of this data model is that it suits applications such as data analysis and data warehousing, which need fast searches over large amounts of data.

1)BigTable

Bigtable is a distributed storage system designed for managing large-scale structured data, able to scale to petabytes of data and thousands of servers.
Essentially, Bigtable is a key-value map. In its authors' words, Bigtable is a sparse, distributed, persistent, multidimensional sorted map.

Consider the "multidimensional", "sorted", and "map" aspects. A Bigtable key has three dimensions: row key, column key, and timestamp. The row key and column key are byte strings and the timestamp is a 64-bit integer; the value is a byte string. A key-value record can thus be written as (row:string, column:string, time:int64) → string.

Example:

table{
  // ...
  "aaaaa" : { // a row
    "A:foo" : { // a column
        15 : "y", // one version
        4 : "m"
      },
    "A:bar" : { // a column
        15 : "d"
      },
    "B:" : { // a column
        6 : "w",
        3 : "o",
        1 : "w"
      }
  },
  // ...
}

When querying, if only the row and column are given, the latest version of the value is returned; if a timestamp is also given, the most recent version whose time is less than or equal to that timestamp is returned. For example, querying "aaaaa"/"A:foo" returns "y"; querying "aaaaa"/"A:foo"/10 returns "m"; and querying "aaaaa"/"A:foo"/2 returns an empty result.
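
A small Python sketch of those lookup rules, using the "A:foo" cell from the example above (an illustration of the semantics, not Bigtable's API):

# Versions of one cell, keyed by timestamp (from the example above).
cell = {15: "y", 4: "m"}  # "aaaaa" / "A:foo"

def lookup(cell, ts=None):
    # No timestamp: return the newest version.
    # With a timestamp: newest version whose time is <= ts.
    candidates = cell if ts is None else {t: v for t, v in cell.items() if t <= ts}
    return candidates[max(candidates)] if candidates else None

print(lookup(cell))      # "y"
print(lookup(cell, 10))  # "m"
print(lookup(cell, 2))   # None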

2)Cassandra and HBase

The Cassandra project was developed at Facebook and open-sourced in 2008. HBase, short for Hadoop Database, is a column-oriented database built on Apache Hadoop. HBase is highly scalable, is regarded as an open-source clone of Bigtable, and can store tables of hundreds of millions of rows.

The data models of both Cassandra and HBase are borrowed from Google's Bigtable: the fields of each row are stored in separate columns, and a collection of such columns is called a column family. Every cell in a column also carries a timestamp attribute, so multiple versions of the same data item can be kept in the column.
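
As an illustration, the third-party happybase client for HBase exposes this model quite directly; the host, table, and column-family names below are assumptions:

import happybase  # third-party HBase client: pip install happybase

conn = happybase.Connection("localhost")  # assumed HBase Thrift host
table = conn.table("users")  # hypothetical table with family "cf"

# Columns are addressed as "family:qualifier".
table.put(b"row-1", {b"cf:name": b"alice", b"cf:city": b"paris"})

# Each cell keeps timestamped versions, as in Bigtable.
print(table.row(b"row-1"))
print(table.cells(b"row-1", b"cf:name", versions=3))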

document storage

The goal of document stores is to bridge the gap between key-value stores (high performance and scalability) and traditional relational databases (rich functionality), combining the best of both worlds. Data is stored mainly as JSON or JSON-like documents, which carry semantics. A document database can be viewed as an upgraded key-value database: it allows values to contain nested key-value structures, and it can generally build indexes on those values to serve upper-layer applications, something an ordinary key-value database cannot do.

1)MongoDB

MongoDB is a scalable, high-performance, open-source document-oriented database written in C++. It sits between relational and non-relational databases. The data structures MongoDB supports are very loose: documents are stored in BSON, a JSON-like binary format, so fairly complex data types can be stored. Its biggest feature is a very powerful query language whose syntax somewhat resembles an object-oriented query language.
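
A brief sketch with the pymongo client; the connection URI, database, and collection names are assumptions for illustration:

from pymongo import MongoClient  # pip install pymongo

client = MongoClient("mongodb://localhost:27017")  # assumed URI
db = client["shop"]  # hypothetical database name

# Documents are nested BSON, so values can themselves hold structure.
db.orders.insert_one({
    "user": {"name": "alice", "city": "paris"},
    "items": [{"sku": "a1", "qty": 2}],
    "total": 19.99,
})

# Unlike a plain key-value store, we can index and query on values.
db.orders.create_index("user.city")
print(db.orders.find_one({"user.city": "paris"}))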

MongoDB mainly targets efficient access to massive data: when the data volume exceeds 50 GB, its access speed is claimed to be more than ten times that of MySQL.
To learn more, see: Illustrating the principle of MongoDB

2)CouchDB

CouchDB is an open-source NoSQL database project released by the Apache Foundation, aimed at document data. It is written in Erlang and stores data in JSON format. Its document structure is very simple, with only three parts: a document ID, a document revision number, and the content.

Its advantage is that the storage format is JSON: as a text format, JSON is easy to learn and can be used widely to transfer data between modules written in different languages.
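
Since CouchDB speaks JSON over HTTP, a document can be created and read with any HTTP client; the sketch below assumes a local server with authentication disabled:

import requests  # pip install requests

base = "http://localhost:5984"  # assumed local CouchDB

# Every document is just ID + revision + JSON content,
# manipulated through CouchDB's HTTP API.
requests.put(f"{base}/notes")  # create the database
resp = requests.put(f"{base}/notes/note-1", json={"text": "hello"})
print(resp.json())  # {'ok': True, 'id': 'note-1', 'rev': '1-...'}

doc = requests.get(f"{base}/notes/note-1").json()
print(doc["_id"], doc["_rev"], doc["text"])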

graph storage

Storing data in a graph structure makes it possible to apply graph-theoretic algorithms directly, such as shortest paths and centrality measures.

1)Neo4J

Neo4j is an embedded, disk-based, fully transactional Java persistence engine. It stores data in a graph structure rather than in tables. Neo4j is massively scalable, handling graphs of billions of nodes, relationships, and properties on a single machine, or scaling out to multiple machines running in parallel.

Graph databases are good at handling large amounts of complex, interconnected, loosely structured data that changes rapidly and is queried frequently. Above all, they avoid the performance degradation that traditional relational databases suffer when a query requires a large number of table joins.
For more information, see: neo4j - Analysis of the underlying storage structure
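
A short sketch with Neo4j's official Python driver and the Cypher query language; the URI, credentials, and node names are assumptions for illustration:

from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # assumed

with driver.session() as session:
    # Relationships are first-class: no join tables needed.
    session.run(
        "MERGE (a:Person {name: $a}) "
        "MERGE (b:Person {name: $b}) "
        "MERGE (a)-[:KNOWS]->(b)", a="alice", b="bob")
    # Traversing friends-of-friends is a single graph pattern,
    # not a chain of relational joins.
    result = session.run(
        "MATCH (a:Person {name: $a})-[:KNOWS*1..2]->(p) "
        "RETURN DISTINCT p.name", a="alice")
    print([r["p.name"] for r in result])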

2)GraphDB

GraphDB is an enterprise graph data storage system developed by Sones in 2007, written in C#. One of GraphDB's strengths is a particular class of problems: data sets containing large numbers of relationships, where processing requires fast and efficient traversal of those relationships.
