Distributed Storage Systems (Part 1): Concepts

A distributed storage system connects a large number of ordinary PC servers over a network and presents their combined storage to users as a single service.

I have recently been reading the book "Large-Scale Distributed Storage Systems" and am writing up excerpts and notes as I go, to understand its principles and architecture in depth and to make them easier to learn. Comments and discussion are welcome.

I. Concepts

Distributed storage systems have the following characteristics:

1. Scalable

A distributed storage system can scale to clusters of hundreds or even thousands of servers, and as the cluster grows, overall system performance grows linearly.

2. Low cost

The automatic fault-tolerance and automatic load-balancing mechanisms of a distributed storage system allow it to be built on ordinary PC servers. In addition, linear scalability makes adding and removing machines very convenient, and the system can be operated and maintained automatically at low cost.

3. High performance

A distributed system must deliver high performance both for the cluster as a whole and for each individual server.

4. Easy to use

A distributed storage system needs to provide easy-to-use external interfaces. It is also expected to have comprehensive monitoring and operations tools and to integrate easily with other systems, for example by importing data from the Hadoop cloud computing system.

 

The main challenge of a distributed storage system lies in persisting data and state information: data consistency must be guaranteed throughout automatic migration, automatic fault tolerance, and concurrent reads and writes.
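One common technique for keeping reads consistent while replicas are written concurrently (used, for example, in Dynamo-style systems; this excerpt does not prescribe it) is a quorum rule: with N replicas, a write waits for W acknowledgements and a read contacts R replicas. If R + W > N, every read set overlaps every write set, so in a simple setting without failures a read always reaches at least one replica holding the latest acknowledged write. The sketch below simulates this with in-memory replicas and version numbers; the class and function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    # One copy of the data item, tagged with a monotonically increasing version.
    version: int = 0
    value: Optional[str] = None


def quorum_ok(n: int, r: int, w: int) -> bool:
    # Read and write sets are guaranteed to share at least one replica.
    return r + w > n


def write(replicas: List[Replica], w: int, version: int, value: str) -> None:
    # Simulate a write acknowledged by only W randomly chosen replicas;
    # the remaining replicas are left stale.
    for rep in random.sample(replicas, w):
        rep.version, rep.value = version, value


def read(replicas: List[Replica], r: int) -> Optional[str]:
    # Contact R randomly chosen replicas and return the newest value seen.
    contacted = random.sample(replicas, r)
    newest = max(contacted, key=lambda rep: rep.version)
    return newest.value


if __name__ == "__main__":
    N, R, W = 3, 2, 2
    assert quorum_ok(N, R, W)
    replicas = [Replica() for _ in range(N)]
    write(replicas, W, version=1, value="v1")
    write(replicas, W, version=2, value="v2")
    # With R + W > N, every read overlaps the latest write set.
    for _ in range(1000):
        assert read(replicas, R) == "v2"
    print("all reads returned the latest value")
```

Choosing R and W trades read latency against write latency: R = 1, W = N makes reads cheap, while R = N, W = 1 makes writes cheap, and both still satisfy R + W > N.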

The technologies involved come mainly from two fields: (1) distributed systems and (2) databases.

II. Classification

The data that distributed storage systems must handle is fairly diverse and can be roughly divided into three categories:

1. Unstructured data, such as office documents, images, audio, and video.

2. Structured data, typically stored in a relational database and representable as a two-dimensional relational table.

3. Semi-structured data, which lies between unstructured and structured data, such as HTML documents. It is generally self-describing: the structure is carried together with the content.

According to the type of data they handle, distributed storage systems can be divided into four categories:

1. Distributed file systems

Internet applications need to store large numbers of images, photos, videos, and other unstructured data. Such data is organized as objects with no relationships between them and is commonly called Blob (Binary Large Object) data. Typical systems include Facebook Haystack and TFS (Taobao File System). Internally, a distributed file system organizes data into data blocks (chunks) of roughly equal size. Each chunk may contain multiple Blob objects or fixed-length blocks, and a large file may be split across multiple chunks; this is the basic principle behind how such systems store data.
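The chunk organization described above can be sketched as follows. This is a simplified illustration, not the actual design of Haystack or TFS: it assumes a fixed chunk capacity (4 MB is an arbitrary example value), appends each small Blob to the current chunk, and locates Blobs through a hypothetical index of (chunk id, offset, length). A separate helper splits a large file into chunk-sized pieces. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical chunk capacity in bytes


@dataclass
class ChunkStore:
    chunk_size: int = CHUNK_SIZE
    chunks: List[bytearray] = field(default_factory=lambda: [bytearray()])
    # blob_id -> (chunk index, offset within chunk, length)
    index: Dict[str, Tuple[int, int, int]] = field(default_factory=dict)

    def put_blob(self, blob_id: str, data: bytes) -> None:
        """Append a small Blob to the current chunk, opening a new chunk if full."""
        if len(data) > self.chunk_size:
            raise ValueError("blob larger than a chunk; use split_file instead")
        if len(self.chunks[-1]) + len(data) > self.chunk_size:
            self.chunks.append(bytearray())
        chunk_id = len(self.chunks) - 1
        offset = len(self.chunks[chunk_id])
        self.chunks[chunk_id].extend(data)
        self.index[blob_id] = (chunk_id, offset, len(data))

    def get_blob(self, blob_id: str) -> bytes:
        # Look up the Blob's location and slice it out of its chunk.
        chunk_id, offset, length = self.index[blob_id]
        return bytes(self.chunks[chunk_id][offset:offset + length])


def split_file(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a large file into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    store = ChunkStore(chunk_size=10)
    store.put_blob("a", b"hello")
    store.put_blob("b", b"world")
    store.put_blob("c", b"!!")      # does not fit in chunk 0, so chunk 1 is opened
    print(store.index)              # {'a': (0, 0, 5), 'b': (0, 5, 5), 'c': (1, 0, 2)}
    print(split_file(b"abcdefghij", chunk_size=4))  # [b'abcd', b'efgh', b'ij']
```

Packing many small Blobs into one large chunk keeps the number of underlying files and the amount of per-object metadata small, which is the main motivation for chunk-based organization.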

2. Distributed key-value systems

Distributed key-value systems store semi-structured data with simple relationships. They provide only create, read, update, and delete (CRUD) operations by primary key. A typical system is Amazon Dynamo. In terms of data structure, a distributed key-value system resembles a traditional hash table, except that it distributes keys across multiple storage nodes in the cluster. Such systems are often used as caches; the well-known Memcache is an example.
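To illustrate how keys can be spread across storage nodes, the sketch below uses consistent hashing, the partitioning technique Dynamo is known for. The text above does not specify a partitioning scheme, so treat this as one possible approach. The node names, the 100 virtual nodes per server, and the use of MD5 as the ring hash are assumptions chosen for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # Map a string to a position on the ring (MD5 chosen only for illustration).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[int] = []          # sorted virtual-node positions
        self._owner: Dict[int, str] = {}    # position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node occupies several positions to even out the load.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._ring.remove(pos)
            del self._owner[pos]

    def node_for(self, key: str) -> str:
        # A key belongs to the first virtual node clockwise from its hash.
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    # Every key that moved now lives on the new node; the rest stay put.
    print(f"{len(moved)} of {len(keys)} keys moved to node-d")
```

Compared with `hash(key) % num_nodes`, consistent hashing only relocates the keys taken over by a new node, which keeps data migration small when the cluster grows or shrinks.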

3. Distributed table systems

Distributed table systems store semi-structured data with more complex relationships. Besides CRUD operations, they also support range scans over the primary key. They borrow many techniques from relational databases, for example supporting transactions to some extent. Typical systems include Google Bigtable and Megastore. However, they do not support more complex operations such as multi-table joins or nested subqueries.
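Primary-key range scans are usually supported by keeping rows sorted by key and splitting the key space into contiguous ranges (Bigtable calls these ranges tablets). The sketch below models that idea with in-memory partitions: a scan visits only the partitions whose ranges overlap the requested keys. The split points and class name are hypothetical, and a real system adds persistence, replication, and a metadata service to locate partitions.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class RangePartitionedTable:
    """Rows sorted by primary key and split into contiguous key ranges."""

    def __init__(self, split_points: List[str]) -> None:
        # Partition i holds keys k with split_points[i-1] <= k < split_points[i].
        self.split_points = sorted(split_points)
        self.partitions: List[Dict[str, str]] = [
            {} for _ in range(len(self.split_points) + 1)
        ]

    def _partition_of(self, key: str) -> int:
        return bisect.bisect_right(self.split_points, key)

    def put(self, key: str, value: str) -> None:
        self.partitions[self._partition_of(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.partitions[self._partition_of(key)].get(key)

    def delete(self, key: str) -> None:
        self.partitions[self._partition_of(key)].pop(key, None)

    def scan(self, start: str, end: str) -> Iterator[Tuple[str, str]]:
        # Yield rows with start <= key < end in key order, visiting only the
        # partitions whose ranges overlap [start, end).
        first, last = self._partition_of(start), self._partition_of(end)
        for p in range(first, last + 1):
            for key in sorted(self.partitions[p]):
                if start <= key < end:
                    yield key, self.partitions[p][key]


if __name__ == "__main__":
    table = RangePartitionedTable(split_points=["g", "p"])
    for k in ["apple", "grape", "kiwi", "lemon", "pear", "zucchini"]:
        table.put(k, k.upper())
    print(list(table.scan("b", "m")))
    # [('grape', 'GRAPE'), ('kiwi', 'KIWI'), ('lemon', 'LEMON')]
```

Range partitioning is what makes ordered scans cheap; a hash-partitioned key-value store would have to query every node to answer the same scan.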

4. Distributed databases

A distributed database is generally evolved from a single-machine relational database and stores structured data. It provides the SQL relational query language and supports complex operations such as multi-table joins, nested subqueries, concurrency control, and database transactions. Typical systems include MySQL Sharding clusters, Amazon RDS, and OceanBase.
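In a MySQL Sharding setup, some routing layer (in the application or in middleware) must decide which database instance holds a given row, usually from a sharding key. The sketch below shows a hypothetical router that takes a user ID modulo the number of shards and maps it to a host and a per-shard table. The shard count, host names, and the `orders_NN` naming scheme are assumptions, not part of any specific product. It also shows why cross-shard work is harder: a query that lacks the sharding key has to be sent to every shard.

```python
from typing import List, Tuple

# Hypothetical cluster layout: 4 MySQL instances, one "orders" table per shard.
SHARD_HOSTS = ["db0.example.internal", "db1.example.internal",
               "db2.example.internal", "db3.example.internal"]


def shard_for(user_id: int, num_shards: int = len(SHARD_HOSTS)) -> int:
    # Simple modulo sharding on the sharding key (user_id).
    return user_id % num_shards


def route_by_user(user_id: int) -> Tuple[str, str]:
    """Return (host, table) for a query that filters on user_id."""
    shard = shard_for(user_id)
    return SHARD_HOSTS[shard], f"orders_{shard:02d}"


def route_fanout() -> List[Tuple[str, str]]:
    """A query without the sharding key must be sent to every shard."""
    return [(host, f"orders_{i:02d}") for i, host in enumerate(SHARD_HOSTS)]


if __name__ == "__main__":
    host, table = route_by_user(10086)
    print(host, table)          # db2.example.internal orders_02
    print(len(route_fanout()))  # 4
```

Plain modulo sharding is easy to reason about, but changing the shard count remaps most rows; that is one reason systems such as OceanBase move sharding and rebalancing into the database itself.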

 

-------------------------------------------

If you have suggestions or questions, feel free to add me on WeChat so we can learn together. Anyone who works in IT, loves IT, and likes to get to the root of things is welcome to join the discussion.

Personal WeChat ID: bboyHan

Interests: Golang, Java, Python, blockchain, architecture design, data analysis, and more.


Source: blog.csdn.net/han0373/article/details/83243159