Background
This article draws mainly on Yang Chuanhui (Rizhao)'s "Large-scale Distributed Storage System Principle Analysis and Architecture Actual Combat", "Big Talk Storage", online resources (see the links at the end of the article), and my own understanding. It aims to sketch the basic trajectory of storage development along with some fundamentals, so that more beginners like me can gain a high-level view.
Storage history
From stand-alone machines to the Internet era, storage as infrastructure has evolved around four goals: low cost, high performance, scalability, and ease of use. Today, storage can be divided into stand-alone storage, centralized storage, distributed storage, cloud storage, and cloud native storage.
The basic form of storage at each stage is as follows
Basic form of each stage
Stored data classification & model
Whether stand-alone, distributed, or cloud storage, each builds a storage data model for particular data types, driven by specific application scenarios.
Data Classification
Data model
Storage type
Three common storage types: block storage, file storage, and object storage
Block storage
Based on the block storage mode, there are two common storage methods:
- DAS (Direct Attached Storage): storage connected directly to the host
- SAN (Storage Area Network): storage connected to hosts over a high-speed network
File storage
Storage attached to the network that provides file storage services, as in NAS (Network Attached Storage).
Object storage
Built on key-value storage. The core idea is to separate the data path (data) from the control path (metadata), build the storage system on Object-based Storage Devices (OSDs), and expose it externally through a RESTful API.
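To make the separation of control path and data path concrete, here is a minimal in-memory sketch. The class name `ObjectStore`, its method names, and the metadata fields are all hypothetical choices for illustration; a real object store keeps metadata and data on separate services and devices and exposes these operations over HTTP.

```python
import hashlib
import uuid


class ObjectStore:
    """Toy object store: metadata and data are kept in separate maps."""

    def __init__(self):
        self._meta = {}  # control path: key -> metadata record
        self._data = {}  # data path: object id -> raw bytes (stands in for an OSD)

    def put(self, key: str, payload: bytes) -> dict:
        """Roughly what an HTTP PUT on the key would do."""
        object_id = uuid.uuid4().hex
        self._data[object_id] = payload          # write the data first
        previous = self._meta.get(key)
        self._meta[key] = {                      # then publish the metadata
            "object_id": object_id,
            "size": len(payload),
            "sha256": hashlib.sha256(payload).hexdigest(),
        }
        if previous is not None:                 # drop the overwritten version
            del self._data[previous["object_id"]]
        return dict(self._meta[key])

    def head(self, key: str) -> dict:
        """Metadata only: touches the control path, never the data path."""
        return dict(self._meta[key])

    def get(self, key: str) -> bytes:
        """Look up the object id in the metadata, then fetch the bytes."""
        return self._data[self._meta[key]["object_id"]]

    def delete(self, key: str) -> None:
        meta = self._meta.pop(key)
        del self._data[meta["object_id"]]


store = ObjectStore()
store.put("photos/cat.jpg", b"not really a jpeg")
print(store.head("photos/cat.jpg")["size"])
```

Note that the key is a flat string; the slash in `photos/cat.jpg` is only a naming convention, not a directory.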
Stand-alone storage
Basic concepts
A stand-alone storage system wraps a stand-alone storage engine (an implementation of data structures on persistent media such as mechanical disks and SSDs) and exposes storage services based on a file, key-value, table, or relational model.
Storage engine
The storage engine is the core of a storage system and determines the functions and performance the system can provide. The functions include:
- Add (Create)
- Read (Retrieve), both random read and sequential scan
- Update
- Delete
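As a rough illustration of these operations, the sketch below implements them on an in-memory hash table built from an array of buckets with chained entries. The class name `HashEngine` and its methods are hypothetical, Python lists stand in for linked lists, and persistence is left out entirely.

```python
class HashEngine:
    """In-memory hash table with separate chaining; no persistence."""

    def __init__(self, num_buckets: int = 8):
        # The "array": one chain of (key, value) pairs per bucket.
        self._buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value) -> None:
        """Create the key, or update it if it already exists."""
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        """Random read: returns None when the key is absent."""
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return None

    def delete(self, key) -> bool:
        """Returns True if the key was present and removed."""
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                del bucket[i]
                return True
        return False


engine = HashEngine()
engine.put("user:1", "alice")
engine.put("user:1", "alice-updated")
print(engine.get("user:1"))
```

Because keys are scattered across buckets by hash value, this structure has no cheap way to visit keys in sorted order, which is why a hash-based engine does not offer sequential scan.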
The differences between the engines are as follows:
Engine | Mechanism | Supported | Not supported | Corresponding storage system |
Hash storage engine | Persistent implementation of a hash table: a key-value store built on a hash table structure, implemented as an array plus linked lists | Add, delete, modify, random read | Sequential scan | Key-value storage systems |
B-tree storage engine | Persistent implementation of a B-tree | Add, delete, modify, random read, sequential scan | | Relational databases |
LSM (Log-Structured Merge) tree storage engine | Similar to a B-tree, except that one large tree is split into N small trees. Writes go to memory first and are written to disk once a threshold is reached. The trees on disk can be merged periodically into one large tree to optimize read performance | Add, delete, modify, random read, sequential scan | | Bigtable; HBase |
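The LSM write path in the table can be sketched in a few lines. This is a toy model under stated assumptions, not how Bigtable or HBase implement it: the "disk" is a list of sorted in-memory runs, the threshold counts keys rather than bytes, and the names `TinyLSM`, `memtable_limit`, and `TOMBSTONE` are hypothetical.

```python
import bisect

TOMBSTONE = object()  # marker meaning "this key was deleted"


class TinyLSM:
    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}          # recent writes, held in memory
        self.memtable_limit = memtable_limit
        self.sstables = []          # sorted runs, oldest first (stand-in for disk)

    def put(self, key, value) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key) -> None:
        self.put(key, TOMBSTONE)    # deletes are writes of a marker

    def flush(self) -> None:
        """Write the memtable out as one sorted, immutable run."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check memory first, then the runs from newest to oldest."""
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                value = table[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self) -> None:
        """Merge all runs into one; newer values win, tombstones are dropped."""
        merged = {}
        for table in self.sstables:          # oldest to newest
            merged.update(table)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.sstables = [live] if live else []

    def scan(self):
        """Sequential scan in key order across the runs and the memtable."""
        merged = {}
        for table in self.sstables:
            merged.update(table)
        merged.update(self.memtable)
        return [(k, v) for k, v in sorted(merged.items()) if v is not TOMBSTONE]


db = TinyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 10), ("c", 3)]:
    db.put(key, value)
print(len(db.sstables), db.get("a"))
```

Before compaction a read may have to look through several runs; after `compact()` there is a single run, which is the read optimization the table describes.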
Centralized storage
Basic concepts
Compared with stand-alone storage, centralized storage contains more components: besides the head (controller), disk arrays (JBOD), and switches, it also includes auxiliary equipment such as management devices.
Reference: Basic logic diagram of centralized storage
System Components
- The head is the core component of the entire storage system. It usually consists of controllers and front-end and back-end ports.
- Controllers: there are usually two, which back each other up for high availability. The controller software manages the disks, abstracts them into a storage resource pool, and then divides the pool into LUNs for servers to use.
- Front-end and back-end ports: front-end ports provide storage services to servers, while back-end ports are used to expand the capacity of the storage system (by connecting more storage devices).
- Disk enclosure (Just a Bunch Of Disks, JBOD): the disks are housed in a dedicated enclosure outside the server, with its own power supply, cooling, and interfaces. They are cabled internally (SCSI) and attached as a unit to the back-end ports of the head.
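The controller's job of pooling disks and carving out LUNs can be sketched as simple capacity bookkeeping. Everything here is an illustrative assumption: the names `StoragePool` and `Head` are hypothetical, RAID overhead and physical block layout are ignored, and failover is reduced to picking the first healthy controller.

```python
class StoragePool:
    """Pools raw disk capacity and hands it out as named LUNs."""

    def __init__(self, disk_sizes_gb):
        self.capacity_gb = sum(disk_sizes_gb)
        self.luns = {}  # LUN name -> size in GB

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.luns.values())

    def create_lun(self, name: str, size_gb: int) -> None:
        if name in self.luns:
            raise ValueError(f"LUN {name!r} already exists")
        if size_gb > self.free_gb:
            raise ValueError(f"only {self.free_gb} GB free in the pool")
        self.luns[name] = size_gb

    def delete_lun(self, name: str) -> None:
        del self.luns[name]


class Head:
    """Two controllers in front of one pool; either one can serve it."""

    def __init__(self, pool: StoragePool):
        self.pool = pool
        self.controllers = {"A": True, "B": True}  # name -> healthy?

    def active_controller(self) -> str:
        for name, healthy in self.controllers.items():
            if healthy:
                return name
        raise RuntimeError("no healthy controller")


pool = StoragePool([4000, 4000, 4000])   # three 4 TB disks in the enclosure
pool.create_lun("database", 5000)
head = Head(pool)
head.controllers["A"] = False            # simulate a controller failure
print(head.active_controller(), pool.free_gb)
```

The point of the sketch is that a LUN is a logical slice of the pool: the server sees a 5000 GB volume and does not know which physical disks back it.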
Distributed storage
Basic concepts
A distributed storage system connects discrete, independent storage devices over a network and presents them as a single system that provides storage services externally.
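One way to picture "many devices, one service" is a client-facing object that decides which nodes hold each key. The sketch below is an assumption-laden toy: nodes are plain dictionaries rather than networked machines, placement is hash modulo node count followed by the next nodes in sorted order, and the names `Cluster`, `targets`, and `down` are hypothetical. Real systems use more elaborate placement and consistency protocols.

```python
import hashlib


class Cluster:
    """Presents several independent nodes as one key-value service."""

    def __init__(self, node_names, replicas: int = 2):
        if replicas > len(node_names):
            raise ValueError("more replicas than nodes")
        self.names = sorted(node_names)
        self.nodes = {name: {} for name in self.names}  # each dict is one node
        self.replicas = replicas

    def targets(self, key: str):
        """Pick `replicas` distinct nodes for a key, deterministically."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.names)
        return [self.names[(start + i) % len(self.names)]
                for i in range(self.replicas)]

    def put(self, key: str, value) -> None:
        for name in self.targets(key):
            self.nodes[name][key] = value

    def get(self, key: str, down=()):
        """Read from the first replica that is not in the `down` set."""
        for name in self.targets(key):
            if name not in down:
                return self.nodes[name].get(key)
        raise RuntimeError("all replicas for this key are unavailable")


cluster = Cluster(["node-a", "node-b", "node-c"], replicas=2)
cluster.put("user:42", b"profile")
first = cluster.targets("user:42")[0]
print(cluster.get("user:42", down={first}))
```

The caller never names a node: it only sees `put` and `get`, and a read still succeeds when one replica is unavailable.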
Taxonomy
Design Principles
Reference: CAP theorem
Cloud storage
Basic concepts
Cloud storage is a way of delivering storage as a service in the cloud computing field. Its bottom layer is built on distributed storage, and its upper layer provides storage services over the Internet. In addition to the basic characteristics of distributed storage, it is more flexible and is usually provided by cloud vendors.
Reference product
Vendor | Object storage | File storage | Block storage |
AWS | Amazon Simple Storage Service (Amazon S3) | Amazon Elastic File System (Amazon EFS); Amazon FSx for Windows File Server; Amazon FSx for Lustre | Amazon Elastic Block Store (EBS) |
Aliyun | Object Storage Service (OSS) | File storage NAS; File storage CPFS; File storage HDFS | Block storage |
Cloud native storage
Basic concepts
Cloud native storage grew out of cloud storage. In addition to the characteristics of cloud storage, it must be as dynamic as the other components in the cloud native ecosystem (across public, private, and hybrid clouds) so that scalable applications can be built on it. It is also typically S3 API driven and Kubernetes (K8S) friendly.
Reference example
CNCF's first cloud-native storage project, Rook, brings file, block, and object storage systems into a Kubernetes cluster, where they run seamlessly alongside the other applications and services that consume the storage. In this way, cloud-native clusters can be self-sufficient and portable across public clouds and on-premises deployments. The project aims to let enterprises modernize their data centers with distributed storage systems that run on premises and in public clouds, through dynamic application orchestration.
Rook Architecture
Ceph Rook integrates with Kubernetes
MinIO is a high-performance, software-defined object storage suite that helps customers build cloud-native data infrastructure. It integrates with Kubernetes, so operators can manage storage through the Kubernetes interface, and Kubernetes can handle everything from storage provisioning to volume placement.
Born cloud native
This article is the original content of Alibaba Cloud and may not be reproduced without permission.