Illustrated storage basics: stand-alone, centralized, distributed, cloud, and cloud native storage

Background

This article draws on Yang Chuanhui (Rizhao)'s "Large-scale Distributed Storage Systems: Principles and Architecture in Practice", the book "Big Talk Storage", online resources (see the links at the end of the article), and my own understanding. It aims to trace the basic trajectory of storage development and lay out some common knowledge, so that more beginners like me can gain a macro-level understanding.

Storage history

From stand-alone machines to the Internet era, storage as infrastructure has evolved around the goals of low cost, high performance, scalability, and ease of use. Today, storage is divided into stand-alone storage, centralized storage, distributed storage, cloud storage, and cloud native storage.

The basic form of storage at each stage is as follows

Basic form of each stage

Stored data classification & model

Whether it is stand-alone, distributed, or cloud storage, a storage system builds a storage data model for specific data types, based on its particular application scenarios.

Data Classification

Data model

Storage type

Three common storage types: block storage, file storage, and object storage

Block storage

Based on the Block storage mode, there are two common storage methods:

  • DAS (Direct Attached Storage): storage connected directly to the host
  • SAN (Storage Area Network): storage connected to hosts over a high-speed network

File storage

File storage is attached to the network and provides file storage services (NAS, Network Attached Storage).

Object storage

Object storage is built on key-value storage. Its core idea is to separate the data path (data) from the control path (metadata). The system is built on Object-based Storage Devices (OSDs) and serves clients through a RESTful API.
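The separation of data path and control path can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class `ToyObjectStore`, the `ObjectMeta` record, and the hash-based placement rule are all hypothetical and do not describe any real product. The OSDs are plain dictionaries, and the methods stand in for what RESTful PUT, GET, HEAD, and DELETE requests would do.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ObjectMeta:
    """Control-path record: where the object lives and how to verify it."""
    osd_id: int
    size: int
    etag: str


class ToyObjectStore:
    """Hypothetical in-memory object store; not any real product's API."""

    def __init__(self, num_osds: int = 3):
        # Data path: each "OSD" is just a dict from object name to raw bytes.
        self.osds = [dict() for _ in range(num_osds)]
        # Control path: the metadata service maps "bucket/key" to ObjectMeta.
        self.meta = {}

    def put(self, bucket: str, key: str, data: bytes) -> str:
        """Roughly what PUT /bucket/key would do."""
        name = f"{bucket}/{key}"
        # Assumed placement rule: hash the name to choose an OSD.
        digest = hashlib.sha256(name.encode()).hexdigest()
        osd_id = int(digest, 16) % len(self.osds)
        self.osds[osd_id][name] = data
        etag = hashlib.sha256(data).hexdigest()
        self.meta[name] = ObjectMeta(osd_id, len(data), etag)
        return etag

    def head(self, bucket: str, key: str) -> ObjectMeta:
        """Roughly HEAD /bucket/key: answered from metadata only."""
        return self.meta[f"{bucket}/{key}"]

    def get(self, bucket: str, key: str) -> bytes:
        """Roughly GET /bucket/key: look up metadata, then read from the OSD."""
        name = f"{bucket}/{key}"
        return self.osds[self.meta[name].osd_id][name]

    def delete(self, bucket: str, key: str) -> None:
        """Roughly DELETE /bucket/key: remove the metadata and the data."""
        name = f"{bucket}/{key}"
        meta = self.meta.pop(name)
        del self.osds[meta.osd_id][name]
```

Note that `head` never touches an OSD: metadata queries stay on the control path, which is the point of the separation.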

Stand-alone storage

Basic concepts

A stand-alone storage system is an encapsulation of a stand-alone storage engine (the implementation of data structures on persistent media such as mechanical disks and SSDs). It exposes storage services through a file, key-value, table, or relational model.

Storage engine

The storage engine is the core of the storage system and determines the functions and performance the system can provide. The functions include:

  • Create
  • Read (Retrieve): random read and sequential scan
  • Update
  • Delete
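As a rough illustration of these operations, here is a toy hash storage engine: an append-only log plus an in-memory hash index. It is a sketch under assumptions, not a real engine: the class name `HashLogEngine` and the record layout are hypothetical, and an `io.BytesIO` buffer stands in for a file on disk.

```python
import io
import struct


class HashLogEngine:
    """Toy hash storage engine: append-only log plus an in-memory hash index."""

    _HEADER = struct.Struct(">II")  # assumed layout: key length, value length

    def __init__(self):
        self.log = io.BytesIO()  # stands in for a file on disk
        self.index = {}          # key -> offset of its latest record

    def put(self, key: bytes, value: bytes) -> None:
        """Create or update: append a record and point the index at it."""
        offset = self.log.seek(0, io.SEEK_END)
        self.log.write(self._HEADER.pack(len(key), len(value)) + key + value)
        self.index[key] = offset

    def get(self, key: bytes) -> bytes:
        """Random read: one index lookup, then one read at a known offset."""
        self.log.seek(self.index[key])
        klen, vlen = self._HEADER.unpack(self.log.read(self._HEADER.size))
        self.log.seek(klen, io.SEEK_CUR)  # skip over the stored key
        return self.log.read(vlen)

    def delete(self, key: bytes) -> None:
        """Delete: drop the index entry; the stale record stays in the log."""
        del self.index[key]
```

Because the index is an unordered hash table, this design gives fast random reads but has no efficient way to scan keys in order. A real engine would also need to persist deletions and reclaim stale records, which this sketch leaves out.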

The differences between the engines are as follows:

| Engine | Mechanism | Supported | Not supported | Corresponding storage system |
| --- | --- | --- | --- | --- |
| Hash storage engine | Persistent implementation of a hash table; a key-value store based on the hash table structure, realized as an array plus linked lists | Add, delete, modify, random read | Sequential scan | Key-Value storage system |
| B-tree storage engine | Persistent implementation of a B-tree | Add, delete, modify, random read & sequential scan | | Relational database |
| LSM (Log-Structured Merge Tree) storage engine | Similar to a B-tree, except that one large tree is split into N small trees. Writes go to memory first and are written to disk once a threshold is reached. The trees on disk are merged periodically into one large tree to optimize read performance | Add, delete, modify, random read & sequential scan | | Bigtable; HBase |
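The LSM mechanism (write to memory, flush at a threshold, merge periodically) can be sketched as follows. This is a minimal in-memory model, not how Bigtable or HBase is implemented: `ToyLSMTree`, `memtable_limit`, and the use of Python lists as on-disk sorted runs are all assumptions for illustration, and deletes are left out.

```python
import bisect


class ToyLSMTree:
    """Toy LSM tree: a memtable plus sorted runs that stand in for disk files."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable_limit = memtable_limit
        self.memtable = {}   # recent writes, held in memory
        self.sstables = []   # flushed sorted runs, oldest first

    def put(self, key, value):
        """Write to memory first; flush once the threshold is reached."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted run (a small tree on disk)."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Random read: check memory, then runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one big run; newer values overwrite older ones."""
        merged = {}
        for run in self.sstables:
            merged.update(run)
        self.sstables = [sorted(merged.items())] if merged else []

    def scan(self):
        """Sequential scan over all live keys in sorted order."""
        merged = {}
        for run in self.sstables:
            merged.update(run)
        merged.update(self.memtable)
        return sorted(merged.items())
```

Before `compact` runs, a read may have to probe several runs; afterwards it probes one, which is why the periodic merge improves read performance.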

Centralized storage

Basic concepts

Compared with stand-alone storage, centralized storage contains more components: the head (controller), disk arrays (JBOD), and switches, plus auxiliary equipment such as management devices.

Reference: Basic logic diagram of centralized storage

System Components

  • The head: the core component of the entire storage system, usually consisting of controllers and front-end and back-end ports
    • Controllers: there are usually two, backing each other up for high availability. The controller software manages the disks, abstracts them into a storage resource pool, and then divides the pool into LUNs for servers to use.
    • Front-end and back-end ports: front-end ports provide storage services to servers, and back-end ports are used to expand the capacity of the storage system (by connecting more storage devices)
  • Disk cabinet (Just a Bunch Of Disks, JBOD): the disks sit in a dedicated cabinet outside the server, with independent power supply, cooling, and interfaces. They are cabled internally (SCSI) and mounted together on the back-end ports of the head
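The pooling and LUN division described in this list can be modelled very roughly as below. The `StoragePool` class is hypothetical; it only does capacity accounting and ignores RAID overhead, striping, and dual-controller failover, all of which a real array has to handle.

```python
class StoragePool:
    """Hypothetical controller view: disks pooled, LUNs carved from the pool."""

    def __init__(self, disk_sizes_gb):
        # The controller abstracts individual disks into one capacity pool.
        self.capacity_gb = sum(disk_sizes_gb)
        self.luns = {}  # LUN id -> size in GB

    @property
    def free_gb(self):
        """Capacity not yet assigned to any LUN."""
        return self.capacity_gb - sum(self.luns.values())

    def create_lun(self, lun_id, size_gb):
        """Divide part of the pool into a LUN that a server can use."""
        if lun_id in self.luns:
            raise ValueError(f"LUN {lun_id} already exists")
        if size_gb > self.free_gb:
            raise ValueError("not enough free capacity in the pool")
        self.luns[lun_id] = size_gb

    def expand(self, disk_sizes_gb):
        """Model attaching another disk cabinet through a back-end port."""
        self.capacity_gb += sum(disk_sizes_gb)
```

For example, a pool of four 1000 GB disks can hold a 3000 GB LUN and a 1000 GB LUN; a third LUN fits only after `expand` adds more disks.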

Distributed storage

Basic concepts

A distributed storage system connects discrete, independent storage devices through a network and coordinates them so that they provide storage services externally as a single whole.
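One way to picture "many devices, one service" is the sketch below. It assumes a specific placement scheme that the article does not prescribe: each key is hashed to a primary node and copied to the next node in a fixed order. The names `ToyCluster`, `replicas`, and `down` are hypothetical, and the nodes are plain dictionaries rather than networked machines.

```python
import hashlib


class ToyCluster:
    """Toy distributed store: hash-based placement with simple replication."""

    def __init__(self, node_names, replicas: int = 2):
        self.nodes = {name: {} for name in node_names}  # each node's local store
        self.order = sorted(node_names)
        self.replicas = min(replicas, len(self.order))

    def _owners(self, key: str):
        """Assumed rule: hash picks the primary; replicas follow in node order."""
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        start = h % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.replicas)]

    def put(self, key: str, value) -> None:
        """Write the value to every replica that owns the key."""
        for name in self._owners(key):
            self.nodes[name][key] = value

    def get(self, key: str, down=()):
        """Read from the first owner that is not marked as down."""
        for name in self._owners(key):
            if name not in down:
                return self.nodes[name][key]
        raise RuntimeError("all replicas unavailable")
```

The caller only sees `put` and `get`; which node holds the data, and which replica answers when one node is down, stays hidden behind the cluster interface.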

Taxonomy

Design Principles

See the CAP theorem (consistency, availability, partition tolerance).

Cloud storage

Basic concepts

Cloud storage is the storage service model of the cloud computing field. Its bottom layer is built on distributed storage, and its upper layer provides storage services over the Internet. Beyond the basic characteristics of distributed storage, it is more flexible, and it is usually provided by cloud vendors.

Reference products

| Vendor | Object storage | File storage | Block storage |
| --- | --- | --- | --- |
| AWS | Amazon Simple Storage Service (Amazon S3) | Amazon Elastic File System (Amazon EFS); Amazon FSx for Windows File Server; Amazon FSx for Lustre | Amazon Elastic Block Store (EBS) |
| Aliyun | OSS | File storage NAS; file storage CPFS; file storage HDFS | Block storage |

Cloud native storage

Basic concepts

Cloud native storage grew out of cloud storage. In addition to the characteristics of cloud storage, it must be as dynamic as all the other components of the cloud native ecosystem (across public, private, and hybrid clouds) so that scalable applications can be built on it. Typical traits include being S3 API driven and K8S friendly.

Reference example

Rook

Rook, CNCF's first cloud-native storage project, brings file, block, and object storage systems into a Kubernetes cluster, where they run seamlessly alongside the other applications and services that consume storage. In this way, cloud-native clusters can be self-sufficient and portable across public clouds and on-premises deployments. The project aims to let enterprises modernize their data centers through dynamic application orchestration for distributed storage systems running in on-premises and public cloud environments.

Rook Architecture

Ceph Rook integrates with Kubernetes

MinIO

MinIO is a high-performance, software-defined object storage suite that helps customers build cloud-native data infrastructure. It integrates with Kubernetes, so operators can manage storage through the Kubernetes interface, and Kubernetes can handle everything from storage provisioning to volume placement.

Born cloud native

This article is the original content of Alibaba Cloud and may not be reproduced without permission.

Origin blog.csdn.net/weixin_43970890/article/details/114656485