Analysis of Ceph and OceanStor 9000 distributed storage

http://www.360doc.com/content/17/0822/09/46248428_681150085.shtml

Ceph is a popular open-source distributed software-defined storage (SDS) system. It provides object storage, block storage, and file system storage at the same time, meeting different application needs. Ceph is developed in C++ and is open source under the LGPL license. Sage Weil (author of the Ceph paper) founded Inktank in 2011 to lead Ceph's development and community maintenance. In 2014, Red Hat acquired Inktank and released Inktank Ceph Enterprise, with business scenarios focused on cloud, backup, and archiving, supporting object and block storage applications. Since then, Ceph has existed as both an open-source community edition and a Red Hat enterprise edition.

OceanStor 9000 is a relatively popular commercial distributed file system in China. It evolved from Huawei's previous-generation CSS-F mass storage system, dropping the dedicated metadata node and adopting a fully symmetric distributed architecture. It has many successful cases and delivery experience in media assets, high-performance computing, big data, and video surveillance. In this article we discuss the differences between Ceph and the 9000 in terms of storage architecture, application scenarios, and compatibility.

 

Ceph's basic service architecture

Ceph's basic service architecture mainly consists of the Object Storage Device (OSD), Monitor, and MDS. On top of these, Ceph provides the librados native object library, the librbd block storage library, the librgw S3- and Swift-compatible object library, and the libcephfs file system library.

OSD (Object Storage Device) is responsible for storing data, handling data replication, recovery, and rebalancing, monitoring other OSDs through a heartbeat mechanism, and reporting their state to the Ceph Monitors.

Monitor is responsible for tracking cluster state, including its own status, the status of the cluster's OSDs, the Placement Group (the storage organization and location mapping) status, and the CRUSH (Controlled Replication Under Scalable Hashing, a pseudo-random data distribution algorithm) map. The Monitors also record the version history of each of these state changes, so the cluster can agree on which version to follow.
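As a small illustration of how this cluster state can be queried programmatically, the sketch below uses the Python rados binding's mon_command call to ask the Monitors for the cluster status. The configuration path and the exact layout of the returned JSON vary by release and deployment, so treat those details as assumptions.

```python
import json
import rados

# Connect as a client; the conffile path is a placeholder for this sketch.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Send the "status" command to the Monitor quorum and parse the JSON reply.
cmd = json.dumps({"prefix": "status", "format": "json"})
ret, outbuf, outs = cluster.mon_command(cmd, b'')
status = json.loads(outbuf)

# Recent releases report health under status["health"]["status"],
# e.g. "HEALTH_OK"; older releases use a slightly different layout.
print(status.get("health", {}))

cluster.shutdown()
```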

MDS is responsible for storing and managing the file system's metadata; as mentioned above, this module is not required for the block storage and object storage services. The MDS is what enables the standard POSIX file access interface.

Building a Ceph system requires at least one Ceph Monitor and two Ceph OSD logical roles, while the Ceph Metadata Server is only needed to store file metadata when running CephFS. In a physical deployment, these logical roles can run on the same machine. Ceph stores data with 2 replicas by default (newer releases default to 3), so whether it provides object, block, or file system services, the smallest Ceph system requires 2 OSD storage servers.

 

9000 basic service architecture

The 9000 is a relatively large cluster system that is internally composed of many small clusters responsible for different roles, all deployed on the ordinary storage nodes. The ISM, CMS, and Monitoring clusters are responsible for GUI management and configuration, system cluster management, and status monitoring; their nodes run in Active/Standby mode to ensure system reliability.

CA (Client Agent) is responsible for the semantic parsing and execution of the file system protocols; it is the file system business engine, and file slicing and data reassembly are all done by the CA. The 9000 supports standard CIFS and NFS as well as a private client; with the private client, the CA/SCA is installed on the application server, developed on top of the VFS layer and compatible with the POSIX interface standard.

MDS (MetaData Service) manages the metadata of the file system. In the 9000, metadata and user data are both kept on the storage nodes, and metadata is stored in multiple copies for high reliability. The metadata management service manages the layout of the metadata and file data of the entire system and is responsible for the system's resource allocation. Every storage node is a metadata node.


Client requests for the object service are answered through the OSC object interface service. It first looks up the object metadata in OMD, then uses the result to locate the object and read or write it. OMD provides the object metadata service in cluster mode and is deployed on every object storage node.

OBS is the basic storage system underlying the whole product. It is an object-based store in which data is kept on disk in Key-Value form, and the upper-layer NAS and object storage services are provided on top of it. OBS-C is the client, responsible for data operations; OBS-S is the server, providing data storage services; data is stored in the data subdomain in the form of objects.

 

Ceph's software architecture

(1) Basic storage system RADOS

RADOS (Reliable, Autonomic, Distributed Object Store) is itself a complete object storage system that includes Ceph's basic services (MDS, OSD, Monitor). In fact, all user data stored in Ceph is ultimately stored by this layer, and Ceph's high reliability, high scalability, high performance, and high degree of automation are essentially provided by it. Understanding RADOS is therefore the foundation and key to understanding Ceph.


Physically, RADOS consists of a large number of storage nodes; each node has its own hardware resources (CPU, memory, hard disks, network) and runs an operating system and file system.

 

(2) Basic library librados

The function of this layer is to abstract and encapsulate RADOS and provide APIs to the layers above it, so that native object applications, as well as the higher-level object, block, and file services, can be developed directly on RADOS. Note that RADOS itself is an object store, so the API implemented by librados covers only object storage functions.

The native librados API provided by RADOS includes C and C++ bindings. librados is deployed on the same machine as the applications developed on it: the application calls the librados API locally, and librados communicates with the nodes of the RADOS cluster over sockets to complete the various operations.
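For a concrete feel of this object-level interface, here is a minimal sketch using the Python rados binding, which wraps librados. The config path and the pool name 'test-pool' are placeholders, and the pool would have to exist already.

```python
import rados

# Connect using a local client configuration (path is a placeholder).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Open an I/O context bound to an existing pool (name is a placeholder).
ioctx = cluster.open_ioctx('test-pool')
try:
    ioctx.write_full('greeting', b'hello rados')   # store an object
    print(ioctx.read('greeting'))                  # b'hello rados'
    ioctx.remove_object('greeting')                # clean up
finally:
    ioctx.close()
    cluster.shutdown()
```

Note how the API works purely in terms of named objects in a pool; there are no accounts, buckets, or directories at this level, which is exactly the point made in the summary below.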

 

(3) High-level storage application interface

This layer includes three parts: RADOS GW (RADOS Gateway), RBD (RADOS Block Device), and CephFS (Ceph File System). Its role is to provide, on top of the librados library, higher-level abstractions that are more convenient for applications or clients to use.

RADOS GW is a gateway that provides RESTful APIs compatible with Amazon S3 and Swift for developing object storage applications. The abstraction RADOS GW offers is higher level, but its functionality is not as complete as that of librados, so developers should choose according to their needs.
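Because the gateway speaks the S3 dialect, any S3 client library can be pointed at it. The sketch below uses boto3 with a placeholder endpoint and credentials (RGW users and keys are normally created with radosgw-admin); it is an illustration, not a recipe for a specific deployment.

```python
import boto3

# Endpoint and credentials are placeholders for an existing RADOS GW user.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

s3.create_bucket(Bucket='demo-bucket')
s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'hello rgw')
obj = s3.get_object(Bucket='demo-bucket', Key='hello.txt')
print(obj['Body'].read())   # b'hello rgw'
```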

RBD provides a standard block device interface and is often used to create volumes for virtual machines in virtualization scenarios. As mentioned earlier, Red Hat has integrated the RBD driver into KVM/QEMU to improve virtual machine access performance.
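Block images can also be managed programmatically through the Python rbd binding, which sits on top of librados. In this sketch the pool name 'rbd' and the image name are placeholders.

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')                    # pool name is a placeholder

rbd_inst = rbd.RBD()
rbd_inst.create(ioctx, 'vm-disk-01', 10 * 1024**3)   # create a 10 GiB image

with rbd.Image(ioctx, 'vm-disk-01') as image:
    image.write(b'bootstrap data', 0)                # write at offset 0
    print(image.size())                              # 10737418240

ioctx.close()
cluster.shutdown()
```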

CephFS is a POSIX-compatible distributed file system. It is still under active development, so Ceph's official website does not recommend using it in production environments.

 

(4) Client and application layer

This layer comprises the various ways Ceph's application interfaces are used in different scenarios, for example object storage applications developed directly on librados, object storage applications developed on RADOS GW, cloud disks based on RBD, and so on.


The Ceph client is developed on FUSE (Filesystem in Userspace) and the VFS layer and is compatible with the POSIX interface standard. In the Ceph storage system, the Ceph Metadata Daemon provides the metadata service, and the Ceph Object Storage Daemon provides the actual storage of data and metadata. For file, block, and object reads and writes, Ceph uses the CRUSH algorithm (the algorithm responsible for data placement and retrieval in the cluster) to compute storage locations and reassemble the data.
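The placement flow is roughly: the object name is hashed to a placement group, and CRUSH then deterministically maps that placement group to a set of OSDs, so any client can compute locations without asking a central directory. The toy sketch below only illustrates that two-step flow; the hash, the PG count, and the flat OSD list are all simplifications and not the real CRUSH algorithm, which works over a hierarchical cluster map with failure domains.

```python
import hashlib

# Toy illustration of Ceph's client-side placement flow:
#   object name -> placement group (hash) -> ordered set of OSDs (CRUSH-like).
# Everything below is simplified for illustration only.

PG_NUM = 128                                   # placement groups in the pool
OSDS = ["osd.%d" % i for i in range(12)]       # flat list, no failure domains
REPLICAS = 3

def object_to_pg(name):
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "little") % PG_NUM

def pg_to_osds(pg):
    # Deterministic pseudo-random choice of REPLICAS distinct OSDs for this PG.
    start = (pg * 2654435761) % len(OSDS)
    return [OSDS[(start + i * 7) % len(OSDS)] for i in range(REPLICAS)]

pg = object_to_pg("rbd_data.volume01.000000000042")
print(pg, pg_to_osds(pg))    # same inputs always yield the same placement
```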

   

9000 software architecture

(1) Basic service layer

OBS is the basic storage system of the whole product. Data is stored on disk in Key-Value form, and the upper-layer NAS and object storage services are provided on the basis of the OBS system. Addressing and storing data by Key-Value reduces the amount of metadata compared with the traditional LBA approach.
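As a purely conceptual contrast (this is not the 9000's actual OBS implementation), block addressing has to track a mapping entry per fixed-size block, whereas key-value addressing locates a whole variable-size object with a single key:

```python
# Conceptual contrast only; not the OceanStor 9000 OBS implementation.

# LBA-style addressing: one mapping entry per fixed-size logical block.
lba_map = {
    ("movie.mp4", 0): 1048576,   # (file, block index) -> device LBA
    ("movie.mp4", 1): 1048577,
    # ... one entry for every block of the file
}

# Key-Value addressing: one key locates a whole variable-size object slice,
# so far fewer mapping entries need to be kept as metadata.
object_store = {
    "movie.mp4/slice-0000": b"...first slice of the file...",
}
print(object_store["movie.mp4/slice-0000"])
```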

 

(2) Data processing layer

On top of the OBS basic service layer, the data processing layer builds the NAS and object services. The core NAS modules are the MDS metadata service and the CA data service: for each data request, the CA consults the metadata to locate the requested file data on the different nodes, gathers the data read from each storage node, and returns it to the client as a whole. The core object modules are the OSC object interface service and the OMD object storage metadata service.

(3) Storage service layer

The 9000 provides both NAS and object services and, at the same time, a rich set of value-added storage features. On the object side these include data deduplication, multi-versioning, multi-tenancy, and Swift and S3 services; on the NAS side they include snapshots, file replication, tiered storage, load balancing, and other services such as HDFS and NDMP.

Because the 9000's cluster management and monitoring services use a 1-Active / 2-Standby mode, and because of the requirements of the EC algorithm, the 9000 storage system supports a minimum of 3 nodes; when NAS and object services coexist, the minimum configuration starts at 5 nodes. File and object metadata are kept separate from each other, but the user data space is shared.

 

OpenStack compatibility and support

One of the most common uses of Ceph storage software is as an OpenStack cloud storage backend; another is storing and retrieving VM images in RADOS (the OpenStack Glance image service). Currently, leading enterprise IT vendors represented by HP, Dell, and Intel, and emerging vendors in the OpenStack community represented by Mirantis, eNovance, and UnitedStack, all regard Ceph as an important and even preferred open-source storage solution.

Ceph is in fact the most popular open-source storage solution in the OpenStack ecosystem. Ceph's object storage can back network disks and similar application services; its block device storage can connect to mainstream IaaS cloud platforms such as OpenStack, CloudStack, ZStack, and Eucalyptus, as well as KVM virtualization; the file system (CephFS) is not yet mature.

Ceph has already been integrated with QEMU (hardware virtualization): the Ceph block storage service (RBD) is invoked and managed through the related commands, and through the integration between OpenStack, libvirt, and QEMU it supports virtual machine images for KVM, Xen, LXC, VirtualBox, and so on.


The Ceph block storage service (RBD) can be added to OpenStack through libvirt. Integration with Glance (VM image management) and Cinder (block storage) has been completed, so virtual machine images can be stored in Ceph RBD through Glance and virtual machines can be started from Cinder volumes.

Ceph's object gateway currently supports Amazon S3 and OpenStack Swift, with integrated support for OpenStack Keystone authentication.

 

The 9000 currently plans to support the OpenStack Manila NAS interface and thereby integrate with OpenStack. It also supports Amazon S3 and OpenStack Swift, with integrated support for OpenStack Keystone identity authentication. Since the 9000 does not currently provide SAN storage, it cannot be integrated with the OpenStack Glance image service.

 

Summary

Both Ceph's basic library librados and its high-level applications provide APIs, but they target different users. The librados API is lower level and has no advanced concepts such as accounts or containers; it is better suited to advanced storage users who understand the storage system deeply and want to customize functionality or deeply optimize performance. High-level application APIs such as the RADOS Gateway are more suitable for application developers. Below we make a simple comparison between the Ceph and 9000 storage systems.

Scalability: The scenarios Ceph targets are mainly large-scale distributed storage, generally with data volumes above the PB level and thousands of storage nodes. The 9000 currently supports a maximum of 288 nodes and 60 PB of capacity; reportedly, after the planned switch to the RH series of x86 servers, there should be no practical restriction on the number of nodes.

System architecture: The 9000 and Ceph are very similar in architecture: distributed, fully symmetric SDS on commodity x86 hardware, with data stored on disk as objects at the storage layer. The difference is that the 9000 integrates the CA data access client, provides standard CIFS and NFS, is compatible with Windows, Linux, Unix, and Mac OS, and supports NAS and object storage services, whereas Ceph requires a separate client on the application server, which is currently compatible mainly with mainstream Linux systems.

Application scenarios: Ceph supports NAS, SAN, and object services and has richer service interfaces, but it mainly provides SAN and object storage; it is generally used to interface with OpenStack clouds, with a focus on backup and archiving. The 9000 supports NAS and object services and targets big data and media; it is mainly used for high-bandwidth media asset editing, gene sequencing, video surveillance, and capacity resource pools. The application scenarios therefore differ slightly.

Reliability and capacity utilization: Ceph stores data with erasure coding (EC) or replication, with placement handled by the CRUSH algorithm, but Ceph's data striping and reassembly are processed on the client side, which may have some impact on the client server's performance during reads and writes. The 9000 also uses EC (Erasure Code) and multi-copy technology, so it is basically equivalent to Ceph in reliability and capacity utilization; however, the 9000's data slicing and aggregation are implemented on the storage side and have no impact on the host.
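For a sense of why the redundancy scheme matters for capacity utilization, here is a quick worked comparison of usable-to-raw capacity ratios for replication and generic N+M erasure coding; the 4+2 layout is just an assumption for illustration, not a configuration claimed by either product.

```python
# Usable-capacity ratio = user data / raw capacity consumed.
def replica_ratio(copies):
    return 1.0 / copies

def ec_ratio(n, m):
    # N data fragments plus M parity fragments per stripe.
    return n / (n + m)

print("2 replicas: %.0f%%" % (100 * replica_ratio(2)))   # 50%
print("3 replicas: %.0f%%" % (100 * replica_ratio(3)))   # 33%
print("EC 4+2:     %.0f%%" % (100 * ec_ratio(4, 2)))     # 67%
```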

Software features: Both Ceph and the 9000 have implemented file snapshots, HDFS access, and other features, but file tiering, file replication, NDMP, and the like are not yet available in Ceph. Ceph does implement block storage snapshots, thin provisioning, and replication. Although Ceph is weaker in NAS functionality, its current development strategy runs from block and object toward the file system.

Unified management: At present, Ceph completes system configuration and management through the CLI and configuration files, which is fairly typical of open-source projects and places relatively high demands on operations and maintenance staff. The 9000 supports unified GUI management, configuration, and monitoring, which makes operation and maintenance more convenient. On the other hand, Ceph's ability to integrate with the cloud is something the 9000 can learn from.

Origin blog.csdn.net/majianting/article/details/102984483