Ceph is a high-performance, easily scalable distributed storage system with no single point of failure. It was developed from Sage A. Weil's thesis and provides three main storage services:
- Object storage: data is stored and accessed as objects, either through the Ceph libraries (usable from C, C++, Java, Python, and PHP code) or through a RESTful gateway that is compatible with Amazon's S3 and OpenStack's Swift.
- Block storage: mounted directly as a block device, just like a hard disk.
- File system: mounted like a network file system, with a POSIX-compatible interface.
In Ceph's architecture, object storage is provided by LIBRADOS and RADOSGW, block storage by RBD, and the file system by CEPH FS. RADOSGW, RBD, and CEPH FS all call the LIBRADOS interface, and in the end everything is stored inside RADOS in the form of objects.
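To make the layering a bit more concrete, here is a toy sketch of how a block device can sit on top of a flat object store. This is not Ceph's real API: `ToyRados`, `ToyBlockImage`, the object naming scheme, and the tiny 4-byte object size are all made up for illustration. It only shows the idea that the upper layer cuts its data into pieces and stores each piece as a named object underneath.

```python
class ToyRados:
    """In-memory stand-in for RADOS (hypothetical): a flat set of named objects."""

    def __init__(self):
        self.objects = {}

    def write_full(self, name, data):
        # Replace the whole object with the given bytes.
        self.objects[name] = bytes(data)

    def read(self, name):
        # A missing object reads as empty.
        return self.objects.get(name, b"")


class ToyBlockImage:
    """Stand-in for an RBD-style image (hypothetical): a byte range cut into
    fixed-size pieces, each piece stored as one object in the store."""

    def __init__(self, store, image_name, object_size=4):
        self.store = store
        self.image_name = image_name
        self.object_size = object_size

    def _object_name(self, index):
        # Made-up naming scheme: image name plus the piece index in hex.
        return f"{self.image_name}.{index:08x}"

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            # Which object holds this byte, and where inside that object?
            index, inner = divmod(offset + pos, self.object_size)
            chunk = data[pos:pos + self.object_size - inner]
            name = self._object_name(index)
            # Read-modify-write the object, zero-padding it if it is too short.
            old = self.store.read(name).ljust(inner + len(chunk), b"\0")
            new = old[:inner] + chunk + old[inner + len(chunk):]
            self.store.write_full(name, new)
            pos += len(chunk)

    def read(self, offset, length):
        out = bytearray()
        pos = 0
        while pos < length:
            index, inner = divmod(offset + pos, self.object_size)
            take = min(self.object_size - inner, length - pos)
            piece = self.store.read(self._object_name(index))[inner:inner + take]
            # Unwritten regions read back as zeros.
            out += piece.ljust(take, b"\0")
            pos += take
        return bytes(out)


if __name__ == "__main__":
    store = ToyRados()
    image = ToyBlockImage(store, "img", object_size=4)
    image.write(2, b"abcdefgh")
    print(sorted(store.objects))  # names of the objects backing the image
    print(image.read(2, 8))
```

The block layer never needs to know where the objects physically live; it only talks to the object interface. Real RBD images use far larger objects (4 MiB by default, as far as I know), but the principle is the same.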
Nodes in a Ceph cluster have three roles:
- Monitor: monitors the health of the cluster and sends the latest CRUSH map (which includes the current network topology) to clients.
- OSD: maintains the objects on a node, responds to client requests, and synchronizes with other OSD nodes.
- MDS: provides file metadata; it does not need to be installed if CephFS is not used.
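The point of handing the map to the client is that the client can then work out by itself which OSDs hold an object, without asking a central lookup service. The sketch below illustrates that idea only. It uses simple rendezvous hashing as a stand-in and is NOT the actual CRUSH algorithm; the `locate` function and the dictionary-shaped map are assumptions made up for this example.

```python
import hashlib


def _score(object_name, osd_id):
    # Deterministic pseudo-random score for an (object, OSD) pair.
    digest = hashlib.sha256(f"{object_name}:{osd_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def locate(object_name, cluster_map, replicas=2):
    """Return the ids of the OSDs that should hold object_name, primary first.

    cluster_map is a hypothetical {osd_id: is_up} dict standing in for the
    map a Monitor would send. Rendezvous hashing is used here instead of CRUSH.
    """
    up = [osd for osd, is_up in sorted(cluster_map.items()) if is_up]
    ranked = sorted(up, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    cluster_map = {0: True, 1: True, 2: True, 3: True}
    # Any client holding the same map computes the same placement.
    print(locate("my-object", cluster_map))

    # After a Monitor reports an OSD as down, clients with the updated map
    # again agree on a placement that skips that OSD.
    cluster_map[1] = False
    print(locate("my-object", cluster_map))
```

Because placement is a pure function of the object name and the map, every client with the same map reaches the same answer, which is part of why there is no single point to bottleneck on.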
For more information on Ceph architecture, please refer to the official introduction: Architecture
Ceph is already a relatively mature storage system. It is an ideal storage backend for OpenStack and can also serve as a storage backend for Hadoop, which invites comparison with Swift and HDFS.
Ceph and Swift
Ceph is written in C++ and Swift in Python, so Ceph should have the edge in performance. Unlike Ceph, however, Swift focuses solely on object storage. As one of the OpenStack components, it has been proven in a large number of production deployments and integrates well with OpenStack. For now, many people use Ceph to provide block storage for OpenStack but still use Swift to provide object storage.
The developers of Swift have written an article comparing the two: "Ceph and Swift: Why we are not fighting."
Ceph and HDFS
Ceph's advantages over HDFS are that it is easy to scale out and has no single point of failure. HDFS was designed specifically for cloud computing workloads such as Hadoop and has an inherent advantage in offline batch processing of big data, while Ceph is a general-purpose, real-time storage system. Hadoop can use Ceph as its storage backend (I could not get the integration working by following Ceph's official tutorial, so I wrote up a concise set of steps: http://www.kai-zhang.com/cloud-computing/Running-Hadoop-on-CEPH/), but computing tasks still perform somewhat worse than on HDFS (about 30% slower in running time, according to "Haceph: Scalable Metadata Management for Hadoop using Ceph").
Source: https://www.zhihu.com/question/21718731/answer/21545274