Hadoop HBase: architecture principles and cluster deployment overview

1. Summary

HBase is modeled on Google's Bigtable. It is a distributed, column-oriented storage system built on HDFS, and a typical open-source key/value database used mainly to store massive amounts of structured data. Like Hadoop, HBase scales horizontally: computing and storage capacity grow by continually adding cheap commodity servers. Logically, HBase organizes data into tables, rows, and columns, and it suits scenarios that require real-time reads and writes and random access to very large datasets.
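The logical view described above (tables, rows, columns, with multiple timestamped versions per cell) can be sketched as a toy in-memory model. This is an illustration of the data model only, not the real HBase client API; the table, row key, and column names are made up for the example.

```python
from collections import defaultdict

class ToyHTable:
    """Toy model of HBase's logical view:
    row key -> "family:qualifier" column -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        # Each write adds a new timestamped version of the cell.
        self.rows[row_key][column][timestamp] = value

    def get(self, row_key, column):
        # By default HBase returns the newest version of a cell.
        versions = self.rows[row_key][column]
        return versions[max(versions)] if versions else None

table = ToyHTable()
table.put("user#001", "info:name", "alice", timestamp=1)
table.put("user#001", "info:name", "alicia", timestamp=2)
print(table.get("user#001", "info:name"))  # -> alicia (newest version wins)
```

Random access by row key, column-family grouping, and cell versioning are exactly what makes HBase fit the "real-time read/write on huge datasets" use case.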

HBase (Hadoop Database) is a sub-project of Apache Hadoop. It is highly reliable, high-performance, column-oriented, scalable, and distributed. With HBase, a large-scale structured storage cluster can be built on cheap PC servers. Unlike a typical relational database, HBase is suited to unstructured and loosely structured semi-structured data; it is a column-oriented NoSQL database. HBase data is stored on HDFS in blocks as StoreFiles (HFiles), which are binary streams. HDFS does not know what HBase stores in them; it only sees binary files, so the data HBase stores is opaque to the HDFS file system. HDFS itself does not support random modification, is inefficient for point queries, and handles small files poorly, which is why HBase layers its own storage format on top.
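Because HDFS files cannot be modified in place, HBase writes its StoreFiles as immutable, key-sorted files, so a random read becomes a binary search rather than a full scan. The sketch below shows that core idea; it is a heavy simplification (real HFiles use multi-level block indexes), and the row keys and values here are invented for illustration.

```python
import bisect

# An immutable "StoreFile": row keys written once, in sorted order.
keys = ["row03", "row17", "row42", "row58", "row99"]
values = ["a", "b", "c", "d", "e"]  # value at the same index as its key

def lookup(key):
    """Binary-search the sorted key list, as an HFile index lookup would."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None  # key not present in this file

print(lookup("row42"))  # -> c
```

Updates and deletes never touch these files; they are written to new files and reconciled later by compaction, which is how HBase supports random writes on an append-only file system.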

HBase also uses a column-based rather than row-based schema. HBase is an open-source implementation of Google Bigtable, with close analogues between the two stacks: where Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; and where Google runs MapReduce to process the massive data in Bigtable, HBase can use Hadoop MapReduce to process the data it stores.
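The row-based versus column-based distinction can be made concrete with the same three records laid out both ways. This is a conceptual illustration only (HBase actually groups columns into column families rather than storing every column separately), and the records are invented for the example.

```python
# The same three records, laid out row-wise and column-wise.
records = [
    {"id": 1, "name": "alice", "age": 30},
    {"id": 2, "name": "bob",   "age": 25},
    {"id": 3, "name": "carol", "age": 35},
]

# Row store: all fields of one record sit together on disk.
row_layout = [list(r.values()) for r in records]

# Column store: all values of one column sit together, so scanning a
# single column (e.g. every age) touches only that column's data.
col_layout = {col: [r[col] for r in records] for col in records[0]}

print(col_layout["age"])  # -> [30, 25, 35]
```

This locality is why column-oriented layouts favor analytical scans over a few columns of very wide tables, while row layouts favor fetching whole records.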


Origin: blog.csdn.net/ximenjianxue/article/details/122980959