Getting to know HBase

Brief introduction

Data model | Representative databases | Typical applications | Advantages | Disadvantages
--- | --- | --- | --- | ---
Key-value | Redis | Caching | Fast lookups | No structured data storage
Column family | Cassandra, HBase | Distributed file systems, large-scale data storage | Easy to scale out | Limited functionality
Document | MongoDB, CouchDB | | Easy to use | Poor scalability
Graph | Neo4j | Social networks | Can use graph-structure algorithms | Hard to scale out

The table above classifies NoSQL databases by data model; HBase and Cassandra both belong to the column-family type.

For a comparison of HBase and Cassandra, see the discussion "Why is HBase popular in China, while Cassandra is more widely used abroad?"; it is not repeated here.

Terminology

Tables and rows: these mean the same as in a relational database.

Column family

A column family, as the name suggests, is a group of columns. Wide-column stores of this kind follow the BigTable model, which is a sparse, multi-dimensional sorted map. Physically, the data of each column family is stored together, rather than each row being stored together as in a relational database. This is why column families must be defined in advance, when the table is created.
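A minimal Java sketch of this layout, assuming a single in-memory table: each declared column family gets its own sorted store of row key → qualifier → timestamp → value, so the cells of one family sit together and missing cells take no space. The class `ColumnFamilyStore` is hypothetical and is not the HBase client API.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory model of a wide-column table (not the HBase client API).
// Layout: column family -> row key -> qualifier -> timestamp (newest first) -> value.
class ColumnFamilyStore {
    private final Map<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> families =
            new TreeMap<>();

    // Column families are fixed when the table is created, as in an HBase schema.
    ColumnFamilyStore(Set<String> declaredFamilies) {
        for (String f : declaredFamilies) {
            families.put(f, new TreeMap<>());
        }
    }

    void put(String row, String family, String qualifier, long ts, String value) {
        var store = families.get(family);
        if (store == null) {
            throw new IllegalArgumentException("undeclared column family: " + family);
        }
        // Qualifiers are free-form: a row only stores the cells it actually has (sparse).
        store.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
             .put(ts, value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String get(String row, String family, String qualifier) {
        var store = families.get(family);
        if (store == null || !store.containsKey(row)) {
            return null;
        }
        var versions = store.get(row).get(qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Number of rows that have at least one cell in the given family.
    int rowCount(String family) {
        return families.get(family).size();
    }
}
```

Writing to an undeclared family fails in this sketch, a simplified version of the rule that families belong to the table schema, while qualifiers can vary freely from row to row.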

Figure: comparison of the key-value, wide-column, and JSON (document) data models, from "NoSQL Overview" (reproduced from "Mongo NoSQL and Cassandra").

Region

A region is a contiguous range of row keys (range partitioning): it holds all rows whose keys fall in that range. Regions are split automatically. A typical region is 1-2 GB; once a region exceeds the configured size (hbase.hregion.max.filesize), it is split in two.
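The split rule can be sketched in Java under simplifying assumptions: a region is a half-open key range `[startKey, endKey)` (an empty `endKey` meaning unbounded), row sizes are tracked in a hypothetical `rowSizes` map, and the split key is the first row at which half of the data lies before it. HBase's real split policies are configurable and choose split points differently; this only illustrates one range becoming two adjacent ranges.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical region model: the half-open row-key range [startKey, endKey),
// where an empty endKey means "no upper bound", plus per-row sizes in bytes.
class Region {
    final String startKey;
    final String endKey;
    final TreeMap<String, Long> rowSizes = new TreeMap<>();

    Region(String startKey, String endKey) {
        this.startKey = startKey;
        this.endKey = endKey;
    }

    long sizeBytes() {
        return rowSizes.values().stream().mapToLong(Long::longValue).sum();
    }

    // Assumes maxBytes >= 0. Returns this region if it is within maxBytes; otherwise two
    // adjacent regions [startKey, splitKey) and [splitKey, endKey) that hold every row.
    List<Region> splitIfNeeded(long maxBytes) {
        long total = sizeBytes();
        if (total <= maxBytes || rowSizes.size() < 2) {
            return List.of(this);
        }
        // Pick the first row key with at least half of the data stored before it.
        String splitKey = rowSizes.lastKey();
        long before = 0;
        for (var e : rowSizes.entrySet()) {
            if (before * 2 >= total) {
                splitKey = e.getKey();
                break;
            }
            before += e.getValue();
        }
        Region left = new Region(startKey, splitKey);
        Region right = new Region(splitKey, endKey);
        left.rowSizes.putAll(rowSizes.headMap(splitKey, false));
        right.rowSizes.putAll(rowSizes.tailMap(splitKey, true));
        return List.of(left, right);
    }
}
```

Because the two daughter ranges share the split key as a boundary, every row key still maps to exactly one region after the split.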

Deployment Architecture

HBase's deployment architecture is fairly complex. A distributed database cluster generally has three roles: routing nodes, configuration (metadata) nodes, and data (shard) nodes.
Some databases combine these functions into a single node type, which makes scaling out relatively simple and leaves fewer single points of failure. If the roles are split across different nodes, deployment and scaling become more troublesome, since each part may have to be scaled separately; the benefit is separation of duties, so a fault in one role cannot take down a whole node through coupling. The HBase cluster deployment architecture is as follows.

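HBase Master

HBase is a CP distributed database with a master-slave architecture. The Master manages all RegionServers, i.e. it plays the configuration-node role described above.
It records which RegionServer each HRegion belongs to. When a RegionServer is added or goes offline, HRegions must be reassigned. For availability, more than one Master is usually deployed (one active, the others on standby) to avoid a single point of failure.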
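A hypothetical Java sketch of the Master's bookkeeping when a RegionServer goes offline: the region-to-server map is updated, and each orphaned region is handed to the currently least-loaded live server. The class name and the least-loaded policy are assumptions for illustration; HBase's actual assignment and balancing logic is more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical Master-side bookkeeping: region name -> RegionServer name.
// Region data lives in HDFS, so reassigning a region only updates this map.
class RegionAssignments {
    final Map<String, String> regionToServer = new TreeMap<>();

    // Reassigns every region of a failed server to the least-loaded live server.
    void serverDown(String deadServer, List<String> liveServers) {
        if (liveServers.isEmpty()) {
            throw new IllegalStateException("no live RegionServers");
        }
        // Current number of regions on each live server.
        Map<String, Integer> load = new HashMap<>();
        for (String s : liveServers) load.put(s, 0);
        for (String s : regionToServer.values()) load.computeIfPresent(s, (k, n) -> n + 1);

        // Collect the orphaned regions first, then move them one by one.
        List<String> orphans = new ArrayList<>();
        for (var e : regionToServer.entrySet()) {
            if (e.getValue().equals(deadServer)) orphans.add(e.getKey());
        }
        for (String region : orphans) {
            String target = liveServers.get(0);
            for (String s : liveServers) {
                if (load.get(s) < load.get(target)) target = s;
            }
            regionToServer.put(region, target);
            load.merge(target, 1, Integer::sum);
        }
    }
}
```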

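Region Server
The RegionServer handles reads and writes. Data is first held in memory, and persisting it requires I/O with the HDFS file system. HBase is a column-family database: the data of a column family is stored together, and rows are distributed by row key across different RegionServers.

In general, scaling out mainly means adding RegionServers, because they are the nodes that handle reads and writes.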

ZooKeeper
Manages information about the HMaster.

HDFS DataNode

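Stores and replicates the data. An obvious benefit of keeping data in HDFS is that when the set of RegionServers changes (servers are added or removed), no data has to be copied between nodes. This greatly reduces the time it takes to bring nodes online or offline, as well as the I/O cost.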

Sharding

HBase's sharding strategy is simple: data is sharded by row key, and each RegionServer is responsible for a set of row-key ranges.
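Range sharding by row key means finding the region whose start key is the greatest one not exceeding the row key. A Java sketch using `TreeMap.floorEntry`, assuming hypothetical server names and that the first region starts at the empty key. HBase compares row keys as unsigned byte arrays; Java `String` ordering matches that only for ASCII keys, which is all this example uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical routing table: region start key -> RegionServer hosting that region.
// Regions cover contiguous, non-overlapping key ranges; the first starts at "".
class RegionRouter {
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        startKeyToServer.put(startKey, server);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = startKeyToServer.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return e.getValue();
    }
}
```

Usage: with regions starting at `""`, `"g"`, and `"p"`, the key `"grape"` falls in the region starting at `"g"`, and `"zebra"` falls in the last one.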

Data storage and maintenance

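Storage works much like Cassandra's: a write first goes to the log (the write-ahead log, WAL) and to memory. The in-memory MemStore is the memory component of an LSM tree; it is later flushed to disk as HFiles, which are stored in HDFS.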
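The write path described above (log first, then the sorted in-memory MemStore, then a flush to immutable sorted files) can be sketched as a tiny LSM store in Java. Everything here is in memory and hypothetical: the `wal` list stands in for the write-ahead log, and each flushed map stands in for an HFile on HDFS. Deletes, compaction, and crash recovery are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified, hypothetical LSM write path: WAL -> sorted memstore -> immutable sorted files.
class MiniLsmStore {
    private final List<String> wal = new ArrayList<>();
    private TreeMap<String, String> memstore = new TreeMap<>();
    private final List<NavigableMap<String, String>> hfiles = new ArrayList<>(); // oldest first
    private final int flushThreshold;

    MiniLsmStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value);   // 1. append to the log first, for durability
        memstore.put(key, value);     // 2. then write into the sorted in-memory buffer
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    // 3. Freeze the memstore as an immutable sorted "file" and start a new memstore.
    void flush() {
        if (memstore.isEmpty()) return;
        hfiles.add(Collections.unmodifiableNavigableMap(memstore));
        memstore = new TreeMap<>();
        wal.clear(); // simplification: flushed edits no longer need replay
    }

    // Reads check the memstore first, then files from newest to oldest.
    String get(String key) {
        String v = memstore.get(key);
        if (v != null) return v;
        for (int i = hfiles.size() - 1; i >= 0; i--) {
            v = hfiles.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }

    int hfileCount() { return hfiles.size(); }
    int memstoreSize() { return memstore.size(); }
    int walSize() { return wal.size(); }
}
```

Reading newest to oldest is what lets a later write to the same key shadow the value in an older flushed file.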

When the HFiles grow beyond a certain size, the region's data is split.

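Read and write analysis

Read operation

In HBase, a read is usually described as taking three hops, because it involves three roles in the cluster.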

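Meta table
The metadata of every HRegion is stored in the .META. table, and it changes whenever regions are added or removed.

Root table
The -ROOT- table records where the regions of the .META. table are, and the location of -ROOT- itself is stored in ZooKeeper. (Since HBase 0.96 the -ROOT- table no longer exists; ZooKeeper points directly to the hbase:meta table.)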

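An HBase read therefore generally takes three hops (ZooKeeper → -ROOT- → .META.) before the client can contact the RegionServer that holds the row.

This is cumbersome, so clients usually cache the routing information, which reduces the interactions between the client and the various HBase nodes.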
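A Java sketch of the three-hop lookup plus the client-side cache, modeled on the older -ROOT-/.META. scheme described above. It assumes a single user table, catalog rows keyed directly by region start key, and hypothetical server names; the `hops` counter only makes the saving from the cache visible. Real clients also invalidate cached locations when a region moves, which is omitted here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical client resolving row -> RegionServer via ZooKeeper -> -ROOT- -> .META.,
// with a location cache. Assumes one user table and catalog rows keyed by start key.
class RegionLocator {
    record Location(String startKey, String endKey, String server) {
        // Half-open range [startKey, endKey); an empty endKey means "to the end".
        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    private final String rootServerInZk;                                  // what ZooKeeper stores
    private final Map<String, TreeMap<String, Location>> catalogs = new HashMap<>(); // "server/table" -> rows
    private final TreeMap<String, Location> cache = new TreeMap<>();      // client-side cache
    int hops = 0;                                                         // remote lookups performed

    RegionLocator(String rootServerInZk) {
        this.rootServerInZk = rootServerInZk;
    }

    // Registers one catalog row: catalog `table` hosted on `server` describes region `loc`.
    void addCatalogRow(String server, String table, Location loc) {
        catalogs.computeIfAbsent(server + "/" + table, k -> new TreeMap<>()).put(loc.startKey(), loc);
    }

    private Location lookup(String server, String table, String row) {
        hops++;
        return catalogs.get(server + "/" + table).floorEntry(row).getValue();
    }

    String locate(String row) {
        Map.Entry<String, Location> hit = cache.floorEntry(row);
        if (hit != null && hit.getValue().contains(row)) {
            return hit.getValue().server();                    // cache hit: no remote hops
        }
        hops++;                                                // hop 1: ask ZooKeeper for -ROOT-
        String rootServer = rootServerInZk;
        Location meta = lookup(rootServer, "-ROOT-", row);     // hop 2: -ROOT- -> .META. region
        Location user = lookup(meta.server(), ".META.", row);  // hop 3: .META. -> user region
        cache.put(user.startKey(), user);
        return user.server();
    }
}
```

Usage: the first lookup for a row costs three hops; later lookups for rows in the same cached region cost none.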

Write operation

Nothing complicated here: it is similar to Cassandra's, so it is not repeated.

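Summary

Like MongoDB, HBase uses a multi-role cluster deployment, so its three-hop read path is also similar to MongoDB's. Writes within a single node are similar to Cassandra's.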

References

https://www.iteblog.com/archives/2516.html


Origin www.cnblogs.com/stoneFang/p/11985440.html