HBase Standalone

                                  NoSQL: HBase

HBase's full name is Hadoop Database, a database built on top of HDFS. The design originates from Google's Bigtable paper: HBase is modeled after Bigtable (Bigtable is a NoSQL database built on Google File System) and is implemented on top of HDFS. (Question to consider: what is the relationship between, and difference between, HDFS and HBase?)

 

The CAP Theorem

The CAP theorem states that a distributed system cannot simultaneously guarantee all three of Consistency, Availability, and Partition tolerance; at most two can be achieved. For example, during a network partition a system must choose between remaining consistent and remaining available.

NoSQL products are designed within the constraints of the CAP theorem. Common NoSQL products fall into the following categories: key-value: Redis, Riak; document: MongoDB, CouchDB; column-oriented: HBase, Cassandra; graph: Neo4j. Among these, Redis, MongoDB, and HBase are all CP systems.

What does "column-oriented" mean for HBase?

The concept is best understood by contrast with traditional database operations: in an RDBMS, the smallest unit of data manipulation is a whole row (a set of fields).

update user set username='zhangsan' where id=1;
select username,pwd from user where id=1;

Even when the user modifies only a single field, the RDBMS still loads the entire row to carry out the change.

As a result, an RDBMS is not very efficient at manipulating individual values, because the system performs wasted I/O. In a column-oriented store, the smallest unit of a table operation is a single cell.
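By contrast, an HBase write touches only the addressed cell. A minimal sketch using the Java client API introduced later in this article (it assumes a Connection conn set up as in the "Core steps" section below):

Table table = conn.getTable(TableName.valueOf("zpark:t_user"));
Put put = new Put("1".getBytes());                 // rowkey "1"
put.addColumn("cf1".getBytes(), "username".getBytes(),
              "zhangsan".getBytes());              // writes exactly one cell
table.put(put);
table.close();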

 

HBase Environment Setup - Standalone

  • Hadoop must be installed and HADOOP_HOME configured, because HBase reads the HDFS location from that setting; start HDFS (omitted here).

  • ZooKeeper must be installed and running normally (standalone install).

  • [root@CentOS ~]# tar -zxf zookeeper-3.4.6.tar.gz  -C /usr/ # extract
    [root@CentOS ~]# cp /usr/zookeeper-3.4.6/conf/zoo_sample.cfg /usr/zookeeper-3.4.6/conf/zoo.cfg # copy the sample config
    [root@CentOS ~]# vi  /usr/zookeeper-3.4.6/conf/zoo.cfg
    tickTime=2000
    dataDir=/root/zkdata
    clientPort=2181
    [root@CentOS ~]# mkdir /root/zkdata # create the data directory
    [root@CentOS ~]# cd /usr/zookeeper-3.4.6/
    [root@CentOS zookeeper-3.4.6]# ./bin/zkServer.sh start zoo.cfg # start
    JMX enabled by default
    Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    [root@CentOS zookeeper-3.4.6]# ./bin/zkServer.sh status zoo.cfg
    JMX enabled by default
    Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Mode: standalone

     Install and configure HBase

  • [root@CentOS ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/ # extract HBase
    [root@CentOS ~]# vi .bashrc  
    HBASE_MANAGES_ZK=false # use the external ZooKeeper, not HBase's embedded one
    HBASE_HOME=/usr/hbase-1.2.4
    HADOOP_HOME=/usr/hadoop-2.6.0
    HADOOP_CLASSPATH=/root/mysql-connector-java-5.1.44.jar
    JAVA_HOME=/usr/java/latest
    PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
    CLASSPATH=.
    export JAVA_HOME
    export PATH
    export CLASSPATH
    export HADOOP_HOME
    export HADOOP_CLASSPATH
    export HBASE_HOME
    export HBASE_MANAGES_ZK
    [root@CentOS ~]# source .bashrc # reload the profile so the changes take effect immediately
    [root@CentOS ~]# vi /usr/hbase-1.2.4/conf/hbase-site.xml 
    <configuration>
        <!-- where HBase stores its data on HDFS -->
        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://CentOS:9000/hbase</value>
        </property>
        <!-- distributed mode: region server runs as a separate process -->
        <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
        </property>
        <!-- the external ZooKeeper quorum and its client port -->
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>CentOS</value>
        </property>
        <property>
            <name>hbase.zookeeper.property.clientPort</name>
            <value>2181</value>
        </property>
    </configuration>
    [root@CentOS ~]# vi /usr/hbase-1.2.4/conf/regionservers # list the region server hostnames
    CentOS

     Start the HBase service

  • [root@CentOS ~]# start-hbase.sh # start (use stop-hbase.sh to stop)
    starting master, logging to /usr/hbase-1.2.4/logs/hbase-root-master-CentOS.out
    CentOS: starting regionserver, logging to /usr/hbase-1.2.4/logs/hbase-root-regionserver-CentOS.out
    [root@CentOS ~]# jps
    2133 HRegionServer
    2008 HMaster
    1433 NameNode
    1260 QuorumPeerMain
    1708 SecondaryNameNode

    Visit http://centos:16010: this is the web UI provided by HBase's HMaster process, where HBase status and tables can be inspected.

  • Connecting to HBase from the shell

  • [root@CentOS ~]# hbase shell
    hbase(main):001:0>

    Common commands (overview)

  • hbase(main):006:0> status # cluster status
    1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
    
    hbase(main):008:0> version # version
    1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017

    Namespace operations - analogous to databases in MySQL

    alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

    hbase(main):015:0> create_namespace 'zpark',{'user'=>'zhangsan'} # create a namespace with a custom property
    0 row(s) in 0.0860 seconds
    
    hbase(main):016:0> describe_namespace 'zpark'
    DESCRIPTION                                                                                    
    {NAME => 'zpark', user => 'zhangsan'}                                                          
    1 row(s) in 0.0180 seconds
    
    hbase(main):017:0> drop_namespace 'zpark' # only an empty namespace can be dropped
    0 row(s) in 0.0640 seconds
    
    hbase(main):029:0> alter_namespace 'zpark',{METHOD => 'set','user'=>'lisi'}
    0 row(s) in 0.0600 seconds
    
    hbase(main):030:0> describe_namespace 'zpark' # verify the modified configuration
    DESCRIPTION                                                                                    
    {NAME => 'zpark', user => 'lisi'}                                                              
    1 row(s) in 0.0130 seconds
    
    hbase(main):031:0> alter_namespace 'zpark',{METHOD => 'unset',NAME=>'user'} # remove a property
    0 row(s) in 0.0250 seconds
    
    hbase(main):032:0> describe_namespace 'zpark'
    DESCRIPTION                                                                                    
    {NAME => 'zpark'}                                                                              
    1 row(s) in 0.0050 seconds
    
    hbase(main):033:0> list_namespace # list all namespaces
    NAMESPACE                                                                                      
    default                                                                                        
    hbase                                                                                          
    zpark                                                                                          
    3 row(s) in 0.0680 seconds
    
    hbase(main):034:0> list_namespace_tables 'zpark' # tables in this namespace
    TABLE                                                                                       
    t_user                                                                                     
    1 row(s) in 0.0300 seconds

     

    Data Definition Language - DDL (tables)

    alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters

  • hbase(main):037:0> create 'zpark:t_user','cf1','cf2' # create a table, simplest form
    0 row(s) in 1.2820 seconds
    
    => Hbase::Table - zpark:t_user
    
    hbase(main):038:0> describe 'zpark:t_user' # show table details
    Table zpark:t_user is ENABLED                                                                  
    zpark:t_user                                                                                   
    COLUMN FAMILIES DESCRIPTION                                                                    
    {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS
     => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIO
    NS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}               
    {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS
     => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIO
    NS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}               
    2 row(s) in 0.0630 seconds
    
    hbase(main):040:0> disable 'zpark:t_user' # a table must be disabled before it can be dropped
    0 row(s) in 2.2680 seconds
    
    hbase(main):041:0> drop 'zpark:t_user'
    0 row(s) in 1.2490 seconds
    
    
    hbase(main):042:0> create 'zpark:t_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',TTL=>60} # cf1 keeps 3 versions; cf2 cells expire after 60 seconds
    0 row(s) in 1.2360 seconds
    
    => Hbase::Table - zpark:t_user
    
    hbase(main):043:0> list # list user tables (system tables excluded)
    TABLE                                                                                          
    zpark:t_order                                                                                  
    zpark:t_user                                                                                   
    2 row(s) in 0.0140 seconds
    
    => ["zpark:t_order", "zpark:t_user"]
    
    hbase(main):044:0> list_namespace_tables 'hbase' # tables in the system 'hbase' namespace
    TABLE                                                                                          
    meta                                                                                           
    namespace                                                                                      
    2 row(s) in 0.0080 seconds
    
    hbase(main):050:0> t = get_table 'zpark:t_user' # obtain a table reference
    0 row(s) in 0.0000 seconds

     

 

For more detail, use help 'command'

CRUD on data - DML (Data Manipulation Language)

append, count, delete, deleteall, get, put, scan, truncate

==put==

hbase(main):001:0> t = get_table 'zpark:t_user' # obtain a table reference
0 row(s) in 0.0310 seconds

=> Hbase::Table - zpark:t_user
hbase(main):002:0> t.put 1 ,'cf1:name','zhangsan'
0 row(s) in 0.3470 seconds
hbase(main):003:0> t.put 1 ,'cf1:age','18'
0 row(s) in 0.0140 seconds
hbase(main):004:0> t.put 1 ,'cf1:name','zs' # overwrite/update the value
0 row(s) in 0.0210 seconds
hbase(main):026:0> put 'zpark:t_user',2,'cf1:name','ww'
0 row(s) in 0.0330 seconds

==get==

hbase(main):005:0> t.get 1
COLUMN                   CELL                                                               
 cf1:age                 timestamp=1547484317857, value=18                                   
 cf1:name                timestamp=1547484378850, value=zs                                     
2 row(s) in 0.0480 seconds
hbase(main):007:0> t.get 1 ,{COLUMNS=>'cf1:name',VERSIONS=>2}
COLUMN                   CELL                                                               
 cf1:name                timestamp=1547484378850, value=zs                                 
 cf1:name                timestamp=1547484297127, value=zhangsan 
 
hbase(main):011:0> t.get 1 ,{COLUMNS=>'cf1:name',TIMESTAMP=>1547484378850}
COLUMN                   CELL                                                               
 cf1:name                timestamp=1547484378850, value=zs                                    
1 row(s) in 0.0100 seconds

hbase(main):020:0> t.get 1 ,{COLUMNS=>'cf1:name',TIMERANGE=>[1547484297127,1547484605994]}
COLUMN                   CELL                                                               
 cf1:name                timestamp=1547484378850, value=zs
 
hbase(main):020:0> get 'zpark:t_user' ,1 ,{COLUMNS=>'cf1:name',TIMERANGE=>[1547484297127,1547484605994]}
COLUMN                   CELL    

==delete/deleteall==

hbase(main):030:0> t.delete 1 ,'cf1:age'
0 row(s) in 0.0440 seconds
hbase(main):034:0> t.get 1,{VERSIONS=>3,COLUMNS=>'cf1'}
COLUMN                   CELL                                                               
 cf1:name                timestamp=1547484605994, value=zs1                                 
 cf1:name                timestamp=1547484378850, value=zs 
hbase(main):035:0> t.delete 1 ,'cf1:name',1547484605994 # delete this version and all earlier versions
0 row(s) in 0.0200 seconds
hbase(main):036:0> t.get 1,{VERSIONS=>3,COLUMNS=>'cf1'}
COLUMN                   CELL                                                               
0 row(s) in 0.0290 seconds
hbase(main):037:0> t.deleteall 1 # delete all columns for this rowkey
0 row(s) in 0.0100 seconds

==scan==

hbase(main):044:0> t.scan
ROW                      COLUMN+CELL
 1                       column=cf1:age, timestamp=1547485477592, value=18
 1                       column=cf1:name, timestamp=1547485432058, value=zs                 
 2                       column=cf1:name, timestamp=1547485439614, value=ls                 
 3                       column=cf1:name, timestamp=1547485445432, value=ww                 
 4                       column=cf1:name, timestamp=1547485462124, value=zl 
 hbase(main):054:0> scan 'zpark:t_user', {COLUMNS => ['cf1'],STARTROW => '1'}
ROW                      COLUMN+CELL   
 1                       column=cf1:age, timestamp=1547485477592, value=18 
 1                       column=cf1:name, timestamp=1547485432058, value=zs 
 2                       column=cf1:name, timestamp=1547485439614, value=ls
 3                       column=cf1:name, timestamp=1547485445432, value=ww 
 4                       column=cf1:name, timestamp=1547485462124, value=zl 
 hbase(main):055:0> scan 'zpark:t_user', {COLUMNS => ['cf1'],STARTROW => '1',LIMIT=>2}
ROW                      COLUMN+CELL             
 1                       column=cf1:age, timestamp=1547485477592, value=18  
 1                       column=cf1:name, timestamp=1547485432058, value=zs  
 2                       column=cf1:name, timestamp=1547485439614, value=ls
 
hbase(main):058:0> scan 'zpark:t_user', {COLUMNS=>['cf1'],STARTROW => '3',LIMIT=>3,REVERSED=>true}
ROW                      COLUMN+CELL                                                       
 3                       column=cf1:name, timestamp=1547485445432, value=ww              
 2                       column=cf1:name, timestamp=1547485439614, value=ls 
 1                       column=cf1:age, timestamp=1547485477592, value=18 
 1                       column=cf1:name, timestamp=1547485432058, value=zs 

==count==

hbase(main):059:0> t.count
4 row(s) in 0.0560 seconds
=> 4

==append==

hbase(main):061:0> t.append 1,'cf1:name','110' # append to the existing value
0 row(s) in 0.0180 seconds
hbase(main):063:0> t.get 1
COLUMN                   CELL      
 cf1:age                 timestamp=1547485477592, value=18  
 cf1:name                timestamp=1547486003061, value=zs110

==truncate==

hbase(main):067:0> truncate 'zpark:t_user' # remove all data from the table
Truncating 'zpark:t_user' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 3.8100 seconds
hbase(main):068:0> t.scan
ROW                      COLUMN+CELL   
0 row(s) in 0.1500 seconds

 Operating HBase with the Java API

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.4</version>
</dependency>

Core steps

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.junit.After;
import org.junit.Before;
import java.io.IOException;

private Admin admin;//DDL operations
private Connection conn;
@Before
public void before() throws IOException {
    // HBaseConfiguration picks up hbase-site.xml defaults; point it at ZooKeeper explicitly
    Configuration conf=HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum","CentOS");

    conn= ConnectionFactory.createConnection(conf);
    admin=conn.getAdmin();
}

@After
public void after() throws IOException {
    admin.close();
    conn.close();
}

Common namespace operations

@Test
public void testCreateNameSpace() throws IOException {
    NamespaceDescriptor nd=NamespaceDescriptor.create("baizhi")
        .addConfiguration("user","wangwu") 
        .build();// create the namespace
    admin.createNamespace(nd);
}
@Test
public void testModifyNameSpace() throws IOException {
    NamespaceDescriptor nd=NamespaceDescriptor.create("baizhi")
        .addConfiguration("aa","bb")
        .removeConfiguration("user")
        .build();// add property 'aa', remove 'user'
    admin.modifyNamespace(nd);
}
@Test
public void testDeleteNameSpace() throws IOException {
    admin.deleteNamespace("baizhi");//删除 baizhi
}

Common table operations

@Test
public void testCreateTable() throws IOException {
    TableName tname=TableName.valueOf("baizhi:t_user");
    HTableDescriptor td=new HTableDescriptor(tname);

    HColumnDescriptor cf1=new HColumnDescriptor("cf1");
    cf1.setMaxVersions(3);   // keep up to 3 versions per cell

    HColumnDescriptor cf2=new HColumnDescriptor("cf2");
    cf2.setTimeToLive(60);   // cells expire after 60 seconds
    cf2.setInMemory(true);
    //add the column families
    td.addFamily(cf1);
    td.addFamily(cf2);

    admin.createTable(td);
}
@Test
public void testDropTable() throws IOException {

    TableName tname=TableName.valueOf("baizhi:t_user");
    if(admin.tableExists(tname)){
        admin.disableTable(tname);
        admin.deleteTable(tname);
    }
}

CRUD

==put is suited to updating a single record== (insert / update)

TableName tname=TableName.valueOf("baizhi:t_user");
Table table = conn.getTable(tname);
Put put=new Put("1".getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),"zs2".getBytes());
put.addColumn("cf1".getBytes(),"age".getBytes(),"28".getBytes());
put.addColumn("cf1".getBytes(),"sex".getBytes(),"false".getBytes());
table.put(put);
table.close();

//batch insert or update
TableName tname=TableName.valueOf("baizhi:t_user");

BufferedMutator mb=conn.getBufferedMutator(tname);

Put put=new Put("2".getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),"lisi".getBytes());
put.addColumn("cf1".getBytes(),"age".getBytes(),"28".getBytes());
put.addColumn("cf1".getBytes(),"sex".getBytes(),"false".getBytes());

mb.mutate(put);
mb.close();

//delete
TableName tname=TableName.valueOf("baizhi:t_user");
Table table = conn.getTable(tname);
Delete delete=new Delete("2".getBytes());
//delete.addColumn("cf1".getBytes(),"name".getBytes());
table.delete(delete);// without the addColumn above, this is equivalent to deleteall
//batch delete
TableName tname=TableName.valueOf("baizhi:t_user");
BufferedMutator mb=conn.getBufferedMutator(tname);
Delete delete=new Delete("1".getBytes());
mb.mutate(delete);
mb.close();
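
//BufferedMutator buffers mutations client-side and flushes them in batches,
//which is what makes it suitable for bulk writes. A minimal sketch (the
//buffer size and the loop are illustrative, not from the original article):
TableName tname=TableName.valueOf("baizhi:t_user");
BufferedMutatorParams params=new BufferedMutatorParams(tname)
        .writeBufferSize(4*1024*1024);   // flush roughly every 4 MB
BufferedMutator mb=conn.getBufferedMutator(params);
for(int i=0;i<1000;i++){
    Put put=new Put(Bytes.toBytes(String.valueOf(i)));
    put.addColumn("cf1".getBytes(),"name".getBytes(),("user"+i).getBytes());
    mb.mutate(put);
}
mb.flush();   // push out anything still buffered before closing
mb.close();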


//get
TableName tname=TableName.valueOf("baizhi:t_user");
Table table = conn.getTable(tname);

Get get = new Get("1".getBytes());
Result result = table.get(get);

byte[] nameBytes = result.getValue("cf1".getBytes(), "name".getBytes());
byte[] ageBytes = result.getValue("cf1".getBytes(), "age".getBytes());
byte[] sexBytes = result.getValue("cf1".getBytes(), "sex".getBytes());

System.out.println(Bytes.toString(nameBytes));
System.out.println(Bytes.toString(ageBytes));
System.out.println(Bytes.toString(sexBytes));

//fetch multiple versions of a cell
TableName tname=TableName.valueOf("baizhi:t_user");
Table table = conn.getTable(tname);

Get get = new Get("1".getBytes());
get.setMaxVersions(3);//return up to the latest 3 versions
get.setTimeStamp(1547487891548L);//restrict to cells with this exact timestamp
get.addColumn("cf1".getBytes(),"name".getBytes());
Result result = table.get(get);

List<Cell> cells = result.getColumnCells("cf1".getBytes(), "name".getBytes());

for (Cell cell : cells) {
    byte[] rowkeyBytes=CellUtil.cloneRow(cell);
    byte[] cfBytes = CellUtil.cloneFamily(cell);
    byte[] qualifierBytes = CellUtil.cloneQualifier(cell);
    byte[] valueBytes = CellUtil.cloneValue(cell);
    long ts=cell.getTimestamp();

    System.out.println(Bytes.toString(rowkeyBytes)+
                       "->"+Bytes.toString(cfBytes)+":"+Bytes.toString(qualifierBytes)
                       +"\t"+Bytes.toString(valueBytes)
                       +",ts:"+ts);

//scan
TableName tname=TableName.valueOf("baizhi:t_user");
Table table = conn.getTable(tname);

Scan scan = new Scan();
Filter f1=new PrefixFilter("2".getBytes());// rowkeys starting with "2"
Filter f2=new PrefixFilter("1".getBytes());// rowkeys starting with "1"
// MUST_PASS_ONE is a logical OR of the filters (MUST_PASS_ALL would be AND)
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE,
                                       f1, f2);
scan.setFilter(filterList);

ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    System.out.println("----------------");
    byte[] nameBytes = result.getValue("cf1".getBytes(), "name".getBytes());
    byte[] ageBytes = result.getValue("cf1".getBytes(), "age".getBytes());
    byte[] sexBytes = result.getValue("cf1".getBytes(), "sex".getBytes());

    System.out.println(Bytes.toString(nameBytes));
    System.out.println(Bytes.toString(ageBytes));
    System.out.println(Bytes.toString(sexBytes));
}
scanner.close();
table.close();

 

RowKey Design

Scale: hundreds of millions of rows × millions of columns × thousands of versions, amounting to TB or PB of data.

region01
	----rowkey > cf > qualifier > timestamp
		...
region02
	----rowkey > cf > qualifier > timestamp
		...
region03
	----rowkey > cf > qualifier > timestamp
		...


Random values must not be used as RowKeys; in general, the query conditions should be encoded as part of the RowKey.

 Example: telecom carriers (China Telecom / China Mobile / China Unicom) store collected data with timestamps in HBase, and range queries by carrier and time are required; see the sketch after the table below.

RowKey                      cf:qualifier

carrier info:timestamp      collected data
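A minimal sketch of this design, assuming a hypothetical table 'zpark:t_signal' and column 'cf1:data'. The carrier prefix keeps one carrier's rows adjacent, so a time window becomes a rowkey range scan (fixed-width 13-digit millisecond timestamps keep lexicographic order consistent with numeric order):

Table table = conn.getTable(TableName.valueOf("zpark:t_signal"));

// write: rowkey = carrier + ":" + timestamp
long now = System.currentTimeMillis();
Put put = new Put(("CMCC:" + now).getBytes());
put.addColumn("cf1".getBytes(), "data".getBytes(), "sample".getBytes());
table.put(put);

// read: all CMCC rows collected in the last hour, as a rowkey range
long startTs = now - 3600 * 1000L;
long endTs = now + 1;
Scan scan = new Scan();
scan.setStartRow(("CMCC:" + startTs).getBytes());
scan.setStopRow(("CMCC:" + endTs).getBytes());
ResultScanner rs = table.getScanner(scan);
for (Result r : rs) {
    System.out.println(Bytes.toString(r.getRow()));
}
rs.close();
table.close();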

 

                                              MapReduce on HBase

 

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.6.0</version>
</dependency>

<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.4</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.2.4</version>
</dependency>

 

 Mapper

public static class UserMapper extends TableMapper<Text,DoubleWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
        // the rowkey is assumed to be "company:..." (see RowKey Design above)
        byte[] bytes = key.get();
        String company= Bytes.toString(bytes).split(":")[0];
        // the salary cell holds an 8-byte double written via Bytes.toBytes(double)
        byte[] salaryBytes = value.getValue("cf1".getBytes(), "salary".getBytes());
        double salary=Bytes.toDouble(salaryBytes);
        context.write(new Text(company),new DoubleWritable(salary));
    }
}

 Reducer

public static class UserReducer extends TableReducer<Text,DoubleWritable, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
        double totalSalary=0.0;
        int count=0;
        for (DoubleWritable value : values) {
            totalSalary+=value.get();
            count++;
        }

        // the output rowkey is the company name
        Put put=new Put(key.copyBytes());
        put.addColumn("cf1".getBytes(),"avgSalary".getBytes(),(totalSalary/count+"").getBytes());

        // TableOutputFormat ignores the key; the Put itself carries the target row
        context.write(null,put);
    }
}

Job submission

//1. build the Job object
Configuration conf=getConf();
conf.addResource("core-site.xml");
conf.addResource("hdfs-site.xml");
conf.addResource("yarn-site.xml");
conf.addResource("mapred-site.xml");
conf.set("hbase.zookeeper.quorum","CentOS");

conf.set(MRJobConfig.JAR,"file:///E:\\训练营大数据\\20190102\\MapreduceHbaseDemo\\target\\mapreducehbase-1.0-SNAPSHOT.jar");
Job job=Job.getInstance(conf);

//2. set the input and output formats
job.setInputFormatClass(TableInputFormat.class);
job.setOutputFormatClass(TableOutputFormat.class);

TableMapReduceUtil.initTableMapperJob(
    "baizhi:t_user",
    new Scan(),
    UserMapper.class,
    Text.class,
    DoubleWritable.class,
    job
);
TableMapReduceUtil.initTableReducerJob(
    "baizhi:t_result",
    UserReducer.class,
    job);

//3. submit the job
//job.submit();
job.waitForCompletion(true);
return 0;

 Reference: http://abloz.com/hbase/book.html

Reposted from blog.csdn.net/weixin_43989957/article/details/86518305