Big Data: HBase

HBase

I. Understanding HBase

HBase is a distributed, scalable, Hadoop-based store for huge data sets. When users need real-time, random read/write access to massive amounts of data, they can use HBase to build an extremely large table, on the order of billions of rows by millions of columns, running on clusters of commodity hardware.
In short, HBase is a distributed, scalable, versioned, non-relational (NoSQL) database built on top of Hadoop.

Difference between HDFS and HBase: HBase is a database service built on top of HDFS. It lets users operate on HDFS data indirectly through the database layer, providing fine-grained CRUD operations on data that ultimately lives in HDFS.

II. Indexing problems in traditional relational databases

How an index lookup works in a traditional relational database:
quickly locate the matching record by the search condition
load all of the record's attributes
project and return only the requested fields
Loading every attribute is redundant; that portion of IO is wasted, and row-oriented storage does not support sparse data.
Solution: address the column co-occurrence problem by splitting tables.
Column storage:
columns with similar IO characteristics are grouped into a column family, and the column family is the smallest unit of loading
all records in HBase are stored in sorted order by rowkey, column family, column qualifier, and timestamp (by default only the newest timestamp is returned)
sparse storage is supported: null values are simply not stored
drawback: each stored value also carries the rowkey, column information, and timestamp
rowkey: the equivalent of the primary key id in a relational database
column family: a group of columns with similar IO characteristics; HBase indexes and stores data per column family
column: identified by column family, column qualifier, and timestamp
timestamp: records the version of the data in HBase; the system inserts the write time automatically
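
To make the storage model concrete, here is a minimal Java sketch. It is hedged: it assumes the baizhi:t_user table with column family cf1 and VERSIONS>=3 that is created later in this tutorial, and a ZooKeeper quorum reachable at the host CentOS from the installation section below. It writes two versions of the same cell and reads them back; each value is addressed by rowkey, column family, column qualifier, and timestamp, and without setMaxVersions only the newest version would be returned:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class CellModelDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set(HConstants.ZOOKEEPER_QUORUM, "CentOS"); // ZooKeeper host from the setup below (assumption)
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("baizhi:t_user"))) {
            // each cell is addressed by rowkey + column family + qualifier + timestamp
            table.put(new Put(Bytes.toBytes("001"))
                    .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("xiaoming")));
            table.put(new Put(Bytes.toBytes("001"))
                    .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("xiaowang")));

            Get get = new Get(Bytes.toBytes("001"));
            get.setMaxVersions(3); // ask for up to 3 versions instead of only the newest one
            for (Cell cell : table.get(get).rawCells()) {
                System.out.println(Bytes.toString(CellUtil.cloneValue(cell)) + " @ " + cell.getTimestamp());
            }
        }
    }
}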

III. Installing HBase

1. HDFS base environment (storage)

1. Install the JDK and configure the JAVA_HOME environment variable

[root@CentOS ~]# rpm -ivh jdk-8u171-linux-x64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:jdk1.8-2000:1.8.0_171-fcs        ################################# [100%]
Unpacking JAR files...
        tools.jar...
        plugin.jar...
        javaws.jar...
        deploy.jar...
        rt.jar...
        jsse.jar...
        charsets.jar...
        localedata.jar...
[root@CentOS ~]# vi .bashrc

JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH  

[root@CentOS ~]# source .bashrc 
[root@CentOS ~]# jps
1933 Jps

2. Stop the firewall

[root@CentOS ~]# systemctl stop firewalld # stop the service
[root@CentOS ~]# systemctl disable firewalld # disable start on boot
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@CentOS ~]# firewall-cmd --state
not running

3. Configure the hostname and the hostname-to-IP mapping

[root@CentOS ~]# cat /etc/hostname 
CentOS
[root@CentOS ~]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.186.150 CentOS

4. Configure passwordless SSH login

[root@CentOS ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:6yYiypvclJAZLU2WHvzakxv6uNpsqpwk8kzsjLv3yJA root@CentOS
The key's randomart image is:
+---[RSA 2048]----+
|  .o.            |
|  =+             |
| o.oo            |
|  =. .           |
| +  o . S        |
| o...=   .       |
|E.oo. + .        |
|BXX+o....        |
|B#%O+o o.        |
+----[SHA256]-----+
[root@CentOS ~]# ssh-copy-id CentOS
[root@CentOS ~]# ssh CentOS
Last failed login: Mon Jan  6 14:30:49 CST 2020 from centos on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Mon Jan  6 14:20:27 2020 from 192.168.186.1

5. Upload the Hadoop archive and extract it to /usr

[root@CentOS ~]# tar -zxf  hadoop-2.9.2.tar.gz -C /usr/

6. Configure the HADOOP_HOME environment variable

[root@CentOS ~]# vi .bashrc
HADOOP_HOME=/usr/hadoop-2.9.2
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
[root@CentOS ~]# source .bashrc    
[root@CentOS ~]# hadoop classpath # print Hadoop's classpath
/usr/hadoop-2.9.2/etc/hadoop:/usr/hadoop-2.9.2/share/hadoop/common/lib/*:/usr/hadoop-2.9.2/share/hadoop/common/*:/usr/hadoop-2.9.2/share/hadoop/hdfs:/usr/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/usr/hadoop-2.9.2/share/hadoop/hdfs/*:/usr/hadoop-2.9.2/share/hadoop/yarn:/usr/hadoop-2.9.2/share/hadoop/yarn/lib/*:/usr/hadoop-2.9.2/share/hadoop/yarn/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/*:/usr/hadoop-2.9.2/contrib/capacity-scheduler/*.jar

7. Edit core-site.xml

[root@CentOS ~]# vi /usr/hadoop-2.9.2/etc/hadoop/core-site.xml
<!-- NameNode access URI -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://CentOS:9000</value>
</property>
<!-- HDFS working base directory -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop-2.9.2/hadoop-${user.name}</value>
</property>

8. Edit hdfs-site.xml

[root@CentOS ~]# vi /usr/hadoop-2.9.2/etc/hadoop/hdfs-site.xml 
<!-- block replication factor -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<!-- host that runs the Secondary NameNode -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>CentOS:50090</value>
</property>
<!-- maximum number of concurrent file transfers per DataNode -->
<property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
</property>
<!-- DataNode handler (request processing) thread count -->
<property>
        <name>dfs.datanode.handler.count</name>
        <value>6</value>
</property>

9. Edit slaves

[root@CentOS ~]# vi /usr/hadoop-2.9.2/etc/hadoop/slaves 
CentOS

10. Format the NameNode to generate the initial fsimage

[root@CentOS ~]# hdfs namenode -format
[root@CentOS ~]# yum install -y tree
[root@CentOS ~]# tree /usr/hadoop-2.9.2/hadoop-root/
/usr/hadoop-2.9.2/hadoop-root/
└── dfs
    └── name
        └── current
            ├── fsimage_0000000000000000000
            ├── fsimage_0000000000000000000.md5
            ├── seen_txid
            └── VERSION

3 directories, 4 files

11. Start the HDFS services

[root@CentOS ~]# start-dfs.sh 

2. ZooKeeper installation (coordination)

1. Upload the ZooKeeper archive and extract it to /usr

[root@CentOS ~]# tar -zxf zookeeper-3.4.12.tar.gz -C /usr/

2. Configure ZooKeeper's zoo.cfg

[root@CentOS ~]# cd /usr/zookeeper-3.4.12/
[root@CentOS zookeeper-3.4.12]# cp conf/zoo_sample.cfg conf/zoo.cfg
[root@CentOS zookeeper-3.4.12]# vi conf/zoo.cfg 
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/root/zkdata
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

3. Create the ZooKeeper data directory

[root@CentOS ~]# mkdir /root/zkdata

4. Start the ZooKeeper service

[root@CentOS ~]# cd /usr/zookeeper-3.4.12/
[root@CentOS zookeeper-3.4.12]# ./bin/zkServer.sh start zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@CentOS zookeeper-3.4.12]# ./bin/zkServer.sh status zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: standalone

3. HBase configuration and installation (database service)

1. Upload the HBase archive and extract it to /usr

[root@CentOS ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/

2. Configure the HBASE_HOME environment variable

[root@CentOS ~]# vi .bashrc 
HBASE_HOME=/usr/hbase-1.2.4
HADOOP_HOME=/usr/hadoop-2.9.2
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HBASE_HOME

[root@CentOS ~]# source .bashrc 
[root@CentOS ~]# hbase classpath # check that HBase picks up the Hadoop classpath
/usr/hbase-1.2.4/conf:/usr/java/latest/lib/tools.jar:/usr/hbase-1.2.4:/usr/hbase-1.2.4/lib/activation-1.1.jar:/usr/hbase-1.2.4/lib/aopalliance-1.0.jar:/usr/hbase-1.2.4/lib/apacheds-i18n-2.0.0-M15.jar:/usr/hbase-1.2.4/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hbase-1.2.4/lib/api-asn1-api-1.0.0-M20.jar:/usr/hbase-1.2.4/lib/api-util-1.0.0-M20.jar:/usr/hbase-1.2.4/lib/asm-3.1.jar:/usr/hbase-1.2.4/lib/avro-
...
1.7.4.jar:/usr/hbase-1.2.4/lib/commons-beanutils-1.7.0.jar:/usr/hbase-1.2.4/lib/commons-
2.9.2/share/hadoop/yarn/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/*:/usr/hadoop-2.9.2/contrib/capacity-scheduler/*.jar

3. Configure hbase-site.xml

[root@CentOS ~]# cd /usr/hbase-1.2.4/
[root@CentOS hbase-1.2.4]# vi conf/hbase-site.xml
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://CentOS:9000/hbase</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>CentOS</value>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>

4. Edit hbase-env.sh and set HBASE_MANAGES_ZK to false

[root@CentOS ~]# cd /usr/hbase-1.2.4/
[root@CentOS hbase-1.2.4]# grep -i HBASE_MANAGES_ZK conf/hbase-env.sh 
# export HBASE_MANAGES_ZK=true
[root@CentOS hbase-1.2.4]# vi conf/hbase-env.sh 
export HBASE_MANAGES_ZK=false
[root@CentOS hbase-1.2.4]# grep -i HBASE_MANAGES_ZK conf/hbase-env.sh 
export HBASE_MANAGES_ZK=false

export HBASE_MANAGES_ZK=false tells HBase to use the external ZooKeeper instead of managing its own instance.

5. Start HBase

[root@CentOS hbase-1.2.4]# ./bin/start-hbase.sh 
starting master, logging to /usr/hbase-1.2.4/logs/hbase-root-master-CentOS.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
starting regionserver, logging to /usr/hbase-1.2.4/logs/hbase-root-1-regionserver-CentOS.out
[root@CentOS hbase-1.2.4]# jps
3090 NameNode
5027 HMaster
3188 DataNode
5158 HRegionServer
3354 SecondaryNameNode
5274 Jps
3949 QuorumPeerMain

6. Verify that the HBase installation works

  • Web UI check: http://192.168.186.150:16010/


  • HBase shell check (more reliable)
[root@CentOS hbase-1.2.4]# ./bin/hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017

hbase(main):001:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load

hbase(main):002:0> version
1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017

hbase(main):003:0> 
  • Check the /hbase directory in HDFS


4. Installation summary

Environment setup:
install the JDK and configure the environment variables (/etc/profile or .bashrc)
stop the firewall: systemctl stop/disable firewalld
configure the hostname and the hostname-to-IP mapping
set up passwordless SSH
install and configure Hadoop
format the NameNode: hdfs namenode -format
start HDFS: start-dfs.sh
install and configure ZooKeeper
install HBase
Verifying the installation:
open ip:16010 in a browser
hbase shell
NameNode UI: ip:50070

IV. Shell commands (important)

1. Basic commands

Open the HBase shell: hbase shell

Get help: help, or help 'get' for a specific command

Check server status: status

Show version info: version

2. Namespaces (databases)

Related commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

Create a namespace: create_namespace 'baizhi', {'user'=>'xiaoming'}

Describe a namespace: describe_namespace 'baizhi'

Alter a namespace: alter_namespace 'baizhi', {METHOD => 'set', 'user' => 'aaa'}

List all namespaces: list_namespace

List all tables in a namespace: list_namespace_tables 'baizhi'

Drop a namespace: drop_namespace 'baizhi' // the namespace must contain no tables

3. Table operations

List all (user) tables: list

Create a table: create 'baizhi:t_user', {NAME=>'cf1', VERSIONS=>3, BLOCKCACHE => true}

Column-family attributes:
VERSIONS: number of data versions to keep, default 1
TTL: time-to-live of cells in the family, default FOREVER
BLOCKCACHE: whether to use the block cache to speed up reads
IN_MEMORY: whether to keep all data of this family in memory to speed up reads and writes, default false
BLOOMFILTER: bloom filter type (a data-file filtering mechanism), default ROW; the alternative ROWCOL needs extra storage because column information is also kept in the filter index
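
For reference, a hedged sketch of how these shell attributes map onto the Java API used later in Section V (HBase 1.2.x, using the HColumnDescriptor and BloomType classes from that section); the concrete values here are only illustrative:

    // illustrative mapping of the shell attributes above onto HColumnDescriptor setters
    HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
    cf1.setMaxVersions(3);                 // VERSIONS: versions to keep, default 1
    cf1.setTimeToLive(24 * 60 * 60);       // TTL in seconds, default FOREVER
    cf1.setBlockCacheEnabled(true);        // BLOCKCACHE: cache blocks to speed up reads
    cf1.setInMemory(false);                // IN_MEMORY: keep the family's data in memory, default false
    cf1.setBloomFilterType(BloomType.ROW); // BLOOMFILTER: ROW (default) or ROWCOL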

Describe a table: desc 'baizhi:t_user'

Disable a table: disable 'baizhi:t_user'

Enable a table: enable 'baizhi:t_user'

Truncate (empty) a table: truncate 'baizhi:t_user'

Drop a table: drop 'baizhi:t_user'

4. DML operations

put (insert/update): put 'baizhi:t_user', '001', 'cf1:name', 'xiaoming'

get (query):

get 'baizhi:t_user', '001', {COLUMN=>'cf1', VERSIONS=>3, TIMESTAMP=>1553961198084}

get 'baizhi:t_user', '001', {COLUMN=>'cf1', TIMERANGE=>[1553961198084,1553961219306], VERSIONS=>3}

delete: delete 'baizhi:t_user', '001', 'cf1:age'

scan: scan 'baizhi:t_user'

V. Operating HBase from Java

1. Add the dependencies

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.9.2</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.2.4</version>
    </dependency>

2. Connect to HBase and obtain an Admin

    private Admin admin;
    private Connection conn;

    @Before
    public void before() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper quorum; this must resolve to the ZooKeeper host(s), e.g. CentOS in the setup above
        conf.set(HConstants.ZOOKEEPER_QUORUM, "hbase");
        conn = ConnectionFactory.createConnection(conf);
        admin = conn.getAdmin();
    }

    @After
    public void after() throws IOException {
        admin.close();
        conn.close();
    }

3. Namespace operations

Create a namespace (database)

    @Test
    public void testCreateNamespace() throws IOException {
        // build a namespace descriptor with a custom property, then create it
        NamespaceDescriptor nd = NamespaceDescriptor.create("zpark")
                .addConfiguration("user", "xiaoming")
                .build();
        admin.createNamespace(nd);
    }

List all namespaces

    @Test
    public void testListNamespace() throws IOException {
        NamespaceDescriptor[] namespaceDescriptors = admin.listNamespaceDescriptors();
        for (NamespaceDescriptor nd : namespaceDescriptors) {
            System.out.println(nd.getName());
            System.out.println(nd.getConfiguration());
        }
    }

Delete a namespace

    @Test
    public void testDeleteNamespace() throws IOException {
        admin.deleteNamespace("zpark");
    }

4. Table operations

List all user tables

    @Test
    public void testListTable() throws IOException {
        HTableDescriptor[] hTableDescriptors = admin.listTables();
        for (HTableDescriptor hTableDescriptor : hTableDescriptors) {
            System.out.println(hTableDescriptor);
        }
    }

List all tables in a given namespace

    @Test
    public void testListTableByNamespace() throws IOException {
        HTableDescriptor[] baizhis = admin.listTableDescriptorsByNamespace("baizhi");
        for (HTableDescriptor baizhi : baizhis) {
            System.out.println(baizhi);
        }
    }

Create a table

    // create a table
    @Test
    public void testCreateTable() throws IOException {
        // the table to create
        TableName tname = TableName.valueOf("baizhi:t_user");
        // table descriptor
        HTableDescriptor htabledescriptor = new HTableDescriptor(tname);
        // column-family descriptors
        HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
        cf1.setMaxVersions(3);
        cf1.setInMemory(true);
        cf1.setBlockCacheEnabled(true);
        HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
        cf2.setTimeToLive(300);
        cf2.setBloomFilterType(BloomType.ROW);

        htabledescriptor.addFamily(cf1);
        htabledescriptor.addFamily(cf2);
        // create the table through the Admin
        admin.createTable(htabledescriptor);
    }

Delete a table (it must be disabled first)

    @Test
    public void testDeleteTable() throws IOException {
        TableName tname = TableName.valueOf("baizhi:t_user");
        // a table has to be disabled before it can be deleted
        if (!admin.isTableDisabled(tname)) {
            admin.disableTable(tname);
        }
        admin.deleteTable(tname);
    }

5. CRUD operations

Put (insert/update)

BufferedMutator buffers puts on the client side and flushes them in batches, which makes it convenient for bulk writes.

    @Test
    public void testPut() throws IOException {
        BufferedMutator bufferedMutator = conn.getBufferedMutator(TableName.valueOf("baizhi:t_user"));

        for (int i = 0; i < 100; i++) {
            DecimalFormat decimalFormat = new DecimalFormat("000");
            String rowKey = decimalFormat.format(i);
            Put put = new Put(rowKey.getBytes());
            put.addColumn("cf2".getBytes(), "name".getBytes(), ("user" + rowKey).getBytes());

            bufferedMutator.mutate(put);
            // flush the buffered mutations every 50 rows
            if (i % 50 == 0) {
                bufferedMutator.flush();
            }
        }
        // close() flushes any remaining buffered mutations
        bufferedMutator.close();
    }

Two ways to use Get

First: query with conditions (filter and versions)

    @Test
    public void testGet01() throws IOException {
        Table table = conn.getTable(TableName.valueOf("baizhi:t_user"));
        // the rowkey to look up
        String rowKey = "000";

        // build the query
        Get get = new Get(rowKey.getBytes());
        get.setMaxVersions(3);
        // optionally restrict to a specific timestamp
        // get.setTimeStamp(1578330481699L);
        Filter filter = new QualifierFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("me"));
        get.setFilter(filter);

        // iterate over the cells of the result
        Result result = table.get(get);
        CellScanner cellScanner = result.cellScanner();
        while (cellScanner.advance()) {
            Cell cell = cellScanner.current();
            String family = Bytes.toString(CellUtil.cloneFamily(cell));
            String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
            String value = Bytes.toString(CellUtil.cloneValue(cell));
            Long version = cell.getTimestamp();
            String key = Bytes.toString(CellUtil.cloneRow(cell));
            System.out.println(key + "\t" + family + "\t" + qualifier + "\t" + value + "\t" + version);
        }
        table.close();
    }

Second: simple point lookup

    @Test
    public void testGet02() throws IOException {
        Table table = conn.getTable(TableName.valueOf("baizhi:t_user"));
        // the rowkey to look up
        String rowKey = "002";
        Get get = new Get(rowKey.getBytes());
        // additional query conditions could be set on the Get here
        Result result = table.get(get);
        // read individual columns from the result
        String key = Bytes.toString(result.getRow());
        String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
        String age = Bytes.toString(result.getValue("cf1".getBytes(), "age".getBytes()));
        String salary = Bytes.toString(result.getValue("cf1".getBytes(), "salary".getBytes()));
        String depa = Bytes.toString(result.getValue("cf1".getBytes(), "depa".getBytes()));
        System.out.println(key + "\t" + name + "," + age + "," + salary + "," + depa);

        table.close();
    }

Delete an entire row

    @Test
    public void testDeleteAll() throws IOException {
        Table table = conn.getTable(TableName.valueOf("baizhi:t_user"));

        String rowKey = "000";
        Delete delete = new Delete(rowKey.getBytes());
        table.delete(delete);

        table.close();
    }

Delete a specific cell

    @Test
    public void testDeleteOneCell() throws IOException {
        Table table = conn.getTable(TableName.valueOf("baizhi:t_user"));
        String rowKey = "000";
        Delete delete = new Delete(rowKey.getBytes());
        // delete the cf1:name cell with this exact timestamp
        delete.addColumn("cf1".getBytes(), "name".getBytes(), 1578330481699L);
        table.delete(delete);
        table.close();
    }

Scan

    @Test
    public void testScan() throws IOException {
        Table table = conn.getTable(TableName.valueOf("baizhi:t_user"));

        Scan scan = new Scan();
        // scan.addFamily("cf1".getBytes());      // restrict the families to retrieve
        // scan.setStartRow("010".getBytes());    // restrict the rowkey range
        // scan.setStopRow("020".getBytes());
        // keep only rows with "005" < rowkey < "010"
        Filter filter1 = new RowFilter(CompareFilter.CompareOp.LESS, new BinaryComparator("010".getBytes()));
        Filter filter2 = new RowFilter(CompareFilter.CompareOp.GREATER, new BinaryComparator("005".getBytes()));
        Filter list = new FilterList(FilterList.Operator.MUST_PASS_ALL, filter1, filter2);
        scan.setFilter(list);
        // iterate over the scan results
        ResultScanner resultScanner = table.getScanner(scan);
        Iterator<Result> resultIterator = resultScanner.iterator();
        while (resultIterator.hasNext()) {
            Result result = resultIterator.next();
            String key = Bytes.toString(result.getRow());
            String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
            String age = Bytes.toString(result.getValue("cf1".getBytes(), "age".getBytes()));
            String salary = Bytes.toString(result.getValue("cf1".getBytes(), "salary".getBytes()));
            String depa = Bytes.toString(result.getValue("cf1".getBytes(), "depa".getBytes()));
            System.out.println(key + "\t" + name + "," + age + "," + salary + "," + depa);
        }

        table.close();
    }

VI. Integrating HBase with MapReduce

1. Add the dependencies

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.9.2</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.9.2</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      <version>2.9.2</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.9.2</version>
    </dependency>
    <!-- HBase dependencies -->
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.2.4</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-server</artifactId>
      <version>1.2.4</version>
    </dependency>

2. The job driver

public class MapreduceSalary extends Configured implements Tool {

    @Override
    public int run(String[] strings) throws Exception {
        // create the job
        Configuration conf = getConf();
        // ZooKeeper quorum; this must resolve to the ZooKeeper host(s), e.g. CentOS in the setup above
        conf.set(HConstants.ZOOKEEPER_QUORUM, "hbase");
        Job job = Job.getInstance(conf);
        job.setJarByClass(MapreduceSalary.class);
        // input: scan baizhi:t_user with the given Mapper
        // (this also sets the input format and the map output key/value types)
        TableMapReduceUtil.initTableMapperJob(
                "baizhi:t_user",
                new Scan(),
                UserMapper.class,
                Text.class,
                DoubleWritable.class,
                job
        );
        // output: write the Reducer's Puts into the baizhi:t_result table
        TableMapReduceUtil.initTableReducerJob(
                "baizhi:t_result",
                UserReducer.class,
                job
        );
        // submit the job and wait for it to finish
        job.waitForCompletion(true);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new MapreduceSalary(), args);
    }
}
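
Note that initTableReducerJob above writes into baizhi:t_result, so that table (with a cf1 family, which the reducer below uses) has to exist before the job is submitted. This is an assumption of the example; it can be created with create 'baizhi:t_result','cf1' in the shell, or, as a hedged sketch using the admin handle from Section V:

    // sketch: create the output table assumed by the job if it is missing
    TableName out = TableName.valueOf("baizhi:t_result");
    if (!admin.tableExists(out)) {
        HTableDescriptor desc = new HTableDescriptor(out);
        desc.addFamily(new HColumnDescriptor("cf1"));
        admin.createTable(desc);
    }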

3. The Mapper implementation

public class UserMapper extends TableMapper<Text, DoubleWritable> {

    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
        // emit (department, salary) for each row of baizhi:t_user
        String depa = Bytes.toString(value.getValue("cf1".getBytes(), "depa".getBytes()));
        String salary = Bytes.toString(value.getValue("cf1".getBytes(), "salary".getBytes()));
        context.write(new Text(depa), new DoubleWritable(Double.parseDouble(salary)));
    }
}

4. The Reducer implementation

public class UserReducer extends TableReducer<Text, DoubleWritable, Text> {

    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
        // sum the salaries for one department
        Double totals = 0.0;
        for (DoubleWritable value : values) {
            totals += value.get();
        }
        // write the result back to HBase as a Put (rowkey = department)
        Put put = new Put(key.toString().getBytes());
        put.addColumn("cf1".getBytes(), "sum".getBytes(), (totals + "").getBytes());
        context.write(null, put);
    }
}

VII. Building a highly available HBase cluster


1. Base configuration

1. Make sure the clocks of all physical hosts are synchronized, otherwise the cluster setup will fail

[root@CentOSX ~]# yum install ntp -y
[root@CentOSX ~]# ntpdate time.apple.com
[root@CentOSA ~]# clock -w

HBase servers use heartbeats to determine whether their peers are running normally, so when setting up the physical hosts you must make sure that all of their clocks are synchronized.

2. To make sure the servers can communicate with each other, the following is usually configured:

  • Hostname and IP mapping
[root@CentOSX ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.186.152 CentOSA
192.168.186.153 CentOSB
192.168.186.154 CentOSC
  • Stop the firewall
[root@CentOSX ~]# systemctl stop firewalld
[root@CentOSX ~]# systemctl disable firewalld
[root@CentOSX ~]# firewall-cmd --state
not running

3. Install the JDK on all physical hosts

[root@CentOSX ~]# rpm -ivh jdk-8u171-linux-x64.rpm
[root@CentOSX ~]# vi .bashrc
JAVA_HOME=/usr/java/latest/
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
[root@CentOSX ~]# source .bashrc

2. ZooKeeper cluster

1. Upload the ZooKeeper archive and extract it to /usr

[root@CentOSX ~]# tar -zxf zookeeper-3.4.12.tar.gz -C /usr/

2. Configure ZooKeeper's zoo.cfg

[root@CentOSX ~]# cd /usr/zookeeper-3.4.12/
[root@CentOSX zookeeper-3.4.12]# cp conf/zoo_sample.cfg conf/zoo.cfg
[root@CentOSX zookeeper-3.4.12]# vi conf/zoo.cfg 
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/root/zkdata
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=CentOSA:2888:3888
server.2=CentOSB:2888:3888
server.3=CentOSC:2888:3888

3. Create the ZooKeeper data directory

[root@CentOSX ~]# mkdir /root/zkdata

4. Create a myid file in the ZooKeeper dataDir on CentOSA/B/C respectively

[root@CentOSA ~]# echo 1 > /root/zkdata/myid
[root@CentOSB ~]# echo 2 > /root/zkdata/myid
[root@CentOSC ~]# echo 3 > /root/zkdata/myid

5. Start the ZooKeeper service

[root@CentOSX ~]# cd /usr/zookeeper-3.4.12/
[root@CentOSX zookeeper-3.4.12]# ./bin/zkServer.sh start zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@CentOSX zookeeper-3.4.12]# ./bin/zkServer.sh status zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: leader|follower

3.HDFS-HA

1. Extract the Hadoop archive to /usr and configure HADOOP_HOME

[root@CentOSX ~]# tar -zxf hadoop-2.9.2.tar.gz -C /usr/
[root@CentOSX ~]# vi .bashrc
HADOOP_HOME=/usr/hadoop-2.9.2
JAVA_HOME=/usr/java/latest/
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
[root@CentOSX ~]# source .bashrc
[root@CentOSX ~]# hadoop classpath
/usr/hadoop-2.9.2/etc/hadoop:/usr/hadoop-2.9.2/share/hadoop/common/lib/*:/usr/hadoop-2.9.2/share/hadoop/common/*:/usr/hadoop-2.9.2/share/hadoop/hdfs:/usr/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/usr/hadoop-2.9.2/share/hadoop/hdfs/*:/usr/hadoop-2.9.2/share/hadoop/yarn:/usr/hadoop-2.9.2/share/hadoop/yarn/lib/*:/usr/hadoop-2.9.2/share/hadoop/yarn/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/*:/usr/hadoop-2.9.2/contrib/capacity-scheduler/*.jar

2. Configure core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/hadoop-2.9.2/hadoop-${user.name}</value>
</property>

<property>
	<name>ha.zookeeper.quorum</name>
  <value>CentOSA:2181,CentOSB:2181,CentOSC:2181</value>
</property>

3. Configure hdfs-site.xml

<!-- logical HDFS nameservice; must match fs.defaultFS in core-site.xml -->
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>
<!-- the nameservice contains two NameNodes: nn1 and nn2 -->
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>CentOSA:9000</value>
</property>
<!-- RPC address of nn2 -->
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>CentOSB:9000</value>
</property>

<!-- where the NameNode's shared edit log is stored on the JournalNodes -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://CentOSA:8485;CentOSB:8485;CentOSC:8485/mycluster</value>
</property>
<!-- local directory where each JournalNode stores its data -->
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/hadoop-2.9.2/hadoop-journaldata</value>
</property>
<!-- enable automatic failover when a NameNode goes down -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<!-- failover proxy provider implementation -->
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- fencing method; with the default SSH port 22, sshfence is sufficient -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<!-- the sshfence mechanism requires passwordless SSH -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>

4. Configure passwordless SSH login between the three hosts

[root@CentOSX ~]# ssh-keygen -t rsa
[root@CentOSX ~]# ssh-copy-id CentOSA
[root@CentOSX ~]# ssh-copy-id CentOSB
[root@CentOSX ~]# ssh-copy-id CentOSC

5. Initialize HDFS HA

[root@CentOSX ~]# hadoop-daemon.sh start journalnode
[root@CentOSA ~]# hdfs namenode -format
[root@CentOSA ~]# hadoop-daemon.sh start namenode
[root@CentOSB ~]# hdfs namenode -bootstrapStandby
[root@CentOSB ~]# hadoop-daemon.sh start namenode
[root@CentOSA|B ~]# hdfs zkfc -formatZK
[root@CentOSA ~]# hadoop-daemon.sh start zkfc
[root@CentOSB ~]# hadoop-daemon.sh start zkfc
[root@CentOSX ~]# hadoop-daemon.sh start datanode

Note: on CentOS 7 you also need to install psmisc (yum install psmisc -y), otherwise automatic failover will not work.

4.HBase-HA

1. Extract and configure HBase

[root@CentOSX ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/

2. Configure the HBASE_HOME environment variable

[root@CentOSX ~]# vi .bashrc 
HBASE_HOME=/usr/hbase-1.2.4
HADOOP_HOME=/usr/hadoop-2.9.2
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HBASE_HOME

[root@CentOSX ~]# source .bashrc 
[root@CentOSX ~]# hbase classpath # check that HBase picks up the Hadoop classpath
/usr/hbase-1.2.4/conf:/usr/java/latest/lib/tools.jar:/usr/hbase-1.2.4:/usr/hbase-1.2.4/lib/activation-1.1.jar:/usr/hbase-1.2.4/lib/aopalliance-1.0.jar:/usr/hbase-1.2.4/lib/apacheds-i18n-2.0.0-M15.jar:/usr/hbase-1.2.4/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hbase-1.2.4/lib/api-asn1-api-1.0.0-M20.jar:/usr/hbase-1.2.4/lib/api-util-1.0.0-M20.jar:/usr/hbase-1.2.4/lib/asm-3.1.jar:/usr/hbase-1.2.4/lib/avro-
...
1.7.4.jar:/usr/hbase-1.2.4/lib/commons-beanutils-1.7.0.jar:/usr/hbase-1.2.4/lib/commons-
2.9.2/share/hadoop/yarn/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/*:/usr/hadoop-2.9.2/contrib/capacity-scheduler/*.jar

3. Configure hbase-site.xml

[root@CentOSX ~]# cd /usr/hbase-1.2.4/
[root@CentOSX hbase-1.2.4]# vi conf/hbase-site.xml
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster/hbase</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>CentOSA,CentOSB,CentOSC</value>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>

4. Edit hbase-env.sh and set HBASE_MANAGES_ZK to false

[root@CentOSX ~]# cd /usr/hbase-1.2.4/
[root@CentOSX hbase-1.2.4]# grep -i HBASE_MANAGES_ZK conf/hbase-env.sh 
# export HBASE_MANAGES_ZK=true
[root@CentOSX hbase-1.2.4]# vi conf/hbase-env.sh 
export HBASE_MANAGES_ZK=false
[root@CentOSX hbase-1.2.4]# grep -i HBASE_MANAGES_ZK conf/hbase-env.sh 
export HBASE_MANAGES_ZK=false

export HBASE_MANAGES_ZK=false tells HBase to use the external ZooKeeper instead of managing its own instance.

5. Edit conf/regionservers and list all RegionServer hosts

CentOSA
CentOSB
CentOSC

6. Start HBase on each node

[root@CentOSX hbase-1.2.4]# ./bin/hbase-daemon.sh start master
[root@CentOSX hbase-1.2.4]# ./bin/hbase-daemon.sh start regionserver

7. Verify that the HBase installation works

  • Web UI check: http://192.168.186.152:16010/


  • HBase shell check (more reliable)
[root@CentOSB hbase-1.2.4]# ./bin/hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017

hbase(main):001:0> status
1 active master, 2 backup masters, 3 servers, 0 dead, 0.6667 average load


Source: blog.csdn.net/origin_cx/article/details/103898359