Implementing UPDATE and DELETE in Hive

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/zimiao552147572/article/details/88062466

Index post: CDH 6 series (CDH 6.0, CDH 6.1, etc.) installation and usage

1. Official documentation on UPDATE and DELETE:
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Delete

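2. For a table to support UPDATE and DELETE, it must support ACID, and to support ACID it must meet all of the following conditions:
    1. The table must be stored as ORC: STORED AS ORC
       The format must be specified explicitly when the table is created. Other formats, such as Parquet, are not supported yet; currently only ORCFileformat and AcidOutputFormat are supported.
    2. The table must be bucketed: CLUSTERED BY (col_name, col_name, ...) INTO num_buckets BUCKETS
    3. The table property transactional must be set to true: TBLPROPERTIES('transactional'='true')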
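To check whether an existing table satisfies these three conditions, its metadata can be inspected directly. This is a minimal sketch using the `rimengshe.student2` table from Example 1 below; any other table name works the same way.

```sql
-- Prints the InputFormat/OutputFormat (should be ORC), "Num Buckets",
-- "Bucket Columns" and the "Table Parameters" section
DESCRIBE FORMATTED rimengshe.student2;

-- Prints only the transactional property; an ACID table should report true
SHOW TBLPROPERTIES rimengshe.student2('transactional');
```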

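3. Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
    The following configuration parameters must be set appropriately to turn on transaction support in Hive:
        hive.support.concurrency – true
        hive.enforce.bucketing – true (not required as of Hive 2.0)
        hive.exec.dynamic.partition.mode – nonstrict
        hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
        hive.compactor.initiator.on – true (for exactly one instance of the Thrift metastore service)
        hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service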

    <property>
        <name>hive.support.concurrency</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.enforce.bucketing</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.exec.dynamic.partition.mode</name>
        <value>nonstrict</value>
    </property>
    <property>
        <name>hive.txn.manager</name>
        <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
    </property>
    <property>
        <name>hive.compactor.initiator.on</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.compactor.worker.threads</name>
        <value>3</value>
    </property>
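The two compactor settings above let the metastore merge the delta files that each UPDATE and DELETE writes to an ACID table. As a hedged sketch, a compaction can also be requested by hand and its progress checked; the table name is the one used in Example 1 below.

```sql
-- Request a major compaction: the base file and all delta files of each
-- bucket are rewritten into a new base file
ALTER TABLE rimengshe.student2 COMPACT 'major';

-- List queued, running and recently finished compactions
SHOW COMPACTIONS;
```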
 

4. Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
    <property>
        <name>hive.support.concurrency</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.enforce.bucketing</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.exec.dynamic.partition.mode</name>
        <value>nonstrict</value>
    </property>
    <property>
        <name>hive.txn.manager</name>
        <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
    </property>
    <property>
        <name>hive.compactor.worker.threads</name>
        <value>3</value>
    </property>
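    <!-- hive.in.test is an internal flag; HiveConf describes it as "internal usage only, true in test mode" -->
    <property>
        <name>hive.in.test</name>
        <value>true</value>
    </property>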
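If the client-side hive-site.xml cannot be changed, the client-side transaction properties can usually be set per session instead. This is a sketch assuming a Beeline or Hive CLI session whose server does not restrict these properties; the compactor properties belong to the metastore and are not set here.

```sql
-- Session-level equivalents of the client-side settings above
SET hive.support.concurrency=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- hive.enforce.bucketing is left out: it is not needed as of Hive 2.0
```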
 
5. Restart the Hive service and deploy the client configuration
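After the restart, one way to confirm that a new session picked up the settings is to print the current values and query the transaction state kept in the metastore. A minimal sketch:

```sql
-- SET with a property name but no value prints the current value
SET hive.txn.manager;
SET hive.support.concurrency;

-- List open/aborted transactions and currently held locks
SHOW TRANSACTIONS;
SHOW LOCKS;
```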


6. Example 1
    create table rimengshe.student2(id string,name string) clustered by(id) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');
    insert into rimengshe.student2(id,name) values('1','guzhi');
    select * from rimengshe.student2;

    update rimengshe.student2 set name='guzhipeng' where id='1';
    delete from rimengshe.student2 where id='1';
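Two constraints from the official UPDATE documentation matter when adapting this example: the SET clause accepts expressions (arithmetic, UDFs, literals) but not subqueries, and partitioning and bucketing columns cannot be updated. A sketch against the same table; the `'_v2'` suffix is purely illustrative.

```sql
-- Re-insert the row, since the DELETE above removed it
INSERT INTO rimengshe.student2(id, name) VALUES ('1', 'guzhi');

-- Allowed: name is not the bucketing column, and the new value is an expression
UPDATE rimengshe.student2 SET name = concat(name, '_v2') WHERE id = '1';

-- Not allowed: id is the bucketing column (CLUSTERED BY (id)),
-- so Hive rejects this statement
-- UPDATE rimengshe.student2 SET id = '2' WHERE id = '1';

SELECT * FROM rimengshe.student2;
```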

7. Example 2
    create table rimengshe.student3(id string,name string,num double) clustered by(id) into 10 buckets stored as orc TBLPROPERTIES('transactional'='true');
    insert into rimengshe.student3(id,name,num) values('1','guzhi',10.5);
    select * from rimengshe.student3;

    create table rimengshe.student4(id string,name string,num double) clustered by(id) into 10 buckets stored as orc TBLPROPERTIES('transactional'='true');
    insert into rimengshe.student4(id,name,num) values('1','guzhipeng',6.5);
    select * from rimengshe.student4;
 
    select a.num+b.num from rimengshe.student3 a join rimengshe.student4 b on a.id=b.id;
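Because student3 is transactional, its numeric column can also be updated in place with an arithmetic expression, and the join then reflects the new value. A minimal sketch that assumes only the single sample row inserted above exists in each table:

```sql
-- num is not the bucketing column, so it can be updated
UPDATE rimengshe.student3 SET num = num + 1 WHERE id = '1';

-- Re-run the join: it now adds 11.5 (student3) and 6.5 (student4)
SELECT a.num + b.num FROM rimengshe.student3 a JOIN rimengshe.student4 b ON a.id = b.id;
```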
 


Reposted from blog.csdn.net/zimiao552147572/article/details/88062466