Master post: Installing and using the CDH 6 series (CDH 6.0, CDH 6.1, etc.)

1. Official documentation on UPDATE and DELETE:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Delete

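2. For a table to support UPDATE and DELETE it must support ACID, which requires all of the following:
   1. The table must be stored as ORC: STORED AS ORC
      The format has to be stated when the table is created. Other formats, such as Parquet, are not supported for now; only ORC, whose file format implements AcidInputFormat and AcidOutputFormat, can be used.
   2. The table must be bucketed: CLUSTERED BY (col_name, col_name, ...) INTO num_buckets BUCKETS
   3. The table property transactional must be set to true: TBLPROPERTIES('transactional'='true')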

3. Hive service advanced configuration for hive-site.xml:
   These parameters must be set appropriately to turn on transaction support in Hive:
       hive.support.concurrency: true
       hive.enforce.bucketing: true (not required as of Hive 2.0)
       hive.exec.dynamic.partition.mode: nonstrict
       hive.txn.manager: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
       hive.compactor.initiator.on: true (on exactly one instance of the Thrift metastore service)
       hive.compactor.worker.threads: a positive number, on at least one instance of the Thrift metastore service

   <property>
       <name>hive.support.concurrency</name>
       <value>true</value>
   </property>
   <property>
       <name>hive.enforce.bucketing</name>
       <value>true</value>
   </property>
   <property>
       <name>hive.exec.dynamic.partition.mode</name>
       <value>nonstrict</value>
   </property>
   <property>
       <name>hive.txn.manager</name>
       <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
   </property>
   <property>
       <name>hive.compactor.initiator.on</name>
       <value>true</value>
   </property>
   <property>
       <name>hive.compactor.worker.threads</name>
       <value>3</value>
   </property>

4. Hive client advanced configuration for hive-site.xml:
   <property>
       <name>hive.support.concurrency</name>
       <value>true</value>
   </property>
   <property>
       <name>hive.enforce.bucketing</name>
       <value>true</value>
   </property>
   <property>
       <name>hive.exec.dynamic.partition.mode</name>
       <value>nonstrict</value>
   </property>
   <property>
       <name>hive.txn.manager</name>
       <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
   </property>
   <property>
       <name>hive.compactor.worker.threads</name>
       <value>3</value>
   </property>
   <property>
       <name>hive.in.test</name>
       <value>true</value>
   </property>

5. Restart the service and deploy the client configuration
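
Once the restart is done, it is worth checking from a Hive session (Beeline or the Hive CLI) that the new values were actually picked up. The statements below are a minimal sketch of such a check: `SET` followed by a property name and no value only prints the current setting, and `SHOW TRANSACTIONS` lists the open and aborted transactions. If the configuration took effect, the first two settings should report `true` and `org.apache.hadoop.hive.ql.lockmgr.DbTxnManager`.

```sql
-- Print the effective value of each setting (SET without a value only reads it).
SET hive.support.concurrency;
SET hive.txn.manager;
SET hive.exec.dynamic.partition.mode;

-- List the currently open and aborted transactions.
SHOW TRANSACTIONS;
```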

6. Example 1
   -- Transactional table: bucketed, stored as ORC, transactional=true
   create table rimengshe.student2(id string,name string) clustered by(id) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');
   insert into rimengshe.student2(id,name) values('1','guzhi');
   select * from rimengshe.student2;

   -- Row-level changes, which only work on a transactional table
   update rimengshe.student2 set name='guzhipeng' where id='1';
   delete from rimengshe.student2 where id='1';
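
Every UPDATE and DELETE on an ACID table is written as a new delta rather than rewriting the existing files, and the compactor enabled in step 3 merges those deltas in the background. As a rough sketch (assuming the student2 table from this example and a running compactor), the compaction queue can be inspected and a major compaction requested by hand like this:

```sql
-- Show queued, running and finished compactions.
SHOW COMPACTIONS;

-- Ask for a major compaction of the example table.
-- The request is only queued here; a compactor worker thread does the work later.
ALTER TABLE rimengshe.student2 COMPACT 'major';
```

On a table this small the manual request is not needed; it mainly helps when a table has accumulated many deltas and reads have slowed down.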

7. Example 2
   create table rimengshe.student3(id string,name string,num double) clustered by(id) into 10 buckets stored as orc TBLPROPERTIES('transactional'='true');
   insert into rimengshe.student3(id,name,num) values('1','guzhi',10.5);
   select * from rimengshe.student3;

   create table rimengshe.student4(id string,name string,num double) clustered by(id) into 10 buckets stored as orc TBLPROPERTIES('transactional'='true');
   insert into rimengshe.student4(id,name,num) values('1','guzhipeng',6.5);
   select * from rimengshe.student4;

   select a.num+b.num from rimengshe.student3 a join rimengshe.student4 b on a.id=b.id;
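
The SET clause of an UPDATE accepts the same kind of expression as a select list (arithmetic, UDFs, casts, literals), but not subqueries, and Hive does not allow bucketing or partition columns to be updated. A short sketch against the student3 table above; the values are only illustrative:

```sql
-- Allowed: num is a regular column, and the new value is an arithmetic expression.
UPDATE rimengshe.student3 SET num = num + 1.5 WHERE id = '1';

-- Not allowed: id is the bucketing column (clustered by(id)), so Hive rejects this statement.
-- UPDATE rimengshe.student3 SET id = '2' WHERE id = '1';
```

To change the value of a bucketing column, delete the row and insert it again with the new value.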
