Main post: CDH 6 series (CDH 6.0, CDH 6.1, etc.) installation and usage
1. Official documentation on UPDATE and DELETE:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Delete
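2. For a table to support UPDATE and DELETE it must support ACID, and to support ACID it must meet the following conditions:
(1) The table's storage format must be ORC: STORED AS ORC
The format has to be declared explicitly on the table being modified. Other formats such as Parquet are not supported for now; at present only OrcFileFormat and AcidOutputFormat are supported.
(2) The table must be bucketed: CLUSTERED BY (col_name, col_name, ...) INTO num_buckets BUCKETS
(3) The table property transactional must be set to true: TBLPROPERTIES('transactional'='true')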
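The three conditions above can be checked mechanically before a table is created. The following is only a minimal sketch: `check_acid_ddl` is a hypothetical helper, not part of Hive or CDH, and it relies on simple regular expressions rather than a real HiveQL parser, so it will miss unusual formatting.

```python
import re
from typing import List


def check_acid_ddl(ddl: str) -> List[str]:
    """Return the ACID prerequisites that a CREATE TABLE statement does not meet.

    Hypothetical helper for illustration; it pattern-matches the text only.
    """
    # Collapse whitespace and lowercase so the patterns stay simple.
    text = " ".join(ddl.split()).lower()
    problems = []
    if not re.search(r"\bstored\s+as\s+orc\b", text):
        problems.append("storage format is not ORC")
    if not re.search(r"\bclustered\s+by\s*\(.+?\)\s*into\s+\d+\s+buckets\b", text):
        problems.append("table is not bucketed")
    if not re.search(r"'transactional'\s*=\s*'true'", text):
        problems.append("transactional property is not 'true'")
    return problems


if __name__ == "__main__":
    ddl = (
        "create table rimengshe.student2(id string,name string) "
        "clustered by(id) into 2 buckets stored as orc "
        "TBLPROPERTIES('transactional'='true');"
    )
    print(check_acid_ddl(ddl))
```

An empty list means all three conditions were found in the statement text; it does not prove that Hive will accept the table.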
3. Advanced Hive service configuration in hive-site.xml:
The following parameters must be set appropriately to turn on transaction support in Hive:
hive.support.concurrency – true
hive.enforce.bucketing – true (not required as of Hive 2.0)
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on – true (on exactly one instance of the Thrift metastore service)
hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>3</value>
</property>
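Typing these property blocks by hand is error-prone, so one option is to generate them from a mapping. The sketch below is an illustration only: `SERVICE_TXN_SETTINGS` and `render_properties` are hypothetical names, and the values are simply copied from the service-side listing above.

```python
from xml.sax.saxutils import escape

# Values copied from the service-side listing above (assumed, not verified
# against any particular cluster).
SERVICE_TXN_SETTINGS = {
    "hive.support.concurrency": "true",
    "hive.enforce.bucketing": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.compactor.initiator.on": "true",
    "hive.compactor.worker.threads": "3",
}


def render_properties(settings):
    """Render a name/value mapping as hive-site.xml <property> blocks."""
    blocks = []
    for name, value in settings.items():
        blocks.append(
            "<property>\n"
            f"<name>{escape(name)}</name>\n"
            f"<value>{escape(value)}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Paste the printed text into the advanced hive-site.xml field.
    print(render_properties(SERVICE_TXN_SETTINGS))
```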
4. Advanced Hive client configuration in hive-site.xml:
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>3</value>
</property>
<property>
<name>hive.in.test</name>
<value>true</value>
</property>
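Before restarting, it can help to confirm that a pasted fragment really contains the transaction settings. The following sketch parses a fragment like the ones above and reports any core setting that is missing or has a different value. `REQUIRED`, `parse_properties`, and `missing_or_wrong` are hypothetical names, and the choice of which three settings to check is an assumption based on the entries that appear in both the service and client listings.

```python
import xml.etree.ElementTree as ET

# Settings listed in both the service and client sections (assumed minimum).
REQUIRED = {
    "hive.support.concurrency": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
}


def parse_properties(fragment):
    """Parse <property> blocks into a dict of name -> value.

    The fragment is wrapped in a root element because the snippets have no
    enclosing <configuration>. A fragment that starts with an XML
    declaration would need that declaration removed first.
    """
    root = ET.fromstring("<configuration>" + fragment + "</configuration>")
    return {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }


def missing_or_wrong(fragment):
    """Return required settings that are absent (None) or set differently."""
    found = parse_properties(fragment)
    return {
        name: found.get(name)
        for name, expected in REQUIRED.items()
        if found.get(name) != expected
    }
```

An empty result only says the text matches the listing; it does not confirm that the running services have picked up the values.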
5. Restart the service and deploy the client configuration
6. Example 1
create table rimengshe.student2(id string,name string) clustered by(id) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');
insert into rimengshe.student2(id,name) values('1','guzhi');
select * from rimengshe.student2;
update rimengshe.student2 set name='guzhipeng' where id='1';
delete from rimengshe.student2 where id='1';
7. Example 2
create table rimengshe.student3(id string,name string,num double) clustered by(id) into 10 buckets stored as orc TBLPROPERTIES('transactional'='true');
insert into rimengshe.student3(id,name,num) values('1','guzhi',10.5);
select * from rimengshe.student3;
create table rimengshe.student4(id string,name string,num double) clustered by(id) into 10 buckets stored as orc TBLPROPERTIES('transactional'='true');
insert into rimengshe.student4(id,name,num) values('1','guzhipeng',6.5);
select * from rimengshe.student4;
select a.num+b.num from rimengshe.student3 a join rimengshe.student4 b on a.id=b.id;