06 - Hive Table Property Operations


Hello everyone! We meet yet again. Enough joking around: good data skills never go to waste, so let's get to it. Today's exercise is Hive table property operations.
What counts as a table property? The table name, plus columns you can add or modify!

Rename a table:

alter table table_name rename to new_table_name;

Modify a column (rename c1 to c2 and set its type and comment; `after severity` moves the column behind the severity column, or use `first` to move it to the first position):

alter table tablename change column c1 c2 int comment 'xxxxx' after severity;

Add columns (note: Hive's 64-bit integer type is `bigint`, not `long`):

alter table tablename add columns (c1 string comment 'xxxx', c2 bigint comment 'yyyyyy');

1. Let's try it hands-on; typing along with this code will get you most of the way there:

hive> create table testchange(
    > name string,value string
    > );
OK
Time taken: 0.085 seconds
hive> alter table testchange rename to test;
OK
Time taken: 0.478 seconds
hive> 

Now, always verify:

hive> desc test;
OK
name    string  
value   string  
Time taken: 0.381 seconds
hive> 

Add two columns, type and col, to the test table:

hive> alter table test add columns(type string,col int comment 'xielaoshi');
OK
Time taken: 0.13 seconds
hive> desc test;
OK
name    string  
value   string  
type    string  
col int xielaoshi
Time taken: 0.235 seconds
hive> 

You can also move the type column to right after name:

hive> alter table test change column type type string after name; 
OK
Time taken: 0.161 seconds
hive> desc test;                                                  
OK
name    string  
type    string  
value   string  
col int xielaoshi
Time taken: 0.156 seconds
hive> 

Or put type in the first position:

hive> alter table test change column type type string first;
OK
Time taken: 0.113 seconds
hive> desc test;                                            
OK
type    string  
name    string  
value   string  
col int xielaoshi
Time taken: 0.149 seconds
hive> desc formatted test;

Fun, right? Learning a technology is much more pleasant when you keep yourself amused.
2. Modifying tblproperties

hive> alter table test set tblproperties('comment'='xxxxx');
OK
Time taken: 0.122 seconds
hive> desc formatted test;                                  
OK
# col_name              data_type               comment             

type                    string                  None                
name                    string                  None                
value                   string                  None                
col                     int                     xielaoshi           

# Detailed Table Information         
Database:               default                  
Owner:                  root                     
CreateTime:             Thu Jun 02 06:20:43 PDT 2016     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop1:9000/user/hive/warehouse/test     
Table Type:             MANAGED_TABLE            
Table Parameters:        
    comment                 xxxxx               
    last_modified_by        root                
    last_modified_time      1464874178          
    transient_lastDdlTime   1464874178          

# Storage Information        
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    serialization.format    1                   
Time taken: 0.14 seconds
hive> 

This simply changed the table's description. So easy!
3. Modifying serdeproperties. "SerDe" stands for serializer/deserializer. So what exactly are serialization and deserialization?
A quick aside: serialization converts a structured object (an instance) into a byte stream (an array of bytes); deserialization is the reverse, turning a byte stream back into a structured object. To store a "live" object in a file, you simply store its bytes; to send it to a remote host, you send its bytes; when you need the object again, deserialization "revives" it.
Serializing an object to a file is called "persistence". Serializing it for transmission to a remote machine is "data communication".
Not hard to grasp, right? A rough analogy: when you phone your parents, your voice is converted into an electrical signal (serialization), and at their end the signal is converted back into sound (deserialization).
Hadoop serializes data whenever its nodes communicate or make RPC calls, and it needs that serialization to be fast, compact, and light on bandwidth, so understanding Hadoop's serialization mechanism matters.
Hive sits on top of Hadoop, and serialization is important to Hadoop. How important?
Communication between Hadoop nodes is implemented through remote procedure calls (RPC), covering both inter-process communication and persistent storage, so serialization and deserialization come up constantly in distributed data processing. RPC serialization is therefore expected to be:

    1. Compact: a compact format makes full use of network bandwidth, the scarcest resource in a data center.
    2. Fast: inter-process communication forms the backbone of a distributed system, so serialization and deserialization overhead must be kept to a minimum.
    3. Extensible: protocols evolve to meet new requirements, so the serialization format must be able to carry new protocol messages between client and server.
    4. Interoperable: clients and servers written in different languages must be able to talk to each other.
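The round trip described above can be seen in a few lines of Python. Hadoop does not use pickle (it has its own compact Writable format), but the serialize/deserialize cycle is the same idea:

```python
import pickle

# A "live" structured object...
record = {"time": "20130829234535", "country": "china", "city": "nanyang"}

# Serialization: object -> byte stream. These bytes could be written to a
# file (persistence) or sent over the network (data communication).
data = pickle.dumps(record)
assert isinstance(data, bytes)

# Deserialization: byte stream -> object, "reviving" it on the other side.
revived = pickle.loads(data)
assert revived == record
```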

Enough rambling. Modifying serdeproperties works differently depending on whether the table is partitioned.
Unpartitioned:

alter table table_name set serdeproperties('field.delim'='\t');

And with partitions?

alter table test partition(dt='xxxxx') set serdeproperties('field.delim'='\t');

Why modify this at all? Simple: sometimes the data's format does not match what your analysis expects, and deleting it is out of the question, so modifying the table is the only option.
As always, a concrete example beats theory:

hive> create table city(
    > time string,
    > country string,
    > province string,
    > city string)
    > row format delimited fields terminated by '#' 
    > lines terminated by '\n'
    > stored as textfile;
OK
Time taken: 0.111 seconds
hive> 

Load some data to play with:

hive> load data local inpath '/usr/host/city' into table city;
Copying data from file:/usr/host/city
Copying file: file:/usr/host/city
Loading data to table default.city
OK
Time taken: 0.426 seconds
hive> select * from city;
OK
20130829234535  china   henan   nanyang NULL    NULL    NULL
20130829234536  china   henan   xinyang NULL    NULL    NULL
20130829234537  china   beijing beijing NULL    NULL    NULL
20130829234538  china   jiang   susuzhou    NULL    NULL    NULL
20130829234539  china   hubei   wuhan   NULL    NULL    NULL
20130829234540  china   sandong weizhi  NULL    NULL    NULL
20130829234541  china   hebei   shijiazhuang    NULL    NULL    NULL
20130829234542  china   neimeng eeduosi NULL    NULL    NULL
20130829234543  china   beijing beijing NULL    NULL    NULL
20130829234544  china   jilin   jilin   NULL    NULL    NULL
Time taken: 0.105 seconds
hive>  

Wondering what's actually in my city file? Here you go:

[root@hadoop1 data3]# cd /usr/host/
[root@hadoop1 host]# cat city 
20130829234535  china   henan   nanyang
20130829234536  china   henan   xinyang
20130829234537  china   beijing beijing
20130829234538  china   jiang   susuzhou
20130829234539  china   hubei   wuhan
20130829234540  china   sandong weizhi
20130829234541  china   hebei   shijiazhuang
20130829234542  china   neimeng eeduosi
20130829234543  china   beijing beijing
20130829234544  china   jilin   jilin
[root@hadoop1 host]# 

Strange, right? The raw file below is neat and tidy, so why all the NULLs in the query above? Patience, dear reader.
Here's the thing: the fields in my file are separated by tab characters ('\t'), but the table was declared with '#' as the field delimiter. Hive looks for '#', finds none, treats the entire line as the first field, and fills the remaining three columns with NULL. Make sense?
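The mismatch is easy to simulate in Python (a sketch of what Hive's LazySimpleSerDe effectively does; the padding logic here is my own illustration):

```python
# The file's fields are tab-separated, but the table's field.delim is '#'.
line = "20130829234535\tchina\thenan\tnanyang"

fields = line.split("#")  # no '#' found, so the whole line is one field
print(fields)             # ['20130829234535\tchina\thenan\tnanyang']

# Pad to the table's 4 columns with NULLs, as Hive does for missing fields:
padded = fields + [None] * (4 - len(fields))
print(padded)             # whole line in column 1, NULL for the rest

# With the correct delimiter the line splits cleanly into 4 fields:
print(line.split("\t"))   # ['20130829234535', 'china', 'henan', 'nanyang']
```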
Don't believe me? See for yourself:

hive> desc formatted city;                                      
OK
# col_name              data_type               comment             

time                    string                  None                
country                 string                  None                
province                string                  None                
city                    string                  None                

# Detailed Table Information         
Database:               default                  
Owner:                  root                     
CreateTime:             Thu Jun 02 06:43:57 PDT 2016     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop1:9000/user/hive/warehouse/city     
Table Type:             MANAGED_TABLE            
Table Parameters:        
    transient_lastDdlTime   1464875117          

# Storage Information        
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    field.delim             #                   
    line.delim              \n                  
    serialization.format    #                   
Time taken: 0.132 seconds
hive> 

There it is: "field.delim  #". Now let's change the delimiter from # to \t:

hive> alter table city set serdeproperties('field.delim'='\t');
OK
Time taken: 0.118 seconds
hive> select * from city;
OK
20130829234535  china   henan   nanyang
20130829234536  china   henan   xinyang
20130829234537  china   beijing beijing
20130829234538  china   jiang   susuzhou
20130829234539  china   hubei   wuhan
20130829234540  china   sandong weizhi
20130829234541  china   hebei   shijiazhuang
20130829234542  china   neimeng eeduosi
20130829234543  china   beijing beijing
20130829234544  china   jilin   jilin
Time taken: 0.125 seconds
hive> 

Next, let's create a partitioned table, city1:

hive> create table city1(
    > time string,
    > country string,
    > province string,
    > city string) partitioned by (dt string)
    > row format delimited fields terminated by '#' 
    > lines terminated by '\n'
    > stored as textfile;
OK
Time taken: 0.046 seconds
hive> 

Load data, query, and inspect the table description:

hive> load data local inpath '/usr/host/city' into table city1 partition(dt='20160519');
Copying data from file:/usr/host/city
Copying file: file:/usr/host/city
Loading data to table default.city1 partition (dt=20160519)
OK
Time taken: 0.189 seconds
hive> select * from city1;
OK
20130829234535  china   henan   nanyang NULL    NULL    NULL    20160519
20130829234536  china   henan   xinyang NULL    NULL    NULL    20160519
20130829234537  china   beijing beijing NULL    NULL    NULL    20160519
20130829234538  china   jiang   susuzhou    NULL    NULL    NULL    20160519
20130829234539  china   hubei   wuhan   NULL    NULL    NULL    20160519
20130829234540  china   sandong weizhi  NULL    NULL    NULL    20160519
20130829234541  china   hebei   shijiazhuang    NULL    NULL    NULL    20160519
20130829234542  china   neimeng eeduosi NULL    NULL    NULL    20160519
20130829234543  china   beijing beijing NULL    NULL    NULL    20160519
20130829234544  china   jilin   jilin   NULL    NULL    NULL    20160519
Time taken: 0.083 seconds
hive> desc formatted city1;
OK
# col_name              data_type               comment             

time                    string                  None                
country                 string                  None                
province                string                  None                
city                    string                  None                

# Partition Information      
# col_name              data_type               comment             

dt                      string                  None                

# Detailed Table Information         
Database:               default                  
Owner:                  root                     
CreateTime:             Thu Jun 02 06:55:56 PDT 2016     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop1:9000/user/hive/warehouse/city1    
Table Type:             MANAGED_TABLE            
Table Parameters:        
    transient_lastDdlTime   1464875756          

# Storage Information        
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    field.delim             #                   
    line.delim              \n                  
    serialization.format    #                   
Time taken: 0.076 seconds
hive> 

Change the # delimiter to \t:

hive> alter table city1 set serdeproperties('field.delim'='\t');
OK
Time taken: 0.103 seconds
hive> 

The other form: modify a single partition (existing partitions keep their own SerDe properties, so the table-level change alone does not fix data already loaded into them):

hive> alter table city1 partition(dt='20160519') set serdeproperties('field.delim'='\t');
OK
Time taken: 0.141 seconds
hive> select * from city1;                                                               
OK
20130829234535  china   henan   nanyang 20160519
20130829234536  china   henan   xinyang 20160519
20130829234537  china   beijing beijing 20160519
20130829234538  china   jiang   susuzhou    20160519
20130829234539  china   hubei   wuhan   20160519
20130829234540  china   sandong weizhi  20160519
20130829234541  china   hebei   shijiazhuang    20160519
20130829234542  china   neimeng eeduosi 20160519
20130829234543  china   beijing beijing 20160519
20130829234544  china   jilin   jilin   20160519
Time taken: 0.086 seconds
hive> 

Play around some more:

hive> select * from city1 where dt='20160519';
OK
20130829234535  china   henan   nanyang 20160519
20130829234536  china   henan   xinyang 20160519
20130829234537  china   beijing beijing 20160519
20130829234538  china   jiang   susuzhou    20160519
20130829234539  china   hubei   wuhan   20160519
20130829234540  china   sandong weizhi  20160519
20130829234541  china   hebei   shijiazhuang    20160519
20130829234542  china   neimeng eeduosi 20160519
20130829234543  china   beijing beijing 20160519
20130829234544  china   jilin   jilin   20160519
Time taken: 0.579 seconds
hive> 

4. Modifying the location
The article is getting long; bear with me, we're nearly done.

alter table table_name [partition(...)] set location 'path';
alter table table_name set tblproperties('EXTERNAL'='TRUE');   -- internal (managed) table -> external table
alter table table_name set tblproperties('EXTERNAL'='FALSE');  -- external table -> internal (managed) table

For example:

[root@hadoop1 host]# hadoop fs -mkdir /location
[root@hadoop1 host]# hadoop fs -put /usr/host/city /location
[root@hadoop1 host]# hadoop fs -ls /location
Found 1 items
-rw-r--r--   1 root supergroup        359 2016-06-02 07:04 /location/city
[root@hadoop1 host]# 

Next I'll change city's location from hdfs://hadoop1:9000/user/hive/warehouse/city to hdfs://hadoop1:9000/location:

hive> alter table city set location 'hdfs://hadoop1:9000/location';
OK
Time taken: 0.112 seconds

Don't believe it? Query the table, then look at desc formatted city:

hive> select * from city;
OK
20130829234535  china   henan   nanyang
20130829234536  china   henan   xinyang
20130829234537  china   beijing beijing
20130829234538  china   jiang   susuzhou
20130829234539  china   hubei   wuhan
20130829234540  china   sandong weizhi
20130829234541  china   hebei   shijiazhuang
20130829234542  china   neimeng eeduosi
20130829234543  china   beijing beijing
20130829234544  china   jilin   jilin
Time taken: 0.112 seconds
hive> desc formatted city;                                         
OK
# col_name              data_type               comment             

time                    string                  None                
country                 string                  None                
province                string                  None                
city                    string                  None                

# Detailed Table Information         
Database:               default                  
Owner:                  root                     
CreateTime:             Thu Jun 02 06:43:57 PDT 2016     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop1:9000/location     
Table Type:             MANAGED_TABLE            
Table Parameters:        
    last_modified_by        root                
    last_modified_time      1464876384          
    transient_lastDdlTime   1464876384          

# Storage Information        
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    field.delim             \t                  
    line.delim              \n                  
    serialization.format    #                   
Time taken: 0.078 seconds
hive> 

The location has changed, as you can see. Now let's see what happens when we drop this table:

hive> drop table city;
OK
Time taken: 0.122 seconds
hive> 

Once you drop city, the data at the location it pointed to is gone as well, because city is a MANAGED_TABLE, an internal table.
So what is the difference between external and internal tables? Comments welcome!
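For the record: dropping a managed (internal) table deletes both the metastore entry and the underlying files, while dropping an external table deletes only the metastore entry and leaves the files alone. A toy Python model (the function and dicts here are my own illustration, not Hive's API) captures the difference:

```python
def drop_table(table, filesystem):
    """Remove the table; delete its data files only if the table is managed."""
    if table["type"] == "MANAGED_TABLE":
        filesystem.pop(table["location"], None)  # data goes with the table
    # EXTERNAL_TABLE: metadata dropped, files at 'location' left untouched

# Managed table: drop deletes the data.
fs = {"/location/city": "20130829234535\tchina\thenan\tnanyang\n"}
drop_table({"type": "MANAGED_TABLE", "location": "/location/city"}, fs)
print("/location/city" in fs)   # False

# External table: drop keeps the data.
fs = {"/location/city": "20130829234535\tchina\thenan\tnanyang\n"}
drop_table({"type": "EXTERNAL_TABLE", "location": "/location/city"}, fs)
print("/location/city" in fs)   # True
```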

5. Recreate city:

hive> create table city(
    > time string,
    > country string,
    > province string,
    > city string)
    > row format delimited fields terminated by '#' 
    > lines terminated by '\n'
    > stored as textfile;
OK
Time taken: 1.042 seconds
hive>

This is, of course, an internal table. Now let's convert it to an external table:

hive> 
    > alter table city set tblproperties('EXTERNAL'='TRUE');
OK
Time taken: 0.184 seconds
hive> desc formatted city;
OK
# col_name              data_type               comment             

time                    string                  None                
country                 string                  None                
province                string                  None                
city                    string                  None                

# Detailed Table Information         
Database:               default                  
Owner:                  root                     
CreateTime:             Thu Jun 02 07:15:20 PDT 2016     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop1:9000/user/hive/warehouse/city     
Table Type:             EXTERNAL_TABLE           
Table Parameters:        
    EXTERNAL                TRUE                
    last_modified_by        root                
    last_modified_time      1464876990          
    transient_lastDdlTime   1464876990          

# Storage Information        
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    field.delim             #                   
    line.delim              \n                  
    serialization.format    #                   
Time taken: 0.296 seconds
hive> 

And how do you turn an external table back into an internal one?
Run:

alter table city set tblproperties('EXTERNAL'='FALSE');
desc formatted city;

That's it for today. Time for a rest.
See you again!
