HiveQL DQL 10: Virtual Columns


Hive 0.8.0 introduced virtual columns. A virtual column is a special kind of function, and that release provides two of them:

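  • INPUT__FILE__NAME: the name of the input file read by the mapper task that produced the row.
  • BLOCK__OFFSET__INSIDE__FILE: the current global position within the file. For a block-compressed file, it is instead the file offset of the current block (the position of the block's first byte).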

Example 1: show the input file and offset of each row

> SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE AS OFFSIDE FROM employee_partitioned;
+------------------------------------------------------------------------+----------+
|                           input__file__name                            | offside  |
+------------------------------------------------------------------------+----------+
| hdfs://ns001/tmp/hive/employee_partitioned/year=2012/month=11/000000_0 | 0        |
| hdfs://ns001/tmp/hive/employee_partitioned/year=2018/month=9/000000_0  | 0        |
| hdfs://ns001/tmp/hive/employee_partitioned/year=2018/month=9/000000_0  | 63       |
| hdfs://ns001/tmp/hive/employee_partitioned/year=2018/month=9/000000_0  | 116      |
| hdfs://ns001/tmp/hive/employee_partitioned/year=2018/month=9/000000_0  | 177      |
+------------------------------------------------------------------------+----------+
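Assuming the table is stored as an uncompressed text file (the output above does not say so), BLOCK__OFFSET__INSIDE__FILE is the byte position at which each row starts. Consecutive offsets therefore differ by the length of the previous row plus its newline: 0, 63, 116, 177. The Python sketch below models that arithmetic outside Hive. The function name `row_start_offsets` and the sample rows are hypothetical; the rows are filler bytes sized so the offsets match Example 1, not the real file contents.

```python
def row_start_offsets(data: bytes) -> list[int]:
    """Return the byte offset where each line-terminated row starts.

    Sketch of what BLOCK__OFFSET__INSIDE__FILE reports for an
    uncompressed text file (an assumption, not Hive's own code).
    """
    offsets = []
    pos = 0
    for line in data.splitlines(keepends=True):
        offsets.append(pos)   # the row begins at the current byte position
        pos += len(line)      # skip past the row and its line terminator
    return offsets


# Hypothetical filler rows: 63, 53, 61 and 41 bytes including "\n".
rows = [b"a" * 62 + b"\n", b"b" * 52 + b"\n", b"c" * 60 + b"\n", b"d" * 40 + b"\n"]
print(row_start_offsets(b"".join(rows)))  # [0, 63, 116, 177]
```

Only the last offset, 177, exceeds 120, which is why a filter such as `BLOCK__OFFSET__INSIDE__FILE > 120` keeps a single row of that file.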

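Example 2: use a virtual column in a WHERE clause. Of the offsets listed in Example 1, only 177 is greater than 120, so one row (Lucy) is returned.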

> select * from employee_partitioned where BLOCK__OFFSET__INSIDE__FILE > 120;
+----------------------------+----------------------------------+----------------------------------+------------------------------------+------------------------------------+----------------------------+-----------------------------+
| employee_partitioned.name  | employee_partitioned.work_place  | employee_partitioned.gender_age  | employee_partitioned.skills_score  | employee_partitioned.depart_title  | employee_partitioned.year  | employee_partitioned.month  |
+----------------------------+----------------------------------+----------------------------------+------------------------------------+------------------------------------+----------------------------+-----------------------------+
| Lucy                       | ["Vancouver"]                    | {"gender":"Female","age":57}     | {"Sales":89,"HR":94}               | {"Sales":["Lead"]}                 | 2018                       | 9                           |
+----------------------------+----------------------------------+----------------------------------+------------------------------------+------------------------------------+----------------------------+-----------------------------+

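Example 3: count(INPUT__FILE__NAME) returns 5, the number of rows rather than the number of files, because every row carries its own non-NULL file name.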

> select count(INPUT__FILE__NAME) from employee_partitioned;
+------+
| _c0  |
+------+
| 5    |
+------+

References

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
Book: Apache Hive Essentials, Second Edition (Dayong Du), Chapter 5

Reposted from blog.csdn.net/CPP_MAYIBO/article/details/104057255