Hive (5): Queries

  • SELECT… FROM statements. The general form is not repeated here; the examples below show how to read values from complex data types.

    # A few rows of sample data and the CREATE TABLE statement, for hands-on practice
    John Doe!100000.0!Mary Smith$Todd Jones!Federal Taxes,0.2$State Taxes,0.05$Insurance,0.1!1 Michigan Ave.$Chicago$IL$60600
    Mary Smith!80000.0!Bill King!Federal Taxes,0.2$State Taxes,0.05$Insurance,0.1!100 Ontario St.$Chicago$IL$60601
    Todd Jones!70000.0!lili!Federal Taxes,0.15$State Taxes,0.03$Insurance,0.1!200 Chicago Ave.$Oak Park$IL$60700
    Bill King!60000.0!Huahua$Xixi!Federal Taxes,0.15$State Taxes,0.03$Insurance,0.1!300 Obscure Dr.$Obscuria$IL$60100
    
    CREATE TABLE employees (
      name         STRING,
      salary       FLOAT,
      subordinates ARRAY<STRING>,
      deductions   MAP<STRING, FLOAT>,
      address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
    )
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '!'
      COLLECTION ITEMS TERMINATED BY '$'
      MAP KEYS TERMINATED BY ','
      LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;
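
    Before the queries can be tried, the sample rows have to be loaded into the table. A minimal sketch, assuming the four rows above were saved to a local file; the path /tmp/employees.txt is hypothetical and not part of the original article:

    -- Assumption: the sample rows were saved to /tmp/employees.txt (hypothetical path)
    LOAD DATA LOCAL INPATH '/tmp/employees.txt' OVERWRITE INTO TABLE employees;
    -- Quick check that the rows and delimiters were parsed as expected
    SELECT * FROM employees LIMIT 4;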
    

    Next, query the values of the complex data types.

    # First, look at the table description
    hive (default)> desc employees;
    name                    string
    salary                  float
    subordinates            array<string>
    deductions              map<string,float>
    address                 struct<street:string,city:string,state:string,zip:int>
    
    # Query ARRAY data
    hive (default)> select subordinates from employees;
    ["Mary Smith","Todd Jones"]
    ["Bill King"]
    ["lili"]
    ["Huahua","Xixi"]
    hive (default)> select subordinates[0] from employees;
    Mary Smith
    Bill King
    lili
    Huahua
    
    # Query MAP data
    hive (default)> select deductions from employees;
    {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}
    {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}
    {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}
    {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}
    hive (default)> select deductions['State Taxes'] from employees;
    0.05
    0.05
    0.03
    0.03
    
    # Query STRUCT data
    hive (default)> select address from employees;
    {"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}
    {"street":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}
    {"street":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}
    {"street":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}
    hive (default)> select address.city from employees;
    Chicago
    Chicago
    Oak Park
    Obscuria
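
    Besides direct indexing, Hive's built-in collection functions can be applied to these columns. A small sketch against the same employees table, with output omitted:

    -- size() counts the elements of an ARRAY or a MAP
    SELECT name, size(subordinates), size(deductions) FROM employees;
    -- map_keys() and map_values() return the keys or values of a MAP as an ARRAY
    SELECT name, map_keys(deductions), map_values(deductions) FROM employees;
    -- An index past the end of the array is expected to yield NULL rather than an error
    SELECT name, subordinates[1] FROM employees;
    -- array_contains() tests membership; STRUCT fields can also be used in WHERE
    SELECT name FROM employees WHERE array_contains(subordinates, 'Xixi') OR address.city = 'Chicago';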
    
  • Using regular expressions to specify columns

    # This property must be set before regular expressions can be used as column names
    set hive.support.quoted.identifiers=none;
    
    # Select name plus every column whose name starts with `s`
    hive (default)> select name,`s.*` from employees;
    John Doe        100000.0        ["Mary Smith","Todd Jones"]
    Mary Smith      80000.0 ["Bill King"]
    Todd Jones      70000.0 ["lili"]
    Bill King       60000.0 ["Huahua","Xixi"]
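
    The same mechanism can also be used to exclude columns. A sketch using the exclusion idiom described in the Hive documentation, assuming hive.support.quoted.identifiers is still set to none in the session:

    -- Select every column except subordinates and deductions
    SELECT `(subordinates|deductions)?+.+` FROM employees;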
    
  • Computing with column values

    hive (default)> select upper(name),salary,deductions['Federal Taxes'],round(salary*(1-deductions['Federal Taxes'])) from employees;
    JOHN DOE        100000.0        0.2     80000.0
    MARY SMITH      80000.0 0.2     64000.0
    TODD JONES      70000.0 0.15    59500.0
    BILL KING       60000.0 0.15    51000.0
    

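    When doing arithmetic, watch out for overflow and underflow. If this is a concern, consider declaring wider data types in the table schema; the drawback is that every value then takes up more memory. Alternatively, use an explicit expression to convert values to a wider data type.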
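
    One way to widen a value inside an expression is CAST. A minimal sketch on the employees table; the multipliers are arbitrary and chosen only for illustration:

    -- Widen FLOAT to DOUBLE before multiplying
    SELECT name, CAST(salary AS DOUBLE) * 12 FROM employees;
    -- Widen INT to BIGINT so the product is computed in the BIGINT range instead of the INT range
    SELECT name, CAST(address.zip AS BIGINT) * 100000 FROM employees;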

  • Using functions

    • List the functions related to month

      show functions like '*month*';

    • Show the usage of the add_months function

      desc function add_months;

    • Show the detailed description of the add_months function, with examples

      desc function extended add_months;
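
    Once a function's usage is known, it can be called directly in a query. A short sketch with literal dates chosen only for illustration; it assumes a Hive version that allows SELECT without a FROM clause:

      -- add_months(start_date, num_months) shifts a date by whole months
      select add_months('2021-03-30', 1);
      -- month(date) extracts the month number from a date string
      select month('2021-03-30');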

Reposted from blog.csdn.net/weixin_45639174/article/details/115318376