Common query functions for getting started with Hive

1. Empty field assignment

  • NVL: assigns a default to data whose value is NULL. Its format is NVL(value, default_value). If value is NULL, NVL returns default_value; otherwise it returns value. If both parameters are NULL, it returns NULL.
    eg:
select comm,nvl(comm, -1) from emp;

Query the comm field; rows where comm is NULL are displayed as -1.
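The NVL semantics can be sketched in Python, with None standing in for SQL NULL (the function and sample values below are illustrative, not part of Hive):

```python
# A minimal sketch of Hive's NVL(value, default_value) semantics.
# None plays the role of SQL NULL.
def nvl(value, default_value):
    """Return default_value when value is NULL (None), else value."""
    return default_value if value is None else value

comms = [300, None, 500, None]
# Like `select nvl(comm, -1) from emp`: NULL comm values become -1.
print([nvl(c, -1) for c in comms])
```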

2. CASE WHEN

  1. Data preparation
name	dept_id	sex
悟空	A	男
大海	A	男
宋宋	B	男
凤姐	A	女
婷姐	B	女
婷婷	B	女

2. Requirement
Count how many men and how many women are in each department. The expected result:

A     2       1
B     1       2

3. Create a local emp_sex.txt and add the data
[atguigu@hadoop102 datas]$ vi emp_sex.txt

悟空	A	男
大海	A	男
宋宋	B	男
凤姐	A	女
婷姐	B	女
婷婷	B	女

4. Create a Hive table and load the data

create table emp_sex(
name string, 
dept_id string, 
sex string) 
row format delimited fields terminated by "\t";
load data local inpath '/opt/module/datas/emp_sex.txt' into table emp_sex;

5. Query the data per the requirement

select 
  dept_id,
  sum(case sex when '男' then 1 else 0 end) male_count,
  sum(case sex when '女' then 1 else 0 end) female_count
from 
  emp_sex
group by
  dept_id;
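The SUM over a CASE WHEN expression is just a conditional count per group. A Python sketch of the same logic, using in-memory rows standing in for the emp_sex table:

```python
# Mirror of: sum(case sex when '男' then 1 else 0 end) ... group by dept_id
from collections import defaultdict

rows = [
    ("悟空", "A", "男"), ("大海", "A", "男"), ("宋宋", "B", "男"),
    ("凤姐", "A", "女"), ("婷姐", "B", "女"), ("婷婷", "B", "女"),
]

counts = defaultdict(lambda: [0, 0])  # dept_id -> [male_count, female_count]
for name, dept_id, sex in rows:
    counts[dept_id][0] += 1 if sex == "男" else 0  # case sex when '男' then 1 else 0
    counts[dept_id][1] += 1 if sex == "女" else 0  # case sex when '女' then 1 else 0

for dept_id in sorted(counts):
    male_count, female_count = counts[dept_id]
    print(dept_id, male_count, female_count)  # prints: A 2 1 / B 1 2
```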

3. Row to column

Related function descriptions
  • CONCAT(string A/col, string B/col, …): returns the concatenation of its input strings; it accepts any number of arguments.
  • CONCAT_WS(separator, str1, str2, …): a special form of CONCAT(). The first parameter is the separator placed between the remaining parameters; the separator may be the same string as any of them. If the separator is NULL, the result is also NULL. The function skips any NULL or empty-string arguments after the separator, inserting the separator between the strings it does join.
  • COLLECT_SET(col): accepts only primitive data types; it de-duplicates and aggregates the values of a field into an array-typed field (COLLECT_LIST aggregates without de-duplicating).
Create a local constellation.txt and add the data
vi constellation.txt
孙悟空	白羊座	A
大海	射手座	A
宋宋	白羊座	B
猪八戒	白羊座	A
凤姐	射手座	A
Create a Hive table and load the data
create table person_info(
name string, 
constellation string, 
blood_type string) 
row format delimited fields terminated by "\t";
load data local inpath "/opt/module/datas/constellation.txt" into table person_info;

Query the data per the requirement

select
    t1.base,
    concat_ws('|', collect_set(t1.name)) name
from
    (select
        name,
        concat(constellation, ",", blood_type) base
    from
        person_info) t1
group by
    t1.base;
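The two-step query above can be sketched in Python: first build the base key as in the inner query, then group names per key and join them with '|' (an ordered de-duplication stands in for collect_set; all names below are the sample data, the rest is illustrative):

```python
# Mirror of: concat_ws('|', collect_set(name)) ... group by base
rows = [
    ("孙悟空", "白羊座", "A"), ("大海", "射手座", "A"),
    ("宋宋", "白羊座", "B"), ("猪八戒", "白羊座", "A"),
    ("凤姐", "射手座", "A"),
]

grouped = {}
for name, constellation, blood_type in rows:
    base = f"{constellation},{blood_type}"  # concat(constellation, ",", blood_type)
    bucket = grouped.setdefault(base, [])
    if name not in bucket:                  # collect_set keeps distinct values only
        bucket.append(name)

for base, names in grouped.items():
    print(base, "|".join(names))            # concat_ws('|', collect_set(name))
```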

4. Column to row

Function description

EXPLODE(col): splits a complex array or map stored in one Hive column into multiple rows.
LATERAL VIEW
Usage: LATERAL VIEW udtf(expression) tableAlias AS columnAlias
Explanation: used together with split, explode, and other UDTFs, it splits one column of data into multiple rows; the split data can then be aggregated.

Create a local movie.txt and add the data
vi movie.txt
《疑犯追踪》	悬疑,动作,科幻,剧情
《Lie to me》	悬疑,警匪,动作,心理,剧情
《战狼2》	战争,动作,灾难
Create a Hive table and load the data
create table movie_info(
    movie string, 
    category array<string>) 
row format delimited fields terminated by "\t"
collection items terminated by ",";
load data local inpath "/opt/module/datas/movie.txt" into table movie_info;
Query the data per the requirement
select
    movie,
    category_name
from 
    movie_info lateral view explode(category) table_tmp as category_name;

This is equivalent to:

select
    m.movie,
    table_tmp.category_name
from 
    movie_info m
     lateral view explode(category) table_tmp as category_name;
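The fan-out performed by lateral view explode is a simple cross product of each row with the elements of its array column, as this Python sketch (using the sample movie data) shows:

```python
# Mirror of: movie_info lateral view explode(category) table_tmp as category_name
rows = [
    ("《疑犯追踪》", ["悬疑", "动作", "科幻", "剧情"]),
    ("《Lie to me》", ["悬疑", "警匪", "动作", "心理", "剧情"]),
    ("《战狼2》", ["战争", "动作", "灾难"]),
]

# Each (movie, category array) row fans out into one row per category.
exploded = [(movie, category_name)
            for movie, category in rows
            for category_name in category]

for movie, category_name in exploded:
    print(movie, category_name)
```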
Origin blog.csdn.net/thetimelyrain/article/details/104172637