Using Macros in Hive

When writing HQL, the same logic often has to be repeated in many places. Macros let us factor that logic out, which speeds up development and makes queries easier to read (especially when parentheses or case-when expressions are nested several levels deep). For example:

create temporary macro sayhello (x string) concat('hello,',x,'!');
select sayhello('programmer'); -- output: hello,programmer!

The code above first defines a macro named sayhello. It takes a string parameter x and returns a greeting built from it: 'hello,' followed by x and '!'. To say hello to HR later, just write sayhello('HR').

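A macro can therefore be thought of as a custom "function" that is simpler to develop than a UDF: it is defined in a single HQL statement, with no Java code to write, compile, or deploy. A temporary macro exists only for the current session, so it has to be created again in each new session.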
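
As a minimal sketch of the full life cycle of a temporary macro (define, call, drop), reusing the sayhello macro from above:

```sql
-- Define the macro; it is visible only in the current session.
create temporary macro sayhello (x string) concat('hello,', x, '!');

-- Call it like any built-in function.
select sayhello('HR');   -- hello,HR!

-- Remove it explicitly once it is no longer needed.
drop temporary macro if exists sayhello;
```

Dropping is optional, since the macro disappears when the session ends, but it is handy when a definition needs to be replaced within the same session.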

Here are a few macros that I commonly use in my work:

1. Handling of null values

  1. Empty string to NULL
create temporary macro empty2null (x string) if(trim(x) = '', null, x);

Usage scenario: coalesce and nvl only skip NULL. If an earlier argument is an empty string, that empty string is returned and the later arguments are never reached. Writing the expression as follows avoids this:

nvl(empty2null(a),empty2null(b))

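This returns the value of b when a is NULL or an empty string, and returns NULL when b is also NULL or an empty string. Because the macro calls trim, a string that contains only spaces is treated as empty too.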

Here the macro does more than save typing: we no longer need to investigate whether a or b might contain empty strings. Writing every such expression this way is enough. Similarly, for numeric fields we can write a macro that converts 0 to NULL.
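
The numeric counterpart mentioned above could look like the following sketch. The name zero2null and the choice of double as the parameter type are assumptions made for illustration, not part of the original set of macros.

```sql
-- Hypothetical numeric counterpart of empty2null: treat 0 as missing.
create temporary macro zero2null (x double) if(x = 0, null, x);

-- Falls through to the second argument when the first is 0 or NULL.
select nvl(zero2null(0.0), zero2null(3.5));   -- 3.5
```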

  2. NULL to empty string
create temporary macro null2empty (x string) if(x is null, '', x);

Usage scenario 1: When concat joins two fields, the result is NULL as soon as either field is NULL. If the output should not be NULL, convert NULL to an empty string first. Again, there is no need to check in advance whether either field can be NULL.

Usage scenario 2: Normalizing output, for example when the branches of a case-when return a mix of NULL and empty strings.
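
The concat behaviour from scenario 1 can be checked with a small sketch. The literal values are made up for illustration, and cast(null as string) simply stands in for a NULL column.

```sql
create temporary macro null2empty (x string) if(x is null, '', x);

-- Without the macro, one NULL argument makes the whole result NULL.
select concat('abc', cast(null as string));               -- NULL

-- With the macro, the NULL argument contributes an empty string instead.
select concat('abc', null2empty(cast(null as string)));   -- abc
```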

  3. Checking for NULL or empty string
create temporary macro nn(x string) nvl(trim(x),'') = '';

Returns true if x is NULL, an empty string, or a string that contains only spaces. This check comes up so often that I gave the macro a very short name: just type n twice.

Building on it:

create temporary macro nn2rand (x string) case when nn(x) then concat('hive',rand()) else x end;

As the name suggests, nn2rand converts NULL and empty strings into random strings. When data skew is caused by a large number of records whose key is NULL or an empty string, converting those keys into random strings spreads the records evenly across the reducers.
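
A sketch of how nn2rand might be applied in a skewed join follows. The tables orders and users and their columns are hypothetical, and the approach assumes no real user_id starts with 'hive' followed by a number, so the random keys never match a row by accident.

```sql
create temporary macro nn (x string) nvl(trim(x), '') = '';
create temporary macro nn2rand (x string)
  case when nn(x) then concat('hive', rand()) else x end;

-- orders.user_id is assumed to be NULL or '' for a large share of rows.
-- Those rows receive distinct random keys, so they are spread over all
-- reducers instead of piling up on one; they still find no match in users.
select o.order_id, u.user_name
from orders o
left join users u
  on nn2rand(o.user_id) = u.user_id;
```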

2. Time-related calculations

  1. First day of last month
create temporary macro firstDayLastMonth (x string) trunc(add_months(x,-1),'MM');

Just pass in CURRENT_DATE. The point of the macro is readability: a name like firstDayLastMonth states what the expression means.

  2. Last day of last month
create temporary macro lastDayLastMonth (x string) last_day(add_months(x,-1));

Just pass in CURRENT_DATE. The reason is the same as above.
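
As a quick check of both macros, the sketch below uses a fixed date string instead of CURRENT_DATE so that the results are reproducible. The table sales and its column dt in the second query are hypothetical.

```sql
create temporary macro firstDayLastMonth (x string) trunc(add_months(x, -1), 'MM');
create temporary macro lastDayLastMonth (x string) last_day(add_months(x, -1));

select firstDayLastMonth('2019-03-15'),   -- 2019-02-01
       lastDayLastMonth('2019-03-15');    -- 2019-02-28

-- Typical use: restrict a query to the previous calendar month.
select *
from sales
where dt between firstDayLastMonth('2019-03-15') and lastDayLastMonth('2019-03-15');
```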

  3. Time difference
create temporary macro hourdiff (x string, y string) hour(x)-hour(y)+(datediff(x,y))*24;

Returns the difference in hours between two points in time (x minus y). Only the date and the hour are compared; minutes and seconds are ignored.
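
A worked example makes the rounding behaviour concrete. The second macro, hourdiff_exact, is a hypothetical alternative based on unix_timestamp for cases where fractional hours matter; it is not part of the original article and assumes timestamps in the default yyyy-MM-dd HH:mm:ss format.

```sql
create temporary macro hourdiff (x string, y string)
  hour(x) - hour(y) + (datediff(x, y)) * 24;

-- hour(x) = 1, hour(y) = 23, datediff = 1  =>  1 - 23 + 24 = 2,
-- although only 1 hour 45 minutes have actually elapsed.
select hourdiff('2019-01-02 01:30:00', '2019-01-01 23:45:00');        -- 2

-- Hypothetical variant: elapsed seconds divided by 3600.
create temporary macro hourdiff_exact (x string, y string)
  (unix_timestamp(x) - unix_timestamp(y)) / 3600;

select hourdiff_exact('2019-01-02 01:30:00', '2019-01-01 23:45:00');  -- 1.75
```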

  4. Date format conversion
create temporary macro properdt (dt string) concat_ws('-',split(dt,'/')[0],lpad(split(dt,'/')[1],2,'0'),lpad(split(dt,'/')[2],2,'0'));

It converts 2019/1/1 to 2019-01-01. The former is the format Excel commonly produces, and the latter is the format commonly used in Hive tables. It is worth considering when you upload local files to HDFS and query them in Hive.
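
A short sketch of the conversion on two sample values (the inputs are made up for illustration):

```sql
create temporary macro properdt (dt string)
  concat_ws('-',
            split(dt, '/')[0],
            lpad(split(dt, '/')[1], 2, '0'),
            lpad(split(dt, '/')[2], 2, '0'));

select properdt('2019/1/1');     -- 2019-01-01
select properdt('2019/12/31');   -- 2019-12-31
```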

  5. Time comparison
create temporary macro earliest (x string, y string) coalesce(least(empty2null(x),empty2null(y)),empty2null(x),empty2null(y));

Let time1 and time2 be two time fields, both of type string, with an empty string meaning "missing". The requirement is to pick the earlier of the two. Taking the minimum directly does not work: when time1 is an empty string, the result is always the empty string (it sorts before every other string), even though time2 should be returned if it is not empty. The macro therefore converts empty strings to NULL before taking the minimum. Since Hive 2.0.0, least returns NULL as soon as any argument is NULL, so the result is wrapped in coalesce to fall back to whichever value is present.
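
The three cases can be checked with the sketch below. It wraps least in coalesce so that the result does not depend on how the installed Hive version treats NULL arguments to least; the sample timestamps are made up.

```sql
create temporary macro empty2null (x string) if(trim(x) = '', null, x);
create temporary macro earliest (x string, y string)
  coalesce(least(empty2null(x), empty2null(y)), empty2null(x), empty2null(y));

-- Both present: the smaller (earlier) string wins.
select earliest('2019-01-01 08:00:00', '2019-01-02 10:00:00');   -- 2019-01-01 08:00:00

-- One missing: the other value is returned.
select earliest('', '2019-01-02 10:00:00');                      -- 2019-01-02 10:00:00

-- Both missing: NULL.
select earliest('', '');                                         -- NULL
```

String comparison only gives the right order when both values use the same zero-padded format, such as yyyy-MM-dd HH:mm:ss.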

3. Mathematical calculation

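create temporary macro halfceil (x double) 
case 
    when x = floor(x) then x
    when x - floor(x) <= 0.5 then floor(x) + 0.5
    else ceil(x)
end;

Function: round up to the nearest multiple of 0.5. For example, 1.2 becomes 1.5, 1.7 becomes 2.0, and 1.5 and 2.0 stay the same. The parameter is declared as double because a bare decimal in Hive defaults to decimal(10,0), which would discard the fractional part of the argument before the macro body runs. With macros, even a long mathematical formula can be invoked with a single short call.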
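
The examples from the text can be reproduced with the sketch below. It declares the parameter as double so that fractional input is preserved; that type is an assumption made here, and a decimal with an explicit scale would be another option.

```sql
create temporary macro halfceil (x double)
case
    when x = floor(x) then x                        -- already a whole number
    when x - floor(x) <= 0.5 then floor(x) + 0.5    -- fraction in (0, 0.5]
    else ceil(x)                                    -- fraction in (0.5, 1)
end;

select halfceil(1.2),   -- 1.5
       halfceil(1.7),   -- 2.0
       halfceil(1.5),   -- 1.5
       halfceil(2.0);   -- 2.0
```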

Origin blog.csdn.net/zmzdmx/article/details/114996433