Use of Hive special functions

with as

In Hive, WITH ... AS gives a name to a subquery at the beginning of a query so that the rest of the query can refer to it like a temporary table (this construct is also known as a common table expression, or CTE). Its syntax is as follows:

WITH [expression name] AS (
	subquery expression
)

In this structure, [expression name] is the name used to refer to the temporary expression result, and the subquery expression is a valid SELECT statement that returns the result set as a temporary table.

Example: suppose there is a table named orders with two columns, order_id (the order number) and order_amount (the order amount). We can use WITH AS to define a temporary expression that calculates the total amount of each order, and then use it in the rest of the query:

WITH order_totals AS (
	SELECT order_id, SUM(order_amount) AS total_amount
	FROM orders
	GROUP BY order_id
)
SELECT order_id, total_amount
FROM order_totals
WHERE total_amount > 1000;

In the above example, we first define a temporary expression called order_totals, which uses a subquery to calculate the total amount of each order. The final SELECT statement then reads from order_totals and returns the orders whose total amount is greater than 1000. Defining and referencing temporary expressions within a single statement in this way keeps the query concise and easy to read.
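For comparison, the same result can be written with an inline subquery. The sketch below assumes the same orders table with order_id and order_amount columns; the alias t is only illustrative.

```sql
-- Equivalent to the WITH version above: the aggregation is inlined
-- as a derived table instead of being named up front.
SELECT t.order_id, t.total_amount
FROM (
    SELECT order_id, SUM(order_amount) AS total_amount
    FROM orders
    GROUP BY order_id
) t
WHERE t.total_amount > 1000;
```

The WITH form tends to pay off once the subquery is referenced more than once or several steps are chained together.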

When several subqueries are needed, write WITH once and separate the definitions with commas. A later definition can reference an earlier one, as b and c do in this example:

with a as (select name, age, sno from table_A),               -- base columns
	 b as (select * from a where age >= 12 and age <= 22),  -- filter a by age
	 c as (select * from b where sno = '0001')              -- filter b by student number
select * from c where name = 'zs';

cast

In Hive, CAST is a type conversion function used to convert an expression or column to a specified data type. Its syntax is as follows:

CAST(expression AS data_type)

In this structure, the expression can be a specific value, a column name, or the return value of a function. The data type can be any valid data type supported by Hive, such as INT, STRING, BOOLEAN, etc.

Example: suppose the orders table contains two columns, order_id and order_amount, and order_amount is stored as a string. We can use CAST to convert order_amount to a floating-point type and then sum it:

SELECT SUM(CAST(order_amount AS FLOAT))
FROM orders;

In the above example, we converted the order_amount column from string type to floating point type (FLOAT) through the CAST function and then calculated the sum of the converted column using the SUM function.

Note that the source value must be convertible to the target data type. In Hive, a value that cannot be converted (for example, casting the string 'abc' to INT) generally yields NULL rather than an error, so bad data can silently drop out of aggregates such as SUM. Conversions can also lose or truncate precision (FLOAT, for instance, is only an approximate type), so choose the target type with care.
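A minimal sketch of how to guard against these pitfalls, assuming the same orders table with a string order_amount column. DECIMAL(18,2) is an illustrative choice of target type, not something the table above prescribes.

```sql
-- A string that cannot be parsed becomes NULL instead of raising an error.
SELECT CAST('abc' AS INT);   -- NULL

-- Count rows whose order_amount is present but not convertible,
-- since SUM would silently skip them.
SELECT COUNT(*) AS unparseable_rows
FROM orders
WHERE order_amount IS NOT NULL
  AND CAST(order_amount AS DECIMAL(18,2)) IS NULL;

-- Sum with an exact decimal type to avoid floating-point rounding.
SELECT SUM(CAST(order_amount AS DECIMAL(18,2))) AS total_amount
FROM orders;
```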

get_json_object

get_json_object extracts a value from a JSON string. It takes a JSON string and a JSON path as input and returns the value found at that path as a string, or NULL when the input is not valid JSON or nothing matches the path. The path starts at the root ($) and specifies the location of the value you want to extract using a field name, an array index, or a wildcard. This makes it possible to pull specific parts out of JSON data for further extraction and analysis.

get_json_object(string json_string, string path)

Example: suppose a string column named log_information holds a JSON object with fields such as time, name, age, and birth. The following expression extracts the time field:

get_json_object(log_information,'$.time') as time
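As a fuller sketch, the queries below extract several fields at once and show a nested path with an array index. The table name logs and the JSON literal in the second query are assumptions made for illustration.

```sql
SELECT
    get_json_object(log_information, '$.time')             AS log_time,
    get_json_object(log_information, '$.name')             AS name,
    -- get_json_object returns a string, so cast when a number is needed
    CAST(get_json_object(log_information, '$.age') AS INT) AS age
FROM logs;   -- hypothetical table holding the log_information column

-- Nested field plus array index on a literal JSON string.
SELECT get_json_object('{"user":{"tags":["a","b"]}}', '$.user.tags[0]');   -- a
```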

unix_timestamp

unix_timestamp converts a date and time into a Unix timestamp: the number of seconds that have elapsed since 1970-01-01 00:00:00 UTC. Given a date-time string and a pattern describing its format, it returns the corresponding timestamp:

unix_timestamp(time, 'yyyyMMddHHmmss') as ts
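A small sketch with literal inputs follows; the date values are made up for illustration. No expected results are shown because the returned number depends on the time zone Hive uses for the session.

```sql
-- Parse a compact date-time string into seconds since the epoch.
SELECT unix_timestamp('20231024120000', 'yyyyMMddHHmmss') AS ts;

-- With no pattern, the string is expected in 'yyyy-MM-dd HH:mm:ss' form.
SELECT unix_timestamp('2023-10-24 12:00:00') AS ts;
```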

from_unixtime

The from_unixtime function does the reverse: it takes a Unix timestamp in seconds and returns a date and time string in the given format, which makes time data easier to read and process.

from_unixtime(ts, 'yyyy-MM-dd HH:mm:ss') as time
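The two functions are often combined to reformat a date string. The sketch below assumes hypothetical tables logs (with a string column raw_time) and events (with a BIGINT column event_ms holding milliseconds); neither comes from the original examples.

```sql
-- Reformat 'yyyyMMddHHmmss' strings as 'yyyy-MM-dd HH:mm:ss'.
SELECT from_unixtime(
           unix_timestamp(raw_time, 'yyyyMMddHHmmss'),
           'yyyy-MM-dd HH:mm:ss'
       ) AS formatted_time
FROM logs;     -- hypothetical table

-- from_unixtime expects seconds, so convert millisecond values first.
-- Division yields a DOUBLE in Hive, hence the cast back to BIGINT.
SELECT from_unixtime(CAST(event_ms / 1000 AS BIGINT), 'yyyy-MM-dd HH:mm:ss') AS formatted_time
FROM events;   -- hypothetical table
```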

These are some functions I have used recently while processing data. I will keep updating this post as I use new functions or SQL in the future!

Origin blog.csdn.net/weixin_57367513/article/details/134017967