Alibaba Cloud Big Data Practical Record 10: Pitfalls of Hive Compatibility Mode


1. Introduction

Today, while I was developing a report, MaxCompute threw this error:

SQL Runtime Unretryable Error: ODPS-0121125:[xx,xx] Unsupported operation - function signature DATE_FORMAT(string, string) is not supported in current mode, please set odps.sql.hive.compatible=true to use it

What does that mean? MaxCompute is telling me that it does not support the signature DATE_FORMAT(string, string) in the current mode, but that if I still want to use it, I can add the setting set odps.sql.hive.compatible=true.

The original SQL of this error can be abstracted as:

SELECT DATE_FORMAT(FROM_UNIXTIME(1672538400),'yyyyMMdd')

That is, the result returned by FROM_UNIXTIME(1672538400) is treated as a STRING, and DATE_FORMAT(string, string) only works after set odps.sql.hive.compatible=true is added.

2. What is Hive compatibility mode?

So what does the setting set odps.sql.hive.compatible=true mean?
It enables Hive compatibility mode, so that Hive SQL function syntax can be used in MaxCompute SQL.

3. Why should we enable Hive mode?

Because some function usages are either not supported by MaxCompute or behave differently there, Hive mode must be enabled before they can be used.
Take the error above as an example: if the parameter passed to DATE_FORMAT() is of type STRING, Hive compatibility mode must be enabled first. Otherwise, an error is reported.

In other words, MaxCompute's own DATE_FORMAT() does not accept a STRING parameter. DATE_FORMAT(string, string) is Hive SQL syntax, so it is only available in Hive compatibility mode.

4. What are the side effects?

Certainly! There are side effects, because some functions return different values in Hive mode and non-Hive mode.
Take FROM_UNIXTIME() as an example: its return value is STRING in Hive compatibility mode, but DATETIME in the ODPS 1.0 and ODPS 2.0 data type editions.

What impact does this have? If you have done data development on MaxCompute, this will be familiar: MaxCompute is strict about data type consistency. In most scenarios, values with inconsistent data types cannot be compared.

When that happens, either an error is thrown or a null value is returned directly. The former is easy to handle: just convert the data type explicitly. The latter is more confusing and has to be tracked down step by step.

Therefore, the biggest impact of a changed return type is that the query silently stops returning the expected data. For this reason, any report you develop must go through data verification to avoid nasty surprises.

As an aside, data type consistency is almost invisible on MySQL, because MySQL performs implicit conversions for us instead of requiring us to handle them ourselves. On MySQL there is basically no need to worry much about data type issues, which is exactly why it is so friendly to beginners.

For more "side effects", please refer to the official documentation:

Reference link: https://help.aliyun.com/zh/maxcompute/user-guide/hive-compatible-data-type-edition

5. How to enable Hive compatibility mode?

If, after weighing the benefits and side effects of Hive compatibility mode, you still decide to use it, you can turn on the switch and "enjoy" it!

The syntax of the switch is:

set odps.sql.hive.compatible=true; -- Enable Hive compatibility mode.

To use it, place it in front of the SQL and submit it together with the SQL. Each set command is treated as an independent statement and ends with a semicolon (;):

set odps.sql.hive.compatible=true;   -- Enable Hive compatibility mode.
SELECT xxx FROM xxx;

To keep the behavior consistent across projects, I also add the ODPS 2.0 settings:

set odps.sql.type.system.odps2=true; -- Enable MaxCompute 2.0 data types.
set odps.sql.decimal.odps2=true;     -- Enable the Decimal 2.0 data type.
set odps.sql.hive.compatible=true;   -- Enable Hive compatibility mode.
SELECT xxx FROM xxx;
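
When SQL is submitted from a script, the same rule (settings first, each one terminated by a semicolon) can be applied mechanically. The following is a minimal Python sketch that only builds the text to submit; the helper name with_settings is hypothetical, and the code does not connect to MaxCompute.

```python
def with_settings(query: str, settings: dict) -> str:
    """Prepend one `set key=value;` line per setting to a SQL query."""
    lines = [f"set {key}={value};" for key, value in settings.items()]
    # Make sure the query itself ends with exactly one semicolon.
    lines.append(query.strip().rstrip(";") + ";")
    return "\n".join(lines)


script = with_settings(
    "SELECT xxx FROM xxx",
    {
        "odps.sql.type.system.odps2": "true",
        "odps.sql.decimal.odps2": "true",
        "odps.sql.hive.compatible": "true",
    },
)
print(script)
```

Keeping the settings in one place like this makes it harder to forget a switch in one query and end up with different return types between reports.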

Therefore, the final solution with Hive compatibility mode enabled is:

set odps.sql.type.system.odps2=true; -- Enable MaxCompute 2.0 data types.
set odps.sql.decimal.odps2=true;     -- Enable the Decimal 2.0 data type.
set odps.sql.hive.compatible=true;   -- Enable Hive compatibility mode.
SELECT FROM_UNIXTIME(1672538400),DATE_FORMAT(FROM_UNIXTIME(1672538400),'yyyyMMdd');

The result is as follows:
[Screenshot: query result]

6. In this scenario, can Hive compatibility mode be disabled?

Definitely! The official MaxCompute documentation for DATE_FORMAT describes the supported parameter types:

DATE_FORMAT function link: https://help.aliyun.com/zh/maxcompute/user-guide/date-format

MaxCompute's DATE_FORMAT() function accepts four parameter types: DATE, DATETIME, TIMESTAMP and STRING. Three of them (DATE, DATETIME and STRING) can only be used in Hive compatibility mode; only TIMESTAMP can be used in non-Hive compatibility mode.

So use CAST(FROM_UNIXTIME(1672538400) AS TIMESTAMP) to convert the data type to TIMESTAMP!

set odps.sql.type.system.odps2=true; -- Enable MaxCompute 2.0 data types.
set odps.sql.decimal.odps2=true;     -- Enable the Decimal 2.0 data type.
set odps.sql.hive.compatible=false;  -- Disable Hive compatibility mode.
select FROM_UNIXTIME(1672538400),DATE_FORMAT(CAST(FROM_UNIXTIME(1672538400) AS TIMESTAMP),'yyyyMMdd');

The returned result is as follows:
[Screenshot: query result]

7. Why not DATE_FORMAT(datetime, string)?

You may have noticed that the explanation above leaves one question unanswered.

FROM_UNIXTIME() returns STRING in Hive compatibility mode, while in the ODPS 1.0 and ODPS 2.0 data type editions it returns DATETIME. However, when I had not enabled Hive compatibility mode, the error said that MaxCompute does not support DATE_FORMAT(string, string), rather than DATE_FORMAT(datetime, string).

So does non-Hive compatibility mode actually return STRING or DATETIME?

To answer this question, I ran a verification: test which data type FROM_UNIXTIME() returns in each mode.

Notes:

  • The timestamp 1672538400 converted to a time is 2023-01-01 10:00:00.
  • DATE() is only available with ODPS 2.0 data types, so the ODPS 2.0 switch must be on. Some projects enable ODPS 2.0 at the project level, but to be safe I set it again here. If your project already has it enabled, no further setting is needed.
  • According to the documentation, FROM_UNIXTIME() returns STRING in Hive compatibility mode and DATETIME in non-Hive compatibility mode.
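
The first note can be checked outside MaxCompute. This is a minimal Python sketch, assuming the session time zone is UTC+8 (the article does not state the time zone, but the 10:00:00 result implies it); the helper name epoch_to_local is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the MaxCompute session runs in UTC+8; adjust if yours differs.
UTC_PLUS_8 = timezone(timedelta(hours=8))


def epoch_to_local(epoch_seconds: int, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Format a Unix timestamp (in seconds) as local time in UTC+8."""
    return datetime.fromtimestamp(epoch_seconds, tz=UTC_PLUS_8).strftime(fmt)


print(epoch_to_local(1672538400))            # 2023-01-01 10:00:00
print(epoch_to_local(1672538400, "%Y%m%d"))  # 20230101
```

The second call mirrors the 'yyyyMMdd' pattern used with DATE_FORMAT() in this article, so it gives a reference value to compare the SQL output against.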

[Test 1] With Hive compatibility mode turned on, does FROM_UNIXTIME(1672538400) return the STRING type?

set odps.sql.type.system.odps2=true; -- Enable MaxCompute 2.0 data types.
set odps.sql.decimal.odps2=true;     -- Enable the Decimal 2.0 data type.
set odps.sql.hive.compatible=true;   -- Enable Hive compatibility mode.
SELECT FROM_UNIXTIME(1672538400),FROM_UNIXTIME(1672538400)='2023-01-01 10:00:00';

The returned results are as follows, in line with expectations.

[Screenshot: query result]

[Test 2] With Hive compatibility mode turned off, does FROM_UNIXTIME(1672538400) return the DATETIME type?

set odps.sql.type.system.odps2=true; -- Enable MaxCompute 2.0 data types.
set odps.sql.decimal.odps2=true;     -- Enable the Decimal 2.0 data type.
set odps.sql.hive.compatible=false;  -- Disable Hive compatibility mode.
SELECT FROM_UNIXTIME(1672538400),FROM_UNIXTIME(1672538400)='2023-01-01 10:00:00';

The results are as follows:

[Screenshot: query result]

What happened? The two are equal here as well? Presumably the comparison FROM_UNIXTIME(1672538400)='2023-01-01 10:00:00' performs an implicit conversion. Since it returns true whether FROM_UNIXTIME() returns STRING or DATETIME, it cannot tell the two types apart. So I changed the approach and used DATE() to help with the judgment.

Note: DATE() returns a date value only when the value passed in is of DATETIME type or is a string in 'yyyy-MM-dd' format. If it is a string in 'yyyy-MM-dd hh:mi:ss' format, a null value is returned.
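
To make the rule in this note concrete, here is a toy Python model of it. It is only a sketch of the behavior described above, not MaxCompute's implementation, and the function name date_model is hypothetical.

```python
from datetime import date, datetime
from typing import Optional, Union


def date_model(value: Union[str, datetime]) -> Optional[date]:
    """Toy model of the DATE() rule in the note; not MaxCompute's implementation."""
    if isinstance(value, datetime):
        # DATETIME input: keep the date part.
        return value.date()
    try:
        # STRING input: only a bare 'yyyy-MM-dd' string is accepted.
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        # Any other string, such as 'yyyy-MM-dd hh:mi:ss', yields NULL (None).
        return None


print(date_model("2023-01-01"))                    # 2023-01-01
print(date_model("2023-01-01 10:00:00"))           # None
print(date_model(datetime(2023, 1, 1, 10, 0, 0)))  # 2023-01-01
```

Because the same instant gives None as a string with a time part but a date as a DATETIME, the output of DATE(FROM_UNIXTIME(...)) reveals which type FROM_UNIXTIME() returned, which the equality comparison could not.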

[Test 3] Use DATE() to help with the judgment: with Hive compatibility mode turned on, does DATE(FROM_UNIXTIME(1672538400)) return a null value?

set odps.sql.type.system.odps2=true; -- Enable MaxCompute 2.0 data types.
set odps.sql.decimal.odps2=true;     -- Enable the Decimal 2.0 data type.
set odps.sql.hive.compatible=true;   -- Enable Hive compatibility mode.
SELECT FROM_UNIXTIME(1672538400),DATE(FROM_UNIXTIME(1672538400));

The results are as follows, in line with expectations.

[Screenshot: query result]

[Test 4] Use DATE() to help with the judgment: with Hive compatibility mode turned off, does DATE(FROM_UNIXTIME(1672538400)) return the date?

set odps.sql.type.system.odps2=true; -- Enable MaxCompute 2.0 data types.
set odps.sql.decimal.odps2=true;     -- Enable the Decimal 2.0 data type.
set odps.sql.hive.compatible=false;  -- Disable Hive compatibility mode.
SELECT FROM_UNIXTIME(1672538400),DATE(FROM_UNIXTIME(1672538400));

The results are as follows, in line with expectations.

[Screenshot: query result]

[Supplementary test] Do DATE('2023-01-01 10:00:00') and DATE(CAST('2023-01-01 10:00:00' AS DATETIME)) return a null value and a date, respectively?

set odps.sql.type.system.odps2=true; -- Enable MaxCompute 2.0 data types.
SELECT DATE('2023-01-01 10:00:00'),DATE(CAST('2023-01-01 10:00:00' AS DATETIME));

The returned results are as follows, in line with expectations.

[Screenshot: query result]

The tests above indirectly verify that in MaxCompute, FROM_UNIXTIME() returns STRING in Hive compatibility mode and DATETIME in non-Hive compatibility mode.

In non-Hive compatibility mode, FROM_UNIXTIME(1672538400) does return DATETIME. This suggests that when a DATETIME is passed to DATE_FORMAT(), it is implicitly converted to STRING for processing, which is why the error message mentions DATE_FORMAT(string, string).

8. Summary

This article provides two ways to work around MaxCompute not supporting DATE_FORMAT(string, string):

  • Method 1: Enable Hive compatibility mode
set odps.sql.type.system.odps2=true; -- Enable MaxCompute 2.0 data types.
set odps.sql.decimal.odps2=true;     -- Enable the Decimal 2.0 data type.
set odps.sql.hive.compatible=true;   -- Enable Hive compatibility mode.
SELECT FROM_UNIXTIME(1672538400),DATE_FORMAT(FROM_UNIXTIME(1672538400),'yyyyMMdd');
  • Method 2: Explicitly convert the data type returned by FROM_UNIXTIME(1672538400) before passing it in
set odps.sql.type.system.odps2=true; -- Enable MaxCompute 2.0 data types.
set odps.sql.decimal.odps2=true;     -- Enable the Decimal 2.0 data type.
set odps.sql.hive.compatible=false;  -- Disable Hive compatibility mode.
select FROM_UNIXTIME(1672538400),DATE_FORMAT(CAST(FROM_UNIXTIME(1672538400) AS TIMESTAMP),'yyyyMMdd');

In addition, if the parameter passed to DATE_FORMAT() is of DATETIME type, it is implicitly converted to STRING for processing.





Review of past issues:

Alibaba Cloud Big Data Practical Record 9: MaxCompute RAM Users and Authorization
Alibaba Cloud Big Data Practical Record 8: Unpack each element of json, one per line
Alibaba Cloud Big Data Practical Record 7: How to deal with duplicate data in production environment forms

Origin blog.csdn.net/qq_45476428/article/details/132921653