Implementing auto-increment columns in Hive

1. Use the row_number() function to generate a surrogate key

INSERT OVERWRITE TABLE testTable
SELECT row_number() OVER (ORDER BY a.acc_no) AS id,
       a.acc_no
FROM ba_pay_out.app_intf_web_cli_his_view a;
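
If rows are later appended instead of overwritten, the numbering has to continue from the largest id already in the table. The following is only a sketch under that assumption: the aliases n and m and the column name max_id are made up for illustration, and a real job would normally restrict the inner query to new rows only.

```sql
-- Sketch: append rows and continue numbering after the current maximum id.
-- Assumes testTable(id, acc_no) already exists; n, m and max_id are illustrative names.
INSERT INTO TABLE testTable
SELECT n.rn + m.max_id AS id,
       n.acc_no
FROM (
    -- Number the incoming rows starting from 1.
    SELECT row_number() OVER (ORDER BY a.acc_no) AS rn,
           a.acc_no
    FROM ba_pay_out.app_intf_web_cli_his_view a
) n
CROSS JOIN (
    -- One-row subquery holding the largest id so far (0 for an empty table).
    SELECT COALESCE(MAX(id), 0) AS max_id
    FROM testTable
) m;
```

Depending on the cluster settings, the cross join may need Cartesian products to be allowed.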

 

2. Generate surrogate keys with UDFRowSequence

ADD JAR viewfs://hadoop-meituan/user/hadoop-data/user_upload/weichao05_hive-contrib-3.1.0.jar;
CREATE TEMPORARY FUNCTION row_sequence AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';

INSERT OVERWRITE TABLE testTable
SELECT row_sequence() AS id,
       a.acc_no
FROM ba_pay_out.app_intf_web_cli_his_view a;

 

    hive-contrib-3.1.0.jar contains the UDF class UDFRowSequence, which generates record sequence numbers. The statements above first load the JAR and then create a temporary function named row_sequence() as the entry point for calling the UDF, which produces an auto-incrementing pseudo column for the query result set. The rest of the statement is written the same way as in the first method, except that the window function row_number() is replaced by row_sequence().

    Of the two methods, the second performs better than the first. The first is slow, and once the data reaches tens of millions of rows (in my experience, more than 40 million) it fails with an out-of-memory error, although this may also depend on the Hadoop resource configuration. The second method still runs quickly when the data exceeds 150 million rows.
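
A likely reason for the slowdown is that a window with ORDER BY and no PARTITION BY has to be processed by a single reducer. The sketch below is not the method from this article, only one possible workaround: it numbers rows inside hash buckets and then shifts each bucket by the total size of the buckets before it. The bucket count 16 and the names numbered, offsets, bucket, rn, cnt and start_offset are assumptions chosen for illustration.

```sql
-- Sketch: unique, consecutive ids without sorting the whole data set in one reducer.
-- 16 buckets is an arbitrary choice; tune it to the data volume.
WITH numbered AS (
    -- Number rows from 1 inside each hash bucket (one bucket per window partition).
    SELECT a.acc_no,
           pmod(hash(a.acc_no), 16) AS bucket,
           row_number() OVER (PARTITION BY pmod(hash(a.acc_no), 16)
                              ORDER BY a.acc_no) AS rn
    FROM ba_pay_out.app_intf_web_cli_his_view a
),
offsets AS (
    -- For each bucket, the number of rows in all lower-numbered buckets.
    SELECT s.bucket,
           COALESCE(SUM(s.cnt) OVER (ORDER BY s.bucket
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                    0) AS start_offset
    FROM (SELECT bucket, COUNT(*) AS cnt FROM numbered GROUP BY bucket) s
)
INSERT OVERWRITE TABLE testTable
SELECT n.rn + o.start_offset AS id,
       n.acc_no
FROM numbered n
JOIN offsets o ON n.bucket = o.bucket;
```

The ids produced this way are unique and run from 1 to the row count, but they follow acc_no order only within a bucket, not across the whole table.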

The difference between the two methods:

The first method sorts the entire data set, so the sequence numbers are consecutive and unique across the whole result. The second numbers rows per task: each concurrent task starts counting from 1, so sequence numbers repeat across tasks. Each method therefore has its own advantages and disadvantages. The second method could be extended by using Redis to manage the sequence numbers centrally, which should also make it possible to produce consecutive, unique numbers.
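
Whether a given run actually produced repeated numbers can be checked with a simple aggregate after the load. This is a minimal sketch that assumes the target table has the id column written by the statements above; the alias occurrences and the limit of 20 are illustrative.

```sql
-- Sketch: list ids that occur more than once in the target table.
SELECT id,
       COUNT(*) AS occurrences
FROM testTable
GROUP BY id
HAVING COUNT(*) > 1
ORDER BY id
LIMIT 20;
```

An empty result means every id is unique. With the second method and more than one task, low ids such as 1 and 2 would be expected to show up here.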

Origin blog.csdn.net/weichao9999/article/details/82112930