Summary of problems encountered during Oracle migration to Hive

Foreword

Recently, a colleague has been migrating business workloads from Oracle to Hive and ran into several pitfalls along the way. This post summarizes them so that teams doing similar migrations can avoid the same problems. If they do hit them anyway, they can use this post as a reference for solving them.

Problem 1: Distinct window functions are not supported: count(distinct position_id#92) windowspecdefinition

The error message means that count(distinct ...) is not supported as a window function. Oracle accepts this syntax, but Spark SQL rejects it.

Solution

  • Option 1: Use approx_count_distinct. Note that it returns an approximate (probabilistic) count, not an exact one.
  • Option 2: Use collect_set together with size, which gives an exact distinct count.
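In Spark SQL, Option 2 usually takes the form `size(collect_set(position_id) over (partition by dept_id))`, where `dept_id` is a hypothetical partition column used only for illustration. To show why this equals the unsupported `count(distinct position_id) over (...)`, the following is a minimal plain-Python model of the window semantics. It is not Spark code, and the function name and column names are assumptions. Every row is kept, as in a window (not GROUP BY), and each row receives the number of distinct non-null values in its partition. NULLs are skipped because both collect_set and count(distinct) ignore them.

```python
from collections import defaultdict


def distinct_count_over_partition(rows, partition_key, value_key):
    """Hypothetical model of SIZE(COLLECT_SET(value) OVER (PARTITION BY key)).

    Returns a copy of every input row with an extra "distinct_cnt" field
    holding the number of distinct non-null values in that row's partition.
    """
    # Pass 1: build the set of distinct non-null values per partition,
    # mirroring collect_set, which drops NULLs.
    sets = defaultdict(set)
    for row in rows:
        value = row[value_key]
        if value is not None:
            sets[row[partition_key]].add(value)

    # Pass 2: attach size(set) to every row; a partition with only NULLs
    # gets an empty set, i.e. a count of 0 (same as count(distinct)).
    return [
        {**row, "distinct_cnt": len(sets[row[partition_key]])}
        for row in rows
    ]


rows = [
    {"dept_id": 1, "position_id": 10},
    {"dept_id": 1, "position_id": 10},
    {"dept_id": 1, "position_id": 20},
    {"dept_id": 2, "position_id": 30},
    {"dept_id": 2, "position_id": None},
]
for r in distinct_count_over_partition(rows, "dept_id", "position_id"):
    print(r)
```

One design trade-off: collect_set materializes the full set of values for each partition. Very large partitions can therefore use a lot of memory, which is where the approximate Option 1 can be the more practical choice.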

SQL before the change:

count


Source: blog.csdn.net/u011109589/article/details/131937032