Foreword
Recently, a colleague was migrating business workloads from Oracle to Hive and ran into several pitfalls along the way. This article summarizes them so that future migrations can avoid the same problems, or, if they do hit them, can compare against these cases to find a fix.
Problem 1: Distinct window functions are not supported: count(distinct position_id#92) windowspecdefinition
Judging from the error log above, window functions do not support count(distinct ...). Oracle supports this syntax, but Spark SQL does not.
Solution
- Option 1: Use approx_count_distinct, but it returns an approximate (probabilistic) count rather than an exact one.
- Option 2: Combine collect_set with size, i.e. size(collect_set(...)), which returns an exact distinct count.
SQL before the change:
count