Analysis of distinct and group by operations in TDH
Why is group by an order of magnitude faster than distinct for the same calculation?
-- group by version
select count(1)
from
(
select cust_isn
from
database.table
group by
cust_isn
) t;
-- distinct version
select count(distinct cust_isn)
from
database.table;
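Before comparing timings, it helps to check that the two queries above return the same number. The sketch below is only an illustration: it uses SQLite (through Python's sqlite3 module) as a stand-in engine, and the table t and its rows are made up. It says nothing about TDH performance. It shows one difference to be aware of: count(distinct ...) ignores NULL, while group by puts all NULL values into one group that count(1) then counts.

```python
import sqlite3

# Hypothetical in-memory stand-in for database.table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (cust_isn text)")
conn.executemany(
    "insert into t values (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), (None,)],
)

# group by version: the NULL group is counted as one row.
group_by_count = conn.execute(
    "select count(1) from (select cust_isn from t group by cust_isn) g"
).fetchone()[0]

# distinct version: NULL is ignored.
distinct_count = conn.execute(
    "select count(distinct cust_isn) from t"
).fetchone()[0]

# group by version with NULLs filtered out, which matches the distinct version.
non_null_group_by_count = conn.execute(
    "select count(1) from "
    "(select cust_isn from t where cust_isn is not null group by cust_isn) g"
).fetchone()[0]

print(group_by_count, distinct_count, non_null_group_by_count)
```

If cust_isn never contains NULL, the two queries agree as written. Otherwise, adding a "where cust_isn is not null" filter to the inner query makes the group by version match.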
The distinct query takes 24 s; the group by query takes 1 s.
DAG of the group by query:
two shuffle stages
the last node takes 1 s
DAG of the distinct query:
one shuffle stage
the final stage takes 24 s
Summary of reasons:
Although group by performs one more shuffle than distinct, its plan includes a pre-computation task that deduplicates the values before the final stage, so the final stage has little left to do and group by finishes faster.
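The difference in final-stage work can be sketched with a small simulation. This is not TDH code. It assumes that the final stage of the distinct plan runs as a single task that deduplicates every value itself, and that the group by plan deduplicates in parallel partitions first. That is a common pattern in Hive-style engines, but the source does not confirm it for TDH. The function names and the partition count are hypothetical.

```python
from collections import defaultdict


def count_distinct_single_task(rows):
    """One shuffle: every row goes to a single final task, which deduplicates alone."""
    seen = set()
    for value in rows:  # the single final task touches every input row
        seen.add(value)
    return len(seen), len(rows)  # (result, rows handled by the final task)


def count_via_group_by(rows, num_partitions=4):
    """Two shuffles: deduplicate in parallel by key, then add up small partial counts."""
    # Shuffle 1: route each key to one partition so duplicates meet in the same task.
    partitions = defaultdict(set)
    for value in rows:
        partitions[hash(value) % num_partitions].add(value)
    # Each partition task emits only its number of distinct keys.
    partial_counts = [len(keys) for keys in partitions.values()]
    # Shuffle 2: the final task only sums one number per partition.
    return sum(partial_counts), len(partial_counts)


# Made-up data: 100,000 rows with 1,000 distinct values.
rows = [i % 1000 for i in range(100_000)]
distinct_result, distinct_final_rows = count_distinct_single_task(rows)
group_by_result, group_by_final_rows = count_via_group_by(rows)
print(distinct_result, distinct_final_rows)
print(group_by_result, group_by_final_rows)
```

Under these assumptions both plans return the same count, but the final task of the distinct plan processes all 100,000 rows, while the final task of the group by plan processes only one partial count per partition.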