[Original] Uncle's Experience Sharing (83): Running Multiple SELECT DISTINCT in Impala

Category: Other | Posted: 2019-09-28 06:27:54

Impala reports an error when a single SELECT contains multiple COUNT(DISTINCT) aggregates over different columns. For example, running

select key, count(distinct column_a), count(distinct column_b) from test_table group by key

fails with:

Query submitted at: 2019-09-28 00:34:20 (Coordinator: http://DataOne-001:25000)
ERROR: AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT column_a);
deviating function: count(DISTINCT column_b)
Consider using NDV() instead of COUNT(DISTINCT) if estimated counts are acceptable. Enable the APPX_COUNT_DISTINCT query option to
perform this rewrite automatically.

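There are several ways to handle this:

1 Use approximate values

1.1 set APPX_COUNT_DISTINCT = true
1.2 Replace count distinct with ndv, i.e. ndv(column_a)
Both approaches share the same underlying implementation: setting APPX_COUNT_DISTINCT makes Impala rewrite count distinct to ndv automatically. NDV stands for "number of distinct values". It is a cardinality estimate, implemented with a probabilistic algorithm similar to HLLC (HyperLogLog Counting); see the references for details. The Impala documentation describes ndv as: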

An aggregate function that returns an approximate value similar to the result of COUNT(DISTINCT col), the "number of distinct values". It is much faster than the combination of COUNT and DISTINCT, and uses a constant amount of memory and thus is less memory-intensive for columns with high cardinality.
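
The two approximate options can be sketched as follows. This is a minimal illustration that reuses the table and column names from the failing query above; the aliases approx_count_a and approx_count_b are made up for the example, and the results are estimates, not exact counts.

```sql
-- Option 1.1: enable the query option for the session so that Impala
-- rewrites COUNT(DISTINCT ...) to NDV(...) on its own.
SET APPX_COUNT_DISTINCT=true;

SELECT key,
       COUNT(DISTINCT column_a),
       COUNT(DISTINCT column_b)
FROM test_table
GROUP BY key;

-- Option 1.2: leave the option off and call NDV() explicitly.
-- approx_count_a / approx_count_b are illustrative alias names.
SET APPX_COUNT_DISTINCT=false;

SELECT key,
       NDV(column_a) AS approx_count_a,
       NDV(column_b) AS approx_count_b
FROM test_table
GROUP BY key;
```

Option 1.2 makes the approximation visible in the query text, which may be easier to review than a session-level setting that silently changes what COUNT(DISTINCT) returns.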

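2 Use exact values

Rewrite the query as multiple subqueries, each with a single count distinct, and join them on the grouping key. For example: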

select a.key, a.count_a, b.count_b from
(select key, count(distinct column_a) count_a from test_table group by key) a join
(select key, count(distinct column_b) count_b from test_table group by key) b on a.key = b.key
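
One caveat with the join rewrite, added here as an aside and not taken from the original post: an inner join on a.key = b.key drops the group whose key is NULL, because NULL = NULL does not evaluate to true. A minimal sketch of a NULL-safe variant follows, assuming your Impala version supports the <=> operator (documented as equivalent to IS NOT DISTINCT FROM); if key is never NULL in your data, the original query is enough.

```sql
-- Same rewrite as above, but the join condition treats two NULL keys as equal,
-- so the NULL group (if any) is kept in the result.
-- Assumption: the <=> operator is available in the Impala version in use.
SELECT a.key, a.count_a, b.count_b
FROM
  (SELECT key, COUNT(DISTINCT column_a) AS count_a
   FROM test_table
   GROUP BY key) a
JOIN
  (SELECT key, COUNT(DISTINCT column_b) AS count_b
   FROM test_table
   GROUP BY key) b
ON a.key <=> b.key;
```

Both subqueries group the same table by the same column, so they produce the same set of keys and the inner join loses no other rows.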

References:

ndv

http://impala.apache.org/docs/build/html/topics/impala_ndv.html#ndv

APPX_COUNT_DISTINCT

http://impala.apache.org/docs/build/html/topics/impala_appx_count_distinct.html

Other

https://stackoverflow.com/questions/39236076/impala-all-distinct-aggregate-functions-need-to-have-the-same-set-of-parameters

Reposted from www.cnblogs.com/barneywill/p/11601234.html