While running some HQL today, I ran into a problem.
The query kept failing with an error saying that user.user_id could not be recognized. I suspected that the alias user was being used incorrectly. I then changed the table alias to a name that did not appear anywhere else in the query, and it ran successfully.
hive (hive)> select user.user_id,user.date_dt,user.low_carbon
> from
> user_low_carbon user
> join
> (select user_id,date_dt
> from
> (select user_id,date_dt,
> datediff(date_dt,lag2) lag2_diff,
> datediff(date_dt,lag1) lag1_diff,
> datediff(date_dt,lead1) lead1_diff,
> datediff(date_dt,lead2) lead2_diff
> from
> (select user_id ,date_dt,
> lag(date_dt,2,'1970-01-01') over(partition by user_id order by date_dt) lag2,
> lag(date_dt,1,'1970-01-01') over(partition by user_id order by date_dt) lag1,
> lead(date_dt,1,'1970-01-01') over(partition by user_id order by date_dt) lead1,
> lead(date_dt,2,'1970-01-01') over(partition by user_id order by date_dt) lead2
> from
> (select user_id,date_format(regexp_replace(date_dt,'/','-'),'yyyy-MM-dd') date_dt,sum(low_carbon) sum_low_carbon
> from user_low_carbon
> where
> substring(date_dt,1,4)='2017'
> group by user_id,date_dt
> having
> sum_low_carbon>=100)t1)t2)t3
> where
> (lag2_diff=2 and lag1_diff=1)
> or
> (lag1_diff=1 and lead1_diff=-1)
> or
> (lead1_diff=-1 and lead2_diff=-2))t4
> on
> t4.user_id=user.user_id and t4.date_dt=date_format(regexp_replace(user.date_dt,'/','-'),'yyyy-MM-dd');
Correct code:
select a.user_id,a.date_dt,a.low_carbon
from
user_low_carbon a
join
(select user_id,date_dt
from
(select user_id,date_dt,
datediff(date_dt,lag2) lag2_diff,
datediff(date_dt,lag1) lag1_diff,
datediff(date_dt,lead1) lead1_diff,
datediff(date_dt,lead2) lead2_diff
from
(select user_id ,date_dt,
lag(date_dt,2,'1970-01-01') over(partition by user_id order by date_dt) lag2,
lag(date_dt,1,'1970-01-01') over(partition by user_id order by date_dt) lag1,
lead(date_dt,1,'1970-01-01') over(partition by user_id order by date_dt) lead1,
lead(date_dt,2,'1970-01-01') over(partition by user_id order by date_dt) lead2
from
(select user_id,date_format(regexp_replace(date_dt,'/','-'),'yyyy-MM-dd') date_dt,sum(low_carbon) sum_low_carbon
from user_low_carbon
where
substring(date_dt,1,4)='2017'
group by user_id,date_dt
having
sum_low_carbon>=100)t1)t2)t3
where
(lag2_diff=2 and lag1_diff=1)
or
(lag1_diff=1 and lead1_diff=-1)
or
(lead1_diff=-1 and lead2_diff=-2)) t4
on
t4.user_id=a.user_id and t4.date_dt=date_format(regexp_replace(a.date_dt,'/','-'),'yyyy-MM-dd');
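
The only difference between the two queries is the table alias. A likely explanation, though not confirmed here, is that `user` is on Hive's list of reserved keywords (as of Hive 1.2.0), so it is not resolved as an ordinary alias in `user.user_id`. The sketch below strips the change down to its core, using the same table and columns as above. The backtick-quoted variant is an assumption about how quoted identifiers behave for table aliases in a given Hive version and has not been tested.

```sql
-- Failing form: the alias is named after a reserved keyword.
-- select user.user_id, user.date_dt, user.low_carbon
-- from user_low_carbon user;

-- Working form: an alias that is not a keyword and is not used elsewhere.
select a.user_id, a.date_dt, a.low_carbon
from user_low_carbon a;

-- Assumption (untested): quoting the alias with backticks may also work.
select `user`.user_id, `user`.date_dt, `user`.low_carbon
from user_low_carbon `user`;
```

Renaming the alias is the simpler choice, since it does not depend on quoting settings.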
This multi-level nested HQL statement is complex, and executing it runs a total of three MapReduce jobs!
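
To see how Hive splits a statement into stages before running it, the query can be prefixed with `explain`. As a sketch, here it is applied to just the innermost aggregation (the t1 subquery above). The stages it prints depend on the Hive version and execution engine, so no particular output is assumed.

```sql
-- Print the execution plan instead of running the query.
explain
select user_id,
       date_format(regexp_replace(date_dt,'/','-'),'yyyy-MM-dd') date_dt,
       sum(low_carbon) sum_low_carbon
from user_low_carbon
where substring(date_dt,1,4)='2017'
group by user_id, date_dt
having sum_low_carbon>=100;
```

Prefixing the full nested query the same way shows the dependencies between its stages.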