Here is a problem I ran into at work. I'll keep the introduction short and go straight to it.
1. Problem background
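Convert the manager column from the first form below (an array-style string with repeated values) into the second form (a deduplicated, comma-separated list).

Before:
| manager |
| --- |
| ["aa","aa","aa","bb","bb"] |
| ["cc","cc","dd"] |
| ["1","2","1","2","3"] |

After:
| manager |
| --- |
| aa,bb |
| cc,dd |
| 1,2,3 |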
2. Implementation ideas
- The first step is to use lateral view with explode() to expand the manager column into one row per element
- The second step is to use wm_concat() with distinct, combined with group by, to remove the duplicates and join the values back into one string
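To make the first step concrete, here is a minimal sketch that expands a single hard-coded row. It assumes a Hive-compatible engine; the literal value and the names src, t and m are made up for illustration and are not part of the real table.

```sql
-- Hypothetical one-row input; src, t and m are illustrative names.
select m
from (select '["aa","aa","bb"]' as manager) src
lateral view explode(
    -- substr() drops the surrounding [ and ], split() cuts on commas
    split(substr(manager, 2, length(manager) - 2), ',')
) t as m;
-- Produces one row per array element. The double quotes are still attached
-- and duplicates are not removed yet; that is the job of the second step.
```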
3. Code implementation
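-- wm_concat is engine-specific: check its argument order on your platform
-- (MaxCompute, for example, takes the separator first).
select approvalid
      ,subproducttag
      -- join the distinct values with commas, then strip the double quotes
      ,regexp_replace(wm_concat(distinct managers, ','), '"', '') as manager
from (
    select approvalid
          ,subproducttag
          ,managers
    from (
        select approvalid, subproducttag, manager
        from a
        where ftime = %(dateFrom)s
    ) tmp
    -- drop the surrounding [ ] and split on commas: one row per element
    lateral view explode(split(substr(manager, 2, length(manager) - 2), ',')) tmp1 as managers
) t
group by approvalid, subproducttag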
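On engines that do not provide wm_concat(), the aggregation step is often written with collect_set() and concat_ws() instead. The following is only a sketch under that assumption (a Hive-compatible engine), reusing the same table and column names as above. Note that collect_set() does not guarantee the order of the elements, so the joined string may come out as bb,aa rather than aa,bb.

```sql
-- Sketch for Hive-compatible engines: collect_set() deduplicates,
-- concat_ws() joins the resulting array with commas.
select approvalid
      ,subproducttag
      -- strip the quotes before collecting so the set holds clean values
      ,concat_ws(',', collect_set(regexp_replace(managers, '"', ''))) as manager
from (
    select approvalid, subproducttag, manager
    from a
    where ftime = %(dateFrom)s
) tmp
lateral view explode(split(substr(manager, 2, length(manager) - 2), ',')) tmp1 as managers
group by approvalid, subproducttag
```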
4. Summary
The overall idea is to expand first and then aggregate. If you know a better way to implement this, please share it in the comment section below!