Fine-grained deduplication of data within a row using SQL

This is a problem I ran into at work. I'll share it briefly here and get straight to the point.

1. Problem background

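Each value in the manager column is an array-like string that contains repeated elements. The goal is to turn the first table below into the second: one comma-separated string per row, with duplicates removed.

Before:

manager
["aa","aa","aa","bb","bb"]
["cc","cc","dd"]
["1","2","1","2","3"]

After:

manager
aa,bb
cc,dd
1,2,3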

2. Implementation ideas

  • Step 1: use lateral view with the explode() function to expand the manager column, so that each element becomes its own row
  • Step 2: use wm_concat() with distinct, combined with group by, to merge the rows back together with duplicates removed
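To make step 1 concrete, here is a minimal sketch that runs the expansion on one literal value taken from the table above. The inline subquery and the aliases demo, t, and m are made up for this illustration, and it assumes a Hive version that allows a select without a from clause.

```sql
-- Step 1 only: expand one sample value into one row per element.
select m
from (select '["aa","aa","aa","bb","bb"]' as manager) demo
-- substr(...) drops the leading [ and trailing ], split(...) cuts on commas,
-- and explode(...) emits one row per array element.
lateral view explode(split(substr(manager, 2, length(manager) - 2), ',')) t as m
-- Result: five rows: "aa", "aa", "aa", "bb", "bb" (double quotes still attached)
```

The elements still carry their double quotes at this point, which is why the full query strips them with regexp_replace() after aggregating.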

3. Code implementation

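        -- Step 2: merge the distinct values of each group and strip the double quotes
        select approvalid
              ,subProductTag
              ,regexp_replace(wm_concat(distinct managers, ','), '"', '') as manager
        from (
            -- Step 1: drop the surrounding [ ], split on commas, one row per element
            select approvalid
                  ,subProductTag
                  ,managers
            -- %(dateFrom)s is a date placeholder substituted before the query runs
            from (select approvalid, subProductTag, manager from a where ftime = %(dateFrom)s) tmp
            lateral view explode(split(substr(manager, 2, length(manager) - 2), ',')) tmp1 as managers
        ) t  -- many Hive versions require an alias on a subquery in the from clause
        group by approvalid, subProductTag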
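wm_concat() is not available on every engine. As far as I know it is not a built-in function in stock Apache Hive. On such a platform, the same expand-then-aggregate idea could be sketched with collect_set() and concat_ws() instead. This is a variant under stated assumptions, not the query used above: the demo CTE and its id column are invented sample data, and it assumes a Hive version that supports CTEs and a select without a from clause.

```sql
-- Hypothetical sample data standing in for table a; id plays the role of the group key.
with demo as (
    select 1 as id, '["aa","aa","aa","bb","bb"]' as manager
    union all
    select 2 as id, '["cc","cc","dd"]' as manager
    union all
    select 3 as id, '["1","2","1","2","3"]' as manager
)
select id
       -- collect_set() keeps each distinct value once; concat_ws() joins them with commas
      ,concat_ws(',', collect_set(regexp_replace(m, '"', ''))) as manager
from demo
-- Same expansion as step 1: strip [ ], split on commas, one row per element
lateral view explode(split(substr(manager, 2, length(manager) - 2), ',')) t as m
group by id
```

Here the quotes are removed before aggregation rather than after. Note that collect_set() does not guarantee the order of the elements, so a row may come back as bb,aa rather than aa,bb.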

4. Summary

The overall idea is to expand first and then aggregate. If you know a better way to implement this, feel free to share it in the comments below!

 

Origin blog.csdn.net/ALIVEE/article/details/107763939