How to Filter a SQL Nested Collection by a Value (with Examples)

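I stumbled upon a very interesting question on Stack Overflow about how to use jOOQ's MULTISET operator to nest a collection, and then filter the result by whether that nested collection contains a value.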

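The question is jOOQ-specific, but imagine you have a query in PostgreSQL that nests collections (for example using JSON). As always, let's assume the Sakila database. Now, PostgreSQL doesn't support the SQL standard MULTISET operator, but we can use ARRAY, which works in almost the same way: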

SELECT
  f.title,
  ARRAY(
    SELECT ROW(
      a.actor_id,
      a.first_name,
      a.last_name
    )
    FROM actor AS a
    JOIN film_actor AS fa USING (actor_id)
    WHERE fa.film_id = f.film_id
    ORDER BY a.actor_id
  )
FROM film AS f
ORDER BY f.title

This produces all the films and their actors, like this (I've truncated the arrays for readability, you get the point):

title                      |array                                                                                 
---------------------------+--------------------------------------------------------------------------------------
ACADEMY DINOSAUR           |{"(1,PENELOPE,GUINESS)","(10,CHRISTIAN,GABLE)","(20,LUCILLE,TRACY)","(30,SANDRA,PECK)"
ACE GOLDFINGER             |{"(19,BOB,FAWCETT)","(85,MINNIE,ZELLWEGER)","(90,SEAN,GUINESS)","(160,CHRIS,DEPP)"}   
ADAPTATION HOLES           |{"(2,NICK,WAHLBERG)","(19,BOB,FAWCETT)","(24,CAMERON,STREEP)","(64,RAY,JOHANSSON)","(1
AFFAIR PREJUDICE           |{"(41,JODIE,DEGENERES)","(81,SCARLETT,DAMON)","(88,KENNETH,PESCI)","(147,FAY,WINSLET)"
AFRICAN EGG                |{"(51,GARY,PHOENIX)","(59,DUSTIN,TAUTOU)","(103,MATTHEW,LEIGH)","(181,MATTHEW,CARREY)"
AGENT TRUMAN               |{"(21,KIRSTEN,PALTROW)","(23,SANDRA,KILMER)","(62,JAYNE,NEESON)","(108,WARREN,NOLTE)",
AIRPLANE SIERRA            |{"(99,JIM,MOSTEL)","(133,RICHARD,PENN)","(162,OPRAH,KILMER)","(170,MENA,HOPPER)","(185
AIRPORT POLLOCK            |{"(55,FAY,KILMER)","(96,GENE,WILLIS)","(110,SUSAN,DAVIS)","(138,LUCILLE,DEE)"}        
ALABAMA DEVIL              |{"(10,CHRISTIAN,GABLE)","(22,ELVIS,MARX)","(26,RIP,CRAWFORD)","(53,MENA,TEMPLE)","(68,

Now, the question on Stack Overflow was how to filter this result by whether the ARRAY (or MULTISET) contains a specific value.

Filtering the ARRAY

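We can't just add a WHERE clause to the query. Because of SQL's logical order of operations, the WHERE clause "happens before" the SELECT clause, so the ARRAY isn't available yet in WHERE. We can, however, wrap everything in a derived table and do this instead: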

SELECT *
FROM (
  SELECT
    f.title,
    ARRAY(
      SELECT ROW(
        a.actor_id,
        a.first_name,
        a.last_name
      )
      FROM actor AS a
      JOIN film_actor AS fa USING (actor_id)
      WHERE fa.film_id = f.film_id
      ORDER BY a.actor_id
    ) AS actors
  FROM film AS f
) AS f
WHERE actors @> ARRAY[(
  SELECT ROW(a.actor_id, a.first_name, a.last_name)
  FROM actor AS a 
  WHERE a.actor_id = 1
)]
ORDER BY f.title

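Please excuse the clunky ARRAY @> ARRAY operator. I wasn't aware of a better approach here, because in PostgreSQL it's hard to unnest a structurally typed RECORD[] array unless we use a nominal type (CREATE TYPE ...). If you know a better way to filter, please let me know in the comments. Here's a better version, which unnests the array using a column definition list: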

SELECT *
FROM (
  SELECT
    f.title,
    ARRAY(
      SELECT ROW(
        a.actor_id,
        a.first_name,
        a.last_name
      )
      FROM actor AS a
      JOIN film_actor AS fa USING (actor_id)
      WHERE fa.film_id = f.film_id
      ORDER BY a.actor_id
    ) AS actors
  FROM film AS f
) AS f
WHERE EXISTS (
  SELECT 1 
  FROM unnest(actors) AS t (a bigint, b text, c text) 
  WHERE a = 1
)
ORDER BY f.title

Either way, this produces the expected result:

title                |actors                                                                                           
---------------------+-------------------------------------------------------------------------------------------------
ACADEMY DINOSAUR     |{"(1,PENELOPE,GUINESS)","(10,CHRISTIAN,GABLE)","(20,LUCILLE,TRACY)","(30,SANDRA,PECK)","(40,JOHNN
ANACONDA CONFESSIONS |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(22,ELVIS,MARX)","(150,JAYNE,NOLTE)","(164,HUMPHREY
ANGELS LIFE          |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(7,GRACE,MOSTEL)","(47,JULIA,BARRYMORE)","(91,CHRIS
BULWORTH COMMANDMENTS|{"(1,PENELOPE,GUINESS)","(65,ANGELA,HUDSON)","(124,SCARLETT,BENING)","(173,ALAN,DREYFUSS)"}      
CHEAPER CLYDE        |{"(1,PENELOPE,GUINESS)","(20,LUCILLE,TRACY)"}                                                    
COLOR PHILADELPHIA   |{"(1,PENELOPE,GUINESS)","(106,GROUCHO,DUNST)","(122,SALMA,NOLTE)","(129,DARYL,CRAWFORD)","(163,CH
ELEPHANT TROJAN      |{"(1,PENELOPE,GUINESS)","(24,CAMERON,STREEP)","(37,VAL,BOLGER)","(107,GINA,DEGENERES)","(115,HARR
GLEAMING JAWBREAKER  |{"(1,PENELOPE,GUINESS)","(66,MARY,TANDY)","(125,ALBERT,NOLTE)","(143,RIVER,DEAN)","(155,IAN,TANDY
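To make the two filter predicates a bit more tangible outside the database, here's a minimal plain-Java sketch. It is not jOOQ and not how PostgreSQL evaluates these operators; it only mirrors their semantics, and it assumes Java 16+ for records. The hypothetical Actor record stands in for ROW(actor_id, first_name, last_name), containsAll mimics ARRAY @> ARRAY, and anyMatch mimics EXISTS over unnest(actors). As in the derived table, the nested collection exists first and is filtered afterwards.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class NestedCollectionFilter {

    // Hypothetical stand-in for ROW(actor_id, first_name, last_name);
    // records compare by value, like the row values inside the array
    record Actor(long id, String firstName, String lastName) {}

    // Like "actors @> ARRAY[...]": every required element must occur in the nested collection
    static boolean containsAll(List<Actor> actors, List<Actor> required) {
        return actors.containsAll(required);
    }

    // Like "EXISTS (SELECT 1 FROM unnest(actors) ... WHERE a = 1)": at least one element matches
    static boolean anyMatch(List<Actor> actors, Predicate<Actor> p) {
        return actors.stream().anyMatch(p);
    }

    // Derived-table pattern: the nested collection is already computed, then filtered,
    // and the surviving titles are sorted like ORDER BY f.title
    static List<String> filterTitles(Map<String, List<Actor>> films, Predicate<List<Actor>> p) {
        return films.entrySet().stream()
                .filter(e -> p.test(e.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Actor penelope = new Actor(1, "PENELOPE", "GUINESS");
        Map<String, List<Actor>> films = Map.of(
                "ACADEMY DINOSAUR", List.of(penelope, new Actor(10, "CHRISTIAN", "GABLE")),
                "ACE GOLDFINGER", List.of(new Actor(19, "BOB", "FAWCETT")),
                "CHEAPER CLYDE", List.of(penelope, new Actor(20, "LUCILLE", "TRACY")));

        // Both predicates keep the same films here
        System.out.println(filterTitles(films, a -> containsAll(a, List.of(penelope))));
        System.out.println(filterTitles(films, a -> anyMatch(a, x -> x.id() == 1)));
    }
}
```

Both lines print [ACADEMY DINOSAUR, CHEAPER CLYDE]. The anyMatch variant only needs the ID, which is why the EXISTS version of the query is easier to write than building a whole ROW to compare against.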

Now, all the results are guaranteed to be films featuring the ACTOR 'PENELOPE GUINESS'. But is there a better solution?

Using ARRAY_AGG instead

In native PostgreSQL, however, I think using ARRAY_AGG would be better (in this case):

SELECT
  f.title,
  ARRAY_AGG(ROW(
    a.actor_id,
    a.first_name,
    a.last_name
  ) ORDER BY a.actor_id) AS actors
FROM film AS f
JOIN film_actor AS fa USING (film_id)
JOIN actor AS a USING (actor_id)
GROUP BY f.title
HAVING bool_or(true) FILTER (WHERE a.actor_id = 1)
ORDER BY f.title

This produces exactly the same result:

title                |actors                                                                                          
---------------------+------------------------------------------------------------------------------------------------
ACADEMY DINOSAUR     |{"(1,PENELOPE,GUINESS)","(10,CHRISTIAN,GABLE)","(20,LUCILLE,TRACY)","(30,SANDRA,PECK)","(40,JOHN
ANACONDA CONFESSIONS |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(22,ELVIS,MARX)","(150,JAYNE,NOLTE)","(164,HUMPHRE
ANGELS LIFE          |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(7,GRACE,MOSTEL)","(47,JULIA,BARRYMORE)","(91,CHRI
BULWORTH COMMANDMENTS|{"(1,PENELOPE,GUINESS)","(65,ANGELA,HUDSON)","(124,SCARLETT,BENING)","(173,ALAN,DREYFUSS)"}     
CHEAPER CLYDE        |{"(1,PENELOPE,GUINESS)","(20,LUCILLE,TRACY)"}                                                   
COLOR PHILADELPHIA   |{"(1,PENELOPE,GUINESS)","(106,GROUCHO,DUNST)","(122,SALMA,NOLTE)","(129,DARYL,CRAWFORD)","(163,C
ELEPHANT TROJAN      |{"(1,PENELOPE,GUINESS)","(24,CAMERON,STREEP)","(37,VAL,BOLGER)","(107,GINA,DEGENERES)","(115,HAR
GLEAMING JAWBREAKER  |{"(1,PENELOPE,GUINESS)","(66,MARY,TANDY)","(125,ALBERT,NOLTE)","(143,RIVER,DEAN)","(155,IAN,TAND

How does it work?

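  • We group by FILM and aggregate the contents of each film into a nested collection.
  • We can now filter the groups using HAVING.
  • BOOL_OR(TRUE) is TRUE as soon as the GROUP is non-empty.
  • FILTER (WHERE a.actor_id = 1) is the filter criterion that we apply within the group.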

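So, the HAVING predicate is TRUE if the group contains at least one ACTOR_ID = 1, and NULL otherwise (the FILTER leaves no rows to aggregate), which has the same effect as FALSE. If you're a purist, wrap the predicate in COALESCE(BOOL_OR(...), FALSE).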
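The three-valued logic behind this HAVING clause can be modelled in plain Java as well. This is only an illustrative sketch, assuming a hypothetical flattened FilmActor row from the film / film_actor / actor join (Java 16+ for records), with Java's null standing in for SQL NULL. It shows why NULL and FALSE both reject a group, and what the COALESCE changes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class BoolOrFilterSketch {

    // Hypothetical flattened row of: film JOIN film_actor JOIN actor
    record FilmActor(String title, long actorId) {}

    // Mimics BOOL_OR(TRUE) FILTER (WHERE cond): TRUE if at least one row passes the filter,
    // NULL (here: Java null) if the filter leaves no input rows to aggregate
    static Boolean boolOrTrueFilter(List<FilmActor> group, Predicate<FilmActor> cond) {
        return group.stream().anyMatch(cond) ? Boolean.TRUE : null;
    }

    // Mimics COALESCE(x, FALSE)
    static boolean coalesceFalse(Boolean b) {
        return b != null && b;
    }

    // GROUP BY title (TreeMap keeps titles sorted, like ORDER BY f.title), then HAVING:
    // only groups whose predicate is TRUE survive, NULL is rejected exactly like FALSE
    static List<String> havingTitles(List<FilmActor> rows, Predicate<FilmActor> cond) {
        Map<String, List<FilmActor>> groups = rows.stream()
                .collect(Collectors.groupingBy(FilmActor::title, TreeMap::new, Collectors.toList()));
        return groups.entrySet().stream()
                .filter(e -> Boolean.TRUE.equals(boolOrTrueFilter(e.getValue(), cond)))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FilmActor> rows = List.of(
                new FilmActor("ACADEMY DINOSAUR", 1), new FilmActor("ACADEMY DINOSAUR", 10),
                new FilmActor("ACE GOLDFINGER", 19), new FilmActor("ACE GOLDFINGER", 85),
                new FilmActor("CHEAPER CLYDE", 1), new FilmActor("CHEAPER CLYDE", 20));

        System.out.println(havingTitles(rows, r -> r.actorId() == 1));

        List<FilmActor> ace = rows.subList(2, 4);
        System.out.println(boolOrTrueFilter(ace, r -> r.actorId() == 1));
        System.out.println(coalesceFalse(boolOrTrueFilter(ace, r -> r.actorId() == 1)));
    }
}
```

Running main prints [ACADEMY DINOSAUR, CHEAPER CLYDE], then null for ACE GOLDFINGER (no row passed the filter), then false once the COALESCE is applied. Either way, ACE GOLDFINGER doesn't make it through the HAVING clause.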

Clever or neat, or both?

Doing this with jOOQ

Here's the jOOQ version, which works on any RDBMS that supports MULTISET_AGG (ARRAY_AGG emulation is still pending):

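import static org.jooq.impl.DSL.*; // multisetAgg(), boolOr(), trueCondition()

// ctx is a DSLContext; FILM_ACTOR is the jOOQ-generated Sakila table,
// and film() / actor() are its generated implicit join paths
ctx.select(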
        FILM_ACTOR.film().TITLE,
        multisetAgg(
            FILM_ACTOR.actor().ACTOR_ID,
            FILM_ACTOR.actor().FIRST_NAME,
            FILM_ACTOR.actor().LAST_NAME))
   .from(FILM_ACTOR)
   .groupBy(FILM_ACTOR.film().TITLE)
   .having(boolOr(trueCondition())
       .filterWhere(FILM_ACTOR.actor().ACTOR_ID.eq(1)))
   .orderBy(FILM_ACTOR.film().TITLE)
   .fetch();

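While the mighty MULTISET value constructor gets most of the fame among jOOQ users, let's not forget that there's also the slightly less powerful, but occasionally really useful MULTISET_AGG aggregate function, which can be used for aggregations or as a window function.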
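The difference between those two uses can be sketched in plain Java. This is a hypothetical illustration of the row semantics only (Java 16+ for records, names invented for the example), not jOOQ's API: as an aggregate (GROUP BY title) the rows collapse into one collection per film, while as a window function (OVER (PARTITION BY title)) every input row is kept and carries the collection of its partition.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AggVsWindowSketch {

    // Hypothetical flattened FILM_ACTOR row
    record FilmActor(String title, long actorId) {}

    // Hypothetical output row of the window variant: the input row plus its partition's collection
    record WithActors(String title, long actorId, List<Long> filmActors) {}

    // Aggregate use, like GROUP BY title: rows collapse into one entry per film
    static Map<String, List<Long>> aggregate(List<FilmActor> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                FilmActor::title,
                Collectors.mapping(FilmActor::actorId, Collectors.toList())));
    }

    // Window use, like OVER (PARTITION BY title): every input row survives
    // and is paired with the collection of its partition
    static List<WithActors> window(List<FilmActor> rows) {
        Map<String, List<Long>> partitions = aggregate(rows);
        return rows.stream()
                .map(r -> new WithActors(r.title(), r.actorId(), partitions.get(r.title())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FilmActor> rows = List.of(
                new FilmActor("ACADEMY DINOSAUR", 1), new FilmActor("ACADEMY DINOSAUR", 10),
                new FilmActor("CHEAPER CLYDE", 1), new FilmActor("CHEAPER CLYDE", 20));
        System.out.println(aggregate(rows).size()); // one entry per film
        System.out.println(window(rows).size());    // one entry per input row
    }
}
```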


Reposted from juejin.im/post/7126037601236025357