INSERT SELECT using a UNION statement in the WHERE condition

Hugo Migneron :

I have a query that goes something like this:

INSERT IGNORE INTO `destination_table` (`id`, `field1`, `field2`, `field3`)
SELECT `id`, `field1`, `field2`, `field3`
FROM `source_table`
WHERE `source_table`.`id` IN (
    SELECT DISTINCT `id` FROM `some_table`
    UNION DISTICT SELECT DISTINCT `id` FROM `some_other_table`
);

This does not work: the query hangs indefinitely. Table size is definitely not the problem, since every table has a fairly small number of records (< 100k). The query is fine and quite fast if I run it without the UNION:

INSERT IGNORE INTO `destination_table` (`id`, `field1`, `field2`, `field3`)
SELECT `id`, `field1`, `field2`, `field3`
FROM `source_table`
WHERE `source_table`.`id` IN (
    SELECT DISTINCT `id` FROM `some_table` -- I tried with `some_other_table` too, same result
);

or

INSERT IGNORE INTO `destination_table` (`id`, `field1`, `field2`, `field3`)
SELECT `id`, `field1`, `field2`, `field3`
FROM `source_table`

Both work and are nice and fast (well under a second). So I imagine that the UNION DISTICT SELECT ... part is the culprit here, but I don't know why.

What's wrong with that query, and why does it hang?

I'm using MySQL 5.7, if that makes a difference.

Tim Biegeleisen :

Your first query seems to have a few typos, but I would suggest using EXISTS logic here:

INSERT IGNORE INTO destination_table (id, field1, field2, field3)
SELECT id, field1, field2, field3
FROM source_table t1
WHERE
    EXISTS (SELECT 1 FROM some_table s1 WHERE s1.id = t1.id) OR
    EXISTS (SELECT 1 FROM some_other_table s2 WHERE s2.id = t1.id);

The possible advantage of using EXISTS in this way is that MySQL can stop searching as soon as it finds the first matching id in either of the subqueries on the two tables. You may find that adding an index on the id column of each of the two other tables helps (assuming that id is not already indexed):

CREATE INDEX some_idx_1 ON some_table (id);
CREATE INDEX some_idx_2 ON some_other_table (id);

This should speed up the lookup of the id in the two dependent tables.
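
As a further option (a sketch only, not tested against the asker's schema): MySQL 5.7 generally cannot turn an IN subquery that contains a UNION into a semijoin, so it may re-run the subquery for every row of source_table. Running EXPLAIN on the SELECT part should make this visible as DEPENDENT SUBQUERY / DEPENDENT UNION in the select_type column. Joining against the UNION as a derived table lets the id list be built once instead. The alias ids is just an illustrative name, and the table and column names are those from the question:

```sql
-- Diagnostic: inspect how the original IN (... UNION ...) is executed.
-- A select_type of DEPENDENT SUBQUERY / DEPENDENT UNION means the
-- subquery is evaluated per outer row rather than once.
EXPLAIN
SELECT id, field1, field2, field3
FROM source_table
WHERE id IN (
    SELECT id FROM some_table
    UNION
    SELECT id FROM some_other_table
);

-- Alternative: join against the deduplicated id list.
-- Plain UNION already removes duplicates, so each source row
-- matches at most one row of the derived table.
INSERT IGNORE INTO destination_table (id, field1, field2, field3)
SELECT t1.id, t1.field1, t1.field2, t1.field3
FROM source_table t1
INNER JOIN (
    SELECT id FROM some_table
    UNION
    SELECT id FROM some_other_table
) ids ON ids.id = t1.id;
```

Because UNION (without ALL) is the same as UNION DISTINCT, the extra DISTINCT keywords from the original query are not needed here.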
