How to speed up sql with GROUP BY and WHERE?

Does :

I know that using index could optimize sql with GROUP BY or where condition. But how to optimize sql with GROUP BY and WHERE ? Please see my case.

I have a table to store traffic data and use it to draw a network traffic topology. Below is the table structure :

DROP TABLE IF EXISTS `data`;
CREATE TABLE `data` (
  `sip` varbinary(16) DEFAULT NULL,
  `dip` varbinary(16) DEFAULT NULL,
  `app` char(96) DEFAULT NULL,
  `up` bigint(20) DEFAULT NULL,
  `down` bigint(20) DEFAULT NULL,
  `dtime` datetime DEFAULT CURRENT_TIMESTAMP,
  KEY `dtime` (`dtime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

And create index on column dtime.

The simplified SQL is

SELECT
    INET6_NTOA(sip),
    INET6_NTOA(dip),
    app,
    sum(up) AS up,
    sum(down) AS down
FROM
    `data`
WHERE
    `data`.dtime > FROM_UNIXTIME(1583031879)
AND `data`.dtime < FROM_UNIXTIME(1585537477)
GROUP BY
    sip,
    dip,
    app

The table can store about 10,000,000 records for one month and our requirement is to draw a network traffic topology with granularity for Last 30 days, Last 24 hours, Last 1 hour.

Obviously the index dtime is helpful to query data for last 1 hour or last 24 hours. But when query for last 30 days, it's a full table scan.

In extreme case, quering for 24 hours costs 5s, which is acceptable, while quering for 30 days costs 60s+ which is hard to accept.

Create index for sip,dip,app ? Seems not helpful, for I have to filter data by dtime at first. I googled the various indexing solutions, which may not suit for me.

Any ideas to speed up my SQL ? Or Any ideas to improve table design ? Thank you very much.

Rick James :

Simply put, a "range" in the WHERE prevents having the index being useful for the GROUP BY or ORDER BY. You could add INDEX(sip, dip, app) to give the Optimizer a choice.

Every table needs a PRIMARY KEY. Perhaps it could be PRIMARY KEY(sip, dip, app)? Or just (sip, dip)? Note, making this the PK would be better than a simple INDEX.

But the real performance gain for those reports would be to build and maintain a "summary table" with a granularity of 1 hour. One hour would let you efficiently get hour, day, month, etc. See http://mysql.rjweb.org/doc.php/summarytables And, because the summary table would be so much smaller, it won't matter when it needs a table scan.

Are the VARBINARY(16) some form hash? Or a pair of IP addresses? If they are fixed-length, consider CHAR(16).

How big can up and down be? Perhaps you don't need 8-byte BIGINT? (Saving space would help performance some.

MySQL uses only one index per SELECT. The Optimizer looks at each index (including the PRIMARY KEY) to estimate which one would be best. In your case, it would weigh the choice between using INDEX(dtime) for filtering versus using INDEX(sip,dip,app) to avoid sorting.

If the WHERE clause filters down to very few rows; the dtime index would be better, and the Optimizer would probably use it. And vice versa.

Guess you like

Origin http://10.200.1.11:23101/article/api/json?id=385914&siteId=1