[MySQL performance optimization] Improving the poor efficiency of MySQL ORDER BY RAND()

Original link: http://click.aliyun.com/m/9153/

Text:

    Recently I needed to look into how to pull random rows out of MySQL. For example, to randomly extract one record from a table (here, `content`), the usual way to write it is:
SELECT * FROM content ORDER BY RAND() LIMIT 1;
[Piaoyi's note: it takes 0.3745 seconds to query the 30,000-row table (the same table is used below); the MySQL slow query log shows that ORDER BY RAND() scanned the whole table twice!] I then checked the official MySQL manual. The note on RAND() roughly says that RAND() should not be used in an ORDER BY clause, because that causes the column to be evaluated multiple times; in MySQL 3.23, however, ORDER BY RAND() can still be used to fetch rows at random. Actual testing showed that its efficiency really is very low: on a table with more than 150,000 records, it took more than 8 seconds to fetch 5 rows. The manual also notes that RAND() is executed repeatedly in the ORDER BY clause, which is naturally inefficient; in effect every row gets a random value and the whole result is then sorted. A Google search turned up the JOIN approach: compute MAX(id) * RAND() in a derived table to pick a random id, then join against it.
SELECT *
FROM `content` AS t1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM `content`)) AS id) AS t2
WHERE t1.id >= t2.id
ORDER BY t1.id ASC LIMIT 1;

[The query takes 0.0008 seconds; Piaoyi thinks this statement is worth recommending!!] But with LIMIT 5 this produces 5 consecutive records. The only workaround is to query one record at a time and run the query 5 times (this is exactly what the second scheme in the summary at the end does). Even so, it is worth it, because each query takes less than 0.01 seconds on a table with 150,000 rows. There is another method:
SELECT * FROM `content` AS a JOIN (SELECT MAX(ID) AS ID FROM `content`) AS b ON (a.ID >= FLOOR(b.ID * RAND())) LIMIT 5;

The method above only guarantees randomness within a certain range, presumably because RAND() sits in the join condition and is re-evaluated row by row; the query takes 0.4265 seconds and is not recommended. The following statement is one someone used on the MySQL forum:
SELECT *
FROM `content`
WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `content` )
ORDER BY id LIMIT 1;
 
[The query took 1.2254 seconds; Piaoyi strongly recommends against it! Measured on the 30,000-row table, this statement ends up scanning about 5 million rows, presumably because the subquery contains RAND() and so cannot be cached, which makes MySQL re-evaluate it over and over for the outer query!!] That is still far behind the statement above. Something felt wrong, so I rewrote the statement:
SELECT * FROM `content`
WHERE id >= (SELECT floor(RAND() * (SELECT MAX(id) FROM `content`))) 
ORDER BY id LIMIT 1;

[The query takes 0.0012 seconds] This improves the efficiency again; the query now finishes in well under 0.01 seconds. The final refinement is to bring MIN(id) into the calculation. When I first tested without the MIN(id) adjustment, the result kept returning the first few rows of the table about half of the time: if the smallest id is far above 1 (for example because early rows were deleted), FLOOR(RAND() * MAX(id)) often falls below it, and the condition id >= that-value with ORDER BY id LIMIT 1 then always returns the very first row (a small PHP illustration follows the query below).
The full query is:
SELECT * FROM `content`
WHERE id >= (SELECT floor( RAND() * ((SELECT MAX(id) FROM `content`)-(SELECT MIN(id) FROM `content`)) + ( SELECT MIN(id) FROM `content`))) 
ORDER BY id LIMIT 1;
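To see why the MIN(id) term matters, here is a small stand-alone PHP sketch (illustration only; the id range 15000 to 30000 is made up to mimic a table whose early rows were deleted, and mt_rand() stands in for SQL's RAND()):

$min_id = 15000;                       // hypothetical smallest surviving id
$max_id = 30000;                       // hypothetical largest id
$below = 0;
for ($i = 0; $i < 10000; $i++) {
    $r = mt_rand() / mt_getrandmax();  // uniform value in [0, 1], like RAND()
    if (floor($r * $max_id) < $min_id) {
        $below++;                      // id >= (value below MIN(id)) matches every row, so LIMIT 1 returns the first one
    }
}
echo "$below of 10000 draws fell below MIN(id)\n";
// Roughly half of the draws land below MIN(id), so FLOOR(RAND() * MAX(id)) keeps
// picking the first row; FLOOR(RAND() * (MAX(id) - MIN(id)) + MIN(id)) never does.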

[The full query above takes 0.0012 seconds] The same statement can also be written in the JOIN form:
SELECT *
FROM `content` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `content`) - (SELECT MIN(id) FROM `content`)) + (SELECT MIN(id) FROM `content`)) AS id) AS t2
WHERE t1.id >= t2.id
ORDER BY t1.id LIMIT 1;
  
[The query takes 0.0008 seconds] Finally, each of the two statements was run 10 times from PHP:
the former (the WHERE-subquery version) took 0.147433 seconds in total, and
the latter (the JOIN version) took 0.015130 seconds.
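A minimal sketch of how such a comparison could be timed (this is not the author's original script; it assumes an open mysql_connect() handle in $conn and simply reuses the two statements shown above):

// Hypothetical helper: run one SQL statement 10 times and return the total elapsed seconds.
function time_ten_runs($sql, $conn) {
    $t = microtime(true);
    for ($i = 0; $i < 10; $i++) {
        $result = mysql_query($sql, $conn);
        mysql_fetch_array($result);    // fetch the single random row
    }
    return microtime(true) - $t;
}

$where_sql = "SELECT * FROM `content`
    WHERE id >= (SELECT FLOOR(RAND() * ((SELECT MAX(id) FROM `content`) - (SELECT MIN(id) FROM `content`)) + (SELECT MIN(id) FROM `content`)))
    ORDER BY id LIMIT 1";
$join_sql = "SELECT * FROM `content` AS t1
    JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `content`) - (SELECT MIN(id) FROM `content`)) + (SELECT MIN(id) FROM `content`)) AS id) AS t2
    WHERE t1.id >= t2.id ORDER BY t1.id LIMIT 1";

echo "WHERE version: " . time_ten_runs($where_sql, $conn) . " seconds for 10 runs\n";
echo "JOIN version:  " . time_ten_runs($join_sql, $conn) . " seconds for 10 runs\n";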
It seems that the JOIN form is much more efficient than calling the function directly in a WHERE subquery. (via)
=========================================
[Finally, Piaoyi's summary]:
The first scheme, the original ORDER BY RAND() method:
$sql = "SELECT * FROM content ORDER BY rand() LIMIT 12";
$result = mysql_query($sql, $conn);
$n = 1;
$rnds = '';
while ($row = mysql_fetch_array($result)) {
    // Build a numbered list of links from the 12 random rows.
    $rnds = $rnds . $n . ".<a href='show" . $row['id'] . "-" . strtolower(trim($row['title'])) . "'>" . $row['title'] . "</a><br />\n";
    $n++;
}

Fetching 12 random records from the 30,000-row table takes 0.125 seconds, and the efficiency keeps dropping as the amount of data grows. The second scheme, the improved JOIN method:
$rnds = '';
for ($n = 1; $n <= 12; $n++) {
    // One single-row query per random record, using the improved JOIN statement from the first part of the article.
    $sql = "SELECT * FROM `content` AS t1
        JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `content`) - (SELECT MIN(id) FROM `content`)) + (SELECT MIN(id) FROM `content`)) AS id) AS t2
        WHERE t1.id >= t2.id ORDER BY t1.id LIMIT 1";
    $result = mysql_query($sql, $conn);
    $yi = mysql_fetch_array($result);
    $rnds = $rnds . $n . ".<a href='show" . $yi['id'] . "-" . strtolower(trim($yi['title'])) . "'>" . $yi['title'] . "</a><br />\n";
}

Fetching 12 random records from the 30,000-row table takes 0.004 seconds; the efficiency improves greatly, roughly 30 times better than the first scheme. Disadvantage: multiple SELECT queries, so higher IO overhead. The third scheme: generate the random id list in PHP first and fetch the rows with IN (Piaoyi recommends this usage: low IO overhead and the fastest speed):
$sql = "SELECT MAX(id), MIN(id) FROM content";
$result = mysql_query($sql, $conn);
$yi = mysql_fetch_array($result);
$idmax = $yi[0];
$idmin = $yi[1];
$idlist = '';
// Draw 20 random ids between MIN(id) and MAX(id); the extras cover ids that may have been deleted.
for ($i = 1; $i <= 20; $i++) {
    if ($i == 1) { $idlist = mt_rand($idmin, $idmax); }
    else { $idlist = $idlist . ',' . mt_rand($idmin, $idmax); }
}
// FIELD() needs the column name as its first argument, so prepend "id" to the random list.
$idlist2 = 'id,' . $idlist;

$sql = "select * from content where id in ($idlist) order by field($idlist2) LIMIT 0,12";
$result = mysql_query($sql, $conn);
$n = 1;
$rnds = '';
while ($row = mysql_fetch_array($result)) {
    $rnds = $rnds . $n . ".<a href='show" . $row['id'] . "-" . strtolower(trim($row['title'])) . "'>" . $row['title'] . "</a><br />\n";
    $n++;
}

Fetching 12 random records from the 30,000-row table takes 0.001 seconds, roughly 4 times faster than the second scheme and about 120 times faster than the first. Note that ORDER BY FIELD($idlist2) is used so the rows keep the random order of the generated ids; without it the rows of an IN lookup effectively come back sorted by id (a short illustration follows the test-method snippet below). Disadvantage: some of the generated ids may have been deleted, which is why more ids are drawn than the number of rows actually needed. Test method:
$t = microtime(true);
//Execute the statement
echo microtime(true) - $t;
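To make the FIELD() trick in the third scheme concrete, here is a hypothetical illustration of the query it ends up sending (the ids 17, 3 and 42 are made up):

$idlist = '17,3,42';                   // three random ids, as the scheme-3 loop might produce them
$idlist2 = 'id,' . $idlist;            // FIELD() takes the column name as its first argument
$sql = "select * from content where id in ($idlist) order by field($idlist2) LIMIT 0,12";
echo $sql . "\n";
// Prints: select * from content where id in (17,3,42) order by field(id,17,3,42) LIMIT 0,12
// FIELD(id, 17, 3, 42) returns the position of each row's id in the list (1, 2 or 3),
// so the rows come back in the order 17, 3, 42 instead of ascending id order.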

 
