SQL query performance tuning - how to make queries faster

Translated from https://mode.com/sql-tutorial/sql-performance-tuning

New here? This lesson is part of the SQL for data analysis tutorial; see the beginning of the tutorial.

 

In the subqueries lesson, we saw that the same result set can sometimes be produced by a faster statement. In this lesson, you will learn to recognize where queries can be optimized, and how to optimize them.

 

The theory behind query run time

 

A database is software running on a computer, and like all software, its speed is bounded by the same "ceiling": the hardware it runs on can only process so much information at once. The way to make a query run faster is to reduce the number of calculations the software, and therefore the hardware, must perform. To reduce the amount of computation, you first need to understand how SQL actually performs calculations. Let's start with a few factors that affect the number of calculations, and therefore query run time:

  • Table size: if your query hits one or more tables with millions of rows or more, performance may suffer.
  • Joins: if your query joins two tables in a way that substantially increases the row count of the result set, your query is likely to be slow. There's an example of this in the subqueries lesson, and a sketch of the effect right after this list.
  • Aggregations: combining multiple rows to produce a result requires more computation than simply retrieving those rows.
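To make the join point concrete, here is a minimal sketch using hypothetical users and events tables (not part of this tutorial's dataset): if 1,000 users each have 500 events, the join below produces 500,000 rows before any further work happens.

SELECT users.name,
       events.event_name
  FROM users    -- hypothetical table: users(id, name)
  JOIN events   -- hypothetical table: events(user_id, event_name)
    ON events.user_id = users.id
-- Each user row matches all of that user's event rows, so the result
-- grows to (users) x (events per user) rows before any filtering or grouping.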

Query run time also depends on some things about the database itself that you can't control:

  • Other users running queries: the more queries running concurrently on a database, the more the database must process at a given moment, and the slower everything runs. Things can get especially bad if someone else is running a query that hits some of the conditions above and consumes a lot of resources.
  • Database software and optimization: this is also something you probably can't control, but if you're familiar with the system you're using, you can work within its limits to make your queries more efficient.

For now, let's ignore what you can't control and focus on what you can.

 

Reduce the number of rows in the table

 

Filtering the data down to just what you need can greatly improve query speed. How you filter depends entirely on the problem you're trying to solve. For example, if your data has a time field, limiting it to a small time window can make your queries run much faster:

SELECT *
  FROM benn.sample_event_table
 WHERE event_date >= '2014-03-01'
   AND event_date <  '2014-04-01'

Remember that you can do your exploratory analysis against a subset of the data first, then remove the restriction and run the final query against the entire data set. The final query might still take a long time to run, but at least the intermediate steps run quickly.
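As a sketch of that workflow, reusing the table above (the one-week window is an arbitrary choice for illustration):

-- Exploratory version: develop the logic against one week of data,
-- then widen or remove the date window for the final run.
SELECT event_date,
       COUNT(*) AS events
  FROM benn.sample_event_table
 WHERE event_date >= '2014-03-01'
   AND event_date <  '2014-03-08'
 GROUP BY 1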

That's why Mode adds a LIMIT clause by default: 100 rows is plenty for deciding the next step of your analysis, and the results come back much faster.
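For example, a plain preview of this shape (which is what Mode's default amounts to) comes back quickly, because the database can stop as soon as it has produced 100 rows:

SELECT *
  FROM benn.sample_event_table
 LIMIT 100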

When you use LIMIT together with an aggregate function, the aggregation runs to completion first, and only then is the result set trimmed to the specified number of rows. That's not the same effect as in the example above, so using LIMIT this way doesn't accomplish anything. In a query like the one below, where the aggregate collapses the results into a single row anyway, LIMIT 100 has no effect, and the query won't run any faster.

SELECT COUNT(*)
  FROM benn.sample_event_table
 LIMIT 100

If you want to limit the data set before the count runs (to speed up the query), put the LIMIT in a subquery:

SELECT COUNT(*)
  FROM (
    SELECT *
      FROM benn.sample_event_table
     LIMIT 100
  ) sub

 Note: using LIMIT this way will completely change your results, so use it to test your query logic, not to produce the real results.

In general, when using subqueries, put the limit in the statement that is executed first: that means the LIMIT goes in the subquery, not in the outer query. Again, the point is only to make the query fast enough for testing your logic, not to produce the final result.

 

Make joins less complicated

 

In a way, this is an extension of the previous recommendation. Just as it's best to limit the data before the main body of the query runs, it's better to reduce table sizes before joining them. Take the example below, which joins a table of college sports team information to a table of player information on the school name field:

SELECT teams.conference AS conference,
       players.school_name,
       COUNT(1) AS players
  FROM benn.college_football_players players
  JOIN benn.college_football_teams teams
    ON teams.school_name = players.school_name
 GROUP BY 1,2

The benn.college_football_players table has 26,298 rows. That means that for every row of the other table, 26,298 rows have to be evaluated for a match. But if you aggregate benn.college_football_players first, you can cut down the number of rows that need to be matched. First, let's look at that aggregation on its own:

SELECT players.school_name,
       COUNT(*) AS players
  FROM benn.college_football_players players
 GROUP BY 1

The query above returns 252 rows. Turning it into a subquery and then joining against it dramatically reduces the matching work in the outer query:

SELECT teams.conference,
       sub.*
  FROM (
        SELECT players.school_name,
               COUNT(*) AS players
          FROM benn.college_football_players players
         GROUP BY 1
       ) sub
  JOIN benn.college_football_teams teams
  ON teams.school_name = sub.school_name 

In this case, you won't notice a big difference, because 30,000 rows is not hard for the database to process. But if the tables held hundreds of millions of rows or more, aggregating before the join would yield a significant improvement. When you do use subqueries like this, make sure your query logic stays sound: get the answer right first, then worry about speed.

 

EXPLAIN

 

You can add EXPLAIN in front of any (valid) query to get a sense of how long it will take. It's not perfectly accurate, but it's a useful tool. Try running the following:

EXPLAIN
SELECT *
  FROM benn.sample_event_table
 WHERE event_date >= '2014-03-01'
   AND event_date <  '2014-04-01'
 LIMIT 100

You'll get output like the following. It's called the query plan, and it shows the order in which the parts of your query will be executed:
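The original screenshot is not reproduced here; as an illustration, a Postgres query plan for a statement of this shape looks roughly like the following (the exact costs, row estimates, and widths will differ on your database):

Limit  (cost=0.00..3.39 rows=100 width=312)
  ->  Seq Scan on sample_event_table  (cost=0.00..20.38 rows=600 width=312)
        Filter: ((event_date >= '2014-03-01') AND (event_date < '2014-04-01'))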

The entry at the bottom of the plan is executed first. So the plan above shows that the WHERE clause limiting the date range runs first, with the database scanning roughly 600 rows (an estimate, not an exact count). Next to each step you can see its cost: the bigger the number, the longer that step will take. These figures are a guide rather than an exact measurement, which points to the right way to use EXPLAIN: run it, rework the most expensive steps, then run it again and check whether the cost went down. Finally, the LIMIT runs last, and its cost is very small.

For more details, refer to the Postgres documentation.
