How to Write Better SQL Queries

  Set-Based versus Procedural Approaches to Querying

  Implicit in the anti-patterns is the fact that they come down to the difference between a set-based and a procedural approach to building queries.

  The procedural approach to querying is much like programming: you tell the system what needs to be done and how to do it. For example, you query the database by calling one function that in turn calls another, or you use logic containing loops, conditions and user-defined functions (UDFs) to obtain the final result. With this approach you find yourself requesting one subset of the data after another, layer by layer. It is often referred to as step-by-step or row-by-row querying.

  The other approach is set-based: you only specify what is to be done. Your job is to state the conditions and requirements that the result of the query must satisfy. You do not need to care about the internal mechanisms that retrieve the data: the database engine decides which algorithms and processing logic execute the query best.

  Because SQL is set-based, this approach is usually more effective than the procedural one, which also explains why, in some cases, SQL can work faster than application code.
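
  As a rough illustration of the two approaches, the sketch below computes the same per-customer totals twice: once procedurally, with one query per customer and a loop that adds up the rows, and once with a single set-based statement. The orders table, its data and the use of Python's built-in sqlite3 module are assumptions made for the example; they do not come from the article.

```python
import sqlite3

# Hypothetical example data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ann", 10.0), ("bob", 5.0), ("ann", 7.5), ("cid", 2.5), ("bob", 20.0)],
)


def totals_procedural(conn):
    """Procedural style: request one subset after another and loop over rows."""
    totals = {}
    customers = [r[0] for r in conn.execute("SELECT DISTINCT customer FROM orders")]
    for customer in customers:
        total = 0.0
        rows = conn.execute(
            "SELECT amount FROM orders WHERE customer = ?", (customer,)
        )
        for (amount,) in rows:
            total += amount
        totals[customer] = total
    return totals


def totals_set_based(conn):
    """Set-based style: describe the result and let the engine decide how."""
    rows = conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
    return dict(rows)


print(totals_procedural(conn))
print(totals_set_based(conn))
```

  Both functions return the same dictionary, but the procedural version issues one query per customer, while the set-based version leaves grouping and summing to the database engine.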

  The set-based approach to querying is also a skill that the data mining industry expects you to master! You will often need to switch between the two approaches. If you find that one of your queries is procedural, consider whether that part should be rewritten.

  From Query to Execution Plan

  Anti-patterns are not static. They evolve as you grow as an SQL developer, and avoiding them and rewriting queries can be a very difficult task. That is why it often helps to optimize your queries with tools and a more structured approach.

  Thinking about performance requires not only a more structured approach but also a more in-depth one.

  This structured, in-depth approach is mainly based on the query plan. The query is first parsed into a "parse tree", and the query plan then defines precisely which algorithm is used for each operation and how the execution of the operations is coordinated.

  Query Optimization

  When optimizing a query, you may need to examine the plan produced by the optimizer manually. In that case, you analyze the query again by looking at its query plan.

  To get hold of the query plan, use the tools that your database management system provides. Some of the tools you may have at your disposal:

  Some packages include tools that generate a graphical representation of the query plan.

  Other tools provide a textual description of the query plan.

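  Note that if you are working with PostgreSQL, you can distinguish between the variants of EXPLAIN. Plain EXPLAIN only gives you a description of how the planner intends to execute the query, without running it. EXPLAIN ANALYZE executes the query and returns a report that sets the estimated plan against what actually happened. In general, an actual execution plan is obtained by really executing the query, whereas an estimated execution plan is worked out without executing it. Logically, the actual execution plan is the more useful of the two, because it contains additional details and statistics about what really happened while the query ran.

  Next you will learn more about EXPLAIN and ANALYZE, and how to use these two commands to understand your query plans and query performance better. The examples use two tables: one_million and half_million.

  You can use EXPLAIN to retrieve the current information about the one_million table: put it directly in front of the query you want to run, and the query plan is returned: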

  

    EXPLAIN
    SELECT *
    FROM one_million;

    QUERY PLAN
    -------------------------------------------------
     Seq Scan on one_million
       (cost=0.00..18584.82 rows=1025082 width=36)
    (1 row)

 

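  In the example above, the estimated cost of the query is 0.00..18584.82 (startup cost up to total cost), the estimated number of rows is 1025082, and the estimated average row width is 36 bytes.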
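
  To make the meaning of these fields concrete, here is a minimal sketch that pulls the four estimates out of a plan line. The PlanEstimate type and the parse_estimate helper are hypothetical names introduced for this illustration; they are not part of PostgreSQL or of any client library.

```python
import re
from typing import NamedTuple


class PlanEstimate(NamedTuple):
    startup_cost: float  # estimated cost before the first row can be returned
    total_cost: float    # estimated cost to return all rows
    rows: int            # estimated number of rows
    width: int           # estimated average row width in bytes


# Matches the "(cost=A..B rows=N width=W)" part of a plan line.
_ESTIMATE = re.compile(r"cost=(\d+\.\d+)\.\.(\d+\.\d+) rows=(\d+) width=(\d+)")


def parse_estimate(plan_line: str) -> PlanEstimate:
    """Extract the planner's estimates from one line of EXPLAIN output."""
    match = _ESTIMATE.search(plan_line)
    if match is None:
        raise ValueError(f"no cost estimate found in: {plan_line!r}")
    startup, total, rows, width = match.groups()
    return PlanEstimate(float(startup), float(total), int(rows), int(width))


line = "Seq Scan on one_million  (cost=0.00..18584.82 rows=1025082 width=36)"
estimate = parse_estimate(line)
print(estimate)
```

  The cost values are expressed in the planner's own arbitrary units, not in milliseconds, so they are mainly useful for comparing alternative plans for the same query.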

  You can also use ANALYZE to update the table statistics that the planner relies on.

  

    ANALYZE one_million;

    EXPLAIN
    SELECT *
    FROM one_million;

    QUERY PLAN
    -------------------------------------------------
     Seq Scan on one_million
       (cost=0.00..18334.00 rows=1000000 width=37)
    (1 row)

 

  Besides EXPLAIN and ANALYZE, you can also use EXPLAIN ANALYZE to retrieve the actual execution time:

  

    EXPLAIN ANALYZE
    SELECT *
    FROM one_million;

    QUERY PLAN
    ---------------------------------------------------
     Seq Scan on one_million
       (cost=0.00..18334.00 rows=1000000 width=37)
       (actual time=0.015..1207.019 rows=1000000 loops=1)
     Total runtime: 2320.146 ms
    (2 rows)

 

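  The downside of EXPLAIN ANALYZE is that the query is actually executed, so use it with care!

  So far, every algorithm we have seen is a sequential scan, or full table scan: a way of scanning a table in which every row is read in sequential (serial) order and the relevant columns are checked against the query's conditions. In terms of performance, a sequential scan is not the best execution plan, because the whole table has to be read. On the other hand, sequential reads are fairly fast even on slow disks.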
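
  Conceptually, a sequential scan is just a loop over every row with a filter applied to each one. The sketch below models that on an in-memory list of dictionaries; seq_scan and the sample table are hypothetical and say nothing about how PostgreSQL implements the operation internally.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


def seq_scan(table: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Read every row in storage order and yield the ones that satisfy the predicate."""
    for row in table:
        if predicate(row):
            yield row


# Hypothetical ten-row stand-in for a real table.
table = [{"counter": i, "label": f"row {i}"} for i in range(10)]

matches = list(seq_scan(table, lambda row: row["counter"] % 4 == 0))
print([row["counter"] for row in matches])
```

  The work grows linearly with the size of the table, regardless of how few rows match, which is why a sequential scan becomes expensive on large tables with selective conditions.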

  Here are some examples of other algorithms:

  

    EXPLAIN ANALYZE
    SELECT *
    FROM one_million JOIN half_million
    ON (one_million.counter=half_million.counter);

    QUERY PLAN
    -------------------------------------------------------------
     Hash Join
       (cost=15417.00..68831.00 rows=500000 width=42)
       (actual time=1241.471..5912.553 rows=500000 loops=1)
       Hash Cond: (one_million.counter = half_million.counter)
       ->  Seq Scan on one_million
             (cost=0.00..18334.00 rows=1000000 width=37)
             (actual time=0.007..1254.027 rows=1000000 loops=1)
       ->  Hash
             (cost=7213.00..7213.00 rows=500000 width=5)
             (actual time=1241.251..1241.251 rows=500000 loops=1)
             Buckets: 4096  Batches: 16  Memory Usage: 770kB
             ->  Seq Scan on half_million
                   (cost=0.00..7213.00 rows=500000 width=5)
                   (actual time=0.008..601.128 rows=500000 loops=1)
     Total runtime: 6468.337 ms
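
  The plan above builds a hash table from half_million and then probes it with the rows of one_million. A simplified in-memory sketch of that idea follows; hash_join and the tiny sample tables are hypothetical, and the real executor additionally splits the work into batches when the hash table does not fit in memory.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Join two lists of dicts on `key` with a build phase and a probe phase."""
    # Build phase: hash the (ideally smaller) input on the join key.
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[row[key]].append(row)

    # Probe phase: look up each row of the other input in the hash table.
    joined = []
    for row in probe_rows:
        for match in buckets.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical miniature versions of the two tables.
one_million = [{"counter": i, "label": f"row {i}"} for i in range(1, 11)]
half_million = [{"counter": i} for i in range(1, 11, 2)]

result = hash_join(half_million, one_million, "counter")
print([row["counter"] for row in result])
```

  Each input is read once, so with well-distributed keys the expected cost is roughly proportional to the sum of the two input sizes.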

 

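  We can see that the query optimizer chose a Hash Join. Remember this operation, because you will need it to estimate the time complexity of the query. Note that there is no index on half_million.counter in the example above; the next example adds one: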

  

    CREATE INDEX ON half_million(counter);

    EXPLAIN ANALYZE
    SELECT *
    FROM one_million JOIN half_million
    ON (one_million.counter=half_million.counter);

    QUERY PLAN
    --------------------------------------------------------------
     Merge Join
       (cost=4.12..37650.65 rows=500000 width=42)
       (actual time=0.033..3272.940 rows=500000 loops=1)
       Merge Cond: (one_million.counter = half_million.counter)
       ->  Index Scan using one_million_counter_idx on one_million
             (cost=0.00..32129.34 rows=1000000 width=37)
             (actual time=0.011..694.466 rows=500001 loops=1)
       ->  Index Scan using half_million_counter_idx on half_million
             (cost=0.00..14120.29 rows=500000 width=5)
             (actual time=0.010..683.674 rows=500000 loops=1)
     Total runtime: 3833.310 ms
    (5 rows)

 

  With the index in place, the query optimizer has now decided to use a Merge Join that reads both tables through index scans.
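
  A merge join relies on both inputs arriving sorted on the join key, which is what the two index scans provide. The sketch below shows the core loop under the simplifying assumption that keys are unique on each side; merge_join and the sample data are hypothetical.

```python
def merge_join(left, right, key):
    """Join two lists of dicts that are already sorted by `key`.

    Simplifying assumption: each key value occurs at most once per input.
    """
    i = j = 0
    joined = []
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1  # left side is behind, advance it
        elif left_key > right_key:
            j += 1  # right side is behind, advance it
        else:
            joined.append({**left[i], **right[j]})
            i += 1
            j += 1
    return joined


# Hypothetical miniature versions of the two tables, sorted by counter.
one_million = [{"counter": i, "label": f"row {i}"} for i in range(1, 11)]
half_million = [{"counter": i} for i in range(1, 11, 2)]

result = merge_join(one_million, half_million, "counter")
print([row["counter"] for row in result])
```

  The loop stops as soon as either input is exhausted. That may be why the plan above reports only 500001 rows actually read from one_million, although this reading of the output is an interpretation rather than something the article states.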

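  Note the difference between an index scan and a full table scan (sequential scan): the latter, also called a "table scan", reads every row of the table to find the matching results, whereas the former walks the index pages to locate only the rows that match.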
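
  The switch from a full scan to an index lookup can be observed in a self-contained way with SQLite, which ships with Python. Note that this is a stand-in and not PostgreSQL: SQLite uses EXPLAIN QUERY PLAN, and the wording of its output differs from the plans shown in this article. The table contents and the plan_details helper are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE half_million (counter INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO half_million VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1000)),  # scaled-down sample data
)


def plan_details(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


query = "SELECT * FROM half_million WHERE counter = ?"

before = plan_details(conn, query, (42,))  # no index yet: full table scan
conn.execute("CREATE INDEX half_million_counter_idx ON half_million(counter)")
after = plan_details(conn, query, (42,))   # index available: index search

print(before)
print(after)
```

  Before the index exists the plan reports a scan of the whole table; afterwards it reports a search using half_million_counter_idx. The exact text varies between SQLite versions.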


Origin blog.csdn.net/qianfeng_dashuju/article/details/93747466