A short summary of SQL clause execution order and joins

1. Cartesian product

As the name suggests, the concept is named after Descartes (Latin: Cartesius). In mathematics, the Cartesian product (also known as the direct product) of two sets X and Y, written X × Y, is the set of all possible ordered pairs whose first element is a member of X and whose second element is a member of Y.

Assuming set A = {a, b} and set B = {0, 1, 2}, the Cartesian product of the two sets is {(a,0), (a,1), (a,2), (b,0), (b,1), (b,2)}. The definition extends to more than two sets. A similar example: if A is the set of students in a school and B is the set of all courses in the school, then the Cartesian product of A and B represents all possible course selections.
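The example with A and B can be reproduced in a few lines. This is only a small sketch in Python (the language is my choice for illustration, not something the sets themselves imply), using the standard `itertools.product` function.

```python
from itertools import product

A = ["a", "b"]
B = [0, 1, 2]

# itertools.product yields every ordered pair (x, y) with x taken from A and y from B.
pairs = list(product(A, B))
print(pairs)

# The number of pairs is the product of the two sizes: 2 * 3 = 6.
print(len(pairs) == len(A) * len(B))
```

The same multiplication rule is what makes a cross join of two large tables so expensive: the row counts multiply.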


2. Join types

A cross join is a Cartesian product: the number of rows in the result is the number of rows in one table multiplied by the number of rows in the other.
An inner join returns only the rows where the join columns of the two tables match.
A left join returns every row of the first table; where the second table has no match, the columns from the second table are returned as NULL.
A right join returns every row of the second table; where the first table has no match, the columns from the first table are returned as NULL.
A full join returns the rows of the left join plus the rows of the right join, with matched rows appearing once.
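To make the join types concrete, here is a minimal sketch that runs them against an in-memory SQLite database through Python's `sqlite3` module. SQLite stands in for SQL Server here, and the `student` and `enrollment` tables and their rows are hypothetical sample data. Because SQLite only gained RIGHT and FULL joins in version 3.39, the right join is imitated by swapping the two tables in a left join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id INTEGER, course TEXT);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO enrollment VALUES (1, 'Math'), (1, 'Art'), (2, 'Math'), (9, 'Music');
""")

# Cross join: every student paired with every enrollment row (3 * 4 rows).
cross_rows = conn.execute(
    "SELECT s.name, e.course FROM student AS s CROSS JOIN enrollment AS e"
).fetchall()

# Inner join: only the pairs whose join columns match.
inner_rows = conn.execute(
    "SELECT s.name, e.course FROM student AS s "
    "INNER JOIN enrollment AS e ON s.id = e.student_id"
).fetchall()

# Left join: every student is kept; a student with no match gets NULL (None) for course.
left_rows = conn.execute(
    "SELECT s.name, e.course FROM student AS s "
    "LEFT JOIN enrollment AS e ON s.id = e.student_id"
).fetchall()

# Right join, imitated by swapping the tables: every enrollment row is kept.
right_rows = conn.execute(
    "SELECT s.name, e.course FROM enrollment AS e "
    "LEFT JOIN student AS s ON s.id = e.student_id"
).fetchall()

print(len(cross_rows), len(inner_rows), len(left_rows), len(right_rows))
```

Cid has no enrollment, so he appears only in the left join; the Music row belongs to no known student, so it appears only in the imitated right join.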

3. When performing various types of joins (cross, left, right, full, inner) on two tables, it is necessary to construct a Cartesian product.

This is hard to believe at first. If two very large tables are joined, does SQL really build the full Cartesian product first? Isn't the ON condition applied beforehand? How much data would that be?

4. Check MSDN to understand the execution order of a whole SQL statement.

http://msdn.microsoft.com/en-us/library/ms189499(v=SQL.100).aspx

Processing Order of the SELECT statement
The following steps show the processing order for a SELECT statement.

1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE or WITH ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP

That is to say, the ON filter is applied first and then the join is performed, which would avoid producing the huge Cartesian product of all the rows in two large tables (but see the 2011/11/21 note further down, which revises this reading).

When these steps are executed, each step produces a virtual table that is used as input for the next step. These virtual tables are not available to callers (client applications or external queries). Only the table generated in the last step is returned to the caller.

If a clause is not specified in the query, the corresponding step will be skipped.
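The processing steps listed above can be imitated with plain Python lists, where each variable plays the role of one virtual table. This is only a sketch of the logical order, not of how SQL Server physically executes a query; the `customers` and `orders` data and the query in the comment are hypothetical.

```python
from itertools import product

# Hypothetical sample data.
customers = [
    {"cust": "ann", "active": 1},
    {"cust": "bob", "active": 1},
    {"cust": "cid", "active": 0},
]
orders = [
    {"id": 1, "cust": "ann", "amount": 50},
    {"id": 2, "cust": "ann", "amount": 70},
    {"id": 3, "cust": "bob", "amount": 20},
    {"id": 4, "cust": "cid", "amount": 90},
]

# Query being imitated:
#   SELECT TOP 1 c.cust, SUM(o.amount) AS total
#   FROM customers AS c INNER JOIN orders AS o ON c.cust = o.cust
#   WHERE c.active = 1
#   GROUP BY c.cust
#   HAVING SUM(o.amount) > 10
#   ORDER BY total DESC

vt1 = list(product(customers, orders))                    # 1. FROM: Cartesian product
vt2 = [(c, o) for c, o in vt1 if c["cust"] == o["cust"]]  # 2. ON
vt3 = vt2                                                 # 3. JOIN: an inner join adds no outer rows
vt4 = [(c, o) for c, o in vt3 if c["active"] == 1]        # 4. WHERE

groups = {}                                               # 5. GROUP BY
for c, o in vt4:
    groups.setdefault(c["cust"], []).append(o["amount"])

# 6. WITH CUBE / WITH ROLLUP is not in the query, so the step is skipped.
vt7 = {k: v for k, v in groups.items() if sum(v) > 10}    # 7. HAVING
vt8 = [(k, sum(v)) for k, v in vt7.items()]               # 8. SELECT
# 9. DISTINCT is not in the query, so the step is skipped.
vt10 = sorted(vt8, key=lambda row: row[1], reverse=True)  # 10. ORDER BY
result = vt10[:1]                                         # 11. TOP

print(result)
```

Only `result` would be returned to the caller; the intermediate `vt` lists correspond to the virtual tables that stay internal.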

 

The book "Inside Microsoft SQL Server 2008: T-SQL Querying" gives an illustration of the SQL execution order (the figure is not reproduced here).

 

5. Is it more or less efficient to move the extra filter conditions from ON into WHERE?

select * from table1 as a
inner join table2 as b on a.id = b.id and a.status = 1;

select * from table1 as a
inner join table2 as b on a.id = b.id
where a.status = 1;

MSDN makes this clear: http://msdn.microsoft.com/en-us/library/ms189499(v=SQL.100).aspx

There can be predicates that involve only one of the joined tables in the ON clause. Such predicates also can be in the WHERE clause in the query. Although the placement of such predicates does not make a difference for INNER joins, they might cause a different result when OUTER joins are involved. This is because the predicates in the ON clause are applied to the table before the join, whereas the WHERE clause is semantically applied to the result of the join.

 

In other words, for an inner join, putting the predicate in ON or in WHERE produces the same result, but MSDN does not say which is more efficient. For an outer join (left or right) there is a difference, because ON takes effect first and has already filtered part of the data, while WHERE takes effect later.
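The difference is easy to see on a tiny data set. The sketch below reuses the `table1` and `table2` names and the `a.status = 1` predicate from the queries above, but runs them on an in-memory SQLite database through Python's `sqlite3` module rather than on SQL Server; the `val` column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO table1 VALUES (1, 1), (2, 0), (3, 1);
    INSERT INTO table2 VALUES (1, 'x'), (2, 'y');
""")

def rows(sql):
    # Run a query and return all of its rows as a list of tuples.
    return conn.execute(sql).fetchall()

# INNER JOIN: the predicate gives the same rows in ON and in WHERE.
inner_on = rows("SELECT a.id, b.val FROM table1 AS a "
                "INNER JOIN table2 AS b ON a.id = b.id AND a.status = 1 ORDER BY a.id")
inner_where = rows("SELECT a.id, b.val FROM table1 AS a "
                   "INNER JOIN table2 AS b ON a.id = b.id WHERE a.status = 1 ORDER BY a.id")

# LEFT JOIN: in ON the predicate only decides which rows match,
# so every row of table1 still comes back; in WHERE it removes rows from the result.
left_on = rows("SELECT a.id, a.status, b.val FROM table1 AS a "
               "LEFT JOIN table2 AS b ON a.id = b.id AND a.status = 1 ORDER BY a.id")
left_where = rows("SELECT a.id, a.status, b.val FROM table1 AS a "
                  "LEFT JOIN table2 AS b ON a.id = b.id WHERE a.status = 1 ORDER BY a.id")

print(inner_on == inner_where)
print(left_on)
print(left_where)
```

With the predicate in ON, the row with `status = 0` is still returned (with NULL for `val`); with the predicate in WHERE, it is gone.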

To sum up, my first impression is that putting the condition in ON should be more efficient, because ON is processed before WHERE. This is only a guess based on the processing order.

 

I have heard that the actual behavior can be checked through the SQL query plan, and I will study that tomorrow. Corrections and criticism from experts are welcome.
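As a rough idea of what comparing query plans looks like, here is a sketch that uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module. This is an assumption-laden stand-in: SQL Server has its own tools for this (for example `SET SHOWPLAN_TEXT ON` or the graphical execution plan), its optimizer is different, and the `val` column is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, val TEXT);
""")

def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite-specific; the last column of each
    # returned row is the human-readable description of one plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_on = plan("SELECT * FROM table1 AS a "
               "INNER JOIN table2 AS b ON a.id = b.id AND a.status = 1")
plan_where = plan("SELECT * FROM table1 AS a "
                  "INNER JOIN table2 AS b ON a.id = b.id WHERE a.status = 1")

# Print both plans side by side so they can be compared by eye.
for step in plan_on:
    print("ON    :", step)
for step in plan_where:
    print("WHERE :", step)
```

The point of such a comparison is that the plan, not the position of the predicate in the query text, shows what the engine actually does.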

 

********************************************************************************************************

2011/11/21 Update

I just saw that the description of joins in the book "Inside Microsoft SQL Server 2008: T-SQL Querying" is not the same as what I understood earlier.

Itzik says in the book that the Cartesian product comes first, and then the ON filter is applied. If the join is an inner join, processing simply continues. If the join is a left join, the rows of the left (main) table that were filtered out by ON are added back. Only then is the filter in WHERE applied.

ON is not the final filter, because a left join may add rows back afterwards; WHERE is the final filter.

This difference between ON and WHERE exists only for outer joins (left, right). For an inner join it makes no difference where the condition is written, because WHERE comes directly after ON and there are no other steps in between.
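The steps described in the book can be sketched in plain Python: Cartesian product, ON filter, adding back the left rows, then WHERE. This is only an illustration of the logical model under my own assumptions, not of how SQL Server executes a join; the `left_join` helper, the `val` column and the sample rows are hypothetical.

```python
from itertools import product

# Hypothetical sample data.
table1 = [{"id": 1, "status": 1}, {"id": 2, "status": 0}, {"id": 3, "status": 1}]
table2 = [{"id": 1, "val": "x"}, {"id": 2, "val": "y"}]

def left_join(left, right, on, where):
    vt1 = list(product(left, right))            # step 1: Cartesian product
    vt2 = [(a, b) for a, b in vt1 if on(a, b)]  # step 2: ON filter
    # Step 3: add back every left row that ON removed entirely,
    # with None (NULL) standing in for the right-hand side.
    outer = [(a, None) for a in left if not any(a is m for m, _ in vt2)]
    vt3 = vt2 + outer
    return [(a, b) for a, b in vt3 if where(a, b)]  # step 4: WHERE, the final filter

def summary(result):
    # Reduce each joined row to (left id, right val) for easy reading.
    return [(a["id"], b["val"] if b else None) for a, b in result]

# Condition in ON: rows with status = 0 lose their match but are added back.
cond_in_on = summary(left_join(
    table1, table2,
    on=lambda a, b: a["id"] == b["id"] and a["status"] == 1,
    where=lambda a, b: True,
))

# Condition in WHERE: rows with status = 0 are removed after the add-back step.
cond_in_where = summary(left_join(
    table1, table2,
    on=lambda a, b: a["id"] == b["id"],
    where=lambda a, b: a["status"] == 1,
))

print(cond_in_on)
print(cond_in_where)
```

For an inner join, step 3 would simply be absent, which is why the placement of the condition no longer changes the result.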

********************************************************************************************************

References:

SELECT (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms189499(v=SQL.100).aspx

FROM (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms177634(v=SQL.100).aspx

Various stages in SQL Server query processing (SQL execution order)
http://www.cnblogs.com/chinabc/articles/1597198.html

Whether it is more efficient to put the condition in ON or WHERE for an INNER JOIN
http://social.msdn.microsoft.com/Forums/zh-CN/sqlserverzhchs/thread/e1198287-96d5-4e9e-b1d0-d2d4f5ba4e20

Operation order or principle of the join statement
http://social.msdn.microsoft.com/Forums/zh-CN/sqlserverzhchs/thread/6f61bd10-6fb9-4035-bd51-d9cc13f7132a/

 
