Understanding the SQL execution plan

To read an execution plan, you first need to understand the terms that appear in it. I am not an expert on the subject, so this article covers only the parts I understand and leaves out the rest.

  First, a note for anyone looking at an execution plan for the first time: a graphical SQL Server execution plan is read from right to left.

  Terminology:

  Scan: read the data row by row.

  First, create a table to work with:

  CREATE TABLE Person(
      Id int IDENTITY(1,1) NOT NULL,
      Name nvarchar(50) NULL,
      Age int NULL,
      Height int NULL,
      Area nvarchar(50) NULL,
      MarryHistory nvarchar(10) NULL,
      EducationalBackground nvarchar(10) NULL,
      Address nvarchar(50) NULL,
      InSiteId int NULL
  ) ON [PRIMARY]

  The table holds about 140,000 rows, similar to the following:

  

  This table has no indexes.

1. Data access operations

 1. Table scan

  Table scan: occurs when the table is a heap and no usable index is available; the entire table is read once.

  Now, let's execute a simple query statement on this table:

  SELECT * From Person WHERE Name = '公子'

  The execution plan is as follows:

  

  A table scan, as the name implies, reads the entire table to find the rows you need.

 2. Clustered index scan

  Clustered index scan: occurs on a clustered table and is essentially a full table scan as well, but it is more efficient when the predicate is on the clustered key (for example WHERE Id > 10).

  Next, add a clustered index on the Id column of this table:

  CREATE CLUSTERED INDEX IX_Id ON Person(Id)

  Execute the same query again:

  SELECT * From Person WHERE Name = '公子'

  The execution plan is as follows:

  

  Why does a clustered index on the Id column change the scan type, when it has nothing to do with the Name predicate?

  Once you add a clustered index, the table changes from a heap into a clustered table, and the data of a clustered table lives in the leaf-level nodes of the clustered index. A clustered index scan is therefore not much different from a table scan. Whether there is a real difference depends on the WHERE predicate and on the data returned. For this particular SQL statement, the efficiency is about the same.

  You can compare the I/O statistics:

  Table scan:

  

  Clustered index scan:

  

  Efficiency is outside the scope of this article, which only looks at the differences between the various scans and why each one occurs.

 3. Clustered index seek

  Clustered index seek: reads only a specific range of rows in the clustered index.
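  To make the difference between scanning every row and going straight to a key range concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sorted list stands in for the leaf level of a clustered index (a real index is a B-tree, not a list), and the names rows, ids, scan and seek are made up for this example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for the leaf level of a clustered index:
# (Id, Name) rows kept sorted by Id.
rows = [(i, f"name{i}") for i in range(1, 1001)]
ids = [r[0] for r in rows]  # the sorted key column


def scan(predicate):
    """Scan: test every row, whatever the predicate is."""
    return [r for r in rows if predicate(r)]


def seek(low, high):
    """Seek: binary-search the sorted keys, then read only rows low..high."""
    start = bisect_left(ids, low)
    end = bisect_right(ids, high)
    return rows[start:end]


print(seek(500, 502))                      # touches only the matching range
print(scan(lambda r: r[1] == "name500"))   # has to test all 1000 rows
```

  A predicate on Name cannot use the order of the Id keys, which is why it ends up as a scan, while a predicate on Id can jump directly to the matching range.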

  Execute the following SQL statement:

  SELECT * FROM Person WHERE Id = '73164'

  The execution plan is as follows:

  

 4. Index scan

  Index scan: reads the whole non-clustered index.

  Let's add a non-clustered index and execute a query statement:

  CREATE NONCLUSTERED INDEX IX_Name ON Person (Name)     -- create a non-clustered index

  SELECT Name FROM Person

  View the execution plan as follows:

  

  Why is an index scan (on the non-clustered index) chosen here?

  Because this non-clustered index covers all the data the query needs. What if the non-clustered index does not cover the query? For example, change SELECT Name to SELECT * and take a look.

  

  The query returns every row and needs columns that the non-clustered index does not contain, so using the non-clustered index is not cost-effective. A clustered index scan is used instead.

  Now drop the clustered index and execute SELECT * again:

  DROP INDEX Person.IX_Id

  

  With no clustered index left, a table scan is the only option.

 5. Bookmark lookup

  From the earlier study of indexes, we know that when a non-clustered index does not cover all the required columns, SQL Server either performs a clustered index scan directly, or seeks the non-clustered index to find the clustered index key and then uses the clustered index to fetch the data.

  Let's look at an example of a bookmark lookup:

  SELECT * FROM Person WHERE Name = 'Fatty'     -- the Name column has a non-clustered index

  The execution plan is as follows:

  

  The process can be understood as follows. First, the non-clustered index is used to find the requested rows. Because the index does not contain all the columns, the missing columns have to be fetched from the base table, which is a key lookup (Key Lookup). If the base table is organized as a heap, the key lookup becomes a RID lookup (RID Lookup). Key lookups and RID lookups are collectively called bookmark lookups. However, when the non-clustered index returns too many rows, SQL Server may choose to perform a clustered index scan directly.
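  The two steps of a key lookup can be sketched in a few lines of Python. This is a simplified illustration, not how SQL Server stores anything: the dictionaries clustered and name_index and the sample rows are assumptions made up for this example.

```python
# Hypothetical sample data, not the article's 140,000-row table.
# Clustered index: Id -> full row.
clustered = {
    1: {"Id": 1, "Name": "Fatty", "Age": 30},
    2: {"Id": 2, "Name": "Slim", "Age": 25},
    3: {"Id": 3, "Name": "Fatty", "Age": 41},
}

# Non-clustered index on Name: Name -> clustering keys (Id) of matching rows.
name_index = {}
for key, row in clustered.items():
    name_index.setdefault(row["Name"], []).append(key)


def select_star_by_name(name):
    """SELECT * ... WHERE Name = ? using the non-clustered index."""
    keys = name_index.get(name, [])        # step 1: seek the index on Name
    return [clustered[k] for k in keys]    # step 2: one key lookup per match


print(select_star_by_name("Fatty"))
```

  Step 2 runs once per matching row, which is why the cost of the lookups grows with the number of rows the non-clustered index returns.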

2. Aggregation operations

 1. Stream aggregation

  Stream aggregation: computes summary values for groups of rows in a suitably sorted stream.

  Aggregate functions without GROUP BY (such as COUNT() or MAX()) use stream aggregation; the operator itself consumes no I/O, only CPU.
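  The idea of aggregating over a sorted stream can be sketched in Python as follows. This is an illustration under assumptions, not SQL Server's implementation: the function stream_aggregate, its parameters, and the sample rows are hypothetical.

```python
def stream_aggregate(sorted_rows, key, value, combine):
    """Yield (group, aggregate) pairs from rows already sorted by key.

    Only the running value of the current group is kept in memory.
    """
    have_group = False
    for row in sorted_rows:
        if have_group and row[key] == group:
            acc = combine(acc, row[value])   # same group: fold the value in
        else:
            if have_group:
                yield group, acc             # group changed: emit the old one
            group, acc, have_group = row[key], row[value], True
    if have_group:
        yield group, acc                     # emit the last group


# Hypothetical rows, already sorted by Height.
people = [
    {"Height": 160, "Age": 20},
    {"Height": 160, "Age": 35},
    {"Height": 175, "Age": 28},
]
print(list(stream_aggregate(people, "Height", "Age", max)))
```

  Because the input is sorted, a group is complete as soon as the key changes, so the sketch never needs more than one running value at a time. An aggregate without GROUP BY is the special case where the whole input forms a single group.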

  For example, execute the following statement:

  SELECT MAX(Age) FROM Person

  View the execution plan as follows:

  

 2. Compute scalar

  Compute scalar: computes a new value from the existing values in the row. For example, with the COUNT() function, each additional row increases the count by 1.

  Aggregate functions other than MIN and MAX require a stream aggregation followed by a compute scalar.

  SELECT COUNT(*) FROM Person

  View the execution plan as follows:

  

 3. Hash aggregation (hash match)

  For a query with a GROUP BY clause, the data has to be ordered by the GROUP BY columns, so a Sort is required to guarantee the order. Note that Sort is a memory-consuming operation, and it spills to tempdb when memory is insufficient. SQL Server always chooses whichever of Sort and hash match has the lower cost.

  SELECT Height, COUNT(Id) FROM Person     -- count the rows for each height
  GROUP BY Height

  The execution plan is as follows:

  

  For larger data volumes, SQL Server chooses hash match.

  A hash table is created in memory with the GROUP BY value as the key, and then each row of the input is processed in turn. When the key does not exist in the hash table yet, an entry is added. When the key already exists, the value stored in the hash table is updated according to the aggregate function (such as SUM or AVG).
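  The procedure just described maps directly onto a dictionary. The following Python sketch is only an illustration; the function hash_aggregate, its parameters, and the sample rows are assumptions made up for this example.

```python
def hash_aggregate(rows, key, init, step):
    """Group rows by key with a hash table; input order does not matter."""
    table = {}
    for row in rows:
        k = row[key]
        if k not in table:
            table[k] = init               # key not seen yet: add an entry
        table[k] = step(table[k], row)    # update the running aggregate
    return table


# Hypothetical rows in no particular order.
people = [
    {"Id": 1, "Height": 175},
    {"Id": 2, "Height": 160},
    {"Id": 3, "Height": 175},
]

# Roughly: SELECT Height, COUNT(Id) FROM Person GROUP BY Height
counts = hash_aggregate(people, "Height", 0, lambda acc, row: acc + 1)
print(counts)
```

  Unlike a Sort followed by a stream aggregation, this needs no ordering, but the hash table has to hold one entry per distinct key.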

 4. Sorting

  When the data volume is small, the choice is different. For example, execute the following statement to create a new table with the same structure as Person but only a few dozen rows:

  SELECT * INTO Person2 FROM Person
  WHERE Id < 100

  Then execute the same query statement:

  SELECT Height, COUNT(Id) FROM Person2     -- the same query, but against the table with less data
  GROUP BY Height

  The execution plan is as follows:

  

3. Joins

  When joining multiple tables (including bookmark lookups and joins between indexes), SQL Server uses three different types of join: nested loops join, merge join, and hash join. Each has its own suitable scenarios, and none is simply better than the others.

  Create two new tables as follows:

  

  This is a simple structure of news articles and the columns (categories) they belong to.

 1. Nested loop

  First look at a simple Inner Join query statement

  SELECT * FROM Nx_Column AS C
  INNER JOIN Nx_Article AS A
  ON A.ColumnId = C.ColumnId

  The execution plan is as follows:

  

  The nested loops icon is quite descriptive. The upper input is the outer input (Outer Input), here a clustered index scan, and the lower input is the inner input (Inner Input), here a clustered index seek. The outer input is executed only once, and the inner input is searched once for each outer row that satisfies the join condition. Since there are 7 rows here, the inner input is executed 7 times.

  

  From the way nested loops work, it is easy to see that the outer input is a scan and the inner input is a seek. When the outer input of the join is relatively small and the inner input seeks into a very large table, the query optimizer tends to choose nested loops.
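  A rough Python sketch of this behaviour follows. It is an illustration under assumptions: the sample rows are made up, the dictionary articles_by_column stands in for an index on the inner input, and which table ends up as the outer input in a real plan is decided by the optimizer.

```python
# Hypothetical rows; the table names follow the article's Nx_Column / Nx_Article.
columns = [
    {"ColumnId": 1, "ColumnName": "Sports"},
    {"ColumnId": 2, "ColumnName": "Tech"},
]
articles = [
    {"ArticleId": 10, "ColumnId": 2},
    {"ArticleId": 11, "ColumnId": 1},
    {"ArticleId": 12, "ColumnId": 2},
]

# Stand-in for an index on the inner input, keyed by the join column.
articles_by_column = {}
for a in articles:
    articles_by_column.setdefault(a["ColumnId"], []).append(a)


def nested_loops_join(outer, inner_index, key):
    """Scan the outer input once; seek the inner input once per outer row."""
    result = []
    seeks = 0
    for o in outer:
        seeks += 1                                   # one inner seek per outer row
        for i in inner_index.get(o[key], []):
            result.append({**o, **i})
    return result, seeks


joined, seeks = nested_loops_join(columns, articles_by_column, "ColumnId")
print(seeks, joined)
```

  The number of seeks equals the number of outer rows, which is why a small outer input combined with an indexed inner input suits this join type.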

 2. Merge join

  Unlike nested loops, a merge join reads each table only once. In principle, this makes a merge join faster than nested loops when both inputs are large.

  The way a merge join works is easy to picture. First, a merge join requires both inputs to be sorted, and the join condition must be an equality. Because both inputs are already ordered, it takes a row from each input and compares them, returning the pair if they are equal and discarding the smaller one otherwise. This also shows why a merge join only supports joins on an equality condition. We can see this principle from the icon in Figure 11.
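  The comparison step can be sketched in Python as follows. This is a simplified illustration, not SQL Server's implementation: the function merge_join is hypothetical, the sample rows are made up, and the sketch assumes the left input has unique keys (a one-to-many join).

```python
def merge_join(left, right, key):
    """Join two inputs that are both sorted by key.

    Assumption for this sketch: keys in `left` are unique.
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1                 # left row has no further match: discard it
        elif lk > rk:
            j += 1                 # right row has no match: discard it
        else:
            result.append({**left[i], **right[j]})
            j += 1                 # keep the left row: more right rows may match
    return result


# Hypothetical rows, both sorted by ColumnId.
columns = [
    {"ColumnId": 1, "ColumnName": "Sports"},
    {"ColumnId": 2, "ColumnName": "Tech"},
    {"ColumnId": 4, "ColumnName": "Travel"},
]
articles = [
    {"ArticleId": 10, "ColumnId": 1},
    {"ArticleId": 11, "ColumnId": 2},
    {"ArticleId": 12, "ColumnId": 2},
    {"ArticleId": 13, "ColumnId": 3},
]
print(merge_join(columns, articles, "ColumnId"))
```

  Each input is walked from start to end exactly once, which only works because a smaller key on one side can never match anything later on the other side.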

  SELECT * FROM Nx_Column AS C
  INNER JOIN    Nx_Article AS A
  ON A.ColumnId = C.ColumnId
  OPTION(MERGE join)

  The execution plan is as follows:

  

  If the inputs on both sides are unsorted, the query optimizer will not choose a merge join on its own. We can force one with the MERGE hint, in which case the execution plan has to add a Sort step to produce ordered input. This is why the SQL statement above adds OPTION (MERGE join). In the plan above, the Sort is applied to the ColumnId column of the Article table.

 3. Hash join

  A hash join also needs to read the data of each input only once. It works by building a hash table in memory, which makes it more memory-intensive, and tempdb is used if memory is insufficient. Unlike a merge join, however, it does not require the inputs to be sorted.
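  A minimal Python sketch of the idea: build a hash table from one input, then probe it with the other. The function hash_join and the sample rows are assumptions made up for this example, and spilling to tempdb is not modelled.

```python
def hash_join(build, probe, key):
    """Build a hash table from one input, then probe it with the other."""
    table = {}
    for b in build:                              # build phase: read once
        table.setdefault(b[key], []).append(b)
    result = []
    for p in probe:                              # probe phase: read once
        for b in table.get(p[key], []):
            result.append({**b, **p})
    return result


# Hypothetical rows; neither input is sorted by ColumnId.
columns = [
    {"ColumnId": 2, "ColumnName": "Tech"},
    {"ColumnId": 1, "ColumnName": "Sports"},
]
articles = [
    {"ArticleId": 10, "ColumnId": 1},
    {"ArticleId": 11, "ColumnId": 3},
    {"ArticleId": 12, "ColumnId": 2},
]
print(hash_join(columns, articles, "ColumnId"))
```

  The whole build input has to fit in the hash table, which is where the memory cost comes from; in exchange, neither input needs to be ordered.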

  For the following demonstration, the clustered indexes of the two tables must not be on the ColumnId column; otherwise a hash join will not be used.

  ALTER TABLE Nx_Column DROP CONSTRAINT PK_Nx_Column     -- drop the primary key
  DROP INDEX Nx_Column.PK_Nx_Column    -- drop the clustered index
  CREATE CLUSTERED INDEX IX_ColumnName ON Nx_Column (ColumnName)     -- create the clustered index
  -- the primary key can now be added back; because a clustered index already exists, the primary key will not create one by default

  Also drop the clustered index of the other table, Nx_Article.

  Then execute the following query:

  SELECT * FROM Nx_Column AS C
  INNER JOIN    Nx_Article AS A
  ON A.ColumnId = C.ColumnId

  The execution plan is as follows:

  

    The clustered indexes have to be dropped because otherwise, with two ordered inputs, SQL Server would choose the lower-cost merge join. SQL Server uses the upper of the two inputs to build the hash table and the lower input to probe it. You can see this information in the properties window, as shown in Figure 15.

    Generally speaking, hash match is used when one or both inputs are not sorted.

4. Parallelism

  When multiple tables are joined, SQL Server can also run the query in parallel on machines with multiple CPUs or cores, which can improve efficiency.

Reprinted from an expert's explanation; the original text is at: http://www.cnblogs.com/kissdodog/p/3160560.html

 

 


Origin blog.csdn.net/qyx0714/article/details/70161400