Analysis of SQL Server Blocking Causes

By examining combinations of field values in sysprocesses, we can trace the source of blocking and divide it into the following five common types (see table). waittype, open_tran, and status are all columns in sysprocesses. The "Self-healing?" column indicates whether the blocking can disappear on its own.

 5 Common Types of Blockages

| Type | waittype | open_tran | status | Self-healing? | Cause / other characteristics |
|------|----------|-----------|--------|---------------|-------------------------------|
| 1 | not 0 | >= 0 | runnable | Yes, once the statement finishes running | The statement takes a long time to run and has to wait for some system resource (disk I/O, CPU, memory, etc.). |
| 2 | 0x0000 | > 0 | sleeping | No, but the connection can easily be terminated with a KILL statement | The client may have hit a statement execution timeout, or actively canceled the previous statement, without rolling back the open transaction; an Attention event can be seen in SQL Trace. |
| 3 | 0x0000, 0x0800, or 0x0063 | >= 0 | runnable | No, not until the client fetches all results or disconnects; KILL can terminate it, but may take up to 30 seconds | The client did not fetch all the results in time. open_tran may be 0 and the isolation level the default (READ COMMITTED), yet the connection still holds lock resources. |
| 4 | 0x0000 | > 0 | rollback | Yes | In SQL Trace this SPID shows an Attention event, indicating the client hit a timeout or actively requested a rollback. |
| 5 | various values possible | >= 0 | runnable | No, not until the client cancels the statement or disconnects; KILL can terminate it, but may take up to 30 seconds | A deadlock inside the application that surfaces as blocking in SQL Server. The blocking and blocked connections have the same hostname value in sysprocesses. |


The causes of each type, and how to resolve them, are described in detail below.
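To locate candidates for each type, you can start with a query like the following sketch, which lists currently blocked sessions together with the sysprocesses fields used in the table above:

```sql
-- List currently blocked sessions and the fields used to classify the blocking
SELECT spid, blocked, waittype, open_tran, status, hostname, cmd
FROM master..sysprocesses
WHERE blocked <> 0;
```

The blocked column holds the SPID of the session at the head of the chain; querying that SPID's own row then gives the waittype, open_tran, and status values to match against the table.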

Type 1: Blocking caused by a statement that runs for too long. The statement itself is executing normally; it simply has to wait for some system resource.

Solution:

To resolve this type of blocking, the database administrator needs to work with the application designers on the following questions.

  1. Is there room for optimization in the statement itself?
    This includes modifying the statement itself to reduce the complexity, modifying the table design, adjusting the index, etc.
  2. How is the overall performance of SQL Server? Is there a resource bottleneck that affects the execution speed of the statement?
    When SQL Server runs into a memory, disk I/O, or CPU bottleneck, statements that would normally complete quickly may take a long time.
  3. If the statements are inherently complex and cannot be tuned (which is the case with many statements that process reports), you must consider how to isolate this type of application (generally a data warehouse application) from the OLTP system.
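To see what resource a slow statement is currently waiting on, sysprocesses again provides a quick sketch (52 below is a hypothetical SPID):

```sql
-- Inspect what a long-running SPID is waiting on (52 is a hypothetical SPID)
SELECT spid, lastwaittype, waitresource, cpu, physical_io, memusage
FROM master..sysprocesses
WHERE spid = 52;
```

lastwaittype and waitresource point at the bottleneck (e.g., PAGEIOLATCH waits suggest disk I/O), while cpu and physical_io show the cumulative cost of the statement.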


Type 2: Blocking due to a transaction that did not commit as expected

The characteristic of this type of blocking is that the connection in question has already gone idle (sysprocesses.status = 'sleeping' and sysprocesses.cmd = 'AWAITING COMMAND'), yet sysprocesses.open_tran is not 0: a transaction remains uncommitted. Many of these problems arise when the application hits an execution timeout, or for some other reason the running statement is terminated early while the connection is kept open. The application then never sends a commit or rollback for the transaction it opened, leaving an orphaned transaction behind in SQL Server.

When encountering such problems, many users mistakenly assume that something was mishandled on the SQL Server side. In fact, the execution timeout (command timeout) is entirely a client-side behavior. When a client application sends a statement execution request to SQL Server, it carries an execution timeout setting; the default command timeout in ADO and ADO.NET is 30 seconds. If SQL Server has not completed the statement and returned results within those 30 seconds, the client sends an Attention message to SQL Server, telling it that it does not want to wait any longer. On receiving this message, SQL Server terminates the currently running statement (or batch). However, to preserve the client's own logic, SQL Server by default does not automatically roll back or commit the transaction that connection has opened; it waits for the client's decision. If the client never sends a rollback or commit command, SQL Server keeps the transaction open until the client disconnects.

The following experiment can be used to simulate this problem. Create a connection to SQL Server in Management Studio and run the following batch statement:

use sqlnexus
go
BEGIN TRAN
SELECT * FROM ReadTrace.tblInterestingEvents WITH (HOLDLOCK)
SELECT * FROM sysobjects s1, sysobjects s2
COMMIT TRAN

Because of the HOLDLOCK hint, the first SELECT keeps a shared table lock after it completes; if the batch ran to the end, the lock would be released when the transaction commits. But the second SELECT (a cross join of sysobjects with itself) takes a long time to execute. Cancel the execution after waiting 3 to 4 seconds, then run the following statements to check open_tran and locks.

SELECT @@TRANCOUNT
GO
sp_lock
GO

From the results (see figure) it can be known that:

(1) When the batch is canceled, the "COMMIT TRAN" statement never executes. SQL Server does nothing with the transaction opened by "BEGIN TRAN"; it simply keeps it active.

(2) The lock acquired by the first SELECT is still held because the transaction has not ended (objID=85575343, Type=TAB, Mode=IS).

Now, if any other connection wants to modify the ReadTrace.tblInterestingEvents table, it will be blocked.
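You can confirm this from a second connection with a sketch like the following, which requests an exclusive table lock and therefore conflicts with the lock still held by the orphaned transaction:

```sql
-- Run from a second connection: the exclusive table lock requested here conflicts
-- with the lock held by the orphaned transaction, so this query blocks
SELECT TOP (1) * FROM ReadTrace.tblInterestingEvents WITH (TABLOCKX);
```

While it waits, sysprocesses shows this second SPID with blocked set to the first connection's SPID.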

Solution:

1. The application itself must anticipate that a running statement can be terminated unexpectedly, and must handle such errors properly. This work includes:

  a) Add error capture and handling to every SQL Server call

  SQL Server client drivers (including ODBC and OLE DB) return an error to the application when statement execution is terminated unexpectedly (including by a timeout). When the client catches such an error, in addition to logging it (which is very helpful for troubleshooting), it should run the following statement to roll back any uncommitted transaction.

IF @@TRANCOUNT>0 ROLLBACK TRAN


  Some programmers ask: I have already written T-SQL-level error handling (IF @@ERROR <> 0 ROLLBACK TRAN) in my batch; does the application really need its own? Be aware that some exceptions (such as timeouts) terminate the execution of the entire T-SQL batch, not just the current statement. When such an exception occurs, the T-SQL-level error handling is canceled along with the batch and does not work as imagined. Error handling in the application is therefore essential.
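A hedged illustration of why: even a server-side TRY...CATCH block (available since SQL Server 2005) does not help here, because the attention event sent on a client timeout or cancel aborts the whole batch without entering the CATCH block. dbo.Orders below is hypothetical:

```sql
BEGIN TRY
    BEGIN TRAN;
    DELETE FROM dbo.Orders WHERE OrderDate < '20000101';  -- long-running modification
    COMMIT TRAN;
END TRY
BEGIN CATCH
    -- This runs for run-time errors, but NOT when the client cancels or times out:
    -- the attention event terminates the entire batch, CATCH block included,
    -- and the open transaction is left behind.
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;
END CATCH
```

This is exactly why the rollback must also live in the application's own error handler.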

  b) Set the connection property "SET XACT_ABORT ON"

  When SET XACT_ABORT is ON, a run-time error in any T-SQL statement terminates and rolls back the entire transaction.

  When SET XACT_ABORT is OFF, the behavior varies. Sometimes only the T-SQL statement that raised the error is rolled back and the transaction continues; if the error is severe, the entire transaction may be rolled back even with XACT_ABORT OFF. OFF is the default setting.

  If there is no way to quickly standardize your application's error handling, one of the fastest remedies is to run "SET XACT_ABORT ON" right after each connection is established, or at the beginning of trouble-prone stored procedures, and let SQL Server roll back the transaction on the application's behalf.
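A minimal sketch of the difference, run in a scratch session (the temp table is purely illustrative):

```sql
SET XACT_ABORT ON;
GO
BEGIN TRAN;
CREATE TABLE #demo (i INT);
INSERT INTO #demo VALUES (1);
INSERT INTO #demo VALUES (1/0);   -- run-time error: batch aborts, transaction rolls back
COMMIT TRAN;                      -- never reached
GO
SELECT @@TRANCOUNT;               -- 0: no orphaned transaction is left behind
```

With XACT_ABORT OFF, the divide-by-zero error would fail only that INSERT, and after the aborted batch @@TRANCOUNT would still be 1: an open transaction waiting for someone to clean it up.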

  c) Consider whether to close the connection pool

  Most SQL Server applications use connection pooling for good performance. If a connection is returned to the pool without its transaction being committed, the transaction is not cleaned up at that moment. The next time the connection is reused (when a new user requests a connection), the client driver sends a sp_reset_connection command that cleans up everything left over from the connection's previous use, including rolling back uncommitted transactions. But if the connection sits unused in the pool for a long time, its transaction persists all the while, causing blocking. Some Java programs use drivers that provide connection pooling but do not clean up transactions when connections are reused; such pools demand very high-quality application code and are more prone to blocking.

    If recommendations a) and b) cannot be implemented quickly, disabling the connection pool can shorten transaction lifetimes and alleviate the blocking problem to some extent.

2. Analyze why the connection encounters abnormal termination

  This is where error logging pays off. With the error message, you can determine whether the cause was a timeout or some other SQL Server error. If it was a timeout, handle it as described for blocking type 1.

    Another source of orphaned transactions is a connection that opens an implicit transaction without any mechanism to commit it promptly. When a connection is in implicit transaction mode (SET IMPLICIT_TRANSACTIONS ON) and is not currently in a transaction, executing any of the following statements starts a new one:

ALTER TABLE, CREATE, DELETE, DROP, FETCH, GRANT,
INSERT, OPEN, REVOKE, SELECT, TRUNCATE TABLE, UPDATE

For transactions opened this way, SQL Server starts the transaction for you but never commits it for you. The user must explicitly commit or roll back each transaction when it ends; otherwise, when the user disconnects, the transaction and all of its data changes are rolled back. After a commit, executing any of the statements above starts another new transaction. Implicit transaction mode stays in effect until the connection executes SET IMPLICIT_TRANSACTIONS OFF, which returns it to autocommit mode, where every individual statement is committed on successful completion and no transaction is left behind.
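The behavior can be seen with a short sketch:

```sql
SET IMPLICIT_TRANSACTIONS ON;
GO
SELECT TOP (1) name FROM sysobjects;  -- this SELECT implicitly starts a transaction
SELECT @@TRANCOUNT;                   -- 1: a transaction is now open
COMMIT TRAN;                          -- it must be ended explicitly
SELECT @@TRANCOUNT;                   -- back to 0
SET IMPLICIT_TRANSACTIONS OFF;        -- return the connection to autocommit mode
```

If the COMMIT is forgotten, the transaction (and its locks) outlives the query exactly as described above.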

Why would a connection open an implicit transaction? Besides deliberate choices by the programmer, the mode is often selected by the client database driver or middleware in order to implement its own transaction functionality (note: not through T-SQL transaction statements directly). If the application hits an accident, or the logic is mishandled, an application-layer transaction goes uncommitted, and it shows up in SQL Server as an orphaned transaction. Strictly limiting the application layer's use of such transaction mechanisms, and using transactions directly in SQL Server instead, is a good way to avoid this problem.


Type 3: The statement runs for a long time because the client does not fetch the result set in time.

The total execution time of a statement in SQL Server includes not only the time spent executing it, but also the time spent sending the result set to the client. If the result set is large, SQL Server sends it in multiple packets, and after each packet it waits for the client's acknowledgment; only then does it send the next one. Only after all the results have been sent does SQL Server consider the statement complete and release the resources (including lock resources) acquired for its execution.

If for some reason the client application processes the results very slowly, stops responding, or simply ignores SQL Server's attempts to send the result set, SQL Server waits patiently, so the statement stays in execution for a long time and blocking results.

Solution:

  1. When designing a program, be wary of returning large result sets. Doing so burdens not only SQL Server and the network, but also the application itself, which must spend considerable resources processing the results. If the end user only needs part of the result set, say so in the command sent to SQL Server; avoid the pattern of fetching all the data and then displaying only the first portion.
  2. If the application does have to return large result sets, such as some reporting systems, consider separating the reporting database from the production database.
  3. If 1 and 2 cannot be implemented in the short term, you can negotiate with the end users and use the READ UNCOMMITTED transaction isolation level for the connections that return large result sets. At that level, query statements do not request S locks.
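Points 1 and 3 together look like this sketch (dbo.SomeLargeTable is hypothetical):

```sql
-- Cap the result set at the server and avoid S locks for a large report query
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT TOP (1000) * FROM dbo.SomeLargeTable;
```

TOP limits what must be streamed to the client, and READ UNCOMMITTED means that even a slowly consumed result set is not holding shared locks against writers, at the cost of possibly reading uncommitted data.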


Type 4: The connection at the source of the blocking is stuck in the rollback state.

This situation is often a by-product of type 1. Sometimes the database administrator finds that a connection is blocking others and, to resolve the problem, exits it voluntarily or forcibly (politely, by closing the application, or directly with KILL on the SQL Server side). In most cases these measures remove the blocking. But remember: whether the client exits or the server issues a KILL, SQL Server must preserve transactional consistency, so it rolls back any transaction the connection had not yet committed, finding every record the transaction modified and restoring it to its original state. So if a DELETE, INSERT, or UPDATE had already been running for an hour, the rollback may take another hour, during which the blocking continues and we can only wait.

Some users cannot wait and simply restart SQL Server. When SQL Server shuts down, the rollback is interrupted and the shutdown completes quickly, but the rollback resumes the next time SQL Server starts (while the database is being recovered). If it cannot finish quickly during recovery, the entire database remains unavailable, which can have even more serious consequences.
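To check how far along a rollback is (rather than guessing), KILL supports a status query; 54 below is a hypothetical SPID that is currently rolling back:

```sql
-- Does not kill anything again; only reports progress of an in-flight rollback
KILL 54 WITH STATUSONLY;
```

The output reports an estimated rollback completion percentage and estimated time remaining, which helps decide whether waiting is feasible.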

Solution:

The best approach is to avoid making such large modifications during working hours; try to schedule these operations for late at night or weekends. If a large operation has already been running for a long time, it is usually best to wait patiently for it to finish. If it must run under load, it is best to divide the large operation into several small ones completed in steps.
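Dividing a large modification into steps can be sketched as a batched loop (dbo.BigTable and the date filter are hypothetical; DELETE TOP requires SQL Server 2005 or later):

```sql
-- Delete in small batches so each transaction (and any rollback) stays short
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM dbo.BigTable
    WHERE CreatedDate < '20200101';
    IF @@ROWCOUNT = 0 BREAK;   -- stop once nothing matches the filter
END
```

Each iteration commits on its own, so if the operation must be interrupted, only the current small batch needs to be rolled back, and blocked connections get a chance to proceed between batches.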


Type 5: Deadlock occurs during application operation, which is manifested in the form of blocking in SQL Server.

A client application uses many resources while it runs: threads, semaphores, memory, I/O, and so on. SQL Server is just one of those resources. If the two ends of a deadlock are not both inside SQL Server, SQL Server's deadlock detection mechanism cannot see it; if the application side does not handle the situation, both parties may wait forever. Inside SQL Server, the condition appears merely as blocking, but it never clears on its own, and it can seriously affect SQL Server's performance.

Below we give two examples of this application-side deadlock.

1) Deadlock caused by one application thread opening more than one database connection (see figure).

Suppose the application has a thread with this logic:

    ● start running

    ● Establish database connection A and call stored procedure ProcA. Open result set A.

    ● Establish database connection B and call stored procedure ProcB. Open result set B.

    ● Read the result set A and B in turn, and integrate the final result.

    ● Close result sets A and B, and shut down connections A and B.

    ● end the run

Such a design looks fine under normal circumstances but is actually quite fragile, because both connections are driven by the same single thread. Suppose stored procedure ProcA runs inside a transaction and acquires some exclusive locks before returning its result set, and ProcB needs those same locks in order to return its results. What happens?

What happens is that connection A waits for the thread to finish reading the results on connection B before result set A is processed, while connection B waits for connection A to complete its transaction and release the locks. The two sides wait on each other forever: an application-side deadlock.

2) Deadlock between two threads (see figure).

If the application instead used two threads, each with its own database connection, the logic above would not be a problem: the thread running ProcA would finish first and release the locks blocking connection B, so B could finish too. But suppose the logic is as follows:

Thread A: establishes database connection A, continuously reads table A, fetches records one by one, and sends it to the input cache of thread B after certain processing.

Thread B: establishes database connection B, reads data from the input cache, and modifies table A according to the received records.

What is wrong with this logic? Modifying the table requires exclusive locks, so if thread A is still reading a record, thread B's modification is blocked and thread B enters a wait. But thread A needs thread B's input buffer to drain before it can write more; if thread B has not had time to clear it, thread A must also wait. The result is a deadlock, which inside SQL Server appears as ordinary blocking.

Solution:

Complex programs can exhibit other forms of this deadlock as well. To guard against it, set an execution timeout when the application calls SQL Server, and write a proper error-handling mechanism (see blocking type 2). Once a deadlock occurs, the SQL Server call is abandoned after the timeout expires, the resources held inside SQL Server are released, and the deadlock is broken.
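The command timeout itself is configured in the client driver, but a related server-side safety net can be sketched in T-SQL:

```sql
-- Limit how long this connection waits for any lock (milliseconds)
SET LOCK_TIMEOUT 5000;
-- A statement that waits longer than 5 seconds for a lock now fails with
-- error 1222 ("Lock request time out period exceeded") instead of waiting
-- forever; the application can catch the error, roll back, and retry.
```

Either mechanism ensures the SQL Server side of an application deadlock eventually gives up, which is what allows the cycle to break.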

Summary: blocking problems are best solved at the application design level

Many users hold the misconception that blocking is a database problem, and when it appears they hope to find a once-and-for-all fix at the database level. But blocking exists to provide transaction isolation, which is exactly what the application requested of SQL Server. Much of the time, effort on the database side alone cannot solve a blocking problem; there is plenty of work to do at the application level as well.

For example: which isolation level the application chooses when connecting; where transactions begin and end; how connections are established and recovered; and how statement complexity is controlled. The application must also control result set sizes and fetch data from SQL Server promptly, limit statement execution time, and handle timeouts and other unexpected events. For critical business systems with high concurrency and strict response-time requirements, these factors must be considered when the application is designed. Key business logic must be reviewed piece by piece to ensure the application uses the lowest isolation level that satisfies the business requirement and that transactions are kept to the smallest possible granularity. The statements that run must also be backed by good database design, so that they do not consume ever more resources and time as the database and the user population grow.

If these points are neglected, the typical outcome is an application that performs well early on, while users are few and the database is small, but grows slower and slower as users and data volume increase.
