Distributed Systems: Concepts and Design - Transactions and Concurrency Control


Transactions and Concurrency Control

Introduction

  • The goal of transactions is to ensure that all objects managed by a server remain in a consistent state when they are accessed by multiple transactions and when the server crashes.
    • A transaction is an indivisible unit of work executed by the server.
    • The server must either execute the entire transaction and store its results, or, in the event of a failure, completely remove the effects of its operations.

Transactions

  • Clients view a transaction as a sequence of operations that forms a single step, transforming the server's data from one consistent state to another.
  • Transactions can be provided as part of middleware.
  • Transactions are applied to recoverable objects and are intended to be atomic, so they are often called atomic transactions. Their key properties are:
    • Failure atomicity
      • The effects of a transaction are atomic even when the server crashes.
    • Durability (persistence)
      • Once a transaction has completed, all of its effects are saved in permanent storage.
      • The storage medium survives server crashes.
    • Isolation
      • Each transaction executes without interference from other transactions, and its intermediate effects are not visible to other transactions.


  • Server actions when a server process crashes
    • If the server process crashes unexpectedly, it is eventually replaced by a new server process.
    • The new server process aborts all uncommitted transactions and runs a recovery procedure that restores each object to the value produced by the most recently committed transaction.
    • Checkpoints (CHECKPOINT) help crash recovery restore the most recent state.
    • To handle a client that crashes during a transaction, the server gives each transaction an expiry time and aborts any transaction that has not completed by then.
  • Client actions when the server process crashes
    • If the server crashes while a transaction is in progress, the client learns of it when an operation returns an exception after a timeout.
    • If the server crashes and is replaced by a new server in the middle of a transaction, that transaction is no longer valid, and the client receives an exception on its next operation.

Concurrency control

Lost updates

The update made by transaction U is overwritten by transaction T, because both transactions read the same old value before either of them wrote a new one.
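
A minimal sketch of this interleaving, assuming a single hypothetical `balance` object and increments of 10 (U) and 20 (T). The schedule is written out by hand instead of using threads, so the outcome is deterministic.

```python
def lost_update_schedule(initial: int) -> int:
    """Interleave two increments so that T overwrites U's update."""
    balance = initial
    # Both transactions read the same old value before either writes.
    u_read = balance
    t_read = balance
    balance = u_read + 10   # U writes first
    balance = t_read + 20   # T writes a value computed from the stale read
    return balance


def serial_schedule(initial: int) -> int:
    """Run U to completion, then T, for comparison."""
    balance = initial
    balance = balance + 10  # U
    balance = balance + 20  # T
    return balance


print(lost_update_schedule(100))  # 120: U's +10 is lost
print(serial_schedule(100))       # 130
```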

Inconsistent retrievals

While transaction W is computing a summary over several objects, another transaction updates some of them, so W's result mixes values from before and after that update.
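
A small illustration, assuming two hypothetical accounts A and B and a transfer transaction V that moves 100 from A to B. W computes the total between V's withdrawal and V's deposit.

```python
def inconsistent_retrieval() -> tuple:
    accounts = {"A": 200, "B": 200}
    accounts["A"] -= 100                      # V: withdraw 100 from A
    total_seen_by_w = accounts["A"] + accounts["B"]  # W: sums mid-transfer
    accounts["B"] += 100                      # V: deposit 100 into B
    true_total = accounts["A"] + accounts["B"]
    return total_seen_by_w, true_total


print(inconsistent_retrieval())  # (300, 400): W misses 100 in flight
```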

Serial equivalence

If each transaction is known to have the correct effect when executed on its own, then any interleaving whose combined effect is the same as executing the transactions one at a time in some order is also correct. Such an interleaving is called serially equivalent.
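
One common way to test an interleaving for serial equivalence is to build a precedence graph over conflicting operations (same object, different transactions, at least one write) and check it for cycles. The sketch below assumes a schedule written as `(transaction, op, obj)` tuples with `op` being `"r"` or `"w"`; the function name and format are illustrative.

```python
from collections import defaultdict


def serial_order(schedule):
    """Return an equivalent serial order of transactions, or None."""
    txns = []
    for t, _, _ in schedule:
        if t not in txns:
            txns.append(t)
    edges = defaultdict(set)
    for i, (ti, opi, xi) in enumerate(schedule):
        for tj, opj, xj in schedule[i + 1:]:
            # Conflict: different transactions, same object, at least one write.
            if ti != tj and xi == xj and "w" in (opi, opj):
                edges[ti].add(tj)  # ti must come before tj
    # Kahn's algorithm: a full topological order exists iff there is no cycle.
    indegree = {t: 0 for t in txns}
    for t in txns:
        for u in edges[t]:
            indegree[u] += 1
    ready = [t for t in txns if indegree[t] == 0]
    order = []
    while ready:
        t = ready.pop(0)
        order.append(t)
        for u in sorted(edges[t]):
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return order if len(order) == len(txns) else None


lost = [("U", "r", "x"), ("T", "r", "x"), ("U", "w", "x"), ("T", "w", "x")]
serial = [("T", "r", "x"), ("T", "w", "x"), ("U", "r", "x"), ("U", "w", "x")]
print(serial_order(lost))    # None: the lost-update interleaving has a cycle
print(serial_order(serial))  # ['T', 'U']
```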

Recovery from aborts

The server must record the effects of all committed transactions and none of the effects of aborted transactions.

The server must ensure that when a transaction aborts, its updates are completely undone without affecting other concurrent transactions.

Dirty reads

Transaction isolation requires that the effects of an uncommitted transaction be invisible to other transactions.

If a transaction reads data written by another transaction that has not yet committed, this interaction can cause a dirty read.

Recoverability of transactions

If a transaction has read the updates of a transaction that later aborts, and has itself already committed, its state is unrecoverable.

The remedy is to delay the commit of every transaction that has read dirty data.

A recoverable strategy is therefore to defer a transaction's commit until every transaction whose updates it has read has committed.
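
A minimal sketch of that rule, assuming the server records which transactions each reader has read from. `CommitTracker` and its methods are hypothetical names for illustration.

```python
class CommitTracker:
    """Hypothetical helper: defer commits until all read-from txns commit."""

    def __init__(self):
        self.read_from = {}     # reader -> set of writers it has read from
        self.committed = set()

    def record_read(self, reader, writer):
        if reader != writer:
            self.read_from.setdefault(reader, set()).add(writer)

    def try_commit(self, txn):
        """Commit `txn` if allowed; return False if it must keep waiting."""
        pending = self.read_from.get(txn, set()) - self.committed
        if pending:
            return False  # still depends on uncommitted updates
        self.committed.add(txn)
        return True


tracker = CommitTracker()
tracker.record_read("U", "T")    # U reads a value written by T
print(tracker.try_commit("U"))   # False: T has not committed
print(tracker.try_commit("T"))   # True
print(tracker.try_commit("U"))   # True
```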

Cascading aborts

Suppose transaction U has deferred its commit until transaction T finishes, and T then aborts. U must then abort as well, because it has read dirty data.

Any other transactions that have observed U's effects must also be aborted, and their aborts may in turn force the abort of transactions that observed their effects. This chain of aborts is called cascading aborts.
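
The cascade is a reachability problem over "read-from" dependencies. This sketch assumes a hypothetical `read_from` map (reader to the set of uncommitted writers it read from) and returns every transaction that must abort.

```python
from collections import deque


def cascade(aborted, read_from):
    """Return the set of transactions that must abort when `aborted` does."""
    # Invert the relation: writer -> readers that observed its writes.
    readers = {}
    for reader, writers in read_from.items():
        for w in writers:
            readers.setdefault(w, set()).add(reader)
    to_abort = {aborted}
    queue = deque([aborted])
    while queue:
        t = queue.popleft()
        for r in readers.get(t, ()):
            if r not in to_abort:
                to_abort.add(r)
                queue.append(r)
    return to_abort


print(sorted(cascade("T", {"U": {"T"}, "V": {"U"}, "W": set()})))
# ['T', 'U', 'V']
```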

The solution is to allow transactions to read only objects that were written by committed transactions.

To guarantee this, a read of an object must be delayed until the transaction that wrote it has committed or aborted.

Avoiding cascading aborts is a stronger condition than guaranteeing recoverability.

Premature writes

Consider another consequence of aborting a transaction.

Suppose two transactions both perform write operations on the same object.

When a transaction aborts, each object it wrote is restored to its before image, the value the object had before that transaction's write.


To ensure that recovery using before images gives correct results, a write operation must be delayed until earlier transactions that updated the same object have committed or aborted.
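
A sketch of what goes wrong without that delay, assuming an object `x` starting at 100, a transaction T that writes 105, and a transaction U that writes 110 before T finishes. U commits, then T aborts and restores its before image.

```python
def premature_write_then_abort() -> int:
    x = 100
    before_images = {}
    before_images["T"] = x   # T's before image is the value it overwrites
    x = 105                  # T writes x
    before_images["U"] = x   # U writes x before T commits or aborts
    x = 110
    # U commits; then T aborts and restores its before image.
    x = before_images["T"]
    return x


print(premature_write_then_abort())  # 100: U's committed write is wiped out
```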

Strict executions of transactions

To avoid both dirty reads and premature writes, transactions are usually required to delay their read and write operations.

If both reads and writes of an object are delayed until all transactions that previously wrote that object have committed or aborted, the execution is called strict.

Strict executions of transactions enforce the isolation property.
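
The strictness condition can be checked mechanically on a recorded history. This sketch assumes events written as `("r", txn, obj)`, `("w", txn, obj)`, `("c", txn)` for commit and `("a", txn)` for abort; the format is illustrative.

```python
def is_strict(history):
    """True if no txn reads or writes an object that another still-active
    transaction has written."""
    active_writers = {}  # obj -> set of uncommitted transactions that wrote it
    for event in history:
        kind, txn = event[0], event[1]
        if kind in ("r", "w"):
            obj = event[2]
            if active_writers.get(obj, set()) - {txn}:
                return False  # dirty read or premature write
            if kind == "w":
                active_writers.setdefault(obj, set()).add(txn)
        else:  # commit or abort ends the transaction
            for writers in active_writers.values():
                writers.discard(txn)
    return True


print(is_strict([("w", "T", "x"), ("r", "U", "x"), ("c", "T"), ("c", "U")]))  # False
print(is_strict([("w", "T", "x"), ("c", "T"), ("r", "U", "x"), ("c", "U")]))  # True
```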

Tentative versions

A server of recoverable objects that participates in transactions must make sure that all of a transaction's updates can be cleared if the transaction aborts.

To achieve this, all of a transaction's update operations are applied to tentative versions of the objects held in volatile memory.

Each transaction has its own set of tentative versions of the objects it has changed.

All update operations of a transaction store values in that transaction's own tentative versions.

A transaction's read operations take values from its own tentative versions if they exist, and otherwise from the committed objects.

Only when the transaction commits are its tentative versions used to update the objects, and the data is written to permanent storage at the same time. This is performed as a single atomic step, so the update either succeeds completely or fails completely.

No other transaction may access the objects while this happens. If the transaction aborts, its tentative versions are deleted.
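
A single-threaded sketch of tentative versions, assuming objects are entries in a dict. `TentativeStore` is a hypothetical name, and the step that writes committed values to permanent storage (and blocks other transactions during commit) is only indicated by a comment.

```python
class TentativeStore:
    """Hypothetical store keeping per-transaction tentative versions."""

    def __init__(self, committed):
        self.committed = dict(committed)  # committed object values
        self.tentative = {}               # txn -> {obj: tentative value}

    def write(self, txn, obj, value):
        self.tentative.setdefault(txn, {})[obj] = value

    def read(self, txn, obj):
        # Prefer the transaction's own tentative version, else the committed one.
        versions = self.tentative.get(txn, {})
        if obj in versions:
            return versions[obj]
        return self.committed[obj]

    def commit(self, txn):
        # A real server would do this as one atomic step, also writing the
        # new values to permanent storage and blocking other transactions.
        self.committed.update(self.tentative.pop(txn, {}))

    def abort(self, txn):
        # Aborting simply discards the tentative versions.
        self.tentative.pop(txn, None)


store = TentativeStore({"x": 1})
store.write("T", "x", 2)
print(store.read("T", "x"), store.read("U", "x"))  # 2 1
store.commit("T")
store.write("U", "x", 3)
store.abort("U")
print(store.read("V", "x"))  # 2
```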

ACID

Implementing concurrency control for transactions usually involves the following properties:

  1. Atomicity: the operations in a transaction are executed as an atomic unit; either all of them take effect or all of them are rolled back. If any operation in the transaction fails, the entire transaction is rolled back to the previous state to keep the data consistent.
  2. Consistency: a transaction takes the system from one consistent state to another; the system is in a consistent state both when the transaction starts and when it ends.
  3. Isolation: multiple transactions may access the database concurrently. To keep them from interfering with each other, the system enforces isolation: each transaction behaves as if it were running alone and can access and modify data independently.
  4. Durability: once a transaction has committed, its changes are persisted to the database and must not be lost, even if the system is later interrupted or fails.

Together these are known as the ACID properties; they are central to transactional systems and widely adopted.

Transaction recovery

Recovery when a transaction aborts is essential for preserving the consistency and integrity of the database. Some key design notes:

  1. Recovery should be automatic, not manual: when a failure occurs, the system should detect that transactions were abandoned and recover on its own.
  2. The recovery procedure should determine where the system was interrupted and restore the database to its pre-interruption state. If several transactions were in progress, it should preserve the effects of committed transactions and correctly coordinate the recovery of all of them.
  3. Recovery must take the concurrency control rules into account. While recovery is in progress, the database should stay in an "uncommitted" state so that transactions can still be rolled back and recovered.
  4. During recovery, transaction commits should be disabled so that rollback and recovery can proceed cleanly. Once recovery is complete, new transactions may commit again.
  5. During recovery, the system should record all modifications in a log file, so that if the recovery operation itself fails, the changes can be undone and the pre-interruption state restored.

These notes help ensure the consistency and integrity of the database when transactions abort.
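
One simple form of log-based recovery is sketched below. The log format (a redo log of `("write", txn, obj, value)` and `("commit", txn)` records) is an assumption, as is the premise that the log was produced under strict locking so that committed writes appear in a serially equivalent order. Writes from transactions without a commit record are discarded, so only committed effects survive.

```python
def recover(snapshot, log):
    """Rebuild object values after a crash from a snapshot and a redo log."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(snapshot)
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, obj, value = rec
            state[obj] = value  # redo the committed write
    return state


log = [("write", "T", "x", 5), ("write", "U", "y", 7), ("commit", "T")]
print(recover({"x": 1, "y": 2}, log))  # {'x': 5, 'y': 2}: U never committed
```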

Nested transactions


With respect to concurrent access and failure handling, subtransactions are atomic from the point of view of their parent transaction.

Subtransactions at the same level can run concurrently, but their access to shared objects is serialized.

Each subtransaction can fail independently of the others.

When a subtransaction fails, its parent may choose another subtransaction to do the work.

  • Key advantages of nested transactions:
    • Subtransactions at the same level can run concurrently, increasing concurrency within a transaction.
    • Subtransactions can commit or abort independently. Compared with a single transaction, a set of subtransactions may be more robust, because the failure of one subtransaction does not force the whole transaction to abort.
  • Commit rules for nested transactions:
    • A transaction may commit or abort only after its subtransactions have completed.
    • When a subtransaction completes, it independently decides either to commit provisionally or to abort. A decision to abort is final.
    • When a parent aborts, all of its subtransactions are aborted, even those that have provisionally committed.
    • If the top-level transaction commits, all provisionally committed subtransactions commit, provided none of their ancestors has aborted.
    • The effects of a subtransaction become permanent only after the top-level transaction commits.
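
The commit rules above can be modelled as state transitions on a tree of transactions. This is a hypothetical model (`NestedTxn` is not a real API); it tracks only the commit/abort state, not data or concurrency.

```python
class NestedTxn:
    """Hypothetical model of nested-transaction commit rules."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.state = "active"  # active | provisional | aborted | committed
        if parent is not None:
            parent.children.append(self)

    def _check_can_decide(self):
        if self.state != "active":
            raise RuntimeError("decision already made")
        if any(c.state == "active" for c in self.children):
            raise RuntimeError("subtransactions still running")

    def abort(self):
        self._check_can_decide()
        self._abort_tree()

    def _abort_tree(self):
        # Aborting aborts every descendant, even provisionally committed ones.
        self.state = "aborted"
        for c in self.children:
            c._abort_tree()

    def commit(self):
        self._check_can_decide()
        if self.parent is None:
            self._finalize()        # top level: make the tree permanent
        else:
            self.state = "provisional"

    def _finalize(self):
        self.state = "committed"
        for c in self.children:
            if c.state == "provisional":
                c._finalize()


top = NestedTxn()
a, b = NestedTxn(top), NestedTxn(top)
a1 = NestedTxn(a)
a1.commit()    # a1 commits provisionally
a.abort()      # a aborts, taking a1 with it
b.commit()     # b commits provisionally
top.commit()
print([t.state for t in (top, a, b, a1)])
# ['committed', 'aborted', 'committed', 'aborted']
```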

Locks

Transactions must be scheduled so that the execution effects on shared data are serially equivalent.

Servers can achieve serial equivalence of transactions by serializing object access.

A simple serialization mechanism is exclusive locking. The server attempts to lock every object a client accesses. If the object is already locked by another client's transaction, the request is suspended (or the lock request fails), and the object can be accessed only after it is unlocked.

  • Serial equivalence requires that a transaction's accesses to an object be serialized with respect to other transactions' accesses: all pairs of conflicting operations of two transactions must be executed in the same order.
    • Two-phase locking: a transaction may not acquire any new lock after it has released a lock. This guarantees serial equivalence.
    • In the first phase, the growing phase, the transaction keeps acquiring new locks.
    • In the second phase, the shrinking phase, the transaction releases its locks.
  • Strict two-phase locking: all locks acquired during a transaction are held until the transaction commits or aborts, and released only then.
  • Locks prevent other transactions from reading and writing the locked objects. When a transaction commits, to ensure recoverability, its locks must not be released until all updated objects have been written to permanent storage.
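
The two-phase and strict two-phase rules can be checked on a single transaction's lock trace. This sketch assumes a trace of `("lock", obj)` and `("unlock", obj)` steps, with the end of the transaction (commit or abort) marked by a hypothetical `("end", None)` step.

```python
def is_two_phase(trace):
    """No lock may be acquired after any lock has been released."""
    shrinking = False
    for action, _obj in trace:
        if action == "unlock":
            shrinking = True
        elif action == "lock" and shrinking:
            return False
    return True


def is_strict_two_phase(trace):
    """All unlocks happen only after the transaction ends."""
    ended = False
    for action, _obj in trace:
        if action == "end":
            ended = True
        elif action == "unlock" and not ended:
            return False
        elif action == "lock" and ended:
            return False
    return True


print(is_two_phase([("lock", "a"), ("lock", "b"), ("unlock", "a"), ("unlock", "b")]))  # True
print(is_two_phase([("lock", "a"), ("unlock", "a"), ("lock", "b")]))                   # False
print(is_strict_two_phase([("lock", "a"), ("end", None), ("unlock", "a")]))            # True
```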

Lock implementation
  • ID of the locked object
  • IDs of the transactions that currently hold the lock (a shared read lock can have several holders)
  • Type of lock (read or write)
  • All lock methods are synchronized, so only one thread at a time can change a lock's state
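
The fields above can be sketched in Python. Python has no `synchronized` keyword, so this assumes a `threading.Condition` as the equivalent of a Java monitor: acquire and release run one at a time, and waiting requests are woken on release. The class and method names are illustrative.

```python
import threading


class Lock:
    """One lock per object, holding the fields listed above."""

    def __init__(self, obj_id):
        self.obj_id = obj_id
        self.holders = set()    # IDs of transactions holding the lock
        self.lock_type = None   # None, "read" or "write"
        self._cond = threading.Condition()

    def _conflicts(self, trans_id, lock_type):
        others = self.holders - {trans_id}
        if not others:
            return False
        # Read locks are shared; any combination involving a write conflicts.
        return lock_type == "write" or self.lock_type == "write"

    def acquire(self, trans_id, lock_type):
        with self._cond:
            while self._conflicts(trans_id, lock_type):
                self._cond.wait()
            self.holders.add(trans_id)
            # Keep an existing write lock; otherwise set or promote the type.
            if self.lock_type != "write":
                self.lock_type = lock_type

    def release(self, trans_id):
        with self._cond:
            self.holders.discard(trans_id)
            if not self.holders:
                self.lock_type = None
            self._cond.notify_all()
```

Two readers can share the lock, while a writer waits until both have released it.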

Deadlock

A deadlock occurs when two transactions are each waiting for the other: each can acquire the lock it needs only when the other releases its lock, so neither can proceed.

Locking sub-items of an object (a finer lock granularity) instead of the whole object can make deadlocks less likely.

Timeouts are an effective way to resolve deadlocks: a transaction that waits too long for a lock is aborted, which releases the locks it holds.
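
A sketch of timeout-based deadlock resolution, using Python `threading.Lock` objects to stand in for object locks. T and U lock objects in opposite orders, and a barrier makes sure each holds its first lock, so they deadlock. The timeouts (0.1 s for T, 2.0 s for U) are arbitrary assumptions chosen so that T times out first, aborts, and lets U finish.

```python
import threading


def run_with_timeouts():
    a, b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    outcome = {}

    def transaction(name, first, second, timeout):
        with first:
            both_hold_first.wait()  # each now holds one lock: deadlock
            if second.acquire(timeout=timeout):
                second.release()
                outcome[name] = "committed"
            else:
                # Timed out: abort; leaving `with` releases `first`.
                outcome[name] = "aborted"

    t1 = threading.Thread(target=transaction, args=("T", a, b, 0.1))
    t2 = threading.Thread(target=transaction, args=("U", b, a, 2.0))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return outcome


print(run_with_timeouts())  # {'T': 'aborted', 'U': 'committed'}
```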

Increasing concurrency in locking schemes

Concurrency can be increased by basing locking rules on conflicts between read and write operations (so that reads can share a lock) and by applying locks at a finer granularity.

Approaches to increasing concurrency

  • Two-version locking
    • Exclusive (commit) locks are not set until the transaction commits.
  • Hierarchic locks
    • Locks of mixed granularity are used.
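
Both schemes are defined by which lock requests must wait. The tables below encode the compatibility rules as they are usually presented for these schemes (for example in Coulouris et al., Distributed Systems: Concepts and Design): two-version locking with read, write and commit locks, and hierarchic locking with intention locks (I-read, I-write). The dictionary layout and `can_grant` are illustrative assumptions.

```python
# TABLE[requested][already_held_by_another] -> True (grant) or False (wait).
TWO_VERSION = {
    "read":   {"read": True,  "write": True,  "commit": False},
    "write":  {"read": True,  "write": False, "commit": False},
    # commit vs write/commit cannot arise (one writer per object); treat as wait.
    "commit": {"read": False, "write": False, "commit": False},
}

HIERARCHIC = {
    "read":    {"read": True,  "write": False, "I-read": True,  "I-write": False},
    "write":   {"read": False, "write": False, "I-read": False, "I-write": False},
    "I-read":  {"read": True,  "write": False, "I-read": True,  "I-write": True},
    "I-write": {"read": False, "write": False, "I-read": True,  "I-write": True},
}


def can_grant(table, requested, held_by_others):
    """Grant only if compatible with every lock other transactions hold."""
    return all(table[requested][held] for held in held_by_others)


print(can_grant(TWO_VERSION, "read", ["write"]))     # True: reads see committed version
print(can_grant(TWO_VERSION, "commit", ["read"]))    # False: commit waits for readers
print(can_grant(HIERARCHIC, "I-write", ["I-read"]))  # True
```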

Two-version locking and hierarchic locks are both still pessimistic, lock-based schemes. They increase concurrency by making fewer lock requests conflict, not by validating transactions optimistically.

Hierarchic locks arrange lockable items in a hierarchy of granularities (for example, a bank branch and the accounts within it). A transaction can lock a whole subtree with a single lock at the top, or lock individual items. Before locking an item, it sets an intention lock (intention-to-read or intention-to-write) on each of the item's ancestors, so that a transaction requesting a coarse lock higher up can detect conflicting fine-grained locks below.

Two-version locking lets one transaction write tentative versions of objects while other transactions continue to read the committed versions. It uses three kinds of lock: read, write, and commit. Read locks do not conflict with write locks, but only one transaction at a time may hold a write lock on an object. When a transaction commits, it converts its write locks into commit locks, which conflict with read locks, so the commit waits until other transactions have released their read locks on those objects. Reads are therefore delayed only while another transaction is committing.

Compared with plain read/write locking, both schemes allow more concurrency, but at a cost. In two-version locking, other transactions' read locks can delay a transaction's commit. Hierarchic locks reduce the number of locks needed when a transaction accesses many items, but their compatibility tables and lock-promotion rules are more complex.

Optimistic concurrency control

Disadvantages of locking:
  • Lock maintenance adds overhead.
  • Locking can cause deadlocks.
  • Reduced potential concurrency: to avoid cascading aborts, locks cannot be released until the end of the transaction.

Timestamp ordering

In timestamp-based concurrency control, each operation of a transaction is validated before it is carried out, and each transaction has a unique timestamp.

If validation fails, the transaction is aborted immediately, and the client may then start a new transaction.

A transaction's timestamp defines its position in the serial order of transactions. On its own, however, this does not handle the additional issues of distributed transactions, such as generating timestamps that are unique across servers.
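
A sketch of the basic timestamp-ordering checks, assuming positive integer timestamps and per-object records of the largest read timestamp and the last write timestamp. This is a simplified textbook variant: it works directly on object values rather than tentative versions and does not delay reads of uncommitted data, so it is not strict.

```python
class TimestampOrdering:
    """Basic timestamp-ordering validation (simplified sketch)."""

    def __init__(self):
        self.read_ts = {}   # obj -> largest timestamp that has read it
        self.write_ts = {}  # obj -> timestamp of its last write (0 = none)

    def read(self, ts, obj):
        if ts < self.write_ts.get(obj, 0):
            return False  # a later transaction already wrote obj: abort
        self.read_ts[obj] = max(self.read_ts.get(obj, 0), ts)
        return True

    def write(self, ts, obj):
        if ts < self.read_ts.get(obj, 0) or ts < self.write_ts.get(obj, 0):
            return False  # would invalidate a later read or write: abort
        self.write_ts[obj] = ts
        return True


to = TimestampOrdering()
print(to.read(2, "x"))   # True
print(to.write(1, "x"))  # False: transaction 2 already read x
print(to.write(3, "x"))  # True
print(to.read(2, "x"))   # False: transaction 3 already wrote x
```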


Source: blog.csdn.net/weixin_38233104/article/details/130967209