Comparison of memory database analysis and mainstream products (1)

Author: Chen / Big Data Laboratory
On August 26, Star Central invited Professor Gong Xueqing, a doctoral supervisor at the Software Engineering Institute of East China Normal University, to deliver the "Cutting-Edge Database Technology" lecture series, sharing the frontier developments and research hotspots of databases in industry. This article shares the content of the first lecture of Gong Xueqing's series: the development of in-memory database technology.

— Disk-based database management system —

Traditional database management systems (DBMSs) are typically disk-based by design. Early DBMSs were constrained by the hardware resources of their time: a single CPU, a single core, and a small amount of available memory. Putting the entire database in memory was unrealistic, so it could only reside on disk. Because disk is a very slow storage device relative to the speed of the CPU, the DBMSs developed by academia and industry had to adapt their architectures to the hardware conditions of the time. DBMSs still in wide use today, such as Oracle and MySQL, retain this architectural design.

With the development of technology, memory has become cheaper and larger; a single machine can now be configured with hundreds of gigabytes or even terabytes of memory. For many database applications, such a configuration is enough to load all business data into memory. Although big data workloads may process petabytes, that data is generally unstructured; the scale of structured data is usually not especially large. For example, 10 to 20 years of a bank's transaction data may amount to only tens of terabytes. If structured data at this scale is placed in a disk-based DBMS, large-scale SQL queries and transaction processing are limited by disk I/O performance, and in many cases the database system becomes the performance bottleneck of the entire application.

If we configure enough memory on the database server, can we keep the original architecture and simply load all the structured data into the memory buffer to solve the database system's performance problem? Although this approach improves performance to a certain extent, the logging mechanism and the flushing of updated data are still limited by disk read/write speed, so it falls far short of exploiting the advantages of a large-memory system. In-memory database management systems and traditional disk-based database management systems therefore still differ markedly in architectural design and memory usage.

— Buffer management mode —

In a traditional database management system, the primary storage medium for data is disk. A logical table is usually mapped to a file on disk, and the file is stored as a sequence of data blocks (also called pages). For structured data, a record is saved in a data block on disk, and its exact location is identified by a data block ID plus an offset. A data block in this form is called a Slotted Page: as the name implies, the block is divided into many slots, and a record is placed in one of them. When processing a record, the system locates it via its address, Page ID + Offset. It reads the data block containing the record from disk into the buffer (the Buffer Pool is divided into multiple frames, each of which can hold one disk block), then reads the record from the buffer into the working area of the thread or transaction. After processing, the updated record is written back to the data block in the buffer, and the DBMS later writes the modified block back to disk.
Data access example in a disk-based database management system
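The Page ID + Offset addressing described above can be sketched as follows. This is a minimal illustration; the `SlottedPage` class and its methods are invented for the example rather than taken from any particular DBMS.

```python
# A minimal sketch of a Slotted Page, with invented names.

class SlottedPage:
    """A data block divided into slots; a record is addressed by
    (Page ID, Offset) - the record identifier used by disk-based DBMSs."""

    def __init__(self, page_id, capacity=4):
        self.page_id = page_id
        self.slots = [None] * capacity  # each slot may hold one record

    def insert(self, record):
        for offset, slot in enumerate(self.slots):
            if slot is None:
                self.slots[offset] = record
                return (self.page_id, offset)  # the record's address
        raise RuntimeError("page full")

    def get(self, offset):
        return self.slots[offset]

page = SlottedPage(page_id=7)
rid = page.insert({"id": 1, "name": "alice"})
print(rid)             # (7, 0): Page ID + Offset
print(page.get(rid[1]))
```

A real slotted page also stores variable-length records and a slot directory, but the addressing scheme is the same: the pair (Page ID, Offset) uniquely locates a record on disk.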
In a disk-based database management system, the entire index is usually loaded into memory when processing a query, and an index node in a B+ tree is typically the size of one data block. Each indexed key has a corresponding entry in an index leaf node, and that entry contains the storage location (Page ID + Offset) of the record for that key. When a data block is loaded into the in-memory buffer, the DBMS maintains the translation between the Page ID + Offset address and the memory buffer address through a Page Table structure. When accessing data, the system first looks up the Page ID + Offset in the Page Table; if it is not found, the record is still on disk, and the data block must first be read into the buffer and the address mapping recorded in the Page Table. Concretely, the DBMS first looks for an available frame in the buffer; if there is none, it selects a victim page to evict according to the buffer replacement algorithm. If a dirty page (Dirty Page) is chosen for replacement, a Latch must be placed on that frame to ensure it is not accessed by other transactions during the replacement (Latches are introduced below). After the dirty page is written back to disk, the system reads the target data block into that frame, records its buffer address in the Page Table to maintain the address mapping, and finally releases the Latch on the frame.
Memory address mapping in a traditional DBMS
For a traditional disk-based DBMS, even if the memory buffer is large enough to hold all the data, the address mapping and translation involved in accessing data still exist; only the overhead of loading data blocks from disk disappears. This is one of the important reasons why, even with all data loaded into memory, a disk-based DBMS still performs far worse than an in-memory database.
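The Page Table lookup and dirty-page replacement described above can be sketched as follows. This is a simplified, single-threaded illustration with invented names (`BufferPool`, `fetch`); a real DBMS would add latching and a proper replacement policy such as LRU or clock.

```python
# Sketch of the Page Table mapping (page_id -> buffer frame) and
# eviction with write-back of dirty pages. All names are illustrative.

class BufferPool:
    def __init__(self, num_frames=2):
        self.frames = [None] * num_frames   # each frame holds one page
        self.dirty = [False] * num_frames
        self.page_table = {}                # page_id -> frame index

    def _read_from_disk(self, page_id):
        return {"page_id": page_id, "records": []}  # stand-in for disk I/O

    def _write_to_disk(self, page):
        pass                                        # stand-in for disk I/O

    def fetch(self, page_id):
        # 1. Hit: the page is already mapped to a memory address.
        if page_id in self.page_table:
            return self.frames[self.page_table[page_id]]
        # 2. Miss: find a free frame, or evict a victim (write back if dirty).
        try:
            victim = self.frames.index(None)
        except ValueError:
            victim = 0  # a real DBMS would use LRU/clock here
            old = self.frames[victim]
            if self.dirty[victim]:
                self._write_to_disk(old)    # flush the dirty page first
            del self.page_table[old["page_id"]]
        self.frames[victim] = self._read_from_disk(page_id)
        self.dirty[victim] = False
        self.page_table[page_id] = victim   # maintain the address mapping
        return self.frames[victim]
```

Every access goes through `page_table`; this indirection is exactly the translation overhead that an in-memory database avoids by using memory addresses directly.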

In summary, an important difference between the implementation of a disk-based DBMS and an in-memory database is that, when accessing data, a disk-based DBMS must translate the data's on-disk address into a memory address through an address mapping, whereas an in-memory database is designed to use the data's memory address directly.

— Guaranteeing transaction ACID properties —

A database management system must guarantee the ACID properties of transactions under concurrent access, that is, atomicity, consistency, isolation, and durability. These properties are realized mainly by two mechanisms in the DBMS: one is concurrency control, and the other is the logging/recovery mechanism.

  • Concurrency control

Most traditional disk-based DBMSs adopt lock-based pessimistic concurrency control: a transaction locks data before accessing it and unlocks it when finished. If another transaction's access conflicts, it must wait until the lock-holding transaction releases the lock. A traditional DBMS generally maintains a separate in-memory data structure, the Lock Table, to store all locks, managed uniformly by the Lock Manager module; the locks and the data in the buffer are thus stored and managed separately. When a transaction accesses data, it first requests the corresponding lock from the Lock Manager and then accesses the data; after execution completes, it releases the lock through the Lock Manager. The Lock Manager ensures that all lock acquisitions and releases follow the strict two-phase locking protocol (Strict 2PL). Note that the overhead of the concurrency control mechanism is not directly related to the user's actual business processing; it is additional overhead incurred to guarantee transaction consistency and isolation.
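A minimal sketch of a separately maintained Lock Table under strict 2PL might look like the following. It is single-threaded and uses invented names (`LockManager`, `acquire`, `release_all`); a real lock manager would also support shared/exclusive modes, wait queues, and deadlock detection.

```python
# Sketch of a Lock Table managed apart from the data, with strict 2PL:
# locks are acquired as the transaction runs and released together only
# at commit or abort. All names are illustrative.

class LockManager:
    def __init__(self):
        self.lock_table = {}        # resource -> owning transaction id

    def acquire(self, txn_id, resource):
        owner = self.lock_table.get(resource)
        if owner is not None and owner != txn_id:
            return False            # conflict: caller must wait for owner
        self.lock_table[resource] = txn_id
        return True

    def release_all(self, txn_id):
        # Strict 2PL: every lock is held until commit/abort.
        for res in [r for r, t in self.lock_table.items() if t == txn_id]:
            del self.lock_table[res]

lm = LockManager()
assert lm.acquire("T1", "row:42")
assert not lm.acquire("T2", "row:42")   # T2 conflicts and must wait
lm.release_all("T1")                     # T1 commits, releasing its locks
assert lm.acquire("T2", "row:42")        # now T2 can proceed
```

The point to notice is that `lock_table` lives entirely apart from the data it protects, which is exactly the separation the paragraph above describes.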

In-memory databases also need locks when accessing data, but unlike in disk-based DBMSs, the locks are stored together with the data in memory, usually in the header of the data record. Why does a disk-based DBMS keep lock information in a separate Lock Table while an in-memory database can store it with the data? Because in a disk-based DBMS, a data block may be evicted from the memory buffer to disk. If the lock information were stored with the data, then once the block was evicted, the Lock Manager and all transactions would lose access to the lock information for that data. A traditional disk-based DBMS must therefore maintain its locks separately in memory, resident at all times and never evicted. For an in-memory database, this scenario does not exist.

In fact, a database management system has two locking mechanisms, called Lock and Latch, both of which protect data consistency against concurrent access. The Lock mechanism protects the logical content of the database. It is generally held for a long time, usually the entire execution of a transaction, and it must support transaction rollback, undoing the transaction's modifications to the data. The Latch mechanism ensures that specific in-memory data structures are not corrupted by concurrent access; for example, in multithreaded programming, inserting into or deleting from a shared queue requires a Latch so that the operation is not interfered with by other threads. A Latch is held only for the duration of the operation and released when the operation completes; it does not need to support rollback of data modifications.
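The short-duration, operation-scoped nature of a Latch can be illustrated with an ordinary mutex protecting a shared structure; this sketch uses Python's `threading.Lock` purely as a stand-in for a latch, with invented function names.

```python
# A latch is essentially a short-held mutex guarding an in-memory
# structure for the span of one operation - no transactions, no rollback.

import threading

items = []
latch = threading.Lock()    # plays the role of the "latch"

def append_range(start, count):
    for i in range(start, start + count):
        with latch:          # held only for this single operation
            items.append(i)

threads = [threading.Thread(target=append_range, args=(n * 100, 100))
           for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(items))  # 400: no updates lost under concurrent access
```

Contrast this with a Lock in the database sense, which would be held by a transaction across many such operations and released only at commit.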

Therefore, when a traditional DBMS operates on a page in the buffer, it must acquire a Latch; when it modifies the content of the database, it must acquire a Lock, which is maintained and managed separately in the Lock Table. The figure below gives a simple comparison of Lock and Latch.
Comparison of Lock and Latch
  • Logging and Recovery

In a database management system, the logging and recovery mechanism uses logs to guarantee the atomicity and durability of transactions. Atomicity means that all operations in a transaction either all succeed or are all undone; if execution stops halfway, the transaction can be rolled back according to the log. Durability means that if data is lost, it can be recovered from the log.

The most important concept in the logging and recovery of a traditional DBMS is WAL (Write-Ahead Logging). WAL means that every update in the system has a corresponding log record, and the modified data must not be written to disk before its log is. Each log record has an LSN (Log Sequence Number), and LSNs increase monotonically; log writing is continuous, sequential writing to disk. However, if the system strictly flushed each operation's data update to disk immediately after that operation's log was flushed, performance would suffer greatly. Most DBMSs therefore adopt a Steal + No Force buffer management policy. Steal means the DBMS may flush the updates of uncommitted transactions to disk without waiting for the transactions to commit, which improves the flexibility and performance of flushing; if a crash occurs while a transaction is uncommitted, its updates may already be on disk and must then be rolled back using the undo information in the log. No Force means that after a transaction commits, its updates may remain in the memory buffer rather than being written to disk immediately, so that they can later be written out in one batch merged with the updates of other transactions, giving the system room for optimization. The risk of No Force is that if a transaction has committed but its updates have not yet reached disk when a crash occurs, the updates still in memory are lost; they must then be redone from the logs already on disk (a transaction may commit only after all of its log records have been flushed).
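The WAL rule and the Steal policy can be sketched conceptually as follows. This is an invented, minimal model (`WAL`, `can_flush_page`), not the implementation of any real system.

```python
# Conceptual sketch of WAL: every update first appends a log record with
# a monotonically increasing LSN; a data page may be flushed (Steal)
# only after all log records covering its updates are on disk.

class WAL:
    def __init__(self):
        self.next_lsn = 1
        self.buffer = []        # log records not yet on disk
        self.on_disk = []       # log records durably written
        self.flushed_lsn = 0    # highest LSN known to be on disk

    def append(self, txn_id, page_id, undo, redo):
        rec = {"lsn": self.next_lsn, "txn": txn_id,
               "page": page_id, "undo": undo, "redo": redo}
        self.next_lsn += 1
        self.buffer.append(rec)
        return rec["lsn"]

    def flush(self):
        # Sequential append of buffered records to the on-disk log.
        self.on_disk.extend(self.buffer)
        if self.buffer:
            self.flushed_lsn = self.buffer[-1]["lsn"]
        self.buffer.clear()

    def can_flush_page(self, page_last_lsn):
        # The WAL rule: the log goes to disk before the data does.
        return page_last_lsn <= self.flushed_lsn

wal = WAL()
lsn = wal.append("T1", page_id=7, undo="x=1", redo="x=2")
assert not wal.can_flush_page(lsn)   # data must not reach disk yet
wal.flush()                          # log record is now durable
assert wal.can_flush_page(lsn)       # Steal is now permitted
```

Note that each record carries both undo and redo information; the next section explains why an in-memory database can drop the undo part.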

With WAL and Steal + No Force, a disk-based DBMS gains maximum flexibility to optimize disk I/O. But for an in-memory database, where all data lives in memory, is this mechanism still needed? What is clear is that an in-memory database still needs logging, but unlike a disk-based DBMS, its log records only the information required for redo operations, not the information required for undo (it is worth thinking about why). In addition, an in-memory database does not log index updates, only updates to the base tables, so much less content needs to be written to disk during logging. When an in-memory database fails and must recover, it first restores the base tables from the checkpoint data and logs saved on disk, and then rebuilds the indexes in memory.
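The recovery path just described (checkpoint + redo log for the base table, then index rebuild in memory) can be sketched as follows; the function and field names are invented for illustration.

```python
# Sketch of redo-only recovery in an in-memory database: restore the
# base table from the checkpoint plus the redo log, then rebuild the
# index in memory. No undo information is needed because only
# committed changes appear in the persisted log.

def recover(checkpoint, redo_log):
    table = dict(checkpoint)                  # 1. load checkpointed rows
    for rec in redo_log:                      # 2. replay committed redo records
        table[rec["key"]] = rec["value"]
    index = {v["name"]: k for k, v in table.items()}  # 3. rebuild index in memory
    return table, index

checkpoint = {1: {"name": "alice"}, 2: {"name": "bob"}}
redo_log = [{"key": 2, "value": {"name": "bob2"}},
            {"key": 3, "value": {"name": "carol"}}]
table, index = recover(checkpoint, redo_log)
print(table[3])            # {'name': 'carol'}
print(index["bob2"])       # 2
```

Because the index is derived entirely from the base table, logging it would be redundant; rebuilding it in memory after a crash is cheap compared with writing index updates to disk during normal operation.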

— Disk-oriented DBMS performance overhead —

A SIGMOD 2008 paper ("OLTP Through the Looking Glass, and What We Found There") analyzed the performance overhead of a disk-oriented database by breaking down the cost of the entire system. Taking the total cost of processing a transaction as 100%, it found that less than 7% of the resources actually go to business logic; 34% go to buffer management (buffer loading and replacement, address translation, and so on); 14% to latching; 16% to locking; 12% to logging; and the remaining 16% to B-tree index operations. In other words, even with the machine running at full capacity, only about 7% of the work is actual business logic.
Disk database system performance overhead
So can the expensive parts be removed to increase the share of resources spent on business logic? If the database is single-user, there is no concurrent contention, and the overhead of locking and latching can be saved. There have also been single-threaded designs in history, such as dividing the database into multiple partitions, each processed by one thread. But such schemes have an obvious drawback: each partition is processed serially, so a long-running transaction blocks all subsequent transactions until it finishes. Moreover, for disk-oriented systems, disk I/O is the bottleneck in large-scale transaction processing; with single-threaded execution, the CPU sits idle while data is read from disk. For in-memory databases, all data is in memory and disk I/O is no longer the main bottleneck, so the techniques used differ greatly from before. Of course, various approaches have been tried over the course of this development; some did not suit the realities of their time and were gradually forgotten.

As can be seen, a disk-based database management system performs a great deal of additional management work. This work does not process business logic, but it is indispensable for guaranteeing the correctness of business logic. For in-memory databases, the question is what optimizations to apply to achieve the best performance. Compared with a disk-based system, the primary storage of an in-memory database is memory, but disk is still needed for checkpoints and logging; in the event of a failure, the checkpoint data and logs on disk are used to restore the entire in-memory database.

— Historical development of memory database technology —

The development of in-memory databases can be roughly divided into three stages: the decade from 1984 to 1994, the decade from 1994 to 2005, and from 2005 to the present. In the first stage, memory-oriented processing techniques appeared; in the second, some in-memory database systems appeared; and the third stage is the scenario we face now.

  • 1984-1994

From 1984 to 1994, academia proposed many ideas for in-memory data management, such as memory buffers large enough to hold all data, and optimization techniques such as group commit and fast commit. Memory-oriented data access methods were also proposed that no longer use Page ID + Offset as disk-based DBMSs do, but instead use memory addresses directly in all data structures. This period also produced the memory-oriented T-tree index structure, and designs that divide the system into multiple processing engines by function, some dedicated to transaction processing and some to recovery, which is equivalent to having two cores, one responsible for transactions and the other for log processing. There were also partition-based main-memory databases that divide the database into many partitions, each corresponding to one core (or node), with no contention between processes. Clearly, database research in this period was already exploring which techniques become possible once all data is in memory. Due to the hardware conditions of the time, however, these techniques were not applied at scale.


  • 1994-2005

From 1994 to 2005, some commercial in-memory database systems appeared, such as Dali, developed by Bell Labs, and Smallbase, the predecessor of Oracle TimesTen. Systems optimized for multi-core hardware also appeared, such as P*-Time (now the transaction processing engine of SAP HANA). Lock-free implementation techniques, that is, lock-free programming techniques and data structures, began to be applied to in-memory database systems during this period.
  • Summary of the first two stages

The technologies of the first two stages can be roughly divided into the following categories:
    1. Eliminating the indirection of the Buffer Pool: replace indirect access with direct memory-address access; index leaf nodes no longer store Page ID + Offset but the memory address itself.
    2. Data partitioning: divide the data so that no concurrent access control is performed within a partition.
    3. Lock-free and cache-conscious structures: whereas a disk-oriented DBMS stores an index node in a data block, an index node in an in-memory database is the length of one or a few cache lines.
    4. Coarse-grained locks: lock a whole table or partition at a time instead of a single record. This technique is now rarely used, because access contention is intense in multi-core scenarios, and coarse-grained locks reduce the degree of concurrency.
    5. Functional partitioning: divide the system by function, with each thread responsible for a specific function. (Now rarely used.)
    DBMS historical technology summary

— The modern development of database systems —

In the current environment, hardware has three basic characteristics: 1. large and cheap memory; 2. multi-core CPUs (growth has shifted from clock frequency to core count); 3. multi-socket machines, that is, multiple multi-core CPUs, meaning the achievable degree of concurrency keeps rising. These are the conditions that database system research and development now face.
In modern hardware environments, CPU and disk I/O are no longer the main bottlenecks for in-memory databases. Optimization techniques are therefore currently considered mainly from the following perspectives:

  • Remove the traditional buffer mechanism: the traditional buffer mechanism does not apply in an in-memory database, and locks and data no longer need to be stored in two separate places. Concurrency control is still required, but with strategies different from traditional lock-based pessimistic concurrency control.
  • Minimize runtime overhead: disk I/O is no longer the bottleneck; the new bottlenecks lie in computing performance, function-call overhead, and similar costs, so runtime performance must be improved.
  • Adopt compiled execution: traditional databases mostly use the Volcano-model execution engine, in which each operator is implemented as an iterator providing three interfaces, Open (Initial), Get-Next, and Close, called from top to bottom. The call overhead of this engine is not a major share of cost in a disk-based DBMS, where disk I/O is the main bottleneck, but it can become the bottleneck in an in-memory database: reading one million records means one million calls, and performance becomes unbearable. This is why in-memory databases make heavy use of compiled execution, directly invoking compiled machine code with no runtime interpretation or indirect calls, which effectively improves performance.
  • Scalable high-performance index construction: although an in-memory database does not read data from disk, logs must still be written to disk, and log writing speed can fail to keep up. The log content can be reduced, for example by removing undo information and writing only redo information, and by logging only data rather than index updates. If the database system crashes, the indexes can be rebuilt concurrently in memory after the data is loaded from disk: as long as the base tables exist, the indexes can be rebuilt, and rebuilding them in memory is fast.
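The per-row call overhead of the Volcano model, and the fused loop that compiled execution effectively produces, can be contrasted in a small sketch; the operator classes here are invented for illustration, and Python generators stand in for the Get-Next interface.

```python
# Sketch contrasting the Volcano iterator model (one Get-Next call per
# row, per operator) with a fused loop of the kind compiled execution
# would emit. Operator names are illustrative.

class Scan:
    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):                 # plays the role of Get-Next
        yield from self.rows

class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def __iter__(self):
        for row in self.child:          # one indirect call per row
            if self.pred(row):
                yield row

rows = list(range(10))

# Volcano-style: the plan is a tree of iterators, rows pulled one by one.
volcano = list(Filter(Scan(rows), lambda r: r % 2 == 0))

# "Compiled" style: the operators fused into one tight loop, with no
# per-row interpretation or indirect-call overhead.
fused = [r for r in rows if r % 2 == 0]

assert volcano == fused == [0, 2, 4, 6, 8]
```

With ten rows the difference is invisible, but at a million rows the Volcano plan makes a million indirect calls per operator, which is precisely the overhead compiled execution eliminates.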
— Summary of this article —

This article introduced the main similarities and differences between disk-based database management systems and in-memory database management systems in several respects, as well as the technical development of in-memory databases from 1984 to the present. In later articles I will continue to share developments in in-memory database technology, introducing and comparing the implementation techniques of several mainstream in-memory database products from the perspectives of data organization, indexing, concurrency control, compiled queries, and persistence.

Note: Part of the material in this article comes from:

  1. The "Modern Main-Memory Database Systems" tutorial at the VLDB 2016 conference
  2. Professor Andy Pavlo's Advanced Database Systems course at CMU (Carnegie Mellon University)


Origin blog.51cto.com/15015752/2554353