The effect of NVM as main memory on database management systems

Implications of non-volatile memory as primary storage for database management systems
Abstract
Traditional database management systems store relational data on disk. Hard drives are cheap, durable, and large, but reading data from disk is very slow. To hide this latency, DRAM is used as an intermediary. DRAM is much faster than disk, but its capacity is small and it is not persistent. NVM is a new memory technology that offers large capacity, byte addressability, access speed comparable to DRAM, and non-volatility.
In this paper we examine the impact of NVM as main memory on relational database management systems, that is, how to modify a traditional relational DBMS to take full advantage of NVM. We modify the PostgreSQL storage engine to suit NVM and describe the modifications and their challenges in detail. Finally, we evaluate the changes on a comprehensive simulation platform. The results show that, compared with native PG storing data on disk, the modified PG reduces query time by up to 40%; compared with native PG storing data on NVM, it reduces query time by up to 14.4%. The average reductions are 20.5% and 4.5%, respectively.
Introduction
Database management systems generally use a memory-plus-disk architecture, and data sets are ultimately persisted to disk. Disk is cheap and non-volatile, which makes it suitable for large-scale data storage. However, reading data from disk takes a long time. To reduce data access latency, DRAM is placed between the CPU and the disk as an intermediate storage medium. DRAM access is orders of magnitude faster than disk access. In addition, as DRAM chip density rises and prices fall, systems with large memories have become more common.
For these reasons, memory-based relational databases are becoming increasingly popular. Important components of a relational database, such as index structures, recovery mechanisms, and commit processing, have been tailored to main memory as the storage medium. But when handling critical or non-redundant data, relational databases still need a persistent storage medium, such as a large pool of disks.
DRAM is a major factor in the energy efficiency of database servers: when a database executes a query, 59% of the power is consumed by main memory. In addition, inherent physical limits related to leakage current and voltage restrict further DRAM scaling. As a main memory medium, DRAM therefore cannot keep up with current and future data sets.
NVM is a new class of storage hardware that combines some characteristics of disk and DRAM. Prominent NVM technologies include PC-RAM, STT-RAM, and R-RAM. Because NVM is persistent at the device level, it needs no refresh cycles to retain data the way DRAM does, so it consumes less energy per bit than DRAM. In addition, NVM has lower latency than hard disk, with read latency comparable to DRAM; it is byte addressable; and it is denser than DRAM.
To realize the benefits of NVM, DBMS design must take its characteristics fully into account. The simplest approach is to use NVM as a drop-in disk replacement and gain performance from its low latency. However, adapting a DBMS to NVM involves far more than exploiting low latency.
This paper studies how to deploy NVM in DBMS design. We first discuss how to include NVM in the memory hierarchy of current systems, and then modify the PostgreSQL storage engine to maximize the benefits of NVM. Our aim is to bypass the slow disk interface while preserving the robustness of the DBMS.
We evaluated the two modified PG storage engines using a simulation platform and the TPC-H benchmark. We also tested unmodified PG on SSD and on NVM. The results show that the modified storage engines reduce kernel execution time, where file I/O takes place, with average figures ranging from 2.6% to 10%. The modified PG improves performance by 20.5% relative to PG on hard disk and by 4.5% relative to PG on NVM. We also identify the performance bottleneck of the modified PG: because data on NVM is accessed directly, it is not close to the CPU when a query needs it. When the data is not in the user-level cache, the resulting long stalls hide the benefits of the new hardware.

Background
This section describes the characteristics of NVM technology and their impact on DBMS design. It then introduces system software for managing NVM.
1, NVM characteristics
Data access latency: NVM read latency is much lower than that of disk. Because NVM is still under development, different sources report different latencies. STT-RAM latency is 1-20 ns. Even so, its latency is very close to that of DRAM.
PC-RAM and R-RAM have higher write latency than DRAM. Write latency is less critical, however, because it can be mitigated with buffers.
Density: NVM is denser than DRAM and can serve as an alternative main memory, especially in embedded systems. For example, PC-RAM offers two to four times the capacity of DRAM and scales easily.
Durability: the maximum number of writes each memory cell can sustain. The most competitive technologies are PC-RAM and STT-RAM, which offer durability close to that of DRAM. More precisely, NVM endurance is 10^15 writes, versus 10^16 for DRAM. NVM also has greater endurance than flash memory.
Energy consumption: unlike DRAM, NVM does not need periodic refresh to retain data in memory, so it consumes less energy. PC-RAM consumes significantly less energy than DRAM; the other technologies are closer to DRAM.
In addition, NVM is byte addressable and persistent. Intel and Micron have launched 3D XPoint, and Intel has developed new instructions to support the use of persistent memory.
2, NVM system software
When NVM is used as main memory, system software as well as application software must change in order to take full advantage of it. Traditional file systems access the storage medium through the block layer. If NVM simply replaces the disk with no other changes, reads and writes to NVM still go through the block layer, and the byte addressability of NVM goes unused.
File system support for persistent memory has therefore made some progress. PMFS is an open-source POSIX file system developed by Intel. It provides two key features that make NVM easier to use.
First, PMFS does not maintain a separate address space for NVM. In other words, NVM and main memory share a unified address space. This means data does not have to be copied from NVM to DRAM before an application can access it; a process can access data on NVM directly at byte granularity.
Second, traditional databases access files in two ways: file I/O and memory-mapped I/O. PMFS implements file I/O in much the same way as a traditional file system. Its memory-mapped I/O, however, is different. A traditional file system first copies the pages into DRAM. PMFS skips this step and maps the pages directly into the address space of the process. Figure 1 compares PMFS with a traditional file system.
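The difference between the two access paths can be sketched in C. This is a minimal illustration on an ordinary file, not PMFS code: the function names and `make_demo_file` are hypothetical helpers, and on a regular file system `mmap()` still goes through the page cache, whereas the text above describes PMFS mapping NVM pages directly.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a file of `len` bytes where byte i holds i % 251. */
int make_demo_file(const char *path, size_t len) {
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0) return -1;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = (unsigned char)(i % 251);
        if (write(fd, &b, 1) != 1) { close(fd); return -1; }
    }
    return close(fd);
}

/* File I/O path: read() makes the kernel copy the byte into a caller buffer. */
int byte_via_file_io(const char *path, off_t off) {
    unsigned char b;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (lseek(fd, off, SEEK_SET) == (off_t)-1) { close(fd); return -1; }
    ssize_t n = read(fd, &b, 1);
    close(fd);
    return n == 1 ? b : -1;
}

/* Memory-mapped path: the file is addressed like ordinary memory,
 * so a single byte can be loaded without a read() call. */
int byte_via_mmap(const char *path, off_t off) {
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) < 0 || off < 0 || off >= st.st_size) { close(fd); return -1; }
    unsigned char *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (base == MAP_FAILED) return -1;
    int b = base[off];
    munmap(base, (size_t)st.st_size);
    return b;
}
```

Both functions return the same byte; what differs is who moves the data. In the first, the kernel copies into the caller's buffer on every call; in the second, the process dereferences a pointer into the mapping.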
Design choices
This section discusses possible memory hierarchy designs for systems that include NVM, and how a disk-oriented DBMS should be modified to take full advantage of NVM.
1, Memory hierarchy designs for an NVM-based DBMS
There are several ways to place NVM in the memory hierarchy of a current DBMS. Figure 2 shows three common ways to use NVM. Figure 2a is the conventional design in use today: intermediate state, including the log, the data cache, and partial query state, is kept in DRAM, while the main data is stored on disk.
Given its characteristics, NVM could replace both DRAM and disk, as shown in Figure 2b. However, such a change requires redesigning current operating systems and application software. Moreover, NVM technology is not yet mature enough in terms of durability to substitute for DRAM. We therefore advocate a platform that still contains DRAM, with the disk replaced wholly or partly by NVM, as shown in Figure 2c (NVM-Disk).
In this design the DRAM layer remains in the system, so temporary data structures and application code benefit from fast DRAM reads and writes. The database system accesses its data through the PMFS file system, and the byte addressability of NVM avoids the API overhead of a traditional file system. This approach does not require a large amount of DRAM, because the volume of temporary data is small. We consider this a reasonable way to integrate NVM: NVM sits next to DRAM, which stores temporary data structures, and traditional disks can optionally be kept for cold data.
2, What to change in a traditional DBMS
When a traditional disk-oriented database system is deployed directly on NVM, it cannot fully exploit the benefits of the new hardware. When NVM is the primary storage medium, important DBMS components need to be changed or removed.
Avoid block-level access: a traditional DBMS uses disk as its primary storage medium. Because sequential disk access is faster, data is read in blocks to amortize disk access latency.
Unfortunately, accessing data in blocks incurs extra data movement. For example, if a transaction updates one byte of a record, the entire block still has to be flushed to disk. On the other hand, block-level access provides better data prefetching. Because NVM is byte addressable, data can be accessed in bytes. However, reducing the access granularity to a single byte forgoes that prefetching effect. A better approach balances the advantages of both.
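The cost of the one-byte update can be made concrete with a small calculation. The sketch below assumes an 8 kB block (PostgreSQL's default page size) and ignores the fact that real NVM hardware writes at cache-line rather than single-byte granularity; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 8192u /* assumed block size: PostgreSQL's default page is 8 kB */

/* Bytes physically written when `dirty` bytes starting at `offset` change
 * and the medium accepts only whole blocks. */
size_t bytes_written_block(size_t offset, size_t dirty) {
    if (dirty == 0) return 0;
    size_t first = offset / BLOCK_SIZE;
    size_t last = (offset + dirty - 1) / BLOCK_SIZE;
    return (last - first + 1) * BLOCK_SIZE;
}

/* Bytes written when the medium is byte addressable. */
size_t bytes_written_byte(size_t offset, size_t dirty) {
    (void)offset; /* position does not matter at byte granularity */
    return dirty;
}
```

For a one-byte update at offset 100, the block path writes 8192 bytes and the byte path writes 1. An update that straddles a block boundary, such as 2 bytes at offset 8191, costs two full blocks.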
Remove the DBMS internal buffer cache: a DBMS typically maintains an internal buffer cache. To access a record, it first computes the record's disk address. If the corresponding data block is not in the buffer cache, it must be read from disk into the buffer cache.
An NVM-based database does not need this mechanism. If the NVM address space is visible to the process, blocks no longer need to be copied, and accessing records directly on NVM is more efficient. However, this requires operating system support for NVM, such as PMFS, which exposes NVM directly in the process address space.
Remove the redo log: to guarantee the ACID properties of the database, a DBMS needs two logs, undo and redo. The undo log rolls back uncommitted transactions; the redo log replays transactions that have committed but whose data has not yet been written to disk. If an NVM-based DBMS has no internal buffer cache and all writes go directly to NVM, the redo log is no longer needed, but the undo log still is.
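Why the undo log survives while the redo log does not can be illustrated with a toy transaction over an in-place store. This is a sketch under strong assumptions, not the paper's mechanism: `store` is a plain array standing in for a mapped NVM region, all names are hypothetical, and a real system would also have to persist each undo record (with cache-line flushes and fences) before the in-place write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNDO_CAPACITY 16 /* max writes per transaction in this sketch */
#define MAX_FIELD 32     /* max bytes per write in this sketch */

/* One undo record: where the write went and what was there before. */
struct undo_rec { size_t off; size_t len; unsigned char old[MAX_FIELD]; };

struct txn {
    unsigned char *store; /* stands in for a mapped NVM region */
    struct undo_rec log[UNDO_CAPACITY];
    size_t nrec;
};

void txn_begin(struct txn *t, unsigned char *store) {
    t->store = store;
    t->nrec = 0;
}

/* Save the old bytes, then update in place. Because the write lands directly
 * in the persistent store, there is nothing left to redo after commit. */
int txn_write(struct txn *t, size_t off, const void *src, size_t len) {
    if (t->nrec == UNDO_CAPACITY || len > MAX_FIELD) return -1;
    struct undo_rec *r = &t->log[t->nrec++];
    r->off = off;
    r->len = len;
    memcpy(r->old, t->store + off, len);
    memcpy(t->store + off, src, len);
    return 0;
}

/* Commit: the data is already in place, so the undo records are discarded. */
void txn_commit(struct txn *t) { t->nrec = 0; }

/* Abort: restore old values in reverse order of the writes. */
void txn_abort(struct txn *t) {
    while (t->nrec > 0) {
        struct undo_rec *r = &t->log[--t->nrec];
        memcpy(t->store + r->off, r->old, r->len);
    }
}
```

Restoring in reverse order matters when two writes overlap: the older record holds the original bytes and must be applied last.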
Case study: PostgreSQL
PostgreSQL is an open-source relational database that fully supports ACID and runs on all major operating systems, including Linux. In this section we study the PostgreSQL storage engine and modify it to suit NVM. We first introduce PG's read-write architecture and then explain the modifications.
1, PG read-write architecture
Figure 3a shows the architecture of file read operations in the original PG. The left column shows the operations executed by the PG software layers, and the right column shows the corresponding data movement. Note that the operating system uses PMFS. In Figure 3a, NVM replaces the disk for storing data.
PG performance depends heavily on file I/O for reading and writing data. Because PMFS exposes the same file I/O API as a traditional file system, PG needs no changes to use it.
The PG server calls Buffer Layer services to maintain the internal buffer cache, which holds the PG pages that are about to be accessed. If the buffer cache has no free slot for a page being read from disk, the replacement policy selects a page to evict from the buffer cache; if that page is dirty, it must first be flushed to disk.
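The buffer cache behavior described above (hit, miss into a free slot, eviction with a flush of dirty pages) can be sketched as follows. This is not PostgreSQL's buffer manager: the source does not name the replacement policy, so a simple round-robin victim choice stands in for it, the "disk" is an in-memory array, pages are shrunk to 64 bytes, and every identifier is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 64 /* tiny pages keep the sketch readable; PG uses 8 kB */
#define NPAGES 8          /* pages on the simulated disk */
#define NSLOTS 2          /* buffer cache slots */

struct slot { int page; int dirty; unsigned char data[DEMO_PAGE_SIZE]; };

struct pool {
    unsigned char disk[NPAGES][DEMO_PAGE_SIZE]; /* stands in for the storage device */
    struct slot slots[NSLOTS];
    int hand;    /* next eviction victim (round-robin stand-in policy) */
    int flushes; /* dirty pages written back, kept for inspection */
};

void pool_init(struct pool *p) {
    memset(p, 0, sizeof *p);
    for (int i = 0; i < NSLOTS; i++) p->slots[i].page = -1; /* -1 marks a free slot */
}

/* Return the slot holding `page`, loading it (and evicting if needed) on a miss. */
struct slot *pool_get(struct pool *p, int page) {
    if (page < 0 || page >= NPAGES) return NULL;
    for (int i = 0; i < NSLOTS; i++)
        if (p->slots[i].page == page) return &p->slots[i]; /* hit */
    struct slot *s = NULL;
    for (int i = 0; i < NSLOTS && s == NULL; i++)
        if (p->slots[i].page < 0) s = &p->slots[i]; /* miss: free slot available */
    if (s == NULL) { /* miss with a full cache: evict */
        s = &p->slots[p->hand];
        p->hand = (p->hand + 1) % NSLOTS;
        if (s->dirty) { /* a dirty victim is flushed before its slot is reused */
            memcpy(p->disk[s->page], s->data, DEMO_PAGE_SIZE);
            p->flushes++;
        }
    }
    memcpy(s->data, p->disk[page], DEMO_PAGE_SIZE);
    s->page = page;
    s->dirty = 0;
    return s;
}

/* Modify one byte of a page through the cache and mark the slot dirty. */
int pool_write(struct pool *p, int page, int off, unsigned char v) {
    struct slot *s = pool_get(p, page);
    if (s == NULL || off < 0 || off >= DEMO_PAGE_SIZE) return -1;
    s->data[off] = v;
    s->dirty = 1;
    return 0;
}
```

With two slots, writing to page 0, touching page 1, and then touching page 2 forces page 0 out, and the flush counter goes from 0 to 1 at that point.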
When PG receives a request to read a page from disk, the Buffer Layer finds a free slot in the buffer cache and obtains a pointer to it. In Figure 3a, Pg-Buffer and PgBufPtr are the free buffer slot and the corresponding pointer. The Buffer Layer passes this pointer to the File Layer. PG's File Layer then invokes the file read or write, which the file system carries out.
For a read, PMFS copies the data block from NVM into a kernel buffer, and the kernel then copies it into the free buffer cache slot that PgBufPtr points to. A write likewise takes two copies, in the opposite direction.
Thus, on a buffer cache miss, the native PG storage engine incurs two copies. When the data set is very large, this is a significant overhead. Because PMFS can map NVM addresses directly into memory, the storage engine can be modified to avoid the copy overhead. The modifications are described next.
2, SE1: using memory-mapped I/O
The first step toward exploiting NVM characteristics is to replace PG's File Layer with a new layer named the MemMapped Layer. As shown in Figure 3b, this layer still receives the pointer to a free buffer slot from the Buffer Layer. However, it uses the memory-mapped I/O interface of PMFS and no longer issues file I/O. This storage engine is called SE1.
Read: to read a file, first call open() to open it, and then use mmap() to map it into memory. With PMFS, mmap() returns a pointer to the file's mapping on NVM, so the application can access the file on NVM directly.
The requested page therefore no longer has to be copied into a kernel buffer. As shown in Figure 3b, memcpy() copies the requested page directly into the PG buffer. When the request is complete and the file no longer needs to be accessed, the file can be closed. After that, munmap() removes the mapping.
Write: similar to a read. First open the file to be changed, then map it with mmap(). Use memcpy() to copy the dirty data directly from the PG buffer to NVM.
With SE1, data does not have to be copied into a kernel buffer, which saves one copy.
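The SE1 read and write paths follow the open(), mmap(), memcpy(), close(), munmap() sequence described above, which can be sketched in C. This is an illustration on an ordinary file rather than the paper's MemMapped Layer: `se1_read_page`, `se1_write_page`, `map_file`, and `PG_PAGE_SIZE` are hypothetical names, and the `msync()` call is an addition for regular file systems that the text does not mention.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define PG_PAGE_SIZE 8192 /* PostgreSQL's default page size */

/* Open and map a whole file. Returns the base address, or NULL on error. */
static char *map_file(const char *path, int oflag, int prot, int *fd, size_t *len) {
    struct stat st;
    *fd = open(path, oflag);
    if (*fd < 0) return NULL;
    if (fstat(*fd, &st) < 0 || st.st_size == 0) { close(*fd); return NULL; }
    *len = (size_t)st.st_size;
    char *base = mmap(NULL, *len, prot, MAP_SHARED, *fd, 0);
    if (base == MAP_FAILED) { close(*fd); return NULL; }
    return base;
}

/* SE1 read: one memcpy from the mapping into the PG buffer slot. */
int se1_read_page(const char *path, size_t pageno, void *pg_buf) {
    int fd;
    size_t len;
    char *base = map_file(path, O_RDONLY, PROT_READ, &fd, &len);
    if (base == NULL) return -1;
    int ok = (pageno + 1) * PG_PAGE_SIZE <= len;
    if (ok) memcpy(pg_buf, base + pageno * PG_PAGE_SIZE, PG_PAGE_SIZE);
    close(fd);
    munmap(base, len);
    return ok ? 0 : -1;
}

/* SE1 write: one memcpy from the PG buffer slot into the mapping. */
int se1_write_page(const char *path, size_t pageno, const void *pg_buf) {
    int fd;
    size_t len;
    char *base = map_file(path, O_RDWR, PROT_READ | PROT_WRITE, &fd, &len);
    if (base == NULL) return -1;
    int ok = (pageno + 1) * PG_PAGE_SIZE <= len;
    if (ok) {
        memcpy(base + pageno * PG_PAGE_SIZE, pg_buf, PG_PAGE_SIZE);
        msync(base, len, MS_SYNC); /* assumption: flush to the file on a regular FS */
    }
    close(fd);
    munmap(base, len);
    return ok ? 0 : -1;
}
```

Mapping and unmapping on every page access keeps the sketch close to the steps in the text; a real layer would more likely keep the mapping open across requests.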
3, SE2: direct access to mapped files
The second modification replaces SE1's MemMapped Layer with the PtrRedirection Layer, shown in Figure 3c. Unlike the MemMapped Layer, it receives a pointer to PgBufPtr (P2PgBufPtr).
Read: to read a file, call open() to open it and then mmap() to map it into memory. PgBufPtr originally points to a free slot in the internal buffer cache. Because mmap() maps NVM into an address range that the process can see, the PtrRedirection Layer repoints PgBufPtr to the address of the file on NVM. This pointer redirection is shown under the "Read" label in Figure 3c.
Read operations therefore need no data copy at all. For queries over large data sets, this greatly improves performance.
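The read-side redirection can be sketched as a function that receives a pointer to the buffer pointer and repoints it into the mapping. This is an illustration on an ordinary file, not the paper's PtrRedirection Layer: `struct mapped_file`, `mapped_open`, `mapped_close`, and `se2_read_page` are hypothetical names, and the `pp_buf` parameter plays the role of P2PgBufPtr.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define PG_PAGE_SIZE 8192 /* PostgreSQL's default page size */

/* A mapped data file (hypothetical type). */
struct mapped_file { char *base; size_t len; };

int mapped_open(struct mapped_file *m, const char *path) {
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return -1; }
    m->len = (size_t)st.st_size;
    m->base = mmap(NULL, m->len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    return m->base == MAP_FAILED ? -1 : 0;
}

void mapped_close(struct mapped_file *m) { munmap(m->base, m->len); }

/* SE2 read: instead of copying the page into the slot that *pp_buf points to,
 * repoint *pp_buf at the page inside the mapping. No page data is moved. */
int se2_read_page(const struct mapped_file *m, size_t pageno, const char **pp_buf) {
    if ((pageno + 1) * PG_PAGE_SIZE > m->len) return -1;
    *pp_buf = m->base + pageno * PG_PAGE_SIZE;
    return 0;
}
```

The mapping has to stay alive for as long as any redirected pointer is in use, which is why this sketch separates `mapped_open` and `mapped_close` from the read itself.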
Write: with PMFS, the application can directly access files on NVM. Because PG is a multi-process system, modifying a file directly on NVM is dangerous and may leave the database in an inconsistent state. To avoid this, SE2 takes two steps before modifying a page and marking it dirty: if the page is on NVM, it copies the data into a page of the internal buffer cache, that is, Pg-Buffer; it then undoes the redirection of PgBufPtr and points it back to the free buffer cache slot. This is the "Write" flow in Figure 3c. In this way, SE2 ensures that each process modifies only its local copy of the page.
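The two write steps amount to a copy-before-write on the buffer pointer, sketched below. To stay self-contained the sketch uses no mmap(): any memory that `ptr` points to outside the local slot stands in for a page in the mapped NVM file. `struct buf_desc` and the function names are hypothetical, not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PG_PAGE_SIZE 8192 /* PostgreSQL's default page size */

/* Hypothetical buffer descriptor: `ptr` plays the role of PgBufPtr and
 * `slot` is the process-local buffer cache slot (Pg-Buffer). */
struct buf_desc {
    char *ptr;
    char slot[PG_PAGE_SIZE];
    int dirty;
};

/* True while ptr is redirected away from the local slot. */
int is_redirected(const struct buf_desc *b) { return b->ptr != b->slot; }

/* Step 1: copy the page from the mapped file into the local slot.
 * Step 2: undo the redirection so ptr refers to the local copy again. */
void se2_prepare_write(struct buf_desc *b) {
    if (is_redirected(b)) {
        memcpy(b->slot, b->ptr, PG_PAGE_SIZE);
        b->ptr = b->slot;
    }
}

/* Modify one byte of the page; the change lands only in the local copy. */
void se2_modify(struct buf_desc *b, size_t off, char value) {
    se2_prepare_write(b);
    b->ptr[off] = value;
    b->dirty = 1;
}
```

After `se2_modify`, the page that `ptr` originally pointed to is unchanged, and the dirty local copy would later be written back through the normal write path.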
Related work
Previous work falls mainly into two categories: using NVM to replace the entire database storage medium, and using NVM to store logs. "Nvram-aware logging in transaction systems" and "High performance database logging using storage class memory" reduce the impact of disk I/O on transaction throughput and response time by writing log records directly to NVM rather than flushing them to disk. "Scalable logging through emerging nonvolatile memory" uses NVM on multi-core, multi-socket hardware for distributed logging, which reduces contention on the centralized log as system load increases. Other work studies different recovery methods with both DRAM and NVM storage.
Conclusions
We studied the impact of deploying NVM on DBMS design. We discussed several ways in which NVM can join the DBMS memory hierarchy. Replacing the disk wholly or partly with NVM is a typical scenario. Under this approach, the principles of the system stay unchanged while data sets on NVM can be accessed directly. We introduced two variants of the PG storage engine: SE1 and SE2.
The results show that for native PG, placing the database on NVM instead of disk improves performance by up to 40%, and by 16% on average. SE1 and SE2 reduce execution time by nearly 20.5% relative to disk. However, the current design of the database system is the biggest obstacle to maximizing performance. Compared with our baseline, SE2 improves read performance by up to 14.4%, and by 4.5% on average.
The limiting factor is that the data is quite far from the CPU, which is the downside of accessing data directly on NVM. This weakens the benefits of NVM. Developing libraries adapted to NVM is therefore necessary.

Origin blog.51cto.com/yanzongshuai/2447211