Computer Organization Principles Made Simple, study notes: SSD drives (part 2) - how do you hit your performance optimization KPIs? (Lecture 47)

First, introduction

1, why can't disk defragmentation be used on Windows when the system disk is an SSD?

If you use a Windows computer, you will find that when the system disk is an SSD, the disk defragmentation feature is unavailable. This is because running defragmentation triggers block erases, and every erase shortens the life of the affected block a little. The SSD erase-lifetime problem does not only affect features such as defragmentation; it also affects our everyday use.

2, read-heavy scenarios

The operating system does not know the erase count or remaining lifetime of the individual blocks on the SSD, so it treats an SSD no differently from a conventional mechanical hard drive.

When we use a PC for everyday software development, we first install the operating system and common software such as Office on the drive, and engineers also install development environments such as VS Code or WebStorm. The blocks that hold this software are written once and never erased afterward, so they only ever need to be read.

3, write-heavy scenarios

Once development begins, we keep adding new code files and modifying existing ones. Because an SSD cannot overwrite data in place, what actually happens is that we repeatedly write new copies of files and mark the old copies as logically deleted. When the SSD is running low on empty blocks, it reclaims space through "garbage collection", which erases blocks. As a result, the erases are concentrated in the areas used to store this data.

Eventually these blocks are erased so many times that they become bad blocks. The areas holding the operating system and installed software are still healthy, but the usable capacity of the drive has shrunk.

Second, wear leveling

1, wear leveling

So is there a way to keep these bad blocks from appearing so early? Could we even out the erase counts, so that the blocks storing the operating system also take their share of this data?

As you have probably guessed, what we want is a way to spread the erases evenly across all the blocks of the SSD. This strategy is called wear leveling (Wear-Leveling). The core technique for implementing it, like the virtual memory we discussed earlier, is to add a layer of indirection. That layer is the teaser left at the end of the previous lecture: the FTL, the flash translation layer.

Just as a page table maps virtual pages to physical pages in memory management, the FTL stores the mapping from logical block addresses (Logical Block Address, abbreviated LBA) to physical block addresses (Physical Block Address, abbreviated PBA).

2, FTL flash translation layer

The addresses the operating system uses to access the drive are all logical addresses. Only after translation by the FTL do they become actual physical addresses that locate the block to access. The operating system does not need to consider how worn each block is; it simply reads and writes data as it would on a mechanical hard drive.

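Every read and write request the operating system sends to the SSD goes through the FTL. Because the FTL holds the mapping from logical blocks to physical blocks, it can also keep track of how many times each physical block has been erased, and direct data toward physical blocks that have been erased fewer times. The logical block stays the same, so the operating system never needs to know about the change.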
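The mapping and erase-count bookkeeping described above can be sketched as follows. This is a minimal toy model rather than how any real controller works: it assumes one logical block maps to exactly one physical block and that a stale block is erased immediately. The class `SimpleFTL` and its method names are hypothetical.

```python
class SimpleFTL:
    """Toy flash translation layer: maps LBAs to PBAs and tracks erase counts.

    Hypothetical model for illustration only; real FTLs work on pages,
    defer erases to garbage collection, and persist their mapping tables.
    """

    def __init__(self, num_physical_blocks):
        self.lba_to_pba = {}                          # logical -> physical mapping
        self.erase_counts = [0] * num_physical_blocks
        self.free_blocks = set(range(num_physical_blocks))
        self.data = {}                                # pba -> stored payload

    def _least_worn_free_block(self):
        # Wear leveling: prefer the free block with the fewest erases.
        return min(self.free_blocks, key=lambda pba: (self.erase_counts[pba], pba))

    def write(self, lba, payload):
        # Flash cannot overwrite in place, so every write lands on a new block.
        new_pba = self._least_worn_free_block()
        self.free_blocks.remove(new_pba)
        self.data[new_pba] = payload
        old_pba = self.lba_to_pba.get(lba)
        self.lba_to_pba[lba] = new_pba
        if old_pba is not None:
            # The old copy is stale: erase it and return it to the free pool.
            del self.data[old_pba]
            self.erase_counts[old_pba] += 1
            self.free_blocks.add(old_pba)

    def read(self, lba):
        return self.data[self.lba_to_pba[lba]]


ftl = SimpleFTL(num_physical_blocks=4)
for version in range(9):
    ftl.write(0, f"v{version}")   # the OS keeps rewriting the same logical block

print(ftl.read(0), ftl.erase_counts)
```

Even though the operating system only ever touches logical block 0, the erases end up spread across all four physical blocks instead of wearing out a single one.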

This is a typical idea in the design of large systems: the layers are isolated from one another. The operating system does not need to think about the underlying hardware; managing what is actually written to the physical media is left entirely to the FTL in the drive's control circuitry.

Third, TRIM command support

1, what problem arises when using an SSD?

However, the fact that the operating system pays no attention to the underlying hardware also causes a problem when an SSD is used: the operating system's view of which blocks are in use does not match the block state at the SSD's logical layer.

2, ordinary file deletion is only a logical deletion at the operating system level

When we delete a file in the operating system, the file is not actually removed at the physical level. Instead, the file system clears the metadata in the corresponding inode, which means the inode can be reused and new data can be written. The corresponding physical storage space is then marked as writable inside the operating system.

So an everyday file deletion is only a logical deletion at the operating system level. This is why, when we accidentally delete a file, recovery software can often bring the data back. It is also why, if we want data removed thoroughly, we need a "file shredder" utility.

3, what happens when data is deleted on an SSD

This logical deletion is no problem on a mechanical hard drive: once the file's space is marked writable, later writes simply overwrite that location. On an SSD it is different. The schematic for this section walks through what actually happens.

At first, the operating system holds several files, each marked with a different color. Below them are the pages occupied inside the SSD's logical blocks, with each page drawn in the color of the file that uses it.

When we delete a recently downloaded file in the operating system, for example the JDK installer openjdk.exe marked in yellow, the file's metadata disappears from the corresponding inode.

But at the SSD's logical-block level, nothing is known about this. As far as the logical blocks are concerned, openjdk.exe still occupies its space, and the corresponding physical pages are still considered to be in use.

4, the SSD does not know which files the operating system has deleted

So with an SSD, the drive does not actually know which files the operating system has deleted. As a result, when it performs wear leveling, it often ends up moving large amounts of data that has already been deleted. This causes many unnecessary reads, writes, and erases, which both costs SSD performance and shortens the life of the SSD.

5, what does the TRIM command do?

To solve this problem, today's operating systems and SSD controllers support the TRIM command. When a file is deleted, this command lets the operating system notify the SSD that the corresponding logical blocks have been marked as deleted. Current SSDs support the TRIM command, and Linux, Windows, and macOS all support it as well.
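To make the effect concrete, here is a small sketch of how a TRIM notification changes what garbage collection has to move. It is a toy model under stated assumptions: the drive tracks validity per logical block in a simple set, and `TrimAwareSSD`, `trim`, and `pages_to_copy` are hypothetical names, not part of any real drive interface.

```python
class TrimAwareSSD:
    """Toy drive that tracks which logical blocks it believes hold live data."""

    def __init__(self):
        self.valid_lbas = set()

    def write(self, lba):
        self.valid_lbas.add(lba)

    def trim(self, lbas):
        # The OS tells the drive these logical blocks no longer hold file data.
        self.valid_lbas.difference_update(lbas)

    def pages_to_copy(self, block_lbas):
        # During garbage collection, only pages still considered valid are moved.
        return [lba for lba in block_lbas if lba in self.valid_lbas]


ssd = TrimAwareSSD()
block_lbas = list(range(8))          # one physical block holding LBAs 0..7
for lba in block_lbas:
    ssd.write(lba)

openjdk_lbas = [2, 3, 4, 5]          # assumed location of the deleted installer

copied_without_trim = ssd.pages_to_copy(block_lbas)
ssd.trim(openjdk_lbas)               # the OS reports the deletion
copied_with_trim = ssd.pages_to_copy(block_lbas)

print(len(copied_without_trim), len(copied_with_trim))
```

Without the TRIM notification the drive would move all eight pages when reclaiming this block; after it, only the four pages that still belong to live files need to be moved.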

Fourth, write amplification effect

In fact, the invention of the TRIM command also reflects another problem with SSDs: they tend to get slower the longer they are used.

As more and more of the SSD's storage space is occupied, there may not be enough free space each time new data is written. We may have to carry out garbage collection, merging the pages inside some blocks and then erasing, in order to free up some space.

At that point, the application or the operating system may have written only 4KB or 4MB of data. However, after passing through the FTL, we may actually end up moving 8MB, 16MB, or more.

1, how to solve write amplification

From the formula "write amplification = amount of data actually written to flash / amount of data the system writes through the FTL", we can see that the larger the write amplification factor, the worse the SSD's real performance, which can fall far short of the drive's nominal specifications.
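As a quick worked example of this formula, the sketch below plugs in the 4MB and 16MB figures mentioned earlier. The function name `write_amplification` is just an illustrative helper.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Write amplification = bytes actually written to flash / bytes the host wrote."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


MB = 1024 * 1024

# The host writes 4MB, but garbage collection makes the drive write 16MB in total.
waf = write_amplification(flash_bytes_written=16 * MB, host_bytes_written=4 * MB)
print(waf)
```

In this case every byte the application writes costs four bytes of flash writes, which both slows writes down and consumes erase cycles four times as fast.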

To address write amplification, garbage collection needs to run regularly in the background: while the drive is relatively idle, it moves data, erases blocks, and prepares blank blocks in advance, rather than doing this work only when data actually needs to be written.

Fifth, AeroSpike: how to make the most efficient use of an SSD?

By now you have probably realized that making good use of an SSD is not so simple. If we just swap the original HDD for an SSD without considering the SSD's characteristics at the application level, we will most likely not get the performance we hoped for.

However, now that the characteristics of SSDs are clear, we can design our applications around them. Next, let's look together at AeroSpike, a Key-Value database (KV database) designed around the characteristics of SSDs, and see how it takes advantage of these physical properties.

First, AeroSpike does not go through the operating system's file system when it operates on the SSD; it works directly on the SSD's blocks and pages. For a KV database, the file system is just one more layer of indirection that lowers performance and provides no practical benefit.

1, AeroSpike makes two optimizations when reading and writing data

Second, AeroSpike makes two optimizations when reading and writing data. When writing, AeroSpike writes one large block of data where possible, rather than frequently writing many small ones. This makes the drive less prone to fragmentation, and writing a large block at once also makes it easier to benefit from good sequential write performance. AeroSpike writes data in 128KB blocks, far larger than a 4KB page.
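The idea of turning many small writes into a few large ones can be sketched with a simple in-memory buffer. This is an illustration of the batching idea only, not AeroSpike's actual storage engine: the `WriteBuffer` class is hypothetical, a list stands in for the device, and records are allowed to straddle block boundaries for simplicity.

```python
WRITE_BLOCK_SIZE = 128 * 1024   # 128KB, the write block size mentioned above


class WriteBuffer:
    """Accumulates small records and emits only full-size blocks."""

    def __init__(self, block_size=WRITE_BLOCK_SIZE):
        self.block_size = block_size
        self.pending = bytearray()
        self.flushed_blocks = []      # stands in for large writes to the device

    def append(self, record):
        self.pending.extend(record)
        # Flush only when a whole block is ready, so the device sees large writes.
        while len(self.pending) >= self.block_size:
            self.flushed_blocks.append(bytes(self.pending[:self.block_size]))
            del self.pending[:self.block_size]


buf = WriteBuffer()
for _ in range(100):
    buf.append(b"x" * 4096)           # one hundred 4KB records from the application

print(len(buf.flushed_blocks), len(buf.pending))
```

Here one hundred 4KB application writes reach the device as three 128KB writes, with the remaining 16KB waiting in memory for the next block to fill.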

When reading, on the other hand, AeroSpike can read as little as 512 bytes (Bytes) at a time. SSDs have good random read performance, and reads, unlike writes, raise no erase-lifetime concerns. Moreover, much of the data we read is the value stored for a key, which has to be sent over the network. If each read pulled out a relatively large amount of data, our network bandwidth could run short.

2, how to improve response time

Because AeroSpike is a real-time KV database with demanding response-time requirements, severe write amplification would make writes take noticeably longer. So AeroSpike does the following:

The first is continuous defragmentation. AeroSpike uses what it calls a high-watermark (High Watermark) algorithm. The algorithm is simple: once more than 50% of the data in a physical block is fragmented, the block's remaining data is moved and compacted and the block is then erased, so that there is always enough space to write to.
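The threshold rule above can be sketched as follows. This is a simplified illustration under assumptions, not AeroSpike's implementation: the `Block` class and `defragment` function are hypothetical, and "fragmented" is modeled as the share of records in a block that are stale.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    live: list = field(default_factory=list)   # records still in use
    stale_count: int = 0                       # records superseded or deleted

    def fragment_ratio(self):
        total = len(self.live) + self.stale_count
        return self.stale_count / total if total else 0.0


def defragment(blocks, threshold=0.5):
    """Return (blocks kept, live records to rewrite elsewhere, blocks erased)."""
    kept, relocated, erased = [], [], 0
    for block in blocks:
        if block.fragment_ratio() > threshold:
            relocated.extend(block.live)   # move the surviving records out
            erased += 1                    # the whole block can now be erased
        else:
            kept.append(block)
    return kept, relocated, erased


blocks = [
    Block(live=["a", "b", "c"], stale_count=1),   # 25% fragmented: left alone
    Block(live=["d"], stale_count=3),             # 75% fragmented: reclaimed
    Block(live=["e", "f"], stale_count=2),        # exactly 50%: not above threshold
]
kept, relocated, erased = defragment(blocks)
print(len(kept), relocated, erased)
```

Only the block that is more than half stale is reclaimed, so the amount of live data that has to be moved per freed block stays small.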

The second is a best practice that AeroSpike recommends: to protect database performance, use only half of the SSD's nominal capacity. In other words, we deliberately set aside 50% of the SSD as reserved space, so that write amplification stays as small as possible and does not affect database performance.

Thanks to all these optimizations, in the early days of NoSQL databases AeroSpike left databases such as Cassandra and MongoDB far behind, with a performance gap that sometimes reached an order of magnitude. This made AeroSpike a benchmark for high-performance KV databases at the time. You can look at the benchmark InfoQ published in 2013 to see how large the performance differences between NoSQL databases were back then.

Sixth, summary and extension

Okay, let's summarize today's content together.

Because the life of an SSD is limited by the number of times its blocks can be erased, we need a wear-leveling strategy to even out how often each block is erased. By introducing the FTL as a mapping layer between logical block addresses and physical block addresses, the operating system no longer needs to care about physical erase counts; instead, the algorithms in the FTL decide which physical block each write should go to.

Besides wear leveling, there is another mismatch between operating systems and SSD hardware. When the operating system deletes data, it does not actually delete it at the physical level; it only modifies the data in the inode. Because of this "pseudo-delete", the SSD does not realize, at either the logical or the physical level, that some blocks have been deleted. As a result, garbage collection wastes a lot of read and write resources unnecessarily.

Because SSDs require garbage collection, writes run into write amplification. We may write only 4MB of data, but at the hardware level the SSD may actually write 8MB, 16MB, or even more.

To address these characteristics, AeroSpike, a KV database built specifically for SSDs, makes many design optimizations, including bypassing the file system to write to the drive directly, writing in large blocks and reading in small ones, continuous defragmentation with a high-watermark algorithm, and using only half of the SSD's space. With these strategies, AeroSpike's performance in its early years far exceeded that of other NoSQL databases such as Cassandra.

As you can see, software designed around hardware characteristics lets us get the most out of our hardware.

Origin www.cnblogs.com/luoahong/p/11395273.html