How to calculate and optimize GC write amplification in an append-write engine

Glossary

  • Append write
    A data-recording mode based on sequential, append-only journal writes, similar to ZFS's ZIL and ext4's journal.

  • GC
    Garbage collection: the reclamation process that cleans up the stale data produced when append-style writes overwrite (invalidate) earlier data.

Calculating GC write amplification

Different levels of write amplification

  • Journal level
    Count the valid user data written at the front end, total_user_io, and the total volume written to the journal, total_journal_io, which contains that user data plus what is written for the distributed consensus protocol and for the index design; compute total_journal_io / total_user_io. The main overhead here is the framing added when user IO is packed into journal records, each with its own log header and trailer.

  • Storage-engine level
    Count the valid user data written at the front end, total_user_io, and the total volume the storage engine writes out, total_engine_io; compute total_engine_io / total_user_io. The main overheads here include:

    1. the storage engine's GC overhead;
    2. the overhead of flushing index data to disk.
  • Disk level
    Count the valid user data written at the front end, total_user_io, and the total volume of disk write requests issued by the host, total_host_io; compute total_host_io / total_user_io. total_host_io can be obtained from tools such as iostat or from the /proc/diskstats file. In a distributed scenario, beyond the overheads above, the main additional costs include:

    1. the periodic flushing of all index information to disk required by the distributed consensus protocol;
    2. the overhead of persisting the metadata that maps logical disks to replication groups.
  • SSD level
    Count the valid user data written at the front end, total_user_io, and the total volume written to the NAND inside the SSD, total_nand_io; compute total_nand_io / total_user_io. Beyond all the overheads above, the main additional cost here is the SSD's internal write amplification, which can be obtained with nvme-cli or through the vendor's native library. A small calculation sketch follows.
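As a concrete illustration, here is a minimal Python sketch of these ratios. The engine-side counters (total_user_io, total_journal_io, total_engine_io, total_nand_io) are assumed to be exported by the engine itself or read via nvme-cli; only the host-level counter can be read generically, from /proc/diskstats.

```python
# Minimal sketch: per-level write amplification as defined above.
# Engine-side counters are assumed to come from the engine's own
# statistics; the host level can be read from /proc/diskstats,
# where the "sectors written" field is in 512-byte units.

def write_amplification(level_io_bytes: int, total_user_io: int) -> float:
    """WA at a level = total bytes written at that level / valid user bytes."""
    return level_io_bytes / total_user_io

def host_bytes_written(device: str = "sda") -> int:
    """total_host_io for one device, derived from /proc/diskstats."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[9]) * 512  # sectors written -> bytes
    raise ValueError(f"device {device} not found in /proc/diskstats")

# Usage:
#   journal_wa = write_amplification(total_journal_io, total_user_io)
#   host_wa    = write_amplification(host_bytes_written("sda"), total_user_io)
```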

GC design considerations

The main goal of GC optimization is to give users as much effective IO as possible while keeping IO latency low. The write-amplification statistics at each of the levels above are important indicators for storage-engine design. A practical GC design must weigh static factors such as the currently available capacity, the degree of data fragmentation, and SSD wear, as well as dynamic factors such as the user IO pattern and the SSD's internal task scheduling. Below, each factor is considered in isolation, with all other conditions assumed equal.

Current available capacity

In large-scale distributed systems, available capacity is weighed differently at different levels.

  • Cluster level
    If the cluster as a whole has ample available capacity, GC can be slowed down; otherwise GC should run as soon as possible. When an individual node runs short of space, that space should be freed as quickly as possible by rebalancing capacity across nodes, not by accelerating GC, because GC write amplification becomes very large when space is nearly exhausted.

  • Host / disk level
    Similarly, if capacity is sufficient, defragmentation can be given high priority during GC; otherwise GC should free up space as quickly as possible (a pacing sketch follows this list).
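A minimal sketch of how a per-disk free-capacity ratio might drive GC pacing at the host/disk level. The thresholds and mode names are illustrative assumptions, not values from the original post.

```python
def gc_mode(free_ratio: float) -> str:
    """Map a disk's free-capacity ratio (0..1) to a GC pacing decision.
    Thresholds are illustrative and would be tuned per deployment."""
    if free_ratio > 0.40:
        return "defrag-first"   # plenty of space: defragment at high priority
    if free_ratio > 0.15:
        return "steady"         # normal background GC
    return "reclaim-asap"       # space is tight: free capacity fast
```

At the cluster level, the analogous decision is to trigger inter-node capacity balancing before a node's free ratio falls into the low band, since GC write amplification grows sharply as space runs out.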

Data fragmentation degree

Using the index, the fragmentation degree of every region on the disk can be measured. The more fragmented a region is, the earlier GC should defragment it; a scoring sketch follows.
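One way to make "fragmentation degree" concrete is a per-region score derived from the index. The metric below (average valid-extent size relative to region size) is an illustrative assumption, not the original author's formula.

```python
def fragmentation_score(valid_extents: list[tuple[int, int]],
                        region_size: int) -> float:
    """Score a region by how scattered its valid data is.
    valid_extents: (offset, length) pairs taken from the index.
    0.0 = one large contiguous extent (or empty); near 1.0 = many tiny extents."""
    valid_bytes = sum(length for _, length in valid_extents)
    if valid_bytes == 0:
        return 0.0  # nothing valid left; nothing to defragment
    avg_extent = valid_bytes / len(valid_extents)
    return 1.0 - avg_extent / region_size

# Regions sorted by descending score are defragmented first.
```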

User IO pattern

The more the user's workload consists of overwrites, the more GC should avoid moving data; instead, let the user's own overwrites invalidate the old data first, then delete it.

The more the user's workload consists of random writes, the more defragmentation is needed.

When available capacity is sufficient: the more user requests there are at the moment, the more GC should back off and leave the bandwidth to the user; conversely, when the disk is idle, GC should run aggressively, using the disk bandwidth during off-peak periods (a bandwidth-sharing sketch follows).
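A minimal sketch of the off-peak idea: give GC whatever bandwidth the user load leaves free. The linear model and parameter names are illustrative; a real engine would typically use a token bucket or QoS scheduler instead.

```python
def gc_bandwidth_mb(user_iops: float, max_user_iops: float,
                    disk_bw_mb: float) -> float:
    """The busier the user, the less bandwidth GC takes, and vice versa."""
    user_load = min(user_iops / max_user_iops, 1.0)
    return disk_bw_mb * (1.0 - user_load)
```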

SSD wear level

In the extreme case where the SSD is already badly worn, data movement should be avoided; otherwise, GC can run as needed.

SSD internal task scheduling

Because the total bandwidth available inside an SSD is fixed, GC should be scheduled, as far as possible, to avoid the periods when the SSD is executing its internal tasks.

Optimizing GC

This is discussed in detail in another blog post; see: http://xiaqichao.cn/wordpress/?p=172.
A few points are briefly summarized here.

GC is usually done per slice (segment), so two questions typically need answering: which slices to select for GC, and how to perform the GC itself.

Selection strategy

The following factors can be weighed when deciding which slices need GC (a selection sketch follows the list):

  • the proportion of valid data in the slice

  • the state (e.g., age) of the data in the slice

  • how many times the data in the slice has already been GC'ed (its GC history count)

  • the degree of fragmentation of the data in the slice
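As an illustration, here is a sketch of a cost-benefit selector in the spirit of LFS-style segment cleaning that combines the four factors above. The exact weighting formula is an assumption for illustration, not the original author's policy.

```python
from dataclasses import dataclass

@dataclass
class Slice:
    slice_id: int
    valid_ratio: float  # proportion of still-valid data, 0..1
    age: float          # time since last write, arbitrary units
    gc_count: int       # times this slice has been GC'ed before
    frag_score: float   # fragmentation degree, 0..1

def gc_benefit(s: Slice) -> float:
    """Higher = better candidate: little valid data to copy, old data,
    heavily fragmented, and not already GC'ed many times."""
    reclaimed = 1.0 - s.valid_ratio      # space freed by cleaning the slice
    copy_cost = s.valid_ratio + 1e-9     # valid data that must be moved out
    return (reclaimed * s.age * (1.0 + s.frag_score)
            / (copy_cost * (1.0 + s.gc_count)))

def pick_gc_slices(slices: list[Slice], n: int) -> list[Slice]:
    """Select the n most profitable slices to GC next."""
    return sorted(slices, key=gc_benefit, reverse=True)[:n]
```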


Source: blog.51cto.com/xiamachao/2484873