Understanding Data Deduplication

How does Data Deduplication work?

Data Deduplication in Windows Server was created with the following two principles:

  1. Optimization should not get in the way of writes to the disk
     Data Deduplication optimizes data by using a post-processing model. All data is written unoptimized to the disk and is optimized later by Data Deduplication.

  2. Optimization should not change access semantics
     Users and applications that access data on an optimized volume are completely unaware that the files they access have been deduplicated.

Once Data Deduplication is enabled for a volume, it runs in the background to:

  • Identify repeated patterns across files on that volume.
  • Seamlessly move those portions, or chunks, with special pointers called reparse points that point to a unique copy of that chunk.

This occurs in the following steps:

  1. Scan the file system for files meeting the optimization policy.
  2. Break files into variable-size chunks.
  3. Identify unique chunks.
  4. Place chunks in the chunk store and optionally compress them.
  5. Replace the original file stream of now-optimized files with a reparse point to the chunk store.

When optimized files are read, the file system sends the files with a reparse point to the Data Deduplication file system filter (Dedup.sys). The filter redirects the read operation to the appropriate chunks that constitute the stream for that file in the chunk store. Modifications to ranges of a deduplicated file get written unoptimized to the disk and are optimized by the Optimization job the next time it runs.
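
As a minimal sketch of how this is driven in practice, the feature can be enabled and its savings inspected with the Deduplication PowerShell cmdlets. The drive letter E: below is illustrative, and the commands assume the Data Deduplication role service is already installed:

    # Enable Data Deduplication on the volume (general purpose file server profile)
    Enable-DedupVolume -Volume "E:" -UsageType Default

    # After the background Optimization job has run, inspect the space savings
    Get-DedupStatus -Volume "E:" | Format-List Volume, OptimizedFilesCount, SavedSpace, FreeSpace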

Usage types

The following usage types provide reasonable Data Deduplication configurations for common workloads:

Usage type: Default
Ideal workloads: General purpose file servers, for example:
  • Team shares
  • Work Folders
  • Folder redirection
  • Software development shares
What's different:
  • Background optimization
  • Default optimization policy:
    • Minimum file age = 3 days
    • Optimize in-use files = No
    • Optimize partial files = No

Usage type: Hyper-V
Ideal workloads: Virtualized Desktop Infrastructure (VDI) servers
What's different:
  • Background optimization
  • Default optimization policy:
    • Minimum file age = 3 days
    • Optimize in-use files = Yes
    • Optimize partial files = Yes
  • "Under-the-hood" tweaks for Hyper-V interoperability

Usage type: Backup
Ideal workloads: Virtualized backup applications, such as Microsoft Data Protection Manager (DPM)
What's different:
  • Priority optimization
  • Default optimization policy:
    • Minimum file age = 0 days
    • Optimize in-use files = Yes
    • Optimize partial files = No
  • "Under-the-hood" tweaks for interoperability with DPM and DPM-like solutions
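
The usage type is chosen when deduplication is enabled on a volume. A brief sketch using the Enable-DedupVolume cmdlet (the volume letters are illustrative):

    # General purpose file server
    Enable-DedupVolume -Volume "E:" -UsageType Default

    # Hyper-V VDI storage
    Enable-DedupVolume -Volume "F:" -UsageType HyperV

    # Virtualized backup target, such as DPM
    Enable-DedupVolume -Volume "G:" -UsageType Backup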

Jobs

Data Deduplication uses a post-processing strategy to optimize a volume and maintain its space efficiency.

Job name: Optimization
Description: The Optimization job deduplicates by chunking data on a volume per the volume's policy settings, (optionally) compressing those chunks, and storing the chunks uniquely in the chunk store. This optimization process is covered in detail in "How does Data Deduplication work?" above.
Default schedule: Once every hour

Job name: Garbage Collection
Description: The Garbage Collection job reclaims disk space by removing unnecessary chunks that are no longer referenced by files that have been recently modified or deleted.
Default schedule: Every Saturday at 2:35 AM

Job name: Integrity Scrubbing
Description: The Integrity Scrubbing job identifies corruption in the chunk store caused by disk failures or bad sectors. When possible, Data Deduplication can automatically use volume features (such as mirror or parity on a Storage Spaces volume) to reconstruct the corrupted data. Additionally, when a chunk is referenced more than 100 times (a so-called hotspot), Data Deduplication keeps backup copies of it.
Default schedule: Every Saturday at 3:35 AM

Job name: Unoptimization
Description: The Unoptimization job is a special job that can only be run manually. It undoes the optimization done by deduplication and disables Data Deduplication for that volume.
Default schedule: On demand only
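
Each of these jobs can also be started on demand with Start-DedupJob. A short sketch, assuming deduplication is already enabled on the illustrative volume E::

    # Run an optimization pass immediately
    Start-DedupJob -Volume "E:" -Type Optimization

    # Reclaim space held by chunks that are no longer referenced
    Start-DedupJob -Volume "E:" -Type GarbageCollection

    # Check the chunk store for corruption
    Start-DedupJob -Volume "E:" -Type Scrubbing

    # List queued and running deduplication jobs
    Get-DedupJob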

Data Deduplication terminology

Term: Chunk
Definition: A chunk is a section of a file that has been selected by the Data Deduplication chunking algorithm as likely to occur in other, similar files.

Term: Chunk store
Definition: The chunk store is an organized series of container files in the System Volume Information folder that Data Deduplication uses to uniquely store chunks.

Term: Dedup
Definition: An abbreviation for Data Deduplication that is commonly used in PowerShell, Windows Server APIs and components, and the Windows Server community.

Term: File metadata
Definition: Every file contains metadata that describes interesting properties of the file unrelated to its main content, for example, creation date, last read date, author, and so on.

Term: File stream
Definition: The file stream is the main content of the file. This is the part of the file that Data Deduplication optimizes.

Term: File system
Definition: The file system is the software and on-disk data structures that the operating system uses to store files on storage media. Data Deduplication is supported on NTFS-formatted volumes.

Term: File system filter
Definition: A file system filter is a plugin that modifies the default behavior of the file system. To preserve access semantics, the Data Deduplication file system filter (Dedup.sys) redirects reads to optimized content completely transparently to the user or application that makes the read request.

Term: Optimized
Definition: A file is considered optimized (or deduplicated) by Data Deduplication if it has been chunked and its unique chunks have been stored in the chunk store.

Term: Optimization policy
Definition: The optimization policy specifies which files should be considered for Data Deduplication. For example, files may be considered out of policy if they are brand new, open, in a certain path on the volume, or of a certain file type.

Term: Reparse point
Definition: A reparse point is a special tag that notifies the file system to pass I/O to a specified file system filter. When a file's file stream has been optimized, Data Deduplication replaces the file stream with a reparse point, which enables Data Deduplication to preserve the access semantics of that file.

Term: Volume
Definition: A volume is a Windows construct for a logical storage drive that may span multiple physical storage devices on one or more servers. Deduplication is enabled on a volume-by-volume basis.

Term: Workload
Definition: A workload is an application that runs on Windows Server. Example workloads include general purpose file servers, Hyper-V, and SQL Server.
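
Several of these terms surface directly in the management cmdlets. For example, chunk store statistics for a volume can be examined with Get-DedupMetadata (a sketch; E: is illustrative and the exact output fields vary by Windows Server version):

    # Report chunk store metadata (chunk counts, container counts, and so on)
    Get-DedupMetadata -Volume "E:"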

Warning

Do not attempt to manually modify the chunk store unless you are instructed to do so by authorized Microsoft support personnel. Doing so may result in data corruption or loss.

Frequently asked questions (FAQ)

How is Data Deduplication different from other optimization products?
There are several important differences between Data Deduplication and other common storage optimization products:

  • How is Data Deduplication different from Single Instance Store?
    Single Instance Store (SIS) is a technology that preceded Data Deduplication and was first introduced in Windows Storage Server 2008 R2. To optimize a volume, Single Instance Store identified files that were completely identical and replaced them with logical links to a single copy of the file stored in the SIS common store. Unlike Single Instance Store, Data Deduplication can save space from files that are not identical but share many common patterns, and from files that themselves contain many repeated patterns. Single Instance Store was deprecated in Windows Server 2012 R2 and removed in Windows Server 2016 in favor of Data Deduplication.

  • How is Data Deduplication different from NTFS compression?
    NTFS compression is an NTFS feature that can optionally be enabled at the volume level. With NTFS compression, each file is optimized individually via compression at write time. Unlike NTFS compression, Data Deduplication can save space across all of the files on a volume. This is advantageous because files may have both internal duplication (which is addressed by NTFS compression) and similarities with other files on the volume (which is not addressed by NTFS compression). Additionally, Data Deduplication uses a post-processing model, which means that new or modified files are written to disk unoptimized and are optimized later by Data Deduplication.

  • How is Data Deduplication different from archive file formats like zip, rar, 7z, cab, and so on?
    Archive file formats such as zip, rar, 7z, and cab perform compression over a specified set of files. Like Data Deduplication, this optimizes duplicated patterns within files and duplicated patterns across files. However, you have to choose which files to include in the archive, and the access semantics are different: to get at a specific file inside the archive, you have to open the archive, select the file, and decompress it for use. Data Deduplication operates transparently to users and administrators and requires no manual kick-off. Additionally, Data Deduplication preserves access semantics: optimized files appear unchanged after optimization.

Can I change the Data Deduplication settings for my selected usage type?
Yes. Although Data Deduplication provides reasonable defaults for the recommended workloads, you may still want to tweak the deduplication settings to get the most out of your storage. Additionally, other workloads will require some tweaking to ensure that Data Deduplication does not interfere with the workload.
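
For example, the per-volume optimization policy can be tweaked with Set-DedupVolume. A hedged sketch (the age value and the excluded folder path are illustrative):

    # Consider files for optimization after 1 day instead of the usage type's default,
    # and keep a scratch folder out of deduplication entirely
    Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 1 -ExcludeFolder "E:\Temp"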

Can I manually run a Data Deduplication job?
Yes, all Data Deduplication jobs can be run manually. This may be desirable if a scheduled job did not run due to insufficient system resources or because of an error. Additionally, the Unoptimization job can only be run manually.
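
For instance, the Unoptimization job can be kicked off as follows (a sketch; note that the volume needs enough free space to hold the rehydrated files):

    # Undo optimization for the whole volume; -Wait blocks until the job completes
    Start-DedupJob -Volume "E:" -Type Unoptimization -Wait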

Can I monitor the historical outcomes of Data Deduplication jobs?
Yes, all Data Deduplication jobs make entries in the Windows event log.
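
These entries can be read with Get-WinEvent. The channel name below is the operational log used by deduplication on recent Windows Server versions, but verify it on your system:

    # Show the ten most recent Data Deduplication events (log name assumed)
    Get-WinEvent -LogName "Microsoft-Windows-Deduplication/Operational" |
        Select-Object -First 10 -Property TimeCreated, Id, Message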

Can I change the default schedules for the Data Deduplication jobs on my system?
Yes, all schedules are configurable. Modifying the default Data Deduplication schedules is particularly desirable to ensure that the scheduled jobs have time to finish and do not compete for resources with the workload.
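
Schedules are managed with the *-DedupSchedule cmdlets. A sketch (the schedule name and times are illustrative; check the Get-DedupSchedule output for the actual names on your system):

    # List the current deduplication schedules
    Get-DedupSchedule

    # Move the weekly garbage collection run to Sunday at 1:00 AM
    Set-DedupSchedule -Name "WeeklyGarbageCollection" -Start (Get-Date "01:00") -Days Sunday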


