EMC Isilon (OneFS) Detailed data recovery case

[Fault Description]
    Because of ******, a university had important data of its "teaching system" deleted. The deleted data included the teaching system's MSSQL database as well as a large number of teaching videos in MP4, ASF and TS formats. The storage was a high-end EMC NAS (Isilon S200) with 3 nodes, each fitted with twelve 3T SATA hard disks and no SSDs. The data fell into two parts: VMware virtual machines (the WEB servers), shared to the ESX hosts over NFS, and the teaching videos, shared to the virtual machines (WEB servers) over CIFS. *** deleted only the data on the NFS share (that is, all the virtual machines); the data on the CIFS share was not deleted.
[Data Backup]
For data safety, and to avoid causing secondary damage to the data, every hard disk should be fully backed up. However, with so many disks (12 per node, 36 across the 3 nodes) and such large disks (3TB each, 108TB in total), the backup would take a long time. In the end the customer decided to back up only the data still existing in the storage, and North then backed up the customer's backup once more, to keep the existing data safe.
[Data analysis]
After all data had been backed up, the Isilon was shut down cleanly from its web management interface. Every hard disk on every node was then labeled, and the disks were attached in order to the data recovery platform provided by North, where analysis of the data on the disks began.
 A brief explanation of Isilon's storage structure is useful at this point. Internally, Isilon uses the distributed file system OneFS. In an Isilon storage cluster every node runs the single OneFS file system, and Isilon supports scale-out expansion without affecting data that is in use. While the cluster is working, all nodes are functionally equal; there is no distinction between master and standby nodes. When a user stores a file in the cluster, the OneFS layer splits the file into 128K fragments that are stored on different nodes, and the node layer splits each 128K fragment into smaller 8K fragments that are stored on different hard disks of that node. The Inode information, directory entries and data MAP of user files are stored on all nodes, which ensures that users can reach all data from any node. When an Isilon is initialized, the user selects a storage redundancy mode, and different redundancy modes provide different levels of data safety (by default a 3-node cluster uses N+2:1).
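The two fragment sizes above already fix some simple arithmetic, which the following minimal sketch illustrates. It only converts a byte offset in a file into a 128K fragment index and an 8K fragment index inside it. The function name `locate` is made up for illustration, and the sketch deliberately does not model which node or disk a fragment lands on, or the redundancy data, because the text does not describe that placement.

```python
from typing import Tuple

STRIPE_UNIT = 128 * 1024  # OneFS-layer fragment size given in the text
BLOCK = 8 * 1024          # node-layer fragment size given in the text
BLOCKS_PER_UNIT = STRIPE_UNIT // BLOCK  # 8K fragments per 128K fragment


def locate(offset: int) -> Tuple[int, int, int]:
    """Map a byte offset in a file to fragment coordinates.

    Returns (index of the 128K fragment,
             index of the 8K fragment inside that 128K fragment,
             byte position inside that 8K fragment).
    Node and disk placement are NOT modeled here.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    unit, within_unit = divmod(offset, STRIPE_UNIT)
    block, byte = divmod(within_unit, BLOCK)
    return unit, block, byte
```

For example, `locate(128 * 1024 + 8 * 1024 + 5)` gives `(1, 1, 5)`: the second 128K fragment, its second 8K fragment, byte 5.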
 Since the customer's data was deleted, the storage redundancy level needed little attention; the analysis focused on whether file deletion changes a file's Inode and data MAP. According to the customer, the deleted virtual disk files were all 64G or larger, and the storage held no other large files of that kind. A program was written to scan the Inodes of all files and pick out every Inode whose file size was 64G or more. Closer analysis of the scanned Inodes showed where each Inode records the location of its data MAP, but the content that index pointed to was no longer valid, and the Inode copies on all nodes were in the same state. Further analysis of the Inodes showed that the data MAP of a large file has several levels (a tree structure) and that the MAP data records the file's unique ID, so it was worth trying to find the lowest-level data MAP of each file. On the off chance, a tracking pass was run over the lowest-level data MAP of the files, and the lowest-level data MAP was indeed still there.
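The size filter used in that Inode scan might look roughly like the sketch below. It assumes the Inodes have already been parsed from the disks into records; the `InodeRecord` fields and the function name are hypothetical, since the on-disk OneFS Inode layout is not given here, and "64G" is taken to mean 64 GiB. Because the same Inode is found on every node, the sketch keeps only the first copy of each file ID.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

GIB = 1024 ** 3
MIN_SIZE = 64 * GIB  # assumption: "64G" in the text means 64 GiB


@dataclass(frozen=True)
class InodeRecord:
    """Hypothetical, already-parsed view of one Inode copy."""
    file_id: int  # unique file ID recorded in the Inode
    size: int     # file size in bytes
    node: int     # node the copy was read from


def large_file_inodes(records: Iterable[InodeRecord],
                      min_size: int = MIN_SIZE) -> Iterator[InodeRecord]:
    """Yield one Inode per file whose size is at least min_size."""
    seen = set()
    for rec in records:
        if rec.size >= min_size and rec.file_id not in seen:
            seen.add(rec.file_id)
            yield rec
```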
[Data recovery analysis steps]
    1. Wrote a program that takes each file's unique ID from its Inode and then gathers all MAP data matching that ID. Sorting the MAP data by VCN number showed that the first 17088 MAP entries of every file did not exist, which means the data before 17088 in each file really cannot be recovered (spirits hit rock bottom at this point).
    2. On a more careful look, the lost MAP entries covered just under 1G of data in total. The deleted files were all virtual machine vmdk files, the file systems inside them were NTFS, and the MFT of an NTFS file system generally sits at around the 3G position. In other words, manually forging an MBR and a DBR at the head of each vmdk would be enough to interpret the data inside it (whether that was luck or plain coincidence, I really don't know!). Code was quickly written to scan and interpret the MAP data and to export the data in VCN order, with zero MAP entries not kept.
    3. After continuous testing the program was finally in shape, and one vmdk file was exported as a trial. The result surprised me: the exported vmdk was smaller than the real file, and the position of the MFT inside the vmdk did not match its description either. Was the program at fault, or was the MAP data itself corrupted? Manually spot-checking a few MAP entries showed that they did point to the data area, and the way the program interpreted the MAP was fine too. While I was puzzling over this, it suddenly occurred to me that high-end storage like Isilon could hardly do without sparse files; otherwise, how much space would be wasted! Checking against the MAP data right away confirmed that the file really was sparse.
    4. Modified the code and re-exported the same vmdk. This time the vmdk size matched the real size and the MFT was in the right position. An MBR, partition table and DBR were forged by hand, after which the file system interpretation tool developed by North successfully parsed the file system and exported the database and video files inside the vmdk.
    5. After verifying that the database and video files in this vmdk had no problems, all the important vmdk files were exported in bulk, and each vmdk file was then modified by hand, one by one.
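The MAP handling in the steps above can be sketched as follows. This is only an illustration under stated assumptions, not the actual program: `MapEntry` and its fields are hypothetical, a location of 0 is assumed to mark a sparse hole, one VCN is assumed to cover one 8K block, and `read_block` stands in for whatever routine reads a block from the disk images. What it shows is the fix from step 4: every block is written at its own VCN offset, so holes keep their place and the MFT ends up where NTFS expects it.

```python
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import BinaryIO, Callable, Dict, Iterable, List, Optional

BLOCK = 8 * 1024  # assumption: one VCN addresses one 8K block


@dataclass(frozen=True)
class MapEntry:
    """Hypothetical parsed lowest-level MAP entry."""
    file_id: int   # unique file ID recorded in the MAP data
    vcn: int       # position of the block inside the file
    location: int  # where the block lives; 0 is assumed to mean a hole


def group_by_file(entries: Iterable[MapEntry]) -> Dict[int, List[MapEntry]]:
    """Collect MAP entries per file ID and sort each list by VCN."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.file_id].append(entry)
    for file_entries in grouped.values():
        file_entries.sort(key=lambda e: e.vcn)
    return dict(grouped)


def first_recoverable_vcn(sorted_entries: List[MapEntry]) -> Optional[int]:
    """Lowest VCN that still has a MAP entry; everything before it is lost."""
    return sorted_entries[0].vcn if sorted_entries else None


def export_file(sorted_entries: List[MapEntry],
                read_block: Callable[[int], bytes],
                out: BinaryIO,
                block_size: int = BLOCK) -> None:
    """Write each block at vcn * block_size so sparse holes keep their size."""
    for entry in sorted_entries:
        if entry.location == 0:
            continue  # hole: write nothing, but do not shift later blocks
        out.seek(entry.vcn * block_size)
        out.write(read_block(entry.location))
    if sorted_entries:
        # Pad to full length in case the file ends with a hole.
        total = (sorted_entries[-1].vcn + 1) * block_size
        if out.seek(0, io.SEEK_END) < total:
            out.seek(total - 1)
            out.write(b"\0")
```

The first trial export in step 3 corresponds to dropping the hole entries and appending the remaining blocks back to back, which is exactly what makes the output too small and moves the MFT. The missing head of each file stays as zeros in this sketch and is where the hand-forged MBR and DBR would go.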
[Data acceptance]
     The whole recovery took a long time, and some problems came up along the way, but fortunately the data could be restored to normal. Once all of the customer's important data had been recovered, the customer arranged for engineers to check all of the recovered data for completeness and accuracy, and the validation took more than a day. In the end the data was confirmed to have no problems, and with that the data recovery was a success.

Origin blog.51cto.com/sun510/2426602