Synology NAS: 30T of data deleted and fully recovered, a hands-on account of the whole process

A colleague who was about to leave the company deleted 30T of data from the company's 200T shared storage, a Synology NAS. I do not know whether it was accidental or malicious. His computer mounted the share over SMB at startup and ran the deletion from there, and it went on for three days and three nights. It was only on the third day, when data that other colleagues were working with suddenly disappeared, that I found someone was deleting the test data across the whole NAS. I immediately opened the Synology log, found that this colleague's ID was still deleting data, and forcibly shut down his computer.

The terrible thing was that, because the data in this directory is constantly collected, written, deleted, and updated and was not considered that important, the data administrator of the test group had not enabled the recycle bin for it. All the data was deleted directly from the disks, and even Synology's official technical support, which I contacted right away, could not recover it. Fortunately, the employee's permissions were limited and no core data was involved. Although the 30T was test data, it still mattered to the company, since re-collecting it would take more than a month. So I contacted a friend who provided a solution and remote assistance, and we started the data recovery.

The NAS is a Synology DS2422+ with a DS 2419+ as the expansion unit: 24 disks of 16T in total, grouped as RAID5, with a 200T storage space in btrfs format.

The first step is to prepare the hardware. I bought 24 WD 18T helium disks on Jingdong that same day and received them at noon the next day. On the same day, I also set up a 24-bay Huawei storage server. At least 96G of memory should be prepared. This server, with Windows Server 2012 installed and automatic updates turned off, is used to clone the disk images and recover the data. A large network storage target is needed as well: I prepared a 100T Synology NAS and mounted it on the Huawei server under Windows Server 2012 to hold the recovered data. Because the data was deleted at random, only entire large folders can be recovered, so the recovered data will be much larger than what was deleted.

The second step is to clone an image of every disk in the NAS onto a new hard disk, using WinHex on the server, starting over the weekend. Cloning the 24 16T disks took 5 days. As soon as cloning finished, the original Synology NAS was put back into service, so the work of other colleagues in the company was not affected and the impact and loss were kept as small as possible.
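The cloning itself was done with WinHex, and the sketch below is not how WinHex works internally. It only illustrates the idea behind this step: copy the source block by block onto the target, hash what was read, and then re-read the target to confirm the copy is identical, so that all later recovery work touches only the clone. The function names and the 4 MiB chunk size are my own assumptions, and the code is written for ordinary files; opening a raw disk device needs administrator rights and OS-specific paths that are not shown here.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice for this sketch


def clone_image(src_path, dst_path, chunk_size=CHUNK):
    """Copy src to dst chunk by chunk.

    Returns (bytes_copied, sha256 hex digest of the data read from src).
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            dst.write(block)
            digest.update(block)
            copied += len(block)
    return copied, digest.hexdigest()


def sha256_of(path, chunk_size=CHUNK):
    """Re-read a file and return its sha256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def clone_and_verify(src_path, dst_path):
    """Clone, then confirm the target matches what was read from the source."""
    copied, src_hash = clone_image(src_path, dst_path)
    return copied, src_hash == sha256_of(dst_path)
```

Verifying the target before the original disks go back into the NAS is the design point: once the NAS is writable again, the originals start to change and the clone is the only consistent copy left.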

The third step is to install the cloned disks in the 24-bay Huawei storage server. Note that the server should have no less than 64G of memory, otherwise problems will occur due to insufficient memory. Install UFS Explorer PRO; note that only the PRO version can recover a RAID disk group.

The fourth step is to scan the disks. Scanning the 200T storage space (130T actually in use after subtracting the deleted 30T) with this software took 15 days and found 300T of data in total.

The fifth step is to restore the scanned data to the new 100T Synology NAS. UFS Explorer PRO used up to 64G of memory. The first scan ran for 15 days, and in the end the software crashed because there was too little memory. A second scan was then run with more memory, and the data was recovered successfully.

Fortunately, after several days of testing, the recovered data proved usable, which saved the company a large loss. I learned a lot from this incident, and I summarize and share the lessons here.

1. Prepare Synology's anti-deletion measures in advance and tighten permissions. Check that logging is enabled for mounted (SMB) access; the log is the most important clue for retrieving and tracing what NAS users did. Check the log regularly. Our deletion lasted 3 days and 3 nights; had we checked the log every day, we could have limited the damage considerably. Unfortunately, as far as I know, Synology has no alert for large-volume deletions within a given period, for example emailing the administrator when a user deletes 1T of data in one day. Check that the Synology Recycle Bin is enabled. The Recycle Bin is the last line of defense for data on Synology; if at all possible, never turn it off. Ordinary users should never be given delete permission: give them custom read and write permissions only, and give delete permission to department heads so that responsibility is clear. Reduce the number of Synology administrators; the more administrators there are, the greater the chance of problems. In our case, the recycle bin had been turned off by the data administrator of the test group.
 

2. Be sure to install the snapshot package from Package Center and take regular snapshots of entire shared folders. This protects against malicious deletion and ransomware, because all files can be restored from a snapshot.

4. When deletion is discovered, stop writing to the Synology as soon as possible. From the moment we discovered that a huge amount of data was missing, we notified the entire company to stop all write operations and revoked all write permissions on the Synology, keeping read access to minimize the impact on the company's business. This comes from my earlier experience with several cases of lost and recovered data on computer hard disks: as long as the lost data has not been overwritten, the chance of getting it back is still very high.

5. Buy the required hardware immediately. First, prepare the same number of hard drives as the original, each with a larger capacity. Also prepare a storage server with the same number of bays. My Synology has 24 bays, so I found a Huawei storage server with 24 bays. Finally, prepare a large-capacity storage target for the recovered data; we set up a new 100T Synology and mounted it over the network for the recovery.

6. Clone disk images to reduce business interruption. To limit how long the whole company had to work with a read-only Synology NAS, we used the weekend to remove all the hard drives and label them, then put 12 original drives and 12 new drives in the 24-bay storage server and cloned each original onto its corresponding new drive. One pass over the 16T drives took 36 hours, and the two passes together took almost 4 days. The original 200T Synology NAS was then restored immediately and writes were turned back on, so that all the original services on the NAS could be used normally. In total, from finding the problem and turning off writes, through shutting the NAS down over the weekend for cloning, to bringing the Synology back with full read and write access, took 4 days, which kept the impact on the original business to a minimum.

7. Choose the right software. There is a lot of data recovery software on the Internet, but very little of it can really recover a Synology Btrfs volume on RAID5. I chose UFS Explorer on the recommendation of an expert. The older UFS PRO 8.1 hung with a black screen during scanning, so I immediately upgraded to version 9.11. At the same time, seeing that the original server's memory was full, I installed 320G of memory right away and monitored the system's CPU and memory usage. CPU usage was low, and memory usage peaked at 64G. Scanning the 200T RAID group took 15 days. With the outcome unknown and the test colleagues pressing for results, the wait was agonizing. Fortunately, it was all worth it: in the end all the data was rescued, and all of it proved usable after it was handed to the test group for testing.
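Lesson 1 wishes for an alert when a user deletes a large volume of data within a day. A minimal sketch of that idea follows, assuming the NAS log can be exported to CSV with the columns time, user, event, and size_bytes. That column layout, the function names, and the sample rows are all hypothetical and made up for illustration; a real Synology log export will look different and would need its own parsing. The sketch only adds up deleted bytes per user per day and lists whoever crosses a threshold; sending the email is left out.

```python
import csv
import io
from collections import defaultdict

TB = 10**12  # 1T as a decimal terabyte, used as the default alert threshold


def deleted_bytes_per_user_day(csv_text):
    """Sum the size of delete events per (user, day) from an exported log."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["event"].strip().lower() != "delete":
            continue
        day = row["time"][:10]  # assumes timestamps start with YYYY-MM-DD
        totals[(row["user"], day)] += int(row["size_bytes"])
    return dict(totals)


def heavy_deleters(csv_text, threshold_bytes=TB):
    """Return the sorted (user, day) pairs whose deletions reach the threshold."""
    totals = deleted_bytes_per_user_day(csv_text)
    return sorted(key for key, total in totals.items() if total >= threshold_bytes)


# Made-up sample rows in the hypothetical export format.
SAMPLE = """time,user,event,size_bytes
2023-01-10 09:00:00,user_a,delete,600000000000
2023-01-10 15:30:00,user_a,delete,500000000000
2023-01-10 16:00:00,user_b,write,900000000000
2023-01-11 08:00:00,user_b,delete,1000
"""

if __name__ == "__main__":
    for user, day in heavy_deleters(SAMPLE):
        print(f"ALERT: {user} deleted at least 1T on {day}")
```

Run daily against the previous day's log, a check like this would have flagged our deletion on the first day instead of the third.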


Data recovery is full of unknowns, and estimating the chance of success is much like estimating the odds in gambling. I had accidentally deleted several gigabytes of data from hard disks a few times before, and most of what the recovery software brought back was garbled. This time, to my surprise, everything came back as clean directories and usable original files. I think the biggest reason is the copy-on-write behavior of the Btrfs format used by Synology (it is similar to snapshots: no explicit snapshot was taken, but the software in effect retrieves the last one), together with the RAID5 multi-disk parity mechanism. Because all the data is distributed across 24 different disks, the small batches of new writes did not immediately overwrite the large volume of data that had been there before the deletion. Secondly, 30T is a huge amount of data and is hard to overwrite and destroy in a short time, since our NAS only receives a few hundred G of new data a day. More importantly, once the deletion was discovered, all write operations were stopped immediately and the right recovery steps were taken in an orderly way. I hope the lessons learned here can help IT administrators and maintenance staff who run into this kind of situation in the future. In the face of a disaster there is no need to panic or feel lost: calm down, and let my experience give you some reference and confidence.
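The reasoning above can be illustrated with a toy model. This is not how Btrfs is implemented (the real file system uses B-trees, checksums, and transaction generations); it is only a sketch, under my own simplifying assumptions, of why a delete on a copy-on-write volume removes the reference to the data rather than the data itself, and why stopping writes matters. The class and method names are invented for this example.

```python
class ToyCowVolume:
    """Toy copy-on-write store: new data always goes to free blocks."""

    def __init__(self, capacity):
        self.blocks = [None] * capacity      # physical blocks on "disk"
        self.files = {}                      # live metadata: name -> block indexes
        self.free = list(range(capacity))    # free blocks, reused oldest first

    def write(self, name, chunks):
        """Write chunks to free blocks, then point the file at them."""
        if len(chunks) > len(self.free):
            raise OSError("volume full")
        indexes = []
        for chunk in chunks:
            idx = self.free.pop(0)
            self.blocks[idx] = chunk
            indexes.append(idx)
        old = self.files.get(name)
        self.files[name] = indexes
        if old:
            self.free.extend(old)  # old version is released, not erased

    def delete(self, name):
        """Drop the metadata entry; the block contents stay until reused."""
        self.free.extend(self.files.pop(name))

    def scan_block(self, idx):
        """What a recovery tool sees when it reads a raw block."""
        return self.blocks[idx]


if __name__ == "__main__":
    vol = ToyCowVolume(8)
    vol.write("test.bin", [b"A", b"B", b"C"])
    vol.delete("test.bin")
    vol.write("new.bin", [b"X"])  # a small write after the deletion
    print([vol.scan_block(i) for i in range(3)])  # deleted data is still readable
    vol.write("big.bin", [b"Y"] * 5)  # heavy writing starts reusing old blocks
    print(vol.scan_block(0))  # the first deleted block has now been overwritten
```

In the toy model, the small write lands on a fresh block and the deleted file is fully readable by a raw scan, while the later large write reuses a freed block and destroys part of it. That is the same pattern as in our case: a large deleted volume, only small daily writes, and writes stopped early.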
 


Origin blog.csdn.net/qq_24946447/article/details/128752545