Data recovery after a second disk went offline while a failed disk in a RAID array was being replaced

【Fault description】

The customer's equipment is an IBM V7000 (78REAFN, 2076-124) storage system in a P740 + AIX + Sybase + V7000 architecture. The data to be recovered resides mainly in the V7000 storage enclosure, which holds 12 SAS mechanical hard disks of 600 GB each (one of them a hot spare).
A disk in the IBM V7000 (78REAFN, 2076-124) failed. When data synchronization to the replacement disk had reached about 50%, another disk also failed. As a result, the logical disks could no longer be attached to the minicomputer and the business was interrupted. The storage management interface shows two faulty, offline hard disks: the one in slot 10 is the hot spare, and the other is in slot 3, as shown in the following figures:
[Figures: storage management interface showing the offline hard disks in slots 10 and 3]

Two Mdisks (RAID arrays) were created in the customer's storage enclosure and added to a single pool. The customer's main data pool can no longer be loaded, and none of the three general volumes can be mounted, as shown in the following figure:
[Figure: management interface showing the offline pool and the three volumes that cannot be mounted]

【Disk imaging】

To avoid causing secondary damage to the original disks through misoperation during recovery, ten of the disks were imaged with data recovery tools. The faulty hard disk in slot 3, which may have more bad sectors, was imaged with a PC3000. All subsequent recovery operations were performed on the images and did not touch the original disks.
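The basic idea behind imaging a disk that may have bad sectors can be sketched in Python. This is a minimal illustration and does not describe how PC3000 or any particular imaging tool works. It copies a source block by block, zero-fills any block that fails to read, and records the offsets of those blocks so they can be retried later. `image_disk` and `FlakyDisk` are hypothetical names. With a real device, the source would be a block device opened with `open(path, "rb")`, and a media error would normally surface as `OSError`.

```python
import io
from typing import BinaryIO, List


def image_disk(src: BinaryIO, dst: BinaryIO, total_size: int,
               block_size: int = 4096) -> List[int]:
    """Copy src to dst block by block, zero-filling blocks that cannot be read.

    Returns the byte offsets of the unreadable blocks so that they can be
    retried later or handed to a dedicated hardware imager.
    """
    bad_offsets: List[int] = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(length)
            if len(data) != length:
                raise OSError("short read")
        except OSError:
            bad_offsets.append(offset)
            data = b"\x00" * length  # placeholder for the unreadable block
        dst.seek(offset)
        dst.write(data)
        offset += length
    return bad_offsets


class FlakyDisk(io.BytesIO):
    """Hypothetical in-memory stand-in for a disk whose block at bad_offset fails."""

    def __init__(self, data: bytes, bad_offset: int):
        super().__init__(data)
        self.bad_offset = bad_offset

    def read(self, size: int = -1) -> bytes:
        if self.tell() == self.bad_offset:
            raise OSError("simulated media error")
        return super().read(size)


if __name__ == "__main__":
    disk = FlakyDisk(bytes(range(256)) * 64, bad_offset=4096)
    image = io.BytesIO()
    print(image_disk(disk, image, total_size=len(disk.getvalue())))  # [4096]
```

Hardware imagers handle retries and read timeouts far more carefully than this single-pass loop. That extra care matters most on a disk expected to have many bad sectors.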

【Recovery process】

Recovery plan 1: force the storage back online
1. Analyze the order in which the faulty hard disks went offline.
2. Repair the failed hard disk that went offline.
3. Insert the repaired hard disk back into the storage and force it online.
Recovery plan 2: analyze the storage structure and recover the server data
1. Mdisk analysis and reassembly
A. Using the configuration information provided by the customer, group the hard disks by Mdisk.
B. Analyze all hard disks in each Mdisk to obtain its RAID information.
C. Use professional data recovery software to virtually reassemble each Mdisk.
2. Pool analysis
A. Analyze all Mdisks to obtain the pool metadata.
B. Analyze how the pool is distributed across the Mdisks.
3. LUN structure analysis
A. Analyze the stripe size in the pool.
B. Analyze the LUN bitmap to determine how each LUN is distributed within the pool.
C. Write a program to extract the LUNs.
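Step 3C can be sketched as follows. The sketch assumes the pool analysis has produced, for each LUN, a table that maps every virtual extent of fixed size to an (Mdisk, physical extent) pair. The table layout, the name `extract_lun`, and the zero-fill for unallocated extents are illustrative assumptions, not the V7000 on-disk format.

```python
import io
from typing import BinaryIO, Dict, List, Optional, Tuple

# One entry per virtual extent of the LUN: (mdisk_id, physical_extent_index),
# or None for an extent with no backing allocation.
# HYPOTHETICAL representation: in practice this table is derived from the
# pool metadata and LUN bitmap analysis.
ExtentMap = List[Optional[Tuple[int, int]]]


def extract_lun(mdisks: Dict[int, BinaryIO], extent_map: ExtentMap,
                extent_size: int, out: BinaryIO) -> int:
    """Copy a LUN's extents, in virtual order, from the reassembled Mdisk images."""
    written = 0
    for virt_index, entry in enumerate(extent_map):
        if entry is None:
            data = b"\x00" * extent_size  # assumption: unallocated extents read as zeros
        else:
            mdisk_id, phys_index = entry
            src = mdisks[mdisk_id]
            src.seek(phys_index * extent_size)
            data = src.read(extent_size)
            if len(data) != extent_size:
                raise ValueError(
                    f"virtual extent {virt_index}: short read from Mdisk {mdisk_id}")
        out.seek(virt_index * extent_size)
        out.write(data)
        written += len(data)
    return written


if __name__ == "__main__":
    # Toy example with 4-byte extents; real extent sizes are far larger.
    md0, md1 = io.BytesIO(b"AAAABBBBCCCC"), io.BytesIO(b"DDDDEEEE")
    lun = io.BytesIO()
    extract_lun({0: md0, 1: md1}, [(1, 1), (0, 0), None, (0, 2)], 4, lun)
    print(lun.getvalue())  # b'EEEEAAAA\x00\x00\x00\x00CCCC'
```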
RAID 5 tolerates at most one offline member disk. The array remains usable when a single member fails because the missing data can be recomputed from parity on the remaining members. In the failed customer storage, only one hard disk in each Mdisk is offline, so every Mdisk can still be reassembled from its remaining members.
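Virtual reassembly of an Mdisk with one offline member can be sketched as follows. RAID 5 parity is the XOR of the data chunks in a stripe row, so XORing the surviving chunks of a row reproduces the missing one. The sketch makes several assumptions:

- a left-symmetric parity rotation (the Linux md default);
- a known chunk size;
- a known member order.

The V7000's actual layout is not stated here and must come from the disk analysis in step 1B. Member images are held in memory as `bytes` for brevity, and all function names are hypothetical.

```python
from typing import List, Optional


def xor_bytes(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks (RAID 5 parity)."""
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "little")
    return acc.to_bytes(len(blocks[0]), "little")


def data_disk(row: int, index: int, n: int) -> int:
    """Member holding the index-th data chunk of a stripe row (left-symmetric, assumed)."""
    parity = (n - 1) - (row % n)
    return (parity + 1 + index) % n


def read_chunk(disks: List[Optional[bytes]], disk: int, row: int, chunk: int) -> bytes:
    """Read one chunk; if its member is offline (None), rebuild it by XOR of the others."""
    start, end = row * chunk, (row + 1) * chunk
    if disks[disk] is not None:
        return disks[disk][start:end]
    return xor_bytes([d[start:end] for i, d in enumerate(disks) if i != disk])


def reassemble_raid5(disks: List[Optional[bytes]], chunk: int) -> bytes:
    """Virtually reassemble a RAID 5 set; at most one member may be None (offline)."""
    if sum(d is None for d in disks) > 1:
        raise ValueError("RAID 5 can be rebuilt with at most one missing member")
    n = len(disks)
    rows = len(next(d for d in disks if d is not None)) // chunk
    out = bytearray()
    for row in range(rows):
        for index in range(n - 1):
            out += read_chunk(disks, data_disk(row, index, n), row, chunk)
    return bytes(out)


def build_raid5(data: bytes, n: int, chunk: int) -> List[bytes]:
    """Stripe data across n members with the same layout (used to test the round trip)."""
    per_row = chunk * (n - 1)
    if len(data) % per_row:
        raise ValueError("data length must be a multiple of a full stripe")
    disks = [bytearray() for _ in range(n)]
    for row in range(len(data) // per_row):
        chunks = [data[row * per_row + i * chunk: row * per_row + (i + 1) * chunk]
                  for i in range(n - 1)]
        for index, c in enumerate(chunks):
            disks[data_disk(row, index, n)] += c
        disks[(n - 1) - (row % n)] += xor_bytes(chunks)  # parity chunk for this row
    return [bytes(d) for d in disks]


if __name__ == "__main__":
    members = build_raid5(bytes(range(256)) * 3, n=4, chunk=16)
    members[2] = None  # simulate the offline member
    print(reassemble_raid5(members, chunk=16) == bytes(range(256)) * 3)  # True
```

If two members of the same set were missing, the XOR of a row would have two unknowns. That is why the check in `reassemble_raid5` refuses that case.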
Extract the logs stored on the V7000 and analyze them to determine the order in which the faulty hard disks went offline.
[Figure: V7000 log analysis showing the offline sequence of the faulty hard disks]
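Determining the offline sequence comes down to pulling the offline events out of the log and ordering them by time. The log-line format matched below is HYPOTHETICAL and chosen only for illustration; the real V7000 event log has its own format. In RAID recovery generally, the member that dropped out earliest holds the most out-of-date data. That is why the order matters when deciding which member to treat as missing.

```python
import re
from datetime import datetime
from typing import List, Tuple

# HYPOTHETICAL log-line format, for illustration only, e.g.
#   "YYYY-MM-DD HH:MM:SS drive in slot 3 went offline"
# The real V7000 event log must be parsed according to its own format.
OFFLINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*?slot (\d+).*?\boffline\b",
    re.IGNORECASE,
)


def offline_sequence(lines: List[str]) -> List[Tuple[datetime, int]]:
    """Return (time, slot) of each slot's first offline event, earliest first."""
    events: List[Tuple[datetime, int]] = []
    for line in lines:
        match = OFFLINE_RE.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((when, int(match.group(2))))
    events.sort()
    first_seen: List[Tuple[datetime, int]] = []
    seen_slots = set()
    for when, slot in events:
        if slot not in seen_slots:  # keep only the earliest event per slot
            seen_slots.add(slot)
            first_seen.append((when, slot))
    return first_seen
```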

【Migrating the recovered data】

Random sampling of the recovered data found no problems. LUNs of the same sizes as in the original environment were then created on a new storage device, and the extracted LUN image files were copied onto them with dd. The data checked out as normal, and the data recovery was completed successfully.
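The migration itself used dd. The same sequential block copy, followed by a random-sample comparison of the image against the target, can be sketched in Python. `copy_image` and `sample_verify` are hypothetical helpers. With real devices, the arguments would be files opened in binary mode, for example the LUN image file and the new LUN's block device.

```python
import io
import random
from typing import BinaryIO


def copy_image(src: BinaryIO, dst: BinaryIO, block_size: int = 1 << 20) -> int:
    """Sequential block copy, similar in spirit to `dd if=IMAGE of=TARGET bs=1M`."""
    total = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total


def sample_verify(a: BinaryIO, b: BinaryIO, size: int, samples: int = 64,
                  block: int = 4096, seed: int = 0) -> bool:
    """Compare `samples` randomly chosen blocks of two images that are `size` bytes long."""
    rng = random.Random(seed)
    for _ in range(samples):
        offset = rng.randrange(0, max(1, size - block + 1))
        a.seek(offset)
        b.seek(offset)
        if a.read(block) != b.read(block):
            return False
    return True


if __name__ == "__main__":
    image = bytes(range(256)) * 64
    target = io.BytesIO()
    copied = copy_image(io.BytesIO(image), target, block_size=4096)
    print(copied, sample_verify(io.BytesIO(image), target, len(image)))  # 16384 True
```

A sampled comparison only spot-checks the copy. Hashing both sides in full is the thorough, slower alternative.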

Origin http://43.154.161.224:23101/article/api/json?id=325272140&siteId=291194637