A nerve-racking recovery of accidentally deleted files on a production server

After two days of unremitting effort, we finally recovered the data that had been deleted by mistake from a production server. I am recording the course of the accident and the solution here as a warning to myself and a reminder to others not to make the same mistake. I also hope that anyone who runs into a similar problem can find a little inspiration here.

Accident background

I assigned a junior colleague to install Oracle on a production server. She researched as she went, felt the installation had not gone well, and decided to uninstall and reinstall. She found uninstall instructions online, one step of which was a command that removes the Oracle installation directory. The command was as follows:

[Screenshot of the command, not preserved in this copy]


If the ORACLE_BASE variable has not been assigned, the command becomes

[Screenshot of the command, not preserved in this copy]


And wait: she was using the root account.

As a result, every file on the disk was deleted, including the Tomcat application, the MySQL database, and so on.
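
The failure mode described above, where an unset variable silently turns a directory path into /, can be guarded against in the shell itself. The following is a minimal sketch, not the command from the uninstall guide (which is not shown in this copy); the function name safe_remove_contents and the particular checks it makes are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: delete the contents of a directory only when the
# path is non-empty, does not resolve to "/", and actually exists.
safe_remove_contents() {
    target="$1"
    # An unset or empty variable arrives here as an empty string.
    if [ -z "$target" ]; then
        echo "refusing: target directory is empty or unset" >&2
        return 1
    fi
    # Strip trailing slashes so "/", "//" and similar all collapse to "".
    stripped=$(printf '%s' "$target" | sed 's:/*$::')
    if [ -z "$stripped" ]; then
        echo "refusing: target resolves to /" >&2
        return 1
    fi
    if [ ! -d "$stripped" ]; then
        echo "refusing: $stripped is not a directory" >&2
        return 1
    fi
    # ${stripped:?} makes the expansion fail if the value is ever empty.
    rm -rf -- "${stripped:?}"/*
}
```

Called as safe_remove_contents "$ORACLE_BASE", this refuses to run when the variable is unset instead of expanding to the filesystem root. Running scripts with set -u gives a similar safeguard, since the shell then treats any reference to an unset variable as an error.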

Wasn't the MySQL database running? Can Linux delete a file that is in use? In any case, everything was gone. Only a single Tomcat log file was left, probably because it was so large that it had not been deleted yet.

Seeing the remorse in her eyes, I knew this was a task I had assigned to her without explaining the stakes and without giving her any training. The responsibility could only be mine to carry; besides, how could I let her shoulder it?

I called the data center and had the disk mounted on another server. Logging in over ssh, I saw that all the files had been wiped. This server ran a customer's production system that had been live for six months, so it had to be restored as quickly as possible. I then sent for the offline database backup, only to find that the backup file was just 1 KB and contained nothing but the familiar few lines of mysqldump comments (was something wrong with how crontab ran the backup script?). The most recent usable backup was from December 2013. When it rains, it pours. It reminded me of a story a manager once told: a production system went down, every backup turned out to have a problem, the discs were scratched, and the tape drive was broken too (he is an industry veteran; I suppose they still backed up to optical discs back then). I never expected it to come true for me. What now?

After the department heads learned of the situation, they prepared for the worst with Plan B: AA and the product lead would travel on Sunday to the customer's city and talk to the customer's management on Monday; BB and CC would go over to work on the customer's administrator and try to win the customer over.

A straw to clutch at / ext3grep

I searched online for ways to recover mistakenly deleted data and found ext3grep, which can recover files deleted with rm -rf. Our disk was formatted as ext3, and there were many success stories online, so that kindled a glimmer of hope. We unmounted the disk as soon as possible to prevent new writes from overwriting the sectors of the deleted files. Then we downloaded and installed ext3grep (the painful compile-and-install process is a story for another time). First we ran the command that scans for file names:

[Screenshot of the command, not preserved in this copy]


It printed all the deleted files with their paths. I was overjoyed: no need for Plan B, the files were all there. This tool cannot restore files by directory, so the only option was the command that restores everything:

[Screenshot of the command, not preserved in this copy]


As a result, the current disk ran out of space. There was nothing for it but to restore file by file. I tried a few files, and some succeeded while others failed:

[Screenshot of the output, not preserved in this copy]


My heart sank. Had the disk space of the deleted files already been overwritten? The chances of recovery looked slim. Still, we would recover whatever we could; maybe the important MYD data files could be restored. So first we redirected all the file names to a file:

[Screenshot of the command, not preserved in this copy]


Then we filtered out all the MySQL database file names and saved them to mysqltbname.txt.
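
The filtering step might look like the sketch below. It assumes the name listing is a plain text file with one path per line and that the MySQL data directory appears in those paths as var/lib/mysql/; both are assumptions, since the actual listing and paths are not shown in this copy. The function name filter_mysql_files and the input file name allnames.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep only MyISAM table files (.frm, .MYD, .MYI)
# that live under the MySQL data directory.
# $1 = file with one recovered path per line, $2 = output file
filter_mysql_files() {
    grep -E '(^|/)var/lib/mysql/.*\.(frm|MYD|MYI)$' "$1" > "$2"
}
```

For example, filter_mysql_files allnames.txt mysqltbname.txt would write only the table files to mysqltbname.txt and leave out logs, binaries, and everything else in the listing.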

We wrote a script to recover the files:

[Screenshot of the script, not preserved in this copy]
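
The script itself is not shown in this copy. A minimal sketch of such a loop follows, assuming ext3grep DEVICE --restore-file PATH as the per-file restore command. The function name restore_from_list is hypothetical, and the RESTORE_CMD override is an illustrative addition so the loop can be dry-run without touching a disk.

```shell
#!/bin/sh
# Restore every path listed in a file, one at a time, and keep going
# when a single file fails so one bad file does not stop the run.
# $1 = list file (one path per line), $2 = block device or image
# RESTORE_CMD is a hypothetical override; it defaults to ext3grep.
restore_from_list() {
    list="$1"
    device="$2"
    ok=0
    failed=0
    while IFS= read -r path; do
        # Skip blank lines in the list.
        [ -n "$path" ] || continue
        echo "begin to restore file $path"
        # stdin is redirected so the command cannot swallow the list.
        if ${RESTORE_CMD:-ext3grep} "$device" --restore-file "$path" </dev/null; then
            ok=$((ok + 1))
        else
            echo "restore failed: $path" >&2
            failed=$((failed + 1))
        fi
    done < "$list"
    echo "restored=$ok failed=$failed"
}
```

The final summary line makes it easy to compare the number of restored files against the number expected.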


We ran it for about 20 minutes and recovered more than 40 files. But that was not enough: we have nearly 100 tables, each with three files (frm, MYD, MYI), so there should have been about 300 files! We attached the recovered files to an existing database, changed the file permissions to 777, and restarted MySQL. Part of the data was back, but the customer's important check-in attendance data and the data reported from mobile phones (which the customer reportedly uses for employee performance reviews) had not come back. What were we to do? Along the way we also tried another tool, extundelete. Its syntax is basically the same as ext3grep's and the principle is presumably the same, but it is said to be able to restore by directory. Fine, let's give it a try.

[Screenshot of the command, not preserved in this copy]


Sure enough, what could not be recovered before still could not be recovered! Those files had been destroyed. I reported to management: go ahead with Plan B. I went home in despair (it was the weekend; time to go back, rest, and think of something).

A flash of inspiration / binlog

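I woke up early the next morning (with this weighing on my mind), shouldered my laptop, and went to the office (the weekend was a write-off; I would be lucky to escape a reprimand, a company-wide notice, a fine, or dismissal, so never mind the weekend). I kept running ext3grep and extundelete, the same few tricks, and set the system up on the test server to see whether the data could somehow be patched together. On the test server I ran mysqldump, restored the files, overwrote them with the recovered files, fixed the file permissions, and restarted MySQL. Wait, wait. Don't we have the binlog? We require binlog to be enabled for all our services. Maybe the data could be recovered from the binlog?

So I looked through the dumped file names for the binlog files. There were three: mysql-binlog0001, mysql-bin.000009, and mysql-bin.000010. I tried recovering 0001 first: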

[Screenshot of the command, not preserved in this copy]


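It failed... Then I looked at the other two files. mysql-bin.000010 was a few hundred MB, which looked more promising. I ran the restore command, and it actually succeeded! I quickly copied the file to the test server with scp and replayed the binlog:

mysqlbinlog /usr/mysql-bin.000010 | mysql -uroot -p

I entered the password and it hung (a good sign). After a long wait it finally finished. I opened the application and, thanks be to CCTV and MTV, the data was back!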
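
When more than one binlog file has to be replayed, the files need to be applied in sequence order. The sketch below only illustrates that ordering. The mysql-bin.NNNNNN naming pattern is taken from the file names above, while the function name list_binlogs_in_order is hypothetical, and the sketch prints the file list rather than running anything against a server.

```shell
#!/bin/sh
# Hypothetical helper: print the binlog files in a directory ordered by
# their numeric suffix (mysql-bin.000009 before mysql-bin.000010).
# $1 = directory that holds the recovered binlog files
list_binlogs_in_order() {
    dir="$1"
    for f in "$dir"/mysql-bin.[0-9]*; do
        [ -f "$f" ] || continue
        # Emit "suffix path" so sort can order on the sequence number.
        printf '%s %s\n' "${f##*.}" "$f"
    done | sort -n | cut -d' ' -f2-
}
```

mysqlbinlog accepts several log files in one invocation, so the ordered list can be passed to a single mysqlbinlog command whose output is piped to mysql, in the same way as the single-file command above.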

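Postscript

Although we were lucky enough to get the data back, the process was harrowing. I also shudder at the consequences of my mistake and the joint liability it brought on my colleagues and managers. I hope to keep this accident firmly in mind and never repeat the mistake. My reflections on the accident are as follows:

  • When I assigned my colleague to do the server maintenance, I did not explain the stakes to her in advance and did not take the task seriously myself; management and process were both in disarray. On a live production system, every change must be thought through before anyone acts.

  • The automatic backup had failed and nobody checked it. The person responsible for offline backups downloaded a 1 KB file from the server every time and never gave it a thought. Everyone's responsibilities in their role need to be made explicit.

  • The accident was not discovered promptly, so some data was written to the disk afterwards, which made part of the loss unrecoverable. We need to write an application monitoring program that alerts the responsible people by SMS as soon as the service misbehaves.

  • Added following a reminder in the comments: do not operate as the root user. Set up users with different privilege levels on the server.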
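
The second point, a 1 KB dump that nobody noticed, is the kind of failure a small check after each backup can catch. The sketch below is one possible check, not the team's actual script: the function name verify_dump, the default 10 KB size threshold, and the reliance on the "-- Dump completed" trailer that mysqldump writes by default are all assumptions to adapt.

```shell
#!/bin/sh
# Hypothetical sanity check for a mysqldump output file.
# $1 = dump file, $2 = minimum acceptable size in bytes (default 10240)
verify_dump() {
    dump="$1"
    min_bytes="${2:-10240}"
    if [ ! -f "$dump" ]; then
        echo "backup missing: $dump" >&2
        return 1
    fi
    size=$(wc -c < "$dump" | tr -d ' ')
    if [ "$size" -lt "$min_bytes" ]; then
        echo "backup too small: $dump is $size bytes" >&2
        return 1
    fi
    # A dump holding only header comments has no table definitions.
    if ! grep -q '^CREATE TABLE' "$dump"; then
        echo "backup has no CREATE TABLE statements: $dump" >&2
        return 1
    fi
    # mysqldump ends a successful run with this comment unless comments
    # are disabled (for example with --skip-comments).
    if ! tail -n 5 "$dump" | grep -q '^-- Dump completed'; then
        echo "backup looks truncated: $dump" >&2
        return 1
    fi
    return 0
}
```

A cron job could call verify_dump right after mysqldump and raise an alert on a non-zero exit status, so that an empty backup is noticed on the day it is produced rather than on the day it is needed.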

Throughout this incident, several colleagues who had nothing to do with this project took the initiative to help, looking up information and helping with testing; one colleague helped with data recovery testing until one o'clock. Meanwhile, the product manager, facing enormous pressure from the customer, did not panic or blame the developers or the person who ran the command, but gave us room to calm down and think about solutions. The department heads also actively helped look for solutions, stayed late with us to test, and tracked progress in real time. Thanks to everyone's concerted efforts, the matter finally came to a relatively satisfactory conclusion. On Monday morning we will hold a collective review of the lessons learned; incidents like this must be avoided at all costs.


Origin blog.51cto.com/13399166/2414989